Rewrite Stage 2 in Julia #348
Comments
Just to add a little more information about the replicability issue: @chusloj and I have been going back and forth about it. @chusloj did some comparisons, and it seems like the differences between files are very small, but they're different nonetheless.

EDIT: the file differences were small in the sense that only a small percentage of records were affected, but the records that were affected often had large differences between runs.
I have been running the stage 2 solves today. As I watched them run, I noticed that the objective function gets much larger in later years. This led me to look at the code, which appears to penalize changes in each year's weights relative to the initial-year weights rather than relative to the prior year's weights.

If this is really what is happening, I am not sure it is the best approach, either from a conceptual standpoint or from the standpoint of computational effort. Conceptually, we would generally expect the distribution of returns to look more like the prior year's than that of many years ago, and we should not be surprised if some kinds of returns have far lower weights in later years than in earlier years. That is probably why the objective function gets so much larger in later years: the weights must move very far from their initial values.

This suggests that we might rather penalize changes from the immediately prior year than from the initial year. Doing so would no doubt make the solution easier: a lot of hard work might go into solving for weights that hit the distributional targets in 2012, but once those are hit, the targets for each later year might be relatively close to the previous year's.

On the other hand, some individual interim years might be oddballs, and we might not want to penalize changes from such a year. For example, a year in which high-income taxpayers accelerated deductions might be an oddball year, and we might not want to penalize large changes in the following year relative to it - but whether that means it is best to penalize changes from the initial year of 2011 is an open question. Perhaps this was behind JCT's thinking. If you look at pages 52-55 of the following document:

you will see that they penalize differences from both the initial-year weights and the previous-year weights, downweighting the importance of the initial year as later years are solved and upweighting the importance of the previous year. Whether that is worth the extra effort, I don't know; it might be worth talking to someone at JCT about this. There are a few other issues you might want to consider:
Anyway, those are my thoughts based on trying to run taxdata today. Don
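The blended penalty Don describes (distance from both the initial-year and prior-year weights, with the balance shifting toward the prior year as later years are solved) can be sketched as below. The quadratic relative-difference form and the `decay` parameter are illustrative assumptions, not the actual stage-2 or JCT objective:

```python
import numpy as np

def distance_penalty(w, w_init, w_prev, t, decay=0.9):
    """Penalty blending distance from initial-year and prior-year weights.

    As t (years since the initial solve) grows, the initial-year term is
    downweighted and the prior-year term is upweighted, mirroring the
    JCT approach described above. `decay` is a hypothetical tuning knob.
    """
    lam = decay ** t  # importance attached to the initial-year weights
    d_init = np.sum(((w - w_init) / w_init) ** 2)
    d_prev = np.sum(((w - w_prev) / w_prev) ** 2)
    return lam * d_init + (1.0 - lam) * d_prev
```

With `t = 0` this reduces to penalizing only distance from the initial year; as `t` grows it approaches penalizing only distance from the prior year.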
There is a replicability issue using the `CVXOPT` solver in Python to calculate the PUF and CPS weights - the `md5` hash for the weight file changes every time the solver is run. Because most of the commonly used LP solvers do not have clean APIs for Python, the `stage2` and `solve_lp_for_year` scripts should be re-written in Julia. The language has a clean optimization & modeling interface called `JuMP` that can be used for any LP optimization model that has a Julia implementation.

There are a few reasons why using Julia would be advantageous to Python for the solver stage:

- Julia has a `pandas` interface, so refactoring the data processing portions of the code shouldn't be too time consuming.
- With `JuMP`, any new solver can be used with the same code because `JuMP` is model-agnostic.

Coding the Julia version of the code should take substantially less time than finding the `md5` replicability bug.