Question: finalprep.py on Windows 10 64 results in Runtime Warning: overflow encountered in exp - should that worry me!? #352

Closed
donboyd5 opened this issue Aug 13, 2020 · 2 comments

Comments

@donboyd5

I'm a newcomer trying to build the taxdata PUF files from scratch on a Windows 10 machine, working in a bash command window.

I'm running the build piece by piece, based on my reading of the Makefile.

The full workflow is described below, but the bottom line is that running python finalprep.py produced this warning:

C:\Users\donbo.conda\envs\taxdata-dev\lib\site-packages\statsmodels\discrete\discrete_model.py:1747: RuntimeWarning: overflow encountered in exp
return 1/(1+np.exp(-X))

Entire workflow and messages copied further below.

My full workflow, on a Windows 10 Pro 64-bit machine with an AMD FX-8350 8-core processor and 32 GB of RAM:

  • copy puf2011.csv to ../taxdata/puf_data/StatMatch/Matching
  • copy asec2016_pubuse_v3.dat to ../taxdata/puf_data/StatMatch/Matching
  • cd taxdata; python runmatch.py
    -- produces cpsmar2016.csv without warnings after 5 hours (!?); file looks sensible in a text editor
    -- produces cps-matched-puf.csv without warnings after another 2 hours; file looks sensible in a text editor
  • cd puf_data; python finalprep.py
    -- produces puf.csv with the overflow warning noted above after just a few minutes; file looks sensible in a text editor; 89 columns, 248,592 rows (including header)

I have not yet completed any additional steps.

My questions:

  • should I be concerned about the overflow warning message?
  • if so, any advice on how to fix it?
  • or how to investigate further for fields or values that are probably the cause?
  • (and does 5 hours seem reasonable for producing cpsmar2016.csv from asec2016_pubuse_v3.dat?)
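On the question of how to investigate: one way to locate the values behind such a warning is to make NumPy raise instead of warn, and to check which inputs exceed the float64 range of exp. The sketch below is only an illustration under stated assumptions. The function names logistic_cdf and find_overflow_rows and the sample array xb are hypothetical and are not part of taxdata or statsmodels; logistic_cdf simply copies the expression quoted in the warning.

```python
import numpy as np


def logistic_cdf(x):
    # Same expression as the line quoted in the warning message.
    return 1 / (1 + np.exp(-x))


def find_overflow_rows(linear_pred):
    """Return the indices whose values make np.exp(-x) overflow float64."""
    x = np.asarray(linear_pred, dtype=float)
    # exp overflows once its argument exceeds log(largest float64), about 709.78.
    limit = np.log(np.finfo(np.float64).max)
    return np.flatnonzero(-x > limit)


# Hypothetical linear predictors; only the third is extreme enough to overflow.
xb = np.array([2.5, -0.3, -800.0, 12.0])
bad = find_overflow_rows(xb)

# Turning the warning into an exception gives a traceback at the exact call.
try:
    with np.errstate(over="raise"):
        logistic_cdf(xb)
    raised = False
except FloatingPointError:
    raised = True

print(bad, raised)
```

In a real run, the array to inspect would be whatever is passed to the logistic function inside the model, which this sketch does not attempt to reproduce.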

Many thanks! Full bash window contents below.

Don

donbo@Don-Business MINGW64 ~/Documents/GitHub/taxdata/puf_data/StatMatch/Matching (master)
$ python runmatch.py
100%|##########| 360194/360194 [5:08:11<00:00, 19.48it/s]
100%|##########| 69484/69484 [1:58:30<00:00, 9.77it/s]
100%|##########| 23/23 [00:04<00:00, 5.27it/s]
Converting .DAT to .CSV
Creating Records
Exporting Data
Reading PUF Data
Creating CPS Tax Units
CPS Tax Units Created
Adjustment Complete
Start Phase One
Start Phase Two
Creating final file
(taxdata-dev)
donbo@Don-Business MINGW64 ~/Documents/GitHub/taxdata/puf_data/StatMatch/Matching (master)
$ cd puf_data
bash: cd: puf_data: No such file or directory
(taxdata-dev)
donbo@Don-Business MINGW64 ~/Documents/GitHub/taxdata/puf_data/StatMatch/Matching (master)
$ cd ..
(taxdata-dev)
donbo@Don-Business MINGW64 ~/Documents/GitHub/taxdata/puf_data/StatMatch (master)
$ cd ..
(taxdata-dev)
donbo@Don-Business MINGW64 ~/Documents/GitHub/taxdata/puf_data (master)
$ python finalprep.py
C:\Users\donbo.conda\envs\taxdata-dev\lib\site-packages\statsmodels\discrete\discrete_model.py:1747: RuntimeWarning: overflow encountered in exp
return 1/(1+np.exp(-X))
(taxdata-dev)
donbo@Don-Business MINGW64 ~/Documents/GitHub/taxdata/puf_data (master)

@andersonfrailey
Collaborator

@donboyd5 no need to worry about the overflow warning. It has never taken me five hours to create cpsmar2016.csv, but the bash output you shared looks good. On my machine (a Mac running macOS 10.15.4) it takes about 45 minutes. I'm working on some refactors that should shorten that run time for everyone.
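For readers wondering why this kind of warning is usually benign, here is a minimal sketch. It assumes the warning comes only from the expression shown in the message, 1/(1+np.exp(-X)); the function name logistic_cdf and the sample inputs are hypothetical. When X is a large negative number, np.exp(-X) overflows to inf and NumPy warns, but 1/(1+inf) still evaluates to 0.0, which is the correct limiting value.

```python
import warnings

import numpy as np


def logistic_cdf(x):
    # Same expression as the line quoted in the warning message.
    return 1 / (1 + np.exp(-x))


# Hypothetical inputs: one very negative, one zero, one very positive.
x = np.array([-1000.0, 0.0, 1000.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    p = logistic_cdf(x)

# The overflow is reported, yet every result is finite and within [0, 1].
messages = [str(w.message) for w in caught]
print(p, messages)
```

The warning would be worth a closer look only if the results contained NaN or if the model failed to converge, neither of which is reported in this thread.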

@donboyd5
Author

Thanks, @andersonfrailey. Everything looks good to me after I created final files. I'm thinking maybe I was writing the files to a folder that was synced with Dropbox or Google Drive and that may have slowed it down. I'll figure it out the next time I run taxdata.

Don
