
download satellite imagery #5

Closed
luckystarufo opened this issue Jul 1, 2020 · 2 comments
Comments

@luckystarufo

Thanks for your nice work and the updates!

Two questions:

  1. The original repo (from Neal) seems to download the satellite imagery using the Google Maps Static API with the 'zoom' parameter set to 16. Is there a corresponding setting in the Planet API? (i.e. how do we ensure we are downloading the same image sets?)

  2. The reproduced R^2 is significantly lower than the values reported in the original work. This happens even if I change your code to the way they compute R^2 (i.e. metrics.r2_score --> scipy.stats.pearsonr()[0]**2). I realize you are using datasets from different years, but do you have any other ideas about why that's happening?

Thanks!

[A side note: from https://developers.google.com/maps/billing/gmp-billing, it looks like the Google Maps Static API is NOT free now, even for the first 100K images?]

@jmathur25
Owner

jmathur25 commented Jul 1, 2020

Hey again. To address your questions:

  1. We are actually downloading different images than the original script. This is the original image download script: https://github.com/nealjean/predicting-poverty/blob/master/scripts/get_image_download_locations.py#L13. The method I use generates the same bounding box but is more generalized and consistent in its image choice within that box. An older version of this repo used the original function, but I figured the new way was better. As for correspondence with the Planet API, I manually experimented with zoom levels and found that anything above zoom=14 was too low quality on Planet's imagery; zoom=16 with Google's images gives higher quality and resolution. But in my tests, switching to Planet did not make a huge difference.

  2. The original paper reports the R^2 values on the log expenditures. See the table in the README for the replication comparison. Beyond that, there are some small differences between the paper and this reproduction, summarized below:

  • metrics.r2_score --> stats.pearsonr()[0]**2 in the evaluate_fold and find_best_alpha functions inside utils/ridge_training.py. If I make that change, the Malawi R^2 goes from 0.26 to 0.29 and the Nigeria R^2 from 0.19 to 0.22 (these are the two countries shared with the replication; note this is on directly predicting the expenditures rather than the log). Not a huge change, but more than one might expect.

  • We use different years than the original paper. The data distribution does seem to differ across years for these two countries (judging by the graph in papers/jean_et_al.pdf vs my own), so that could be anywhere from a small to a huge factor.

  • Some details of the training procedure and nightlights filtering procedure are not available in code. They are somewhat described in papers/aaai16.pdf, but the results discussed in that more detailed paper are different, and I couldn't find further explanation.

  • The model I initialize is slightly different (they use a fully convolutional one).

  • Differences in preprocessing. The LSMS survey documentation for Malawi describes both "rexpagg" and "rexpaggpc" as per capita consumption, but as the names indicate, "rexpaggpc" is actually per capita while "rexpagg" is per household. To compute the consumption per capita in a cluster, you need to sum (not average) the per-household consumptions, then divide by the total number of people surveyed in the cluster. Jean et al. do this differently, averaging over households instead of summing. Also, they use an adult-equivalent adjustment whereas I do per capita.
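To make the preprocessing point concrete, here is a minimal sketch (with made-up numbers, not LSMS data) of how summing household consumption and dividing by total people differs from averaging per-household totals:

```python
# Hypothetical cluster of two surveyed households. "rexpagg" is the
# per-HOUSEHOLD aggregate consumption (despite the documentation's wording);
# "members" is the household size.
households = [
    {"rexpagg": 1000.0, "members": 2},
    {"rexpagg": 3000.0, "members": 6},
]

# Sum household consumption, then divide by total people surveyed
# in the cluster -- this yields true consumption per capita.
per_capita = sum(h["rexpagg"] for h in households) / sum(
    h["members"] for h in households
)

# Averaging the per-household totals instead (the variant attributed to
# Jean et al. above) gives a different, larger number here.
mean_of_households = sum(h["rexpagg"] for h in households) / len(households)

print(per_capita)          # 4000 / 8 people = 500.0
print(mean_of_households)  # 4000 / 2 households = 2000.0
```

The two quantities only coincide when every household has the same size, so clusters with varied household sizes will diverge between the two definitions.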

Given all of this, once I got the results I did here, I figured it was "close enough". As for your side note on billing: dang, that really sucks. That is definitely a recent change. Being able to do this almost instantly at very little cost was huge. :(

@luckystarufo
Author

Thanks so much for the detailed explanations, now I kind of see how it goes ... these are very insightful notes.
