data preprocessing #5

Open
iokanyalcin opened this issue May 7, 2021 · 1 comment

Comments

iokanyalcin commented May 7, 2021

Hello, I am trying to run the project, but I have encountered several issues, especially in the preprocessing part.

After finishing all the download steps and exporting the Earth Engine data to Google Cloud Storage, I moved on to the process_tfrecords notebook. The main issue is that my exported Earth Engine files are named in this format: {country_name}{year_range}.tfrecord.gz
But the notebook process_tfrecords_dhs.ipynb expects names in this format: lx_median_{year_range}_{country}_dhslocs_ee_export.tfrecord.gz
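The name mismatch described above can be checked with a small mapping function. This is only a sketch: the target template is copied from the file name quoted in this issue, while the regular expression for the exported names and the function name `expected_name` are assumptions made for illustration, not part of the repo.

```python
import re

# Target pattern, copied from the file name quoted in this issue.
EXPECTED_TEMPLATE = "lx_median_{year_range}_{country}_dhslocs_ee_export.tfrecord.gz"

# Hypothetical pattern for exported names: a lowercase country name followed
# by a year range such as "2009-11". Adjust it to your actual export names.
EXPORTED_RE = re.compile(
    r"^(?P<country>[a-z_]+?)_?(?P<year_range>\d{4}(?:-\d{2})?)\.tfrecord\.gz$"
)


def expected_name(exported_name):
    """Map an exported file name to the expected name, or None if it does not parse."""
    match = EXPORTED_RE.match(exported_name)
    if match is None:
        return None
    return EXPECTED_TEMPLATE.format(**match.groupdict())
```

Names that return `None` do not fit the assumed export pattern and need to be inspected by hand before renaming.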

I changed the name format and moved on, but in the last part (Process TFRecords) none of the processing functions work. I am getting errors like:

  • list index out of range
  • There is no such file:

For instance, Angola's data is stored as angola2011_xx.tfrecord.gz through angola2015_xx.tfrecord.gz in Cloud Storage, but the notebook tries to find angola2009-11.tfrecord.gz.

  • Cluster index not found in TFRecord file: the REQUIRED_KEYS list contains "cluster index", but some of my TFRecord files do not include it.
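One way to confirm which files lack a required key is to read the first record of each file and compare its feature keys against the required list. The sketch below assumes TensorFlow 2.x with eager execution and gzip-compressed exports. The function names are hypothetical, and the required keys are passed in by the caller because the exact contents of REQUIRED_KEYS are not shown in this issue.

```python
def missing_keys(feature_keys, required_keys):
    """Return the required keys that are absent from one record, sorted."""
    return sorted(set(required_keys) - set(feature_keys))


def first_record_keys(path):
    """Return the feature keys of the first record in a gzipped TFRecord file."""
    # Imported lazily so that missing_keys can be used without TensorFlow.
    import tensorflow as tf

    dataset = tf.data.TFRecordDataset(path, compression_type="GZIP")
    for raw in dataset.take(1):
        example = tf.train.Example.FromString(raw.numpy())
        return set(example.features.feature.keys())
    return set()  # the file contains no records
```

Calling `missing_keys(first_record_keys(path), required)` for each exported file shows whether the cluster index is missing from every file or only from some of them.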

I couldn't figure out where the mistake is, or whether I missed a step needed to create lx_median_{year_range}_{country}_dhslocs_ee_export.tfrecord.gz. Could you please explain and help with this issue?
Thanks

Edit:
After inspecting the code, I think the issues are most likely caused by the missing cluster indexes in the TFRecord files.
Maybe I should also concatenate the TFRecord files.
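If concatenation does turn out to be needed, a TFRecord file is a plain sequence of length-prefixed records, so the decompressed contents of several files can be appended into one. The sketch below uses only the standard library. The function name is hypothetical, and whether the notebook actually expects a single merged file per country is an assumption, not something confirmed in this thread.

```python
import gzip
import shutil


def concat_gzipped_tfrecords(input_paths, output_path):
    """Append the records of several gzipped TFRecord files into one gzipped file."""
    with gzip.open(output_path, "wb") as out:
        for path in input_paths:
            # Decompress each input as a stream and copy its bytes to the output.
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

The records keep their original order: all records of the first input, then the second, and so on.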

Collaborator

chrisyeh96 commented May 9, 2021

Hi, repo author here. I apologize for these data preprocessing issues, which are known. I am working on creating an updated data preprocessing pipeline. See the chrisyeh96/africa_poverty_clean repo for the latest preprocessing pipeline, which should resolve your issue.

Once chrisyeh96/africa_poverty_clean is fully ready, I will merge these two repos. Hopefully I will have time to do this over the next couple of months.
