Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow regions to be string format #28

Merged
merged 14 commits into from
Sep 14, 2022
Merged

allow regions to be string format #28

merged 14 commits into from
Sep 14, 2022

Conversation

nnguyen622
Copy link
Collaborator

This PR is to allow grid to be other format in file name. Instead of an integer of six character length, this allows string format, integer format, etcc...

  • scripts/config.yml - remove region part + add new features

  • scripts/create_dataset.py – in persist_dataset, change regions into a list of hash polygons instead of CONFIG['regions']

  • src/cultionet/scripts/cultionet.py – in persist_dataset, change regions into a list of hash polygons instead of CONFIG['regions']

  • src/cultionet/utils/model_preprocessing.py – in class TrainInputs, attrs_post_init(self), remove region_list as list of range number into region_list = self.regions

@nnguyen622 nnguyen622 marked this pull request as draft August 30, 2022 16:26
with newer version of pytorch-lighting, some parameter for ModelCheckpoint becomes invalid
@nnguyen622 nnguyen622 marked this pull request as ready for review September 7, 2022 18:27
@jgrss jgrss self-requested a review September 7, 2022 18:31
@@ -99,7 +99,7 @@ def fit(
mode='min',
monitor='loss',
every_n_train_steps=0,
every_n_val_epochs=1
every_n_val_epochs=1 # this parameter will become invalid with newer version of pytorch-lightning
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, should be a follow-up PR to change this to every_n_epochs and upgrade pytorch-lightning.

regions:
- 1
- 1
- green
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnguyen622 can you explain this one? This would impact the example data, e.g.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgrss , In this example, I produced data with evi2, gcvi, kndvi, and green indices. I removed region part because I hard-coded the geo_id list (as seen below)
However, I realized that the config.yml can be changed whenever users generate training files. As such, maybe we don't need to change the config.yml in this PR?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's my fault. I completely missed that the 'regions' was removed. Let's see if we can leave the config 'regions' as an option and add your request from the CLI. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgrss , I should have provided more context. I think your suggested code will allow regions to still be in the yml file 👍
I will update once I test out the code generating training files

import yaml


logger = logging.getLogger(__name__)

geo_id_data = pd.read_csv('~/geo_id_grid_list.csv')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coding this here could be problematic. Checking to see how it's used down below....

@@ -213,7 +217,7 @@ def persist_dataset(args):
ref_res_lists = [args.ref_res]

inputs = model_preprocessing.TrainInputs(
regions=config['regions'],
Copy link
Owner

@jgrss jgrss Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the flexibility of being able to use a different source for the region ids. However, pinning a file and opening as a global overrides the existing configuration that works with the example data and with other ongoing projects.

My suggestion would be to add a CLI parameter, or parameters, that would allow us to pass a region id file that would override the configuration integer range.

I think something like the following:

if hasattr(args, 'region_id_file'):
    if not Path(args.region_id_file).is_file():
        raise IOError('The id file does not exist')
    id_data = pd.read_csv(args.region_id_file)
    # How do you feel about an expected column name of 'id'?
    regions = id_data['id'].unique().tolist()  
    # Otherwise, 
    # regions = id_data[args.id_column].unique().tolist()
else:
    regions = list(range(config['regions'][0], config['regions'][1]+1))


inputs = model_preprocessing.TrainInputs(....
    regions=regions,
)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgrss , I agree. Hard-coding the geo_id file in can be problematic.
I really your idea of passing the file in through CLI. I will implement following your suggestion 👍

start_region = self.regions[0]
end_region = self.regions[1]
region_list = list(range(start_region, end_region+1))
region_list = self.regions
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Owner

@jgrss jgrss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of additional methods to pass the region ids as long as it doesn't remove the existing method. See comments for suggestions.

@nnguyen622
Copy link
Collaborator Author

I like the idea of additional methods to pass the region ids as long as it doesn't remove the existing method. See comments for suggestions.

Thanks @jgrss , I will make changes accordingly 👍

remove hard-coded geo_id list
pass id list through a file location in CLI
@nnguyen622 nnguyen622 marked this pull request as draft September 8, 2022 21:03
Comment on lines 10 to 11

region_id_file:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgrss , what do you think if we add region_id_file here in the config file instead of adding additional argument to the CLI?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnguyen622 that could work with a default of !!null. And then you'll add a check for both of those options?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @jgrss, good point. I will add in check for both of these options. 👍

add check to make sure only one option of region is submitted


if region_as_file:
file_path = config['region_id_file']
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good -- one very minor comment. It looks like the indentation is 2 spaces instead of 4? We should probably just add an automatic formatter like Black.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgrss , good catch. Let me fix that right now.
Next time I will run black before comitting 👍

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about running Black. We can add it as a pre-commit.

@jgrss jgrss added the enhancement New feature or request label Sep 13, 2022
@nnguyen622 nnguyen622 marked this pull request as ready for review September 14, 2022 15:41
add null value to region_id_file 
remove green index from image_vis, users can alter their own config.yml file
add comments for each parameter
@jgrss jgrss merged commit e4e0b13 into jgrss:main Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants