-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow regions to be string format #28
Conversation
with newer version of pytorch-lighting, some parameter for ModelCheckpoint becomes invalid
@@ -99,7 +99,7 @@ def fit( | |||
mode='min', | |||
monitor='loss', | |||
every_n_train_steps=0, | |||
every_n_val_epochs=1 | |||
every_n_val_epochs=1 # this parameter will become invalid with newer version of pytorch-lightning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, should be a follow-up PR to change this to every_n_epochs
and upgrade pytorch-lightning.
src/cultionet/scripts/config.yml
Outdated
regions: | ||
- 1 | ||
- 1 | ||
- green |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nnguyen622 can you explain this one? This would impact the example data, e.g.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgrss , In this example, I produced data with evi2, gcvi, kndvi, and green indices
. I removed region part because I hard-coded the geo_id list (as seen below)
However, I realized that the config.yml can be changed whenever users generate training files. As such, maybe we don't need to change the config.yml in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, that's my fault. I completely missed that the 'regions' was removed. Let's see if we can leave the config 'regions' as an option and add your request from the CLI. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgrss , I should have provided more context. I think your suggested code will allow regions
to still be in the yml file 👍
I will update once I test out the code generating training files
src/cultionet/scripts/cultionet.py
Outdated
import yaml | ||
|
||
|
||
logger = logging.getLogger(__name__) | ||
|
||
geo_id_data = pd.read_csv('~/geo_id_grid_list.csv') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard-coding this here could be problematic. Checking to see how it's used down below....
@@ -213,7 +217,7 @@ def persist_dataset(args): | |||
ref_res_lists = [args.ref_res] | |||
|
|||
inputs = model_preprocessing.TrainInputs( | |||
regions=config['regions'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the flexibility of being able to use a different source for the region ids. However, pinning a file and opening as a global overrides the existing configuration that works with the example data and with other ongoing projects.
My suggestion would be to add a CLI parameter, or parameters, that would allow us to pass a region id file that would override the configuration integer range.
I think something like the following:
if hasattr(args, 'region_id_file'):
if not Path(args.region_id_file).is_file():
raise IOError('The id file does not exist')
id_data = pd.read_csv(args.region_id_file)
# How do you feel about an expected column name of 'id'?
regions = id_data['id'].unique().tolist()
# Otherwise,
# regions = id_data[args.id_column].unique().tolist()
else:
regions = list(range(config['regions'][0], config['regions'][1]+1))
inputs = model_preprocessing.TrainInputs(....
regions=regions,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgrss , I agree. Hard-coding the geo_id file in can be problematic.
I really your idea of passing the file in through CLI. I will implement following your suggestion 👍
start_region = self.regions[0] | ||
end_region = self.regions[1] | ||
region_list = list(range(start_region, end_region+1)) | ||
region_list = self.regions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of additional methods to pass the region ids as long as it doesn't remove the existing method. See comments for suggestions.
Thanks @jgrss , I will make changes accordingly 👍 |
remove hard-coded geo_id list pass id list through a file location in CLI
src/cultionet/scripts/config.yml
Outdated
|
||
region_id_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgrss , what do you think if we add region_id_file here in the config file instead of adding additional argument to the CLI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nnguyen622 that could work with a default of !!null
. And then you'll add a check for both of those options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @jgrss, good point. I will add in check for both of these options. 👍
add check to make sure only one option of region is submitted
src/cultionet/scripts/cultionet.py
Outdated
|
||
|
||
if region_as_file: | ||
file_path = config['region_id_file'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good -- one very minor comment. It looks like the indentation is 2 spaces instead of 4? We should probably just add an automatic formatter like Black.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgrss , good catch. Let me fix that right now.
Next time I will run black before comitting 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry about running Black
. We can add it as a pre-commit.
add null value to region_id_file remove green index from image_vis, users can alter their own config.yml file
add comments for each parameter
This PR is to allow grid to be other format in file name. Instead of an integer of six character length, this allows string format, integer format, etcc...
scripts/config.yml - remove region part + add new features
scripts/create_dataset.py – in persist_dataset, change regions into a list of hash polygons instead of CONFIG['regions']
src/cultionet/scripts/cultionet.py – in persist_dataset, change regions into a list of hash polygons instead of CONFIG['regions']
src/cultionet/utils/model_preprocessing.py – in class TrainInputs, attrs_post_init(self), remove region_list as list of range number into region_list = self.regions