This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Use case #9 using the CLI

Marco Zühlke edited this page Aug 31, 2016 · 51 revisions

Having thought about how users could perform the use case #9 workflow with the ECT CLI, we propose the following CLI changes and extensions:

  • Omit ect list; instead, make list a sub-command of the main commands, e.g. ds, op, etc.
  • Introduce workspaces, which we could use for the GUI as well. This way users can switch between CLI and GUI and use the same input, intermediate and output datasets, as they would be persisted in the file system.
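
As a rough illustration of the first point, nested sub-commands could be sketched with Python's argparse. This is only a sketch under assumptions: the parser layout, the dest names and the help texts are made up for illustration and are not the actual ECT implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser in which 'list' is a sub-command of 'ds' and 'op'."""
    parser = argparse.ArgumentParser(prog="ect")
    commands = parser.add_subparsers(dest="command", required=True)

    # 'ect ds list --pattern PATTERN'
    ds = commands.add_parser("ds", help="data source commands")
    ds_sub = ds.add_subparsers(dest="sub_command", required=True)
    ds_list = ds_sub.add_parser("list", help="list data sources")
    ds_list.add_argument("--pattern", default=None)

    # 'ect op list --tag TAG'
    op = commands.add_parser("op", help="operation commands")
    op_sub = op.add_subparsers(dest="sub_command", required=True)
    op_list = op_sub.add_parser("list", help="list operations")
    op_list.add_argument("--tag", default=None)
    return parser


# Equivalent to typing: ect ds list --pattern aero
args = build_parser().parse_args(["ds", "list", "--pattern", "aero"])
print(args.command, args.sub_command, args.pattern)
```

With this layout, a top-level ect list is no longer needed, because each main command carries its own list.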

A summary of all commands used can be found at the bottom of the page.

1.

ect ds list --pattern aero
ect sync AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY  --time 2007
ect ds list --pattern cloud
ect sync CLOUD_L3S_MERGED_PHASE1_V1.0  --time 2007

Here, we had the idea of creating named datasets in the current workspace by providing a target name and an operation with arguments, e.g.:

ect set aero2007 extract name=AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY time_range=2007
ect set cloud2007 extract name=CLOUD_L3S_MERGED_PHASE1_V1.0 time_range=2007

2. Co-Registration

ect op list --tag geom

will list all operations that have a tag matching 'geom'.

Here, we recognised that we need to equip operations with tags (in the API) that can be used to categorize them.
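
One way such tags might look in the API is a small registry filled by a decorator. The sketch below is hypothetical throughout: the decorator name operation, the OP_TAGS registry, the concrete tag values and the substring matching rule are assumptions, not the ECT API.

```python
from typing import Callable, Dict, List, Set

# Hypothetical registry: operation name -> set of tags.
OP_TAGS: Dict[str, Set[str]] = {}


def operation(name: str, tags: List[str]) -> Callable:
    """Register a function as an operation under 'name' with the given tags."""
    def decorator(func: Callable) -> Callable:
        OP_TAGS[name] = set(tags)
        return func
    return decorator


@operation("ect.ops.coregister", tags=["geometric", "resampling"])
def coregister(master, slave, method="nearest"):
    raise NotImplementedError  # placeholder, the real operation is out of scope


@operation("ect.ops.timeseriesplot", tags=["visualisation"])
def timeseriesplot(*datasets, lat=None, lon=None):
    raise NotImplementedError  # placeholder


def list_operations(tag_pattern: str) -> List[str]:
    """Return names of operations having at least one tag containing the pattern."""
    return sorted(name for name, tags in OP_TAGS.items()
                  if any(tag_pattern in tag for tag in tags))


# Corresponds to: ect op list --tag geom
print(list_operations("geom"))
```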

3.

ect op help ect.ops.coregister

will print help and list all parameters of operation 'ect.ops.coregister'.

4.

ect run ect.ops.coregister master=AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY,2007 slave=CLOUD_L3S_MERGED_PHASE1_V1.0,2007 method=nearest

according to what has been discussed on the page Reading and writing datasets with the CLI. Taking into account the suggestion made in point 1., we could write

ect run ect.ops.coregister master=aero2007 slave=cloud2007 method=nearest

Following the suggestion made in point 1. more consistently, we could also write:

ect set cloud2007ga ect.ops.coregister master=aero2007 slave=cloud2007 method=nearest

which doesn't write any data but remembers how to compute cloud2007ga when required. This concept could be extended to derive a workflow JSON at any time from current variable assignments made by the ect set command.
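
A minimal sketch of this idea, assuming a workspace that merely records assignments: the Workspace class and the JSON layout (a list of steps with id, op and params) are hypothetical and only illustrate that a workflow description can be derived from the recorded ect set calls.

```python
import json
from typing import Dict


class Workspace:
    """Records 'set' assignments without executing them."""

    def __init__(self) -> None:
        self.steps: Dict[str, dict] = {}

    def set(self, target: str, op: str, params: Dict[str, str]) -> None:
        # Only remember how to compute 'target'; nothing is evaluated here.
        self.steps[target] = {"op": op, "params": dict(params)}

    def to_workflow_json(self) -> str:
        """Derive a workflow description from the current assignments."""
        steps = [{"id": target, **step} for target, step in self.steps.items()]
        return json.dumps({"steps": steps}, indent=2)


ws = Workspace()
ws.set("aero2007", "extract",
       {"name": "AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY", "time_range": "2007"})
ws.set("cloud2007", "extract",
       {"name": "CLOUD_L3S_MERGED_PHASE1_V1.0", "time_range": "2007"})
ws.set("cloud2007ga", "ect.ops.coregister",
       {"master": "aero2007", "slave": "cloud2007", "method": "nearest"})
print(ws.to_workflow_json())
```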

5.

n.a.

6. to 8. Spatial Filtering

according to 2. to 4.

9.

n.a.

10. to 12. Temporal Filtering

according to 2. to 4.

13.

n.a.

14. to 17. Time Series Plot

ect op list --tag visual
ect op help ect.ops.timeseriesplot
ect run ect.ops.timeseriesplot aero2007 cloud2007 lat=13 lon=42 all_in_one_graph=true file=ts.png

Here there is no difference between the two ways, because a file is written immediately.

ect set ts_plot ect.ops.timeseriesplot aero2007 cloud2007 lat=13 lon=42 all_in_one_graph=true
ect write ts_plot file=ts.png

This would be an alternative way to get the same result. Or, with a slightly different syntax:

ect run image_writer ts_plot file=ts.png

18. to 22. Product-Moment Correlation (Pearson)

ect op list --tag compare
ect op help ect.ops.pearson
ect run ect.ops.pearson aero2007 cloud2007ga mode=scatterplot file=pearson.png

See 14. to 17.

23. to 27. Spatial Filtering

ect op list --tag filter
ect op help ect.ops.spatial_filter
ect set aero2007_sub ect.ops.spatial_filter aero2007 region=POLYGON((......))
ect set cloud2007_sub ect.ops.spatial_filter cloud2007ga region=POLYGON((......))

See 2. to 4.

28. to 32. Animated Map

ect op list --tag animation
ect op help ect.ops.animated_map
ect run ect.ops.animated_map aero2007 cloud2007ga mode=multiple file=animation.gif

See 14. to 17.

33. to 37. Product-Moment Correlation (Pearson)

ect run ect.ops.pearson aero2007_sub cloud2007_sub mode=grid_map stat_file=correlation_statistics.txt map_file=grid_correlation.png

Here the alternative pattern would be:

ect set correlation ect.ops.pearson aero2007_sub cloud2007_sub mode=grid_map
ect write correlation.map file=grid_correlation.png
ect write correlation.stat file=correlation_statistics.txt

It has to be seen if the specification of an attribute on the correlation object is really required or if the output type can be inferred from the file type:

ect write correlation file=grid_correlation.png
ect write correlation file=correlation_statistics.txt
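
How such an inference could work is sketched below. The mapping from file extension to output attribute is an assumption made for this example; a real implementation would presumably derive it from the operation's declared outputs.

```python
import os

# Hypothetical mapping from file extension to the attribute of a result object.
OUTPUT_BY_EXTENSION = {".png": "map", ".txt": "stat"}


def resolve_output(target: str, file: str) -> str:
    """Return 'object.attribute' for a write target.

    An explicit attribute (e.g. 'correlation.map') is kept as is; otherwise
    the attribute is inferred from the file extension.
    """
    if "." in target:
        return target
    extension = os.path.splitext(file)[1].lower()
    if extension not in OUTPUT_BY_EXTENSION:
        raise ValueError(f"cannot infer output of '{target}' from '{file}'")
    return f"{target}.{OUTPUT_BY_EXTENSION[extension]}"


print(resolve_output("correlation", "grid_correlation.png"))
print(resolve_output("correlation", "correlation_statistics.txt"))
```

Note that inference of this kind breaks down as soon as an operation has two outputs that map to the same file type, in which case the explicit attribute would still be needed.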

Summary of commands

ect ds list --pattern aero
ect sync AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY  --time 2007
ect set aero2007 extract name=AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY time_range=2007
ect op list --tag geom
ect op help ect.ops.coregister
ect set cloud2007ga ect.ops.coregister master=aero2007 slave=cloud2007 method=nearest
ect set ts_plot ect.ops.timeseriesplot aero2007 cloud2007 lat=13 lon=42 all_in_one_graph=true
ect write ts_plot file=ts.png

Idea: instead of naming the commands extract and write, load and save would be a better fit. (Other ideas are load + store, read + write, open + save.)

ect ds list --pattern aero
ect ds sync AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY  --time 2007
ect set aero2007 load name=AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY time_range=2007
ect op list --tag geom
ect op help ect.ops.coregister
ect set cloud2007ga ect.ops.coregister master=aero2007 slave=cloud2007 method=nearest
ect set ts_plot ect.ops.timeseriesplot aero2007 cloud2007 lat=13 lon=42 all_in_one_graph=true
ect save ts_plot file=ts.png

The run command could be used for executing pre-defined workflows (from *.json files) or operators. It would not alter the state of the current workspace.

The CLI and the workspace

The idea is to have a workspace that is shared between the CLI and GUI. Interaction with the workspace is possible using the following commands:

  • The set operation would add a new (Python) object to the workspace. These are usually datasets, plots, statistics, etc. This happens by executing an operation (in the example above: "load", "ect.ops.coregister", ...) and assigning the result to a workspace object name. If the operation produces multiple results (e.g. a dataset and a plot), either a tuple is returned, which means two names can be defined (ect set map,stat ect.ops.pearson aero2007_sub cloud2007_sub mode=grid_map), or the workspace object gets attributes (ect set res ect.ops.pearson aero2007_sub cloud2007_sub mode=grid_map followed by ect write res.map file=map.png).
  • The del operation would remove an object from the workspace. If other objects depend on the removed object, they become unusable until a new object of the same type and with the same name is created using the set operation.
  • The ws (or workspace) operation would list all objects together with their definition. Currently unusable objects are marked. (See DeDop for further workspace operations.)
  • The save operation writes the content of an object to a file. For this, either the __to_format_X of the object is invoked or a registered Writer is taken from a registry.
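
The interplay of set, del and ws described above can be sketched as follows. This is a simplified, hypothetical model: it only tracks names and dependencies, it does not check that a re-created object has the same type, and it does not guard against cyclic definitions.

```python
from typing import Dict, List, Sequence, Tuple


class Workspace:
    """Tracks object definitions and which objects they depend on."""

    def __init__(self) -> None:
        # name -> (operation name, names of workspace objects used as inputs)
        self.objects: Dict[str, Tuple[str, List[str]]] = {}

    def set(self, name: str, op: str, inputs: Sequence[str] = ()) -> None:
        self.objects[name] = (op, list(inputs))

    def delete(self, name: str) -> None:
        # Named 'delete' because 'del' is a reserved word in Python.
        del self.objects[name]

    def is_usable(self, name: str) -> bool:
        """An object is usable if it exists and all of its inputs are usable."""
        if name not in self.objects:
            return False
        return all(self.is_usable(dep) for dep in self.objects[name][1])

    def describe(self) -> List[str]:
        """Return one line per object, marking currently unusable ones."""
        lines = []
        for name, (op, inputs) in self.objects.items():
            mark = "" if self.is_usable(name) else "  [unusable]"
            lines.append(f"{name} = {op}({', '.join(inputs)}){mark}")
        return lines


ws = Workspace()
ws.set("aero2007", "load")
ws.set("cloud2007", "load")
ws.set("cloud2007ga", "ect.ops.coregister", ["aero2007", "cloud2007"])
ws.delete("cloud2007")  # cloud2007ga now depends on a missing object
print("\n".join(ws.describe()))
```

Setting cloud2007 again makes cloud2007ga usable once more, which matches the behaviour described for the del operation.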

In the GUI all objects are listed and can be used for operations. This includes the operations mentioned above plus visualization operations.

Problems to consider

When creating a workspace using the CLI, the operations are recorded but not executed before a dedicated write operation is triggered. This has the positive effect that computations are done as late as possible. But if several write operations follow a series of assignments, the computation has to be restarted from the beginning every time, as the CLI has no state between invocations.

Example:

ect set aero2007 load name=AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY time_range=2007
ect set cloud2007 load name=CLOUD_L3S_MERGED_PHASE1_V1.0 time_range=2007
ect set cloud2007ga ect.ops.coregister master=aero2007 slave=cloud2007 method=nearest
ect set movie ect.ops.animated_map ds=aero2007,cloud2007ga mode=multiple
ect save movie file=movie.gif
ect set correlation ect.ops.pearson ds1=aero2007_sub ds2=cloud2007_sub mode=grid_map
ect save correlation.map file=grid_correlation.png
ect save correlation.stat file=correlation_statistics.txt

Solutions

write command can write multiple outputs at once

A solution would be to trigger all write actions in a single step:

ect set aero2007 load name=AEROSOL_ATSR2_ORAC_L3_V3.02_MONTHLY time_range=2007
ect set cloud2007 load name=CLOUD_L3S_MERGED_PHASE1_V1.0 time_range=2007
ect set cloud2007ga ect.ops.coregister master=aero2007 slave=cloud2007 method=nearest
ect set movie ect.ops.animated_map ds=aero2007,cloud2007ga mode=multiple
ect set correlation ect.ops.pearson ds1=aero2007_sub ds2=cloud2007_sub mode=grid_map
ect save movie file=movie.gif; correlation.map file=grid_correlation.png; correlation.stat file=correlation_statistics.txt

The disadvantage is that the save command line gets very long and unreadable.
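
Why a single save step helps can be illustrated with a small simulation. Everything here is a stand-in: the DEFINITIONS table uses string-building lambdas instead of real operations, the object names are simplified compared to the example above, and each call to save_all plays the role of one CLI invocation that starts without state.

```python
from typing import Callable, Dict, List, Tuple

calls: List[str] = []  # records every evaluation, for illustration only

# Hypothetical recorded definitions: name -> (function, input names)
DEFINITIONS: Dict[str, Tuple[Callable, List[str]]] = {
    "aero2007": (lambda: "aero", []),
    "cloud2007": (lambda: "cloud", []),
    "cloud2007ga": (lambda a, c: f"coreg({a},{c})", ["aero2007", "cloud2007"]),
    "movie": (lambda a, c: f"movie({a},{c})", ["aero2007", "cloud2007ga"]),
    "correlation": (lambda a, c: f"pearson({a},{c})", ["aero2007", "cloud2007ga"]),
}


def evaluate(name: str, cache: Dict[str, object]) -> object:
    """Evaluate 'name' and its inputs, reusing results already in 'cache'."""
    if name not in cache:
        func, inputs = DEFINITIONS[name]
        args = [evaluate(dep, cache) for dep in inputs]
        calls.append(name)
        cache[name] = func(*args)
    return cache[name]


def save_all(names: List[str]) -> Dict[str, object]:
    """Simulate one CLI invocation that saves several objects at once."""
    cache: Dict[str, object] = {}  # lives only for this invocation
    return {name: evaluate(name, cache) for name in names}


# Two separate invocations: the shared inputs are computed twice.
save_all(["movie"])
save_all(["correlation"])
separate = len(calls)

# One invocation saving both: the shared inputs are computed once.
calls.clear()
save_all(["movie", "correlation"])
combined = len(calls)
print(separate, combined)
```

In this toy setup the two separate invocations perform 8 evaluations, while the combined one performs 5, because aero2007, cloud2007 and cloud2007ga are shared.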

persistent set command

Another option would be to keep intermediate results (either by default or on request) by using setp (set persistent) instead of the set command. This would persist the object in the workspace. This may only be possible for specific data types, e.g. xarray.Dataset, and not for matplotlib plot objects. The user would have to know for which operations this is possible. Also, the amount of data could be huge if applied to the wrong object (e.g. after opening a big dataset and before resampling it to a coarser resolution).
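
A minimal sketch of what setp could do, under stated assumptions: objects are stored with pickle in a workspace directory, and the set of persistable types is replaced by built-in types (dict, list) to keep the example self-contained. The function signature and file naming are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Assumption: only these types can be persisted. Built-ins stand in for the
# types named in the text (e.g. xarray.Dataset).
PERSISTABLE_TYPES = (dict, list)


def setp(workspace_dir: Path, name: str, compute):
    """Return the stored value of 'name', computing and storing it if needed."""
    path = workspace_dir / f"{name}.pickle"
    if path.exists():
        with path.open("rb") as stream:
            return pickle.load(stream)
    value = compute()
    if not isinstance(value, PERSISTABLE_TYPES):
        raise TypeError(f"objects of type {type(value).__name__} cannot be persisted")
    with path.open("wb") as stream:
        pickle.dump(value, stream)
    return value


workspace = Path(tempfile.mkdtemp())
runs = []  # counts how often the expensive computation actually runs


def expensive():
    runs.append(1)
    return {"var": [1, 2, 3]}


first = setp(workspace, "cloud2007ga", expensive)
second = setp(workspace, "cloud2007ga", expensive)  # read back from disk
print(len(runs))
```

The second call stands for a later CLI invocation: it finds the stored file and skips the computation, at the cost of disk space for every persisted object.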

backend as a service

Another solution would be to have the CLI talk to the exact same backend service as the GUI, using REST calls. The backend would run all the time and keep each object in memory once it has been evaluated. This would harmonize the code paths for GUI and CLI, but lead to problems in batch processing:

  • Each pair of frontend and backend would need its own port for communication
  • Two processes would always have to be launched
  • The simple CLI could get very complicated
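
To make the trade-off concrete, here is a toy backend built with Python's standard http.server, plus a client function that plays the role of one short-lived CLI invocation. The route /objects/<name>, the compute stand-in and the JSON payload are assumptions for illustration only; in reality the backend would be a separate process, whereas here it runs in a thread to keep the sketch self-contained.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CACHE = {}        # objects kept in memory for the lifetime of the backend
EVALUATIONS = []  # names actually computed, for illustration only


def compute(name: str) -> str:
    """Stand-in for evaluating a workspace object (hypothetical)."""
    EVALUATIONS.append(name)
    return f"result of {name}"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical route: GET /objects/<name>
        name = self.path.rsplit("/", 1)[-1]
        if name not in CACHE:
            CACHE[name] = compute(name)
        body = json.dumps({"name": name, "value": CACHE[name]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Bypass any system proxy settings for the local connection.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def cli_get(port: int, name: str) -> dict:
    """What a single short-lived CLI invocation would do."""
    url = f"http://127.0.0.1:{port}/objects/{name}"
    with OPENER.open(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

first = cli_get(port, "cloud2007ga")
second = cli_get(port, "cloud2007ga")  # served from the backend's memory
server.shutdown()
server.server_close()
print(EVALUATIONS)
```

The sketch also shows the first drawback listed above: the client has to know the port of its backend, so every frontend/backend pair needs some way to agree on one.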

No CLI at all

See To CLI or not to CLI