[KED-1740] Name-spaced Parameters #399

benhorsburgh · 2020-06-02T14:30:29Z

Description

First of all - I love Kedro 0.16.x namespacing!

When I create a new modular pipeline, I can very easily re-use this across different name spaces, which is intuitive to use thanks to the new conf sub-directories, and pipeline structure. However, the name-spacing of parameters can only be achieved using messy work arounds.

Let me explain using an example.

Let's say I create kedro pipeline create clean_timestamps as a modular pipeline. In this, I create the following pipeline:

def create_pipeline(**kwargs):
    return Pipeline([
        node(
            func=round_timestamps,
            inputs=dict(params="params:clean_timestamps_params", data="data"),
            outputs=dict(rounded="rounded"),
            name="round_timestamps"
        )
    ])

In my conf/base/pipelines/clean_timestamps/catalog.yml file I can make the datasets very obvious, along with name-spaced versions of my datasets.

# no namespace
#data:
#  type: MemoryDataSet
#
#rounded:
#  type: MemoryDataSet

# hour namespace
hour.data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/hour.csv

# minute namespace
minute.data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/minute.csv

minute.rounded:
  type: pandas.CSVDataSet
  filepath: data/05_model_input/minute_rounded.csv

This gives me very clean and clear flexibility to keep the data that I need, and set up new namespaces. Using this, I can create the default and namespaced pipelines very easily.

clean_timestamps_pipeline = clean_timestamps.create_pipeline()

hour_clean_timestamps_pipeline = pipeline(
    clean_timestamps_pipeline,
    namespace="hour"
)

minute_clean_timestamps_pipeline = pipeline(
    clean_timestamps_pipeline,
    namespace="minute"
)

and everything magically works because of how my config has been set up.

Now if we try to do the same with parameters! Intuitively, I should set up my conf/base/pipelines/clean_timestamps/parameters.yml in exactly the same way, which might look like:

# no namespace
clean_timestamps_params:
  round_frequency: "1min"
  round_col: "dteday"

# hour namespace
hour.clean_timestamps_params:
  round_frequency: "1h"
  round_col: "dteday"

# minute namespace
minute.clean_timestamps_params:
    round_frequency: "1min"
    round_col: "dteday"

and in theory everything should work! Kedro of course does not support this yet. The current workaround is that I make use of the parameters argument on pipelines.

hour_clean_timestamps_pipeline = pipeline(
    clean_timestamps_pipeline,
    parameters={"params:clean_timestamps_params": "params:hour.clean_timestamps_params"},
    namespace="hour"
)

Context

This is important because failure to namespace parameters means that:

I need to add the namespace in two separate places
Non-standard namespacing strategies will be used for parameters leading to:
2.1. Increased learning curves for new team members on projects
2.2 Increased bugs due to not understanding a team convention
It is not possible to specify parameters using dict(param="my_param") format. This is extremely useful, because it means the only strings in my code are things that are configurable, not placeholder variable names. It is much easier to read, and more maintainable.

Adding namespaced parameters will:

Encourage best practices
Make learning new projects much easier
Decrease tech debt due to developers inventing their own strategies
Make code much more readable
Decrease bugs introduced by relying on complex string mappings

Possible Implementation

Continuing from the above example, I would suggest namespacing is handled in exactly the same way as datasets, but appending a prefix to the expected parameter.

With no namespace: "params:my_params"
With namespace: "params:namespace.my_params"

Possible Alternatives

An alternative may be to suffix, but this would be confusing given the dataset implementation.

The text was updated successfully, but these errors were encountered:

mzjp2 · 2020-06-02T15:38:36Z

I've logged this in our backlog to take a look at. Thank you! @lorenabalan you might be interested in this/have a reply :)

lorenabalan · 2020-06-03T09:45:33Z

Thank you @benhorsburgh for such a thoughtful writeup!! Your use case makes sense. Here are some of my thoughts, I'd love to hear more on this.

We've had conflicting feedback about this at the time, with some users saying that it's more often that modular pipelines will only differ by one or 2 parameters, and everything else is intact. I think it was also confusing because "." has a special meaning for parameters. Currently users can access nested parameters through nested.a.b.c and pass that to the node. So when there's a reference nested.a, it's ambiguous if the user meant "entry a under nested" or "parameter named nested.a" and they might end up overwriting each other.

I hear your point about the strings! Currently that'd be hard to do even if we got rid of the "params:" prefix, because we use a jmespath-like syntax for accessing nested parameters with "." notation, which dict() would still be unhappy about, so this one needs more thinking.

limdauto · 2021-02-15T12:52:56Z

This has now landed in develop, so it will go out in 0.18.0.

benhorsburgh added the Issue: Feature Request New feature or improvement to existing feature label Jun 2, 2020

mzjp2 changed the title ~~Name-spaced Parameters~~ [KED-1740] Name-spaced Parameters Jun 2, 2020

pull bot pushed a commit to FoundryAI/kedro that referenced this issue Jul 17, 2020

[KED-1354] Document IncrementalDataSet (kedro-org#399)

d370b88

limdauto closed this as completed Feb 15, 2021

datajoely mentioned this issue Mar 4, 2022

[KED-3022] Sync starters for kedro 0.18.0 kedro-org/kedro-starters#68

Merged

3 tasks

antonymilne mentioned this issue Apr 6, 2022

Update catalog for namespace pref kedro-org/kedro-starters#81

Merged

3 tasks

antonymilne mentioned this issue Aug 8, 2022

Enable dotted keys in params passed to kedro run #1765

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[KED-1740] Name-spaced Parameters #399

[KED-1740] Name-spaced Parameters #399

benhorsburgh commented Jun 2, 2020

mzjp2 commented Jun 2, 2020

lorenabalan commented Jun 3, 2020

limdauto commented Feb 15, 2021

[KED-1740] Name-spaced Parameters #399

[KED-1740] Name-spaced Parameters #399

Comments

benhorsburgh commented Jun 2, 2020

Description

Context

Possible Implementation

Possible Alternatives

mzjp2 commented Jun 2, 2020

lorenabalan commented Jun 3, 2020

limdauto commented Feb 15, 2021