Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long distance restrictions #949

Merged
merged 23 commits into from
May 10, 2024
Merged

Long distance restrictions #949

merged 23 commits into from
May 10, 2024

Conversation

CBroz1
Copy link
Member

@CBroz1 CBroz1 commented Apr 30, 2024

Description

This PR refactors dj_chains and dj_graph to use centralized logic for navigating graphs. See new docs for new notation using << and >> operators.

Other minor changes

  • Adds camel_name as a Mixin property
  • Adds fuzzy_get to dj_helper to permit indexing of custom objects
  • Moves PERIPHERAL_TABLES to dj_helper to avoid circular import
  • Removes redundant check on on loading the graph - graph handles this check
  • Adds import_merge_table backup based on user feedback to search_descendants

Checklist:

  • This PR should be accompanied by a release: No
  • N/A. If release, I have updated the CITATION.cff
  • No. This PR makes edits to table definitions: (yes/no)
  • N/A. If table edits, I have included an alter snippet for release notes.
  • I have updated the CHANGELOG.md with PR number and description.
  • I have added/edited docs/notebooks to reflect the changes

@CBroz1 CBroz1 changed the title Generalize RestrGraph class to handle key_from_upstream Long distance restrictions May 3, 2024
@CBroz1 CBroz1 marked this pull request as ready for review May 3, 2024 18:54
@CBroz1 CBroz1 requested a review from samuelbray32 May 3, 2024 18:55
Copy link
Collaborator

@samuelbray32 samuelbray32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One required bug fix, and a few usability ideas.

One other more general idea:

  • From testing, it seems like a multi-part restriction only works through this if all the keys are found in the same up(down)stream table
  • I assume the correct way for a user to do this would be call restrict_by separately for each key and then combine the results.
  • Could be nice to happen inside the restrict_by function (e.g. split the restriction into parts, call restrict_by for each, then return the intersection fo the results

src/spyglass/utils/dj_mixin.py Outdated Show resolved Hide resolved
src/spyglass/utils/dj_mixin.py Show resolved Hide resolved
return graph

ret = graph.leaf_ft[0]
if len(ret) == len(self) or len(ret) == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A usability thought/suggestion. As you mentioned, the shortest path option can get tricky with merge tables. In testing this I would end up manually repeating the call when it went the "wrong" way with no results and adding to the banned list each time.

When an empty restriction set is found would it make sense to add whatever the top of the cahin was to the banned list and then recursively call restrict_by?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stop condition would either be finding a non-empty set or hitting the error where the key is no longer in the graph

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan for the immediate future is to get a sense of how useful it is from folks. If useful, if the table banning process is a chore, I would continue the TableChain search until all tables in the path had a valid restriction. Rather than deciding the path, I would depth-first search until there was a valid restriction the whole way.

It seemed like a lot of work to put in without that initial feedback, but I can revisit if you think it's needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I agree it's not required for this PR.

As another option that doesn't require action right now; maybe a more efficient solution would be something like a required_tables_list where a chain is only valid if it contains the tables in that list. Would potentially solve the case of picking from a certain source of a merge table.

@CBroz1
Copy link
Member Author

CBroz1 commented May 9, 2024

  • Could be nice to happen inside the restrict_by function (e.g. split the restriction into parts, call restrict_by for each, then return the intersection fo the results

That's true - I had assumed that the passed restriction was valid on exactly one table elsewhere. If the user wanted to restrict by multiple, I think they could (Table << restr1) << restr2. I think the splitting into parts would be tricky. DJ already uses lists to indicate OR and parsing the possibilities can be tricky. What end-user syntax did you have in mind?

@CBroz1 CBroz1 requested a review from samuelbray32 May 9, 2024 20:58
@samuelbray32
Copy link
Collaborator

What end-user syntax did you have in mind?

I was trying something like table << {"restr1": restr1_val, "restr_2": restr_2_val} being parsed by the function into (table << {"restr1": restr1_val}) << {"restr_2": restr_2_val} as you described.

This is another that was just me trying to test it as a naive user so it's fair for it not all implemented at release if the core function is working.

@CBroz1
Copy link
Member Author

CBroz1 commented May 10, 2024

  1. (table << {"restr1": restr1_val}) << {"restr_2": restr_2_val}
  2. table << {"restr1": restr1_val, "restr_2": restr_2_val}

Across these cases, (1) uses additional info from the user that these likely come from different tables, and implementing (2) would assume they do not, which incurs the time cost of checking each table to see where they belong. I'd consider that time cost an annoyance, but maybe the average user wouldn't? Worth asking when I collect feedback

@edeno edeno merged commit 2f6634b into LorenFrankLab:master May 10, 2024
7 checks passed
@CBroz1 CBroz1 deleted the rrt branch May 10, 2024 21:04
@CBroz1 CBroz1 mentioned this pull request May 20, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants