Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adding multi tomogram relationships #302

Merged
merged 37 commits into from
Oct 14, 2024
Merged

Conversation

manasaV3
Copy link
Contributor

@manasaV3 manasaV3 commented Oct 4, 2024

Relates to chanzuckerberg/cryoet-data-portal#1053

Description

There are general updates to the file paths. More on this can be found here.

New file structure:
.
├── DatasetID/
│   ├── ExperimentalRun/
│   │   ├── Alignment/
│   │   │   └── <alignment-id>/
│   │   │       ├── alignment_metadata.json
│   │   │       └── <name-in-source>.(aln|xf|tlt|json|xtilt|com)
│   │   ├── Frames/
│   │   │   ├── <name-in-source>.(eer|mrc|mrc.bz|??)
│   │   │   ├── <name-in-source>.mdoc
│   │   │   └── frames_metadata.json
│   │   ├── Gains/
│   │   │   └── <name-in-source>.(mrc|gain|??)
│   │   ├── Reconstructions/
│   │   │   └── VoxelSpacingXX.XX/
│   │   │       ├── Annotations/
│   │   │       │   └── <anno-id>/
│   │   │       │       ├── annotation_files
│   │   │       │       └── <annotation_key>.json (metadata file)
│   │   │       ├── Images/
│   │   │       │   ├── <tomo-id>-expanded.png
│   │   │       │   ├── <tomo-id>-original.png
│   │   │       │   ├── <tomo-id>-thumbnail.png
│   │   │       │   └── <tomo-id>-snapshot.png
│   │   │       ├── NeuroglancerPrecompute/
│   │   │       │   ├── <anno-id>-shape_precompute
│   │   │       │   └── <tomo-id>-neuroglancer-config.json
│   │   │       └── Tomograms/
│   │   │           └── <tomo-id>/
│   │   │               ├── tomogram-metadata.json
│   │   │               ├── <run_name>.zarr 
│   │   │               └── <run_name>.mrc
│   │   ├── TiltSeries/
│   │   │   └── <ts-id>/
│   │   │       ├── <run_name>.mrc
│   │   │       ├── <run_name>.zarr
│   │   │       ├── tiltseries_metadata.json
│   │   │       └── <name-in-source>.rawtlt
│   │   └── run_metadata.json
│   └── Images
└── DepositionMetadata/
    └── <deposition-id>/
        ├── deposition_metadata.json
        └── Images

Alignments

  • The path is updated, to consolidate all the related alignment files into a directory with alignment_id name inside the Alignment folder
  • In the identifier generation, container key now has a prefix of "alignments". This is to prevent key clashes between any other identifier that is generated at a run level.

Annotations

  • The annotations are being moved from Tomograms/VoxelSpacingX.XXX/Annotations -> Reconstructions/VoxelSpacingX.XXX/Annotations/{annotation_id}
  • The annotation_id is now a directory inside the Annotations folder. All the annotation files and metadata json are inside the annotation_id folder.
  • alignment_metadata_path is being added to Annotations metadata at the root level.
  • In the identifier generation, container key now has a prefix of "annotation". This is to prevent key clashes between any other identifier that is generated at a voxel spacing level level.
  • Annotation id hash generation components now include an optional annotation_object.state and related alignment_metadata_path

Key Photos

  • The tomogram key photos are being moved from Tomograms/VoxelSpacingX.XXX/KeyPhotos -> Reconstructions/VoxelSpacingX.XXX/Images/
  • The key photo file names will have a prefix of the tomogram_id they are related to.

Raw Tilts

  • The raw tilts are now being added as a child to tiltseries in the importer dependency. This means, they will be imported only if there is a related tiltseries in the config.
  • The raw tilt files are being moved from TiltSeries -> TiltSeries/{tiltseries_id}/

Tilt Series

  • The tiltseries files are being moved from TiltSeries -> TiltSeries/{tiltseries_id}/
  • The id for a tiltseries is being generated using deposition_id (ie) There can only be only tiltseries for a deposition.

Tomograms

  • The tomogram files are being moved from Tomograms/VoxelSpacingX.XXX/CanonicalTomogram -> Reconstructions/VoxelSpacingX.XXX/Tomograms/{tomogram_id}
  • The id for a tomogram is being generated based on the values of voxel_spacing, alignment, reconstruction_method, processing, deposition_id
  • The tomogram metadata now fields for alignment_metadata_path and neuroglancer_config_path.
  • The neuroglancer_config_path is only set to a non-null value if is_visualization_default=true

Visualization Config

  • The neuroglancer config file is being moved from Tomograms/VoxelSpacingX.XXX/CanonicalTomogram -> Reconstructions/VoxelSpacingX.XXX/NeuroglancerPrecompute/
  • It will have a prefix of the related tomogram id
  • It will be generated only for the tomograms that have is_visualization_default = true
  • The annotations included in the config will have to share the same alignment as the tomogram.

Annotation Precompute

  • The files are being moved from Tomograms/VoxelSpacingX.XXX/NeuroglancerPrecompute -> Reconstructions/VoxelSpacingX.XXX/NeuroglancerPrecompute/

manasaV3 and others added 25 commits September 25, 2024 16:28
* fix: support filtering entities by related object id's
* Updates to schema and template

* Fixing tests

* Adding files to alignments metadata

* Update the alignment configs method type

* Fixing typo

* Adding undefined alignment_method_type
* Adds comments to jensen config generation

* Adding guardrails

* Clean up

* fix: method_name
@manasaV3 manasaV3 changed the title Adding multi tomogram relationships feat: Adding multi tomogram relationships Oct 7, 2024
@manasaV3 manasaV3 marked this pull request as ready for review October 7, 2024 20:02
@manasaV3 manasaV3 marked this pull request as draft October 7, 2024 20:02
@manasaV3 manasaV3 requested a review from jgadling October 9, 2024 18:54
@manasaV3 manasaV3 marked this pull request as ready for review October 9, 2024 18:59
self.local_metadata = {
"object_count": 0,
"files": [],
"alignment_metadata_path": alignment_metadata_path,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can only be one alignment associated to an annotation.

@@ -78,7 +78,6 @@ def get_filename_prefix(self, output_dir: str, identifier: int) -> str:
output_dir,
"-".join(
[
str(identifier),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uermel Should the annotation object state also be included in the file name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please. This was something I asked for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm my understanding, this has already been included in annotation hash used for generating the identifier (ref here). We also want this in the filename?

Comment on lines +67 to +68
if hasattr(self, "identifier") and self.identifier:
glob_vars[f"{self.type_key}_id"] = self.identifier
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handles for the introduction of ids in the file paths.

Comment on lines +49 to +51
TiltSeriesImporter: {
RawTiltImporter: {},
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The raw tilts are now being added as a child to tiltseries. This means aw tilts files will be imported only if there is a related tiltseries in the config.
@uermel Would this break any expected patterns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me. I don't expect anything to be broken.

@uermel
Copy link
Contributor

uermel commented Oct 9, 2024

.
├── DatasetID/
│   ├── ExperimentalRun/
│   │   ├── Alignment/
│   │   │   └── <alignment-id>/
│   │   │       ├── alignment_metadata.json
│   │   │       └── <name-in-source>.(aln|xf|tlt|json)

This needs to include .xtilt, and .com.

│   │   ├── TiltSeries/
│   │   │   └── <ts-id>/
│   │   │       ├── <run_name>.mrc
│   │   │       ├── <run_name>.zarr
│   │   │       ├── tiltseries_metadata.json
│   │   │       └── <name-in-source>.raw_tlt

This should be .rawtlt.

Copy link
Contributor

@uermel uermel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, slight updates to the description may be useful for future reference.

)
return all(volume_dim.get(dim) for dim in "xyz") or self.get_tomogram() is not None

def get_tomogram(self) -> Optional["TomogramImporter"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should leave a comment as to why this is safe -- basically that we only allow one alignment & voxel spacing per ingestion config, and that we assume all tomograms imported for a single run have the same dimensions.

@@ -24,23 +24,32 @@ class VisualizationConfigImporter(BaseImporter):
finder_factory = DefaultImporterFactory
has_metadata = False
dir_path = (
"{dataset_name}/{run_name}/Tomograms/VoxelSpacing{voxel_spacing_name}/CanonicalTomogram/"
"neuroglancer_config.json"
"{dataset_name}/{run_name}/Reconstructions/VoxelSpacing{voxel_spacing_name}/NeuroglancerPrecompute/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why aren't we putting neuroglancer precompute data in a subdirectory of the tomogram, sort of like this?

{dataset_id}/{run_name}/Reconstructions/Tomograms/{tomo_id}/NeuroglancerPrecompute

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotations precompute should be shareable across different tomograms, if they have the same alignments. So, I didn't want to tie it to a tomogram.

Copy link
Contributor

@jgadling jgadling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me once @uermel 's requests are handled.

@manasaV3 manasaV3 merged commit 7204736 into main Oct 14, 2024
6 checks passed
@manasaV3 manasaV3 deleted the mvenkatakrishnan/multi_tomo branch October 14, 2024 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants