
feat: Script to migrate the directory structure #311

Merged — 61 commits, Oct 21, 2024 (changes shown from 48 of the 61 commits)

Commits
- 7607584 Updating tomograms (manasaV3, Sep 24, 2024)
- 076c3b5 updates to tomogram processing (manasaV3, Sep 26, 2024)
- b3ab2d3 updates to tomogram key photo filename (manasaV3, Sep 26, 2024)
- 8aee0ee updates to tomogram neuroglance filepath (manasaV3, Sep 26, 2024)
- 15b7a40 cleaning up tomogram and related entities (manasaV3, Sep 26, 2024)
- 5125982 Updating tiltseries with identifiers (manasaV3, Sep 26, 2024)
- f1002b0 Updating alignment for tomogram relations (manasaV3, Sep 26, 2024)
- c79936b Adding alignment metadatapath to annotations and tomograms (manasaV3, Sep 30, 2024)
- 74a3247 Merge remote-tracking branch 'origin' into multi_tomo (manasaV3, Oct 1, 2024)
- a42c297 Merge remote-tracking branch 'origin' into multi_tomo (manasaV3, Oct 1, 2024)
- c1fc0a2 Updating the paths (manasaV3, Oct 1, 2024)
- 60cd302 Updating the paths (manasaV3, Oct 1, 2024)
- 7d01008 Updating failed tests (manasaV3, Oct 1, 2024)
- 98e9163 Working tests clean up needed (manasaV3, Oct 1, 2024)
- 08f0bed fix: workaround for docker compose bugs. (#295) (jgadling, Oct 1, 2024)
- b89d2c6 fix: support filtering entities by related object id's (#296) (jgadling, Oct 1, 2024)
- dd5f547 chore: add seed script. (#297) (jgadling, Oct 1, 2024)
- c5770ea feat: Updates to the alignment entity (#298) (manasaV3, Oct 2, 2024)
- 92d45ec chore: Documenting the Jensen config generation (#293) (manasaV3, Oct 2, 2024)
- d3c0007 chore: Update footer of release please PR's. (#299) (jgadling, Oct 2, 2024)
- 04bedde Adding tests (manasaV3, Oct 3, 2024)
- 8f6fe81 Merge remote-tracking branch 'origin' into multi_tomo (manasaV3, Oct 4, 2024)
- 06d3a7c Merge remote-tracking branch 'origin' into multi_tomo (manasaV3, Oct 4, 2024)
- 807f782 cleaning up the paths (manasaV3, Oct 4, 2024)
- ca54b91 Cleaning up paths (manasaV3, Oct 4, 2024)
- 25df3a4 fix tests (manasaV3, Oct 7, 2024)
- 84a58d5 Updating tomogram id generation (manasaV3, Oct 7, 2024)
- a656aad Updating viz_config generation (manasaV3, Oct 7, 2024)
- 065cf49 Making id directories (manasaV3, Oct 7, 2024)
- f677d01 Making id directories (manasaV3, Oct 7, 2024)
- 785d9c2 Making annotation id directory (manasaV3, Oct 9, 2024)
- 8791a7d Cleaning up paths with id (manasaV3, Oct 9, 2024)
- b0efcb2 Updating neuroglancer precompute for dir structure (manasaV3, Oct 9, 2024)
- a6314f5 Updating neuroglancer config for dir structure (manasaV3, Oct 9, 2024)
- c1afde9 Updating raw tilt import to tiltseries_id directory (manasaV3, Oct 9, 2024)
- c84e3b3 Migration script (manasaV3, Oct 10, 2024)
- 497ff54 Update for tomograms (manasaV3, Oct 10, 2024)
- a03a091 Adding support for alignments (manasaV3, Oct 10, 2024)
- 1df8ee3 Migrating annotation precompute (manasaV3, Oct 14, 2024)
- 79eae8b Migrating collection_metadata (manasaV3, Oct 14, 2024)
- 6d021bd Migrating rawtilt and gains (manasaV3, Oct 14, 2024)
- b482db2 Clean up (manasaV3, Oct 14, 2024)
- c4b6e0c Clean up (manasaV3, Oct 14, 2024)
- 79b293f Clean up (manasaV3, Oct 14, 2024)
- 5b1ffaf Enabling the move (manasaV3, Oct 14, 2024)
- 2e62a97 Adding metadata updates (manasaV3, Oct 14, 2024)
- d0db03c Uncommenting deletion (manasaV3, Oct 14, 2024)
- a1a4874 Adding check before move (manasaV3, Oct 14, 2024)
- 14992be Adding wdl (manasaV3, Oct 14, 2024)
- 86efbd7 minor fixes (manasaV3, Oct 14, 2024)
- 05317e8 Updating enqueue script (manasaV3, Oct 15, 2024)
- 148026b Merge branch 'main' into mvenkatakrishnan/migration_scripts (manasaV3, Oct 15, 2024)
- 7c87421 Updating annotation file names (manasaV3, Oct 15, 2024)
- 8fb098a Updating the tomogram key photo path (manasaV3, Oct 15, 2024)
- e9ef47b Fixing tests (manasaV3, Oct 15, 2024)
- f4abad1 Merge branch 'main' into mvenkatakrishnan/migration_scripts (manasaV3, Oct 15, 2024)
- f540ab7 Merge branch 'main', remote-tracking branch 'origin' into migrate_tomo (manasaV3, Oct 15, 2024)
- 602f345 Fix key_photo migration (manasaV3, Oct 15, 2024)
- 05f22ed Handling default alignments (manasaV3, Oct 16, 2024)
- 43f0c74 fix path standardization when bucket name not in path (manasaV3, Oct 16, 2024)
- da168d0 Lint. (jgadling, Oct 21, 2024)
2 changes: 1 addition & 1 deletion ingestion_tools/scripts/common/alignment_converter.py
@@ -84,7 +84,7 @@ def _get_files_with_suffix(self, valid_suffix: list[str]) -> str | None:
for path in self.paths:
if path.endswith(tuple(valid_suffix)):
file_name = os.path.basename(path)
dest_filepath = f"{self.output_prefix}{file_name}"
dest_filepath = os.path.join(self.output_prefix, file_name)
if self.config.fs.exists(dest_filepath):
return dest_filepath
return None
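
A quick illustration of why this one-line fix matters (the prefix and filename values below are made up; only the switch to `os.path.join` comes from the diff):

```python
import os

# f-string concatenation silently fuses the prefix and filename when the
# prefix lacks a trailing separator; os.path.join inserts one as needed.
prefix = "output/10000/TS_026/Alignments/100"
file_name = "alignment.json"

glued = f"{prefix}{file_name}"            # missing separator
joined = os.path.join(prefix, file_name)  # separator inserted as needed
```

`os.path.join` is also idempotent about an existing trailing separator, so it is safe regardless of how the prefix was built.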
4 changes: 3 additions & 1 deletion ingestion_tools/scripts/common/config.py
@@ -220,11 +220,13 @@ def get_metadata_path(self, obj: BaseImporter) -> str:
key = f"{obj.type_key}_metadata"
return self.resolve_output_path(key, obj)

def resolve_output_path(self, key: str, obj: BaseImporter) -> str:
def resolve_output_path(self, key: str, obj: BaseImporter, extra_glob_vars: dict = None) -> str:
from importers.utils import get_importer_output_path

output_prefix = self.output_prefix
glob_vars = obj.get_glob_vars()
if extra_glob_vars:
glob_vars.update(extra_glob_vars)
path = os.path.join(output_prefix, get_importer_output_path(key).format(**glob_vars))
if ".json" in path or ".mrc" in path or ".zarr" in path:
self.fs.makedirs(os.path.dirname(path))
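
In isolation, the new `extra_glob_vars` mechanism reduces to a dict merge before template formatting. A sketch, with an invented template and values (the real ones live in the repo's importer config):

```python
import os

# A path template is filled from the importer's glob vars; callers can now
# extend or override those vars, e.g. to turn an id segment into a wildcard.
template = "{dataset_name}/{run_name}/Alignments/{alignment_id}"
glob_vars = {"dataset_name": "10000", "run_name": "TS_026"}
extra_glob_vars = {"alignment_id": "*"}

merged = dict(glob_vars)
if extra_glob_vars:
    merged.update(extra_glob_vars)  # caller-supplied vars win
path = os.path.join("output-prefix", template.format(**merged))
```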
3 changes: 2 additions & 1 deletion ingestion_tools/scripts/common/finders.py
@@ -145,7 +145,8 @@ def _is_match(cls, metadata: dict[str, Any], key: list[str], expected_value: Any
return expected_value in value if isinstance(value, list) else value == expected_value

def find(self, config: DepositionImportConfig, glob_vars: dict[str, Any]) -> dict[str, str | None]:
output_path = os.path.join(config.output_prefix, self.importer_cls.dir_path.format(**glob_vars))
updated_glob_vars = {**glob_vars, **{f"{self.importer_cls.type_key}_id": "*"}}
output_path = os.path.join(config.output_prefix, self.importer_cls.dir_path.format(**updated_glob_vars))
responses = {}
for file_path in config.fs.glob(os.path.join(output_path, "*metadata.json")):
local_filename = config.fs.localreadable(file_path)
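
The finder cannot know concrete ids ahead of time, so the new `{<type>_id}` segment in `dir_path` is replaced with `"*"` before globbing and `fs.glob` then enumerates every id directory. A runnable reduction (values invented, shape taken from the diff):

```python
# Substitute the id placeholder with a wildcard so one glob covers all ids.
type_key = "alignment"
dir_path = "{dataset_name}/{run_name}/Alignments/{alignment_id}"
glob_vars = {"dataset_name": "10000", "run_name": "TS_026"}

updated_glob_vars = {**glob_vars, **{f"{type_key}_id": "*"}}
pattern = dir_path.format(**updated_glob_vars)
```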
10 changes: 10 additions & 0 deletions ingestion_tools/scripts/common/fs.py
@@ -67,6 +67,10 @@ def exists(self, path: str) -> bool:
def read_block(self, path: str, start: int | None = None, end: int | None = None) -> str:
pass

@abstractmethod
def move(self, src_path: str, dest_path: str, **kwargs) -> None:
pass


class S3Filesystem(FileSystemApi):
def __init__(self, force_overwrite: bool, client_kwargs: None | dict[str, str] = None, **kwargs):
@@ -169,6 +173,9 @@ def read_block(self, path: str, start: int | None = None, end: int | None = None

return local_dest_file

def move(self, src_path: str, dest_path: str, **kwargs) -> None:
self.s3fs.mv(src_path, dest_path, **kwargs)


class LocalFilesystem(FileSystemApi):
def __init__(self, force_overwrite: bool):
@@ -203,3 +210,6 @@ def push(self, path: str) -> None:

def exists(self, path: str) -> bool:
return os.path.exists(path)

def move(self, src_path: str, dest_path: str, **kwargs) -> None:
shutil.move(src_path, dest_path)
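
A self-contained sketch of the new `move` contract with only the local backend, so it can run standalone. The `makedirs` call is a demo-only addition (the `LocalFilesystem.move` in the diff does not create parent directories), and the S3 variant simply delegates to `s3fs.mv` as shown above:

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod

class FileSystemApi(ABC):
    # Declaring move() abstract on the base class forces every backend
    # (S3, local) to provide an implementation.
    @abstractmethod
    def move(self, src_path: str, dest_path: str, **kwargs) -> None:
        ...

class LocalFilesystem(FileSystemApi):
    def move(self, src_path: str, dest_path: str, **kwargs) -> None:
        os.makedirs(os.path.dirname(dest_path), exist_ok=True)  # demo-only
        shutil.move(src_path, dest_path)

# Demo: relocate a file the way the migration relocates data into id dirs.
root = tempfile.mkdtemp()
src = os.path.join(root, "Alignments", "gain.mrc")
dest = os.path.join(root, "Alignments", "100", "gain.mrc")
os.makedirs(os.path.dirname(src))
with open(src, "w") as f:
    f.write("data")
LocalFilesystem().move(src, dest)
```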
4 changes: 3 additions & 1 deletion ingestion_tools/scripts/common/id_helper.py
@@ -69,7 +69,9 @@ def _load_ids_for_container(
return
metadata_glob = cls._get_metadata_glob(config, parents, *args, **kwargs)
for file in config.fs.glob(metadata_glob):
identifier = int(os.path.basename(file).split("-")[0])
id_dirname = os.path.basename(os.path.dirname(file))
# identifier = int(id_dirname) if id_dirname.isdigit() else 100
identifier = int(id_dirname)
if identifier >= cls.next_identifier[container_key]:
cls.next_identifier[container_key] = identifier + 1
metadata = json.loads(config.fs.open(file, "r").read())
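
The identifier change is easiest to see side by side (paths invented, parsing expressions taken from the diff):

```python
import os

# Old layout: the id was a filename prefix, e.g. "101-alignment_metadata.json".
old_path = "10000/TS_026/Alignments/101-alignment_metadata.json"
old_id = int(os.path.basename(old_path).split("-")[0])

# New layout: each entity lives in a directory named after its id,
# so the id now comes from the parent directory name.
new_path = "10000/TS_026/Alignments/101/alignment_metadata.json"
new_id = int(os.path.basename(os.path.dirname(new_path)))
```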
3 changes: 1 addition & 2 deletions ingestion_tools/scripts/common/metadata.py
@@ -67,7 +67,7 @@ class NeuroglancerMetadata(BaseMetadata):


class AnnotationMetadata(MergedMetadata):
def get_filename_prefix(self, output_dir: str, identifier: int) -> str:
def get_filename_prefix(self, output_dir: str) -> str:
version = self.metadata["version"]
obj = None
with contextlib.suppress(KeyError):
@@ -78,7 +78,6 @@ def get_filename_prefix(self, output_dir: str) -> str:
output_dir,
"-".join(
[
str(identifier),
re.sub("[^0-9a-z]", "_", obj.lower()),
re.sub("[^0-9a-z.]", "_", f"{str(version).lower()}"),
],
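
With the identifier dropped from the prefix (it now lives in the directory path), the filename is built only from the sanitized object name and version. A sketch with invented object/version values; the `re.sub` patterns mirror the diff:

```python
import os
import re

def get_filename_prefix(output_dir: str, obj: str, version) -> str:
    # Lowercase, then replace anything outside [0-9a-z] (or [0-9a-z.]
    # for the version) with "_", and join the parts with "-".
    return os.path.join(
        output_dir,
        "-".join([
            re.sub("[^0-9a-z]", "_", obj.lower()),
            re.sub("[^0-9a-z.]", "_", f"{str(version).lower()}"),
        ]),
    )

prefix = get_filename_prefix("Annotations/100", "Ribosome (80S)", "1.0")
```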
54 changes: 25 additions & 29 deletions ingestion_tools/scripts/importers/alignment.py
@@ -1,31 +1,29 @@
import os.path
from typing import TYPE_CHECKING, Any
from typing import TYPE_CHECKING, Any, Optional

from common.alignment_converter import alignment_converter_factory
from common.config import DepositionImportConfig
from common.finders import MultiSourceFileFinder
from common.id_helper import IdentifierHelper
from common.metadata import AlignmentMetadata
from importers.base_importer import BaseFileImporter
from importers.tiltseries import TiltSeriesImporter
from importers.voxel_spacing import VoxelSpacingImporter

if TYPE_CHECKING:
TomogramImporter = "TomogramImporter"
TiltSeriesImporter = "TiltSeriesImporter"
else:
from importers.tiltseries import TiltSeriesImporter
from importers.tomogram import TomogramImporter


class AlignmentIdentifierHelper(IdentifierHelper):
@classmethod
def _get_container_key(cls, config: DepositionImportConfig, parents: dict[str, Any], *args, **kwargs) -> str:
return parents["run"].get_output_path()
return "-".join(["alignment", parents["run"].get_output_path()])

@classmethod
def _get_metadata_glob(cls, config: DepositionImportConfig, parents: dict[str, Any], *args, **kwargs) -> str:
run = parents["run"]
alignment_dir_path = config.resolve_output_path("alignment", run)
return os.path.join(alignment_dir_path, "*alignment_metadata.json")
metadata_glob = config.resolve_output_path("alignment_metadata", run, {"alignment_id": "*"})
return metadata_glob

@classmethod
def _generate_hash_key(
@@ -52,10 +50,10 @@ class AlignmentImporter(BaseFileImporter):

type_key = "alignment"
plural_key = "alignments"

finder_factory = MultiSourceFileFinder
has_metadata = True
dir_path = "{dataset_name}/{run_name}/Alignments"
dir_path = "{dataset_name}/{run_name}/Alignments/{alignment_id}"
metadata_path = os.path.join(dir_path, "alignment_metadata.json")

def __init__(self, *args, file_paths: dict[str, str], **kwargs):
super().__init__(*args, **kwargs)
@@ -71,7 +69,7 @@ def __init__(self, *args, file_paths: dict[str, str], **kwargs):

def import_metadata(self) -> None:
if not self.is_import_allowed():
print(f"Skipping import of {self.name}")
print(f"Skipping import of {self.name} metadata")
return
metadata_path = self.get_metadata_path()
try:
@@ -95,18 +93,10 @@ def import_item(self) -> None:
dest_filename = self.get_dest_filename(path)
self.config.fs.copy(path, dest_filename)

def get_output_path(self) -> str:
output_directory = super().get_output_path()
return os.path.join(output_directory, f"{self.identifier}-")

def get_dest_filename(self, path: str) -> str | None:
if not path:
return None
output_dir = self.get_output_path()
return f"{output_dir}{os.path.basename(path)}"

def get_metadata_path(self) -> str:
return self.get_output_path() + "alignment_metadata.json"
return os.path.join(self.get_output_path(), os.path.basename(path))

def get_extra_metadata(self) -> dict:
extra_metadata = {
@@ -125,21 +115,27 @@ def get_extra_metadata(self) -> dict:
return extra_metadata

def get_tomogram_volume_dimension(self) -> dict:
for tomogram in TomogramImporter.finder(self.config, **self.parents):
return tomogram.get_source_volume_info().get_dimensions()

# If no source tomogram is found don't create a default alignment metadata file.
raise IOError("No source tomogram found for creating default alignment")
tomogram = self.get_tomogram()
if not tomogram:
# If no source tomogram is found don't create a default alignment metadata file.
raise IOError("No source tomogram found for creating default alignment")
return tomogram.get_source_volume_info().get_dimensions()

def is_default_alignment(self) -> bool:
return "default" in self.file_paths

def is_valid(self) -> bool:
volume_dim = self.metadata.get("volume_dimension", {})
return (
all(volume_dim.get(dim) for dim in "xyz")
or next(TomogramImporter.finder(self.config, **self.parents), None) is not None
)
return all(volume_dim.get(dim) for dim in "xyz") or self.get_tomogram() is not None

def get_tomogram(self) -> Optional["TomogramImporter"]:
from importers.tomogram import TomogramImporter

for voxel_spacing in VoxelSpacingImporter.finder(self.config, **self.parents):
parents = {**self.parents, "voxel_spacing": voxel_spacing}
for tomogram in TomogramImporter.finder(self.config, **parents):
return tomogram
return None

def get_tiltseries_path(self) -> str | None:
for ts in TiltSeriesImporter.finder(self.config, **self.parents):
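
The container-key change at the top of this file is small but load-bearing: id counters are kept per container key, so prefixing the entity type keeps alignment ids from sharing a counter with other entities scoped to the same run. A minimal sketch (paths invented):

```python
# Before, the key was just the run's output path; two entity types under
# the same run would have drawn ids from one shared counter.
def container_key(type_key: str, run_output_path: str) -> str:
    return "-".join([type_key, run_output_path])

alignment_key = container_key("alignment", "10000/TS_026")
annotation_key = container_key("annotation", "10000/TS_026")
```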
41 changes: 32 additions & 9 deletions ingestion_tools/scripts/importers/annotation.py
@@ -13,18 +13,19 @@
from common.id_helper import IdentifierHelper
from common.image import check_mask_for_label, make_pyramids
from common.metadata import AnnotationMetadata
from importers.alignment import AlignmentImporter
from importers.base_importer import BaseImporter


class AnnotationIdentifierHelper(IdentifierHelper):
@classmethod
def _get_container_key(cls, config: DepositionImportConfig, parents: dict[str, Any], *args, **kwargs):
return parents["voxel_spacing"].get_output_path()
return "-".join(["annotation", parents["voxel_spacing"].get_output_path()])

@classmethod
def _get_metadata_glob(cls, config: DepositionImportConfig, parents: dict[str, Any], *args, **kwargs) -> str:
vs = parents["voxel_spacing"]
anno_dir_path = config.resolve_output_path("annotation", vs)
anno_dir_path = config.resolve_output_path("annotation", vs, {"annotation_id": "*"})
return os.path.join(anno_dir_path, "*.json")

@classmethod
@@ -36,6 +37,8 @@ def _generate_hash_key(cls, container_key: str, metadata: dict[str, Any], parent
metadata["annotation_object"].get("description") or "",
metadata["annotation_object"]["name"],
metadata["annotation_method"],
metadata["annotation_object"].get("state") or "",
metadata.get("alignment_metadata_path", kwargs.get("alignment_metadata_path")),
],
)

@@ -71,13 +74,21 @@ def _instantiate(
parents: dict[str, Any] | None,
):
source_args = {k: v for k, v in self.source.items() if k not in {"shape", "glob_string", "glob_strings"}}
alignment_path = self._get_alignment_metadata_path(config, parents)
identifier = AnnotationIdentifierHelper.get_identifier(
config,
metadata,
parents,
alignment_metadata_path=alignment_path,
)
instance_args = {
"identifier": AnnotationIdentifierHelper.get_identifier(config, metadata, parents),
"identifier": identifier,
"config": config,
"metadata": metadata,
"name": name,
"path": path,
"parents": parents,
"alignment_metadata_path": alignment_path,
"allow_imports": allow_imports,
**source_args,
}
@@ -102,25 +113,36 @@ def _instantiate(
if anno.is_valid():
return anno

@classmethod
def _get_alignment_metadata_path(cls, config: DepositionImportConfig, parents: dict[str, Any]) -> str:
for alignment in AlignmentImporter.finder(config, **parents):
return alignment.get_metadata_path()
return ""


class AnnotationImporter(BaseImporter):
type_key = "annotation"
plural_key = "annotations"
finder_factory = AnnotationImporterFactory
has_metadata = True
dir_path = "{dataset_name}/{run_name}/Tomograms/VoxelSpacing{voxel_spacing_name}/Annotations"
metadata_path = "{dataset_name}/{run_name}/Tomograms/VoxelSpacing{voxel_spacing_name}/Annotations"
dir_path = "{dataset_name}/{run_name}/Reconstructions/VoxelSpacing{voxel_spacing_name}/Annotations/{annotation_id}"
metadata_path = dir_path
written_metadata_files = [] # This is a *class* variable that helps us avoid writing metadata files multiple times.

def __init__(
self,
identifier: int,
alignment_metadata_path: str,
*args,
**kwargs,
) -> None:
super().__init__(*args, **kwargs)
self.identifier: int = identifier
self.local_metadata = {"object_count": 0, "files": []}
self.local_metadata = {
"object_count": 0,
"files": [],
"alignment_metadata_path": alignment_metadata_path,
}
self.annotation_metadata = AnnotationMetadata(self.config.fs, self.get_deposition().name, self.metadata)

# Functions to support writing annotation data
@@ -133,12 +155,13 @@ def import_item(self):

# Functions to support writing annotation metadata
def get_output_path(self):
output_dir = super().get_output_path()
return self.annotation_metadata.get_filename_prefix(output_dir, self.identifier)
output_dir = super().get_output_path().format(annotation_id=self.identifier)
self.config.fs.makedirs(output_dir)
return self.annotation_metadata.get_filename_prefix(output_dir)

def import_metadata(self):
if not self.is_import_allowed():
print(f"Skipping import of {self.name}")
print(f"Skipping import of {self.name} metadata")
return
dest_prefix = self.get_output_path()
filename = f"{dest_prefix}.json"
3 changes: 3 additions & 0 deletions ingestion_tools/scripts/importers/base_importer.py
@@ -64,6 +64,8 @@ def get_glob_vars(self) -> dict[str, Any]:
glob_vars = {}
glob_vars[f"{self.type_key}_path"] = self.path
glob_vars[f"{self.type_key}_name"] = self.name
if hasattr(self, "identifier") and self.identifier:
glob_vars[f"{self.type_key}_id"] = self.identifier
with contextlib.suppress(ValueError, TypeError):
glob_vars[f"int_{self.type_key}_name"] = int(self.name)

@@ -159,6 +161,7 @@ def __init__(
):
super().__init__(*args, **kwargs)
self.volume_filename = path
self.identifier = None

def get_voxel_size(self) -> float:
return get_voxel_size(self.config.fs, self.volume_filename)
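
The `get_glob_vars` addition only publishes an `<type>_id` variable when the importer actually carries an identifier. A runnable reduction of the diff (class name and values invented; `getattr` stands in for the diff's `hasattr` check):

```python
import contextlib

class MiniImporter:
    type_key = "tomogram"

    def __init__(self, name: str, path: str, identifier=None):
        self.name, self.path, self.identifier = name, path, identifier

    def get_glob_vars(self) -> dict:
        glob_vars = {
            f"{self.type_key}_path": self.path,
            f"{self.type_key}_name": self.name,
        }
        # Only set "<type>_id" when an identifier is present and truthy.
        if getattr(self, "identifier", None):
            glob_vars[f"{self.type_key}_id"] = self.identifier
        # Non-numeric names simply skip the int_<type>_name variable.
        with contextlib.suppress(ValueError, TypeError):
            glob_vars[f"int_{self.type_key}_name"] = int(self.name)
        return glob_vars

with_id = MiniImporter("TS_026", "some/path", identifier=100).get_glob_vars()
without_id = MiniImporter("TS_026", "some/path").get_glob_vars()
```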
2 changes: 1 addition & 1 deletion ingestion_tools/scripts/importers/dataset.py
@@ -17,7 +17,7 @@ def import_item(self) -> None:

def import_metadata(self) -> None:
if not self.is_import_allowed():
print(f"Skipping import of {self.name}")
print(f"Skipping import of {self.name} metadata")
return
meta = DatasetMetadata(self.config.fs, self.get_deposition().name, self.get_base_metadata())
extra_data = self.load_extra_metadata()
5 changes: 3 additions & 2 deletions ingestion_tools/scripts/importers/deposition.py
@@ -1,3 +1,4 @@
import os.path
from typing import Any

from common.finders import DefaultImporterFactory
@@ -12,14 +13,14 @@ class DepositionImporter(BaseImporter):
finder_factory = DefaultImporterFactory
has_metadata = True
dir_path = "depositions_metadata/{deposition_name}"
metadata_path = "depositions_metadata/{deposition_name}/deposition_metadata.json"
metadata_path = os.path.join(dir_path, "deposition_metadata.json")

def import_item(self) -> None:
pass

def import_metadata(self) -> None:
if not self.is_import_allowed():
print(f"Skipping import of {self.name}")
print(f"Skipping import of {self.name} metadata")
return
meta = DepositionMetadata(self.config.fs, self.name, self.get_base_metadata())
extra_data = self.load_extra_metadata()
8 changes: 4 additions & 4 deletions ingestion_tools/scripts/importers/key_image.py
@@ -23,7 +23,7 @@ class KeyImageImporter(BaseImporter):
"snapshot": 512, # small detail expand
"expanded": 1024, # large detail expand
}
dir_path = "{dataset_name}/{run_name}/Tomograms/VoxelSpacing{voxel_spacing_name}/KeyPhotos"
dir_path = "{dataset_name}/{run_name}/Reconstructions/VoxelSpacing{voxel_spacing_name}/Images"

def get_metadata(self) -> dict[str, str]:
return {
@@ -92,9 +92,9 @@ def generate_preview_from_tomo(self) -> tuple[np.ndarray, np.ndarray]:
preview = generate_preview(data, projection_depth=40, annotations=self.load_annotations(), cmap="tab10")
return preview, data.shape[-1]

@staticmethod
def get_file_name(image_type: str) -> str:
return f"key-photo-{image_type}.png"
def get_file_name(self, image_type: str) -> str:
tomogram_id = self.get_tomogram().get_identifier()
return f"{tomogram_id}-key-photo-{image_type}.png"

@classmethod
def get_default_config(cls) -> list[dict] | None:
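
Since key photos moved from the old `KeyPhotos/` directory into a shared `Images/` directory, the filename now carries the tomogram id. The naming logic, isolated from the class for illustration:

```python
# Prefixing the tomogram id lets key photos for several tomograms share
# one Images/ directory without clobbering each other.
def get_file_name(tomogram_id: int, image_type: str) -> str:
    return f"{tomogram_id}-key-photo-{image_type}.png"
```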
2 changes: 1 addition & 1 deletion ingestion_tools/scripts/importers/rawtilt.py
@@ -7,4 +7,4 @@ class RawTiltImporter(BaseFileImporter):
plural_key = "rawtilts"
finder_factory = DefaultImporterFactory
has_metadata = False
dir_path = "{dataset_name}/{run_name}/TiltSeries"
dir_path = "{dataset_name}/{run_name}/TiltSeries/{tiltseries_id}"