Add Annotation Importer for Instance segmentations. #31

Merged
merged 2 commits into from
Mar 22, 2024


uermel
Contributor

uermel commented Mar 22, 2024

Imports TARDIS instance points using a new point format that carries an additional "instance_id" label.
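A minimal sketch of a point format that carries the extra label. It assumes each TARDIS CSV row holds the instance ID followed by x, y, z, with one header row; that column order and the header names are assumptions, not taken from this PR or the TARDIS documentation. InstancePoint, read_instance_points and to_annotation_record are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InstancePoint:
    """Hypothetical point format: a 3D location plus an instance label."""
    x: float
    y: float
    z: float
    instance_id: int


def read_instance_points(lines: Iterable[str]) -> List[InstancePoint]:
    """Parse rows of the ASSUMED form 'instance_id,x,y,z' after one header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row (no-op on empty input)
    points = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        instance_id, x, y, z = row
        # IDs may be written as floats (e.g. "3.0"), so go through float first.
        points.append(InstancePoint(float(x), float(y), float(z), int(float(instance_id))))
    return points


def to_annotation_record(point: InstancePoint) -> dict:
    """Convert a point into a plain dict that keeps the 'instance_id' label."""
    return {
        "location": {"x": point.x, "y": point.y, "z": point.z},
        "instance_id": point.instance_id,
    }


if __name__ == "__main__":
    sample = io.StringIO("IDs,X,Y,Z\n0,10.0,20.0,30.0\n1,11.5,21.0,31.0\n")
    for p in read_instance_points(sample):
        print(to_annotation_record(p))
```

Keeping the label as a field of a frozen dataclass means every importer that emits instance points produces the same shape, which is what the "instance_id" label is meant to standardize.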


uermel commented Mar 22, 2024

A config file for testing with the current ingestion tools:

dataset:
  dataset_identifier: 10002
  dataset_description: "Cryo-electron tomograms of RPE1 cells. Comprehensive annotation of actin filaments and microtubules"
  dataset_title: RPE1 cytosol with actin stress fiber
  authors: &dataset_authors
    - name: Irene de Teresa Trueba
      ORCID: 0000-0002-4691-9501
      primary_author_status: true
    - name: Sara Goetz
      ORCID: 0000-0002-9903-3667
    - name: Alexander Mattausch
      ORCID: 0000-0003-0901-8701
    - name: Frosina Stojanovska
      ORCID: 0000-0002-4327-1068
    - name: Christian Eugen Zimmerli
      ORCID: 0000-0003-4388-1349
    - name: Mauricio Toro-Nahuelpan
      ORCID: 0000-0001-5333-3640
    - name: Dorothy W. C. Cheng
    - name: Fergus Tollervey
    - name: Constantin Pape
      ORCID: 0000-0001-6562-7187
    - name: Martin Beck
      ORCID: 0000-0002-7397-1321
    - name: Alba Diz-Munoz
      ORCID: 0000-0001-6864-8901
    - name: Anna Kreshuk
      ORCID: 0000-0003-1334-6388
    - name: Julia Mahamid
      ORCID: 0000-0001-6968-041X
      corresponding_author_status: true
    - name: Judith B. Zaugg
      ORCID: 0000-0001-8324-4040
      corresponding_author_status: true
  cross_references:
    dataset_publications: &publications doi:10.1101/2022.04.12.488077, doi:10.1038/s41592-022-01746-2
    related_database_entries: EMPIAR-10989
  cell_component:
    name: ~
    id: ~
  cell_type:
    id: CL:0002586
    name: Retinal pigment epithelial-1
  dates: &repo-dates
    deposition_date: 2023-05-01
    last_modified_date: 2023-11-08
    release_date: 2023-11-30
  funding:
    - funding_agency_name: European Research Council (ERC)
      grant_id: '760067'
  grid_preparation: 'model: Quantifoil, material: GOLD, support_film_film_type_id: 1, support_film_film_topology: HOLEY, support_film_instance_type: support_film, pretreatment_type_: PLASMA CLEANING'
  key_photos:
    snapshot: https://www.ebi.ac.uk/pdbe/emdb-empiar/entryIcons/10989-l.gif
    thumbnail: https://www.ebi.ac.uk/pdbe/emdb-empiar/entryIcons/10989.gif
  organism:
    name: Homo sapiens
    taxonomy_id: 9606
  sample_preparation: 'buffer_ph: 7.4, vitrification_cryogen_name: ETHANE, cryo_protectant: None, instance_type: tomography_preparation'
  sample_type: cell
  tissue:
    id: BTO:0001175
    name: retina
runs: {}
tiltseries:
  acceleration_voltage: 300000
  binning_from_frames: 1
  camera:
    manufacturer: Gatan
    model: K2 SUMMIT
  data_acquisition_software: SerialEM
  microscope:
    manufacturer: TFS
    model: KRIOS
  microscope_optical_setup:
    energy_filter: GIF Quantum LS
    phase_plate: VOLTA PHASE PLATE
    image_corrector: null
  pixel_spacing: '{run_pixel_spacing}'
  related_empiar_entry: ~
  scales: []
  spherical_aberration_constant: 2.7
  tilting_scheme: Dose symmetric from 0.0 degrees
  tilt_axis: 79
  tilt_range:
    min: -60
    max: 60
  tilt_step: 3
  tilt_series_quality: 5
  total_flux: 125
  is_aligned: false
  alignment_binning_factor: ~ # We may need to calculate this for each tilt if we can identify aligned & unaligned files in the bucket.
tomograms:
  ctf_corrected: false
  fiducial_alignment_status: NON_FIDUCIAL
  offset:
    x: 0
    y: 0
    z: 0
  affine_transformation_matrix: [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 625, 1]] # TODO - this is different per tomo
  processing: raw
  reconstruction_method: Weighted back projection
  reconstruction_software: IMOD
  tomogram_version: 1
  authors: *dataset_authors
  voxel_spacing: '{run_voxel_spacing}'
annotations:
  - metadata:
      annotation_object:
        id: GO:0016020
        name: membrane
        description: ~
        state: ~
      dates: *repo-dates
      annotation_method: TARDIS
      method_type: automated
      annotation_publications: ~
      ground_truth_status: False
      authors: &annotation_authors
        - name: Robert Kiewisz
          primary_annotator_status: true
        - name: Tristan Bepler
          corresponding_author_status: true
      annotation_software: TARDIS
      version: '1.0'
      confidence:
        precision: ~
        recall: ~
      is_curator_recommended: False
    sources:
      - file_format: tardis
        binning: 1
        order: xyz
        shape: InstanceSegmentation
        glob_string: 10002/{run_name}/{run_name}_instance.csv
        is_visualization_default: false
  - metadata:
      dates: *repo-dates
      annotation_method: TARDIS
      method_type: automated
      annotation_publications: *publications
      ground_truth_status: False
      authors: *annotation_authors
      annotation_software: TARDIS
      version: '1.0'
      confidence:
        precision: ~
        recall: ~
        ground_truth_used: ~
      annotation_object:
        id: GO:0016020
        name: membrane
        description: ~
        state: ~
      is_curator_recommended: false
    sources:
      - file_format: mrc
        shape: SemanticSegmentationMask
        glob_string: 10002/{run_name}/{run_name}_semantic.mrc
        is_visualization_default: false

overrides_by_run:
  - run_regex: "^00011$"
    tiltseries:
      pixel_spacing: 3.45
    tomograms:
      voxel_spacing: 13.8
  - run_regex: "^00012$"
    tiltseries:
      pixel_spacing: 3.45
    tomograms:
      voxel_spacing: 13.8
standardization_config:
  destination_prefix: '10002_robert'
  source_prefix: julia_test/RPE1/RPE1
  run_to_frame_map_csv: run_to_frame_name_map.csv
  gain_glob: CountRef_{mapped_frame_name}-range.dm4
  frames_glob: frames/{mapped_frame_name}_*.tif
  rawtlt_files:
    - metadata/{run_name}_sq_df_sorted_fid.xf
    - metadata/{run_name}.mdoc
  tiltseries_glob: stack/{run_name}_sq_df_sorted_orig.st
  tomo_format: mrc
  tomo_glob: tomograms/{run_name}_*.rec
  tomo_voxel_size: ''
  run_glob: tomograms/*.rec
  run_regex: .*
  run_name_regex: (.*)_sq_df_sorted.rec
  run_data_map_file: run_data_map.csv
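The overrides_by_run section above appears to supply per-run pixel and voxel spacings in place of the '{run_pixel_spacing}' and '{run_voxel_spacing}' placeholders, for runs whose name matches run_regex, while glob_string values are templated on {run_name}. A minimal sketch of how that resolution could work, assuming every matching override is deep-merged in list order (later entries win); resolve_for_run, deep_merge and expand_glob are hypothetical helpers, not the ingestion tools' actual code.

```python
import re
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override's keys merged in recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_for_run(config: Dict[str, Any], run_name: str) -> Dict[str, Any]:
    """Apply each overrides_by_run entry whose run_regex matches run_name, in order."""
    resolved = {k: v for k, v in config.items() if k != "overrides_by_run"}
    for entry in config.get("overrides_by_run", []):
        if re.search(entry["run_regex"], run_name):
            patch = {k: v for k, v in entry.items() if k != "run_regex"}
            resolved = deep_merge(resolved, patch)
    return resolved


def expand_glob(glob_string: str, run_name: str) -> str:
    """Fill the {run_name} placeholder (assumes no other {...} fields are present)."""
    return glob_string.format(run_name=run_name)


if __name__ == "__main__":
    config = {
        "tiltseries": {"pixel_spacing": "{run_pixel_spacing}", "tilt_step": 3},
        "tomograms": {"voxel_spacing": "{run_voxel_spacing}"},
        "overrides_by_run": [
            {"run_regex": "^00011$",
             "tiltseries": {"pixel_spacing": 3.45},
             "tomograms": {"voxel_spacing": 13.8}},
        ],
    }
    resolved = resolve_for_run(config, "00011")
    print(resolved["tomograms"]["voxel_spacing"])  # 13.8
    print(expand_glob("10002/{run_name}/{run_name}_instance.csv", "00011"))
```

A deep merge keeps sibling keys such as tilt_step intact when an override only touches pixel_spacing; a shallow update would silently drop them.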


uermel commented Mar 22, 2024

@jgadling

While putting this together as a short-term solution, I had some thoughts on how we import point annotations:

  • Ideally, voxel_spacing should be available to the Importer.load method, so that point coordinates given in Angstrom can be rescaled by the voxel size (i.e. converted to voxel units).
  • We should consolidate all point importers, perhaps following the pattern suggested in this PR (a dataclass that defines the point format).
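Both suggestions can be sketched together under stated assumptions: one shared Point dataclass (with an optional instance_id) used by every point importer, and a load method that receives the run's voxel spacing in Angstrom per voxel. Point, PointImporter, the load signature and the coordinates_in_angstrom flag are all hypothetical; this is not the ingestion tools' actual Importer API.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Point:
    """Hypothetical shared point format; instance_id is optional."""
    x: float
    y: float
    z: float
    instance_id: Optional[int] = None


class PointImporter:
    """Hypothetical importer whose load() is handed the run's voxel spacing."""

    def __init__(self, coordinates_in_angstrom: bool):
        self.coordinates_in_angstrom = coordinates_in_angstrom

    def load(self, points: Sequence[Point], voxel_spacing: float) -> List[Point]:
        """Return points in voxel units, converting from Angstrom if needed."""
        if not self.coordinates_in_angstrom:
            return list(points)  # already in voxel units
        if voxel_spacing <= 0:
            raise ValueError("voxel_spacing must be positive (Angstrom per voxel)")
        # Angstrom divided by (Angstrom / voxel) gives voxels; labels are kept.
        return [
            replace(p, x=p.x / voxel_spacing, y=p.y / voxel_spacing, z=p.z / voxel_spacing)
            for p in points
        ]


if __name__ == "__main__":
    importer = PointImporter(coordinates_in_angstrom=True)
    # 13.8 matches the voxel_spacing override for run 00011 in the config above.
    print(importer.load([Point(138.0, 276.0, 27.6, instance_id=4)], voxel_spacing=13.8))
```

Using dataclasses.replace keeps any extra fields, such as instance_id, untouched during rescaling, so the conversion does not need to know which point variant it is handling.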

uermel requested a review from jgadling March 22, 2024 15:35
uermel merged commit 51e6a6f into main Mar 22, 2024
2 checks passed