Skip to content

Commit

Permalink
[Datumaro] Import Memory optimisation (cvat-ai#8382)
Browse files Browse the repository at this point in the history
<!-- Raise an issue to propose your change
(https://github.com/cvat-ai/cvat/issues).
It helps to avoid duplication of efforts from multiple independent
contributors.
Discuss your ideas with maintainers to be sure that changes will be
approved and merged.
Read the [Contribution guide](https://docs.cvat.ai/docs/contributing/).
-->

<!-- Provide a general summary of your changes in the Title above -->

### Motivation and context
By using tuple as a container for points when dealing with import from
datumaro, we can achieve 2 things:
- Reduce memory needed for copying shapes and tracks during import
(running `deepcopy` on `tuple[int]` will return the same object, as
opposed to `list[int]`)
- Guarantee type safety during later stages of data pipeline and skip
additional conversion added in cvat-ai#1898

Same thing arguable should be done for CVAT format as well.

Benchmarks:
[memray_reports.zip](https://github.com/user-attachments/files/16849509/memray_reports.zip)


### How has this been tested?
<!-- Please describe in detail how you tested your changes.
Include details of your testing environment, and the tests you ran to
see how your change affects other areas of the code, etc. -->

### Checklist
<!-- Go over all the following points, and put an `x` in all the boxes
that apply.
If an item isn't applicable for some reason, then ~~explicitly
strikethrough~~ the whole
line. If you don't do that, GitHub will show incorrect progress for the
pull request.
If you're unsure about any of these, don't hesitate to ask. We're here
to help! -->
- [x] I submit my changes into the `develop` branch
- [ ] I have created a changelog fragment <!-- see top comment in
CHANGELOG.md -->
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues (see [GitHub docs](

https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [ ] I have increased versions of npm packages if it is necessary

([cvat-canvas](https://github.com/cvat-ai/cvat/tree/develop/cvat-canvas#versioning),

[cvat-core](https://github.com/cvat-ai/cvat/tree/develop/cvat-core#versioning),

[cvat-data](https://github.com/cvat-ai/cvat/tree/develop/cvat-data#versioning)
and

[cvat-ui](https://github.com/cvat-ai/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](
https://github.com/cvat-ai/cvat/blob/develop/LICENSE) that covers the
project.
  Feel free to contact the maintainers if that's a concern.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Improved handling of shape points during the import process for
enhanced data accuracy.
- Centralized conversion of shape points to floats, optimizing memory
usage and performance.

- **Refactor**
- Enhanced code readability and maintainability by restructuring the
point conversion logic.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
  • Loading branch information
Bobronium authored and Bradley Schultz committed Sep 12, 2024
1 parent d1e94b8 commit bcdaf75
Showing 1 changed file with 107 additions and 30 deletions.
137 changes: 107 additions & 30 deletions cvat/apps/dataset_manager/bindings.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@
import os.path as osp
import re
import sys
from collections import namedtuple
from functools import reduce
from operator import add
from pathlib import Path
Expand Down Expand Up @@ -36,14 +35,20 @@
JobType, Label, LabelType, Project, SegmentType, ShapeType,
Task)
from cvat.apps.engine.rq_job_handler import RQJobMetaField
from cvat.apps.engine.lazy_list import LazyList

from .annotation import AnnotationIR, AnnotationManager, TrackManager
from .formats.transformations import MaskConverter, EllipsesToMasks
from ..engine.log import ServerLogManager

slogger = ServerLogManager(__name__)

CVAT_INTERNAL_ATTRIBUTES = {'occluded', 'outside', 'keyframe', 'track_id', 'rotation'}

class InstanceLabelData:
Attribute = NamedTuple('Attribute', [('name', str), ('value', Any)])
class Attribute(NamedTuple):
name: str
value: Any

@classmethod
def add_prefetch_info(cls, queryset: QuerySet):
Expand Down Expand Up @@ -188,20 +193,76 @@ def _export_attributes(self, attributes):
return exported_attributes

class CommonData(InstanceLabelData):
Shape = namedtuple("Shape", 'id, label_id') # 3d
LabeledShape = namedtuple(
'LabeledShape', 'type, frame, label, points, occluded, attributes, source, rotation, group, z_order, elements, outside, id')
LabeledShape.__new__.__defaults__ = (0, 0, 0, [], False, None)
TrackedShape = namedtuple(
'TrackedShape', 'type, frame, points, occluded, outside, keyframe, attributes, rotation, source, group, z_order, label, track_id, elements, id')
TrackedShape.__new__.__defaults__ = (0, 'manual', 0, 0, None, 0, [], None)
Track = namedtuple('Track', 'label, group, source, shapes, elements, id')
Track.__new__.__defaults__ = ([], None)
Tag = namedtuple('Tag', 'frame, label, attributes, source, group, id')
Tag.__new__.__defaults__ = (0, None)
Frame = namedtuple(
'Frame', 'idx, id, frame, name, width, height, labeled_shapes, tags, shapes, labels, subset')
Label = namedtuple('Label', 'id, name, color, type')
class Shape(NamedTuple):
id: int
label_id: int

class LabeledShape(NamedTuple):
type: int
frame: int
label: int
points: Sequence[int]
occluded: bool
attributes: Sequence[CommonData.Attribute]
source: str | None
rotation: float = 0
group: int = 0
z_order: int = 0
elements: Sequence[CommonData.LabeledShape] = ()
outside: bool = False
id: int | None = None

class TrackedShape(NamedTuple):
type: int
frame: int
points: Sequence[int]
occluded: bool
outside: bool
keyframe: bool
attributes: Sequence[CommonData.Attribute]
rotation: float = 0
source: str = "manual"
group: int = 0
z_order: int = 0
label: str | None = None
track_id: int = 0
elements: Sequence[CommonData.TrackedShape] = ()
id: int | None = None

class Track(NamedTuple):
label: int
group: int
source: str
shapes: Sequence[CommonData.TrackedShape]
elements: Sequence[int] = ()
id: int | None = None

class Tag(NamedTuple):
frame: int
label: int
attributes: Sequence[CommonData.Attribute]
source: str | None
group: int | None = 0
id: int | None = None

class Frame(NamedTuple):
idx: int
id: int
frame: int
name: str
width: int
height: int
labeled_shapes: Sequence[CommonData.LabeledShape]
tags: Sequence[CommonData.Tag]
shapes: Sequence[CommonData.Shape]
labels: Sequence[CommonData.Label]
subset: str

class Label(NamedTuple):
id: int
name: str
color: str | None
type: str | None

def __init__(self,
annotation_ir,
Expand Down Expand Up @@ -536,12 +597,7 @@ def _import_shape(self, shape, parent_label_id=None):
)
]

# TODO: remove once importers are guaranteed to return correct type
# (see https://github.com/cvat-ai/cvat/pull/8226/files#r1695445137)
points = _shape["points"]
for i, point in enumerate(map(float, points)):
points[i] = point

self._ensure_points_converted_to_floats(_shape)
_shape['elements'] = [self._import_shape(element, label_id) for element in _shape.get('elements', [])]

return _shape
Expand All @@ -567,14 +623,35 @@ def _import_track(self, track, parent_label_id=None):
for attrib in shape['attributes']
if self._get_mutable_attribute_id(label_id, attrib.name)
]
# TODO: remove once importers are guaranteed to return correct type
# (see https://github.com/cvat-ai/cvat/pull/8226/files#r1695445137)
points = shape["points"]
for i, point in enumerate(map(float, points)):
points[i] = point
self._ensure_points_converted_to_floats(shape)

return _track

def _ensure_points_converted_to_floats(self, shape) -> None:
"""
Historically, there were importers that were not converting points to ints/floats.
The only place to make sure that all points in shapes have the right type was this one.
However, this does eat up a lot of memory for some reason.
(see https://github.com/cvat-ai/cvat/pull/1898)
So, before we can guarantee that all the importers are returning the right data,
we have to have this conversion.
"""
# if points is LazyList or tuple, we can be sure it has the right type already
if isinstance(points := shape["points"], LazyList | tuple):
return

for point in points:
if not isinstance(point, int | float):
slogger.glob.error(
f"Points must be type of "
f"`tuple[int | float] | list[int | float] | LazyList`, "
f"not `{points.__class__.__name__}[{point.__class__.__name__}]`"
"Please, update import code to return the correct type."
)
shape["points"] = tuple(map(float, points))
return

def _call_callback(self):
if self._len() > self._MAX_ANNO_SIZE:
self._create_callback(self._annotation_ir.serialize())
Expand Down Expand Up @@ -2063,11 +2140,11 @@ def import_dm_annotations(dm_dataset: dm.Dataset, instance_data: Union[ProjectDa
if ann.type in shapes:
points = []
if ann.type == dm.AnnotationType.cuboid_3d:
points = [*ann.position, *ann.rotation, *ann.scale, 0, 0, 0, 0, 0, 0, 0]
points = (*ann.position, *ann.rotation, *ann.scale, 0, 0, 0, 0, 0, 0, 0)
elif ann.type == dm.AnnotationType.mask:
points = MaskConverter.dm_mask_to_cvat_rle(ann)
points = tuple(MaskConverter.dm_mask_to_cvat_rle(ann))
elif ann.type != dm.AnnotationType.skeleton:
points = ann.points
points = tuple(ann.points)

rotation = ann.attributes.pop('rotation', 0.0)
# Use safe casting to bool instead of plain reading
Expand Down

0 comments on commit bcdaf75

Please sign in to comment.