Mergeback 1.5.1rc2 #1181
openvinotoolkit#1145)

- Add a multi-threading option (`num_workers > 0`) to `ModelTransform` and `SAMBboxToInstanceMask`.
- This is needed when the model launcher can handle multiple requests at the same time and sustain high throughput.

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
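The idea behind a `num_workers > 0` option can be sketched with the standard library. This is a minimal illustration only, not the code of `ModelTransform` or `SAMBboxToInstanceMask`: `fake_launcher` and `transform` are hypothetical names, and the sleep stands in for a model request that spends most of its time waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_launcher(item):
    """Hypothetical stand-in for a model launcher request (not Datumaro API)."""
    time.sleep(0.01)  # simulate latency of a remote or asynchronous inference call
    return item * 2


def transform(items, num_workers=0):
    """Apply fake_launcher to every item, optionally with a thread pool."""
    if num_workers <= 0:
        # Sequential path: one request at a time.
        return [fake_launcher(x) for x in items]
    # Threaded path: up to num_workers requests are in flight at once.
    # Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(fake_launcher, items))


if __name__ == "__main__":
    print(transform(range(8), num_workers=0))
    print(transform(range(8), num_workers=4))
```

Threads help here only because the simulated request waits instead of computing; a launcher that cannot serve concurrent requests would gain nothing from the extra workers.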
…it#1149)

- One of the tests added in openvinotoolkit#1145 is flaky: https://github.com/openvinotoolkit/datumaro/actions/runs/6156803415/job/16706221640

```console
=========================== short test summary info ============================
FAILED tests/unit/test_util.py::MultiProcUtilTest::test_raise_exception_in_main_thread
= 1 failed, 1493 passed, 38 skipped, 2 xfailed, 48148 warnings in 407.34s (0:06:47) =
tests-py38-darwin: exit 1 (462.14 seconds) /Users/runner/work/datumaro/datumaro> python -m pytest -v --csv=/Users/runner/work/datumaro/datumaro/.tox/results-tests-py38-darwin.csv tests/unit --cov --cov-report=xml pid=4536
.pkg: _exit> python /Users/runner/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pyproject_api/_backend.py True setuptools.build_meta
  tests-py38-darwin: FAIL code 1 (793.18=setup[331.04]+cmd[462.14] seconds)
  evaluation failed :( (803.78 seconds)
```

- The cause is that `join_timeout` is too short, so the main thread tries to assert on the error logs before they have been created.
- To fix it, set `join_timeout=None` so that the main thread waits indefinitely until the producer thread terminates.

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
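The race described above can be reproduced with plain `threading`. This is a sketch under assumptions: `run_producer` is a hypothetical helper, not the test or utility from the repository, and the sleep durations are arbitrary.

```python
import threading
import time


def run_producer(join_timeout):
    """Start a slow producer thread and return the logs seen after join()."""
    logs = []

    def producer():
        time.sleep(0.2)  # the producer needs some time before it logs its error
        logs.append("error: producer failed")

    thread = threading.Thread(target=producer)
    thread.start()
    # With a short timeout, join() may return while the producer is still running.
    thread.join(timeout=join_timeout)
    seen = list(logs)  # what an assertion placed right after join() would observe
    thread.join()  # always wait for the thread before leaving the helper
    return seen


if __name__ == "__main__":
    print(run_producer(join_timeout=0.01))  # too short: the log is usually missing
    print(run_producer(join_timeout=None))  # waits until the producer terminates
```

`Thread.join(timeout=None)` blocks until the thread finishes, which removes the timing dependency at the cost of hanging if the producer never exits.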
* update changelog
* update release note
* update version string
…lkit#1153)

- Ticket no. 120785
- Change streaming import logic with DatumPageMapper implemented in Rust

| Before | After |
| :-: | :-: |
| ![image](https://github.com/openvinotoolkit/datumaro/assets/26541465/0a06ddc0-5256-45b4-af03-e9299b8e61b8) | ![image](https://github.com/openvinotoolkit/datumaro/assets/26541465/af76210b-8fb5-4b30-aec1-2b5a22856ef7) |

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
### Summary

Color values in the labelmap.txt should be separated by commas, not colons.

### Checklist

- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [x] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
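To illustrate why the separator in the labelmap summary matters, here is a small sketch. It assumes a line layout of `name:r,g,b:parts:actions`, where colons delimit fields; that layout and the helper `parse_labelmap_line` are assumptions made for this example, not code taken from the project.

```python
def parse_labelmap_line(line):
    """Parse one labelmap line, assuming the layout name:r,g,b:parts:actions."""
    # Colons separate the four fields, so a color written with colons would
    # produce extra fields and fail to unpack here.
    name, color, parts, actions = line.strip().split(":")
    rgb = tuple(int(c) for c in color.split(",")) if color else None
    if rgb is not None and len(rgb) != 3:
        raise ValueError(f"expected three color values, got {color!r}")
    return (
        name,
        rgb,
        [p for p in parts.split(",") if p],
        [a for a in actions.split(",") if a],
    )


if __name__ == "__main__":
    print(parse_labelmap_line("cat:255,0,0::"))
    try:
        parse_labelmap_line("cat:255:0:0::")  # colons inside the color field
    except ValueError as err:
        print("rejected:", err)
```

Under this assumed layout, a comma-separated color parses cleanly, while a colon-separated color is indistinguishable from additional fields.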
- Apply fixes to 1.5.1 from openvinotoolkit#1159 and openvinotoolkit#1161

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Co-authored-by: Matěj Šmíd <m@matejsmid.cz>
Co-authored-by: Daniil Pastukhov <plus79222222238@gmail.com>
…it#1172)

- Update our CI OS from `windows-2019` to `windows-2022`
- This is because our CI fails only on `windows-2019` while cleaning up the test directory: https://github.com/openvinotoolkit/datumaro/actions/runs/6549100531/job/17785292362 and https://github.com/openvinotoolkit/datumaro/actions/runs/6557126906/job/17808184193?pr=1169

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
…#1169)

- Ticket no. 122601
- Bump the Arrow data format export/import version from 1.0 to 2.0 to make both memory bounded

| | Before | After |
| :-: | :-: | :-: |
| export | ![image](https://github.com/openvinotoolkit/datumaro/assets/26541465/d5641aa7-5c2d-4f3d-899d-01f81cc0a7d1) | ![image](https://github.com/openvinotoolkit/datumaro/assets/26541465/b0b246a5-9f7a-449a-82d5-2c9893f6bbba) |
| import | ![image](https://github.com/openvinotoolkit/datumaro/assets/26541465/2c395306-5e8f-4813-a60e-afcbd954a66e) | ![image](https://github.com/openvinotoolkit/datumaro/assets/26541465/f38e1e73-e304-4586-a0c4-ad6891bbe37f) |

The following scripts were used for the above experiment.

<details>
<summary>1. Synthetic data preparation (10000 items with a 224x224 image and a label are exported to Datumaro data format)</summary>

```python
import os
from tempfile import TemporaryDirectory

import numpy as np

from datumaro.components.annotation import Label
from datumaro.components.dataset_base import DatasetItem
from datumaro.components.media import Image
from datumaro.components.progress_reporting import TQDMProgressReporter
from datumaro.components.project import Dataset
from datumaro.util.image import encode_image


def fxt_large(test_dir, n=5000) -> Dataset:
    items = []
    for i in range(n):
        media = None
        # Cycle through the three image sources: numpy array, encoded bytes, file.
        if i % 3 == 0:
            media = Image.from_numpy(data=np.random.randint(0, 255, (224, 224, 3)))
        elif i % 3 == 1:
            media = Image.from_bytes(
                data=encode_image(np.random.randint(0, 255, (224, 224, 3)), ".png")
            )
        elif i % 3 == 2:
            Image.from_numpy(data=np.random.randint(0, 255, (224, 224, 3))).save(
                os.path.join(test_dir, f"test{i}.jpg")
            )
            media = Image.from_file(path=os.path.join(test_dir, f"test{i}.jpg"))

        items.append(
            DatasetItem(
                id=i,
                subset="test",
                media=media,
                annotations=[Label(np.random.randint(0, 3))],
            )
        )

    source_dataset = Dataset.from_iterable(
        items,
        categories=["label"],
        media_type=Image,
    )

    return source_dataset


if __name__ == "__main__":
    source_dir = "source"

    os.makedirs(source_dir, exist_ok=True)

    with TemporaryDirectory() as test_dir:
        source = fxt_large(test_dir, n=10000)
        reporter = TQDMProgressReporter()
        source.export(
            source_dir,
            format="datumaro",
            save_media=True,
            progress_reporter=reporter,
        )
```

</details>

<details>
<summary>2. Export 10000 items to Arrow data format</summary>

```python
import os
import shutil

from datumaro.components.dataset import StreamDataset
from datumaro.components.progress_reporting import TQDMProgressReporter

if __name__ == "__main__":
    source_dir = "source"

    source = StreamDataset.import_from(source_dir, format="datumaro")

    # Start from a clean export directory.
    export_dir = "export"
    if os.path.exists(export_dir):
        shutil.rmtree(export_dir)

    reporter = TQDMProgressReporter()
    source.export(
        export_dir,
        format="arrow",
        save_media=True,
        max_shard_size=1000,
        progress_reporter=reporter,
    )
```

</details>

<details>
<summary>3. Import 10000 items in the Arrow data format</summary>

```python
import shutil
from random import shuffle
from time import time

import memory_profiler
import pyarrow as pa

from datumaro.components.dataset import Dataset
from datumaro.components.progress_reporting import TQDMProgressReporter

if __name__ == "__main__":
    # Move the source dataset aside while the Arrow import is measured.
    source_dir = "source"
    dst_dir = "source.backup"
    shutil.move(source_dir, dst_dir)

    export_dir = "export"
    reporter = TQDMProgressReporter()

    start = time()
    dataset = Dataset.import_from(export_dir, format="arrow", progress_reporter=reporter)
    keys = [(item.id, item.subset) for item in dataset]

    # Access every item in random order and decode its image.
    shuffle(keys)
    for item_id, subset in keys:
        item = dataset.get(item_id, subset)
        img_data = item.media.data

    dt = time() - start
    print(f"dt={dt:.2f}")
    print(memory_profiler.memory_usage()[0])
    print(pa.total_allocated_bytes())

    shutil.move(dst_dir, source_dir)
```

</details>

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
- Currently, the `Image` classes return image data with inconsistent dtypes: `ImageFromBytes.data` (`np.float32`), `ImageFromNumpy.data` (`np.float32`), and `ImageFromFile.data` (`np.uint8`).
- This makes the data loader based on the Arrow data format (which uses `ImageFromBytes.data`) slower, since image preprocessing is performed on `np.float32` data (4x larger than `np.uint8`).
- This PR forces all `Image` classes to return `np.uint8` data.

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
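The 4x figure follows directly from the element sizes, as the NumPy sketch below shows. The helper `to_uint8` is hypothetical and only illustrates one way to normalize the dtype; it is not the conversion implemented in this PR.

```python
import numpy as np

# A 224x224 RGB image stored as uint8 versus float32.
img_u8 = np.zeros((224, 224, 3), dtype=np.uint8)
img_f32 = img_u8.astype(np.float32)

# float32 uses 4 bytes per element, uint8 uses 1 byte per element.
ratio = img_f32.nbytes // img_u8.nbytes


def to_uint8(data):
    """Hypothetical helper: return pixel data as uint8, clipping to [0, 255]."""
    if data.dtype == np.uint8:
        return data  # already in the target dtype, no copy needed
    return np.clip(data, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    print(img_u8.nbytes, img_f32.nbytes, ratio)
    print(to_uint8(img_f32).dtype)
```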
Codecov Report

Additional details and impacted files

Coverage Diff

| | develop | #1181 | +/- |
| :-: | :-: | :-: | :-: |
| Coverage | 80.11% | 80.08% | -0.03% |
| Files | 268 | 267 | -1 |
| Lines | 30093 | 29828 | -265 |
| Branches | 5916 | 5846 | -70 |
| Hits | 24108 | 23889 | -219 |
| Misses | 4622 | 4603 | -19 |
| Partials | 1363 | 1336 | -27 |
64a7aa1 to 5c5cc94
5c5cc94 to 8faca62
LGTM