Get target information for tabular dataset #1471

sooahleex · 2024-04-23T04:41:15Z

Summary

Set the target as a dictionary that specifies the input and output columns, e.g. {"input": "question", "output": ["rating", "sentiment"]}.
If the target is None, all columns are loaded as media and no annotations are created.
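A minimal sketch of how such a target could be resolved against a table's columns. The helper `resolve_target` is hypothetical, not Datumaro's API. It assumes "input" may be a single column name or a list, and it drops target columns that a given file does not contain, which is one way to handle a test file without label columns.

```python
from typing import Dict, List, Optional, Union

# Hypothetical target shape: {"input": <column or columns>, "output": <column or columns>}
Target = Dict[str, Union[str, List[str]]]


def resolve_target(
    target: Optional[Target], columns: List[str]
) -> Dict[str, List[str]]:
    """Return the input and output column lists for one table (illustrative only)."""
    if target is None:
        # No target: every column is treated as media and none as annotation.
        return {"input": list(columns), "output": []}

    def as_list(value: Union[str, List[str], None]) -> List[str]:
        # Accept a single column name or a list of names.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    inputs = as_list(target.get("input"))
    outputs = as_list(target.get("output"))
    # Assumption: silently skip target columns that are absent from this file.
    return {
        "input": [c for c in inputs if c in columns],
        "output": [c for c in outputs if c in columns],
    }
```

For example, `resolve_target(None, ["a", "b"])` returns `{"input": ["a", "b"], "output": []}`.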
Columns listed under the target's "input" key become media, and columns listed under its "output" key become annotations.
- For example, with the target set to {"input": "length(m)", "output": ["breed_category", "pet_category"]}, length(m) becomes the media of each item in the train and test subsets, and breed_category and pet_category become its annotations.
  
  The test file has no breed_category or pet_category column, so items in the test subset have no annotations.
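The media/annotation split in the example above can be sketched per row as follows. `split_row` is a hypothetical helper, not Datumaro's implementation, and the row values are made-up placeholders rather than data from the actual dataset.

```python
from typing import Any, Dict, List


def split_row(
    row: Dict[str, Any], inputs: List[str], outputs: List[str]
) -> Dict[str, Dict[str, Any]]:
    """Split one table row into a media part and an annotation part (illustrative only)."""
    media = {col: row[col] for col in inputs if col in row}
    # Output columns missing from the row (e.g. in an unlabeled test file) yield no annotation.
    annotations = {col: row[col] for col in outputs if col in row}
    return {"media": media, "annotations": annotations}


inputs = ["length(m)"]
outputs = ["breed_category", "pet_category"]

# Placeholder rows: the train file has the label columns, the test file does not.
train_row = {"length(m)": 0.8, "breed_category": 0, "pet_category": 1}
test_row = {"length(m)": 0.5}

train_item = split_row(train_row, inputs, outputs)
test_item = split_row(test_row, inputs, outputs)
```

Here `test_item["annotations"]` is an empty dict, matching the behaviour described for the test subset.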
Use CategoricalDtype for columns whose dtype is object but that could serve as labels. This requires defining a threshold.
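The PR does not say what the threshold is measured on. A minimal sketch, assuming it caps the number of distinct values: a string-valued column (standing in for pandas' object dtype) is treated as categorical when it has at most `max_unique` distinct values. The function name `categorical_columns` and the default of 10 are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def categorical_columns(
    table: Dict[str, Sequence[Any]], max_unique: int = 10
) -> Dict[str, List[Any]]:
    """Return label-like string columns and their sorted categories (illustrative heuristic)."""
    result = {}
    for name, values in table.items():
        # Only consider non-empty columns made entirely of strings.
        if not values or not all(isinstance(v, str) for v in values):
            continue
        categories = sorted(set(values))
        # Assumed threshold: few distinct values suggests a label column.
        if len(categories) <= max_unique:
            result[name] = categories
    return result
```

With pandas, the selected columns could then be converted with `df[col].astype("category")`.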

How to test

Updated the target in the unit tests and CLI tests.
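The commit history includes "Fix target format as string for cli" and "Add test for string_to_dict function", so the CLI apparently receives the target as a string that is converted to a dictionary. The sketch below assumes that string is a Python/JSON-style dict literal and parses it with `ast.literal_eval`; `parse_target` is a hypothetical name, and the format Datumaro's CLI actually accepts may differ.

```python
import ast
from typing import Dict, List, Union


def parse_target(text: str) -> Dict[str, Union[str, List[str]]]:
    """Parse a target passed on the command line (assumed dict-literal syntax)."""
    # literal_eval only evaluates literals, so arbitrary code is not executed.
    value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError("target must be a dict with 'input' and/or 'output' keys")
    unknown = set(value) - {"input", "output"}
    if unknown:
        raise ValueError(f"unknown target keys: {sorted(unknown)}")
    return value
```

For example, `parse_target('{"input": "question", "output": ["rating", "sentiment"]}')` returns the same dictionary shown in the summary.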

Checklist

I have added unit tests to cover my changes.
I have added integration tests to cover my changes.
I have added the description of my changes into CHANGELOG.
I have updated the documentation accordingly.

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.
I have updated the license header for each file (see an example below).

# Copyright (C) 2024 Intel Corporation
#
# SPDX-License-Identifier: MIT

src/datumaro/plugins/data_formats/tabular.py

codecov · 2024-04-23T06:19:15Z

Codecov Report

Attention: Patch coverage is 79.54545%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 80.99%. Comparing base (44cc56a) to head (cc163a6).
Report is 37 commits behind head on develop.

Files	Patch %	Lines
src/datumaro/plugins/data_formats/tabular.py	82.05%	3 Missing and 4 partials ⚠️
src/datumaro/components/media.py	60.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1471      +/-   ##
===========================================
+ Coverage    80.85%   80.99%   +0.14%     
===========================================
  Files          271      272       +1     
  Lines        30689    31242     +553     
  Branches      6197     6294      +97     
===========================================
+ Hits         24815    25306     +491     
- Misses        4489     4519      +30     
- Partials      1385     1417      +32

Flag	Coverage Δ
ubuntu-20.04_Python-3.10	`80.98% <79.54%> (+0.13%)`	⬆️
windows-2022_Python-3.10	`80.97% <79.54%> (+0.14%)`	⬆️

Flags with carried forward coverage won't be shown.

wonjuleee

LGTM

sooahleex added 10 commits April 23, 2024 11:30

Match columns

3be024b

Make categorical for tabular type

14a4f02

Set target as input and output

4c38ef8

Update doc for tabular

2958c8e

Update cli for tabular target

1cf34ef

Remove TableCategories comment

fd99efe

Update CHANGELOG

e5dad63

Revert to assertTrue

588d304

Set all columns as media and empty annotation for None target

0c84d9e

Fix target format as string for cli

f0d6a12

sooahleex marked this pull request as ready for review April 23, 2024 05:22

sooahleex requested review from a team as code owners April 23, 2024 05:22

sooahleex requested review from jihyeonyi and removed request for a team April 23, 2024 05:22

Fix doc

152daae

wonjuleee reviewed Apr 23, 2024

src/datumaro/plugins/data_formats/tabular.py (conversation resolved)

Add test for string_to_dict function

c12d207

sooahleex added 3 commits April 24, 2024 09:06

Add tests

6dd6249

Fix to use item of train subset

a364143

Update unit test

cc163a6

wonjuleee approved these changes Apr 25, 2024

sooahleex merged commit f9a25f5 into openvinotoolkit:develop Apr 25, 2024
7 of 8 checks passed

wonjuleee added this to the 1.7.0 milestone May 7, 2024
