Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

structured data operators for serializing tablerows, triples and keyvalue pairs added #589

Merged
merged 5 commits into from
Feb 26, 2024

Conversation

csrajmohan
Copy link
Collaborator

  • Added new serialization operators for structured data: Table rows, Triples & KeyValue pairs
  • Renamed table_operators -> struct_data_operators so that it best captures all structured data operators like the ones for Tables, Triples etc. and makes better sense to add other structured data operators to this in future.
  • Added 3 new datatask cards to validate new operators
  • wiki_bio : KeyValue Pair to Text(Data2Text task), Uses KeyValuePairs Serializer
  • dart : Triples to Text(Data2Text task), Uses TriplesSerializer
  • tablerow_classify : Table row classification using a Kaggle dataset(Heart disease prediction), Uses TableRowSerializer

MapInstanceValues(mappers={"label": {"0": "Normal", "1": "Heart Disease"}}),
AddFields(
fields={
"text_type": "Person",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"text_type": "Person",
"text_type": "person medical record",

I think it is more descrptive, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -3,7 +3,7 @@

add_to_catalog(
FormTask(
inputs=["input"],
inputs=["input", "type_of_input"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe would be also good to have type of output?

Suggested change
inputs=["input", "type_of_input"],
inputs=["input", "type_of_input", "type_of_output"],

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@elronbandel elronbandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks really good. I really appreciate that you use the existing tasks and even improve them!! Go over my comment and ask for review when done and we can get it merged!

@elronbandel
Copy link
Member

In case its needed: the test is currently failing because of import from unitxt rather then src.unitxt

@csrajmohan
Copy link
Collaborator Author

import for test file fixed

Copy link

codecov bot commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 95.68966% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 88.46%. Comparing base (ee1294f) to head (493ad02).

Files Patch % Lines
src/unitxt/struct_data_operators.py 92.42% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #589      +/-   ##
==========================================
+ Coverage   88.38%   88.46%   +0.08%     
==========================================
  Files          87       87              
  Lines        7698     7779      +81     
==========================================
+ Hits         6804     6882      +78     
- Misses        894      897       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@elronbandel elronbandel enabled auto-merge (rebase) February 26, 2024 12:21
Copy link
Member

@elronbandel elronbandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great contribution!

@elronbandel elronbandel merged commit 3284d41 into main Feb 26, 2024
6 of 7 checks passed
@csrajmohan csrajmohan deleted the structured_data_operators branch February 29, 2024 05:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants