Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MLOps 1.5] Expand the built-ins: NLP #865

Merged
merged 12 commits into from
Jan 20, 2023
Merged

Conversation

dbogunowicz
Copy link
Contributor

@dbogunowicz dbogunowicz commented Jan 16, 2023

Feature description

  • Added built-in functions: 'mean_score, percent_zero_labels (token classification) answer_length, answer_score, answer_found (question answering), sequence_length (general nlp function)`.
  • Added exhaustive unit tests.
  • Cleaned up the importing structure to avoid circular dependencies

Coverage:

As specified in the PRD:

nlp_sequence_length -->  covered by "sequence_length "
nlp_percent_unk_tokens --> not covered; see comments below
nlp_answer_found_qa - 1 if yes, 0 if no --> covered by "answer_found"
nlp_answer_length_qa - integer length --> covered by "answer_length"
nlp_score_qa - 0-1 float representing confidence in answer --> covered by "answer_score"
nlp_percent_label_0_token_classification - 0-1 float --> covered by percent_zero_labels
nlp_mean_score_token_classification - 0-1 float --> covered by  "mean score"

Few Comments:

  1. We still need to think about a good naming framework, to avoid ambiguity and possible dupes of the functions naming across integrations. Let's discuss it.
  2. For computing the percentage of the unknown tokens, I am afraid that currently we are heavily constrained. This information is present during the tokenization process but is not available inside the engine_inputs. Hence, we are currently not able to quickly compute this value in a simple, monadic function.

@dbogunowicz dbogunowicz marked this pull request as ready for review January 17, 2023 08:23
Copy link
Member

@bfineran bfineran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for now, but absolutely need to get something in place to handle item-wise functions

Copy link
Contributor

@KSGulin KSGulin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dbogunowicz dbogunowicz merged commit e4a05c0 into main Jan 20, 2023
@dbogunowicz dbogunowicz deleted the feature/damian/nlp_funcs branch January 20, 2023 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants