Documentation update: tutorial for text classification models comparison #2426

Merged

embonhomme
Contributor

Description

Context: #2068
This PR adds a new tutorial: model comparison for text classification. It is a follow-up to the work done during PyConFr in Bordeaux.

Closes #2068

Type of change

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)
  • Documentation update

How Has This Been Tested

(Please describe the tests that you ran to verify your changes. And ideally, reference tests)

  • Test A
  • Test B

Checklist

  • I have merged the original branch into my forked branch
  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works

@embonhomme embonhomme marked this pull request as draft February 27, 2023 13:15
@embonhomme
Contributor Author

This PR is a WIP because I haven't figured out how to add the notebook to docs/source/tutorials.

@dvsrepo
Member

dvsrepo commented Mar 1, 2023

Hi @embonhomme this is super cool and useful!!

To make it even more useful, would it be possible to use SetFit's zero-shot model instead of the few-shot classy? We've just published a tutorial showing how easy it is to use SetFit, and many people are asking about comparisons with the zero-shot HF pipeline, so that would make this tutorial an even better comparison: https://docs.argilla.io/en/latest/tutorials/notebooks/labelling-textclassification-setfit-zeroshot.html#%F0%9F%94%AB-Zero-shot-predictions-with-SetFit

We'd be happy to walk you through if you have questions.

@embonhomme
Contributor Author

Hello @dvsrepo :) Thank you for the feedback. You can find the comparison with SetFit zero-shot in the new commit.
Tell me if it is relevant.

@dvsrepo dvsrepo changed the base branch from develop to main March 6, 2023 20:56
@dvsrepo dvsrepo changed the base branch from main to develop March 6, 2023 20:56
@dvsrepo
Member

dvsrepo commented Mar 6, 2023

This is looking just perfect!

The only remaining change would be to review the remaining mentions of few-shot and classy-classification and replace them with zero-shot and SetFit. Then we are good to go!

We'd love to share this next week via LinkedIn and Twitter. If you'd like us to mention you as the author, send me an email at daniel @ argilla.io

@dvsrepo dvsrepo self-requested a review March 6, 2023 21:00
@davidberenstein1957
Member

@embonhomme Awesome, looks great to me too :)

@embonhomme embonhomme marked this pull request as ready for review March 7, 2023 11:15
@embonhomme
Contributor Author

Thank you! Yes, sorry, I totally forgot to change the description part. It should be better now.
Also, I have only added a Jupyter notebook here; I didn't figure out how it works with modal.md, dvc.md, ...

I will send you an email with my LinkedIn :)

@embonhomme
Contributor Author

Thank you, I did the integration :)

Member

@davidberenstein1957 davidberenstein1957 left a comment

Hi @embonhomme, could you rename everything to monitoring-textclassification-setfit-explainability? After that, everything should be fine :) Also, did you want to participate in the LinkedIn shoutout and our community program w.r.t. offsetting? https://www.argilla.io/blog/introducing-argilla-community-growers/

@embonhomme
Contributor Author

Hi @davidberenstein1957 I renamed everything :)
Yes, I would like to participate in the LinkedIn shoutout!

@davidberenstein1957
Member

Lovely!

@davidberenstein1957 davidberenstein1957 removed the request for review from dvsrepo March 21, 2023 06:40
@davidberenstein1957 davidberenstein1957 merged commit ae2d65b into argilla-io:develop Mar 21, 2023
@frascuchon frascuchon mentioned this pull request Mar 21, 2023
frascuchon added a commit that referenced this pull request Mar 22, 2023
## [1.5.0](v1.4.0...v1.5.0) - 2023-03-21

### Added

- Add the fields to retrieve when loading the data from argilla.
`rg.load` takes too long because of the vector field, even when users
don't need it. Closes
[#2398](#2398)
- Add new page and components for dataset settings. Closes
[#2442](#2003)
- Add ability to show an image in records (for TokenClassification and
TextClassification) if a URL is passed in metadata with the key
\_image_url
- Non-searchable fields support in metadata.
[#2570](#2570)

### Changed

- Labels are now centralized in a specific vuex ORM called GlobalLabel
Model, see #2210. This model
is the same for TokenClassification and TextClassification (so both tasks
have labels with color_id and shortcuts parameters in the vuex ORM)
- The shortcuts improvement for labels
[#2339](#2339) has been moved
to the vuex ORM in the dataset settings feature
[#2444](eb37c3b)
- Update "Define a labeling schema" section in docs.
- The record inputs are sorted alphabetically in UI by default.
[#2581](#2581)

### Fixes

- Allow URL to be clickable in Jupyter notebook again. Closes
[#2527](#2527)

### Removed

- Removing some data scan deprecated endpoints used by old clients. This
change will break compatibility with client `<v1.3.0`
- Stop using old scan deprecated endpoints in python client. This logic
will break client compatibility with server version `<1.3.0`
- Remove the previous way to add labels through the dataset page. Now
labels can be added only through dataset settings page.



### As always, thanks to our amazing contributors!
- Documentation update: tutorial for text classification models
comparison (#2426) by @embonhomme
- Docs: fix little typo (#2522) by @anakin87
- Docs: Tutorial on image classification (#2420) by @burtenshaw
Development

Successfully merging this pull request may close these issues.

feat: write tutorial about model comparison
3 participants