Add self-training option #287

Open
arthur-thuy opened this issue Apr 30, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@arthur-thuy
Contributor

Describe the solution you'd like
It would be nice to have an option for self-training. Self-training is related to active learning, but instead of asking an oracle for the ground-truth label of each query, it labels queries with the model's own predictions (pseudo-labels). As a result, ground-truth labels are only required for the initial labeled set.

Intuitively, it would make sense to add an argument to the ActiveLearningLoop object.
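A minimal sketch of the pseudo-labeling step described above, in plain Python and independent of the library. The names `self_training_step`, `predict_proba`, `n_queries`, and `min_confidence` are hypothetical and are not part of `ActiveLearningLoop`'s API. Keeping only the most confident predictions above a threshold is an assumed selection rule, not something specified in this issue.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


# Hypothetical helper, not part of the library's API.
def self_training_step(
    predict_proba: Callable[[Hashable], Sequence[float]],
    labeled: Dict[Hashable, int],
    unlabeled: List[Hashable],
    n_queries: int,
    min_confidence: float = 0.9,
) -> List[Tuple[Hashable, int]]:
    """Label up to `n_queries` unlabeled samples with the model's own
    predictions (pseudo-labels) instead of asking an oracle."""
    scored = []
    for sample in unlabeled:
        probs = predict_proba(sample)
        label = max(range(len(probs)), key=probs.__getitem__)
        scored.append((probs[label], sample, label))
    # Most confident first; the confidence threshold is an assumed selection rule.
    scored.sort(key=lambda item: item[0], reverse=True)
    picked = [(s, lbl) for conf, s, lbl in scored[:n_queries] if conf >= min_confidence]
    # Pseudo-labeled samples move to the labeled pool; no oracle is queried.
    for sample, label in picked:
        labeled[sample] = label
    picked_set = {sample for sample, _ in picked}
    unlabeled[:] = [s for s in unlabeled if s not in picked_set]
    return picked


def toy_model(x: float) -> Tuple[float, float]:
    """Toy binary classifier: P(class 1) = sigmoid(x)."""
    p1 = 1.0 / (1.0 + math.exp(-x))
    return (1.0 - p1, p1)


labeled = {-10: 0, 10: 1}  # initial set with ground-truth labels
unlabeled = [-3, -1, 0, 2, 5]
picked = self_training_step(toy_model, labeled, unlabeled, n_queries=3)
print(picked)     # [(5, 1), (-3, 0)]
print(unlabeled)  # [-1, 0, 2]
```

Here sample `2` is among the top three but falls below the 0.9 threshold, so only two samples receive pseudo-labels. Compared with an oracle-driven step, the only difference is where the labels come from, which is why a single option on the loop seems plausible.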

@arthur-thuy arthur-thuy added the enhancement New feature or request label Apr 30, 2024
@Dref360
Member

Dref360 commented May 11, 2024

That's an interesting idea. Do you have some references we could look into?

It's similar to semi-supervised learning, which we have investigated but not really maintained. It became a bit unwieldy in the codebase, but I think we can give it a second try :)

@arthur-thuy
Contributor Author

This is a general survey of self-training: Self-Training: A Survey

The survey focuses on more traditional semi-supervised approaches for the acquisition function, such as data clustering and density estimation. It does not discuss uncertainty estimation with approximate Bayesian techniques, but it is useful for seeing how self-training relates to active learning.

Recent approaches combine self-training with epistemic uncertainty for unsupervised domain adaptation (UDA). These UDA methods try to reduce the domain shift by adding examples from the shifted target domain to the training set, using pseudo-labels because ground-truth labels are often unavailable.
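As a rough illustration of how epistemic uncertainty can gate pseudo-labels, the sketch below assumes T stochastic forward passes per target sample (as with MC dropout) and uses the BALD mutual information as the uncertainty measure; the UDA methods mentioned above may use a different estimator or selection rule. Samples with low mutual information (the opposite end of the ranking that BALD-based active learning would send to an oracle) receive the argmax of the mean prediction as their pseudo-label. All function names and the `max_mi` threshold are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_prediction(mc_probs: List[Sequence[float]]) -> List[float]:
    """Average the class probabilities over T stochastic forward passes."""
    t = len(mc_probs)
    return [sum(p[c] for p in mc_probs) / t for c in range(len(mc_probs[0]))]


def mutual_information(mc_probs: List[Sequence[float]]) -> float:
    """BALD-style epistemic uncertainty: entropy of the mean prediction
    minus the mean entropy of the individual passes."""
    expected_entropy = sum(entropy(p) for p in mc_probs) / len(mc_probs)
    return entropy(mean_prediction(mc_probs)) - expected_entropy


def select_pseudo_labels(
    mc_predictions: Dict[Hashable, List[Sequence[float]]], max_mi: float
) -> Dict[Hashable, int]:
    """Keep target-domain samples with low epistemic uncertainty and
    label them with the argmax of the mean prediction."""
    selected = {}
    for sample, mc_probs in mc_predictions.items():
        if mutual_information(mc_probs) <= max_mi:
            mean = mean_prediction(mc_probs)
            selected[sample] = max(range(len(mean)), key=mean.__getitem__)
    return selected


# Two stochastic passes (e.g. MC dropout) per unlabeled target sample.
mc_predictions = {
    "agree": [[0.9, 0.1], [0.9, 0.1]],
    "disagree": [[0.9, 0.1], [0.1, 0.9]],
    "agree_pos": [[0.2, 0.8], [0.1, 0.9]],
}
pseudo = select_pseudo_labels(mc_predictions, max_mi=0.05)
print(pseudo)  # {'agree': 0, 'agree_pos': 1}
```

The `"disagree"` sample is excluded because its passes contradict each other (mutual information of about 0.37 nats), even though each individual pass is confident.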

I think the implementation and maintenance effort would be fairly limited, since self-training is so similar to active learning (less work than the pi-model attempted earlier).

Projects
None yet
Development

No branches or pull requests

2 participants