Hi, Yu, I don't understand some descriptions in this project's README. Taking the AG News dataset as an example, it says `run_agnews_finetune.sh` is used for running the standard active learning experiments, and `run_agnews_finetune.sh` is used for running the active self-training experiments, where unlabeled data is also used during fine-tuning.
You use the same file name `run_agnews_finetune.sh` twice. Which one is your paper's method? And what does `pool` stand for? Thank you!
For question #1, `run_agnews.sh` is used for our main method (active self-training). We will modify the README to avoid confusion.
For question #2, `pool` is the size of the unlabeled subset used in self-training. In self-training we often do not use all of the unlabeled data, since many pseudo-labels may be noisy. A common solution is to first select a subset of examples with low uncertainty (`pool` is the size of that subset), and fine-tune the pretrained language model on this subset only, together with the labeled data. Hope these explanations help.
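As a minimal sketch of that selection step (assuming entropy of the softmax output as the uncertainty measure; the repo may use a different criterion, and `select_confident_subset` is a hypothetical name):

```python
import numpy as np

def select_confident_subset(probs, pool_size):
    """Pick the `pool_size` unlabeled examples with the lowest
    predictive uncertainty (entropy of the softmax output).

    probs: array of shape (num_unlabeled, num_classes) with
           softmax probabilities from the current model.
    Returns the selected indices and their pseudo-labels.
    """
    # Entropy per example; small epsilon guards against log(0).
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # Indices of the pool_size most confident (lowest-entropy) examples.
    selected = np.argsort(entropy)[:pool_size]
    pseudo_labels = probs[selected].argmax(axis=1)
    return selected, pseudo_labels
```

The selected examples and their pseudo-labels would then be mixed with the labeled data for fine-tuning.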
Overall we tune this parameter based on the performance of the validation set.
If there is no validation set, we recommend gradually (linearly) increasing the number of unlabeled examples to around 50% of the size of the unlabeled pool.
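That fallback schedule could look like the following sketch (the function name and round-based parameterization are assumptions, not code from the repo):

```python
def pool_schedule(round_idx, num_rounds, unlabeled_size, max_frac=0.5):
    """Linearly grow the self-training pool size from 0 up to
    max_frac * unlabeled_size over the active learning rounds."""
    # Fraction grows linearly with the round index, capped at max_frac.
    frac = max_frac * min(round_idx / max(num_rounds - 1, 1), 1.0)
    return int(frac * unlabeled_size)
```

For example, with 5 rounds and 1000 unlabeled examples, the pool grows 0, 250, ..., 500 and stays capped at 50% thereafter.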