Modelzoo

With the help of UER, we have pre-trained models with different properties (for example, models based on different corpora, encoders, and targets). All pre-trained weights introduced in this section are in UER format and can be loaded by UER directly. More pre-trained weights will be released in the near future. Unless otherwise noted, Chinese pre-trained models use the BERT tokenizer and models/google_zh_vocab.txt as the vocabulary (which is used in the original BERT project), and models/bert/base_config.json as the configuration file by default. Commonly used vocabulary and configuration files are included in the models/ folder, so users do not need to download them. In addition, we use scripts/convert_xxx_from_uer_to_huggingface.py to convert pre-trained weights into the format supported by Huggingface Transformers and upload them to the Huggingface model hub (uer). In the rest of this section, we provide download links for the pre-trained weights and show how to use them. Notice that, due to space constraints, more details about each pre-trained weight are given on its Huggingface model hub page; we provide the corresponding link when we introduce the weight.
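
For example, a RoBERTa weight can be converted with the corresponding conversion script before it is uploaded. The command below is only a sketch: the input file name is a placeholder, and depending on the UER version an additional flag specifying the pre-training target may be required.

python3 scripts/convert_bert_from_uer_to_huggingface.py --input_model_path models/cluecorpussmall_roberta_base_seq512_model.bin \
                                                        --output_model_path pytorch_model.bin \
                                                        --layers_num 12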

Chinese RoBERTa Pre-trained Weights

This is the set of 24 Chinese RoBERTa weights. CLUECorpusSmall is used as the training corpus. Configuration files are in the models/bert/ folder. We only provide configuration files for the Tiny, Mini, Small, Medium, Base, and Large models. To load a model of another size, modify emb_size, feedforward_size, hidden_size, heads_num, and layers_num in the configuration file. Notice that emb_size = hidden_size, feedforward_size = 4 * hidden_size, and heads_num = hidden_size / 64. More details of these pre-trained weights are discussed here.
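
For example, to load the L=10/H=256 weight in the table below, the derived values are feedforward_size = 4 * 256 = 1024 and heads_num = 256 / 64 = 4. A partial configuration sketch showing only these fields (all other fields stay the same as in an existing file such as models/bert/mini_config.json):

{
    "emb_size": 256,
    "feedforward_size": 1024,
    "hidden_size": 256,
    "heads_num": 4,
    "layers_num": 10
}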

The pre-trained Chinese weight links of different layers (L) and hidden sizes (H):

|      | H=128        | H=256        | H=512          | H=768         |
| ---- | ------------ | ------------ | -------------- | ------------- |
| L=2  | 2/128 (Tiny) | 2/256        | 2/512          | 2/768         |
| L=4  | 4/128        | 4/256 (Mini) | 4/512 (Small)  | 4/768         |
| L=6  | 6/128        | 6/256        | 6/512          | 6/768         |
| L=8  | 8/128        | 8/256        | 8/512 (Medium) | 8/768         |
| L=10 | 10/128       | 10/256       | 10/512         | 10/768        |
| L=12 | 12/128       | 12/256       | 12/512         | 12/768 (Base) |

Take the Tiny weight as an example: we download the Tiny weight through the above link and put it in the models/ folder. We can either conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt --vocab_path models/google_zh_vocab.txt \
                      --dataset_path dataset.pt --processes_num 8 --data_processor mlm

python3 pretrain.py --dataset_path dataset.pt --pretrained_model_path models/cluecorpussmall_roberta_tiny_seq512_model.bin \
                    --vocab_path models/google_zh_vocab.txt --config_path models/bert/tiny_config.json \
                    --output_model_path models/output_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 5000 --save_checkpoint_steps 2500 --batch_size 64 \
                    --data_processor mlm --target mlm

or use it on a downstream classification dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_tiny_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt --config_path models/bert/tiny_config.json \
                                   --train_path datasets/douban_book_review/train.tsv \
                                   --dev_path datasets/douban_book_review/dev.tsv \
                                   --test_path datasets/douban_book_review/test.tsv \
                                   --learning_rate 3e-4 --epochs_num 8 --batch_size 64

In the fine-tuning stage, pre-trained models of different sizes usually require different hyper-parameters. An example of using grid search to find the best hyper-parameters:

python3 finetune/run_classifier_grid.py --pretrained_model_path models/cluecorpussmall_roberta_tiny_seq512_model.bin \
                                        --vocab_path models/google_zh_vocab.txt \
                                        --config_path models/bert/tiny_config.json \
                                        --train_path datasets/douban_book_review/train.tsv \
                                        --dev_path datasets/douban_book_review/dev.tsv \
                                        --learning_rate_list 3e-5 1e-4 3e-4 --epochs_num_list 3 5 8 --batch_size_list 32 64

We can reproduce the experimental results reported here through the above grid search script.

Chinese word-based RoBERTa Pre-trained Weights

This is the set of 5 Chinese word-based RoBERTa weights. CLUECorpusSmall is used as the training corpus. Configuration files are in the models/bert/ folder. Google sentencepiece is used as the tokenizer, with models/cluecorpussmall_spm.model as the sentencepiece model. Most Chinese pre-trained weights are character-based. Compared with character-based models, word-based models are faster (because of shorter sequence lengths) and perform better according to our experimental results. More details of these pre-trained weights are discussed here.

The pre-trained Chinese weight links of different sizes:

| Link              |
| ----------------- |
| L=2/H=128 (Tiny)  |
| L=4/H=256 (Mini)  |
| L=4/H=512 (Small) |
| L=8/H=512 (Medium) |
| L=12/H=768 (Base) |

Take the word-based Tiny weight as an example: we download the word-based Tiny weight through the above link and put it in the models/ folder. We can either conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt --spm_model_path models/cluecorpussmall_spm.model \
                      --dataset_path dataset.pt --processes_num 8 --data_processor mlm

python3 pretrain.py --dataset_path dataset.pt --pretrained_model_path models/cluecorpussmall_word_roberta_tiny_seq512_model.bin \
                    --spm_model_path models/cluecorpussmall_spm.model --config_path models/bert/tiny_config.json \
                    --output_model_path models/output_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 5000 --save_checkpoint_steps 2500 --batch_size 64 \
                    --data_processor mlm --target mlm

or use it on a downstream classification dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_word_roberta_tiny_seq512_model.bin \
                                   --spm_model_path models/cluecorpussmall_spm.model \
                                   --config_path models/bert/tiny_config.json \
                                   --train_path datasets/douban_book_review/train.tsv \
                                   --dev_path datasets/douban_book_review/dev.tsv \
                                   --test_path datasets/douban_book_review/test.tsv \
                                   --learning_rate 3e-4 --epochs_num 8 --batch_size 64

An example of using grid search to find the best hyper-parameters for the word-based model:

python3 finetune/run_classifier_grid.py --pretrained_model_path models/cluecorpussmall_word_roberta_tiny_seq512_model.bin \
                                        --spm_model_path models/cluecorpussmall_spm.model \
                                        --config_path models/bert/tiny_config.json \
                                        --train_path datasets/douban_book_review/train.tsv \
                                        --dev_path datasets/douban_book_review/dev.tsv \
                                        --learning_rate_list 3e-5 1e-4 3e-4 --epochs_num_list 3 5 8 --batch_size_list 32 64

We can reproduce the experimental results reported here through the above grid search script.

Chinese GPT-2 Pre-trained Weights

This is the set of Chinese GPT-2 pre-trained weights. Configuration files are in the models/gpt2/ folder.

The link and detailed description (Huggingface model hub) of different pre-trained GPT-2 weights:

| Model link | Description link |
| ---------- | ---------------- |
| CLUECorpusSmall GPT-2 | https://huggingface.co/uer/gpt2-chinese-cluecorpussmall |
| CLUECorpusSmall GPT-2-distil | https://huggingface.co/uer/gpt2-distil-chinese-cluecorpussmall |
| Poem GPT-2 | https://huggingface.co/uer/gpt2-chinese-poem |
| Couplet GPT-2 | https://huggingface.co/uer/gpt2-chinese-couplet |
| Lyric GPT-2 | https://huggingface.co/uer/gpt2-chinese-lyric |
| Ancient GPT-2 | https://huggingface.co/uer/gpt2-chinese-ancient |

Notice that extended vocabularies (models/google_zh_poem_vocab.txt and models/google_zh_ancient_vocab.txt) are used in the Poem and Ancient GPT-2 models. The CLUECorpusSmall GPT-2-distil model uses the models/gpt2/distil_config.json configuration file; models/gpt2/config.json is used for the other weights.

Take the CLUECorpusSmall GPT-2-distil weight as an example: we download the weight through the above link and put it in the models/ folder. We can either conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt \
                      --vocab_path models/google_zh_vocab.txt \
                      --dataset_path dataset.pt --processes_num 8 \
                      --seq_length 128 --data_processor lm 

python3 pretrain.py --dataset_path dataset.pt \
                    --pretrained_model_path models/cluecorpussmall_gpt2_distil_seq1024_model.bin \
                    --vocab_path models/google_zh_vocab.txt \
                    --config_path models/gpt2/distil_config.json \
                    --output_model_path models/book_review_gpt2_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 10000 --save_checkpoint_steps 5000 --report_steps 1000 \
                    --learning_rate 5e-5 --batch_size 64

or use it on a downstream classification dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_gpt2_distil_seq1024_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/gpt2/distil_config.json \
                                   --train_path datasets/douban_book_review/train.tsv \
                                   --dev_path datasets/douban_book_review/dev.tsv \
                                   --test_path datasets/douban_book_review/test.tsv \
                                   --learning_rate 3e-5 --epochs_num 8 --batch_size 64

The GPT-2 model can be used for text generation. First of all, we create story_beginning.txt and enter the beginning of the text. Then we use scripts/generate_lm.py to do text generation:

python3 scripts/generate_lm.py --load_model_path models/cluecorpussmall_gpt2_distil_seq1024_model.bin \
                               --vocab_path models/google_zh_vocab.txt \
                               --config_path models/gpt2/distil_config.json \
                               --test_path story_beginning.txt --prediction_path story_full.txt \
                               --seq_length 128

Chinese ALBERT Pre-trained Weights

This is the set of Chinese ALBERT pre-trained weights. Configuration files are in the models/albert/ folder.

The link and detailed description (Huggingface model hub) of different pre-trained ALBERT weights:

| Model link | Description link |
| ---------- | ---------------- |
| CLUECorpusSmall ALBERT-base | https://huggingface.co/uer/albert-base-chinese-cluecorpussmall |
| CLUECorpusSmall ALBERT-large | https://huggingface.co/uer/albert-large-chinese-cluecorpussmall |

Take the CLUECorpusSmall ALBERT-base weight as an example: we download the weight through the above link and put it in the models/ folder. An example of using ALBERT-base on a downstream dataset:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_albert_base_seq512_model.bin \
                                   --vocab_path models/google_zh_vocab.txt --config_path models/albert/base_config.json \
                                   --train_path datasets/douban_book_review/train.tsv \
                                   --dev_path datasets/douban_book_review/dev.tsv \
                                   --test_path datasets/douban_book_review/test.tsv \
                                   --learning_rate 2e-5 --epochs_num 3 --batch_size 64

python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/albert/base_config.json \
                                          --test_path datasets/douban_book_review/test_nolabel.tsv \
                                          --prediction_path datasets/douban_book_review/prediction.tsv \
                                          --labels_num 2

Chinese T5 Pre-trained Weights

This is the set of Chinese T5 pre-trained weights. Configuration files are in the models/t5/ folder.

The link and detailed description (Huggingface model hub) of different pre-trained T5 weights:

| Model link | Description link |
| ---------- | ---------------- |
| CLUECorpusSmall T5-small | https://huggingface.co/uer/t5-small-chinese-cluecorpussmall |
| CLUECorpusSmall T5-base | https://huggingface.co/uer/t5-base-chinese-cluecorpussmall |

Take the CLUECorpusSmall T5-small weight as an example: we download the weight through the above link and put it in the models/ folder. We can conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt \
                      --vocab_path models/google_zh_with_sentinel_vocab.txt \
                      --dataset_path dataset.pt \
                      --processes_num 8 --seq_length 128 \
                      --dynamic_masking --data_processor t5

python3 pretrain.py --dataset_path dataset.pt \
                    --pretrained_model_path models/cluecorpussmall_t5_small_seq512_model.bin \
                    --vocab_path models/google_zh_with_sentinel_vocab.txt \
                    --config_path models/t5/small_config.json \
                    --output_model_path models/book_review_t5_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 10000 --save_checkpoint_steps 5000 --report_steps 1000 \
                    --learning_rate 5e-4 --batch_size 64 \
                    --span_masking --span_geo_prob 0.3 --span_max_length 5

or use it on a downstream dataset:

python3 finetune/run_text2text.py --pretrained_model_path models/cluecorpussmall_t5_small_seq512_model.bin \
                                  --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                  --config_path models/t5/small_config.json \
                                  --train_path datasets/tnews_text2text/train.tsv \
                                  --dev_path datasets/tnews_text2text/dev.tsv \
                                  --seq_length 128 --tgt_seq_length 8 --learning_rate 3e-4 --epochs_num 3 --batch_size 32

python3 inference/run_text2text_infer.py --load_model_path models/finetuned_model.bin \
                                         --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                         --config_path models/t5/small_config.json \
                                         --test_path datasets/tnews_text2text/test_nolabel.tsv \
                                         --prediction_path datasets/tnews_text2text/prediction.tsv \
                                         --seq_length 128 --tgt_seq_length 8 --batch_size 32

Users can download the tnews dataset in text2text format from here.

Chinese T5-v1_1 Pre-trained Weights

This is the set of Chinese T5-v1_1 pre-trained weights. Configuration files are in the models/t5-v1_1/ folder.

The link and detailed description (Huggingface model hub) of different pre-trained T5-v1_1 weights:

| Model link | Description link |
| ---------- | ---------------- |
| CLUECorpusSmall T5-v1_1-small | https://huggingface.co/uer/t5-v1_1-small-chinese-cluecorpussmall |
| CLUECorpusSmall T5-v1_1-base | https://huggingface.co/uer/t5-v1_1-base-chinese-cluecorpussmall |

Take the CLUECorpusSmall T5-v1_1-small weight as an example: we download the weight through the above link and put it in the models/ folder. We can conduct further pre-training upon it:

python3 preprocess.py --corpus_path corpora/book_review.txt \
                      --vocab_path models/google_zh_with_sentinel_vocab.txt \
                      --dataset_path dataset.pt \
                      --processes_num 8 --seq_length 128 \
                      --dynamic_masking --data_processor t5

python3 pretrain.py --dataset_path dataset.pt \
                    --pretrained_model_path models/cluecorpussmall_t5-v1_1_small_seq512_model.bin \
                    --vocab_path models/google_zh_with_sentinel_vocab.txt \
                    --config_path models/t5-v1_1/small_config.json \
                    --output_model_path models/book_review_t5-v1_1_model.bin \
                    --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
                    --total_steps 10000 --save_checkpoint_steps 5000 --report_steps 1000 \
                    --learning_rate 5e-4 --batch_size 64 \
                    --span_masking --span_geo_prob 0.3 --span_max_length 5

or use it on a downstream dataset:

python3 finetune/run_text2text.py --pretrained_model_path models/cluecorpussmall_t5-v1_1_small_seq512_model.bin \
                                  --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                  --config_path models/t5-v1_1/small_config.json \
                                  --train_path datasets/tnews_text2text/train.tsv \
                                  --dev_path datasets/tnews_text2text/dev.tsv \
                                  --seq_length 128 --tgt_seq_length 8 --learning_rate 3e-4 --epochs_num 3 --batch_size 32

python3 inference/run_text2text_infer.py --load_model_path models/finetuned_model.bin \
                                         --vocab_path models/google_zh_with_sentinel_vocab.txt \
                                         --config_path models/t5-v1_1/small_config.json \
                                         --test_path datasets/tnews_text2text/test_nolabel.tsv \
                                         --prediction_path datasets/tnews_text2text/prediction.tsv \
                                         --seq_length 128 --tgt_seq_length 8 --batch_size 32

PEGASUS Pre-trained Weights

These are the PEGASUS pre-trained weights. Configuration files are in the models/pegasus/ folder.

The link and detailed description (Huggingface model hub) of PEGASUS weights:

| Model link | Description link |
| ---------- | ---------------- |
| CLUECorpusSmall PEGASUS-base | https://huggingface.co/uer/pegasus-base-chinese-cluecorpussmall |

BART Pre-trained Weights

These are the BART pre-trained weights. Configuration files are in the models/bart/ folder.

The link and detailed description (Huggingface model hub) of BART weights:

| Model link | Description link |
| ---------- | ---------------- |
| CLUECorpusSmall BART-base | https://huggingface.co/uer/bart-base-chinese-cluecorpussmall |
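
The PEGASUS and BART weights are sequence-to-sequence models, so in principle they can be fine-tuned with the same text-to-text pipeline shown in the T5 sections above. The command below is only a sketch for the BART-base weight: the weight file name and configuration path are assumptions and should be checked against the downloaded files and the description pages.

python3 finetune/run_text2text.py --pretrained_model_path models/cluecorpussmall_bart_base_seq512_model.bin \
                                  --vocab_path models/google_zh_vocab.txt \
                                  --config_path models/bart/base_config.json \
                                  --train_path datasets/tnews_text2text/train.tsv \
                                  --dev_path datasets/tnews_text2text/dev.tsv \
                                  --seq_length 128 --tgt_seq_length 8 --learning_rate 3e-5 --epochs_num 3 --batch_size 32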

Fine-tuned Chinese RoBERTa Weights

This is the set of fine-tuned Chinese RoBERTa weights. All of them use the models/bert/base_config.json configuration file.

The link and detailed description (Huggingface model hub) of different fine-tuned RoBERTa weights:

| Model link | Description link |
| ---------- | ---------------- |
| JD full sentiment classification | https://huggingface.co/uer/roberta-base-finetuned-jd-full-chinese |
| JD binary sentiment classification | https://huggingface.co/uer/roberta-base-finetuned-jd-binary-chinese |
| Dianping sentiment classification | https://huggingface.co/uer/roberta-base-finetuned-dianping-chinese |
| Ifeng news topic classification | https://huggingface.co/uer/roberta-base-finetuned-ifeng-chinese |
| Chinanews news topic classification | https://huggingface.co/uer/roberta-base-finetuned-chinanews-chinese |
| CLUENER2020 NER | https://huggingface.co/uer/roberta-base-finetuned-cluener2020-chinese |
| Extractive QA | https://huggingface.co/uer/roberta-base-chinese-extractive-qa |

One can load these weights for further pre-training, fine-tuning, and inference.
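
For example, a downloaded classification weight in UER format can be used for inference in the same way as the examples above. The sketch below assumes the JD binary sentiment weight has been saved as models/jd_binary_roberta_base_model.bin (a placeholder name) and that the input file follows the classification inference format:

python3 inference/run_classifier_infer.py --load_model_path models/jd_binary_roberta_base_model.bin \
                                          --vocab_path models/google_zh_vocab.txt \
                                          --config_path models/bert/base_config.json \
                                          --test_path datasets/douban_book_review/test_nolabel.tsv \
                                          --prediction_path prediction.tsv \
                                          --labels_num 2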

Chinese Pre-trained Weights Besides Transformer

This is the set of pre-trained weights based on encoders other than the Transformer.

The link and detailed description of different pre-trained weights:

| Model link | Configuration file | Model details | Training details |
| ---------- | ------------------ | ------------- | ---------------- |
| CLUECorpusSmall LSTM language model | models/rnn_config.json | --embedding word --remove_embedding_layernorm --encoder lstm --target lm | steps: 500000; learning rate: 1e-3; batch size: 64*8 (the number of GPUs); sequence length: 256 |
| CLUECorpusSmall GRU language model | models/rnn_config.json | --embedding word --remove_embedding_layernorm --encoder gru --target lm | steps: 500000; learning rate: 1e-3; batch size: 64*8 (the number of GPUs); sequence length: 256 |
| CLUECorpusSmall GatedCNN language model | models/gatedcnn_9_config.json | --embedding word --remove_embedding_layernorm --encoder gatedcnn --target lm | steps: 500000; learning rate: 1e-4; batch size: 64*8 (the number of GPUs); sequence length: 256 |
| CLUECorpusSmall ELMo | models/birnn_config.json | --embedding word --remove_embedding_layernorm --encoder bilstm --target bilm | steps: 500000; learning rate: 5e-4; batch size: 64*8 (the number of GPUs); sequence length: 256 |
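
These weights are used with the command-line options listed in the table rather than the default Transformer settings. Below is a sketch of fine-tuning the LSTM language model on a classification dataset; the weight file name is a placeholder, and the exact set of options (e.g. --pooling) may differ between UER versions:

python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_lstm_lm_model.bin \
                                   --vocab_path models/google_zh_vocab.txt \
                                   --config_path models/rnn_config.json \
                                   --embedding word --remove_embedding_layernorm --encoder lstm --pooling mean \
                                   --train_path datasets/douban_book_review/train.tsv \
                                   --dev_path datasets/douban_book_review/dev.tsv \
                                   --test_path datasets/douban_book_review/test.tsv \
                                   --learning_rate 1e-3 --epochs_num 5 --batch_size 64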

Chinese Pre-trained Weights from Other Organizations

| Model link | Description | Description link |
| ---------- | ----------- | ---------------- |
| Google Chinese BERT-Base | Configuration file: models/bert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/google-research/bert |
| Google Chinese ALBERT-Base | Configuration file: models/albert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/google-research/albert |
| Google Chinese ALBERT-Large | Configuration file: models/albert/large_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/google-research/albert |
| Google Chinese ALBERT-Xlarge | Configuration file: models/albert/xlarge_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/google-research/albert |
| Google Chinese ALBERT-Xxlarge | Configuration file: models/albert/xxlarge_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/google-research/albert |
| HFL Chinese BERT-wwm | Configuration file: models/bert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/ymcui/Chinese-BERT-wwm |
| HFL Chinese BERT-wwm-ext | Configuration file: models/bert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/ymcui/Chinese-BERT-wwm |
| HFL Chinese RoBERTa-wwm-ext | Configuration file: models/bert/base_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/ymcui/Chinese-BERT-wwm |
| HFL Chinese RoBERTa-wwm-large-ext | Configuration file: models/bert/large_config.json; Vocabulary: models/google_zh_vocab.txt; Tokenizer: BertTokenizer | https://github.com/ymcui/Chinese-BERT-wwm |
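
Weights from other organizations need to be converted into UER format before UER can load them. UER provides conversion scripts in the scripts/ folder for this purpose. The command below is a sketch assuming a Huggingface-format Chinese BERT-Base checkpoint has been downloaded as huggingface_model.bin; the exact script name and flags may vary across UER versions.

python3 scripts/convert_bert_from_huggingface_to_uer.py --input_model_path huggingface_model.bin \
                                                        --output_model_path models/google_zh_bert_base_model.bin \
                                                        --layers_num 12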

More Pre-trained Weights

Models pre-trained by UER:

| Pre-trained model | Link | Description |
| ----------------- | ---- | ----------- |
| Wikizh(word-based)+BertEncoder+BertTarget | Model: https://share.weiyun.com/5s4HVMi Vocab: https://share.weiyun.com/5NWYbYn | Word-based BERT model pre-trained on Wikizh. Training steps: 500,000 |
| RenMinRiBao+BertEncoder+BertTarget | https://share.weiyun.com/5JWVjSE | The training corpus is news data from People's Daily (1946-2017). |
| Webqa2019+BertEncoder+BertTarget | https://share.weiyun.com/5HYbmBh | The training corpus is WebQA, which is suitable for datasets related to social media, e.g. LCQMC and XNLI. Training steps: 500,000 |
| Weibo+BertEncoder+BertTarget | https://share.weiyun.com/5ZDZi4A | The training corpus is Weibo. |
| Weibo+BertEncoder(large)+MlmTarget | https://share.weiyun.com/CFKyMkp3 | The training corpus is Weibo. The configuration file is bert_large_config.json. |
| Reviews+BertEncoder+MlmTarget | https://share.weiyun.com/tBgaSx77 | The training corpus is reviews. |
| Reviews+BertEncoder(large)+MlmTarget | https://share.weiyun.com/hn7kp9bs | The training corpus is reviews. The configuration file is bert_large_config.json. |
| MixedCorpus+BertEncoder(xlarge)+MlmTarget | https://share.weiyun.com/J9rj9WRB | Pre-trained on a mixed large Chinese corpus. The configuration file is bert_xlarge_config.json. |
| MixedCorpus+BertEncoder(xlarge)+BertTarget(WWM) | https://share.weiyun.com/UsI0OSeR | Pre-trained on a mixed large Chinese corpus. The configuration file is bert_xlarge_config.json. |
| MixedCorpus+BertEncoder(large)+MlmTarget | https://share.weiyun.com/5G90sMJ | Pre-trained on a mixed large Chinese corpus. The configuration file is bert_large_config.json. |
| MixedCorpus+BertEncoder(base)+BertTarget | https://share.weiyun.com/5QOzPqq | Pre-trained on a mixed large Chinese corpus. The configuration file is bert_base_config.json. |
| MixedCorpus+BertEncoder(small)+BertTarget | https://share.weiyun.com/fhcUanfy | Pre-trained on a mixed large Chinese corpus. The configuration file is bert_small_config.json. |
| MixedCorpus+BertEncoder(tiny)+BertTarget | https://share.weiyun.com/yXx0lfUg | Pre-trained on a mixed large Chinese corpus. The configuration file is bert_tiny_config.json. |
| MixedCorpus+GptEncoder+LmTarget | https://share.weiyun.com/51nTP8V | Pre-trained on a mixed large Chinese corpus. Training steps: 500,000 (with sequence length of 128) + 100,000 (with sequence length of 512) |
| Reviews+LstmEncoder+LmTarget | https://share.weiyun.com/57dZhqo | The training corpus is Amazon reviews + JDbinary reviews + Dianping reviews (11.4M reviews in total). The language model target is used. It is suitable for datasets related to reviews and achieves over 5 percent improvement on some review datasets compared with random initialization. Set hidden_size in models/rnn_config.json to 512 before using it. Training steps: 200,000; sequence length: 128 |
| (MixedCorpus & Amazon reviews)+LstmEncoder+(LmTarget & ClsTarget) | https://share.weiyun.com/5B671Ik | Firstly pre-trained on a mixed large Chinese corpus with the LM target, then pre-trained on Amazon reviews with the LM and CLS targets. It is suitable for datasets related to reviews and can achieve results comparable with BERT on some review datasets. Training steps: 500,000 + 100,000; sequence length: 128 |
| IfengNews+BertEncoder+BertTarget | https://share.weiyun.com/5HVcUWO | The training corpus is news data from the Ifeng website. We use news titles to predict news abstracts. Training steps: 100,000; sequence length: 128 |
| jdbinary+BertEncoder+ClsTarget | https://share.weiyun.com/596k2bu | The training corpus is review data from JD (Jingdong). The CLS target is used for pre-training. It is suitable for datasets related to shopping reviews. Training steps: 50,000; sequence length: 128 |
| jdfull+BertEncoder+MlmTarget | https://share.weiyun.com/5L6EkUF | The training corpus is review data from JD (Jingdong). The MLM target is used for pre-training. Training steps: 50,000; sequence length: 128 |
| Amazonreview+BertEncoder+ClsTarget | https://share.weiyun.com/5XuxtFA | The training corpus is review data from Amazon (including book reviews, movie reviews, etc.). The classification target is used for pre-training. It is suitable for datasets related to reviews, e.g. accuracy on the Douban book review dataset is improved from 87.6 to 88.5 (compared with Google BERT). Training steps: 20,000; sequence length: 128 |
| XNLI+BertEncoder+ClsTarget | https://share.weiyun.com/5oXPugA | Infersent with BertEncoder |

MixedCorpus contains baidubaike, Wikizh, WebQA, RenMinRiBao, literature, and reviews.