Representation

Pre-trained GPT Models

| Model | References | Link |
| --- | --- | --- |
| VietAI/gpt-j-6B-vietnamese-news | | 🤗 VietAI/gpt-j-6B-vietnamese-news |
| VietAI/gpt-neo-1.3B-vietnamese-news | | 🤗 VietAI/gpt-neo-1.3B-vietnamese-news |
| imthanhlv/gpt2news | | 🤗 imthanhlv/gpt2news |

Pre-trained Transformer Models

| Model | References | Link |
| --- | --- | --- |
| VinAIResearch/BARTpho | Tran et al. arXiv preprint'21 | 🤗 vinai/bartpho-syllable<br>🤗 vinai/bartpho-word<br>💻 VinAIResearch/BARTpho |
| fpt-corp/viBERT | Bui et al. PACLIC'20 | 🤗 FPTAI/vibert-base-cased<br>💻 fpt-corp/viBERT |
| fpt-corp/vELECTRA | Bui et al. PACLIC'20 | 🤗 FPTAI/velectra-base-discriminator-cased<br>💻 fpt-corp/viBERT |
| VinAIResearch/PhoBERT | Nguyen et al. EMNLP Findings'20 | 🤗 vinai/phobert-base<br>🤗 vinai/phobert-large<br>💻 VinAIResearch/PhoBERT |
| NlpHUST/vibert4news | | 🤗 NlpHUST/vibert4news-base-cased<br>💻 bino282/bert4news |
| nguyenvulebinh/vietnamese-electra | | 💻 nguyenvulebinh/vietnamese-electra |
| imthanhlv/t5vi | | 🤗 imthanhlv/t5vi |

Model Descriptions

| Model | #Params | Training Data | Domain | Tokenization | Vocab Size |
| --- | --- | --- | --- | --- | --- |
| VinAIResearch/BARTpho | 396M (bartpho-syllable)<br>420M (bartpho-word) | 20GB | News | Word (bartpho-word)<br>Syllable (bartpho-syllable) | 64000 |
| fpt-corp/viBERT | | 10GB | News | Subword | 38168 |
| VinAIResearch/PhoBERT | 135M (phobert-base)<br>370M (phobert-large) | 20GB | News | Word | 64000 |
| NlpHUST/vibert4news | | 20GB | News | Syllable | 62000 |

Word Vectors

ViCon & ViSim-400

ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges.

The two datasets are verified with standard co-occurrence and neural network models, which yield results comparable to those on the respective English datasets.
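
As a rough illustration of how a similarity dataset such as ViSim-400 might be used, the sketch below scores word pairs with cosine similarity and compares the result to human ratings. It assumes Spearman's rank correlation as the agreement measure, which is a common choice for rated-pair benchmarks but is not stated here. The vectors, word keys, and ratings (`toy_vectors`, `toy_pairs`) are hypothetical placeholders, not entries from the dataset, and `evaluate` is an illustrative helper rather than part of any library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(vectors, rated_pairs):
    """Correlate model similarities with human ratings.

    Pairs containing a word missing from `vectors` are skipped.
    """
    model_scores, human_scores = [], []
    for word_1, word_2, rating in rated_pairs:
        if word_1 in vectors and word_2 in vectors:
            model_scores.append(cosine(vectors[word_1], vectors[word_2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical toy data, for illustration only (not from ViSim-400).
toy_vectors = {
    "word_a": [1.0, 0.0],
    "word_b": [0.9, 0.1],
    "word_c": [0.0, 1.0],
    "word_d": [0.5, 0.5],
}
toy_pairs = [
    ("word_a", "word_b", 5.5),
    ("word_a", "word_d", 3.0),
    ("word_a", "word_c", 1.0),
]

if __name__ == "__main__":
    print(evaluate(toy_vectors, toy_pairs))
```

With real data, `vectors` would map each word to its embedding from one of the models above, and `rated_pairs` would be read from the dataset file; a higher correlation means the model's similarity ordering agrees more closely with the human judges.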

VSimLex-999

Miscellaneous

📜 Papers