
medBERTjp - MeCab-Unidic-2.3.0

@sy-wada released this 03 Nov 02:58 · 12 commits to main since this release · commit ab7a568
  • Japanese Medical BERT model simultaneously pre-trained on both clinical references and Japanese Wikipedia via our method.
  • Vocabulary: custom 32k vocabulary (a tokenizer loading sketch follows this list)
    - requirements:
      - fugashi
      - unidic-py
  • Pre-training (a configuration sketch follows below):
    - BERT-Base (12-layer, 768-hidden, 12-heads)
    - trained from scratch with whole-word masking (WWM).
    - max_seq_length = 128 tokens
    - global_batch_size = 2,048 sequences
    - learning_rate = 7e-4
    - warmup_proportion = 0.0125
    - training_steps = 125K steps
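
As a usage sketch, the checkpoint can presumably be loaded with Hugging Face `transformers` using a MeCab/UniDic word tokenizer matching the fugashi/unidic-py requirements above; the local path, the exact keyword arguments, and the example sentence are illustrative assumptions, not part of this release.

```python
# Minimal loading sketch (assumes transformers, fugashi, and unidic-py are installed,
# and that the UniDic dictionary has been fetched with `python -m unidic download`).
from transformers import BertJapaneseTokenizer, BertModel

MODEL_DIR = "path/to/medBERTjp-MeCab-Unidic-2.3.0"  # hypothetical local checkpoint directory

tokenizer = BertJapaneseTokenizer.from_pretrained(
    MODEL_DIR,
    word_tokenizer_type="mecab",           # MeCab word segmentation via fugashi
    subword_tokenizer_type="wordpiece",    # WordPiece over the custom 32k vocabulary
    mecab_kwargs={"mecab_dic": "unidic"},  # UniDic dictionary provided by unidic-py
)
model = BertModel.from_pretrained(MODEL_DIR)

inputs = tokenizer("ステロイド薬の副作用について説明する。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```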
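
The pre-training hyperparameters listed above can also be collected into a configuration sketch. `BertConfig` defaults already correspond to BERT-Base; the exact vocabulary size (rounded to 32,000 here) and the dictionary key names are assumptions for illustration, not the authors' actual training flags.

```python
# Sketch of the pre-training configuration described in this release.
from transformers import BertConfig

config = BertConfig(
    vocab_size=32_000,       # "custom 32k vocabulary"; exact size assumed
    hidden_size=768,         # BERT-Base: 12 layers, 768 hidden, 12 heads
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
)

pretraining_hparams = {
    "max_seq_length": 128,        # tokens per sequence
    "global_batch_size": 2048,    # sequences per optimization step
    "learning_rate": 7e-4,
    "warmup_proportion": 0.0125,  # 1.25% of training steps used for LR warmup
    "training_steps": 125_000,
    "masking": "whole-word masking (WWM), trained from scratch",
}
```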