This repository has been archived by the owner on May 23, 2024. It is now read-only.
oneonlee/KoAirBERT

🤗 KoAirBERT ✈️

A Korean BERT model specialized for the aviation safety domain


How to use

🤗 The model is uploaded to the Hugging Face Hub and can be used directly :)

# Load model directly
from transformers import AutoTokenizer, AutoModelForPreTraining

tokenizer = AutoTokenizer.from_pretrained("oneonlee/KoAirBERT")
model = AutoModelForPreTraining.from_pretrained("oneonlee/KoAirBERT")

Post-training

KoAirBERT is the klue/bert-base model with additional post-training on the MLM and NSP objectives.
Training used a Korean aviation safety domain dataset that we built ourselves, and took about 40 minutes on a single NVIDIA RTX A6000 48GB GPU.
The training settings are listed below.

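| Params | Initial learning rate | Batch size | Epochs | Max length* | Weight decay |
|--------|-----------------------|------------|--------|-------------|--------------|
| 111M   | 5e-5                  | 8          | 3      | 512         | 0.01         |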
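
The MLM and NSP objectives named above can be illustrated with a small, dependency-free sketch of how one training instance might be built. This README does not state the masking rate or the pairing strategy, so the values below (15% of tokens selected, an 80/10/10 split, a 50% chance of a random second sentence) are assumptions taken from the original BERT recipe, not confirmed settings of KoAirBERT. The function names and the toy English sentences are hypothetical, and whitespace tokens stand in for WordPiece tokens.

```python
import random

CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (masked_tokens, labels); labels[i] is None where no prediction is needed."""
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Special tokens are never masked; other tokens are selected with mask_prob.
        if tok in (CLS, SEP) or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # MLM target: the original token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK  # 80% of selected tokens become [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10% become a random vocabulary token
        # the remaining 10% are left unchanged
    return masked, labels


def make_nsp_pair(doc, index, other_docs, rng):
    """Return (sentence_a, sentence_b, label): 0 = true next sentence, 1 = random."""
    sentence_a = doc[index]
    if rng.random() < 0.5:
        return sentence_a, doc[index + 1], 0
    return sentence_a, rng.choice(rng.choice(other_docs)), 1


def build_instance(doc, index, other_docs, vocab, rng):
    """Combine one NSP pair with MLM masking into a single training instance."""
    sentence_a, sentence_b, nsp_label = make_nsp_pair(doc, index, other_docs, rng)
    tokens = [CLS, *sentence_a, SEP, *sentence_b, SEP]
    masked, mlm_labels = mask_tokens(tokens, vocab, rng)
    return {"tokens": masked, "mlm_labels": mlm_labels, "nsp_label": nsp_label}


# Toy, whitespace-tokenized English data (hypothetical, not from the KoAirBERT corpus).
doc = [["engine", "vibration", "reported"], ["crew", "returned", "to", "gate"]]
other_docs = [[["runway", "lights", "were", "inoperative"]]]
vocab = sorted({tok for d in [doc, *other_docs] for sent in d for tok in sent})

instance = build_instance(doc, 0, other_docs, vocab, random.Random(0))
print(instance)
```

The label convention (0 for a true continuation, 1 for a random sentence) matches the `next_sentence_label` argument of `BertForPreTraining` in Transformers. In an actual run the masking would operate on token IDs, for example through `DataCollatorForLanguageModeling`, rather than on strings.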

Dataset Info.

| Dataset | Records | Sentences | Words |
|---------|--------:|----------:|------:|
| Safety hindrance reports | 684 | 10,018 | 69,472 |
| Failure reports | 1,771 | 10,480 | 85,165 |
| Aviation and Railway Accident Investigation Board (ARAIB) accident and serious incident reports | 54 | 1,850 | 33,935 |
| Aviation safety culture indicator analysis data | 1,055 | 3,652 | 244,032 |
| GYRO aviation safety voluntary reports (augmented body text) | 6,776 | 66,848 | 1,214,061 |
| Aviation Information Portal System aviation glossary | 4,961 | 15,312 | 167,295 |
| Aviation Wiki (항공위키) | 4,314 | 38,927 | 766,214 |
| Total | 19,615 | 147,087 | 2,580,174 |
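
As a quick consistency check on the table above, the per-source counts can be summed and compared with the reported totals, and each source's share of the word count can be computed. This is only a sketch: the figures are copied from the table, the English source names are translations of the original Korean labels, and the variable and function names are illustrative.

```python
# (records, sentences, words) per source, copied from the dataset table.
CORPUS = {
    "Safety hindrance reports": (684, 10_018, 69_472),
    "Failure reports": (1_771, 10_480, 85_165),
    "ARAIB accident and serious incident reports": (54, 1_850, 33_935),
    "Aviation safety culture indicator analysis data": (1_055, 3_652, 244_032),
    "GYRO voluntary reports (augmented body text)": (6_776, 66_848, 1_214_061),
    "Aviation Information Portal System glossary": (4_961, 15_312, 167_295),
    "Aviation Wiki": (4_314, 38_927, 766_214),
}
REPORTED_TOTAL = (19_615, 147_087, 2_580_174)


def column_totals(corpus):
    """Sum records, sentences, and words across all sources."""
    return tuple(sum(column) for column in zip(*corpus.values()))


def word_share(corpus):
    """Return each source's fraction of the total word count."""
    total_words = sum(words for _, _, words in corpus.values())
    return {name: words / total_words for name, (_, _, words) in corpus.items()}


totals = column_totals(CORPUS)
shares = word_share(CORPUS)

print("totals match the table:", totals == REPORTED_TOTAL)
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<50} {share:6.1%}")
```

The column sums agree with the totals row, and the augmented GYRO reports account for a little under half of all words, so the post-training corpus is dominated by that single source.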

Download Dataset

Reference

Citation

If you use this code for research, please cite it as follows.

@software{lee_2023_10158254,
  author       = {Lee, DongGeon},
  title        = {KoAirBERT: Korean BERT Model Specialized for Aviation Safety Domain},
  month        = nov,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v1.0.1},
  doi          = {10.5281/zenodo.10171038},
  url          = {https://doi.org/10.5281/zenodo.10171038}
}

License

KoAirBERT is released under the AGPL-3.0 license. Please comply with the license terms when using the model or code. The full license text is available in the LICENSE file.
