SleepingMonster / Chinese-text-classification-pytorch-bert Public


Chinese text classification (single-label and multi-label versions), using PyTorch and BERT

Chinese-text-classification-pytorch-bert

Chinese text classification tasks, covering both single-label and multi-label classification.

Implemented in PyTorch with BERT; runs on CPU or on multiple GPUs.

Readme

English Version

Environments

pytorch == 1.4.0, python == 3.6, pytorch_pretrained_bert == 0.6.2 (versions must match exactly).
argparse, pandas, glob, sklearn, numpy (see requirements.txt).
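
Because the pins above must match exactly, it can help to compare them against what is actually installed before running anything. The following is a minimal sketch, not part of the repository: `base_version`, `find_mismatches`, and the sample "installed" mapping are hypothetical, and `torch` is assumed to be the package name under which pytorch is installed. Local build suffixes such as `+cpu` are ignored when comparing.

```python
def base_version(version):
    """Drop a local build suffix such as '+cpu' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed, required):
    """Return {package: (required, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found is None or base_version(found) != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Pins copied from the README; the "installed" mapping is made-up sample data.
REQUIRED = {"torch": "1.4.0", "pytorch_pretrained_bert": "0.6.2"}

sample_installed = {"torch": "1.4.0+cpu", "pytorch_pretrained_bert": "0.6.1"}
for package, (wanted, found) in find_mismatches(sample_installed, REQUIRED).items():
    print("{}: need {}, found {}".format(package, wanted, found))
```

In a real environment the `installed` mapping would be filled from the packages' own version attributes (for example `torch.__version__`).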

IDE

PyCharm

File structure

data folder:
- Hotel_comment folder: Chinese hotel reviews, binary classification task;
- cnews folder: Chinese news text, multi-class classification task;
mytask_classifier.py: entry point
config.py: defines the parameters required at runtime; values can be assigned through the run.sh script
data.py: processes the original data sets into structured data sets
model.py: model code for text classification with BERT, with single-label and multi-label implementations
preprocess.py: input preprocessing for BERT
util.py: utility functions
run.sh: script file, runnable under Linux, containing the parameter assignments
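
model.py is described as containing both single-label and multi-label implementations. The repository code is not reproduced here, so the following is only a generic sketch of how the two settings usually differ when turning classifier scores into predictions: single-label picks the one most probable class under a softmax, while multi-label scores each class independently with a sigmoid and keeps every class above a threshold. All function names and the 0.5 threshold are assumptions for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash one raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_single_label(logits):
    """Single-label: exactly one class, the most probable under softmax."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multi_label(logits, threshold=0.5):
    """Multi-label: every class whose independent sigmoid score passes the threshold."""
    return [i for i, x in enumerate(logits) if sigmoid(x) >= threshold]


logits = [2.0, -1.0, 0.5]  # made-up scores for three classes
print(predict_single_label(logits))  # → 0
print(predict_multi_label(logits))   # → [0, 2]
```

The same split typically shows up in training: a cross-entropy loss over the softmax for single-label data, and a per-class binary cross-entropy for multi-label data.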

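preprocess.py is described as handling the input preprocessing for BERT. As a rough illustration of what that step generally involves (not the repository's actual code), the sketch below wraps a text in `[CLS]` and `[SEP]`, maps tokens to ids, and pads to a fixed length together with an attention mask and segment ids. The function name, the toy vocabulary and its ids, and the character-level split are assumptions; a real run would use the tokenizer and vocabulary shipped with the pretrained model.

```python
def build_bert_inputs(text, vocab, max_len):
    """Turn a string into padded input_ids, attention_mask, and segment_ids."""
    # Character-level split: a simplification that suits Chinese text.
    tokens = list(text)[: max_len - 2]  # leave room for [CLS] and [SEP]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]

    unk = vocab["[UNK]"]
    input_ids = [vocab.get(tok, unk) for tok in tokens]
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens

    padding = max_len - len(input_ids)
    input_ids += [vocab["[PAD]"]] * padding
    attention_mask += [0] * padding  # 0 marks padding positions
    segment_ids = [0] * max_len  # single-sentence input: one segment
    return input_ids, attention_mask, segment_ids


# Toy vocabulary with made-up ids, for illustration only.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "酒": 4, "店": 5, "好": 6}

ids, mask, segments = build_bert_inputs("酒店很好", TOY_VOCAB, max_len=8)
print(ids)   # → [2, 4, 5, 1, 6, 3, 0, 0]
print(mask)  # → [1, 1, 1, 1, 1, 1, 0, 0]
```

Texts longer than `max_len - 2` characters are truncated so that the two special tokens always fit.
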