Skip to content

Extractive summarizer using BertSum as summarization model

Notifications You must be signed in to change notification settings

neilctwu/YouyakuMan

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

YouyakuMan

Unstable License

Introduction

This is an one-touch extractive summarization machine.

using BertSum as summatization model, extract top N important sentences.

img


Prerequisites

General requirement

pip install torch
pip install transformers
pip install googletrans

Japanese specific requirement


Pretrained Model

English: Here

Japanese: Here

  • Japanese model updated: trained with 35k data and 120k iteration

Download and put under directory checkpoint/en or checkpoint/jp


Example

$python youyakuman.py -txt_file YOUR_FILE -lang LANG -n 3 --super_long

Note

Since Bert only takes 512 length as inputs, this summarizer crop articles >512 length.

If --super_long option is used, summarizer automatically parse to numbers of 512 length inputs and summarize per inputs. Number of extraction might slightly altered with --super_long used.


Train Example

$python youyakumanJPN_train.py -data_folder [training_txt_path] -save_path [model_saving_path] -train_from [pretrained_model_file]
"""
-data_folder : path to train data folder, structure showed as below:
                training_txt_path
                ├─ article1.pickle
                ├─ article2.pickle
                ..    
"""

Train Data Preparation

Training data should be a dictionary saved by pickle, to be specifically, a dictionary containing below contents of one article.

{'body': 'TEXT_BODY', 'summary': 'SUMMARY_1<sep>SUMMARY_2<sep>SUMMARY3'}

Version Log:

2020-08-03 Updated to transformer package, remove redudndancy, model saving format while training

2020-02-10 Training part added

2019-11-14 Add multiple language support

2019-10-29 Add auto parse function, available for long article as input

About

Extractive summarizer using BertSum as summarization model

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages