
BERT for long sentence classification

BERT cannot process tokenized text sequences longer than 512 word pieces; longer sequences have to be truncated.

For a corpus like 20 Newsgroups this is a problem, because it contains many long examples.
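For reference, this is how the truncation looks with the Hugging Face transformers tokenizer (one of the dependencies below); a minimal sketch, where the bert-base-uncased checkpoint is an assumption, not necessarily what the project scripts use:

from transformers import BertTokenizer

# Load a standard BERT tokenizer (bert-base-uncased is assumed here).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# A stand-in for a long 20 Newsgroups post that exceeds 512 word pieces.
long_text = " ".join(["newsgroup post text"] * 400)

# Without truncation the encoded sequence exceeds BERT's 512-token limit.
print(len(tokenizer.encode(long_text, add_special_tokens=True)))

# The usual workaround: truncate to 512 word pieces, dropping everything after them.
encoded = tokenizer(long_text, truncation=True, max_length=512, return_tensors="pt")
print(encoded["input_ids"].shape)  # (1, 512)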

Recurrence over BERT (RoBERT)

In this project, we implemented the approach proposed in the article Hierarchical Transformers for Long Document Classification.

RoBERT can process tokenized text sequences of any length (see the sketch after this list):

  1. Splits the text sequence into segments of N tokens.
  2. Tokenizes all the segments.
  3. Processes all the segments with BERT.
  4. The representation obtained with BERT for each segment is placed sequentially in a tensor.
  5. This tensor is processed by an LSTM.
  6. The representation obtained at the last time step of the LSTM is used for classification.
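A minimal PyTorch sketch of those six steps follows; the class name, segment length, and LSTM size are illustrative assumptions, not the project's actual code:

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class RoBERT(nn.Module):
    # Recurrence over BERT: encode fixed-size segments with BERT, then run
    # an LSTM over the sequence of segment representations.
    def __init__(self, num_classes, segment_len=200, lstm_hidden=128):
        super().__init__()
        self.segment_len = segment_len
        self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden, batch_first=True)
        self.classifier = nn.Linear(lstm_hidden, num_classes)

    def forward(self, text):
        # Steps 1-2: tokenize and split into segments of at most segment_len word pieces.
        ids = self.tokenizer.encode(text, add_special_tokens=False)
        segments = [ids[i:i + self.segment_len] for i in range(0, len(ids), self.segment_len)]

        # Steps 3-4: encode every segment with BERT and place the segment
        # representations sequentially in one tensor of shape (1, num_segments, hidden).
        reps = []
        for seg in segments:
            seg_ids = torch.tensor([self.tokenizer.build_inputs_with_special_tokens(seg)])
            out = self.bert(input_ids=seg_ids)
            reps.append(out.last_hidden_state[:, 0, :])  # [CLS] position as the segment vector
        seg_tensor = torch.stack(reps, dim=1)

        # Steps 5-6: run the LSTM over the segment sequence and classify from
        # the representation of its last time step.
        lstm_out, _ = self.lstm(seg_tensor)
        return self.classifier(lstm_out[:, -1, :])

# Illustrative usage: 20 classes for 20 Newsgroups.
model = RoBERT(num_classes=20)
logits = model("a very long newsgroup post ...")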

Dependencies

  • Python 3.7
  • We need the following packages (using pip):
pip install pandas
pip install cleantext
pip install scikit-learn
pip install torch
pip install transformers
pip install matplotlib
pip install mlxtend
pip install seaborn
pip install Unidecode
pip install nltk

Usage

The two commands below use the argument True to download the 20 Newsgroups corpus (this is only necessary for the first execution of each script).

The first script uses BERT for sequence classification (BERTSC), and therefore truncates the sentences.

./launch-experiments-20newsgroups.sh True

The second script uses RoBERT.

./launch-hierarchical-experiments-20newsgroups.sh True
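Since scikit-learn is among the dependencies, the corpus download presumably relies on something like sklearn.datasets.fetch_20newsgroups; a minimal sketch of that step (the actual scripts may load the data differently):

from sklearn.datasets import fetch_20newsgroups

# Downloads the corpus on the first call and caches it locally.
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")
print(len(train.data), "training documents,", len(train.target_names), "classes")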

Results

The accuracy reported in the reference paper on the 20 Newsgroups corpus using RoBERT on the full dataset is 84 %. In our case, due to hardware limitations, we could not fit the whole segmented corpus on the GPU, so we experimented with a reduced version where the maximum number of tokens allowed per example was 512 or 1024.

The table below shows the results obtained. For the same maximum length, BERTSC performs better than RoBERT. Note, however, that we wrote our own implementation of RoBERT, which does not follow the same optimization approach as the paper; in any case, the LSTM appears to degrade performance slightly.

MAX. LENGTH   MODEL    ACCURACY
1024          RoBERT   77 %
512           BERTSC   79 %
512           RoBERT   75 %
