Proposed solution model for LaySumm (The 1st Computational Linguistics Lay Summary Challenge Shared Task)

RochanaChaturvedi/laysumm20

LaySumm20

This repository contains the replication code for Divide and Conquer: From Complexity to Simplicity for Lay Summarization. The paper describes our approach to LaySumm (The 1st Computational Linguistics Lay Summary Challenge Shared Task). The task is to automatically generate non-technical summaries of scholarly text for a lay audience.

Directory Structure:

1 Data - includes all data for the model

    1.1 Input-Data              - includes original full-text & abstract files for all documents
    1.2 Sections-DataFrame      - includes a CSV file containing the text of all documents (section-wise)
    1.3 Input-wMVC              - includes input documents for the wMVC model
    1.4 Input-BART              - includes input documents for the BART model
    1.5 Section-wise-summaries  - includes summaries for all sections (output of the BART model)
    1.6 Merged-final            - includes final merged summaries

2 Utilities - includes utility python scripts

    2.1 prepare_data.py          - python script for preparing section-wise preprocessed folders
                                    (in Input-wMVC) to be used as input data for wMVC model.

    2.2 preprocess_data.py       - python script for preprocessing input document text

    2.3 merge_summaries.py       - python script for merging section-wise summaries (taking input
                                    from the Section-wise-summaries folder, and saving final
                                    summaries in Merged-final folder)

3 BART - includes code for generating abstractive summaries using an off-the-shelf BART model, and code to fine-tune BART for the task

4 wMVC - includes code for generating extractive summaries using the wMVC model

5 evaluation - includes the evaluation script used in the competition

6 requirements.txt
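
To illustrate the merging step that merge_summaries.py is described as performing, here is a minimal sketch. It is not the repository's implementation: the function name `merge_section_summaries`, the in-memory dictionary input, the section names, and the optional `max_words` cap are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Sequence


def merge_section_summaries(
    section_summaries: Dict[str, str],
    section_order: Sequence[str],
    max_words: Optional[int] = None,
) -> str:
    """Join per-section summaries into one text (hypothetical helper).

    section_summaries maps a section name to its summary text.
    section_order decides which sections are used and in what order.
    max_words, if given, truncates the merged text to that many words.
    """
    parts = []
    for name in section_order:
        # Missing or empty sections are skipped rather than treated as errors.
        text = section_summaries.get(name, "").strip()
        if text:
            parts.append(text)

    merged = " ".join(parts)
    if max_words is not None:
        merged = " ".join(merged.split()[:max_words])
    return merged
```

For example, calling it with summaries for "abstract" and "introduction" and the order `["abstract", "introduction"]` would return the abstract summary followed by the introduction summary. The actual script reads from Section-wise-summaries and writes to Merged-final; its selection and ordering rules may differ from this sketch.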

Generate LaySumm Summaries:

Steps:

  1. Clone the repository and move into the cloned directory:

      git clone https://github.com/anuragjoshi3519/laysumm20
      cd laysumm20

  2. Create a virtual environment and install the dependencies:

      pip3 install virtualenv
      virtualenv -p /usr/bin/python3 env
      source env/bin/activate
      pip3 install -r requirements.txt
      python3 -c "from nltk import download; download(['punkt', 'stopwords'])"

  3. Add the test documents (full-text and abstract files for every document) to Data/Input-Data, after first removing the default sample_ABSTRACT.txt and sample_FULLTEXT.txt files.

  4. Generate summaries for the test documents:

      python3 generateLaysumm.py

Generated summaries can be found in the Data/Merged-final folder.
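
Because every document needs both a full-text file and an abstract file in Data/Input-Data, it can help to check the folder before running the pipeline. The sketch below is not part of the repository. It assumes that files follow the naming pattern suggested by the sample files, that is `<id>_ABSTRACT.txt` and `<id>_FULLTEXT.txt`; the function name `find_unpaired_documents` is hypothetical.

```python
from pathlib import Path
from typing import List


def find_unpaired_documents(input_dir: str) -> List[str]:
    """Return document ids that have only one of the two expected files.

    Assumes the naming pattern <id>_ABSTRACT.txt / <id>_FULLTEXT.txt,
    inferred from the sample file names; adjust if your files differ.
    """
    abstracts, fulltexts = set(), set()
    for path in Path(input_dir).glob("*.txt"):
        stem = path.stem
        if stem.endswith("_ABSTRACT"):
            abstracts.add(stem[: -len("_ABSTRACT")])
        elif stem.endswith("_FULLTEXT"):
            fulltexts.add(stem[: -len("_FULLTEXT")])
    # Symmetric difference: ids present in exactly one of the two sets.
    return sorted(abstracts ^ fulltexts)
```

Running `find_unpaired_documents("Data/Input-Data")` from the repository root returns an empty list when every document has both files, and otherwise lists the ids that need attention.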

Cite:

If you find the work useful, please cite it as:

@inproceedings{chaturvedi-etal-2020-divide,
    title = "Divide and Conquer: From Complexity to Simplicity for Lay Summarization",
    author = "Chaturvedi, Rochana  and
      Saachi  and
      Dhani, Jaspreet Singh  and
      Joshi, Anurag  and
      Khanna, Ankush  and
      Tomar, Neha  and
      Duari, Swagata  and
      Khurana, Alka  and
      Bhatnagar, Vasudha",
    booktitle = "Proceedings of the First Workshop on Scholarly Document Processing",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.sdp-1.40/",
    doi = "10.18653/v1/2020.sdp-1.40",
    pages = "344--355"
    }
