dr-m-wasim/DSL-BiomedicalQueryExpansion

Biomedical Query Expansion (Data Science Lab, KICS-UET)

Prerequisites

  • Install JDK 7 or higher (note that Solr 6.x itself requires Java 8 or later)
  • Install the latest version of the JRE
  • Install Eclipse
  • Install the latest version of Maven
    • Open a terminal and type: sudo apt-get install maven
  • Download and install solr-6.6.0 or higher from its official site
  • Download the genomic data repository from the TREC 2007 Genomics Track Data
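
Before continuing, it can help to confirm that the command-line tools from the list above are actually on the PATH. The following is a minimal sketch, not part of the project; the helper name check_tool is hypothetical, and the tool names (java, javac, mvn) are the standard executables shipped with the JDK and Maven.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
# Returns 0 when the tool is found and 1 when it is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the JDK/JRE and Maven executables; keep going even if one is missing.
for tool in java javac mvn; do
  check_tool "$tool" || true
done
```

Any tool reported as missing should be installed before moving on to the Solr configuration.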

How to configure Solr

  • Run the Solr service; by default it uses port 8983. To check, open localhost:8983 in your browser
  • Create a new core or collection; the default core/collection directory is /var/solr/data
  • Once you have downloaded the data repository, extract it and combine all the files under one directory; this requires about 9.8 GB of space
  • Now index the data into the core/collection you created
  • Solr indexes your data according to the default configuration (solrconfig.xml and the schema), but you can define and specify your own fields

    Specify your own fields in Solr:

    You can add new fields by updating the solrconfig.xml and managed-schema files located in the directory of your newly created core/collection.

    How to add new fields

    • Open /var/solr/data/<core/collection name>/conf/ --> managed-schema, solrconfig.xml
      • In the solrconfig.xml file, search for:
         <requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
      • Add a new entry inside the above tag (in the stock configuration, within its <lst name="defaults"> list)
         <str name="capture">body</str>
      • To add your own field, replace body with your own field name
      • Save and exit the file
      • In the managed-schema file, search for:
         <field name="_text_" type="text_general" multiValued="true" indexed="true" stored="false" />
      • Add a new tag under the "_text_" field
         <field name="body" type="text_general" indexed="true" stored="true"/>
      • To add your own field, replace "body" with your own field name
    • Restart Solr with the command
      sudo service solr restart
      • If Solr reports an error, check your configuration files carefully
        Note: The field name must be the same in the managed-schema and solrconfig.xml files
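
The core creation, indexing, and verification steps above can be sketched with Solr's own command-line tools. This is only an illustration under stated assumptions: the install path /opt/solr (the usual location when Solr is installed as a service), the core name genomics, and the data directory are placeholders not given in this README. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Assumptions (not from this README): Solr lives under /opt/solr, and the
# core name and data directory below are hypothetical placeholders.
SOLR_DIR="${SOLR_DIR:-/opt/solr}"
CORE="${CORE:-genomics}"
DATA_DIR="${DATA_DIR:-/path/to/trec2007/files}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = run it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the core as the solr user (service installs run Solr as "solr").
run sudo -u solr "$SOLR_DIR/bin/solr" create -c "$CORE"

# Index every file under DATA_DIR with the bundled post tool.
run "$SOLR_DIR/bin/post" -c "$CORE" "$DATA_DIR"

# Confirm that the custom "body" field is known to the schema.
run curl "http://localhost:8983/solr/$CORE/schema/fields/body"

# Count the indexed documents without returning any rows.
run curl "http://localhost:8983/solr/$CORE/select?q=*:*&rows=0"
```

Indexing after the field changes and the restart means the new field is populated on the first pass; data indexed earlier would need to be re-indexed.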

How to set up files and compile the code

  • Open a terminal in your project root directory and type
    mvn compile

This resolves the dependencies declared in your pom.xml file and compiles the project.

  • To run the code properly, the following files must be downloaded and extracted to their proper place.
    The following files must be included in your resources dir
  1. Download trecgen2007.gold.standard.tsv.txt
  2. Download 2007topics.txt
  3. Download Wordnet-3.0. Create a new directory named data in the main project and extract the contents of wordnet-3.0 inside this data dir.
  4. Add a folder named script under the resources dir
  5. Download trecgen2007_score.py and save it under the script dir
  6. Download the Semantic Types Mappings and Semantic Group files. Also create a dir named Mappings under the resources dir and put the two Semantic Types and Semantic Group files in it.
  7. Now create a dir named DocResult under the resources dir --> (This directory will be used for the output of the results comparison)
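
The directory layout described in the steps above can be prepared with a short script. This is a sketch under one assumption: the README does not say where the resources dir lives, so the Maven convention src/main/resources is used as a guess and can be overridden through the RESOURCES variable. The downloads themselves still have to be done by hand.

```shell
#!/bin/sh
# Assumption: the resources dir follows the Maven convention
# (src/main/resources); override RESOURCES if the project differs.
PROJECT_ROOT="${PROJECT_ROOT:-.}"
RESOURCES="${RESOURCES:-$PROJECT_ROOT/src/main/resources}"

# Create the directories named in the steps above.
mkdir -p "$PROJECT_ROOT/data" \
         "$RESOURCES/script" \
         "$RESOURCES/Mappings" \
         "$RESOURCES/DocResult"

# Report which of the downloaded files are not yet in place.
missing=0
for f in \
  "$RESOURCES/trecgen2007.gold.standard.tsv.txt" \
  "$RESOURCES/2007topics.txt" \
  "$RESOURCES/script/trecgen2007_score.py"; do
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    missing=$((missing + 1))
  fi
done
echo "$missing required file(s) missing"
```

Running the script again after copying the files should report zero missing files.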
