wikie-pooh

Downloader for some Wikipedia articles and related information

Requirements

  1. Download and install Node.js from http://nodejs.org/download/
  2. Check out this repository:
       git clone https://github.com/geoont/wikie-pooh.git
       cd wikie-pooh
  3. Install dependencies (run this command from the top-level project directory, the one that contains package.json):
       npm install
  4. Optional: run the tests for some of the dependencies:
       cd node_modules/nodemw/0.3.14/package
       npm install vows
       npm test

If any dependencies are missing, install them with npm install.

Usage

Interactive Server-based Version

  1. Initialize a new experiment: node ../experiment_init.js 0.cat npp.sqlite3, where 0.cat is a list of initial categories (one category per line) and npp.sqlite3 is a new database
  2. Update the database to the current version: node ../experiment_fix.js en npp.sqlite3 (this may not be needed, but it will not break the database)
  3. Launch the server: node ../experiment_srv.js en npp.sqlite3
  4. Open in the browser: http://localhost:8282
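
The initial category list used in step 1 is a plain text file with one category per line. As a rough sketch of how such a file could be prepared, the snippet below trims names, drops blanks and duplicates, and emits one name per line. The function name buildCategoryList and the sample category names are hypothetical, and whether the tool expects a "Category:" prefix is not stated here, so names are written exactly as given.

```javascript
// Build the text of a seed file with one category per line.
// Assumption: names are written as given; whether a "Category:" prefix
// is expected is not stated in this README.
function buildCategoryList(names) {
  const seen = new Set();
  const lines = [];
  for (const raw of names) {
    const name = String(raw).trim();
    if (name === '' || seen.has(name)) continue; // skip blanks and duplicates
    seen.add(name);
    lines.push(name);
  }
  // Terminate every line with a newline; an empty input gives an empty string.
  return lines.map((line) => line + '\n').join('');
}

// Hypothetical category names, for illustration only.
const seedText = buildCategoryList(['Mountains', ' Rivers ', 'Mountains', '']);
console.log(seedText);
// To save it: require('fs').writeFileSync('0.cat', seedText, 'utf8');
```

The resulting text can be saved as 0.cat and passed to experiment_init.js as described in step 1.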

Old Text File based Version

  • To see the content of a Wikipedia page, run: node retrieve-page.njs zh 山 (set the language and page name accordingly).
  • To retrieve a list of categories and relevant pages, run: node retrieve-cats.njs en 0.cats, where en is the language and 0.cats is a file with the initial list of pages and categories. This produces a new file, 1.cats (or a higher number), with the list of pages and categories retrieved from the original list. All files are tab-delimited and can be opened in a spreadsheet.
  • The output file can be edited to remove irrelevant entries, which can be either commented out with the # symbol or placed on the ignore list by entering a dash (-) in the first column.
  • The list of ignored entries is added to the end of the output file.
  • To get edit statistics, run: node retrieve-edit-stats.njs en 0.cats.
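
The editing conventions above (tab-delimited lines, # for commented-out entries, a dash in the first column for ignored entries) can be illustrated with a small reader. This is only a sketch under those stated conventions, not the code the scripts use. The names classifyCatsLine and partitionCats are hypothetical, and since the meaning of the remaining columns is not described here, they are kept as raw strings.

```javascript
// Classify one line of a tab-delimited .cats file.
// Assumption: '#' at the start of a line comments it out, and a lone '-'
// in the first column marks the entry as ignored. The remaining columns
// are not described in this README, so they are kept as raw strings.
function classifyCatsLine(line) {
  if (line.trim() === '') return { kind: 'blank', fields: [] };
  if (line.startsWith('#')) return { kind: 'comment', fields: [] };
  const fields = line.split('\t');
  if (fields[0].trim() === '-') {
    // Drop the marker column and keep the rest of the entry.
    return { kind: 'ignored', fields: fields.slice(1) };
  }
  return { kind: 'entry', fields };
}

// Split file content into active entries and ignored entries.
function partitionCats(text) {
  const entries = [];
  const ignored = [];
  for (const line of text.split(/\r?\n/)) {
    const parsed = classifyCatsLine(line);
    if (parsed.kind === 'entry') entries.push(parsed.fields);
    else if (parsed.kind === 'ignored') ignored.push(parsed.fields);
  }
  return { entries, ignored };
}

// Hypothetical sample content; real files may use different columns.
const sample = [
  'Mountains\tcategory',
  '-\tMountaineers\tcategory',
  '# Hills\tcategory',
  '',
].join('\n');
console.log(partitionCats(sample));
```

Keeping the classification of a single line separate from the file-level loop makes it easy to adjust if the actual column layout turns out to differ.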

Developing

Tools

Created with Nodeclipse (Eclipse Marketplace, site)

Nodeclipse is a free open-source project that grows with your contributions.
