scrapedia/r18

A Spider for R18

This is a Scrapy project for scraping the R18 website. It also serves as an example of Scrapy techniques and of CI tools from the GitHub Marketplace.

Overview

Badges: CII Best Practices, pylint score, CircleCI, codebeat, Codacy, License: AGPL v3, DepShield, Code style: black.

Requirements

  • Python 3.6+
  • Scrapy 1.6.0
  • Fully tested on Linux; it should also work on Windows, macOS, and BSD

Usage

Run MongoDB

Run docker-compose in the docker folder to start a MongoDB server:

docker-compose up -d

If you also want to follow the log messages:

docker-compose up -d && docker-compose logs --follow

Note: the MongoDB container may log "Error saving history file: FileOpenFailed: Unable to open() file /home/mongodb/.dbshell: Unknown error". See Issue #323 in docker-library/mongo.
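Since `docker-compose up -d` can return before the database is ready to accept connections, a small readiness check may be handy before starting the spider. The sketch below uses only the Python standard library. The helper name `wait_for_port` is hypothetical and not part of this repository, and it assumes MongoDB is published on its default port 27017 on `localhost`.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect only shows that the port is open,
            # not that MongoDB has finished initializing.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 27017)` could be called before launching the spider, and the run aborted if it returns `False`.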

Run Sentry

Initialize Postgres for Sentry first:

1. Generate a secret key:

docker run --rm sentry config generate-secret-key

2. Use the secret key to create a database in postgres:

docker run --detach \
    --name sentry-redis-init \
    --volume $PWD/redis-data:/data \
    redis
docker run --detach \
    --name sentry-postgres-init \
    --env POSTGRES_PASSWORD=secret \
    --env POSTGRES_USER=sentry \
    --volume $PWD/postgres-data:/var/lib/postgresql/data \
    postgres
docker run --interactive --tty --rm \
    --env SENTRY_SECRET_KEY='<secret-key>' \
    --link sentry-postgres-init:postgres \
    --link sentry-redis-init:redis \
    sentry upgrade

Then enter the superuser name and password when prompted.

3. Stop and remove the Redis and Postgres containers:

docker stop sentry-postgres-init sentry-redis-init && docker rm sentry-postgres-init sentry-redis-init

4. Edit the env files to add the superuser name, password, and database-related information

5. Start sentry with docker-compose.yml:

docker-compose up --detach && docker-compose logs --follow

Run R18 Spider

Pipenv is adopted for the virtual environment management. Create the virtual environment and activate it:

pipenv install && pipenv shell

Go to the project root and run the command:

cd run && python run.py

Stop MongoDB

Run the following command to stop MongoDB. Note that the --volumes flag also removes the volumes declared in the compose file:

docker-compose down --volumes

Scrapy Technology Used In This Spider

CI Used In This Spider

Spider Contracts

TODO

  • [X] Move the zh-to-en page redirection into a downloader middleware
  • [X] Docker configurations for MongoDB
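
The first item above concerns redirecting Chinese pages to English ones in a downloader middleware. As an illustration of that pattern only, and not the project's actual code, the sketch below rewrites a request for a zh page into its en equivalent before it is downloaded. The `lg=zh` / `lg=en` query parameter, the class name `ZhToEnRedirectMiddleware`, and the helper `to_english_url` are all hypothetical. The only Scrapy conventions it relies on are the `process_request(request, spider)` hook and `Request.replace()`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_english_url(url: str) -> str:
    """Switch a hypothetical `lg=zh` query parameter to `lg=en`; leave other URLs untouched."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if ("lg", "zh") not in query:
        # Return the original string so unrelated URLs are never re-encoded.
        return url
    rewritten = [(k, "en" if k == "lg" and v == "zh" else v) for k, v in query]
    return urlunsplit(parts._replace(query=urlencode(rewritten)))


class ZhToEnRedirectMiddleware:
    """Downloader middleware sketch: reschedule zh requests as en requests."""

    def process_request(self, request, spider):
        new_url = to_english_url(request.url)
        if new_url != request.url:
            # Returning a Request makes Scrapy stop processing the current
            # one and schedule the replacement instead.
            return request.replace(url=new_url)
        # Returning None lets the request continue through the middleware chain.
        return None
```

A class like this would be enabled through Scrapy's `DOWNLOADER_MIDDLEWARES` setting; the module path to use depends on the project layout.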