Skip to content

timephy/bs.to-crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bs.to-crawler

A web crawler that extracts video urls from bs.to in bulk.

Attention - This project does not work in the current state

bs.to recognizes automation since javascript property navigator.webdriver is set to true. Overriding this property with javascript code (e.g. Object.defineProperty) does sort of work, but then one would have to solve an near infinite amount of CAPTCHAs.

There is a possible fix (version rollback to pre-Oct-2017, very tedious).

Requirements

Getting started

  1. Visit bs.to and select your desired series (including season and language). Copy the URL in the top bar of your browser (Should be of this form: http://bs.to/serie/<series>/<season>/<language>, e.g. https://bs.to/serie/Downton-Abbey/1/en).
  2. Run the program in the command-line: python3.6 . [URL]
  3. Get video links from the output file(s)!

Please note

  • Currently only one host (vivo.sx) is supported!

  • Hosts like vivo change the video URLs frequently, so the links given only stay valid for a few hours.

  • Windows users: Please user Git Bash/PowerShell instead of cmd!

How it works

This tool uses selenium to parse and query html. It uses chromedriver to control a Chrome browser instance. This simulates a real user and browser. Conveniently this also allows you to solve the CAPTCHAs.

Disclaimer

This tool does not store, stream or use copyrighted material - it only handles URLs.

Use of this tool is at each users own risk. Under no circumstances shall the developer(s) be liable for and indirect, incidental, consequential, special or exemplary damages arising out of the services this tool provides.