Add support for proxies
kinoute committed Sep 13, 2024
1 parent c119ff7 commit 9a98175
Showing 4 changed files with 23 additions and 3 deletions.
10 changes: 8 additions & 2 deletions README.md
@@ -86,7 +86,7 @@ By default, the `docker run` command above will always `pull` before running to
Output of `./moviestills --help`:

```
- Usage: moviestills [--website WEBSITE] [--list] [--parallel PARALLEL] [--delay DELAY] [--async] [--timeout TIMEOUT] [--cache-dir CACHE-DIR] [--data-dir DATA-DIR] [--debug] [--no-colors] [--no-style] [--hash]
+ Usage: moviestills [--website WEBSITE] [--list] [--parallel PARALLEL] [--delay DELAY] [--async] [--timeout TIMEOUT] [--proxy PROXY] [--cache-dir CACHE-DIR] [--data-dir DATA-DIR] [--hash] [--debug] [--no-colors] [--no-style]
Options:
--website WEBSITE, -w WEBSITE
@@ -99,14 +99,16 @@ Options:
--async, -a Enable asynchronus running jobs [default: false, env: ASYNC]
--timeout TIMEOUT, -t TIMEOUT
Set the default request timeout for the scraper [default: 15s, env: TIMEOUT]
+ --proxy PROXY, -x PROXY
+                        The proxy URL to use for scraping [env: PROXY]
--cache-dir CACHE-DIR, -c CACHE-DIR
Where to cache scraped websites pages [default: cache, env: CACHE_DIR]
--data-dir DATA-DIR, -f DATA-DIR
Where to store movie snapshots [default: data, env: DATA_DIR]
+ --hash                 Hash image filenames with md5 [default: false, env: HASH]
--debug, -d Set Log Level to Debug to see everything [default: false, env: DEBUG]
--no-colors Disable colors from output [default: false, env: NO_COLORS]
--no-style Disable styling and colors entirely from output [default: false, env: NO_STYLE]
- --hash                 Hash image filenames with md5 [default: false, env: HASH]
--help, -h display this help and exit
--version display version and exit
```
@@ -201,6 +203,10 @@ You can change the default `data` folder with the `--data-dir` CLI argument or

If you use our Docker image to run `moviestills`, don't forget to change the volume path if you edited the *internal* `data` folder. That said, you should not need to edit the internal `data` folder's path or name at all, since volumes already let you store and access these files on the host machine.

+ #### Proxies
+
+ You can set a proxy URL to use for scraping with the `--proxy` CLI argument or the `PROXY` environment variable. For now, only one proxy can be set, but the app might support multiple proxies in a round-robin fashion later.
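Though not part of this commit, the round-robin idea mentioned above can be sketched with nothing but Go's standard library. This is a hypothetical illustration, not code from moviestills: the type names and the two proxy URLs are made up, and the real implementation may differ.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobin cycles through a fixed list of proxy URLs on each call.
// Hypothetical sketch of multi-proxy support; moviestills currently
// accepts a single --proxy value.
type roundRobin struct {
	proxies []*url.URL
	next    uint64
}

func newRoundRobin(rawURLs ...string) (*roundRobin, error) {
	rr := &roundRobin{}
	for _, raw := range rawURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("invalid proxy URL %q: %w", raw, err)
		}
		rr.proxies = append(rr.proxies, u)
	}
	return rr, nil
}

// Proxy matches the http.Transport.Proxy callback signature, so the
// selector can be plugged straight into a standard HTTP client.
func (rr *roundRobin) Proxy(_ *http.Request) (*url.URL, error) {
	n := atomic.AddUint64(&rr.next, 1) - 1
	return rr.proxies[n%uint64(len(rr.proxies))], nil
}

func main() {
	rr, err := newRoundRobin("http://proxy-a:8080", "http://proxy-b:8080")
	if err != nil {
		panic(err)
	}
	// Requests made with this client alternate between the two proxies.
	client := &http.Client{Transport: &http.Transport{Proxy: rr.Proxy}}
	_ = client

	for i := 0; i < 3; i++ {
		u, _ := rr.Proxy(nil)
		fmt.Println(u.Host) // proxy-a:8080, proxy-b:8080, proxy-a:8080
	}
}
```

The `atomic` counter keeps the selector safe to share across the concurrent jobs that the `--async` option enables.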

#### Hash filenames

To get some consistency, you can use the MD5 hash function to normalize image filenames. All images will then use 32 hexadecimal digits as filenames. To enable the *hashing*, use the `--hash` CLI argument or the `HASH=true` environment variable.
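The effect of such a renaming can be sketched in a few lines of Go. This is only an illustration of why every name ends up as 32 hex digits; the exact scheme moviestills uses (for instance, whether the file extension is preserved) is not specified here.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// hashName returns the MD5 digest of a filename as 32 hex digits.
// Sketch of the idea behind --hash, not moviestills' actual code.
func hashName(name string) string {
	sum := md5.Sum([]byte(name))
	return fmt.Sprintf("%x", sum)
}

func main() {
	h := hashName("some-movie-still.png")
	fmt.Println(h, len(h)) // the digest is always 32 hex characters long
}
```

Because MD5 is deterministic, re-running the scraper produces the same filenames, which is what gives the "consistency" mentioned above.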
6 changes: 6 additions & 0 deletions config/cli.go
@@ -6,6 +6,12 @@ func (Options) Description() string {
return "this program can scrap various websites to get high quality movie snapshots.\n"
}

+ // Epilogue will be displayed at the end
+ // of the "help" CLI command.
+ func (Options) Epilogue() string {
+ 	return "For more information visit https://github.com/kinoute/moviestills"
+ }

// Version of the app can be displayed either
// when a user makes use of the --version flag, the
// --help flag or when an erroneous flag is passed.
3 changes: 2 additions & 1 deletion config/settings.go
@@ -10,10 +10,11 @@ type Options struct {
RandomDelay time.Duration `arg:"-r, --delay,env:RANDOM_DELAY" help:"Add some random delay between requests" default:"1s"`
Async bool `arg:"-a, --async,env:ASYNC" help:"Enable asynchronus running jobs" default:"false"`
TimeOut time.Duration `arg:"-t, --timeout,env:TIMEOUT" help:"Set the default request timeout for the scraper" default:"15s"`
+ Proxy string `arg:"-x, --proxy,env:PROXY" help:"The proxy URL to use for scraping"`
CacheDir string `arg:"-c, --cache-dir,env:CACHE_DIR" help:"Where to cache scraped websites pages" default:"cache"`
DataDir string `arg:"-f, --data-dir,env:DATA_DIR" help:"Where to store movie snapshots" default:"data"`
+ Hash bool `arg:"--hash,env:HASH" help:"Hash image filenames with md5" default:"false"`
Debug bool `arg:"-d, --debug,env:DEBUG" help:"Set Log Level to Debug to see everything" default:"false"`
NoColors bool `arg:"--no-colors,env:NO_COLORS" help:"Disable colors from output" default:"false"`
NoStyle bool `arg:"--no-style,env:NO_STYLE" help:"Disable styling and colors entirely from output" default:"false"`
- Hash bool `arg:"--hash,env:HASH" help:"Hash image filenames with md5" default:"false"`
}
7 changes: 7 additions & 0 deletions main.go
@@ -107,6 +107,13 @@ func main() {
colly.CacheDir(options.CacheDir),
)

+ 	// Set up a proxy
+ 	if options.Proxy != "" {
+ 		if err := scraper.SetProxy(options.Proxy); err != nil {
+ 			log.Error.Println("Could not set proxy", log.White(options.Proxy), log.Red(err))
+ 		}
+ 	}

// Set request timeout
scraper.SetRequestTimeout(options.TimeOut)

