Proxy not used when crawling on localhost network #19

Closed
GuilloOme opened this issue Feb 9, 2017 · 2 comments

@GuilloOme
Contributor

GuilloOme commented Feb 9, 2017

When launching a crawl, it seems that only the start URL and robots.txt are requested through the proxy (during the validation process); the requests made during the crawl itself bypass it.

Steps to reproduce:

  • start a crawl with:
    $ ./htcap.py crawl -v -p http:127.0.0.1:8080 http://localhost/index.html test.db
    you get:
Initializing . . done
Database test.db initialized, crawl started with 10 threads
crawl result for: link GET http://localhost/index.html  
  new request found link GET http://localhost/test1.html 
crawl result for: link GET http://localhost/test1.html  
  new request found link GET http://localhost/test2.html 
  new request found link GET http://localhost/index.html 
crawl result for: link GET http://localhost/test2.html  
  new request found link GET http://localhost/test1.html 
  new request found link GET http://localhost/index.html 

Crawl finished, 3 pages analyzed in 0 minutes
  • I only got 2 hits in the proxy log:
    • http://…/index.html
    • http://…/robots.txt
@GuilloOme
Contributor Author

After investigating, it seems that PhantomJS does not use the proxy for requests on the localhost network…
I filed bug #14808 with them.

GuilloOme changed the title from "Proxy not used for every request during crawl" to "Proxy not used when crawling on localhost network" on Feb 9, 2017
@GuilloOme
Contributor Author

It seems to be a behavior of Qt (the library used by PhantomJS); unfortunately, it does not appear to be "fixable"… (see this response)
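
Since the proxy appears to be skipped for localhost targets, a pre-flight check could at least warn before starting such a crawl. Below is a minimal sketch in Python 3, not part of htcap: `targets_loopback` is a hypothetical helper name, and it only inspects the URL string (the name `localhost` or a loopback IP literal) without resolving DNS, so a hostname that merely resolves to 127.0.0.1 would not be detected.

```python
import ipaddress
from urllib.parse import urlsplit


def targets_loopback(url):
    """Return True if the URL's host is 'localhost' or a loopback IP literal.

    Hypothetical helper for illustration only; it does no DNS resolution.
    """
    host = urlsplit(url).hostname  # lowercased, brackets stripped for IPv6
    if host is None:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # Covers 127.0.0.0/8 and ::1
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: an ordinary hostname, which this sketch ignores
        return False


if __name__ == "__main__":
    start_url = "http://localhost/index.html"
    if targets_loopback(start_url):
        print("warning: %s is a loopback target; the proxy may be bypassed" % start_url)
```

If the check fires, pointing the crawl at a non-loopback address of the same machine might route the requests through the proxy, though that is an untested assumption here.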
