When launching a crawl, it seems that only the start URL and robots.txt are requested through the proxy (during the validation process).
Way to reproduce — start a crawl with:
$ ./htcap.py crawl -v -p http:127.0.0.1:8080 http://localhost/index.html test.db
You get:
Initializing . . done
Database test.db initialized, crawl started with 10 threads
crawl result for: link GET http://localhost/index.html
new request found link GET http://localhost/test1.html
crawl result for: link GET http://localhost/test1.html
new request found link GET http://localhost/test2.html
new request found link GET http://localhost/index.html
crawl result for: link GET http://localhost/test2.html
new request found link GET http://localhost/test1.html
new request found link GET http://localhost/index.html
Crawl finished, 3 pages analyzed in 0 minutes
But there are only two hits in the proxy log:
http://…/index.html
http://…/robots.txt
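To confirm which requests actually pass through the proxy, a minimal logging proxy can be run locally and a request sent through it. This is a hypothetical stdlib-only sketch (not part of htcap): `LoggingProxy` and the `seen` list are names invented here. Every URL fetched via the proxy-configured opener lands in `seen`; pointing htcap at this proxy the same way shows that only the start URL and robots.txt appear.

```python
# Minimal logging proxy (hypothetical helper, not part of htcap)
# that records every URL requested through it.
import http.server
import threading
import urllib.request

seen = []  # URLs that passed through the proxy

class LoggingProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # For a plain-HTTP proxy request, self.path is the absolute URL.
        seen.append(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html></html>")

    def log_message(self, *args):
        pass  # keep the console quiet

# Bind to an ephemeral port and serve in the background.
server = http.server.HTTPServer(("127.0.0.1", 0), LoggingProxy)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch one URL through the proxy, as a crawler honoring the
# proxy setting would for every request.
proxy = f"http://127.0.0.1:{server.server_port}"
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": proxy})
)
opener.open("http://localhost/index.html").read()
server.shutdown()

print(seen)  # ['http://localhost/index.html']
```

If the crawler's subrequests (test1.html, test2.html) honored the proxy, they would show up in `seen` as well; in the run above only the validation requests do.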