Move to headless chrome #31
Comments
Are you talking about puppeteer or chromeless?
Since puppeteer is the project started by the team that develops Chrome, I would be inclined to use this lib.
Hi, and sorry for the delayed answer... I agree with Gullohome, puppeteer is the best choice. Point 1 may be resolved by writing a Chrome extension, but point 2 is very problematic. In Chrome it's possible to intercept and abort requests, but not page navigation. For example, if we allow the loading of scripts, it's possible that the crawler will navigate to a .js URL... Also, it's not possible to prevent navigation to about:blank (e.g. <a href="about:blank"...).
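To make the interception problem concrete, here is a minimal sketch of the decision a request handler would have to take. It is not code from htcap or from puppeteer: `should_abort`, `ALLOWED_RESOURCE_TYPES`, and the resource-type strings are hypothetical names chosen for illustration, and the function only covers the decision, not whether the browser will actually honour an abort for a navigation (which is the open problem described above).

```python
from urllib.parse import urlsplit

# Assumption: resource types the crawler is willing to load.
# The names are illustrative, not taken from htcap.
ALLOWED_RESOURCE_TYPES = {"document", "script", "xhr", "fetch"}


def should_abort(url: str, resource_type: str, is_navigation: bool) -> bool:
    """Return True if an intercepted request should be aborted."""
    # Drop anything the crawler does not need (images, fonts, media...).
    if resource_type not in ALLOWED_RESOURCE_TYPES:
        return True
    # Scripts may load as subresources, but the page itself must not
    # navigate to a script URL.
    path = urlsplit(url).path.lower()
    if is_navigation and path.endswith(".js"):
        return True
    return False
```

With this policy, loading `app.js` as a subresource is allowed, while a top-level navigation to the same URL is flagged for abort.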
We did all this in our fork. If you want to take a look at the implementation details, they are here: https://github.com/delvelabs/htcap/tree/master/core/crawl/probe We did a lot of work to reach a stable (enough) implementation, and it will be deployed in our production environment in January.
I tried your fork and it seems to face the same issue as my test code. If a page contains a link to about:blank (<a href="about:blank") the navigation is not locked.
@segment-srl you are right, any "special" URI scheme makes the probe hang… We haven't found a solution yet. It should be possible to solve it through the […] We chose to postpone the issue since not many websites use schemes other than […]
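One possible way to sidestep the hang, which nobody in this thread proposed, is to filter links by scheme before the probe follows them. The sketch below assumes an allow-list of `http` and `https`; that list, along with the names `FOLLOWABLE_SCHEMES` and `is_followable`, is my own assumption for illustration and not something htcap defines.

```python
from urllib.parse import urlsplit

# Assumption: the only schemes the crawler should navigate to.
FOLLOWABLE_SCHEMES = {"http", "https"}


def is_followable(href: str) -> bool:
    """Return True if the link target is safe to hand to the probe."""
    scheme = urlsplit(href.strip()).scheme.lower()
    # An empty scheme means a relative URL, which resolves against the
    # current page and therefore inherits its scheme.
    return scheme == "" or scheme in FOLLOWABLE_SCHEMES
```

This only covers links the crawler discovers itself; a navigation triggered by page script would still need to be handled on the browser side.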
PhantomJS is no longer under development, so we need to move to headless Chrome.