Extra error when crawling #11
I had the same error… Here is the content of the problematic JSON:
[
["cookies",[]],
{"status":"ok","redirect":"http://example.com","time":0}
]
Blocked a frame with origin "file://" from accessing a frame with origin "null". The frame requesting access has a protocol of "file", the frame being accessed has a protocol of "about". Protocols must match.{"status":"ok", "partialcontent":true}]
There is clearly some garbage in it… After investigation, it's because the… The best practice should be using…
Thanks! It's clearly some garbage generated by PhantomJS.
I got the error while crawling one of our clients' websites; I tried to reproduce it in a more stable environment without success. Sorry… I'll try again next week.
Finally, I found a way to reproduce:
[
{"status":"error","code":"load","time":0}
]
Blocked a frame with origin "file://" from accessing a frame with origin "null". The frame requesting access has a protocol of "file", the frame being accessed has a protocol of "about". Protocols must match.
Thanks!!
It looks like the error happens every time PhantomJS hits a redirect… After some research, it's because PhantomJS uses stdout to provide feedback and offers no option to deactivate it. We also can't rely on PhantomJS using stdout or stderr in the right case (it sends output to stdout even when it should have gone to stderr). So a solution would be using a temporary file shared between the two processes.
Another solution would be some kind of local HTTP stream to share info between the two processes… but that seems a bit overkill for this matter. @segment-srl, what do you think?
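The temporary-file idea above can be sketched as follows. This is only an illustration of the approach, not htcap's actual interface: the `run_probe` function, its signature, and the way the output path is passed to the probe are all assumptions.

```python
import json
import os
import subprocess
import tempfile

def run_probe(command, url):
    """Run a probe process, handing it a temp-file path to write its JSON into."""
    fd, out_path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        # For htcap this might be: command = ["phantomjs", "core/crawl/probe/analyze.js"]
        # The probe would be modified to write its result to out_path.
        subprocess.call(command + [url, out_path])
        with open(out_path) as fh:
            # Anything the child prints to stdout no longer matters:
            # only the file content is parsed as JSON.
            return json.load(fh)
    finally:
        os.remove(out_path)
```

The benefit is that PhantomJS can write whatever it wants to stdout and stderr without corrupting the probe result, since the two channels are fully separated.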
I'm still unable to reproduce this issue, even with "phantomjs core/crawl/probe/analyze.js /". What version of PhantomJS are you using, and on what OS?
Linux?
Yes, Linux…
Interesting, yes… so it's an issue related to the PhantomJS build. One solution is to write analyze.js output to a file instead of stdout.
I checked the difference between the two builds (project vs. Ubuntu repo) and it seems that Ubuntu does not use the same process for building PhantomJS.
@barhaterahul, what version of PhantomJS do you run? Is it the version provided by Ubuntu too?
Finally, my question at Launchpad regarding the difference in the build process has been closed without a straight answer…
This issue is related to the PhantomJS build on some Linux distros. Using the binary from the official website should fix the problem.
I was trying to crawl a website with -m active -v and I am getting these errors. Could you please look into it?
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/root/Desktop/htcap/core/crawl/crawler_thread.py", line 62, in run
self.crawl()
File "/root/Desktop/htcap/core/crawl/crawler_thread.py", line 215, in crawl
probe = self.send_probe(request, errors)
File "/root/Desktop/htcap/core/crawl/crawler_thread.py", line 164, in send_probe
probeArray = self.load_probe_json(jsn)
File "/root/Desktop/htcap/core/crawl/crawler_thread.py", line 99, in load_probe_json
return json.loads(jsn)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 367, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 5 column 1 - line 5 column 249 (char 69 - 317)
Exception in thread Thread-5:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/root/Desktop/htcap/core/crawl/crawler_thread.py", line 62, in run
self.crawl()
File "/root/Desktop/htcap/core/crawl/crawler_thread.py", line 215, in crawl
probe = self.send_probe(request, errors)
File "/root/Desktop/htcap/core/crawl/crawler_thread.py", line 164, in send_probe
probeArray = self.load_probe_json(jsn)
File "/root/Desktop/htcap/core/crawl/crawler_thread.py", line 99, in load_probe_json
return json.loads(jsn)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 367, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 5 column 1 - line 5 column 249 (char 341 - 589)
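The tracebacks above come from json.loads() refusing input that has bytes left over after a complete JSON value. As a purely defensive sketch (not htcap's actual fix), json.JSONDecoder.raw_decode can parse the first JSON value and ignore whatever PhantomJS appended after it. Note this only helps when the noise trails the JSON, as in the reproduction above; it cannot repair garbage interleaved inside the array. The helper name `load_first_json` is hypothetical.

```python
import json

def load_first_json(text):
    """Parse the first JSON value in text, ignoring anything appended after it."""
    # raw_decode returns (value, end_index); we discard the index and
    # with it any trailing non-JSON output from the child process.
    value, _ = json.JSONDecoder().raw_decode(text.lstrip())
    return value

# Mimics the reproduced probe output: valid JSON followed by a console warning.
mixed = ('[\n{"status":"error","code":"load","time":0}\n]\n'
         'Blocked a frame with origin "file://" from accessing a frame...')
probe = load_first_json(mixed)  # parses the array; the warning line is ignored
```

Calling plain json.loads(mixed) on the same string raises the ValueError: Extra data seen in the tracebacks.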