Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Site using window.location to navigate are not crawled properly #22

Closed
GuilloOme opened this issue Mar 29, 2017 · 2 comments · Fixed by delvelabs/htcap#7
Closed

Site using window.location to navigate are not crawled properly #22

GuilloOme opened this issue Mar 29, 2017 · 2 comments · Fixed by delvelabs/htcap#7

Comments

@GuilloOme
Copy link
Contributor

GuilloOme commented Mar 29, 2017

Site relying on window.location for navigation are not crawled properly.

A crawl on a page like this one will not see all the site:

<html>
	<head>
	<script>
		function go1(){
			window.location = "../1.php";
		}
		function go2(){
			window.location = "../2.php";
		}
		function go3(){
			window.location = "../3.php";
		}
	</script>
	</head>
	<body>
		<span onclick="go1()">click here </span> 
		<a onclick="go2()">click here </a> 
		<td onclick="go3()">click here </td> 
	</body>
</html>
@GuilloOme GuilloOme changed the title pgae Site using window.location to navigate are not crawled properly Mar 29, 2017
@GuilloOme
Copy link
Contributor Author

GuilloOme commented Mar 29, 2017

I opened a issue here: ariya/phantomjs#14782

Update: I closed the phantomJS issue today (2016-04-03).
After some research, it is an expected behavior from phantomJS/webkit.
Since all the events are triggered simultaneously, phantomJS (through webkit) only navigate to the last location requested (see more detail in the phantomjs issue), then in htcap, only the last one is trapped.

@GuilloOme
Copy link
Contributor Author

GuilloOme commented Apr 6, 2017

Here is where I am now for the resolution of this issue:

Root cause of the issue
Because all the events found in the DOM are triggered back-to-back without waiting the end of the current stack to resolve, only the last window.location is visible through the onNavigationRequested called by phantomJS.

Solution found
Make sure that none of the events are triggered within the same stack: this way, avoiding collision and inconsistency in the page context (the events we trigger this way are meant to be triggered by a human user so the page script could not have been engineered to handle this kind of concurrency). Technically, it means using a messageEvent queue to place it one by one in the event loop.

Difficulties
The actual implementation of the JS probe is not designed to handle async logic: the usage of setInterval as a loop and global flags for managing the state of the analyzeDOM implie skipping step in the process… for more info: Concurrency model and Event Loop and this very good talk

It is possible that it will be necessary to refactor the current analyzeDOM() method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant