
Add option to dump response content to a folder #45

Open

alexander-schranz opened this issue Feb 19, 2019 · 6 comments

Comments
@alexander-schranz
Contributor

alexander-schranz commented Feb 19, 2019

I want to use fink not only to check for bad response codes: I also want to dump the response content (in my case HTML) to files, because I then want to use the w3c html5validator to validate those files.

Before investing time in implementing this, I would like to check whether you are open to adding this option.

@dantleech
Owner

dantleech commented Feb 19, 2019

Maybe. My main concern would be that while dumping the HTML is easy enough, it takes you halfway to creating an offline version of a given site (dumping assets etc., then relativising URLs), and that's a new problem.

I think Fink could also be used as a library, and you could, for example, do:

$dispatcher = DispatcherBuilder::create('http://www.example.com')
    ->callback(function (Response $response) {
        // your stuff here
    })
    ->build();

but you probably want to use this via the command, so that would require more refactoring (otherwise you would currently have to bootstrap the event loop stuff yourself).
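For the original use case, the "your stuff here" part of such a callback could be a small HTML-dumping helper. Here is a minimal sketch; the `$url` and `$body` string parameters stand in for whatever the real Response object exposes, which is an assumption, not fink's confirmed API:

```php
<?php

// Sketch only: the kind of logic you could call from the ->callback()
// closure above. How you obtain $url and $body from the Response is an
// assumption for illustration.
function dumpHtml(string $outDir, string $url, string $body): string
{
    if (!is_dir($outDir)) {
        mkdir($outDir, 0777, true);
    }

    // Turn the URL into a filesystem-safe file name.
    $name = trim(preg_replace('/[^a-zA-Z0-9]+/', '-', $url), '-');
    $path = $outDir . '/' . $name . '.html';

    file_put_contents($path, $body);

    return $path;
}
```

The flat naming scheme deliberately avoids recreating the site's directory structure, since the goal here is only to feed the files to a validator, not to build an offline mirror.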

@dantleech
Owner

dantleech commented Feb 19, 2019

Note that there is also https://github.com/spatie/crawler, which might be more suitable for your use case. Looking at the validator library, though, I guess it makes sense to have a tool which just dumps the HTML.

I'm not necessarily against the idea (it would be convenient), but I'm actually not 100% sure it belongs here. It could fit, though.

@dantleech
Owner

dantleech commented Feb 19, 2019

If we did, I guess the Crawler should be refactored to extract the DOM parsing into an observer. The code to dump the HTML could then also be an observer, and the Crawler would only send notifications when it gets a Response and has read the $body.

@alexander-schranz
Contributor Author

> If we did, I guess the Crawler should be refactored to extract the DOM parsing into an observer. The code to dump the HTML could then also be an observer, and the Crawler would only send notifications when it gets a Response and has read the $body.

Throwing an event sounds like a good idea to me and would make it very flexible. Would you use symfony/event-dispatcher for this, or which library do you prefer?

@dantleech
Owner

No, no libraries for this :) We can simply create an interface for the observer (e.g. CrawlerObserver) and pass a collection of these to the crawler (e.g. CrawlerObservers).
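A minimal sketch of that design, using the CrawlerObserver / CrawlerObservers names from the comment above. The Response shape (a plain value object here) and all method signatures are assumptions for illustration, not fink's actual code:

```php
<?php

// Minimal stand-in for whatever response object the crawler passes around.
final class Response
{
    public function __construct(public string $url, public string $body)
    {
    }
}

// The observer interface suggested in the discussion.
interface CrawlerObserver
{
    public function observe(Response $response): void;
}

// A collection of observers; the crawler would call notify() once it has
// a Response and has read the body.
final class CrawlerObservers
{
    /** @var CrawlerObserver[] */
    private array $observers;

    public function __construct(CrawlerObserver ...$observers)
    {
        $this->observers = $observers;
    }

    public function notify(Response $response): void
    {
        foreach ($this->observers as $observer) {
            $observer->observe($response);
        }
    }
}

// Example observer: dump each response body as an HTML file into a folder.
final class HtmlDumpObserver implements CrawlerObserver
{
    public function __construct(private string $outDir)
    {
        if (!is_dir($this->outDir)) {
            mkdir($this->outDir, 0777, true);
        }
    }

    public function observe(Response $response): void
    {
        $name = trim(preg_replace('/[^a-zA-Z0-9]+/', '-', $response->url), '-');
        file_put_contents($this->outDir . '/' . $name . '.html', $response->body);
    }
}
```

With this shape, the DOM-parsing logic mentioned earlier could become just another CrawlerObserver, keeping the crawler itself unaware of what consumers do with each response.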

@dantleech
Owner

I think this will require some refactoring, still thinking about how to do it...
