
Support crawling websites behind a login #202

Open
staabm opened this issue Dec 3, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@staabm

staabm commented Dec 3, 2020

It would be great if the initial request could be an HTTP POST with additional fields such as username/password. After that initial POST, linkinator should maintain the HTTP session (e.g. receiving and re-sending a session cookie) and crawl all other links via GET, as it already does.

That way one could check links even when they sit behind a login.
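A minimal sketch of the flow being requested, assuming a plain form-based login over Node's built-in `fetch`. The login URL, form field names, and cookie name here are hypothetical examples, not linkinator's API:

```javascript
// Extract the "name=value" pair from a Set-Cookie header so it can be
// replayed on subsequent GET requests.
function sessionCookieFrom(setCookieHeader) {
  // Everything before the first ";" is the cookie pair itself; the rest
  // (Path, HttpOnly, ...) are attributes the client does not replay.
  return setCookieHeader.split(';')[0].trim();
}

// Sketch of the POST-then-crawl flow described above.
async function crawlBehindLogin(loginUrl, username, password) {
  // 1. Initial POST with credentials (form field names are assumptions).
  const res = await fetch(loginUrl, {
    method: 'POST',
    headers: { 'content-type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ username, password }),
    redirect: 'manual', // keep the Set-Cookie visible on the login response
  });

  // 2. Capture the session cookie from the response.
  const cookie = sessionCookieFrom(res.headers.get('set-cookie') ?? '');

  // 3. Subsequent GETs (the existing crawl) replay it on every request.
  return fetch(new URL('/', loginUrl), { headers: { cookie } });
}
```

This is only the session-handling core; the actual crawl loop would stay as it is today, just carrying the captured `cookie` header.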

@JustinBeckwith JustinBeckwith added the enhancement New feature or request label Dec 4, 2020
@JustinBeckwith
Owner

Thanks for the feature request! Implementing something like this would require fundamentally changing how the library works: moving from stateless HTTP requests driven from Node to something like Puppeteer, which makes the requests and maintains state. I don't have immediate plans to implement this, but I'd be happy to talk through a PR proposal.

@leviwheatcroft

What about exposing the request so that headers, etc., could be set?

linkChecker.check(url, {
  configureRequest (request) {
    request.headers['some-header'] = 'some value'
  }
})

@JustinBeckwith
Owner

That's probably something that could be pretty easily done. If we did it though, I would only want to support it via the API (not the CLI). Would that get you where you wanna go?

@leviwheatcroft

Actually, it turned out that the issue I was trying to solve had nothing to do with missing headers.

That said, I still think that exposing the request like this would make the package more versatile and I think it would benefit a lot of users.

@Veraxus

Veraxus commented Dec 16, 2022

This could be easily solved by allowing cookies to be sent with every request. Either allow a cookies.txt file (e.g. linkchecker/linkchecker supports this) or allow cookies to be set from the config file or via the CLI. Requests are still stateless - you just send the cookies with every request.
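As a sketch of the cookies.txt idea, a Netscape-format file could be reduced to a single `Cookie` header value and attached to every request. This is an illustration of the approach, not an existing linkinator option:

```javascript
// Turn Netscape-format cookies.txt content into one Cookie header value
// that could be sent with every (still stateless) request.
function cookieHeaderFromCookiesTxt(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    // Skip comment lines and blank lines.
    .filter((line) => line && !line.startsWith('#'))
    .map((line) => {
      // Netscape format is tab-separated: domain, includeSubdomains,
      // path, secure, expiry, name, value. The last two fields are the
      // cookie pair itself.
      const [name, value] = line.split('\t').slice(-2);
      return `${name}=${value}`;
    })
    .join('; ');
}
```

For example, a file containing a `SESSION` cookie for `.example.com` would yield `SESSION=<value>`, ready to pass as a `cookie` header on each request.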
