Domain name lookup failures get cached forever #234

Closed
anjackson opened this issue Feb 21, 2019 · 1 comment

When a job first looks up a URL, it fetches the DNS record before anything else. If that lookup fails, this check kicks in:

if (ch == null || ch.hasBeenLookedUp() && ch.getIP() == null) {

Otherwise, this code is used:

if (isIpExpired(curi) && !curi.getUURI().getScheme().equals("dns")) {

The latter uses `isIpExpired`, which implements the `ipValidityDurationSeconds` check. However, the former does not, so while successful IP lookups are refreshed every six hours (by default), failed IP lookups are never retried.
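
To make the failure mode concrete, here is a minimal sketch of the control flow described above. It is a simplified model, not the real Heritrix code: `FailedLookupDemo`, `HostEntry`, `Decision`, `decide` and the two-argument `isIpExpired` are hypothetical names invented for illustration, and the `dns` scheme test is left out.

```java
// Simplified model of the control flow described above. HostEntry, Decision and
// decide() are hypothetical stand-ins, not Heritrix's actual classes.
final class FailedLookupDemo {
    // Six-hour validity window, matching the default mentioned in the issue.
    static final long VALIDITY_SECONDS = 6 * 60 * 60;

    enum Decision { REJECT, REFRESH_DNS, PROCEED }

    /** Hypothetical cache entry for one host; ip == null means the lookup failed. */
    static final class HostEntry {
        final String ip;
        final long fetchedAtSeconds;

        HostEntry(String ip, long fetchedAtSeconds) {
            this.ip = ip;
            this.fetchedAtSeconds = fetchedAtSeconds;
        }

        boolean hasBeenLookedUp() { return true; }
        String getIP() { return ip; }
    }

    /** Assumed expiry rule: an entry expires once the validity window has passed. */
    static boolean isIpExpired(HostEntry ch, long nowSeconds) {
        return nowSeconds - ch.fetchedAtSeconds >= VALIDITY_SECONDS;
    }

    /** Mirrors the two quoted conditions in order. */
    static Decision decide(HostEntry ch, long nowSeconds) {
        if (ch == null || ch.hasBeenLookedUp() && ch.getIP() == null) {
            return Decision.REJECT;       // no time check on this branch
        }
        if (isIpExpired(ch, nowSeconds)) {
            return Decision.REFRESH_DNS;  // only reachable after a successful lookup
        }
        return Decision.PROCEED;
    }

    public static void main(String[] args) {
        long oneWeek = 7L * 24 * 60 * 60;
        HostEntry failed = new HostEntry(null, 0);
        HostEntry ok = new HostEntry("192.0.2.1", 0);
        System.out.println(decide(failed, oneWeek)); // REJECT, however old the failure is
        System.out.println(decide(ok, oneWeek));     // REFRESH_DNS
    }
}
```

In this model the failed entry is rejected no matter how much time has passed, because the first branch returns before the expiry check is ever consulted.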

anjackson commented Feb 21, 2019

The solution should be:

if (ch == null || ch.getIP() == null && !isIpExpired(curi)) {

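i.e. if the IP is null and the failed lookup has not yet expired, we reject the URL. Once the failed lookup expires, the check no longer rejects it, so the lookup can be retried.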
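
A minimal sketch of how the proposed condition behaves over time, under the same kind of simplification: `ExpiringFailureDemo`, `HostEntry`, `Decision`, `decide` and the two-argument `isIpExpired` are hypothetical names for illustration only, not the Heritrix API. Note that in Java `&&` binds more tightly than `||`, so the condition reads as `ch == null || (ch.getIP() == null && !isIpExpired(...))`.

```java
// Simplified model of the proposed check. All names here are hypothetical
// stand-ins, not Heritrix's actual classes.
final class ExpiringFailureDemo {
    // Six-hour validity window, matching the default mentioned in the issue.
    static final long VALIDITY_SECONDS = 6 * 60 * 60;

    enum Decision { REJECT, REFRESH_DNS, PROCEED }

    /** Hypothetical cache entry for one host; ip == null means the lookup failed. */
    static final class HostEntry {
        final String ip;
        final long fetchedAtSeconds;

        HostEntry(String ip, long fetchedAtSeconds) {
            this.ip = ip;
            this.fetchedAtSeconds = fetchedAtSeconds;
        }

        String getIP() { return ip; }
    }

    /** Assumed expiry rule: an entry expires once the validity window has passed. */
    static boolean isIpExpired(HostEntry ch, long nowSeconds) {
        return nowSeconds - ch.fetchedAtSeconds >= VALIDITY_SECONDS;
    }

    /** Proposed condition first, then the existing refresh check. */
    static Decision decide(HostEntry ch, long nowSeconds) {
        if (ch == null || ch.getIP() == null && !isIpExpired(ch, nowSeconds)) {
            return Decision.REJECT;       // failure is still within its validity window
        }
        if (isIpExpired(ch, nowSeconds)) {
            return Decision.REFRESH_DNS;  // now reachable for expired failures too
        }
        return Decision.PROCEED;
    }

    public static void main(String[] args) {
        long oneHour = 60 * 60;
        long sevenHours = 7 * 60 * 60;
        HostEntry failed = new HostEntry(null, 0);
        System.out.println(decide(failed, oneHour));    // REJECT
        System.out.println(decide(failed, sevenHours)); // REFRESH_DNS
    }
}
```

With this ordering, a failed lookup is treated like any other cached result: it is honoured while fresh and re-fetched once the validity window has passed.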

anjackson self-assigned this Feb 21, 2019
anjackson added the bug label Feb 21, 2019
anjackson added a commit to ukwa/heritrix3 that referenced this issue Feb 25, 2019
nlevitt added a commit that referenced this issue Feb 25, 2019
Allow failed lookups to expire, for #234.
nlevitt added a commit that referenced this issue Mar 14, 2019
* trough-dedup:
  promote dirty segments at crawl finish
  trough dedup!
  Allow failed lookups to expire, for #234.
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release 3.4.0-20190207
  As @nlevitt suggestion, a further check.
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release 3.4.0-20190205
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release 3.4.0-20190205-2
  [maven-release-plugin] prepare for next development iteration
  Extend to set additional property.
  Use argument syntax.
  Skip tests during release process (covered by CI).
  Set consistent tag, and include contrib.
  Wrong repo spec.
  Add build profile for deployment to Maven Central.
  Avoid headings being treated as lists
  Clarification about APIs.
  Swapped original and link to make maintenance simpler.
  Tidy up markup and links.
  Add synchronized statements for #221.
  Add checks to guard against server sending 304 in error, for #229.
  do not checkpoint if crawl job has not started