Do not save empty advertisements to mirror #2471

Merged
merged 1 commit into from
Jan 10, 2024

Conversation

gammazero
Collaborator

@gammazero gammazero commented Jan 10, 2024

Empty advertisements should not be saved to the mirror, since they do not help index ingestion. An advertisement may have no multihash entries when:

  1. The ad removes a context ID
    When ingesting ads, the indexer does not read removal ads from the mirror since there is no entry data to get.
  2. The ad only updates metadata or provider information
    The indexer stores the metadata, but does not read the mirror since there are no entries.
  3. The ad was deleted by a removal ad later in the chain
    Once an ad is known to be deleted, it will never be read from the mirror since its content is deleted.
  4. The publisher is not serving entries data
    This is treated like a no-content ad, as in case 2, but it means that the ad had entry data at some point in the past. Either there is, or should be, a pending unpublished removal later in the chain, so the ad will end up being deleted; or the failure is a temporary publisher error, in which case another indexer using the mirror should query the publisher instead. In either case, the advertisement should not be written to the mirror.
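
The four cases above can be read as a single predicate that decides whether an ad is worth writing to the mirror. The following is a minimal sketch of that reading, not the code in this PR: `Advertisement`, `IngestState`, `ShouldMirror`, and all field names are hypothetical stand-ins introduced for illustration.

```go
package main

import "fmt"

// Advertisement is a simplified, hypothetical stand-in for an advertisement.
type Advertisement struct {
	ContextID  string
	IsRemoval  bool // the ad removes ContextID (case 1)
	HasEntries bool // the ad links to multihash entries
}

// IngestState is a hypothetical summary of what the indexer learned
// while processing the ad.
type IngestState struct {
	DeletedLater  bool // a removal ad later in the chain deleted this ad (case 3)
	EntriesServed bool // the publisher actually returned the entries data
}

// ShouldMirror reports whether the ad should be written to the mirror,
// along with a short reason suitable for logging.
func ShouldMirror(ad Advertisement, st IngestState) (bool, string) {
	switch {
	case ad.IsRemoval:
		return false, "removal ad"
	case !ad.HasEntries:
		return false, "metadata or provider update only"
	case st.DeletedLater:
		return false, "deleted by a later removal ad"
	case !st.EntriesServed:
		return false, "publisher not serving entries"
	}
	return true, "has entries"
}

func main() {
	ad := Advertisement{ContextID: "ctx-1", HasEntries: true}
	// Entries exist in the ad, but the publisher did not serve them (case 4).
	ok, reason := ShouldMirror(ad, IngestState{EntriesServed: false})
	fmt.Println(ok, reason)
}
```

Checking the cases in this order means a removal ad is rejected before its entries are ever considered, which matches the point that removal ads carry no entry data to fetch.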

GC also removes any empty (no-content) advertisements from the mirror, and reindexing must not repopulate them.

Other changes:

  • GC always logs the number of indexes removed
  • Always HAMT entries which are not mirrored

@gammazero gammazero merged commit 87bebf0 into main Jan 10, 2024
9 checks passed
@gammazero gammazero deleted the no-mirror-empty-ads branch January 10, 2024 10:32
2 participants