Enable caching on intermediate realtime persists #1943

Merged
merged 1 commit into from
Nov 17, 2015

Conversation

xvrl
Member

@xvrl xvrl commented Nov 10, 2015

No description provided.

@@ -26,6 +26,7 @@
import com.google.common.base.Throwables;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Iterables;
Member

Unused import?

cache.close(input.getSegment().getIdentifier() + "_" + input.getCount());
return null;
}
});
Member

Do we need to iterate for the close to be called? Or maybe simply replace it with a for loop.
Also, can we abstract the namespace creation into a method?
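
A minimal sketch of what this suggestion could look like, using plain Java and no Druid or Guava types. It assumes the reviewed snippet is a function whose side effect (`cache.close(...)`) only runs when a lazy view is iterated, as is the case with Guava's `Iterables.transform`. The `Hydrant` class and the `makeCacheNamespace` and `closeAll` helpers are hypothetical names introduced for illustration; they are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Druid's classes.
final class HydrantCacheSketch {

    /** Minimal stand-in for a persisted hydrant: a segment identifier plus a persist count. */
    static final class Hydrant {
        final String segmentIdentifier;
        final int count;

        Hydrant(String segmentIdentifier, int count) {
            this.segmentIdentifier = segmentIdentifier;
            this.count = count;
        }
    }

    /** One place that builds the namespace, mirroring the diff's identifier + "_" + count. */
    static String makeCacheNamespace(Hydrant hydrant) {
        return hydrant.segmentIdentifier + "_" + hydrant.count;
    }

    /**
     * Visits every hydrant eagerly with a plain for loop, so the side effect
     * does not depend on someone iterating a lazy view later.
     * Returns the namespaces in the order they were visited.
     */
    static List<String> closeAll(Iterable<Hydrant> hydrants) {
        List<String> closed = new ArrayList<>();
        for (Hydrant hydrant : hydrants) {
            // A real implementation would call cache.close(namespace) here.
            closed.add(makeCacheNamespace(hydrant));
        }
        return closed;
    }

    public static void main(String[] args) {
        List<Hydrant> hydrants = Arrays.asList(new Hydrant("segment", 0), new Hydrant("segment", 1));
        System.out.println(closeAll(hydrants));
    }
}
```

With a loop, the close happens at the call site regardless of whether any caller consumes a result, and the namespace format lives in a single method that both the populate and close paths could share.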

@gianm
Contributor

gianm commented Nov 10, 2015

The spills are not consistent across nodes so this would only work if the cache is local. That should probably be called out or enforced or something.
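
One way such a restriction might be enforced is sketched below. The `Cache` interface and its `isLocal()` flag are assumptions made for this example, as is the `shouldCacheIntermediatePersists` helper; none of them is confirmed by this conversation as the PR's actual API.

```java
// Hypothetical sketch: only use the cache for intermediate persists when it is local.
final class LocalCacheGuardSketch {

    /** Assumed cache abstraction; isLocal() is an illustrative flag, not a confirmed API. */
    interface Cache {
        boolean isLocal();
    }

    /**
     * Intermediate persists (spills) differ from node to node, so cached results
     * keyed on them are only meaningful to the node that produced them.
     * Caching is therefore allowed only when it is enabled AND the cache is local.
     */
    static boolean shouldCacheIntermediatePersists(Cache cache, boolean cachingEnabled) {
        return cachingEnabled && cache.isLocal();
    }

    public static void main(String[] args) {
        Cache local = () -> true;
        Cache shared = () -> false;
        System.out.println(shouldCacheIntermediatePersists(local, true));
        System.out.println(shouldCacheIntermediatePersists(shared, true));
    }
}
```

An alternative to silently skipping the cache would be to fail fast at startup when caching is enabled with a non-local cache, which makes the misconfiguration visible.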

@gianm
Contributor

gianm commented Nov 10, 2015

@xvrl is it possible to get real world performance numbers?

@xvrl xvrl force-pushed the realtime-caching branch 2 times, most recently from 5cd6404 to 7bea48e Compare November 10, 2015 21:24
@xvrl
Member Author

xvrl commented Nov 10, 2015

@gianm addressed comments, will look into getting some real-world numbers.

@xvrl
Member Author

xvrl commented Nov 13, 2015

@gianm roughly 20% speedup on a typical top-n query, measured 10-15 minutes into the hour of a 1-hour realtime index task, with 10 persists completed (~1.4M rows to scan):

Unit: milliseconds
           expr      min       lq     mean   median       uq      max neval
  cache("true") 226.2689 269.5347 290.6316 279.4486 302.8039 542.1334   100
 cache("false") 285.9610 343.3580 395.8290 396.8076 419.2823 708.2887   100
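
The relative latency reduction for any column of the table above can be checked with a few lines of arithmetic. The sketch below is illustrative only: `percentReduction` is a hypothetical helper, and the inputs are the min, mean, and median values copied from the table. Depending on the column, the reduction works out to roughly 21% to 30%.

```java
// Computes how much lower the cached latency is relative to the uncached baseline.
final class SpeedupSketch {

    /** Percent reduction in latency: 100 * (baseline - cached) / baseline. */
    static double percentReduction(double baselineMs, double cachedMs) {
        return 100.0 * (baselineMs - cachedMs) / baselineMs;
    }

    public static void main(String[] args) {
        // Values in milliseconds, taken from the benchmark table (cache("false") vs cache("true")).
        System.out.printf("min:    %.1f%%%n", percentReduction(285.9610, 226.2689));
        System.out.printf("mean:   %.1f%%%n", percentReduction(395.8290, 290.6316));
        System.out.printf("median: %.1f%%%n", percentReduction(396.8076, 279.4486));
    }
}
```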

@himanshug
Contributor

👍 since the old behavior is preserved by default and this only takes effect if the user explicitly provides the configuration.
Not sure whether the documentation update was left out intentionally; if not, please update it as well.

@xvrl
Member Author

xvrl commented Nov 17, 2015

@himanshug yes, will add docs; I was waiting for comments before doing so.

drcrallen added a commit that referenced this pull request Nov 17, 2015
Enable caching on intermediate realtime persists
@drcrallen drcrallen merged commit 8fcf240 into apache:master Nov 17, 2015
@drcrallen drcrallen deleted the realtime-caching branch November 17, 2015 23:06
@nishantmonu51 nishantmonu51 mentioned this pull request Dec 1, 2015
@gianm gianm added this to the 0.8.3 milestone Dec 1, 2015
@gianm gianm mentioned this pull request Dec 4, 2015