-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize numeric column null value checking for low filter selectivity (more rows) #8822
optimize numeric column null value checking for low filter selectivity (more rows) #8822
Conversation
…ad of bitmap.get for those sweet sweet nanoseconds
The heatmaps look super cool! (although I don't think I fully understand them yet :| ) What did you use to build them? |
Hah, thanks, I used R with ggplot2 to make them. I'll try to clean up the code and attach it, if I have a chance, in case anyone else wants to do some benchmarking tinker with the results. As for what they mean, I'll try my best to explain it as succinctly as possible 😅. The benchmark I added in this PR, When SQL compatible null handling is enabled, numeric columns are stored with 2 parts if nulls are present: the column itself, and a bitmap that has a set bit for each null value. At query time, filters are evaluated to compute something called an The benchmark simplifies this concept into using a Translating this into heatmap, the y axis is showing the effects of differences in density of the null bitmap (bottom is a few null values, top is nearly all rows are null), the x axis is the differences in the number of rows that our selector will select (left side selects very few rows, right scans nearly all rows), and the z axis is the difference in benchmark operation time between using bitmap.get` and using an iterator (or peekable iterator) from the null bitmap to move along with the iterator on the selectivity bitset. Further, some of the heatmaps have translated the raw benchmark times into the time per row by scaling the time by how many rows are selected, to standardize measurement across the x axis, making it easier to compare the 2 strategies. Sorry, that didn't end up being so short... I .. hope this didn't make it more confusing 😜 |
To follow-up on the PR description, I let the concise benchmarks finish where nearly all of the rows are selected:
Using
I did not plot these because it seemed to be rather pointless, concise seems to be only viable when using |
Thanks Clint! These /are/ really good benchmarks. I'm curious what
percentage of the entire cost for processing a row a null check would be so
we can have a good idea of what % overhead we're talking about.
…On Tue, Nov 5, 2019 at 12:36 AM Clint Wylie ***@***.***> wrote:
To follow-up on the PR description, I let the concise benchmarks finish
where nearly all of the rows are selected:
Benchmark (bitmapType) (filterMatch) (nullDensity) (numRows) Mode Cnt Score Error Units
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.99999 0 500000 avgt 2 2438.318 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.99999 0.1 500000 avgt 2 261150067.426 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.99999 0.25 500000 avgt 2 430133586.339 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.99999 0.5 500000 avgt 2 623322588.940 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.99999 0.75 500000 avgt 2 449875260.568 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.99999 0.99 500000 avgt 2 29015358.505 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.99999 0 500000 avgt 2 2316.307 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.99999 0.1 500000 avgt 2 4158.871 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.99999 0.25 500000 avgt 2 6041.184 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.99999 0.5 500000 avgt 2 8397.707 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.99999 0.75 500000 avgt 2 6129.792 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.99999 0.99 500000 avgt 2 3511.961 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.99999 0 500000 avgt 2 2440.615 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.99999 0.1 500000 avgt 2 4431.640 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.99999 0.25 500000 avgt 2 6302.718 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.99999 0.5 500000 avgt 2 8911.671 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.99999 0.75 500000 avgt 2 6809.052 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.99999 0.99 500000 avgt 2 4995.936 us/op
Using get with concise is just terribad. The results for 1% of rows is
similarly pretty awful, though a couple orders of magnitude less bad:
Benchmark (bitmapType) (filterMatch) (nullDensity) (numRows) Mode Cnt Score Error Units
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.01 0 500000 avgt 2 100.110 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.01 0.1 500000 avgt 2 3878501.478 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.01 0.25 500000 avgt 2 4853693.538 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.01 0.5 500000 avgt 2 8399206.128 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.01 0.75 500000 avgt 2 5169724.280 us/op
NullHandlingBitmapGetVsIteratorBenchmark.get concise 0.01 0.99 500000 avgt 2 440739.562 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.01 0 500000 avgt 2 69.756 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.01 0.1 500000 avgt 2 1813.610 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.01 0.25 500000 avgt 2 3364.983 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.01 0.5 500000 avgt 2 3745.194 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.01 0.75 500000 avgt 2 2702.778 us/op
NullHandlingBitmapGetVsIteratorBenchmark.iterator concise 0.01 0.99 500000 avgt 2 1874.390 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.01 0 500000 avgt 2 77.929 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.01 0.1 500000 avgt 2 1592.171 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.01 0.25 500000 avgt 2 2216.227 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.01 0.5 500000 avgt 2 3535.110 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.01 0.75 500000 avgt 2 2400.203 us/op
NullHandlingBitmapGetVsIteratorBenchmark.peekableIterator concise 0.01 0.99 500000 avgt 2 436.322 us/op
I did not plot these because it seemed to be rather pointless, concise
seems to be *only* viable when using IntIterator or PeekableIntIterator.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#8822>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPSYCV5IXHLEI6HE5RKM3DQSEWAJANCNFSM4JI4LCNA>
.
--
Jad Naous
Imply | VP R&D
650-521-3425
jad.naous@imply.io
|
Cool heatmaps! Iteration should definitely perform better than calls to It looks like the operation in question ( |
The last 3 animations show the estimated per row cost in nanoseconds for each of the 3 strategies. I will summarize:
To me it kind of seems like a toss up to me which is better between using the |
That sounds nice. I was planning on digging into this for future work to see if I could improve the null vector construction, but even just using the iterator should be an improvement over the existing code, so I didn't investigate that as part of this PR. |
Sorry my question wasn't clear. I meant in terms of the cost of the full
operation on a row. For example, if we're doing a longSum operation on a
column, what would the cost per row be, and what would the percentage
overhead of this check be?
…On Tue, Nov 5, 2019 at 12:02 PM Clint Wylie ***@***.***> wrote:
I'm curious what
percentage of the entire cost for processing a row a null check would be so
we can have a good idea of what % overhead we're talking about.
The last 3 animations show the estimated per row cost in nanoseconds for
each of the 3 strategies. I will summarize:
- get - most of the numbers look to be in the 10-25ns per row range
(higher at low selectivity where it matters most)
- IntIterator - about half are under 10ns per row (at low
selectivity), this is definitely the best, but at super high selectivities
(.1% of rows selected) with very dense bitmaps it climbs to over a couple
of microseconds per row
- PeekableIntIterator - about half are between 10-15ns per row (at low
selectivity), most below 25ns, but also has more overhead with dense
bitmaps at very high selectivity but only climbs to about 50-60ns per row
in the worst case.
To me it kind of seems like a toss up to me which is better between using
the PeekableIntIterator and the plain IntIterator, it almost seems worth
it to eat the slow per row times at high selectivity in exchange for that
5ns per row at low selectivity, but both approaches fair better at low
selectivity than using get, so I went more conservative and used the
PeekableIntIterator for now.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#8822>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPSYCSJAKFZOTCZLBN7VPDQSHGMHANCNFSM4JI4LCNA>
.
--
Jad Naous
Imply | VP R&D
650-521-3425
jad.naous@imply.io
|
The plots are great indeed. Bravo! |
The benchmark added in this PR can't determine that, all I can provide is the fixed cost per row of having nulls. That seems useful to know, but also time intensive so I might not get to it as part of this PR. |
I slightly adjusted the code to try to cut out a few instructions per loop, re-running benchmarks currently, and this time on an |
public class PeekableIteratorAdapter<TIntIterator extends IntIterator> implements PeekableIntIterator | ||
{ | ||
final TIntIterator baseIterator; | ||
Integer mark; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might perform better if you make mark
an int
and then use -1
to signify not being set.
@Override | ||
public boolean isNull() | ||
{ | ||
return nullValueBitmap.get(offset.getOffset()); | ||
final int i = offset.getOffset(); | ||
if (nullMark < i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone calls offset.reset()
(topN does this sometimes) then this will be wrong. I think you need some logic to detect the offset going backwards, and resetting the iterator in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, I forgot about reset, will fix (hopefully without adding too many extra instructions since this is called in a hot loop 😢)
@@ -47,14 +47,38 @@ | |||
retVal = new boolean[offset.getMaxVectorSize()]; | |||
} | |||
|
|||
// Probably not super efficient to call "get" so much, but, no worse than the non-vectorized version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take that, comment!!
if (offset.isContiguous()) { | ||
final int startOffset = offset.getStartOffset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also needs some logic to detect the offset getting reset.
if (offset.isContiguous()) { | ||
final int startOffset = offset.getStartOffset(); | ||
nullIterator.advanceIfNeeded(startOffset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct to call advanceIfNeeded
immediately after calling next
, and then not update nextNullRow
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a mistake, I will fix (and make sure there is test coverage because this looks wrong).
} | ||
} else { | ||
final int[] currentOffsets = offset.getOffsets(); | ||
nullIterator.advanceIfNeeded(currentOffsets[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question.
for (int i = 0; i < offset.getCurrentVectorSize(); i++) { | ||
retVal[i] = nullValueBitmap.get(offset.getOffsets()[i]); | ||
final int row = currentOffsets[i]; | ||
if (row == nextNullRow) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the nulls are [1, 2, 3]
and the offsets are [1, 3, 4]
? I think at some point nextNullRow
will be 2, and row
will be 3, which will cause a false
to get written into retVal
, which is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were correct on this, I have fixed the implementation to only use advanceIfNeeded
and peekNext
which I think makes it a bit easier to follow, though maybe not quite as optimal since it might result in a few additional advanceIfNeeded
operations. I also added tests for this method that catch this, since it wasn't actually covered otherwise previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thank you for the detailed benchmark results.
|
||
@Override | ||
public int next() | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: even though it's missing in the javadoc of IntIterator
, it would be a good convention to throw NoSuchElementException
if hasNext()
returns false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking into the implementations of IntIterator
this is wrapping, it looks like they will throw this exception, so I don't need to have the check here. I also removed the check from peekNext
for the same reason.
import org.roaringbitmap.IntIterator; | ||
import org.roaringbitmap.PeekableIntIterator; | ||
|
||
public class PeekableIteratorAdapter<TIntIterator extends IntIterator> implements PeekableIntIterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please add javadoc?
I repeated the benchmarks after being modified to have the latest changes to make sure everything was still good, this time on an AWS peekable iterator better than get:get better than peekable iterator: |
{ | ||
if (mark < i) { | ||
baseIterator.skipAllBefore(i); | ||
if (baseIterator.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i
is past the end of the set, I'm guessing baseIterator.hasNext
will be false and the mark
will remain unchanged. That means next
will return it, even though it's not >= i
. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops good catch, if there is no next it should reset the mark
if (offsets[0] < offsetMark) { | ||
nullIterator = nullValueBitmap.peekableIterator(); | ||
} | ||
offsetMark = offsets[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd make more sense to set it to the last value of offsets
rather than the first. (With the first-offset approach, I think you'd see some issues if offset arrays on subsequent calls overlapped each other.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, will fix and add a test
if (!nullIterator.hasNext()) { | ||
break; | ||
} | ||
retVal[i] = row == nullIterator.peekNext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not next()
instead of peekNext()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the vectorized version doesn't save the nullMark
like the other version, doing next
here could consume a value that is outside of the range we are building the vector for, and we would lose it for the next vector that has that value.
{ | ||
WrappedRoaringBitmap mutable = new WrappedRoaringBitmap(); | ||
for (int i = 0; i < numRows; i++) { | ||
if (ThreadLocalRandom.current().nextDouble(0.0, 1.0) > 0.7) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to generate N
random bitmaps with a fixed seed. That way, the test is deterministic (but still tests decent variety due to the N
factor).
@clintropolis I finally got around to re-reading the discussion - this was very helpful. I think the part I missed initially was the z-axis in the first few heatmaps show the difference between 2 benchmark runs. I can finally take a pass at reading the code, now that I kinda understand what it's trying to do 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add unit tests for the PeekableIteratorAdapter classes since those seem like foundational classes
public class PeekableIteratorAdapter<TIntIterator extends IntIterator> implements PeekableIntIterator | ||
{ | ||
static final int NOT_SET = -1; | ||
final TIntIterator baseIterator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want all of these to be package private instead of protected?
also nit: empty line between static and class variables
@Override | ||
public void advanceIfNeeded(int i) | ||
{ | ||
while (mark < i && baseIterator.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is i
always guaranteed to be positive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i
is supplied by an Offset
which should always be positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud here - feel free to ignore.
According to the interface javadocs i
is the minVal
that we want to advance to. There is no guarantee that minVal will always be positive. It looks like in right now we only pass in a positive integer, but it's possible someone can pass in negative values in the future? (if the baseIterator is a list of sorted integers - not sure if this is ever the case in druid)
If we set mark
to Integer.MIN_VALUE instead of -1 will that support negative numbers as well without a performance hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest dealing with this, if at all, just through javadocs saying that these interfaces should only be used with nonnegative ints. In practice these iterators are used for iterating over bitmaps that represent row numbers. It's doubly-impossible to see negative numbers: bitmaps cannot store negative ints, and row numbers cannot be negative either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadocs for this don't seem especially necessary to me, I can add if there is any other changes I need to make to this PR, otherwise I'd prefer to skip to avoid churning through CI again
{ | ||
static final int NOT_SET = -1; | ||
final TIntIterator baseIterator; | ||
int mark = NOT_SET; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mark
was confusing for me to understand. Is this nextVal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh, I originally was calling this some form of 'next', however that was also confusing to me because it's actually the previous value of the baseIterator
, just the next value of the adapter. So, I changed it to be mark
since like, functionally this code is marking the position from the iterator it has consumed from to save it for the future. I guess I was thinking like how you mark the position on a buffer to save where you were so that you can go back to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either is fine, maybe a javadoc explaining what it is.
*/ | ||
default PeekableIntIterator peekableIterator() | ||
{ | ||
return new PeekableIteratorAdapter(iterator()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: IntelliJ is complaining about raw types here
return new PeekableIteratorAdapter<>(iterator());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, will change
Description
I would like to add documentation for Druid SQL compatible null handling added in #4349 to the website, in part because it has been merged for quite a while now, but also so that we can feel better about the changes #8566 causes in SQL behavior in Druids native mode (tl;dr some calcite optimizations lead to some query incompatibilities specifically around our treatment of
''
andnull
as equivalent).Part of like.. being responsible and stuff, before adding this documentation I first wanted to collect some data to determine if it was a good idea to in fact document it at this time. The place I was specifically worried about was the
isNull
check, so I added a benchmark,NullHandlingBitmapGetVsIteratorBenchmark
, and ran some experiments to see what to expect as well as if we could do better.The nature of selectors lends itself to using a bitmap iterator as an alternative to calling bitmap.get for every
isNull
check, so I first testedget
vsiterator
on 500k row 'column's with various null bitmap densities and filter selectivities to simulate the overhead of having null value handling selector with each approach. I have so far only collected information on roaring, because withconcise
theget
method take far too long to complete at low selectivity,more on this later.
To help make sense of the numbers collected from this, I created heatmaps to compare the differences between the approaches. With the raw output, these looked something like this which isn't so telling:
Interpolating the data to fill in the gaps yields something a bit prettier, but still not incredibly telling on it's own:
But does start to offer that the iterator is better at dealing with a larger number of rows, diminishing as the density of the bitmap increases.
Scaling the data points to estimate the cost per row in nanoseconds provides a much more useful picture, and shows the abysmal performance of using the iterator at super high selectivity (e.g. 0.1% of rows match and are selected) with really dense bitmaps:
However if we truncate the results and compare areas where the
iterator
is better:to where
get
is betterit does look like the iterator does perform up to 15 ns per row better than using
get
when processing a lot of rows.Searching for a way around the limitation, I was unaware that the
IntIterator
for roaring was actually aPeekableIntIterator
which is an iterator that offers anadvanceIfNeeded
method that allows skipping the iterator ahead to an index.ConciseSet
actually has a similar method on it's iterator,skipAllBefore
, which is used by thecontains
method that theget
of concise uses! This is likely why concise just flat out wouldn't complete the benchmarks when processing a higher number of rows, because everyget
is creating an iterator, skipping to the position, loop checking to see if the iterator contains the index or passes it, and then throwing it away.Adding the 'peekable' iterator to the benchmark had a similar outcome to using the plain iterator,
but without nearly the overhead at high selectivity on dense bitmaps.
It's still worse, but not nearly as bad, no more than 60ns slower per row when processing a small number of rows than get, compared to over 2 microseconds for using the plain iterator.
Peekable iterator better:
Get better than peekable iterator:
Finally, to get a handle on the per row cost in nanoseconds of checking for nulls at all, I threw together these animations:
get:
iterator:
peekable iterator:
Based on these results, this PR swaps the
isNull
check to usePeekableIntIterator
in order to optimize the case where the selector is going to be processing a larger number of rows.PeekableIntIterator
was chosen overIntIterator
because it offers nearly the same performance increase at low selectivity, without nearly as severe of a performance loss for high selectivity.This PR has:
Key changed/added classes in this PR
ImmutableBitmap
,WrappedRoaringBitmap
,WrappedConciseBitmap
,WrappedImmutableRoaringBitmap
,WrappedImmutableConciseBitmap
,ColumnarDoubles
,ColumnarLongs
,ColumnarFloats