New Histogram field mapper that supports percentiles aggregations. #48580

Merged
merged 30 commits into from
Nov 28, 2019
c4bfdb7
Add HistogramField.
iverase Oct 28, 2019
550394c
checkStyle
iverase Oct 28, 2019
9d4f9c4
more checkStyle
iverase Oct 28, 2019
4e3eed7
Addressed part of the review
iverase Oct 29, 2019
a168d32
Extract the logic of creating a new histogram to a separate method
iverase Oct 29, 2019
038d429
Addressed more comments.
iverase Oct 29, 2019
edc2faf
formatting
iverase Oct 29, 2019
c527aec
extract logic for getting histogram in TDigest
iverase Oct 29, 2019
bd59238
remove unused imports
iverase Oct 29, 2019
71886a8
rename test class
iverase Oct 29, 2019
793a257
Detect in the constructor if we expect histogram value source
iverase Oct 29, 2019
579c05c
revert last change
iverase Oct 29, 2019
af1249f
Values must be provided in increasing order
iverase Oct 31, 2019
1cb8f53
Handling null value and do not fail if arrays are empty, treat it as a
iverase Oct 31, 2019
93229e5
Handle ignore malformed properly
iverase Oct 31, 2019
996f8fc
Merge branch 'master' into histogramField
iverase Oct 31, 2019
edec448
initial documentation for the new field
iverase Oct 31, 2019
adf12a4
initial documentation for the new field
iverase Oct 31, 2019
3c5892e
Addressed docs review
iverase Nov 1, 2019
19f15a2
Add HistogramFieldTypeTests
iverase Nov 1, 2019
1f6383d
address last review comments
iverase Nov 3, 2019
fe039ee
Merge branch 'master' into histogramField
iverase Nov 3, 2019
40f679d
Merge branch 'master' into histogramField
iverase Nov 15, 2019
79f7fd9
Merge branch 'master' into histogramField
iverase Nov 27, 2019
fbabf1c
Make sure that in ignore malformed we move to the end of the
iverase Nov 27, 2019
f1a1ead
address review comments
iverase Nov 28, 2019
c8a1f12
remove support for parsed fields
iverase Nov 28, 2019
0045a8b
Merge branch 'master' into histogramField
elasticmachine Nov 28, 2019
f8cf1a7
addressed last comments
iverase Nov 28, 2019
2e8649a
Merge branch 'histogramField' of github.com:iverase/elasticsearch int…
iverase Nov 28, 2019
@@ -0,0 +1,32 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.fielddata;


/**
* {@link AtomicFieldData} specialization for histogram data.
*/
public interface AtomicHistogramFieldData extends AtomicFieldData {

/**
* Return Histogram values.
*/
HistogramValues getHistogramValues();

}
@@ -0,0 +1,48 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.fielddata;

import java.io.IOException;

/**
* Per-document histogram value. Every entry of the histogram consists of
* a value and a count.
*/
public abstract class HistogramValue {

/**
* Advance this instance to the next value of the histogram
* @return true if there is a next value
*/
public abstract boolean next() throws IOException;

/**
* The current value of the histogram
* @return the current value of the histogram
*/
public abstract double value();

/**
* The current count of the histogram
* @return the current count of the histogram
*/
public abstract int count();

}
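
Review note: the cursor contract above (call `next()` first, then read `value()` and `count()`) can be illustrated with a minimal array-backed sketch. Everything here is hypothetical and not part of this PR: `HistogramValueSketch`, its nested `Value` stand-in (which only mirrors the abstract class so the example compiles without Elasticsearch), `ArrayValue`, and `totalCount` are names invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class HistogramValueSketch {

    // Hypothetical stand-in mirroring the abstract HistogramValue in the diff.
    abstract static class Value {
        abstract boolean next() throws IOException;
        abstract double value();
        abstract int count();
    }

    // Array-backed cursor, positioned before the first entry until next() is called.
    static final class ArrayValue extends Value {
        private final double[] values;
        private final int[] counts;
        private int position = -1;

        ArrayValue(double[] values, int[] counts) {
            if (values.length != counts.length) {
                throw new IllegalArgumentException("values and counts must have the same length");
            }
            this.values = values;
            this.counts = counts;
        }

        @Override
        boolean next() {
            if (position + 1 < values.length) {
                position++;
                return true;
            }
            return false;
        }

        @Override
        double value() {
            return values[position];
        }

        @Override
        int count() {
            return counts[position];
        }
    }

    // Sums the counts of all entries; this consumes the cursor.
    static long totalCount(Value histogram) {
        try {
            long total = 0;
            while (histogram.next()) {
                total += histogram.count();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Value histogram = new ArrayValue(new double[] {0.1, 0.2, 0.3}, new int[] {3, 7, 23});
        System.out.println(totalCount(histogram));
    }
}
```

Because the cursor is forward-only, a caller that needs two passes over the same histogram would have to obtain a fresh cursor; the sketch does not attempt to rewind.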
@@ -0,0 +1,41 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.fielddata;

import java.io.IOException;

/**
* Per-segment histogram values.
*/
public abstract class HistogramValues {

/**
* Advance this instance to the given document id
* @return true if there is a value for this document
*/
public abstract boolean advanceExact(int doc) throws IOException;

/**
* Get the {@link HistogramValue} associated with the current document.
* The returned {@link HistogramValue} might be reused across calls.
*/
public abstract HistogramValue histogram();

}
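
Review note: the Javadoc says the returned `HistogramValue` "might be reused across calls". A small in-memory sketch of what that could look like, and of the `advanceExact` then `histogram()` calling pattern, follows. All names (`HistogramValuesSketch`, `InMemoryValues`, `Cursor`, `countForDoc`) are hypothetical, and the nested `Value` / `Values` classes are stand-ins for the abstract classes in this PR, not the real ones.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class HistogramValuesSketch {

    // Hypothetical stand-ins mirroring HistogramValue and HistogramValues.
    abstract static class Value {
        abstract boolean next() throws IOException;
        abstract double value();
        abstract int count();
    }

    abstract static class Values {
        abstract boolean advanceExact(int doc) throws IOException;
        abstract Value histogram();
    }

    // One mutable cursor that is re-pointed at each document's arrays.
    private static final class Cursor extends Value {
        private double[] values = new double[0];
        private int[] counts = new int[0];
        private int position = -1;

        void reset(double[] newValues, int[] newCounts) {
            values = newValues;
            counts = newCounts;
            position = -1;
        }

        @Override
        boolean next() {
            if (position + 1 < values.length) {
                position++;
                return true;
            }
            return false;
        }

        @Override
        double value() {
            return values[position];
        }

        @Override
        int count() {
            return counts[position];
        }
    }

    // Per-"segment" values held in memory; a null row means the doc has no histogram.
    static final class InMemoryValues extends Values {
        private final double[][] valuesByDoc;
        private final int[][] countsByDoc;
        private final Cursor cursor = new Cursor();

        InMemoryValues(double[][] valuesByDoc, int[][] countsByDoc) {
            this.valuesByDoc = valuesByDoc;
            this.countsByDoc = countsByDoc;
        }

        @Override
        boolean advanceExact(int doc) {
            if (doc < 0 || doc >= valuesByDoc.length || valuesByDoc[doc] == null) {
                return false;
            }
            cursor.reset(valuesByDoc[doc], countsByDoc[doc]);
            return true;
        }

        @Override
        Value histogram() {
            return cursor; // same instance on every call, so callers must not hold on to it
        }
    }

    // The calling pattern used by the aggregators in this PR: advance, then iterate.
    static long countForDoc(Values values, int doc) {
        try {
            long total = 0;
            if (values.advanceExact(doc)) {
                Value histogram = values.histogram();
                while (histogram.next()) {
                    total += histogram.count();
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reusing one cursor avoids allocating an object per document, at the cost that a reference obtained for one document is silently re-pointed when `advanceExact` is called for the next.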
@@ -0,0 +1,34 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.fielddata;


import org.elasticsearch.index.Index;
import org.elasticsearch.index.fielddata.plain.DocValuesIndexFieldData;

/**
* Specialization of {@link IndexFieldData} for histograms.
*/
public abstract class IndexHistogramFieldData extends DocValuesIndexFieldData implements IndexFieldData<AtomicHistogramFieldData> {

public IndexHistogramFieldData(Index index, String fieldName) {
super(index, fieldName);
}
}
@@ -26,12 +26,13 @@
import org.elasticsearch.common.util.ArrayUtils;
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.common.util.ObjectArray;
import org.elasticsearch.index.fielddata.HistogramValue;
import org.elasticsearch.index.fielddata.HistogramValues;
import org.elasticsearch.index.fielddata.SortedNumericDoubleValues;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.Aggregator;
import org.elasticsearch.search.aggregations.LeafBucketCollector;
import org.elasticsearch.search.aggregations.LeafBucketCollectorBase;
import org.elasticsearch.search.aggregations.metrics.NumericMetricsAggregator;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.internal.SearchContext;
@@ -47,13 +48,13 @@ private static int indexOfKey(double[] keys, double key) {
}

protected final double[] keys;
- protected final ValuesSource.Numeric valuesSource;
+ protected final ValuesSource valuesSource;
protected final DocValueFormat format;
protected ObjectArray<DoubleHistogram> states;
protected final int numberOfSignificantValueDigits;
protected final boolean keyed;

- AbstractHDRPercentilesAggregator(String name, ValuesSource.Numeric valuesSource, SearchContext context, Aggregator parent,
+ AbstractHDRPercentilesAggregator(String name, ValuesSource valuesSource, SearchContext context, Aggregator parent,
double[] keys, int numberOfSignificantValueDigits, boolean keyed, DocValueFormat formatter,
List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
super(name, context, parent, pipelineAggregators, metaData);
@@ -77,7 +78,18 @@ public LeafBucketCollector getLeafCollector(LeafReaderContext ctx,
return LeafBucketCollector.NO_OP_COLLECTOR;
}
final BigArrays bigArrays = context.bigArrays();
- final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
+ if (valuesSource instanceof ValuesSource.Histogram) {
+ final HistogramValues values = ((ValuesSource.Histogram)valuesSource).getHistogramValues(ctx);
+ return collectHistogramValues(values, bigArrays, sub);
+ } else {
+ final SortedNumericDoubleValues values = ((ValuesSource.Numeric)valuesSource).doubleValues(ctx);
+ return collectNumeric(values, bigArrays, sub);
+ }
+
+ }
+
+ private LeafBucketCollector collectNumeric(final SortedNumericDoubleValues values,
+ final BigArrays bigArrays, final LeafBucketCollector sub) {
return new LeafBucketCollectorBase(sub, values) {
@Override
public void collect(int doc, long bucket) throws IOException {
@@ -106,6 +118,36 @@ public void collect(int doc, long bucket) throws IOException {
};
}

+ private LeafBucketCollector collectHistogramValues(final HistogramValues values,
+ final BigArrays bigArrays, final LeafBucketCollector sub) {
+ return new LeafBucketCollectorBase(sub, values) {
+ @Override
+ public void collect(int doc, long bucket) throws IOException {
+ states = bigArrays.grow(states, bucket + 1);
+ DoubleHistogram state = states.get(bucket);
+ if (state == null) {
+ state = new DoubleHistogram(numberOfSignificantValueDigits);
+ // Set the histogram to autosize so it can resize itself as
+ // the data range increases. Resize operations should be
+ // rare as the histogram buckets are exponential (on the top
+ // level). In the future we could expose the range as an
+ // option on the request so the histogram can be fixed at
+ // initialisation and doesn't need resizing.
+ state.setAutoResize(true);
+ states.set(bucket, state);
+ }
+
+ if (values.advanceExact(doc)) {
+ final HistogramValue sketch = values.histogram();
+ while(sketch.next()) {
+ state.recordValueWithCount(sketch.value(), sketch.count());
+ }
+ }
+ }
+ };
+ }
+
+
@Override
public boolean hasMetric(String name) {
return indexOfKey(keys, Double.parseDouble(name)) >= 0;
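
Review note: the histogram path above feeds each `(value, count)` pair into the per-bucket state in one call instead of replaying the value `count` times. The following sketch shows why the two are interchangeable for a rank-based percentile. It is an assumption-laden illustration only: it uses a plain nearest-rank rule rather than HdrHistogram, and `WeightedPercentileSketch`, `percentile`, and `percentileExpanded` are hypothetical names that do not appear in this PR.

```java
import java.util.Arrays;

final class WeightedPercentileSketch {

    // Nearest-rank percentile over pre-aggregated (value, count) pairs.
    // Assumes values are sorted ascending and counts are non-negative.
    static double percentile(double[] values, int[] counts, double percent) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("empty histogram");
        }
        long rank = Math.max(1L, (long) Math.ceil(percent / 100.0 * total));
        long seen = 0;
        for (int i = 0; i < values.length; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return values[i];
            }
        }
        return values[values.length - 1];
    }

    // Reference version: expand every pair into repeated raw values, then apply the same rule.
    static double percentileExpanded(double[] values, int[] counts, double percent) {
        int total = Arrays.stream(counts).sum();
        if (total == 0) {
            throw new IllegalArgumentException("empty histogram");
        }
        double[] raw = new double[total];
        int k = 0;
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < counts[i]; j++) {
                raw[k++] = values[i];
            }
        }
        Arrays.sort(raw);
        int rank = Math.max(1, (int) Math.ceil(percent / 100.0 * total));
        return raw[rank - 1];
    }

    public static void main(String[] args) {
        double[] values = {0.1, 0.2, 0.3, 0.4, 0.5};
        int[] counts = {3, 7, 23, 12, 6};
        System.out.println(percentile(values, counts, 50.0));
        System.out.println(percentileExpanded(values, counts, 50.0));
    }
}
```

The weighted version never materialises the raw observations, which is the point of indexing a pre-aggregated histogram in the first place.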
@@ -25,6 +25,8 @@
import org.elasticsearch.common.util.ArrayUtils;
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.common.util.ObjectArray;
+ import org.elasticsearch.index.fielddata.HistogramValue;
+ import org.elasticsearch.index.fielddata.HistogramValues;
import org.elasticsearch.index.fielddata.SortedNumericDoubleValues;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.Aggregator;
@@ -45,13 +47,13 @@ private static int indexOfKey(double[] keys, double key) {
}

protected final double[] keys;
- protected final ValuesSource.Numeric valuesSource;
+ protected final ValuesSource valuesSource;
protected final DocValueFormat formatter;
protected ObjectArray<TDigestState> states;
protected final double compression;
protected final boolean keyed;

- AbstractTDigestPercentilesAggregator(String name, ValuesSource.Numeric valuesSource, SearchContext context, Aggregator parent,
+ AbstractTDigestPercentilesAggregator(String name, ValuesSource valuesSource, SearchContext context, Aggregator parent,
double[] keys, double compression, boolean keyed, DocValueFormat formatter,
List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
super(name, context, parent, pipelineAggregators, metaData);
@@ -75,7 +77,18 @@ public LeafBucketCollector getLeafCollector(LeafReaderContext ctx,
return LeafBucketCollector.NO_OP_COLLECTOR;
}
final BigArrays bigArrays = context.bigArrays();
- final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
+ if (valuesSource instanceof ValuesSource.Histogram) {
+ final HistogramValues values = ((ValuesSource.Histogram)valuesSource).getHistogramValues(ctx);
+ return collectHistogramValues(values, bigArrays, sub);
+ } else {
+ final SortedNumericDoubleValues values = ((ValuesSource.Numeric)valuesSource).doubleValues(ctx);
+ return collectNumeric(values, bigArrays, sub);
+ }
+
+ }
+
+ private LeafBucketCollector collectNumeric(final SortedNumericDoubleValues values,
+ final BigArrays bigArrays, final LeafBucketCollector sub) {
return new LeafBucketCollectorBase(sub, values) {
@Override
public void collect(int doc, long bucket) throws IOException {
@@ -97,6 +110,28 @@ public void collect(int doc, long bucket) throws IOException {
};
}

+ private LeafBucketCollector collectHistogramValues(final HistogramValues values,
+ final BigArrays bigArrays, final LeafBucketCollector sub) {
+ return new LeafBucketCollectorBase(sub, values) {
+ @Override
+ public void collect(int doc, long bucket) throws IOException {
+ states = bigArrays.grow(states, bucket + 1);
+ TDigestState state = states.get(bucket);
+ if (state == null) {
+ state = new TDigestState(compression);
+ states.set(bucket, state);
+ }
+
+ if (values.advanceExact(doc)) {
+ final HistogramValue sketch = values.histogram();
+ while(sketch.next()) {
+ state.add(sketch.value(), sketch.count());
+ }
+ }
+ }
+ };
+ }
+
@Override
public boolean hasMetric(String name) {
return indexOfKey(keys, Double.parseDouble(name)) >= 0;
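
Review note: both abstract aggregators now pick a leaf collector by checking the runtime type of the values source and lazily create one state object per bucket. A stripped-down sketch of that shape, with no Elasticsearch types, is below. Every name in it is hypothetical (`CollectorDispatchSketch`, `NumericSource`, `HistogramSource`, `State`, `Collector`, `stateFor`), and `State` is only a count-and-sum accumulator standing in for the real sketch state. One deliberate difference from the diff: the sketch rejects unknown source types, whereas the PR's `else` branch casts to `ValuesSource.Numeric` unconditionally.

```java
import java.util.ArrayList;
import java.util.List;

final class CollectorDispatchSketch {

    interface Source {}

    // Stand-in for a numeric source: raw values per document.
    static final class NumericSource implements Source {
        final double[][] valuesByDoc;

        NumericSource(double[][] valuesByDoc) {
            this.valuesByDoc = valuesByDoc;
        }
    }

    // Stand-in for a histogram source: pre-aggregated (value, count) pairs per document.
    static final class HistogramSource implements Source {
        final double[][] valuesByDoc;
        final int[][] countsByDoc;

        HistogramSource(double[][] valuesByDoc, int[][] countsByDoc) {
            this.valuesByDoc = valuesByDoc;
            this.countsByDoc = countsByDoc;
        }
    }

    // Stand-in for the per-bucket state; only tracks a weighted count and sum.
    static final class State {
        long count;
        double sum;

        void add(double value, int weight) {
            count += weight;
            sum += value * weight;
        }
    }

    interface Collector {
        void collect(int doc, int bucket);
    }

    private final List<State> states = new ArrayList<>();

    // Chooses the collection strategy once per source, not once per document.
    Collector collectorFor(Source source) {
        if (source instanceof HistogramSource) {
            final HistogramSource histograms = (HistogramSource) source;
            return (doc, bucket) -> {
                State state = stateFor(bucket);
                double[] values = histograms.valuesByDoc[doc];
                int[] counts = histograms.countsByDoc[doc];
                for (int i = 0; i < values.length; i++) {
                    state.add(values[i], counts[i]);
                }
            };
        } else if (source instanceof NumericSource) {
            final NumericSource numbers = (NumericSource) source;
            return (doc, bucket) -> {
                State state = stateFor(bucket);
                for (double value : numbers.valuesByDoc[doc]) {
                    state.add(value, 1);
                }
            };
        }
        throw new IllegalArgumentException("unsupported source: " + source);
    }

    // Grows the backing list on demand and creates the bucket's state on first use.
    State stateFor(int bucket) {
        while (states.size() <= bucket) {
            states.add(null);
        }
        State state = states.get(bucket);
        if (state == null) {
            state = new State();
            states.set(bucket, state);
        }
        return state;
    }
}
```

Feeding the raw values `1.0, 1.0, 2.0` through the numeric path and the pairs `(1.0, 2), (2.0, 1)` through the histogram path leaves the two states with the same count and sum, which is the property the histogram branch relies on.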
@@ -23,7 +23,7 @@
import org.elasticsearch.search.aggregations.Aggregator;
import org.elasticsearch.search.aggregations.InternalAggregation;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
- import org.elasticsearch.search.aggregations.support.ValuesSource.Numeric;
+ import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.internal.SearchContext;

import java.io.IOException;
@@ -32,9 +32,9 @@

class HDRPercentileRanksAggregator extends AbstractHDRPercentilesAggregator {

- HDRPercentileRanksAggregator(String name, Numeric valuesSource, SearchContext context, Aggregator parent,
- double[] percents, int numberOfSignificantValueDigits, boolean keyed, DocValueFormat format,
- List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
+ HDRPercentileRanksAggregator(String name, ValuesSource valuesSource, SearchContext context, Aggregator parent,
+ double[] percents, int numberOfSignificantValueDigits, boolean keyed, DocValueFormat format,
+ List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
super(name, valuesSource, context, parent, percents, numberOfSignificantValueDigits, keyed, format, pipelineAggregators,
metaData);
}
@@ -25,7 +25,6 @@
import org.elasticsearch.search.aggregations.AggregatorFactory;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.elasticsearch.search.aggregations.support.ValuesSource;
- import org.elasticsearch.search.aggregations.support.ValuesSource.Numeric;
import org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory;
import org.elasticsearch.search.aggregations.support.ValuesSourceConfig;
import org.elasticsearch.search.internal.SearchContext;
@@ -35,13 +34,13 @@
import java.util.Map;

class HDRPercentileRanksAggregatorFactory
- extends ValuesSourceAggregatorFactory<ValuesSource.Numeric> {
+ extends ValuesSourceAggregatorFactory<ValuesSource> {

private final double[] values;
private final int numberOfSignificantValueDigits;
private final boolean keyed;

- HDRPercentileRanksAggregatorFactory(String name, ValuesSourceConfig<Numeric> config, double[] values,
+ HDRPercentileRanksAggregatorFactory(String name, ValuesSourceConfig<ValuesSource> config, double[] values,
int numberOfSignificantValueDigits, boolean keyed, QueryShardContext queryShardContext,
AggregatorFactory parent, AggregatorFactories.Builder subFactoriesBuilder,
Map<String, Object> metaData) throws IOException {
@@ -61,7 +60,7 @@ protected Aggregator createUnmapped(SearchContext searchContext,
}

@Override
- protected Aggregator doCreateInternal(Numeric valuesSource,
+ protected Aggregator doCreateInternal(ValuesSource valuesSource,
SearchContext searchContext,
Aggregator parent,
boolean collectsFromSingleBucket,
@@ -23,7 +23,7 @@
import org.elasticsearch.search.aggregations.Aggregator;
import org.elasticsearch.search.aggregations.InternalAggregation;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
- import org.elasticsearch.search.aggregations.support.ValuesSource.Numeric;
+ import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.internal.SearchContext;

import java.io.IOException;
@@ -32,9 +32,9 @@

class HDRPercentilesAggregator extends AbstractHDRPercentilesAggregator {

- HDRPercentilesAggregator(String name, Numeric valuesSource, SearchContext context, Aggregator parent, double[] percents,
- int numberOfSignificantValueDigits, boolean keyed, DocValueFormat formatter,
- List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
+ HDRPercentilesAggregator(String name, ValuesSource valuesSource, SearchContext context, Aggregator parent, double[] percents,
+ int numberOfSignificantValueDigits, boolean keyed, DocValueFormat formatter,
+ List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
super(name, valuesSource, context, parent, percents, numberOfSignificantValueDigits, keyed, formatter,
pipelineAggregators, metaData);
}