[QTL] Implement LookupExtractorFactory of namespaced lookup #2926
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,16 +11,22 @@ Lookups are an <a href="../development/experimental.html">experimental</a> featu | |
Make sure to [include](../../operations/including-extensions.html) `druid-namespace-lookup` as an extension. | ||
|
||
## Configuration | ||
+ <div class="note caution"> | ||
+ Static configuration is no longer supported. Only cluster wide configuration is supported | ||
+ </div> | ||
|
||
+ Cached namespace lookups are appropriate for lookups which are not possible to pass at query time due to their size, | ||
+ or are not desired to be passed at query time because the data is to reside in and be handled by the Druid servers, | ||
+ and are small enough to reasonably populate on a node. This usually means tens to tens of thousands of entries per lookup. | ||
|
||
+ Cached namespace lookups all draw from the same cache pool, allowing each node to have a fixed cache pool that can be used by namespace lookups. | ||
|
||
- Namespaced lookups are appropriate for lookups which are not possible to pass at query time due to their size, | ||
- or are not desired to be passed at query time because the data is to reside in and be handled by the Druid servers. | ||
- Namespaced lookups can be specified as part of the runtime properties file. The property is a list of the namespaces | ||
- described as per the sections on this page. For example: | ||
+ Cached namespace lookups can be specified as part of the [cluster wide config for lookups](../../querying/lookups.html) as a type of `cachedNamespace` | ||
|
||
```json | ||
- druid.query.extraction.namespace.lookups= | ||
- [ | ||
- { | ||
+ { | ||
+ "type": "cachedNamespace", | ||
Review comment: Can we rename the .md file as well? |
||
+ "extractionNamespace": { | ||
"type": "uri", | ||
"namespace": "some_uri_lookup", | ||
"uri": "file:/tmp/prefix/", | ||
|
@@ -33,7 +39,14 @@ described as per the sections on this page. For example: | |
}, | ||
"pollPeriod": "PT5M" | ||
}, | ||
- { | ||
+ "firstCacheTimeout": 0 | ||
+ } | ||
+ ``` | ||
|
||
+ ```json | ||
+ { | ||
+ "type": "cachedNamespace", | ||
+ "extractionNamespace": { | ||
"type": "jdbc", | ||
"namespace": "some_jdbc_lookup", | ||
"connectorConfig": { | ||
|
@@ -46,10 +59,19 @@ described as per the sections on this page. For example: | |
"keyColumn": "mykeyColumn", | ||
"valueColumn": "MyValueColumn", | ||
"tsColumn": "timeColumn" | ||
- } | ||
- ] | ||
+ }, | ||
+ "firstCacheTimeout": 120000, | ||
+ "oneToOne":true | ||
+ } | ||
``` | ||
|
||
+ The parameters are as follows | ||
+ |Property|Description|Required|Default| | ||
+ |--------|-----------|--------|-------| | ||
+ |`extractionNamespace`|Specifies how to populate the local cache. See below|Yes|-| | ||
+ |`firstCacheTimeout`|How long to wait (in ms) for the first run of the cache to populate. 0 indicates to not wait|No|`60000` (1 minute)| | ||
Review comment: What does "first run of the cache" mean? Just from reading the documentation it's unclear why one would want to wait, or what the use case is. Also, would the term "delay" be more appropriate than "timeout" here (similar to
Review comment: If the cache populates within the timeout, then it will effectively be successful. If it does NOT populate within the timeout, then the starting of the extractor factory is considered a failure. |
||
+ |`oneToOne`|If the underlying map is injective (keys and values are unique) then optimizations can occur internally by setting this to `true`|No|`false`| | ||
Review comment: Why not call this isInjective to keep the same terminology throughout?
Review comment: changed |
||
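Reviewer note on `firstCacheTimeout`: the reply above describes the semantics as "populate within the timeout or the start fails", with 0 meaning "do not wait". A minimal Python sketch of that behaviour follows. It is not the extension's implementation; `start_lookup`, `populate`, and `slow_populate` are hypothetical names used only for illustration.

```python
import threading
import time


def start_lookup(populate, first_cache_timeout_ms):
    """Start a (hypothetical) lookup and report whether the start counts as successful.

    populate: callable that fills the cache for the first time.
    first_cache_timeout_ms: how long to wait for that first fill; 0 means do not wait.
    """
    populated = threading.Event()

    def run():
        populate()
        populated.set()

    # The first population runs in the background, as a poller would.
    threading.Thread(target=run, daemon=True).start()

    if first_cache_timeout_ms == 0:
        # Do not wait: the start is reported as successful immediately.
        return True
    # Event.wait returns False if the timeout elapses before the flag is set.
    return populated.wait(first_cache_timeout_ms / 1000.0)


def slow_populate():
    # Stand-in for a slow remote fetch (assumption for the example).
    time.sleep(0.5)


fast_ok = start_lookup(lambda: None, 1000)
slow_fails = start_lookup(slow_populate, 50)
no_wait_ok = start_lookup(slow_populate, 0)
```

Under these assumptions, a slow source with a short timeout fails the start, while the same source with `firstCacheTimeout` set to 0 starts immediately and serves an empty cache until the first population completes.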
|
||
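Reviewer note on `oneToOne`: the table row defines "injective" as keys and values both being unique. A small Python sketch of how one might sanity-check a map before setting the flag follows. The helper names are hypothetical, and the reverse mapping is only one illustration of why injectivity is useful, not a statement of which optimizations Druid applies.

```python
def is_injective(mapping):
    """Return True when no two keys map to the same value."""
    # Dict keys are unique by construction, so only the values need checking.
    return len(set(mapping.values())) == len(mapping)


def reverse_lookup(mapping):
    """Build the value -> key map, which is only unambiguous for injective maps."""
    if not is_injective(mapping):
        raise ValueError("mapping is not injective; reverse lookup is ambiguous")
    return {value: key for key, value in mapping.items()}


countries = {"US": "United States", "CA": "Canada"}
aliases = {"USA": "United States", "US": "United States"}

countries_ok = is_injective(countries)
aliases_ok = is_injective(aliases)
```

Here `countries` could reasonably be marked `"oneToOne": true`, while `aliases` could not, because two keys share one value.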
Proper functionality of Namespaced lookups requires the following extension to be loaded on the broker, peon, and historical nodes: | ||
`druid-namespace-lookup` | ||
|
||
|
@@ -60,7 +82,7 @@ setting namespaces (broker, peon, historical) | |
|
||
|Property|Description|Default| | ||
|--------|-----------|-------| | ||
- |`druid.query.extraction.namespace.cache.type`|Specifies the type of caching to be used by the namespaces. May be one of [`offHeap`, `onHeap`]. `offHeap` uses a temporary file for off-heap storage of the namespace (memory mapped files). `onHeap` stores all cache on the heap in standard java map types.|`onHeap`| | ||
+ |`druid.lookup.namespace.cache.type`|Specifies the type of caching to be used by the namespaces. May be one of [`offHeap`, `onHeap`]. `offHeap` uses a temporary file for off-heap storage of the namespace (memory mapped files). `onHeap` stores all cache on the heap in standard java map types.|`onHeap`| | ||
|
||
Review comment: It might be nice to explain why you would use onHeap vs offHeap and the tradeoffs.
Review comment: added : df6dfc4 |
||
The cache is populated in different ways depending on the settings below. In general, most namespaces employ | ||
a `pollPeriod` at the end of which time they poll the remote resource of interest for updates. | ||
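Reviewer note on `pollPeriod`: the two lines above say that a namespace polls its remote resource at the end of each period. One way to picture a single poll is sketched below in Python. Comparing a version marker before reloading is an assumption made for the example, and `poll_once`, `fetch_version`, and `load` are hypothetical names, not part of the extension.

```python
def poll_once(fetch_version, load, cache):
    """Run one poll cycle against a remote resource.

    fetch_version: callable returning a cheap version marker for the remote data.
    load: callable returning the full key -> value map.
    cache: dict holding the current 'version' and 'data'.
    Returns True if the cache was refreshed, False if it was already current.
    """
    remote_version = fetch_version()
    if cache.get("version") == remote_version:
        # Nothing changed since the last poll; keep serving the existing data.
        return False
    cache["data"] = load()
    cache["version"] = remote_version
    return True


cache = {}
first = poll_once(lambda: "v1", lambda: {"a": "1"}, cache)
second = poll_once(lambda: "v1", lambda: {"a": "1"}, cache)
third = poll_once(lambda: "v2", lambda: {"a": "2"}, cache)
```

A scheduler would call `poll_once` every `pollPeriod` (for example every five minutes for `PT5M`); only the first and third calls above reload the data.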
|
@@ -76,27 +98,25 @@ The remapping values for each namespaced lookup can be specified by a json objec | |
```json | ||
{ | ||
"type":"uri", | ||
- "namespace":"some_lookup", | ||
"uri": "s3://bucket/some/key/prefix/renames-0003.gz", | ||
"namespaceParseSpec":{ | ||
"format":"csv", | ||
"columns":["key","value"] | ||
}, | ||
- "pollPeriod":"PT5M", | ||
+ "pollPeriod":"PT5M" | ||
} | ||
``` | ||
|
||
```json | ||
{ | ||
"type":"uri", | ||
- "namespace":"some_lookup", | ||
"uriPrefix": "s3://bucket/some/key/prefix/", | ||
"fileRegex":"renames-[0-9]*\\.gz", | ||
"namespaceParseSpec":{ | ||
"format":"csv", | ||
"columns":["key","value"] | ||
}, | ||
- "pollPeriod":"PT5M", | ||
+ "pollPeriod":"PT5M" | ||
} | ||
``` | ||
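Reviewer note on `fileRegex`: the doubled backslash in the JSON example is JSON string escaping, so the regex itself is `renames-[0-9]*\.gz`. The Python sketch below decodes the value and shows which file names it accepts. Matching against the whole file name is an assumption made for the example, and the file names are made up.

```python
import json
import re

# The JSON text as it would appear in the spec; the raw string keeps both backslashes.
SPEC_JSON = r'{"uriPrefix": "s3://bucket/some/key/prefix/", "fileRegex": "renames-[0-9]*\\.gz"}'

spec = json.loads(SPEC_JSON)
file_regex = spec["fileRegex"]  # decoded value contains a single backslash


def matching_files(names, pattern):
    """Return the names that match the pattern in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


candidates = ["renames-0003.gz", "renames-.gz", "renames-0003.gz.bak", "other.gz"]
matched = matching_files(candidates, file_regex)
```

Note that `[0-9]*` also accepts zero digits, so `renames-.gz` matches; a stricter pattern such as `renames-[0-9]+\\.gz` (in JSON) would exclude it.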
|Property|Description|Required|Default| | ||
|
@@ -250,3 +270,7 @@ The JDBC lookups will poll a database to populate its local cache. If the `tsCol | |
"pollPeriod":600000 | ||
} | ||
``` | ||
|
||
+ # Introspection | ||
|
||
+ Cached namespace lookups have introspection points at `/keys` and `/values` which return a complete set of the keys and values (respectively) in the lookup. Introspection to `/` returns the entire map. Introspection to `/version` returns the version indicator for the lookup, or a 404 on a race condition during a delete. | ||
Review comment: Why would delete be a race condition? If the lookup is deleted it seems normal to return 404 |
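Reviewer note on the introspection paths: the four paths described above can be modelled over an in-memory map as in the Python sketch below. This is an illustration only. The response shapes, the sorted ordering, and the `introspect` function are assumptions, not the extension's HTTP handler, and a missing version stands in for the "deleted while being read" case.

```python
def introspect(lookup_map, version, path):
    """Return (status, body) for an introspection path on a hypothetical lookup.

    lookup_map: the key -> value map held by the lookup.
    version: the lookup's version indicator, or None if the lookup was deleted.
    """
    if path == "/":
        return 200, dict(lookup_map)
    if path == "/keys":
        # Sorted only to make the example deterministic.
        return 200, sorted(lookup_map.keys())
    if path == "/values":
        return 200, sorted(lookup_map.values())
    if path == "/version":
        if version is None:
            # The lookup disappeared between routing and reading the version.
            return 404, None
        return 200, {"version": version}
    return 404, None


renames = {"a": "apple", "b": "banana"}
keys_response = introspect(renames, "v7", "/keys")
version_response = introspect(renames, "v7", "/version")
deleted_response = introspect(renames, None, "/version")
```

Modelled this way, the 404 on `/version` is simply what a caller sees when the lookup no longer exists, which is the point raised in the comment above.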
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -77,5 +77,10 @@ | |
<version>3.0.1</version> | ||
Review comment: Can we move this version into the parent pom?
Review comment: Does it need to? There are other extensions that have extension-specific (i.e. used nowhere else in Druid) library versions in their pom. Is there a reason why this one needs to be in the parent pom? |
||
<scope>test</scope> | ||
</dependency> | ||
+ <dependency> | ||
+ <groupId>org.easymock</groupId> | ||
+ <artifactId>easymock</artifactId> | ||
+ <scope>test</scope> | ||
+ </dependency> | ||
</dependencies> | ||
</project> |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,7 +23,6 @@ | |
import com.google.common.io.ByteSource; | ||
import com.google.common.io.LineProcessor; | ||
import com.metamx.common.parsers.Parser; | ||
|
||
import java.io.IOException; | ||
Review comment: Minor nit, I don't think we need to change formatting here
Review comment: fixed |
||
import java.util.Map; | ||
|
||
|
This file was deleted.
Review comment: Can we add a sentence or two about what "cached" means? For example: "CachedLookup provides a global pool of memory to cache lookups ..."
Review comment: Adding: "Cached namespace lookups all draw from the same cache pool, allowing each node to have a fixed cache pool that can be used by namespace lookups."