[QTL] Make URI Exctraction Namespace take more sane arguments #2738

drcrallen · 2016-03-25T18:21:28Z

Fixes #2669 (Better URI namespaced lookups behavior with single files)

drcrallen · 2016-04-06T16:17:25Z

This is incompatible, but the static config is probably going away anyway.

nishantmonu51 · 2016-04-18T12:38:30Z

...up/src/main/java/io/druid/server/namespace/cache/OffHeapNamespaceExtractionCacheManager.java

@@ -119,6 +119,7 @@ protected boolean swapAndClearCache(String namespaceKey, String cacheKey)

      final String priorCache = currentNamespaceCache.put(namespaceKey, swapCacheKey);
      if (priorCache != null) {
+        // TODO: resolve what happens here if query is actively going on


Please file a GitHub issue for this if there is none already.

added #2863

@drcrallen @nishantmonu51 you could use a reference counting mechanism, similar to the one used for segments, to avoid closing the resources while a query is in flight.

Eventually, yes. Unless you have some sort of insight, though, this is not straightforward. The issue is that queries do not have a concept of resources associated with them. There are some runners which wrap resources, but there isn't a general concept of query resources that need to be accounted for.

There are quite a few resources that get tied to a particular query though, so it might be worthwhile to add such a convention.
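For illustration only, here is a minimal sketch of the reference counting idea discussed above. `RefCountedResource` is a hypothetical class written for this comment, not Druid's segment reference counting and not an existing Druid API. It assumes the cache owner holds one reference, each in-flight query acquires another, and the cleanup callback runs only when the last reference is released.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not a Druid class): defers cleanup until every holder has released.
final class RefCountedResource implements Closeable {
  // Starts at 1: the owner (e.g. the cache manager) holds the initial reference.
  private final AtomicInteger refs = new AtomicInteger(1);
  private final AtomicBoolean ownerClosed = new AtomicBoolean(false);
  private final Runnable cleanup;

  RefCountedResource(Runnable cleanup) {
    this.cleanup = cleanup;
  }

  // Called by a query before it uses the resource.
  // Returns false if the resource has already been fully released.
  boolean acquire() {
    while (true) {
      final int n = refs.get();
      if (n <= 0) {
        return false;
      }
      if (refs.compareAndSet(n, n + 1)) {
        return true;
      }
    }
  }

  // Called by a query when it is done. The last release triggers the cleanup.
  // This sketch does not guard against callers releasing more often than they acquire.
  void release() {
    if (refs.decrementAndGet() == 0) {
      cleanup.run();
    }
  }

  // Called by the owner when the cache is swapped out. Drops the owner's
  // reference at most once; cleanup waits for in-flight queries.
  @Override
  public void close() {
    if (ownerClosed.compareAndSet(false, true)) {
      release();
    }
  }
}
```

The hard part raised in this thread is not the counter itself but where a query would call `acquire()` and `release()`, since queries have no general notion of attached resources.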

nishantmonu51 · 2016-04-18T12:40:57Z

👍

b-slim · 2016-04-20T20:34:12Z

docs/content/development/extensions-core/namespaced-lookup.md

 |Property|Description|Required|Default|
 |--------|-----------|--------|-------|
 |`namespace`|The namespace to define|Yes||
 |`pollPeriod`|Period between polling for updates|No|0 (only once)|
-|`versionRegex`|Regex to help find newer versions of the namespace data|Yes||
+|`uri`|URI for the file of interest|No|Use `uriPrefix`|
+|`uriPrefix`|A URI which specifies a directory (or other searchable resource) in which to search for files|No|Use `uri`|


Can this read the most recent file under a prefix? Imagine I have an HDFS dir where the prefix is /year/day/hour/lookupDir/

Yes, that's part of io.druid.storage.hdfs.HdfsFileTimestampVersionFinder which is used to discriminate among multiple files matching the pattern in uriPrefix

can we document this ?

added some documentation
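As a rough sketch of the "most recent file under a prefix" behavior described above, the following shows the idea on a local directory using plain `java.nio`. It is not the `HdfsFileTimestampVersionFinder` implementation; `LatestFileFinder` and its `latest` method are hypothetical names, and the sketch assumes that "most recent" means the largest last-modified time among regular files whose names fully match the regex.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: local-filesystem analogue of a timestamp-based version finder.
final class LatestFileFinder {
  // Returns the regular file directly under dir whose name matches fileRegex
  // and whose last-modified time is the newest, or empty if nothing matches.
  static Optional<Path> latest(Path dir, Pattern fileRegex) throws IOException {
    try (Stream<Path> files = Files.list(dir)) {
      Path best = null;
      FileTime bestTime = null;
      for (Path p : (Iterable<Path>) files::iterator) {
        if (!Files.isRegularFile(p)) {
          continue;
        }
        if (!fileRegex.matcher(p.getFileName().toString()).matches()) {
          continue;
        }
        final FileTime t = Files.getLastModifiedTime(p);
        if (bestTime == null || t.compareTo(bestTime) > 0) {
          best = p;
          bestTime = t;
        }
      }
      return Optional.ofNullable(best);
    }
  }
}
```

A call such as `LatestFileFinder.latest(dir, Pattern.compile(".*"))` would consider every file in the directory and return the one modified most recently.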

drcrallen · 2016-04-25T18:57:54Z

@b-slim this should be fixed now. Any other comments?

b-slim · 2016-04-27T14:57:03Z

docs/content/development/extensions-core/namespaced-lookup.md

-|`versionRegex`|Regex to help find newer versions of the namespace data|Yes||
+|`uri`|URI for the file of interest|No|Use `uriPrefix`|
+|`uriPrefix`|A URI which specifies a directory (or other searchable resource) in which to search for files|No|Use `uri`|
+|`fileRegex`|Optional regex for matching the file name under `uriPrefix`. Only used if `uriPrefix` is used|No|`".*"`|
 |`namespaceParseSpec`|How to interpret the data at the URI|Yes||


call it lookupParseSpec ?

Out of scope here

b-slim · 2016-04-27T15:08:57Z

@drcrallen please see the comments. My main concern is that I am not sure why this PR is not implementing the new interfaces.

drcrallen · 2016-04-27T16:21:04Z

@b-slim because the move to the new interface is in a single PR with no other modifications. I'm not going to modify a bunch of functionality in a PR as big as the lookup framework migration one.

b-slim · 2016-04-27T16:29:03Z

@drcrallen can we have an issue to track this by outlining the plan ?

drcrallen · 2016-04-27T16:29:31Z

@b-slim I'm good for that. Give me 15 mins to get it written up

drcrallen · 2016-05-02T16:39:03Z

@b-slim any more comments here?

b-slim · 2016-05-02T16:58:37Z

docs/content/development/extensions-core/namespaced-lookup.md


-The `versionRegex` value specifies a regex to use to determine if a filename in the parent path of the uri should be considered when trying to find the latest version. Omitting this setting or setting it equal to `null` will match to all files it can find (equivalent to using `".*"`). The search occurs in the most significant "directory" of the uri.
+The `pollPeriod` value specifies the period in ISO 8601 format between checks for updates. If the source of the lookup is capable of providing a timestamp, the lookup will only be updated if it has changed since the prior tick of `pollPeriod`. A value of 0, an absent parameter, or `null` all mean populate once and do not attempt to update. Whenever an update occurs, the updating system will look for a file with the most recent timestamp and assume that one with the most recent data.


Reading this, I cannot tell whether the update is incremental or swaps the entire cache. Can we make it clearer, please?

Explained more here, please look.
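To make the question above concrete, here is a minimal sketch of the "swap the entire cache" reading, which is what the `swapAndClearCache` method name in this PR suggests. `SwappingLookup` and `poll` are hypothetical names, not the extension's code. The sketch assumes each poll tick is given the source's timestamp and only reloads when that timestamp is newer than the one last loaded.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: a lookup that is replaced wholesale on update, never patched in place.
final class SwappingLookup {
  private final AtomicReference<Map<String, String>> current =
      new AtomicReference<>(Collections.<String, String>emptyMap());
  private long lastVersion = Long.MIN_VALUE;

  // One poll tick. Reloads only when the source timestamp is newer than the
  // last one loaded; returns true if a swap happened.
  synchronized boolean poll(long sourceTimestamp, Supplier<Map<String, String>> loader) {
    if (sourceTimestamp <= lastVersion) {
      return false; // unchanged since the prior tick, keep the existing map
    }
    // Build a fresh immutable snapshot, then publish it in one step.
    final Map<String, String> fresh =
        Collections.unmodifiableMap(new HashMap<>(loader.get()));
    current.set(fresh);
    lastVersion = sourceTimestamp;
    return true;
  }

  // Readers always see either the old map or the new one, never a mix.
  String get(String key) {
    return current.get().get(key);
  }
}
```

Under this reading, a key that is present in the old file but absent from the new one disappears after the swap, which would not be true of an incremental update.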

b-slim · 2016-05-02T19:48:33Z

LGTM 👍, I would recommend one more unit test where a new lookup file is created, to test the get-new-version path, but it is not a blocker. It is up to the author to add it or not. (Please squash.)

drcrallen · 2016-05-02T19:51:32Z

@b-slim Implementations of io.druid.data.SearchableVersionedDataFinder do have tests for finding new versions. If there are things not covered by those tests, or if you're looking for more integrations with io.druid.data.SearchableVersionedDataFinder implementations, please let me know what you're looking for and I can add it in another PR.

drcrallen · 2016-05-02T19:53:37Z

Thanks @b-slim !

b-slim · 2016-05-02T22:02:22Z

@drcrallen I thought this could go in as one commit?

drcrallen · 2016-05-02T22:07:53Z

@b-slim like this? 6b957aa

b-slim · 2016-05-02T22:17:12Z

@drcrallen my bad, but at the top of this PR I can see 5 commits and thought that's what was going to be in master.

drcrallen · 2016-05-02T22:19:19Z

@b-slim nope! magic! https://groups.google.com/d/msg/druid-development/f_-1WloZ5uU/EVFxhnGxAgAJ

gianm · 2016-05-06T19:09:10Z

...ce-lookup/src/main/java/io/druid/server/namespace/URIExtractionNamespaceFunctionFactory.java

+            versionRegex = null;
+          }
+        } else {
+          final Path filePath = Paths.get(extractionNamespace.getUri());


With an s3 lookup, this Paths.get call throws this exception for me,

    java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
        at java.nio.file.Paths.get(Paths.java:147) ~[?:1.8.0_66]
        at io.druid.server.namespace.URIExtractionNamespaceFunctionFactory$3.call(URIExtractionNamespaceFunctionFactory.java:154) ~[druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
        at io.druid.server.namespace.URIExtractionNamespaceFunctionFactory$3.call(URIExtractionNamespaceFunctionFactory.java:121) ~[druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
        at io.druid.server.namespace.cache.NamespaceExtractionCacheManager$5.run(NamespaceExtractionCacheManager.java:364) [druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
        at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:582) [guava-16.0.1.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_66]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_66]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_66]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
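The failure reported above can be reproduced outside Druid: `Paths.get(URI)` only works for schemes that have an installed NIO `FileSystemProvider`, and a stock JVM has none for `s3`. The sketch below shows that check plus one possible way to get a file name from the URI without going through `Paths`. `UriFileNames`, `hasNioProvider`, and `fileName` are hypothetical names, and this is not necessarily how the PR or any follow-up fixes the problem.

```java
import java.net.URI;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Paths;

// Hypothetical helper illustrating the Paths.get(URI) limitation.
final class UriFileNames {
  // True when this JVM can turn the URI into a java.nio.file.Path.
  static boolean hasNioProvider(URI uri) {
    try {
      Paths.get(uri);
      return true;
    } catch (FileSystemNotFoundException | IllegalArgumentException e) {
      // No provider for the scheme, or the URI is not valid for its provider.
      return false;
    }
  }

  // Scheme-independent alternative: take the last segment of the URI path.
  static String fileName(URI uri) {
    final String path = uri.getPath();
    if (path == null) {
      return ""; // opaque URIs such as "mailto:..." have no path
    }
    final int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }
}
```

With a URI such as `s3://bucket/dir/lookup.json` (a made-up example), `hasNioProvider` returns false on a JVM without an S3 provider, while `fileName` still yields `lookup.json`.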

Make URI Exctraction Namespace take more sane arguments

608d568

* Fixes apache#2669

drcrallen added the Improvement label Mar 25, 2016

drcrallen added this to the 0.9.1 milestone Mar 25, 2016

drcrallen changed the title ~~Make URI Exctraction Namespace take more sane arguments~~ [QTL] Make URI Exctraction Namespace take more sane arguments Mar 25, 2016

drcrallen added Incompatible and removed Incompatible labels Apr 6, 2016

nishantmonu51 reviewed Apr 18, 2016

drcrallen mentioned this pull request Apr 20, 2016

Investigate off-heap lookup caching potential race on cache swapping. #2863

Closed

b-slim reviewed Apr 20, 2016

drcrallen added 3 commits April 21, 2016 15:10

Update docs

04a681f

Rename error message

2f02446

Undo overzealous deletion of docs

3ed2cf2

b-slim reviewed Apr 27, 2016

drcrallen mentioned this pull request Apr 27, 2016

[QTL] Immediate future plans #2889

Closed

b-slim reviewed May 2, 2016

Explain caching mechanism a bit more in docs

47a4148

drcrallen merged commit 6b957aa into apache:master May 2, 2016

drcrallen deleted the lookupExtensionImprovements branch May 2, 2016 19:54

gianm reviewed May 6, 2016

gianm mentioned this pull request May 27, 2016

Wires crossed on lookup caching #3031

Closed




