Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QTL] Make URI Exctraction Namespace take more sane arguments #2738

Merged
merged 5 commits into from
May 2, 2016

Conversation

drcrallen
Copy link
Contributor

@drcrallen drcrallen added this to the 0.9.1 milestone Mar 25, 2016
@drcrallen drcrallen changed the title Make URI Exctraction Namespace take more sane arguments [QTL] Make URI Exctraction Namespace take more sane arguments Mar 25, 2016
@drcrallen
Copy link
Contributor Author

This is incompatible but the static config is probably going away anyways.

@@ -119,6 +119,7 @@ protected boolean swapAndClearCache(String namespaceKey, String cacheKey)

final String priorCache = currentNamespaceCache.put(namespaceKey, swapCacheKey);
if (priorCache != null) {
// TODO: resolve what happens here if query is actively going on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please file a github issue for this if there is none already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added #2863

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen @nishantmonu51 you could use a reference counting mechanism similar to segments to avoid closing the resources in query is in flight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, yes. Unless you have some sort of insight, though, this is not straight forward. The issue is that queries do not have a concept of resources associated with them. There are some runners which wrap resources, but there isn't a general concept of query resources that need to be accounted for.

There are quite a few resources that get tied to a particular query though, so it might be worthwhile to add such a convention.

@nishantmonu51
Copy link
Member

👍

|Property|Description|Required|Default|
|--------|-----------|--------|-------|
|`namespace`|The namespace to define|Yes||
|`pollPeriod`|Period between polling for updates|No|0 (only once)|
|`versionRegex`|Regex to help find newer versions of the namespace data|Yes||
|`uri`|URI for the file of interest|No|Use `uriPrefix`|
|`uriPrefix`|A URI which specifies a directory (or other searchable resource) in which to search for files|No|Use `uri`|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this read the most recent file under a prefix ? imagine i have an HDFS dir where prefix is /year/day/hour/lookupDir/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's part of io.druid.storage.hdfs.HdfsFileTimestampVersionFinder which is used to discriminate among multiple files matching the pattern in uriPrefix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we document this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some documentation

@drcrallen
Copy link
Contributor Author

@b-slim this should be fixed now. Any other comments?

|`versionRegex`|Regex to help find newer versions of the namespace data|Yes||
|`uri`|URI for the file of interest|No|Use `uriPrefix`|
|`uriPrefix`|A URI which specifies a directory (or other searchable resource) in which to search for files|No|Use `uri`|
|`fileRegex`|Optional regex for matching the file name under `uriPrefix`. Only used if `uriPrefix` is used|No|`".*"`|
|`namespaceParseSpec`|How to interpret the data at the URI|Yes||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call it lookupParseSpec ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope here

@b-slim
Copy link
Contributor

b-slim commented Apr 27, 2016

@drcrallen please see the comments. My main concern i am not sure why this PR is not implementing the new interfaces ?

@drcrallen
Copy link
Contributor Author

@b-slim because the move to the new interface is in a single PR with no other modifications. I'm not going to modify a bunch of functionality in a PR as big as the lookup framework migration one.

@b-slim
Copy link
Contributor

b-slim commented Apr 27, 2016

@drcrallen can we have an issue to track this by outlining the plan ?

@drcrallen
Copy link
Contributor Author

@b-slim I'm good for that. Give me 15 mins to get it written up

@drcrallen
Copy link
Contributor Author

@b-slim any more comments here?


The `versionRegex` value specifies a regex to use to determine if a filename in the parent path of the uri should be considered when trying to find the latest version. Omitting this setting or setting it equal to `null` will match to all files it can find (equivalent to using `".*"`). The search occurs in the most significant "directory" of the uri.
The `pollPeriod` value specifies the period in ISO 8601 format between checks for updates. If the source of the lookup is capable of providing a timestamp, the lookup will only be updated if it has changed since the prior tick of `pollPeriod`. A value of 0, an absent parameter, or `null` all mean populate once and do not attempt to update. Whenever an update occurs, the updating system will look for a file with the most recent timestamp and assume that one with the most recent data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reading this i can not tell if the update is incremental or is swapping the entire cache. Can we make it more clear please ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explained more here, please look.

@b-slim
Copy link
Contributor

b-slim commented May 2, 2016

LGTM 👍 , i would recommend one more ut where we have new lookup file creation to test the get new version path, but is is not a blocker. it is up to the author to add it or not. (please squash)

@drcrallen
Copy link
Contributor Author

@b-slim IMPLs for io.druid.data.SearchableVersionedDataFinder do have tests for finding new versions. If there are things not covered by those tests, or if you're looking for more integrations with io.druid.data.SearchableVersionedDataFinder impls, please let me know what you're looking for and I can add it in another PR.

@drcrallen
Copy link
Contributor Author

Thanks @b-slim !

@drcrallen drcrallen merged commit 6b957aa into apache:master May 2, 2016
@drcrallen drcrallen deleted the lookupExtensionImprovements branch May 2, 2016 19:54
@b-slim
Copy link
Contributor

b-slim commented May 2, 2016

@drcrallen thought this can go in one commit ?

@drcrallen
Copy link
Contributor Author

@b-slim like this? 6b957aa

@b-slim
Copy link
Contributor

b-slim commented May 2, 2016

@drcrallen my bad but in the top of this pr i can see 5 commits and thought that's what is going to be in the master.

@drcrallen
Copy link
Contributor Author

versionRegex = null;
}
} else {
final Path filePath = Paths.get(extractionNamespace.getUri());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With an s3 lookup, this Paths.get call throws this exception for me,

java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
        at java.nio.file.Paths.get(Paths.java:147) ~[?:1.8.0_66]
        at io.druid.server.namespace.URIExtractionNamespaceFunctionFactory$3.call(URIExtractionNamespaceFunctionFactory.java:154) ~[druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
        at io.druid.server.namespace.URIExtractionNamespaceFunctionFactory$3.call(URIExtractionNamespaceFunctionFactory.java:121) ~[druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
        at io.druid.server.namespace.cache.NamespaceExtractionCacheManager$5.run(NamespaceExtractionCacheManager.java:364) [druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
        at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:582) [guava-16.0.1.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_66]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_66]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_66]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants