Druid Lookups introspect keys and values endpoints do not return valid JSON #17361

Open
teyeheimans opened this issue Oct 16, 2024 · 6 comments

Comments

@teyeheimans

Description

While analyzing the lookup features of Druid, I noticed that the keys and values endpoints for lookups do not return valid JSON.

https://druid.apache.org/docs/latest/querying/lookups#introspect-a-lookup

Example response:

"[20416, 20404, 20415, 02F440, 02F461, 20420, 02F402, 02F480, 20408, 20409, 20410, 20412, 20402, 02F421, 02F420, 20601, 02F601, 02F620, VODAFONE, CLARO]

It seems that all keys or values are simply joined with ", " and wrapped in square brackets, without quoting the individual strings.

Finally, the documentation seems incorrect on this page:
https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection

It states:

Introspection to / returns the entire map. Introspection to /version returns the version indicator for the lookup.

However, /version does not seem to work and returns a 404.

Motivation

As far as I know, all other API endpoints return valid JSON, but the introspect keys and values endpoints do not. In my opinion, this is incorrect.

@ashwintumma23
Contributor

Hi @teyeheimans,
What type of lookup are you creating?

Map Lookup

  • With the following configuration,
{
  "type": "map",
  "map": {
    "1": "One",
    "2": "Two",
    "3": "Three"
  }
}

I do see the key-value pairs, keys, and values returned correctly and formatted as JSON:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/    
{"1":"One","2":"Two","3":"Three"}

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/keys   
[1, 2, 3]        

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values     
[One, Two, Three]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/version
-- Does not return anything                                                                                                                                         
  • The /version endpoint is not implemented in MapLookupIntrospectionHandler; hence, we do not see a response.

cachedNamespace Lookup

  • With the following configuration
{
  "type": "cachedNamespace",
  "extractionNamespace": {
    "type": "uri",
    "uri": "file:/tmp/sampleCSV.csv",
    "namespaceParseSpec": {
      "format": "csv",
      "columns": [
        "key",
        "value"
      ],
      "skipHeaderRows": 1
    },
    "pollPeriod": "PT30S"
  },
  "firstCacheTimeout": 0
}

I see all the endpoints returning responses:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/        
{"20":"Twenty","10":"Ten","30":"Thirty"}    
                                                                                                                                    
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/keys
["20","10","30"]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/values
["Twenty","Ten","Thirty"]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/version
{"version":"1729184323236"}
  • One caveat to call out here: the /version endpoint does not return the version that was set manually when the lookup was created, but an epoch time. I see the version as v1 in the console, but 1729184323236 in the introspect API response.

Thanks!

@teyeheimans
Author

I am using a map lookup, just like you. Your example shows the problem already:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/    
{"1":"One","2":"Two","3":"Three"}

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/keys   
[1, 2, 3]        

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values     
[One, Two, Three]

The values returned in your example are NOT valid JSON: they are not quoted. The correct response would be:

["One", "Two", "Three"]

Also, to check if it is valid JSON you could use jq:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values | jq '.'

This also happens when the keys are strings. So the keys and values endpoints of the introspect API are NOT returning valid JSON.
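The same check can be scripted. The minimal Python sketch below makes no Druid call; it only feeds the response bodies quoted in this thread to a strict JSON parser. The helper name is_valid_json is hypothetical. Note that the keys body [1, 2, 3] happens to parse, but as numbers rather than as the string keys of the map.

```python
import json


def is_valid_json(body: str) -> bool:
    """Return True if a strict JSON parser accepts the response body."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False


# Response bodies copied from the curl output quoted in this thread.
full_map = '{"1":"One","2":"Two","3":"Three"}'
keys_body = "[1, 2, 3]"
values_body = "[One, Two, Three]"

print(is_valid_json(full_map))     # True
print(is_valid_json(values_body))  # False
print(json.loads(keys_body))       # [1, 2, 3] (ints, not the string keys)
```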

Finally, the version endpoint indeed does not seem to work. However, it is documented, so the documentation seems to be incorrect. See the bottom of this page: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection

@abhishekrb19
Contributor

@teyeheimans, that does look like a bug. This is the relevant introspection code for map lookups: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/query/lookup/MapLookupExtractorFactory.java#L156.

I think the getValues() response should just be map.values() instead of map.values().toString(), which results in a String representation of the underlying collection. The same would apply to getKeys(). If that sounds about right, please feel free to raise a PR.
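To illustrate the suspected cause outside Druid: Java's Collection.toString() renders elements as [a, b, c] without quoting them. The Python sketch below imitates that rendering with a hypothetical java_style_to_string helper and compares it with real JSON serialization. It only illustrates the difference and is not the Druid code.

```python
import json


def java_style_to_string(items) -> str:
    """Imitate Java's AbstractCollection.toString(): "[a, b, c]", no quoting."""
    return "[" + ", ".join(str(item) for item in items) + "]"


# Same map as in the mapLookup example from this thread.
lookup = {"1": "One", "2": "Two", "3": "Three"}

print(java_style_to_string(lookup.values()))  # [One, Two, Three]
print(json.dumps(list(lookup.values())))      # ["One", "Two", "Three"]
```

The first line matches what the values endpoint returns today; the second is what a JSON serializer produces for the same collection.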

@abhishekrb19
Contributor

Btw, you can directly query a map lookup in SQL: SELECT "k", "v" FROM "lookup"."mapLookup". This should return the keys and values in the correct string form. The Druid web console uses SQL instead of the API to introspect values when you open the lookup modal.
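As a rough sketch of that SQL route from a client: the Python snippet below builds, but does not send, a POST request to Druid's SQL endpoint at /druid/v2/sql. The base URL is taken from the curl examples in this thread; the helper name build_lookup_sql_request is hypothetical, and the lookup name is assumed to be trusted, since it is interpolated into the query unescaped.

```python
import json
import urllib.request


def build_lookup_sql_request(base_url: str, lookup_name: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Druid's SQL endpoint for one lookup."""
    # Assumption: lookup_name is trusted; it is not escaped here.
    query = f'SELECT "k", "v" FROM "lookup"."{lookup_name}"'
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/druid/v2/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_lookup_sql_request("http://localhost:8888", "mapLookup")
print(request.full_url)
print(request.data.decode("utf-8"))

# To run it against a live cluster (not executed here):
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```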

@ashwintumma23
Contributor

Hi @abhishekrb19,

For the /version endpoint:

Documentation-wise

  • It is indeed specified on the lookups-cached-global page, but I think we should update the documentation to explicitly state that it is available only for lookups of type cachedNamespace. I can create a PR for this item.

Functionality-wise

@teyeheimans
Author

I agree with what you describe. However, I am not familiar with the Java side of Druid. We have created a PHP client for Druid; see https://github.com/level23/druid-client.

Recently I integrated support for lookup management. There I found out that the keys and values endpoints do not return valid JSON (at least for the map lookup). If I just use the introspect endpoint, it does give me valid JSON. So this is wrong, and it is the reason why I started this topic.
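Until the endpoints are fixed, a client could derive keys and values from the full introspect response, which is valid JSON. A minimal Python sketch, using the response body quoted earlier in this thread; keys_and_values is a hypothetical helper name and no request is made.

```python
import json


def keys_and_values(introspect_body: str) -> tuple[list[str], list[str]]:
    """Split the full-map introspection response into keys and values."""
    mapping = json.loads(introspect_body)
    return list(mapping.keys()), list(mapping.values())


# Body returned by .../introspect/mapLookup/ in the example above.
body = '{"1":"One","2":"Two","3":"Three"}'
keys, values = keys_and_values(body)
print(keys)    # ['1', '2', '3']
print(values)  # ['One', 'Two', 'Three']
```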

Also, I find it strange that it is not possible to specify, for every type of lookup, whether the data is injective. It is also strange that the same injective functionality is called oneToOne in the Kafka lookup.
