
__tenant_id__ label doesn't work for metric/stats queries #13259

Open

jtackaberry opened this issue Jun 18, 2024 · 4 comments

@jtackaberry
Describe the bug

A query such as {__tenant_id__="sometenant", service="someservice"} works for limited/filter-type queries (inasmuch as log lines are returned), but no other query type returns data.

This manifests in several ways:

  1. The volume histogram in Grafana shows "No logs volume available" even though log results are returned.
  2. In the query builder, if you select __tenant_id__ first, you're not able to select any further labels from that tenant. (The same applies to the label browser.)
  3. Any metric query containing the __tenant_id__ label returns no results. For example, bytes_over_time({__tenant_id__="sometenant", service="someservice"}[1m]) shows "No data" in Grafana, even though the equivalent non-metric query returns log lines.

To Reproduce

Steps to reproduce the behavior:

  1. Deploy Loki in multi-tenant mode with data coming in for multiple tenants
  2. Create a data source in Grafana that is mapped to multiple tenants (in the simplest case by having Grafana pass X-Scope-OrgID: tenant1|tenant2|tenant3 etc.; see the curl sketch at the end of this section)
  3. Open Grafana in Explore view, choose the multi-tenant Loki data source
  4. Open the label browser
  5. Choose __tenant_id__ and select tenant1 (or whatever)

Grafana will display "Empty results, no matching label" for {__tenant_id__="tenant1"}

Alternatively, you can try any of the other approaches mentioned above in the description section.
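
To make step 2 concrete, here is a minimal sketch of the kind of request Grafana issues on behalf of a multi-tenant data source. The tenant names are hypothetical; the address is the in-cluster query frontend used in the logcli examples further down:

$ # Multi-tenant query: tenant IDs are pipe-delimited in the X-Scope-OrgID header
$ curl -H 'X-Scope-OrgID: tenant1|tenant2|tenant3' \
    'http://loki-query-frontend.loki.svc.cluster.local:3100/loki/api/v1/labels'

For a multi-tenant request like this, __tenant_id__ should appear among the returned labels, which is what makes it selectable in the query builder and label browser in the first place.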

Expected behavior

In the label browser scenario, after selecting a __tenant_id__ label, I expect to see labels for that tenant.

More generally I expect any of the examples mentioned above to work when the __tenant_id__ label is given.

Environment:

  • Infrastructure: Kubernetes (EKS)
  • Deployment tool: Helm
  • Loki: v3.0.0
  • Architecture: arm64
  • Object Storage: S3
@JStickler (Contributor)

@jtackaberry Are your tenants all on the same cluster? Currently Loki supports cross-tenant queries but not cross-cluster queries.

@jtackaberry (Author)

@JStickler yep, same cluster.

It may be clearer to demonstrate this with logcli, running inside the Kubernetes cluster, bypassing any ingress and authentication, and talking directly to the query frontend.

We're going to do a stats query against {service=~".+"} using two tenants, alice and bob.

First let's establish that each tenant has data:

$ ./logcli --addr="http://loki-query-frontend.loki.svc.cluster.local:3100" --org-id="alice" stats '{service=~".+"}'
2024/06/25 00:22:28 http://loki-query-frontend.loki.svc.cluster.local:3100/loki/api/v1/index/stats?end=1719274948921675625&query=%7Bservice%3D~%22.%2B%22%7D&start=1719271348921675625
{
  bytes: 5.5GB
  chunks: 2612
  streams: 1857
  entries: 8987572
}

$ ./logcli --addr="http://loki-query-frontend.loki.svc.cluster.local:3100" --org-id="bob" stats '{service=~".+"}'
2024/06/25 00:22:49 http://loki-query-frontend.loki.svc.cluster.local:3100/loki/api/v1/index/stats?end=1719274969411742119&query=%7Bservice%3D~%22.%2B%22%7D&start=1719271369411742119
{
  bytes: 4.0GB
  chunks: 857
  streams: 289
  entries: 6796008
}

Next let's do the same stats query against both tenants simultaneously, as a multi-tenant query:

$ ./logcli --addr="http://loki-query-frontend.loki.svc.cluster.local:3100" --org-id="alice|bob" stats '{service=~".+"}'
2024/06/25 00:23:13 http://loki-query-frontend.loki.svc.cluster.local:3100/loki/api/v1/index/stats?end=1719274993136923957&query=%7Bservice%3D~%22.%2B%22%7D&start=1719271393136923957
{
  bytes: 9.8GB
  chunks: 3530
  streams: 2148
  entries: 16322619
}

It's not an exact sum (5.5 GB + 4.0 GB = 9.5 GB vs. the 9.8 GB reported, and 8,987,572 + 6,796,008 = 15,783,580 entries vs. 16,322,619), but I understand stats is more of an approximation, and the queries ran over slightly different time windows anyway. In any case, both tenants are clearly represented.

Now let's take the same multi-tenant query and add the __tenant_id__ label:

$ ./logcli --addr="http://loki-query-frontend.loki.svc.cluster.local:3100" --org-id="alice|bob" stats '{__tenant_id__="alice", service=~".+"}'
2024/06/25 00:23:28 http://loki-query-frontend.loki.svc.cluster.local:3100/loki/api/v1/index/stats?end=1719275008605432794&query=%7B__tenant_id__%3D%22alice%22%2C+service%3D~%22.%2B%22%7D&start=1719271408605432794
{
  bytes: 0B
  chunks: 0
  streams: 0
  entries: 0
}

$ ./logcli --addr="http://loki-query-frontend.loki.svc.cluster.local:3100" --org-id="alice|bob" stats '{__tenant_id__="bob", service=~".+"}'
2024/06/25 00:23:32 http://loki-query-frontend.loki.svc.cluster.local:3100/loki/api/v1/index/stats?end=1719275012900943019&query=%7B__tenant_id__%3D%22bob%22%2C+service%3D~%22.%2B%22%7D&start=1719271412900943019
{
  bytes: 0B
  chunks: 0
  streams: 0
  entries: 0
}

The presence of __tenant_id__ breaks the stats query. I would expect those stats queries to return (roughly) the same results as the first test block above (the single-tenant query against each tenant, without __tenant_id__).

The same thing applies to volume queries.
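
For completeness, here is what the equivalent volume query would look like (a sketch assuming logcli's volume subcommand, which hits /loki/api/v1/index/volume):

$ # Same multi-tenant org-id; volume query filtered by __tenant_id__
$ ./logcli --addr="http://loki-query-frontend.loki.svc.cluster.local:3100" \
    --org-id="alice|bob" volume '{__tenant_id__="alice", service=~".+"}'

As with stats, the result comes back zeroed out as soon as __tenant_id__ is present.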

An actual log search with __tenant_id__ does work, though; results are returned:

$ ./logcli --addr="http://loki-query-frontend.loki.svc.cluster.local:3100" --org-id="alice|bob" query '{__tenant_id__="bob", service=~".+"}' | wc -l
2024/06/25 00:26:55 http://loki-query-frontend.loki.svc.cluster.local:3100/loki/api/v1/query_range?direction=BACKWARD&end=1719275215225588167&limit=30&query=%7B__tenant_id__%3D%22bob%22%2C+service%3D~%22.%2B%22%7D&start=1719271615225588167
2024/06/25 00:26:55 Common labels: {__tenant_id__="bob", environment="stg", service="bob", cluster="bob-stg", level="info", region="us-east-1"}
30

Hopefully that added detail is useful.

@d4nyll commented Aug 20, 2024

I can confirm I am seeing what @jtackaberry described in point (2) above (in the query builder, if you select __tenant_id__ first, you're not able to select any further labels from that tenant; ditto the label browser).

  1. ✅ Query with some label

     [screenshot]

  2. ❌ Query with only the __tenant_id__ label

     [screenshot]

     Gives the error "parse error : queries require at least one regexp or equality matcher that does not have an empty-compatible value. For instance, app=~".*" does not meet this requirement, but app=~".+" will" and does not allow you to add additional labels via the Builder.

  3. ✅ Query with some label and then adding the __tenant_id__ label after

     [screenshot]

Note that the __tenant_id__ value used in (2) and (3) is the same (foo).
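
To illustrate the matcher requirement from that parse error, a minimal sketch with logcli (the foo tenant is taken from the screenshots above, the second tenant in the org-id is hypothetical, and presumably the same server-side validation applies outside Grafana):

$ # Rejected: __tenant_id__ alone doesn't count as a non-empty-compatible matcher
$ logcli --org-id="foo|bar" query '{__tenant_id__="foo"}'
$ # Parses: any matcher that can't match the empty string, e.g. app=~".+", satisfies it
$ logcli --org-id="foo|bar" query '{__tenant_id__="foo", app=~".+"}'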

  • Grafana version: v10.4.7 (8d068faebe)
  • Loki version: Helm chart v5.47.2, App version 2.9.6

@IntegersOfK commented Sep 19, 2024

Point 2 on this list is somewhat limiting for us at scale: we have hundreds of tenants, and without being able to discriminate between them from the very first drop-down, subsequent queries are clunky.

But more than that, it's just confusing for users, because it's a completely valid thing to click as the first drop-down. The workaround is to select literally any other label first (which then may or may not be available for the __tenant_id__ you intend to select); see the sketch below.
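
As a selector, the workaround looks like this (hypothetical label values): pick any concrete label first, then narrow by tenant:

{cluster="prod", __tenant_id__="tenant1"}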
