Loki helm chart 2.x to 3.x migration - Log retention issues #7827

Open
im-wanyama opened this issue Dec 1, 2022 · 4 comments

@im-wanyama

Describe the bug

I'm seeing a weird issue with Loki after upgrading the Helm chart from 2.12.2 to 3.4.2. Our current setup stores logs on the file system, and I want to upgrade to an equivalent setup in 3.4.2, which I assume is the single binary/single tenant mode.

I can't query old log messages despite pointing the new Loki instance at the directory that the older version of Loki was using. Querying new log data still works. Querying also works if I downgrade to 2.x. I've also tried upgrading to the latest 2.x and then to 3.x, and it yields the same result.

This is the error I get when trying to query older data:

open /data/loki/chunks/fake/6c8c778b0292b2a0/MTg0ODdjM2RjNTM6MTg0ODgzMjI1M2Q6MTJmNGNmZTM=: no such file or directory

It appears that the way chunks are stored has changed: our historical chunk files aren't partitioned under a tenant ID directory called "fake", and there's no directory with an alphanumeric name like "6c8c778b0292b2a0" anywhere in their paths.

Will it be possible to keep querying our historical logs from 2.x after upgrading to 3.x?
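To make the layout difference concrete, here is a minimal Python sketch. It rests on an assumption that is not confirmed anywhere in this thread: that the filesystem store base64-encodes the whole chunk key into one flat filename for schemas up to v11, and from v12 on keeps `<tenant>/<fingerprint>/` as real directories and base64-encodes only the last segment. The function names are hypothetical; the only real data is the path from the error message above.

```python
import base64


def _b64(text: str) -> str:
    """Standard base64 of a UTF-8 string, returned as text."""
    return base64.b64encode(text.encode()).decode()


def v11_chunk_file(tenant, fingerprint, start, end, checksum):
    # ASSUMPTION: pre-v12, the entire key becomes a single flat filename.
    return _b64(f"{tenant}/{fingerprint}:{start}:{end}:{checksum}")


def v12_chunk_file(tenant, fingerprint, start, end, checksum):
    # ASSUMPTION: from v12 on, tenant and fingerprint are directories and
    # only the trailing "<start>:<end>:<checksum>" part is base64-encoded.
    return f"{tenant}/{fingerprint}/" + _b64(f"{start}:{end}:{checksum}")


def decode_v12_filename(name):
    """Split the base64 filename from a v12-style path into its three fields."""
    start, end, checksum = base64.b64decode(name).decode().split(":")
    return start, end, checksum


if __name__ == "__main__":
    # Filename taken from the "no such file or directory" error above.
    print(decode_v12_filename("MTg0ODdjM2RjNTM6MTg0ODgzMjI1M2Q6MTJmNGNmZTM="))
```

If that assumption holds, chunks written under the older schema for the "fake" tenant would sit directly in the chunks directory with names starting with `ZmFrZS`, which would explain why a lookup under `fake/6c8c778b0292b2a0/` finds nothing.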

To Reproduce
Steps to reproduce the behavior:

  1. Start Loki (2.6.1) with Helm chart 2.12.2
  2. Start Promtail (2.6.1) with Helm chart 6.6.2
  3. Run any query that works on a cluster running chart 2.x, then migrate to 3.4.2 and run the same query

Expected behavior
I expect to be able to query all data collected by Promtail irrespective of the Loki Helm chart version.

Environment:

  • Infrastructure: EKS, Kubernetes 1.22
  • Deployment tool: Helm
@quaideman

We are seeing the exact same thing, going from 2.16.0 to 3.8.0. @im-wanyama did you manage to get anywhere with this?

@quaideman

I managed to "fix" the issue by changing:

loki.storage.filesystem.chunks_directory to /var/loki/loki/chunks/ (because the new chart mounts the volume at a different path)

and copying over the schema config from the 2.16.0 version. It seems to be the schema version parameter specifically: going to v12 is what breaks things. Not sure what we're missing out on in v12.

schemaConfig:
  configs:
    - from: "2020-10-24"
      index:
        period: 24h
        prefix: index_
      object_store: filesystem
      schema: v11 ## v12 breaks historical logs (anything using v11)
      store: boltdb-shipper

@DavidRayner

DavidRayner commented Jan 11, 2023

I just tested a Loki install where I specified two custom schemas so that the v12 schema is adopted on a specific day. This worked and Loki queries are returning data across both schemas.

loki:
  storage:
    type: filesystem
    filesystem:
      chunks_directory: /var/loki/loki/chunks/
  schemaConfig:
    configs:
      # Keep using the previous schema until 2023-01-11
      - from: 2020-10-24
        store: boltdb-shipper
        object_store: filesystem
        schema: v11
        index:
          prefix: index_
          period: 24h
      # Adopt the new schema on 2023-01-11
      - from: 2023-01-11
        store: boltdb-shipper
        object_store: filesystem
        schema: v12
        index:
          prefix: loki_index_
          period: 24h

It looks like schema v12 was created for the following reason:

New v12 schema optimized to better handle S3 prefix rate limits
#5054

@FalconerTC

FalconerTC commented Feb 27, 2023

@DavidRayner How exactly did you align this transition? It seems like you have to cut over right at midnight, or manually delete the v11 logs generated that day via the API. Otherwise, queries that span both schema versions will fail with a NoSuchKey error.

Edit: it just occurred to me that Loki will transition naturally if you set the new schema to take effect on a future date instead of on the day of deployment!
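The reasoning behind the future-dated cutover can be sketched in Python. This is only an illustration, assuming that a schema period's `from` date takes effect at the start of that day and that the latest period starting on or before a given day is the one that applies. `schema_for`, `future_cutover`, and the two-day buffer are hypothetical names and values, not part of Loki or the chart.

```python
from datetime import date, timedelta


def schema_for(day, periods):
    """Return the schema of the latest period whose start date is <= day."""
    active = None
    for start, schema in sorted(periods):
        if start <= day:
            active = schema
    return active


def future_cutover(deploy_day, days_ahead=2):
    """Pick a `from` date safely after the deployment day (hypothetical buffer)."""
    return deploy_day + timedelta(days=days_ahead)


if __name__ == "__main__":
    deploy_day = date(2023, 1, 11)
    periods = [
        (date(2020, 10, 24), "v11"),           # existing schema, left untouched
        (future_cutover(deploy_day), "v12"),   # new schema starts in the future
    ]
    # Everything written on the deployment day still maps to v11, so no
    # single day ends up with chunks written under two different schemas.
    for offset in range(4):
        day = deploy_day + timedelta(days=offset)
        print(day.isoformat(), schema_for(day, periods))
```

The value returned by `future_cutover` is what would go into the `from:` field of the second schemaConfig entry, in the same YYYY-MM-DD form used in the configs above.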
