
aws-lambda: Log retention gives rate exceeded error #31338

Open
1 task done
Exter-dg opened this issue Sep 6, 2024 · 2 comments · May be fixed by #31340
Labels
@aws-cdk/aws-lambda (Related to AWS Lambda) · bug (This issue is a bug.) · effort/medium (Medium work item – several days of effort) · p2

Comments


Exter-dg commented Sep 6, 2024

Describe the bug

Legacy log retention in Lambda gives a rate limit exceeded error.

We are in the process of upgrading our app from CDK v1 to v2. To test this, we created a new env in a new account and redeployed the configuration using CDK v1.

We are creating 70-80 lambdas with log retention enabled. The legacy log retention feature creates a custom resource Lambda to create the log group and set the retention policy. CDK v1 used to create Node 14 lambdas for this purpose (whose creation is now blocked in AWS). Hence, we disabled log retention, upgraded the stack to 2.151.0, and then re-enabled log retention.
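For reference, a minimal sketch of how we enable log retention on each function (runtime, handler, and asset path are illustrative, not our actual values):

import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';

export class ExampleStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Setting logRetention makes CDK deploy a LogRetention custom resource
    // Lambda that calls CreateLogGroup / PutRetentionPolicy at deploy time.
    new lambda.Function(this, 'Handler', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda'),
      logRetention: logs.RetentionDays.ONE_WEEK,
    });
  }
}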

While doing so, our stack is failing with the error:

Received response status [FAILED] from custom resource. Message returned: Out of attempts to change log group

Initially, we thought this was an issue with the “CreateLogGroup throttle limit in transactions per second” quota. We increased it from 10 to 80, but the issue persists.

On exploring the CloudWatch logs for the custom resource Lambda, we found:

2024-09-06T05:23:33.260Z	06a9833f-0ad3-4faf-8f94-aa78dd49d0ec	ERROR	{
  clientName: 'CloudWatchLogsClient',
  commandName: 'PutRetentionPolicyCommand',
  input: {
    logGroupName: '/aws/lambda/LogRetentionaae0aa3c5b4d-mE6Tt6xks1CB',
    retentionInDays: 1
  },
  error: ThrottlingException: Rate exceeded
      at de_ThrottlingExceptionRes (/var/runtime/node_modules/@aws-sdk/client-cloudwatch-logs/dist-cjs/index.js:2321:21)
      at de_CommandError (/var/runtime/node_modules/@aws-sdk/client-cloudwatch-logs/dist-cjs/index.js:2167:19)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
      at async /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/core/dist-cjs/index.js:165:18
      at async /var/runtime/node_modules/@aws-sdk/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
      at async /var/runtime/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:34:22
      at async /var/task/index.js:1:1148
      at async /var/task/index.js:1:2728
      at async y (/var/task/index.js:1:1046) {
    '$fault': 'client',
    '$metadata': {
      httpStatusCode: 400,
      requestId: 'e247739c-8ebb-40d3-b85e-293802a87e24',
      extendedRequestId: undefined,
      cfId: undefined,
      attempts: 3,
      totalRetryDelay: 466
    },
    __type: 'ThrottlingException'
  },
  metadata: {
    httpStatusCode: 400,
    requestId: 'e247739c-8ebb-40d3-b85e-293802a87e24',
    extendedRequestId: undefined,
    cfId: undefined,
    attempts: 3,
    totalRetryDelay: 466
  }
}

Looks like an issue with the rate limit for PutRetentionPolicyCommand. The service quota for this cannot be changed. Our earlier implementation had one difference in how log retention was configured:
the base property was set to apply an exponential backoff (probably to handle such cases). This property is now deprecated, so we removed it during our upgrade from CDK v1 to v2. The documentation for LogRetentionRetryOptions says it was removed because retries are handled differently in AWS SDK v3. Is this what is causing the issue? Shouldn't CDK/SDK handle the backoff in this case?
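For context, a rough sketch of how the retry options were configured before the upgrade (values are illustrative; imports as in the sketch above, plus Duration from 'aws-cdk-lib'). In aws-cdk-lib v2 the base property still compiles but is documented as deprecated/unused since the move to AWS SDK v3, leaving maxRetries as the remaining knob:

new lambda.Function(this, 'Handler', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
  logRetention: logs.RetentionDays.ONE_WEEK,
  logRetentionRetryOptions: {
    base: Duration.millis(200), // deprecated in v2; illustrative value
    maxRetries: 7,              // illustrative
  },
});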

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

1.204.0

Expected Behavior

Log retention backoff should be handled internally

Current Behavior

Creating legacy log retention for multiple lambdas together gives a rate limit exceeded error.

Reproduction Steps

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.108.1

Framework Version

No response

Node.js Version

v22.1.0

OS

macOS

Language

TypeScript

Language Version

No response

Other information

No response

@Exter-dg Exter-dg added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 6, 2024
@github-actions github-actions bot added @aws-cdk/aws-lambda Related to AWS Lambda potential-regression Marking this issue as a potential regression to be checked by team member labels Sep 6, 2024
@rix0rrr rix0rrr removed the potential-regression Marking this issue as a potential regression to be checked by team member label Sep 6, 2024
rix0rrr added a commit that referenced this issue Sep 6, 2024
When the Log Retention Lambda runs massively parallel (on 70+ Lambdas
at the same time), it can run into throttling problems and fail.

Raise the retry count and delays:

- Raise the default amount of retries from 5 -> 10
- Raise the sleep base from 100ms to 1s.
- Change the sleep calculation to apply the 10s limit *after* jitter instead
  of before (previously, we would take a fraction of 10s; now we take a
  fraction of the accumulated wait time, and after calculating that, cap
  it at 10s).

Fixes #31338.
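For illustration, a hedged sketch (not the actual handler code) of the delay calculation this commit describes, with the 10s cap applied after jitter:

// Assumed constants matching the commit description above.
const BASE_MS = 1_000; // sleep base, raised from 100 ms
const CAP_MS = 10_000; // per-attempt upper bound

// Full-jitter exponential backoff: take a random fraction of the accumulated
// exponential wait, then cap the result at 10 s (cap applied after jitter,
// not before).
function backoffMs(attempt: number): number {
  const exponential = BASE_MS * 2 ** attempt;
  const jittered = Math.random() * exponential;
  return Math.min(jittered, CAP_MS);
}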
@rix0rrr rix0rrr linked a pull request Sep 6, 2024 that will close this issue

Exter-dg commented Sep 6, 2024

@rix0rrr Is this related?
#24485

@khushail khushail added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Sep 6, 2024
Exter-dg (Author) commented:

We fixed it by increasing the value of maxRetries from 7 to 20.
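For anyone hitting the same throttle, this workaround amounts to something like the following (imports as in the earlier sketch; only maxRetries changes):

new lambda.Function(this, 'Handler', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
  logRetention: logs.RetentionDays.ONE_WEEK,
  // More attempts for the LogRetention custom resource when many functions
  // deploy in parallel and PutRetentionPolicy gets throttled.
  logRetentionRetryOptions: { maxRetries: 20 },
});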
