run-all init fails with parallelism #3093

Open
gnuletik opened this issue Apr 23, 2024 · 16 comments

Comments

@gnuletik

Describe the bug

When running the following command:

rm -rf **/.terragrunt-cache && terragrunt run-all init

I get one of the following errors on some projects:

│ Error: Backend initialization required, please run "terraform init"
│
│ Reason: Initial configuration of the requested backend "s3"
│
│ The "backend" is the interface that Terraform uses to store state,
│ perform operations, etc. If this message is showing up, it means that the
│ Terraform configuration you're using is using a custom configuration for
│ the Terraform backend.
│
│ Changes to backend configurations require reinitialization. This allows
│ Terraform to set up the new configuration, copy existing state, etc. Please
│ run
│ "terraform init" with either the "-reconfigure" or "-migrate-state" flags
│ to
│ use the current configuration.
│
│ If the change reason above is incorrect, please verify your configuration
│ hasn't changed and try again. At this point, no changes to your existing
│ configuration or state have been made.
╷
│ Error: Required plugins are not installed
│
│ The installed provider plugins are not consistent with the packages
│ selected in the dependency lock file:
│   - registry.terraform.io/hashicorp/aws: there is no package for registry.terraform.io/hashicorp/aws 5.46.0 cached in .terraform/providers
│   - registry.terraform.io/hashicorp/http: there is no package for registry.terraform.io/hashicorp/http 3.4.2 cached in .terraform/providers
│   - registry.terraform.io/datadog/datadog: there is no package for registry.terraform.io/datadog/datadog 3.38.0 cached in .terraform/providers
│
│ Terraform uses external plugins to integrate with a variety of different
│ infrastructure services. To download the plugins required for this
│ configuration, run:
│   terraform init
╵

The workaround was to set TERRAGRUNT_PARALLELISM=1.
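
For reference, the workaround can be wrapped in a small shell function. This is only a sketch: TERRAGRUNT_PARALLELISM comes from the report above, while the function name init_serial is made up for illustration.

```shell
# Sketch: force Terragrunt to initialize one unit at a time.
# init_serial is a hypothetical helper name, not part of Terragrunt.
init_serial() {
  # The assignment applies only to this one terragrunt invocation.
  TERRAGRUNT_PARALLELISM=1 terragrunt run-all init "$@"
}
```

Calling init_serial from the root of the stack runs init for one unit at a time, which trades speed for the absence of concurrent init runs.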

To Reproduce

I was not able to extract a small reproducible example.

Expected behavior

To not fail with parallelism enabled.

Versions

  • Terragrunt version: v0.57.6
  • Terraform version: v1.5.7
  • Environment details (Ubuntu 20.04, Windows 10, etc.): macOS 14.4.1
@gnuletik added the bug label on Apr 23, 2024
@levkohimins
Contributor

Hi @gnuletik, most likely you are using the Provider Plugin Cache. Make sure to read Provider Caching.

@levkohimins self-assigned this on Apr 24, 2024
@gnuletik
Author

Hi @levkohimins, thanks for the feedback!

I was using Terraform's provider plugin cache, but I disabled it since I migrated to Terragrunt's Provider Caching Server.

I was able to reproduce this issue with:

  • Terragrunt's provider caching
  • without any caching mechanism enabled

Also, previous releases of Terragrunt didn't have this issue because parallelism was disabled during init (because of the Terraform provider plugin cache issue).

@levkohimins
Contributor

Hi @gnuletik,
It would be most effective if you could put together a minimal set of configuration files that reproduces this issue and upload them to a public repository, because I cannot reproduce the issue from the data you provided in the description. Thank you!

@gnuletik
Author

I already tried to reproduce it without success, as it only occurs on projects with multiple different modules / configurations and doesn't occur 100% of the time (it seems to be around 70%).
I'll try to find a way to reproduce it though.

@levkohimins
Contributor

levkohimins commented Apr 24, 2024

To clarify: before Terragrunt Provider Caching was implemented, Terragrunt automatically set TERRAGRUNT_PARALLELISM=1 when the Provider Plugin Cache was enabled and the run-all init command was run. With the advent of Terragrunt Provider Caching, this behavior was removed.

Therefore, I asked you to check whether this feature is enabled in the Terraform CLI config or in the environment variable:

plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"
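
The check requested here can be scripted. The following is a minimal sketch, assuming the CLI config lives at the path in TF_CLI_CONFIG_FILE or at the Unix default ~/.terraformrc; the function name plugin_cache_enabled is hypothetical, and the grep only catches uncommented plugin_cache_dir lines.

```shell
# Sketch: report whether Terraform's provider plugin cache appears to be on.
# plugin_cache_enabled is a hypothetical helper; pass a config path to override
# the default lookup (TF_CLI_CONFIG_FILE, then ~/.terraformrc).
plugin_cache_enabled() {
  local cfg="${1:-${TF_CLI_CONFIG_FILE:-$HOME/.terraformrc}}"

  # The environment variable is checked first.
  if [ -n "${TF_PLUGIN_CACHE_DIR:-}" ]; then
    echo "enabled via TF_PLUGIN_CACHE_DIR=$TF_PLUGIN_CACHE_DIR"
    return 0
  fi

  # Then look for an uncommented plugin_cache_dir setting in the CLI config.
  if [ -f "$cfg" ] && grep -Eq '^[[:space:]]*plugin_cache_dir[[:space:]]*=' "$cfg"; then
    echo "enabled via $cfg"
    return 0
  fi

  echo "not enabled"
  return 1
}
```

Running plugin_cache_enabled both locally and in CI gives a quick answer for each environment.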

@gnuletik
Author

@levkohimins Yes I can confirm that this configuration is not enabled.
Also, this behavior has been reproduced both locally and on our CI.

@ZachGoldberg added the terragrunt label on Apr 25, 2024 (with Linear)
@GMartinez-Sisti

GMartinez-Sisti commented Jun 25, 2024

Hi, I'm getting the same behavior while using Provider Plugin Cache with Atlantis.

Atlantis has ATLANTIS_PARALLEL_POOL_SIZE=4. I'm not using run-all though, just plain run. Without Provider Plugin Cache this issue doesn't happen, but each plan will pull the providers on its own. I'm seeing that 25% of the plans fail with the same Error: Required plugins are not installed error.

UPD: interestingly, some of the errors happen while running terragrunt show. With Atlantis, we need to run terragrunt plan [...] -out=file.json, and then Atlantis needs the Terraform-compatible terraform show output from that file to report the status.

@levkohimins
Contributor

Hi @GMartinez-Sisti, I would be happy to help you solve this issue, but I need a sample from you.
You could create it with docker-compose and put it in a repository, for example https://github.com/amontalban/terragrunt-issue-3076. Once I can reproduce the issue locally, I'll start working on a solution.

@GMartinez-Sisti

This is a great starting point, I might be able to make it happen just by forking this repo and enabling the Provider Plugin Cache. Will let you know!

@ZachGoldberg removed the bug and terragrunt labels on Jul 8, 2024
@HamoucheTF1

Hi,

I wanted to ask if there have been any updates regarding this issue. I'm experiencing the same problem when running terragrunt plan in parallel with the Terragrunt provider cache enabled.

Any information or progress on this would be greatly appreciated.

Thank you!

@HamoucheTF1

To provide more context, I am working on implementing Terragrunt drift detection. As part of this process, I run multiple terragrunt plan commands in parallel, which seems to trigger the error described in this issue.

@ajax-ryzhyi-r

ajax-ryzhyi-r commented Jul 11, 2024

@GMartinez-Sisti @HamoucheTF1

I had the same issue with Atlantis. I saw the error Required plugins are not installed during drift detection runs for 2-3 modules out of more than 300 overall modules. I discovered that errors may arise when Atlantis runs plans for two dependent modules in parallel. The error Required plugins are not installed occurs when Terragrunt tries to parse dependencies. It seems there is a race condition where one Terragrunt process generates the .terraform.lock.hcl file while another process tries to initialize the module to get outputs.

I resolved it by adding execution order groups for modules in the Atlantis config. Now, plans for dependent modules do not run in parallel, and the issue has been resolved. In the last ~30 consecutive drift detection runs, this error has not occurred. Previously, each run contained 2-3 failed runs with this error. Hope it helps 🙃

@levkohimins Thanks a lot for the great feature for providers caching! It sped up our CI by more than 3 times 🔥

@HamoucheTF1

Hi @ajax-ryzhyi-r ,

Thank you very much for your response and suggestion. By implementing execution order groups, I was able to resolve the errors with parallel plans. I haven't encountered any errors related to required plugins not being installed since making this change.

However, I am now facing a different issue. Specifically, I'm getting the following error related to the AWS TFLint plugin:

Failed to initialize plugins; fork/exec /path/to/.tflint.d/plugins/github.com/terraform-linters/tflint-ruleset-aws/0.30.0/tflint-ruleset-aws: text file busy
Have you by any chance come across this error? Any insights you could share would be greatly appreciated.

Thank you in advance for your help!

@levkohimins
Contributor

Thank you @ajax-ryzhyi-r!

Terragrunt's provider cache is safe when used in parallel with the same provider cache directory, but this is not true if multiple Terragrunt processes are running with the same working directory.
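
One way to keep two processes out of the same working directory is to serialize them with a lock. The sketch below uses a lock directory created with mkdir, which is atomic on local filesystems. The names with_dir_lock and .tg-run.lock are hypothetical, and this is not something Terragrunt does itself.

```shell
# Sketch: run a command inside DIR while holding a per-directory lock.
# Usage: with_dir_lock DIR COMMAND [ARGS...]
# with_dir_lock and .tg-run.lock are made-up names for this example.
with_dir_lock() {
  local dir="$1"
  shift
  local lock="$dir/.tg-run.lock"
  local status=0

  if [ ! -d "$dir" ]; then
    echo "with_dir_lock: no such directory: $dir" >&2
    return 2
  fi

  # mkdir either creates the lock or fails, so only one caller proceeds.
  until mkdir "$lock" 2>/dev/null; do
    sleep 1
  done

  # Run the command in a subshell so the caller's directory is unchanged.
  ( cd "$dir" && "$@" ) || status=$?

  rmdir "$lock"
  return "$status"
}
```

For example, with_dir_lock path/to/module terragrunt plan would wait until no other wrapped run holds that module's lock. A run that is killed before rmdir leaves a stale lock behind, so a real setup would need cleanup or a timeout.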

@ajax-ryzhyi-r

We have the same issue with tflint plugins, but we use a cache for these plugins (enabled via the TFLINT_PLUGIN_DIR environment variable), so we only face this issue when we update the plugin versions. We haven't found a solution for tflint concurrency yet. We just fill the cache with the new plugin versions before updating the versions in the tflint configs. Actually, it's not related to terragrunt - it's a tflint concurrency issue 🥲
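
The cache pre-fill step described here might look like the following sketch. It assumes tflint is on PATH and that tflint --init installs the plugins declared in the given config into the directory named by TFLINT_PLUGIN_DIR; the function name and its arguments are hypothetical.

```shell
# Sketch: populate a shared tflint plugin directory before parallel runs start.
# Usage: warm_tflint_cache PLUGIN_DIR CONFIG_FILE
# warm_tflint_cache is a made-up helper name for this example.
warm_tflint_cache() {
  local plugin_dir="$1"
  local config="$2"

  mkdir -p "$plugin_dir"

  # Run a single, non-concurrent init so no parallel job has to install plugins.
  TFLINT_PLUGIN_DIR="$plugin_dir" tflint --init --config "$config"
}
```

Running this once with the new plugin versions, before the tflint configs used by the parallel jobs are updated, matches the ordering described in the comment.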

@HamoucheTF1

Thank you @ajax-ryzhyi-r

6 participants