Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpAdvancedIT fails with Keyclaok container insufficient memory error in Podman Jenkins jobs #2106

Open
jcarranzan opened this issue Oct 21, 2024 · 10 comments
Labels
priority/urgent Issue that needs to be resolved by next RHBQ release.

Comments

@jcarranzan
Copy link
Contributor

It's just failing in Podman Jenkins jobs and HttpAdvancedIT, but we use the same Keycloak service container for other tests in http-module.

Some points about my previous investigation:

@michalvavrik
Copy link
Member

  • IIUC Keycloak consumes all available memory it needs, therefore we need to check that Set memory limit for Keycloak container quarkus-test-framework#1351 has effect and it reduces available memory inside container. Let's try to set it to 1 MiB and expect failure. Let's try to set it to raise default value in failing tests. Trouble with settings Java heap limit is that metaspace could still raise and that is why I think it is preferable to limit total container memory
  • we need to test how much memory is available on the executor in total before this test is started, so that we know whether there is not something leaking before this test starts

@jcarranzan
Copy link
Contributor Author

Summary of Investigations:

  1. Attempt with 1 MiB Memory Limit:
    @KeycloakContainer( command = { "start-dev", "--import-realm", "--hostname-strict=false" }, memoryLimitMiB = 1 )
    Outcome:
    The Keycloak container failed to start because the minimum required memory is 6 MiB.
    I then increased the memory limit to 10 MiB:
    @KeycloakContainer( command = { "start-dev", "--import-realm", "--hostname-strict=false" }, memoryLimitMiB = 10 )
    but the container crashed with the following error:
2024-10-21 12:54:04,523 ERROR [tc.qua.io/.0] (main) Could not start container: java.lang.IllegalStateException: Wait strategy failed. Container crashed with out-of-memory (OOMKilled)
2024-10-21 12:54:04,527 ERROR [tc.qua.io/.0] (main) Log output from the failed container:
/opt/keycloak/bin/kc.sh: line 169:    84 Killed                  'java' -Dkc.config.build-and-exit=true -Djava.util.concurrent.ForkJoinPool.common.threadFactory=io.quarkus.bootstrap.forkjoin.QuarkusForkJoinWorkerThreadFactory '-XX:MetaspaceSize=96M' '-XX:MaxMetaspaceSize=256m' '-Dfile.encoding=UTF-8' '-Dsun.stdout.encoding=UTF-8' '-Dsun.err.encoding=UTF-8' '-Dstdout.encoding=UTF-8' '-Dstderr.encoding=UTF-8' '-XX:+ExitOnOutOfMemoryError' '-Djava.security.egd=file:/dev/urandom' '-XX:+UseG1GC' '-XX:FlightRecorderOptions=stackdepth=512' '-XX:MaxRAMPercentage=70' '-XX:MinRAMPercentage=70' '-XX:InitialRAMPercentage=50' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '--add-opens=java.base/java.security=ALL-UNNAMED' '-Duser.language=en' '-Duser.country=US' -Dkc.home.dir='/opt/keycloak/bin/..' -Djboss.server.config.dir='/opt/keycloak/bin/../conf' -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dpicocli.disable.closures=true -Dquarkus-log-max-startup-records=10000 -cp '/opt/keycloak/bin/../lib/quarkus-run.jar' io.quarkus.bootstrap.runner.QuarkusEntryPoint --profile=dev start-dev '--import-realm' '--hostname-strict=false'

  1. Attempt with Increased Memory Limit (2000 MiB):
    And I also modified the test to print the memory logs before container is starting

@KeycloakContainer(
    command = { "start-dev", "--import-realm", "--hostname-strict=false" },
    memoryLimitMiB = 2000
)

Executed in our Podman-based Jenkins job.
Log Jenkins job output:

13:16:02 [INFO] Running io.quarkus.ts.http.advanced.HttpAdvancedIT
13:16:02 === JVM Memory Info ===
13:16:02 Free Memory: 453 MB
13:16:02 Total Memory: 1788 MB
13:16:02 Max Memory: 1924 MB
13:16:02 === System Memory Info ===
13:16:02 Total Physical Memory: 7695 MB
13:16:02 Free Physical Memory: 1263 MB
13:16:02 === Disk Space Info ===
13:16:02 Total Disk Space: 40847 MB
13:16:02 Free Disk Space: 32480 MB
13:16:12 11:16:12,469 Container quay.io/keycloak/keycloak:25.0 is starting: 410b311947561a85de756b388a52cf266b40f549eb1976b4155f44f034d4a015
13:21:12 11:21:12,785 Could not start container: java.lang.IllegalStateException: Wait strategy failed. Container exited with code 1
13:21:12 	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:533)
13:21:12 	at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:354)
3:21:12 11:21:12,813 Log output from the failed container:
13:21:12 OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006af400000, 4034920448, 0) failed; error='Not enough space' (errno=12)
13:21:12 #
13:21:12 # There is insufficient memory for the Java Runtime Environment to continue.
13:21:12 # Native memory allocation (mmap) failed to map 4034920448 bytes. Error detail: committing reserved memory.
13:21:12 # An error report file with more information is saved as:
13:21:12 # /tmp/hs_err_pid84.log


@michalvavrik michalvavrik added the priority/urgent Issue that needs to be resolved by next RHBQ release. label Oct 21, 2024
@rsvoboda
Copy link
Member

What about " memoryLimitMiB = 768?"

What === JVM Memory Info === is printed out when .withProperty("JAVA_OPTS", "-Xms512m -Xmx1g") approach is used ?

@jcarranzan
Copy link
Contributor Author

  • Just with @KeycloakContainer(command = { "start-dev", "--import-realm", "--hostname-strict=false" }, memoryLimitMiB = 768)

Output in Jenkins:

09:36:46 [INFO] Running io.quarkus.ts.http.advanced.HttpAdvancedIT
09:36:47 === JVM Memory Info ===
**09:36:47 Free Memory: 433 MB**
09:36:47 Total Memory: 1537 MB
09:36:47 Max Memory: 1924 MB
09:36:47 === System Memory Info ===
09:36:47 Total Physical Memory: 7695 MB
09:36:47 Free Physical Memory: 1254 MB
09:36:47 === Disk Space Info ===
09:36:47 Total Disk Space: 40847 MB
09:36:47 Free Disk Space: 32480 MB
  • With just withProperty("JAVA_OPTS", "-Xms512m -Xmx1g") approach:

Output in Jenkins:

0:01:47 === JVM Memory Info ===
**10:01:47 Free Memory: 553 MB**
10:01:47 Total Memory: 1664 MB
10:01:47 Max Memory: 1924 MB
10:01:47 === System Memory Info ===
10:01:47 Total Physical Memory: 7695 MB
10:01:47 Free Physical Memory: 1402 MB
10:01:47 === Disk Space Info ===
10:01:47 Total Disk Space: 40847 MB
10:01:47 Free Disk Space: 32480 MB

and test passed OK

@rsvoboda
Copy link
Member

and test passed OK

only the last one or both cases?

@jcarranzan
Copy link
Contributor Author

and test passed OK

only the last one or both cases?

Only the last one, with withProperty("JAVA_OPTS", "-Xms512m -Xmx1g")

@rsvoboda
Copy link
Member

Both are using similar memory, so what's the difference?

@rsvoboda
Copy link
Member

@mabartos do you have any idea why the withProperty("JAVA_OPTS", "-Xms512m -Xmx1g") approach would work better than your quarkus-qe/quarkus-test-framework#1351 change?

@michalvavrik
Copy link
Member

That's all interesting, but as far as actual failure goes, this sentence from the issue description is what matters most:

The issue occurs before tests start. The failure is nor related to the HttpAdvancedIT tests. When Keycloak container tries to allocate memory ,ther's already insufficient memory due to a resource leak.

Here is what you need to do: implement org.junit.platform.launcher.TestExecutionListener (here is example how I did it in TFW ), and print out memory information in executionFinished, executionStarted, testPlanExecutionFinished, testPlanExecutionStarted. Ideally save it so that you can scroll through results and identify exact place where memory jumped or wasn't cleaned.

@mabartos
Copy link

There are two things:

  1. What other services/processes are started when these tests are executed? Seems odd to me that for the test execution, there's available only 1250MB out of 8GB of the system memory. Is the container properly destroyed after the test class execution and does it release the memory? Does the issue occur even when only the HttpAdvancedIT test is executed in Jenkins? Or is it reproducible only when all these tests are executed and possibly some resources are held over the tests execution?
  2. It seems the container limit is not set, or it is not handled well in some cases from the testcontainers side. The maximum heap size for Keycloak in a container should be 70% of the whole available container memory. Specifically, when the limit is set, the cgroup memory.limit_in_bytes should be the limit - when limit is set to 1GB, the max heap size should be ~700MB. However, you can see in the log, that it tries to allocate ~4GB, which is weird.

After my experiments on my local machine, the container limit was always set and I haven't spotted any issue with that. Wondering why it behaves differently in Jenkins.

As values -Xms, and -Xmx are static (hardcoded), they are always respected.

@jcarranzan Until this issue is resolved, and your workaround sets these values, it'd be good to change the JAVA_OPTS to JAVA_OPTS_KC_HEAP(Keycloak takes it into consideration), in order to not override all JVM opts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/urgent Issue that needs to be resolved by next RHBQ release.
Projects
None yet
Development

No branches or pull requests

4 participants