
REQUEST: Repository maintenance on Benchmark Bare Metal Runners #2331

Open
XSAM opened this issue Sep 4, 2024 · 9 comments
Labels
area/repo-maintenance Maintenance of repos in the open-telemetry org

Comments

@XSAM
Member

XSAM commented Sep 4, 2024

Affected Repository

https://github.com/open-telemetry/opentelemetry-go

Requested changes

We need to investigate the "Error: No space left on device" failure this runner hits while initializing jobs: https://github.com/open-telemetry/opentelemetry-go/actions/runs/10705102088/job/29682643790

The runner fails the job before running any tasks, and the Go SIG cannot resolve this on its own, as we lack context about the running environment and don't have access to the bare metal machine.
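
For anyone with shell access to the host, a minimal diagnostic sketch (the commands are standard, but the runner work directory path below is a guess, not taken from the actual machine) to confirm which filesystem is full and what is consuming it:

# Which mount is reporting "No space left on device"?
df -h
# Largest entries under /tmp, a common culprit on long-lived shared runners.
sudo du -xh --max-depth=1 /tmp | sort -rh | head -20
# Largest entries under the runner's work directory (path assumed from the
# default self-hosted runner layout).
sudo du -xh --max-depth=1 /home/ghrunner/actions-runner/_work | sort -rh | head -20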

Purpose

https://github.com/open-telemetry/opentelemetry-go needs a working benchmark runner to run its benchmarks.

Repository Maintainers

  • @open-telemetry/go-maintainers
@XSAM XSAM added the area/repo-maintenance Maintenance of repos in the open-telemetry org label Sep 4, 2024
@XSAM XSAM changed the title REQUEST: Repository maintenance on 'Benchmark Bare Metal Runners' REQUEST: Repository maintenance on Benchmark Bare Metal Runners Sep 4, 2024
@XSAM
Member Author

XSAM commented Sep 5, 2024

Now, the runner seems to work again. https://github.com/open-telemetry/opentelemetry-go/actions/runs/10715454343/job/29710949026

I am curious whether someone fixed the issue or the runner healed itself.

@XSAM
Member Author

XSAM commented Sep 9, 2024

We haven't encountered any issue like this recently. I will close this for now.

Feel free to re-open if other people encounter similar issues.

@XSAM XSAM closed this as completed Sep 9, 2024
@XSAM
Member Author

XSAM commented Sep 12, 2024

It happened again:

@XSAM XSAM reopened this Sep 12, 2024
@trask
Member

trask commented Sep 17, 2024

cc @tylerbenson

also see https://cloud-native.slack.com/archives/C01NJ7V1KRC/p1725475267605189

@tylerbenson
Member

Some job is generating a lot of 1GB+ logs in the /tmp directory:

...
2024-09-09 INFO3 Load Generator Counter #0 batch_index=batch_5408 item_index=item_5408 a=test b=5 c=3 d=true
2024-09-09 INFO3 Load Generator Counter #0 batch_index=batch_5536 item_index=item_5536 a=test b=5 c=3 d=true
2024-09-09 INFO3 Load Generator Counter #0 batch_index=batch_5537 item_index=item_5537 a=test b=5 c=3 d=true
2024-09-09 INFO3 Load Generator Counter #0 batch_index=batch_5538 item_index=item_5538 a=test b=5 c=3 d=true
2024-09-09 INFO3 Load Generator Counter #0 batch_index=batch_5444 item_index=item_5444 a=test b=5 c=3 d=true
2024-09-09 INFO3 Load Generator Counter #0 batch_index=batch_5539 item_index=item_5539 a=test b=5 c=3 d=true
2024-09-09 INFO3 Load Generator Counter #0 batch_index=batch_5544 item_index=item_5544 a=test b=5 c=3 d=true
2024-09-09 INFO3 Load Generator Counter #0 batch_index=batch_5520 item_index=item_5520 a=test b=5 c=3 d=true
...

Perhaps the collector @codeboten?
Each job should really clean up the /tmp directory before or after it executes. I'm not sure how best to enforce this.
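
As a rough sketch (an illustration, not an agreed-on fix; the ghrunner account name matches the runner user mentioned later in this thread), the last step of each benchmark job could run something like:

# Remove whatever the runner user left under /tmp; tolerate files owned by
# other users and entries that disappear while find walks the tree.
find /tmp -user "ghrunner" -delete 2>/dev/null || true
# Print remaining usage so the job log shows whether /tmp is filling up again.
df -h /tmp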

@tylerbenson
Member

Alternatively, the TC could schedule a weekly restart to ensure the /tmp directory is cleaned, perhaps on Sunday to reduce the risk of interrupting an active test.
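
If the TC went that route, one possible implementation (purely a sketch, and it only helps if the host mounts /tmp as tmpfs or otherwise clears it on boot) is a root crontab entry on the bare metal machine:

# Added via: sudo crontab -e
# Reboot at 04:00 every Sunday (minute hour day-of-month month day-of-week).
0 4 * * 0 /sbin/shutdown -r now "weekly maintenance reboot to clear /tmp"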

@tylerbenson
Member

For the time being, I followed this guide and added a script that executes find /tmp -user "ghrunner" -delete at the end of each job execution. We'll see if that helps.
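
For reference, here is a sketch of how that kind of post-job cleanup can be wired up with the runner's job-completed hook; the script path and exact contents are assumptions rather than the actual configuration. In the runner's .env file:

ACTIONS_RUNNER_HOOK_JOB_COMPLETED=/opt/actions-runner/cleanup-tmp.sh

And cleanup-tmp.sh itself:

#!/usr/bin/env bash
set -euo pipefail
# Remove whatever the runner user left behind in /tmp once the job finishes.
find /tmp -user "ghrunner" -delete 2>/dev/null || true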

@tylerbenson
Member

@XSAM It should be fixed now, but please reconsider running your performance job so frequently. It looks like your job takes over an hour to run. That is entirely too long to be run on every merge to main. Remember, this is a single instance shared by all OTel projects. You should either make it run in under 15 minutes, or limit it to run only daily.
