Pinning coreos-assembler in FCOS releases #1068

Open
dustymabe opened this issue Feb 20, 2020 · 18 comments
Labels
jira for syncing to jira

Comments

@dustymabe
Member

Right now we have a testing stream and we have a stable stream. We let things bake in testing a few weeks before promoting to stable, except for all of the changes that have gone into COSA, which could be significant. Maybe we should add a tag to cosa on the day a testing release is run and use that same cosa image that was used for testing when we do the stable pipeline run.

@jlebon
Member

jlebon commented Feb 20, 2020

Yeah, I think matching the cosa image makes sense. Hmm, I think we could have this work transparently by just using the image digest. E.g. when building stable, figure out from which testing release it's been promoted, then use the same cosa image digest that was used to build it.

Though this is dependent on the GC policy of Quay.io. If we build cosa in OpenShift, I think we might have more control.
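
As a minimal sketch of what "using the image digest" could look like: a digest reference (`repo@sha256:...`) keeps pointing at the same image even when a tag such as `:main` moves. The function name and the validation rule below are illustrative assumptions, not something the pipeline provides.

```shell
# Build a digest-pinned image reference ("repo@sha256:...").
# Hypothetical helper: the name and the checks are assumptions.
cosa_pinned_ref() {
    repo=$1
    digest=$2
    # Accept only sha256 digests: "sha256:" followed by 64 hex characters.
    hex=${digest#sha256:}
    if [ "$hex" = "$digest" ] || [ "${#hex}" -ne 64 ]; then
        echo "invalid digest: $digest" >&2
        return 1
    fi
    case $hex in
        *[!0-9a-f]*) echo "invalid digest: $digest" >&2; return 1 ;;
    esac
    printf '%s@%s\n' "$repo" "$digest"
}
```

A stable build would pass the digest recorded for the testing release it was promoted from. Where that digest gets recorded is left open here.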

@dustymabe
Member Author

> Though this is dependent on the GC policy of Quay.io. If we build cosa in OpenShift, I think we might have more control.

I think we'd just need to add a tag in quay.io whenever we do a testing build

@miabbott
Member

miabbott commented Sep 1, 2020

There was an ad-hoc discussion yesterday where this topic came up; it was suggested we could experiment for a couple of releases with doing a git tag of cosa on each testing release and see if we end up using the same tag to do the stable release.

@jlebon
Member

jlebon commented Jan 17, 2022

Another thing we could do is make the cosa tag to use live in https://github.com/coreos/fedora-coreos-config and have bump-lockfile also bump it.

Some advantages of this are that (1) cosa is "locked" by default, which improves reproducibility, and (2) we transparently get cosa tag propagation as part of the regular promotion process. A disadvantage is that it introduces more delay between cosa changes and their effect on FCOS builds (though for hotfixes, we always have the override approach at job run time).

@bgilbert
Contributor

Adding context from coreos/coreos-assembler#2644:

There's a second problem with the current situation: some cosa changes interlock with fedora-coreos-config changes, and sometimes the cosa side is accidentally merged before the f-c-c side reaches FCOS stable. This breaks the build for some or all production branches. The breakage might be fixed by routine branch promotion without anyone noticing, but it also impedes any out-of-cycle releases on existing branch heads that might be necessary before then. In addition, if testing gets broken, the next stable promotion will be broken also.

Another fix approach is to use separate cosa branches for each FCOS stream, similar to how RHCOS branches cosa for each release. Since FCOS uses rolling branches, the cosa branches would also need to roll, using promotions similar to those in fedora-coreos-config.

In the shorter term, or if we don't want to address this in other ways, it would still be useful to catch FCOS branch breakage in cosa CI. It'd probably be sufficient to run an additional CI job that performs a complete build and test cycle against the fedora-coreos-config stable branch.
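
A rough sketch of what such an additional CI job might run, assuming the standard `cosa init`, `cosa fetch`, `cosa build` and `cosa kola run` commands are enough to count as a build and test cycle (a production job would likely build more artifacts). The `run` wrapper, the `DRY_RUN` variable and the function name are hypothetical.

```shell
# Run a build-and-test cycle against the fedora-coreos-config stable branch.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_stable_branch() {
    # Each step runs only if the previous one succeeded.
    run cosa init --branch stable https://github.com/coreos/fedora-coreos-config &&
    run cosa fetch &&
    run cosa build &&
    run cosa kola run
}
```

Setting `DRY_RUN=1` before calling `check_stable_branch` only lists the steps, which is handy for reviewing the job before wiring it into CI.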

@bgilbert bgilbert changed the title Jan 17, 2022
@bgilbert bgilbert transferred this issue from coreos/fedora-coreos-streams Jan 17, 2022
@dustymabe
Member Author

> Another thing we could do is make the cosa tag to use live in https://github.com/coreos/fedora-coreos-config and have bump-lockfile also bump it.

I kind of like it.. though I think that means we'd need to create a tag (git tag) pretty much for every commit against the COSA repo and then do builds against that. Do I understand correctly?

> Some advantages of this are that (1) cosa is "locked" by default, which improves reproducibility, and (2) we transparently get cosa tag propagation as part of the regular promotion process. A disadvantage is that it introduces more delay between cosa changes and their effect on FCOS builds (though for hotfixes, we always have the override approach at job run time).

@cgwalters
Member

I agree we've hit problems but so far they haven't been too hard to work around. My instinct here is to keep this very simple to start. A very simple thing would be (pretty sure I proposed this elsewhere):

  • When a FCOS branch succeeds, it updates a tagged cosa build for that branch to point to the cosa used successfully

Then, in a case like coreos/coreos-assembler#2643 we can just temporarily override the stable branch to use that cosa.

@bgilbert
Contributor

bgilbert commented Jan 17, 2022

I agree the workarounds haven't been difficult so far. The problem is that we discover the issues while performing an out-of-cycle release, which is already a time-sensitive and chaotic process. And there's still the possibility of changes that don't cause build or CI breakage, but ship unbaked functionality directly to stable.

So, I don't have opinions about the technical means used to accomplish it, but I do think we should find a systematic solution here.

@dustymabe
Member Author

Note: Most likely a solution to this will require us to have our multi-arch COSA build/pushing (#1027) fleshed out and working.

@jlebon
Member

jlebon commented Jan 18, 2022

> > Another thing we could do is make the cosa tag to use live in coreos/fedora-coreos-config and have bump-lockfile also bump it.
>
> I kind of like it.. though I think that means we'd need to create a tag (git tag) pretty much for every commit against the COSA repo and then do builds against that. Do I understand correctly?

Yeah, I think so. We'll have to investigate the Quay.io auto-GC options. Or maybe as you say if we do our own building and pushing, it wouldn't be hard to have that process manage the tags for us.

@cgwalters
Member

cgwalters commented Apr 28, 2022

I just did

$ oc tag coreos-assembler:latest coreos-assembler:stable
Tag coreos-assembler:stable set to coreos-assembler@sha256:a7441ebcd415b42c871e8f52c81bc1a194826f7e7f13ba506aa13e397b4541fa.
$ oc image info registry.ci.openshift.org/coreos/coreos-assembler:stable
Name:        registry.ci.openshift.org/coreos/coreos-assembler:stable
Digest:      sha256:a7441ebcd415b42c871e8f52c81bc1a194826f7e7f13ba506aa13e397b4541fa
Media Type:  application/vnd.docker.distribution.manifest.v2+json
Created:     2d ago
...

What I'm thinking actually is that we should have a controller which queries for which coreos-assembler was used to build the FCOS stable job, and updates that tag or so.

@dustymabe
Member Author

> Note: Most likely a solution to this will require us to have our multi-arch COSA build/pushing (#1027) fleshed out and working.

Noting that at least this part should be unblocked now since #1027 is completed.

@jlebon jlebon added the meeting topics for meetings label Aug 17, 2022
@jlebon
Member

jlebon commented Aug 17, 2022

Picking this up again since it's been unblocked now. @cgwalters' strawman seems like it would be simple enough to start. Translating it into more concrete terms:

  1. In the FCOS pipeline release job, we push a Quay.io tag like quay.io/coreos-assembler/coreos-assembler:fcos-${params.STREAM}.
  2. In the FCOS pipeline build job, we look up the correct tag to use for the stream being built (this would re-implement this knowledge, which ideally we'd export... into e.g. a JSON file maybe in the pipeline repo?).
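
The lookup in step 2 could be sketched roughly as below. It assumes that the production streams (stable, testing, next) each get a pinned `fcos-<stream>` tag and that every other stream keeps building with `:main`. The function name is hypothetical, and a real implementation might read the stream list from an exported file rather than hardcode it.

```shell
# Map an FCOS stream to the cosa image the build job should use.
# Assumption: only stable, testing and next are pinned; the rest use :main.
cosa_image_for_stream() {
    repo=quay.io/coreos-assembler/coreos-assembler
    case $1 in
        stable|testing|next) echo "${repo}:fcos-$1" ;;
        *)                   echo "${repo}:main" ;;
    esac
}
```

For example, `cosa_image_for_stream testing-devel` falls through to the `:main` image, so development streams are unaffected by the pinning.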

The advantage of this over the cosa branches approach is that we're re-using the exact same image, whereas rebuilding a cosa git branch currently can yield different package sets each time. (Really tempting to start locking cosa...) The disadvantage is that if we really need to backport a patch, it wouldn't be as streamlined an operation. But I think we've very rarely had to do that; the primary issues have just been from using :main instead of an older image.

@dustymabe
Member Author

We briefly discussed this in the community meeting today.

No one is opposed, but it can be tricky to do right. We're going to try to get together to work out some details and report a detailed proposal here.

@dustymabe dustymabe removed the meeting topics for meetings label Aug 24, 2022
@dustymabe
Member Author

dustymabe commented Aug 25, 2022

Part of the problem is that the cosa tag in quay could get updated in the middle of a run, so pushing a quay.io tag in the release job would be too late I think.

How about we create a new toplevel job that then triggers build.Jenkinsfile. This new toplevel job is only run manually (i.e. when we manually run prod builds). This job will:

  • For the testing run (same for next):
    • create a quay.io/coreos-assembler/coreos-assembler-staging:fcos-testing tag based on current quay.io/coreos-assembler/coreos-assembler:main.
    • launch the build job for testing using this new container tag
  • For the stable run:
    • create a quay.io/coreos-assembler/coreos-assembler-staging:fcos-stable tag based on current quay.io/coreos-assembler/coreos-assembler-staging:fcos-testing.

This kind of requires that stable run before testing in the triple release workflow, otherwise the fcos-testing tag would have been updated with today's update rather than the one from two weeks ago. One idea here is to run all 3 builds as part of this job (i.e. one job triggers the build job for stable, testing and next). That way the ordering is enforced.
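
To illustrate the ordering constraint, here is a sketch that only prints the tag updates a combined job would perform, stable first. It assumes the staging repository is spelled as in the first bullet above and that `skopeo copy --all` is the tool used to retag. The variable and function names are hypothetical.

```shell
# Print the tag updates for a combined release run, in the required order.
# Stable must copy fcos-testing before the testing step overwrites it.
staging=quay.io/coreos-assembler/coreos-assembler-staging
main=quay.io/coreos-assembler/coreos-assembler:main

plan_cosa_tags() {
    echo "skopeo copy --all docker://${staging}:fcos-testing docker://${staging}:fcos-stable"
    echo "skopeo copy --all docker://${main} docker://${staging}:fcos-testing"
    echo "skopeo copy --all docker://${main} docker://${staging}:fcos-next"
}
```

If the first two lines were swapped, stable would pick up today's `main` image instead of the one that baked in testing.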

We could also verify that the prod streams use the right COSA image by codifying it in the build.Jenkinsfile job.

There are still some details to be worked out here, but I think there is a working solution in here somewhere.

@bgilbert
Contributor

I'm not thrilled by the ordering dependency in that proposal, and we'd also have to handle out-of-cycle releases, which shouldn't update the cosa pins.

Semantically, we're trying to augment the fedora-coreos-config promotion process to pin cosa containers, right? Could we just do this directly in the f-c-c promotion job for next and testing? It could tag cosa main to prevent GC, and write the tag into a file in the branch. Further promotions and out-of-cycle builds would then automatically DTRT, and the development streams could continue to build with main.

@dustymabe
Member Author

dustymabe commented Aug 26, 2022

> I'm not thrilled by the ordering dependency in that proposal, and we'd also have to handle out-of-cycle releases, which shouldn't update the cosa pins.

My proposal on how to handle that is to update the build job to default to quay.io/coreos-assembler-staging/coreos-assembler:fcos-${STREAM} for prod streams.

For out-of-cycle releases you just run the build job directly with the stream selected and all other defaults (basically what you do today). No shuffling of tags happens for the out-of-cycle releases. This assumes a fix in COSA wasn't needed for the out-of-cycle release, which is a safe assumption 99% of the time I think.

> Semantically, we're trying to augment the fedora-coreos-config promotion process to pin cosa containers, right? Could we just do this directly in the f-c-c promotion job for next and testing? It could tag cosa main to prevent GC, and write the tag into a file in the branch. Further promotions and out-of-cycle builds would then automatically DTRT, and the development streams could continue to build with main.

The reason I think doing it in a pipeline job is best is just because we already have the credentials there and the tools there to do everything. Rather than wiring that up in a GH action.

@bgilbert
Contributor

> The reason I think doing it in a pipeline job is best is just because we already have the credentials there and the tools there to do everything. Rather than wiring that up in a GH action.

Hmm, I'm not sure I understand that. I think it's worth a moderate amount of code complexity in exchange for the operational flexibility of running builds in any order. And the build job proposal sounds like a larger change in any event. I think my proposal basically reduces to:

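# Only pin cosa when promoting from the development branches.
if [[ $src_branch =~ ^(testing-devel|next-devel)$ ]]; then
    repo="quay.io/coreos-assembler/coreos-assembler"
    # Resolve the current digest behind the moving tag.
    manifest=$(skopeo inspect "docker://${repo}:latest" | jq -r .Digest)
    # Derive a tag from the first 10 hex characters of the digest.
    tag=fcos-$(echo "${manifest}" | cut -f2 -d: | cut -c-10)
    # Tagging the digest keeps the image from being garbage-collected.
    skopeo copy -a "docker://${repo}@${manifest}" "docker://${repo}:${tag}"
    # Record the pinned reference in the branch being promoted.
    echo "${repo}:${tag}" > .cosa-ref
    git add .cosa-ref
fi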

plus credential handling similar to this.
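
On the consuming side, the build job would then need to read that file. A minimal sketch, assuming `.cosa-ref` sits at the top of the checked-out fedora-coreos-config branch and that branches without it (the development streams) fall back to `:main`. The function name is hypothetical.

```shell
# Resolve the cosa image for a checked-out fedora-coreos-config branch:
# use the pinned reference from .cosa-ref when present, else fall back to main.
resolve_cosa_ref() {
    config_dir=$1
    if [ -s "${config_dir}/.cosa-ref" ]; then
        # Take the first line only, in case the file has extra lines.
        head -n 1 "${config_dir}/.cosa-ref"
    else
        echo "quay.io/coreos-assembler/coreos-assembler:main"
    fi
}
```

Because the pin travels with the branch, further promotions and out-of-cycle builds would pick it up without any ordering between jobs.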
