Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Layer refactor #2029

Merged
merged 5 commits into from
Mar 26, 2024
Merged

Layer refactor #2029

merged 5 commits into from
Mar 26, 2024

Conversation

ambarve
Copy link
Contributor

@ambarve ambarve commented Feb 14, 2024

Currently container image layers are represented as an array of strings everywhere in the hcsshim code. This representation is good enough for WCIFS based layers, however, some other types of layers (for e.g. CimFS) can not be cleanly represented in this format. This has already made some of the layer management code hacky and adding newer types of layers is getting more difficult. We need a better way of representing different types of layers.

Current workflow is that containerd forwards the layer information via a RootFS mount to the shim. Shim parses that mount and fills the layer folders into the spec.Windows.LayerFolders field in the container spec. (In some cases the spec.Windows.LayerFolders is already populated and no RootFS mount is passed). This LayerFolders array is then used in multiple packages within the shim code. This refactoring makes following changes to this workflow:

  1. We parse the RootFS mount (or spec.Windows.LayerFolders) into a Go type that represents a specific type of layers (e.g. WCIFS, CimFS etc.)
  2. We add separate functions that know how to mount each type of a layer.
  3. Lastly, once the layers are mounted, we return a separate struct that provides all the information about these mounted layers. (Code that starts the container or prepares the HCS container doc, should only care about where the container rootfs is mounted, instead of how it is mounted)

With these changes adding new type of layers or making changes to exiting layer functionality is much easier.

The PR is broken into 4 commits for easier review:

  1. First commit simply moves the LCOW & WCOW layer mounting code into their own files.
  2. Second commit adds the Go types for different types of layers and corresponding parsing functions.
  3. Third commit refactors the hcsshim code to use the newly added types for mounting layers.
  4. Lastly, we also update the layer import interface to explicitly take in the CIM paths instead of generating them from the layer paths.

Future work:

  1. We are only focusing on WCOW for now but similar refactoring will be done for LCOW in the future.
  2. Code that handled the cleanup of CimFS layers during shim delete is completely removed. The assumption that shim delete will be issued by containerd for every container is incorrect (Containerd doesn't issue shim delete for containers anymore. containerd/containerd#9727) so this code wasn't doing much anyway. A new PR will be made to handle this cleanup properly.

cmd/containerd-shim-runhcs-v1/task_hcs.go Outdated Show resolved Hide resolved
cmd/containerd-shim-runhcs-v1/task_hcs.go Show resolved Hide resolved
internal/layers/layers_wcow.go Outdated Show resolved Hide resolved
@ambarve ambarve force-pushed the layer_refactor branch 6 times, most recently from 8bb7864 to 7d729e5 Compare February 27, 2024 19:39
@ambarve ambarve force-pushed the layer_refactor branch 4 times, most recently from 388d8c0 to 67b8035 Compare February 28, 2024 23:45
@ambarve ambarve marked this pull request as ready for review February 29, 2024 18:29
@ambarve ambarve requested a review from a team as a code owner February 29, 2024 18:29
internal/hcsoci/create.go Show resolved Hide resolved
internal/layers/wcow_parse.go Show resolved Hide resolved
// entry in the list.
scratchLayer = filepath.Join(scratchLayer, "vm")
scratchVHDPath := filepath.Join(scratchLayer, "sandbox.vhdx")
if err = os.MkdirAll(scratchLayer, 0777); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem right that we create directories or copy vhd as part of Parse function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. However, this is just for backwards compatibility so we can remove this code in the future. Plus, if we don't create these files here, we will have to pass all the layer paths to the UVM creator function (via the WCOWBootFiles struct) which I am trying to avoid. What do you think? Let's ask @kevpar too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree having this in a function named Parse is non-ideal, but it is work we want to get out of the way initially so we don't have to worry about it later. Maybe there is a better name for this function which conveys that it does some setup work too?

cmd/containerd-shim-runhcs-v1/delete.go Show resolved Hide resolved
internal/layers/helpers.go Outdated Show resolved Hide resolved
@ambarve ambarve force-pushed the layer_refactor branch 3 times, most recently from 6c17a7a to e9720ad Compare March 8, 2024 07:36
Copy link
Member

@kevpar kevpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, I think it would have been better to do the CimFS UVM code deletion as a separate PR. It doesn't really relate to layer refactoring, as far as I'm aware?

Comment on lines +126 to +127
// TODO(ambarve):
// correctly handle cleanup of cimfs layers in case of shim process crash here.
Copy link
Member

@kevpar kevpar Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the current thinking here? Are we still evaluating with containerd if the new deletion behavior should be changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old behavior of container was actually due to a bug, that means with 2.0 onwards we need to persist layer data for each container started within a shim and properly clean it up during shim delete call. However, that's a medium/large change so it needs to be handled in a separate PR.

Copy link
Member

@kevpar kevpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this change need to be backported to 0.12, or can we get it into 0.13 for containerd 2.1?

@ambarve ambarve force-pushed the layer_refactor branch 2 times, most recently from 5e3ccbe to b914759 Compare March 14, 2024 21:33
@ambarve
Copy link
Contributor Author

ambarve commented Mar 18, 2024

For future reference, I think it would have been better to do the CimFS UVM code deletion as a separate PR. It doesn't really relate to layer refactoring, as far as I'm aware?

It was related in the sense that we didn't need the GetCimPathFromLayer etc. functions after this refactor. But yeah, it could have been removed in a separate PR too.

Will this change need to be backported to 0.12, or can we get it into 0.13 for containerd 2.1?

For now we are aiming for containerd 2.1 with a 0.13 tag on hcsshim because we don't have any strong reason to backport this to 0.12, we can backport in the future if required.

Copy link
Member

@kevpar kevpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending one change to the defer block in storage.go.

My understanding is the following work will be done as follow-up PRs:

  • Refactor the containerd mount parsing code into the containerd repo.
  • Properly track mounted layers for later removal on shim delete call.

internal/wclayer/cim/process.go Show resolved Hide resolved
internal/jobcontainers/storage.go Show resolved Hide resolved
As we refactor the layer management code, it is easier to keep LCOW & WCOW layer management code
in their own separate files.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
Adds a set of functions that can parse layers or rootfs mounts provided by containerd into
structs that can be later used for mounting layers. Primary purpose of this change is to
remove restriction of always representing layers as an array of strings.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
This commit uses the newly added WCOW layer parsers and the new type for representing
mounted WCOW layers. LCOW functions are also moved around (and renamed) to follow similar
style as that of WCOW functions.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
Current layer writer interface forces us to calculate the CIM path from the layer path by
making assumptions about CIM storage. This isn't a very good approach, better way is to be
explicit about what information the layer writer needs from containerd. This change
updates the CIM layer writer to take in layer CIM & parent CIM paths as inputs. This also
means a corresponding changes needs to be made in containerd.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
CimFS is currently not supported with HyperV isolation. However, we still have code that
handles processing of UtilityVM layer during image import. After the layer refactoring
change we need to update this code as well. But since this code isn't being used anywhere
updating it doesn't make much sense. There are no tests for this either. This code is
removed for now and we can add it back later once the plan for running HyperV isolated
containers with CimFS is more solid.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
@ambarve ambarve merged commit 1d406d0 into microsoft:main Mar 26, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants