DavideAG/Understanding-containers

Understanding containers

Nowadays, lightweight virtualization is a widely used tool. Microservice-based architectures are increasingly common, and containers are their backbone.

Software like Docker allows us to manage containers easily, but how are they made? What features does a process really need in order to be a container manager? It's time to get our hands dirty and build a homemade container.

The questions we're going to try to answer are:

  • How does a container engine really work?
  • How are containers created by "facilitators" like Docker?

First of all, let's introduce the tools we will use. Below is a brief summary of the Linux features we will rely on to build a container:

Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources. Thanks to this feature, we can limit what a process can see. We will set up namespaces using Linux kernel syscalls. The namespaces man page tells us there are 3 system calls that make up the API: clone, unshare, and setns.
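As a rough illustration of the unshare call, here is a minimal C sketch. The helper names namespace_flags and enter_new_namespaces are hypothetical and are not taken from this repository, and the set of namespaces chosen for each switch is an assumption that only loosely mirrors the -a and -U options described later in this README. The sketch takes the CLONE_NEW* values from the kernel header and invokes unshare through syscall(2); with _GNU_SOURCE defined, the unshare() wrapper declared in <sched.h> is the more common spelling.

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEW* flag values from the kernel headers */
#include <stdbool.h>
#include <sys/syscall.h>   /* SYS_unshare */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: build the namespace flag mask.
 * Which namespaces "all" covers is an assumption made for this sketch. */
int namespace_flags(bool all, bool user)
{
    int flags = 0;

    if (all)
        flags |= CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUTS |
                 CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWCGROUP;
    if (user)
        flags |= CLONE_NEWUSER;
    return flags;
}

/* Detach the calling process from the namespaces selected in flags.
 * Returns 0 on success or a negative errno value on failure.
 * Note: CLONE_NEWPID only affects children created afterwards; the
 * caller itself stays in its original PID namespace. */
int enter_new_namespaces(int flags)
{
    if (syscall(SYS_unshare, flags) == -1)
        return -errno;
    return 0;
}
```

Calling enter_new_namespaces(namespace_flags(true, false)) normally requires root (CAP_SYS_ADMIN), whereas a user namespace alone can usually be created by an unprivileged process, depending on the system configuration.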

pivot_root is a Linux system call that changes the root mount in the mount namespace of the calling process.
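To make this more concrete, the following C sketch follows the sequence described in the pivot_root(2) man page. The function name switch_root is hypothetical and this is not necessarily how the repository implements it. It assumes the caller is already inside a new mount namespace and holds CAP_SYS_ADMIN there. pivot_root is invoked through syscall(2), as the man page does.

```c
#include <sys/mount.h>
#include <sys/syscall.h>   /* SYS_pivot_root */
#include <unistd.h>

/* Hypothetical helper: make new_root the root of the current mount
 * namespace. Returns 0 on success, -1 on failure (errno is set). */
int switch_root(const char *new_root)
{
    /* Fail early, before touching the mount table, if the path is unusable. */
    if (access(new_root, X_OK) == -1)
        return -1;

    /* Stop mount events from propagating back to the parent namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return -1;

    /* pivot_root needs new_root to be a mount point: bind-mount it on itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) == -1)
        return -1;

    if (chdir(new_root) == -1)
        return -1;

    /* With "." for both arguments the old root ends up stacked on the
     * same path as the new root, so no temporary directory is needed. */
    if (syscall(SYS_pivot_root, ".", ".") == -1)
        return -1;

    /* Detach the old root so the host filesystem is no longer reachable. */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;

    return chdir("/");
}
```

A call such as switch_root("root_fs") would be made in the child process, after the mount namespace has been created and before the entrypoint is executed.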

Cgroups (control groups) are a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
There are two different versions of cgroups:

  • version 1
    Originally written by Paul Menage and Rohit Seth, it is based on a set of hierarchies, each composed of a set of cgroups arranged in a tree. Each hierarchy has an instance of the cgroup virtual filesystem associated with it, and each hierarchy is a partition of all tasks in the system.
  • version 2
    Based on a single process hierarchy where cgroups form a tree structure and every process in the system belongs to one and only one cgroup. All threads of a process belong to the same cgroup.
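Because cgroups are exposed as a virtual filesystem, placing a process in a group and setting a limit both come down to writing small strings into files. The C sketch below shows the idea. The helper names cgroup_write and cgroup_join are hypothetical and the code is not taken from this repository.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write one value into an interface file of a cgroup directory,
 * e.g. <group_dir>/pids.max. Returns 0 on success, -1 on failure. */
int cgroup_write(const char *group_dir, const char *file, const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", group_dir, file);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(value, f) >= 0;
    /* The data is flushed on close, so a rejected value shows up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Create the cgroup directory if needed and move process pid into it
 * by writing the PID to cgroup.procs. */
int cgroup_join(const char *group_dir, pid_t pid)
{
    char buf[32];

    if (mkdir(group_dir, 0755) == -1 && errno != EEXIST)
        return -1;
    snprintf(buf, sizeof buf, "%ld", (long)pid);
    return cgroup_write(group_dir, "cgroup.procs", buf);
}
```

Assuming a version 2 hierarchy mounted at /sys/fs/cgroup (the group name "demo" is made up), cgroup_join("/sys/fs/cgroup/demo", getpid()) followed by cgroup_write("/sys/fs/cgroup/demo", "pids.max", "64") would cap the number of processes in the group. Both calls need sufficient privileges, and the pids controller must be enabled for the parent group.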

Dependencies

To compile Understanding-containers, you need to install the following libraries:

  • libcap-dev
  • seccomp-dev
  • iptables-dev

~$  sudo apt install libcap-dev seccomp-dev iptables-dev -y

Compile

To create your homemade container you need to compile the source code in the "src" directory. I recommend using cmake for this. Move to the Understanding-containers directory and run the following commands:

~$  mkdir build && cd build
~$  cmake .. && make -j $(getconf _NPROCESSORS_ONLN)

Now the build folder contains the executable, ready to run! Here is the tool's help output:

Usage: sudo ./MyDocker <options> <entrypoint>

<options> should be:
	- a	run all namespaces without the user namespace
	- U	run a user namespace using unprivileged container
	- c	cgroups used to limit resources.
		This command must be chained with at least one of:
		- M <memory_limit> 				[1-4294967296]		default: 1073741824 (1GB)
		- C <percentage_of_cpu_shares> 			[1-100]			default: 25
		- P <max_pids> 					[10-32768]		default: 64
		- I <io_weight> 				[10-1000]		default: 10
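The ranges and defaults listed in the help text could be enforced with a small table-driven check. The following C sketch is only an illustration: the names limit_spec, LIMITS and parse_limit are hypothetical and do not come from the repository's sources, while the numeric values are copied from the help output above.

```c
#include <errno.h>
#include <stdlib.h>

/* Documented range and default of one cgroup option. */
struct limit_spec {
    char opt;
    long long min, max, def;
};

static const struct limit_spec LIMITS[] = {
    { 'M', 1,  4294967296LL, 1073741824LL }, /* memory limit in bytes (default 1GB) */
    { 'C', 1,  100,          25 },           /* percentage of CPU shares */
    { 'P', 10, 32768,        64 },           /* maximum number of pids */
    { 'I', 10, 1000,         10 },           /* I/O weight */
};

/* Parse the argument of option opt. A NULL argument selects the default.
 * Returns 0 and stores the value in *out, or -1 if the option is unknown
 * or the argument is not a number inside the documented range. */
int parse_limit(char opt, const char *arg, long long *out)
{
    for (size_t i = 0; i < sizeof LIMITS / sizeof LIMITS[0]; i++) {
        const struct limit_spec *s = &LIMITS[i];

        if (s->opt != opt)
            continue;
        if (arg == NULL) {
            *out = s->def;
            return 0;
        }

        char *end;
        errno = 0;
        long long v = strtoll(arg, &end, 10);
        if (errno != 0 || end == arg || *end != '\0' || v < s->min || v > s->max)
            return -1;
        *out = v;
        return 0;
    }
    return -1; /* unknown option */
}
```

With this table, parse_limit('P', "333", &v) accepts the value, while parse_limit('I', "5", &v) is rejected because 5 is below the documented minimum of 10.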

Feel the thrill of your new container by running it. An example command:

~$  sudo ./MyDocker -aUc -C 50 -I 20 -P 333 /bin/bash

In this case the following cgroup resource limits are applied:

Resource        Applied value
memory_limit    1GB
cpu_shares      50
max_pids        333
io_weight       20

Now you are running bash inside your container. You can, for example, list the processes that are active inside it and notice how they differ from those of the host machine.

When you are done, you can stop the container by terminating its bash process with exit.

Directory tree of this repository

The folders in this repository are:

├── cmake
├── root_fs
├── src
│   ├── capabilities
│   ├── helpers
│   ├── namespaces
│   │   ├── cgroup
│   │   ├── mount
│   │   ├── network
│   │   └── user
│   └── seccomp
└── tools
  • root_fs [the root filesystem where your container will run]
  • src [the source folder]
      • capabilities [capabilities dropped for the new namespace]
      • helpers [helper files]
      • namespaces [support for the various namespaces]
          • cgroup [control group support]
          • mount [mount namespace reference folder]
          • network [network namespace reference folder]
          • user [user namespace reference folder]
      • seccomp [seccomp configuration to block some syscalls]
  • tools [tools folder]

Each folder (except root_fs) contains a more detailed instruction file called README.md. The tools folder contains a binary file and scripts that allow your namespace to connect to the internet. To support this you need a CNI. One of the provided scripts lets you use Polycube for this, so you can also take advantage of eBPF technology!

Let's get started!

If you want to better understand how namespaces work, explore the src folder. The README.md files inside the various folders will guide you through how the containers are built. I suggest you also take a look at the source files, because they are full of useful comments that will help you understand how virtualization works.

Suggestions, corrections and feedback

Please report any issues, corrections or ideas on GitHub.

Pointers