Create a naive linux container

Friday, 28 February 2025

NOTE: this post is heavily based on a great post by Kevin Boone; take a look at it if you feel this post lacks detail

prepare the linux system

the following commands are used to prepare a working linux system based on the arch linux distro

text
fdisk
mkfs.ext4
mount
pacstrap

throughout the post, I'll assume the following:

  • the partition is mounted on /srv/container/test/ (a plain new directory is sufficient; creating a partition is optional and just a matter of preference)
  • a standard minimal linux system is already installed in /srv/container/test/
  • a user named demo has been created inside the container
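
the preparation could look roughly like this when a plain directory is used instead of a partition. this is a sketch, not the exact commands used for this post: the base package, the -K flag and the run dry-run helper are assumptions, and by default the script only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch: bootstrap a minimal Arch root filesystem into a plain directory.
# By default nothing is executed; set DRY_RUN=0 (and run as root) to apply.

ROOTFS="${ROOTFS:-/srv/container/test}"

# run: print the command, and execute it only when DRY_RUN=0 (hypothetical helper)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

prepare_rootfs() {
    run mkdir -p "$ROOTFS"
    # pacstrap comes from the arch-install-scripts package;
    # -K initializes an empty pacman keyring in the target
    run pacstrap -K "$ROOTFS" base
    # create the unprivileged user that will log in later
    run chroot "$ROOTFS" useradd -m demo
}
```

source the file and call prepare_rootfs to see the plan; as root, DRY_RUN=0 prepare_rootfs applies it.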

entering new root fs with chroot

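execute chroot to start the container, but wrap it with the systemd-run, ip and unshare commands. why? I'll explain in the hardening section. note that the container-netns namespace must already exist; it is created in the network configuration section below.

text
sudo systemd-run --tty --slice=demo.slice ip netns exec container-netns unshare -mpfu chroot /srv/container/test /usr/local/bin/init.sh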

init.sh is stored at /srv/container/test/usr/local/bin/init.sh (and must be executable):

text
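#!/usr/bin/env bash

# fresh procfs, so tools like ps only see the container's pid namespace
mount -t proc proc /proc
# device nodes maintained by the kernel
mount -t devtmpfs dev /dev
# pseudo-terminals; gid 5 is conventionally the tty group
mount -t devpts devpts -o gid=5,mode=0620 /dev/pts
# tmpfs for POSIX shared memory
mount -t tmpfs -o nosuid,nodev tmpfs /dev/shm

# replace this script with an interactive shell
exec bash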
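
to confirm that init.sh did its job, the mount table can be inspected from inside the container. a small sketch; has_mount and check_container_mounts are hypothetical helpers, and the optional file argument exists only so they can be pointed at a saved copy of the mount table.

```shell
# has_mount MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT appears as a mount point in the mount table
# (the second field of each line in /proc/self/mounts).
has_mount() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mounts}"
}

# check_container_mounts [MOUNTS_FILE]
# Report the four mounts that init.sh is expected to create.
check_container_mounts() {
    local m
    for m in /proc /dev /dev/pts /dev/shm; do
        if has_mount "$m" "$@"; then
            echo "ok       $m"
        else
            echo "missing  $m"
        fi
    done
}
```

running check_container_mounts in the container shell should print four ok lines.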

network configuration

on the host (as root):

text
ip link add veth-host type veth peer name veth-container
ip addr add 10.0.3.1/24 dev veth-host
ip link set veth-host up
ip netns add container-netns
ip link set veth-container netns container-netns

the above commands will do the following:

  • create a pair of veth interfaces (veth-host and veth-container)
  • assign 10.0.3.1/24 to veth-host and bring it up
  • create the container-netns namespace
  • move the veth-container interface into the container-netns namespace

inside the container:

text
ip addr add 10.0.3.2/24 dev veth-container
ip link set veth-container up
ip route add default via 10.0.3.1

back on the host (as root), since these rules refer to veth-host and to the host's uplink interface (ens3 here):

text
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -A FORWARD -o ens3 -i veth-host -j ACCEPT
iptables -A FORWARD -i ens3 -o veth-host -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.3.0/24 -o ens3 -j MASQUERADE

the commands above only need to be executed once (until the next reboot, or until you delete the container-netns namespace), so there is no need to add them to the init.sh script

the above commands will do the following:

  • assign an ip address to the veth-container interface
  • enable the interface
  • allow internet access:
    • enable packet forwarding
    • accept forwarded traffic between veth-host and ens3, and masquerade (NAT) it
    • add a default route
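
the iptables rules above hard-code ens3, the name of the uplink interface on the machine used for this post. on another host the name will differ; one way to look it up is to parse the default route. a sketch, with parse_default_iface as a hypothetical helper that reads the output of ip route show default from stdin:

```shell
# parse_default_iface: read `ip route show default` output on stdin and
# print the device name of the first default route.
parse_default_iface() {
    awk '$1 == "default" {
             for (i = 1; i < NF; i++)
                 if ($i == "dev") { print $(i + 1); exit }
         }'
}

# On the host, for example:
#   uplink="$(ip route show default | parse_default_iface)"
#   iptables -A FORWARD -i veth-host -o "$uplink" -j ACCEPT
```

keeping the parser separate from the ip call makes it easy to try on a captured line before touching any firewall rules.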

hardening container's security

to keep an attacker from taking over your system, whether by escaping from the container, escalating privileges or some other way, we set up some protections:

  • chroot only isolates the root filesystem; other things, such as the process list visible through proc, are still inherited from the host system. this is where unshare comes in: -m, -p and -u give the container its own mount, pid and uts (hostname) namespaces, and -f forks so that the container's first process actually runs inside the new pid namespace.
  • network isolation is done by creating a new network namespace with the ip netns command instead of unshare
  • to control resources, use cgroups through systemd-run (see the systemd.resource-control(5) manual page); additionally, we also use ulimit
  • limit ssh features such as TCP and X11 forwarding
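
whether unshare and ip netns actually put the container into separate namespaces can be checked by comparing the namespace links under /proc. a sketch; ns_id and same_ns are hypothetical helpers, and PROC_ROOT is only there so they can be tried against a copy of the tree.

```shell
# ns_id PID KIND: print the namespace identifier of KIND (mnt, pid, uts, net, ...)
# for process PID, e.g. "pid:[4026531836]".
ns_id() {
    readlink "${PROC_ROOT:-/proc}/$1/ns/$2"
}

# same_ns PID_A PID_B KIND: succeed if both processes share namespace KIND,
# return 2 if either link cannot be read.
same_ns() {
    local a b
    a="$(ns_id "$1" "$3")" || return 2
    b="$(ns_id "$2" "$3")" || return 2
    [ "$a" = "$b" ]
}
```

as root on the host, same_ns 1 PID KIND (with PID being the host pid of the container's bash) should fail for mnt, pid, uts and net, and succeed for ipc, since -mpfu does not unshare the ipc namespace.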

/usr/local/lib/systemd/system/demo.slice:

text
[Slice]
MemoryMax=4G
CPUQuota=40%
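
once the container is running, the slice limits end up as plain numbers in the cgroup tree, usually under /sys/fs/cgroup/demo.slice/. the sketch below computes the values to expect there, assuming cgroup v2 and systemd's default CPU quota period of 100 ms; the function names are made up for this example.

```shell
# cpu_max_for_quota PERCENT [PERIOD_USEC]
# Print the "quota period" pair expected in cpu.max for CPUQuota=PERCENT%.
cpu_max_for_quota() {
    local percent="$1" period="${2:-100000}"
    echo "$(( percent * period / 100 )) $period"
}

# memory_max_for_gib N: bytes expected in memory.max for MemoryMax=NG
# (systemd size suffixes are powers of 1024).
memory_max_for_gib() {
    echo "$(( $1 * 1024 * 1024 * 1024 ))"
}

cpu_max_for_quota 40      # compare with: cat /sys/fs/cgroup/demo.slice/cpu.max
memory_max_for_gib 4      # compare with: cat /sys/fs/cgroup/demo.slice/memory.max
```

if the numbers in the cgroup files differ, the slice unit was probably not loaded or a setting was rejected; systemctl status demo.slice is a good place to look.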

/etc/security/limits.conf:

text
# /etc/security/limits.conf
#
#This file sets the resource limits for the users logged in via PAM.
#It does not affect resource limits of the system services.
#
#Also note that configuration files in /etc/security/limits.d directory,
#which are read in alphabetical order, override the settings in this
#file in case the domain is the same or more specific.
#That means, for example, that setting a limit for wildcard domain here
#can be overridden with a wildcard setting in a config file in the
#subdirectory, but a user specific setting here can be overridden only
#with a user specific setting in the subdirectory.
#
#Each line describes a limit for a user in the form:
#
#<domain>        <type>  <item>  <value>
#
#Where:
#<domain> can be:
#        - a user name
#        - a group name, with @group syntax
#        - the wildcard *, for default entry
#        - the wildcard %, can be also used with %group syntax,
#                 for maxlogin limit
#
#<type> can have the two values:
#        - "soft" for enforcing the soft limits
#        - "hard" for enforcing hard limits
#
#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open file descriptors
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
#        - as - address space limit (KB)
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to values: [-20, 19]
#        - rtprio - max realtime priority
#
#<domain>      <type>  <item>         <value>
#

*               hard    nproc            50
#*               soft    core            0
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#@student        -       maxlogins       4

# End of file
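
since almost all of that file is commentary, it helps to list just the lines that take effect. a sketch; active_limits is a hypothetical helper.

```shell
# active_limits [FILE]: print the non-comment, non-blank lines of a limits file
active_limits() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/security/limits.conf}"
}
```

for the file above this prints only the nproc line. once the limit is applied, ulimit -Hu in a session of the demo user should report 50.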

container usage

start an openssh server

you can either install openssh inside the container or copy the necessary files from the host system (assuming the openssh package is already installed there).

if you choose to copy them manually from the host system, these are the files required:

text
cp -v /usr/bin/{sshd,ssh-keygen} /srv/container/test/usr/local/bin

sshd also needs host keys; generate them with ssh-keygen -A, which creates host keys of all default key types (rsa, ecdsa, and ed25519) if they do not already exist.

/etc/pam.d/sshd:

text
#%PAM-1.0

auth      include   system-remote-login
account   include   system-remote-login
password  include   system-remote-login
session   include   system-remote-login

UsePAM yes is required in sshd_config if you rely on limits.conf: the limits are applied through PAM when the session is opened, so without it they won't take effect

/etc/ssh/sshd_config:

text
# Include drop-in configurations
Include /etc/ssh/sshd_config.d/*.conf

# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.

# This sshd was compiled with PATH=/usr/local/sbin:/usr/local/bin:/usr/bin

# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented.  Uncommented options override the
# default value.

#Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
#HostKey /etc/ssh/ssh_host_ed25519_key

# Ciphers and keying
#RekeyLimit default none

# Logging
#SyslogFacility AUTH
#LogLevel INFO
LogLevel DEBUG3

# Authentication:

#LoginGraceTime 2m
#PermitRootLogin prohibit-password
#PermitRootLogin yes
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

#PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile	.ssh/authorized_keys

#AuthorizedPrincipalsFile none

#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody

# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes

# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no

# Change to no to disable s/key passwords
#KbdInteractiveAuthentication yes

# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no

# GSSAPI options
#GSSAPIAuthentication no
#GSSAPICleanupCredentials yes

# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the KbdInteractiveAuthentication and
# PasswordAuthentication.  Depending on your PAM configuration,
# PAM authentication via KbdInteractiveAuthentication may bypass
# the setting of "PermitRootLogin prohibit-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and KbdInteractiveAuthentication to 'no'.
#UsePAM no
UsePAM yes

#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
#X11Forwarding no
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
#PrintMotd yes
#PrintLastLog yes
#TCPKeepAlive yes
#PermitUserEnvironment no
#Compression delayed
#ClientAliveInterval 0
#ClientAliveCountMax 3
#UseDNS no
#PidFile /run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none

# no default banner path
#Banner none

# override default of no subsystems
Subsystem	sftp	/usr/lib/ssh/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
#	X11Forwarding no
#	AllowTcpForwarding no
#	PermitTTY no
#	ForceCommand cvs server
Match User demo
    X11Forwarding no
    AllowTcpForwarding no
    PermitTTY no
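
sshd can print the configuration it would apply to a given user, which is a quick way to confirm that the Match block is picked up. a sketch, assuming an sshd that supports the -T extended test mode together with a -C connection spec (older releases may also want host= and addr=); the SSHD variable and the effective_for_user name are made up, and the command normally has to run as root inside the container.

```shell
# effective_for_user USER [CONFIG]
# Print the forwarding/TTY settings sshd would apply to USER.
# SSHD can point at another binary, e.g. /usr/local/bin/sshd if it was copied there.
effective_for_user() {
    "${SSHD:-/usr/bin/sshd}" -T -C "user=$1" -f "${2:-/etc/ssh/sshd_config}" |
        grep -Ei '^(x11forwarding|allowtcpforwarding|permittty) '
}
```

with the configuration above, effective_for_user demo should list all three settings as no, while another user keeps the defaults.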

Author:ReYuki

Link:https://www.reyuki.site/posts/naive-container

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You are free to share and adapt it, as long as you give appropriate credit, don’t use it for commercial purposes, and distribute your contributions under the same license. Provided under license CC BY-NC-SA 4.0