KVM on Arch Linux
UNDER CONSTRUCTION: The document is currently being modified!
Introduction
This is a tutorial on how to automate the setup of VMs using KVM on Arch Linux. This tutorial utilizes QEMU as a back-end for KVM using libvirt. System base images will be generated using Packer. And finally, Vagrant and vagrant-libvirt will be utilized for KVM test environments.
The tutorial environment consists of a database server, a DNS server, a web server and one or more test servers (which may or may not be clones of the three main servers). Additional servers should be available on demand for any use case. All machine images should be built in-house so that image security can be maintained.
Installation
Before getting started there are a few packages that will be needed to set all of this up.
# pacaur -S bridge-utils dnsmasq ebtables libguestfs libvirt openbsd-netcat openssl-1.0 \
    ovmf packer-io qemu-headless qemu-headless-arch-extra vagrant
vagrant-libvirt
The libvirt plugin installation for vagrant requires some cleanup first.
# sudo mv /opt/vagrant/embedded/lib/libcurl.so{,.backup}
# sudo mv /opt/vagrant/embedded/lib/libcurl.so.4{,.backup}
# sudo mv /opt/vagrant/embedded/lib/libcurl.so.4.4.0{,.backup}
# sudo mv /opt/vagrant/embedded/lib/pkgconfig/libcurl.pc{,.backup}
Then build the plugin.
# vagrant plugin install vagrant-libvirt
Hugepages
Enabling hugepages can improve the performance of virtual machines. First check the group ID of the kvm group, then add an entry to the fstab using that ID as the gid value.
# grep kvm /etc/group
# sudoedit /etc/fstab
hugetlbfs /dev/hugepages hugetlbfs mode=1770,gid=999 0 0
Rather than rebooting, remount the filesystem.
# sudo umount /dev/hugepages
# sudo mount /dev/hugepages
This can then be verified.
# sudo mount | grep huge
# ls -FalG /dev/ | grep huge
Now set the number of hugepages to use. This takes a bit of math: hugepages are 2 MB each by default on x86_64, so take the amount of system RAM to reserve for VMs in megabytes and divide it by two.
On my setup I will dedicate 12GB out of the 16GB of system RAM to VMs. This means (12 * 1024) / 2, or 6144 hugepages.
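The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name hugepages_for_gib is hypothetical, and the 2048 kB default is an assumption that matches the usual x86_64 hugepage size (the real value is shown on the Hugepagesize line of /proc/meminfo).

```shell
#!/usr/bin/env bash
# Sketch: compute how many hugepages cover a given amount of RAM.
# hugepages_for_gib is a hypothetical helper, not part of any package.

# hugepages_for_gib GIB [PAGE_KB]
#   GIB      amount of RAM to reserve for VMs, in gigabytes
#   PAGE_KB  hugepage size in kB (assumed default: 2048)
hugepages_for_gib() {
    local gib="$1"
    local page_kb="${2:-2048}"
    # Convert GB to kB, then divide by the size of one hugepage.
    echo $(( gib * 1024 * 1024 / page_kb ))
}

# 12GB reserved for VMs with 2 MB pages, as in the example above.
hugepages_for_gib 12
```

If the host uses a different hugepage size, pass it as the second argument instead of relying on the default.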
Set the number of hugepages.
# echo 6144 | sudo tee /proc/sys/vm/nr_hugepages
Also set this permanently by adding a file to /etc/sysctl.d.
vm.nr_hugepages = 6144
Again verify the changes.
# grep HugePages_Total /proc/meminfo
KVM Group
Create a user for KVM.
# sudo useradd -g kvm -s /usr/bin/nologin kvm
Then modify the libvirt QEMU config to reflect this.
user = "kvm"
group = "kvm"
Fix the permissions on /dev/kvm.
# sudo groupmod -g 78 kvm
As of version 234, systemd assigns dynamic IDs to groups, but KVM expects 78.
Add the current user to the kvm group.
# sudo gpasswd -a kyau kvm
OVMF & IOMMU
The Open Virtual Machine Firmware (OVMF) is a project to enable UEFI support for virtual machines, and enabling IOMMU allows PCI pass-through, among other things. Together these significantly extend the choice of guest operating systems and open up some other options.
Enable IOMMU on boot by adding an option to the kernel line in GRUB.
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"
Re-generate the GRUB config.
# sudo grub-mkconfig -o /boot/grub/grub.cfg
Reboot the machine and then verify IOMMU is enabled.
# sudo dmesg | grep -e DMAR -e IOMMU
If it was enabled properly, there should be a line similar to [ 0.000000] DMAR: IOMMU enabled.
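Beyond the dmesg line, it can help to see which devices ended up in which IOMMU group, since PCI pass-through works per group. The following is a minimal sketch that walks the kernel's /sys/kernel/iommu_groups tree; the function name list_iommu_groups and its optional base-directory argument are hypothetical conveniences, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: print every device together with its IOMMU group number.
# list_iommu_groups is a hypothetical helper name.

# list_iommu_groups [BASE_DIR]
#   BASE_DIR defaults to the sysfs location used by the kernel.
list_iommu_groups() {
    local base="${1:-/sys/kernel/iommu_groups}"
    local dev group
    for dev in "$base"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist (IOMMU disabled).
        [ -e "$dev" ] || continue
        group="${dev#"$base"/}"   # strip the base directory
        group="${group%%/*}"      # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}

list_iommu_groups
```

If nothing is printed, the kernel has not created any IOMMU groups, which usually means IOMMU is still disabled in the firmware or on the kernel command line.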
Add the OVMF firmware to libvirt.
nvram = [ "/usr/share/ovmf/ovmf_code_x64.bin:/usr/share/ovmf/ovmf_vars_x64.bin" ]
LVM
During the installation of the KVM host machine a data volume group was created for VMs. Before carving out disk space for virtual machines, create the volume(s) that will exist outside of the virtual machines. These will be used for databases, web root directories and any other data that needs to persist between VM creation and destruction.
# sudo lvcreate -L 256G data --name http
I am only using a single LVM volume and then creating directories inside of it for each machine.
Create a directory for the volume.
# sudo mkdir /http
Format the new volume with ext4 and mount it.
# sudo mkfs.ext4 -O metadata_csum,64bit /dev/data/http
# sudo mount /dev/data/http /http
Set proper permissions and modify the http user's home directory.
# sudo chown http:http /http
# sudo usermod -m -d /http http
Add the volume to fstab so that it mounts upon boot.
/dev/mapper/data-http /http ext4 rw,relatime,stripe=256,data=ordered,journal_checksum 0 0
Volumes will now need to be created for each virtual machine; an LVM thin pool can be used for this.
LVM Thin Provisioning
Thin provisioning creates another virtual layer on top of your volume group, in which logical thin volumes can be created. Thin volumes, unlike normal thick volumes, do not reserve the disk space for the volume on creation but instead do so upon write; to the operating system they are still reported as full size volumes. This means that when utilizing LVM directly for KVM it will perform similarly to a "dynamic disk" meaning it will only use what disk space it needs regardless of how big the virtual hard drive actually is. This can also be paired with LVM cloning (snapshots) to create some interesting setups, like running 1TB of VMs on a 128GB disk for example.
WARNING: The one disadvantage of doing this is that, without proper disk monitoring and management, it can lead to over-provisioning (an overflow will cause the volume to drop).
Use the rest of the data volume group for the thin pool.
# sudo lvcreate -l +100%FREE data --thinpool qemu
Pulling up lvdisplay can verify that it created a thin pool.
# sudo lvdisplay data/qemu
LV Size                <1.50 TiB
Allocated pool data    0.00%
Finally, lvs should show the volume with the t and tz attributes as well as a data percentage.
# sudo lvs
LV   VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
http data    -wi-ao---- 256.00g
qemu data    twi-a-tz-- <1.50t              0.00   0.43
root neutron -wi-ao---- 63.93g
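Since an overfull thin pool is the main risk here, the Data% column is worth watching automatically. The following is a minimal sketch of such a check, assuming the pool is data/qemu as above; the function names and the 80% limit are hypothetical choices, and the only LVM feature relied on is the data_percent report field of lvs.

```shell
#!/usr/bin/env bash
# Sketch: warn when a thin pool's data usage crosses a limit.
# pool_is_over and check_thin_pool are hypothetical helper names.

# pool_is_over USED LIMIT -> exit 0 when USED >= LIMIT (both percentages)
pool_is_over() {
    # awk handles the fractional values that lvs reports (e.g. 7.17).
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used + 0 >= limit + 0) }'
}

# check_thin_pool [POOL] [LIMIT]
check_thin_pool() {
    local pool="${1:-data/qemu}" limit="${2:-80}" used
    used="$(sudo lvs --noheadings -o data_percent "$pool" | tr -d ' ')"
    [ -n "$used" ] || return 2   # lvs failed or printed nothing
    if pool_is_over "$used" "$limit"; then
        echo "WARNING: $pool is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $pool is ${used}% full"
}

# Usage on the KVM host, for example from a cron job or systemd timer:
# check_thin_pool data/qemu 80
```

Keeping the comparison in its own function makes the threshold logic easy to check without touching LVM.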
Adding volumes to the thin pool is very similar to adding normal volumes. Add one for the first VM.
# sudo lvcreate -V 20G --thin -n dns data/qemu
These volumes can be shrunk or extended at any point.
# sudo lvextend -L +15G data/dns
Or even removed entirely.
# sudo lvremove data/dns
Verify the new base volume was added correctly to the thin pool.
# sudo lvs
The volume should be listed in pool qemu, with a Data% of 0.00 and the V and tz attributes.
LV   VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
dns  data    Vwi-a-tz-- 20.00g  qemu        0.00
Packer
Packer is a tool for automating the creation of virtual machines; in this instance it will be used to automate the creation of Vagrant boxes. I have already taken the time to create a Packer template for Arch Linux based off of my installation tutorials, but I encourage you to use this only as a basis and delve deeper to create your own templates. I could very easily have downloaded someone else's templates, but then I would lack understanding.
GitHub: kyau/packer-kvm-templates
The Packer templates are in JSON format and contain all of the information needed to create the virtual machine image. Descriptions of all the template sections and values, including default values, can be found in the Packer docs. For Arch Linux, the template file archlinux-x86_64-base-vagrant.json will be used to generate an Arch Linux qcow2 virtual machine image.
# git clone https://github.com/kyau/packer-kvm-templates
# cd packer-kvm-templates/archlinux-x86_64-base
To explain the template a bit: the builders section specifies that the build produces a qcow2 image running on QEMU/KVM. A few settings are imported from user variables set in the previous section: the ISO URL and checksum, the country setting, the disk space for the VM's primary hard drive, the amount of RAM to dedicate to the VM, how many vCores to dedicate to the VM, whether or not the VM is headless, and the login and password for the primary SSH user. These are all set as user variables and placed in a section at the top to allow quick edits. The template also specifies that the VM should use virtio for the disk and network interfaces. Last come the built-in web server in Packer and the boot commands: http_directory specifies which directory will be the root of the built-in web server (this makes it possible to host files for the VM to access during installation), and boot_command is an array of commands that are executed upon boot in order to kick-start the installer. Finally, qemuargs should be rather apparent, as they are the arguments passed to QEMU.
# cd packer-kvm-templates
The provisioners section executes three separate scripts after the machine has booted. These scripts are also passed the required user variables, set at the top of the file, as shell variables. The install.sh script is the one that installs Arch Linux, hardnening.sh is the script that hardens the Arch Linux installation, and finally cleanup.sh handles general cleanup after the installation is complete.
While the README.md does have all of this information for the packer templates, it will also be detailed here.
For added security generate new moduli for your VMs (or copy them from /etc/ssh/moduli).
# ssh-keygen -G moduli.all -b 4096
# ssh-keygen -T moduli.safe -f moduli.all
# mv moduli.safe moduli && rm moduli.all
Enter the directory for the Arch Linux template and sym-link the moduli.
# cd archlinux-x86_64-base/default
# ln -s ../../moduli . && cd ..
Build the base virtual machine image.
# ./build archlinux-x86_64-base-vagrant.json
This runs PACKER_LOG=1 PACKER_LOG_PATH="packer.log" packer-io build archlinux-x86_64-base-vagrant.json, which logs to the current directory.
Once finished, there should be a qcow2 vagrant-libvirt image for Arch Linux in the box directory.
Add this image to Vagrant.
# vagrant box add box/archlinux-x86_64-base-vagrant-libvirt.box --name archlinux-x86_64-base
Vagrant-libvirt
Vagrant can be used to build and manage test machines. The vagrant-libvirt plugin adds a Libvirt provider to Vagrant, allowing Vagrant to control and provision machines via the Libvirt toolkit.
To bring up the first machine, first create a new directory for it.
# cd
# mkdir testmachine
# cd testmachine
Initialize the machine with Vagrant.
# vagrant init archlinux-x86_64-base
Then bring up the machine.
# vagrant up
Then SSH into the machine directly.
# vagrant ssh
QEMU
While Vagrant is a great tool for working with test and development environments, for the more permanent VMs on the system, utilizing QEMU directly will allow the VMs to run directly off of LVM thin volumes. Currently vagrant-libvirt cannot do this because its own snapshotting interferes; thankfully LVM has snapshotting of its own.
For this a separate Packer template was created, one with all of the Vagrant stuff removed. To build one of these simply use the other JSON file in the Arch Linux template directory.
# ./build archlinux-x86_64-base.json
The image can then be written directly to the LVM thin volume.
# sudo qemu-img convert -f qcow2 -O 'raw' 'qcow2/archlinux-x86_64-base.qcow2' '/dev/data/dns'
Because this copied a thick volume onto a thin volume, the thin volume will report all of its disk space as used.
# sudo lvs
LV   VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
dns  data    Vwi-a-tz-- 20.00g  qemu        100.00
The disk merely needs to be sparsified.
# sudo virt-sparsify --in-place /dev/data/dns
The disk usage should now be reported properly.
# sudo lvs
LV   VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
dns  data    Vwi-a-tz-- 20.00g  qemu        7.17
Network Bridge
Setting up a network bridge for KVM is simple with systemd. Replace X.X.X.X with the host machine's IP address and update the Gateway and DNS if not using OVH.
[NetDev]
Name=kvm0
Kind=bridge

[Match]
Name=kvm0
[Network]
DNS=213.186.33.99
Address=X.X.X.X/24
Gateway=Y.Y.Y.254
IPForward=yes

[Match]
Name=eth0
[Network]
Bridge=kvm0

And finally restart networkd.
# sudo systemctl restart systemd-networkd
The bridge should now be up and running; verify this.
# ip a
Once the bridge is up and running QEMU can be directed to use it. Create a directory in /etc/ for QEMU and then make a bridge.conf.
# sudo mkdir /etc/qemu
# sudoedit /etc/qemu/bridge.conf
allow kvm0
Then set cap_net_admin on the binary helper.
# sudo setcap cap_net_admin=ep /usr/lib/qemu/qemu-bridge-helper
WARNING: I had major issues using the bridge as a regular user. I actually had to remove the setuid bit to get it working: sudo chmod u-s /usr/lib/qemu/qemu-bridge-help
NAT
To get NAT working inside of each VM, IP forwarding will need to be enabled.
net.ipv4.ip_forward = 1
Rules will also need to be appended to nftables.
table inet filter {
    …
    chain forward {
        type filter hook forward priority 0;
        oifname kvm0 accept
        iifname kvm0 ct state related, established accept
        iifname kvm0 drop
    }
    …
}
Rebooting at this point to make sure all of these networking settings were applied correctly would be a wise idea.
# sudo systemctl reboot
Network Test
The network on the VM should now be fully tested; for this a connection can be made using the SPICE protocol. On a local client machine install vinagre.
# pacaur -S vinagre
Using the OVH/SyS Manager, set up two failover IP addresses on the same virtual MAC. The following arguments will launch the virtual machine. Be sure to input the proper virtual MAC so that it matches the one that OVH assigned.
# /usr/bin/qemu-system-x86_64 --enable-kvm -machine q35,accel=kvm -device intel-iommu \
    -m 512 -smp 1 -cpu host -drive file=/dev/data/dns,cache=none,if=virtio,format=raw \
    -net bridge,br=kvm0 -net nic,model=virtio,macaddr=00:00:00:00:00:00 -vga qxl \
    -spice port=5900,addr=127.0.0.1,disable-ticketing \
    -monitor unix:/tmp/monitor-dns.sock,server,nowait
Once launched, you should be able to connect to the KVM using a SPICE client such as Vinagre. Click Connect in Vinagre, set the Host: to localhost and then make sure Use host is checked with your KVM host server name filled in "as a SSH tunnel". Connect and enter your SSH key password.
The KVM virtual machine should now be visible through Vinagre.
Log in as root; if this was built using packer-kvm-templates the default password is password.
Edit the network interface configuration for systemd. This first VM is going to be acting as my DNS server, therefore it will be assigned two IP addresses.
[Match]
Name=eth0
[Network]
Address=FAILOVER_IP_1/32
Address=FAILOVER_IP_2/32
DNS=213.186.33.99
Peer=HOST_GATEWAY/32
[Gateway]
Gateway=HOST_GATEWAY
Destination=0.0.0.0/0
This is exactly how OVH says it should be set up; however, this was not enough, as the VM still did not have a default route.
TODO: Fix this section, this is an ugly hack
To fix the routing create a service that runs on boot.
[Unit]
Description=Start KVM Network
After=network.target
Before=multi-user.target shutdown.target
Conflicts=shutdown.target
Wants=network.target
[Service]
ExecStart=/usr/local/bin/kvmnet
[Install]
WantedBy=multi-user.target
And the script that does the routing.
#!/bin/bash
ip route add Y.Y.Y.254 dev eth0
ip route add default via Y.Y.Y.254 dev eth0
Don't forget to make it executable.
# sudo chmod +rx /usr/local/bin/kvmnet
Enable the service.
# sudo systemctl enable kvmnet
Reboot the VM and then verify it has internet access.
# sudo reboot
# ping archlinux.org
Finally, verify that the VM can be reached over SSH from the outside via BOTH IP addresses.
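A quick way to run that last check from an outside machine is sketched below. It only tests whether TCP port 22 answers on each address, not whether a login succeeds; the function names are hypothetical, and the addresses in the usage comment are documentation placeholders standing in for the two failover IPs.

```shell
#!/usr/bin/env bash
# Sketch: check that the SSH port answers on each failover address.
# port_open and check_ssh_ips are hypothetical helper names.

# port_open HOST [PORT] -> exit 0 if a TCP connection can be opened
port_open() {
    local host="$1" port="${2:-22}"
    # bash's /dev/tcp pseudo-device opens a TCP connection; give up after 5s.
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# check_ssh_ips IP... -> exit 0 only if every address answers on port 22
check_ssh_ips() {
    local ip failed=0
    for ip in "$@"; do
        if port_open "$ip" 22; then
            echo "$ip: SSH port reachable"
        else
            echo "$ip: SSH port NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}

# Usage from a machine outside the KVM host (placeholder addresses):
# check_ssh_ips 203.0.113.10 203.0.113.11
```

A full test would still be an actual ssh login to each address, since an open port does not prove the keys and routing back to the client are correct.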
Libvirt
To launch the virtual machines on boot there are two options. The first option involves importing the virtual machines into libvirt with virsh. The second option is to set up a systemd service. Given that management will be loads easier with virt-manager, I will opt for the first option.
On the KVM host machine enable and start libvirtd.
# sudo systemctl enable libvirtd
# sudo systemctl start libvirtd
Then grant access to libvirtd to everyone in the kvm group.
/* Allow users in kvm group to manage the libvirt daemon without authentication */
polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.unix.manage" &&
        subject.isInGroup("kvm")) {
            return polkit.Result.YES;
    }
});
Virsh
Virsh is the command line interface for libvirt. It can be used to import the QEMU arguments into an XML format that libvirt will understand.
Save the QEMU arguments used before to a temporary file.
# echo "/usr/bin/qemu-system-x86_64 --enable-kvm -machine q35,accel=kvm -device intel-iommu \
    -m 512 -smp 1 -cpu Broadwell -drive file=/dev/data/dns,cache=none,if=virtio,format=raw \
    -net bridge,br=kvm0 -net nic,model=virtio,macaddr=00:00:00:00:00:00 -vga qxl \
    -spice port=5900,addr=127.0.0.1,disable-ticketing \
    -monitor unix:/tmp/monitor-dns.sock,server,nowait" > kvm.args
Temporarily changing the CPU because virsh cannot recognize host.
Convert this to XML format.
# virsh domxml-from-native qemu-argv kvm.args > dns.xml
Then open up the XML file in an editor and change the name, cpu and graphics blocks.
…
<name>DNS (Arch64)</name>
…
<cpu mode='host-passthrough' />
…
<graphics type='spice' port='5900' autoport='no' listen='127.0.0.1'>
    <listen type='address' address='127.0.0.1' />
</graphics>
…
The last two qemu:commandline arguments can also be removed as they were setting up the SPICE server which is done through the graphics block.
The XML should now be in a similar state as to when it was executed with the QEMU binary.
Import the XML into libvirt.
# sudo virsh define dns.xml
The VM can now be launched.
# sudo virsh start DNS
SSH and SPICE over SSH should both now work and the machine should be running. Use the following to start the machine on boot.
# sudo virsh autostart DNS
A reboot of the host machine at this point should yield the virtual machine DNS starting up automatically.
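After the reboot, the state of the domain can be confirmed from the command line as well. The sketch below parses the State and Autostart lines printed by virsh dominfo; the helper names are hypothetical, and the domain name DNS is taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: confirm a libvirt domain is running and set to autostart.
# dominfo_field and domain_ready are hypothetical helper names.

# dominfo_field FIELD -> print the value of FIELD from `virsh dominfo`
# output supplied on stdin (lines look like "State:          running").
dominfo_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# domain_ready DOMAIN -> exit 0 if DOMAIN is running with autostart enabled
domain_ready() {
    local info
    info="$(sudo virsh dominfo "$1")" || return 2
    [ "$(printf '%s\n' "$info" | dominfo_field State)" = "running" ] &&
        [ "$(printf '%s\n' "$info" | dominfo_field Autostart)" = "enable" ]
}

# Usage on the KVM host:
# domain_ready DNS && echo "DNS is running and will start on boot"
```

For a quick manual look, sudo virsh list --all shows the same running state for every defined domain.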
Virt-manager
Virt-manager can be used to manage the virtual machines remotely.
Install virt-manager on the local machine (the one being used to view this tutorial, not the KVM host machine); it can then be used to connect to libvirt remotely via SSH.
# pacaur -S virt-manager
Connect remotely to QEMU/KVM with virt-manager over SSH and the virtual machine should be shown as running.