<h1 id="jenkins-deploying-app">Jenkins - Deploying App</h1>
<p>2018-08-17</p>
<p>Jenkins is the most widely used build-and-deploy tool in the DevOps arena. In the past, I have used Jenkins for infrastructure code deployments, where we were deploying compute nodes in an OpenStack infrastructure. To sum up the high-level build and deploy stages: we fetched the OpenStack packages from one of our local repos,
SaltStack was called to build the configuration file for the compute nodes, the packages were deployed with the necessary configuration, and the compute node was rebooted.
Sounds simple, but it’s actually quite complex considering the Jenkins configuration and the various build scripts playing their part in the deployment.
Recently, while working with the application team, I came across a deployment requirement which was quite interesting.</p>
<h2 id="requirement">Requirement</h2>
<ul>
<li>
<p>Set up a repository sync for a server running Node.js code.</p>
</li>
<li>
<p>The Node.js code should run under forever (the Node.js process manager).</p>
</li>
<li>
<p>The forever session should run under a screen session as the terminal output of the screen session is required by the developers.</p>
</li>
</ul>
<h2 id="solution">Solution</h2>
<p>We will be using our Jenkins server to sync the code from the GitHub repository to the Node.js server, which is a VPS instance on AWS.</p>
<ul>
<li>Prepare the SCM</li>
</ul>
<p>First, we need to prepare the Jenkins server to get the code from GitHub using the SCM (Source Code Management) option.</p>
<p>Add the repository under SCM along with an existing or new credential key.</p>
<p>Add the public key of the keypair as a deploy key to the repository. <a href="https://developer.github.com/v3/guides/managing-deploy-keys/#deploy-keys">Click here to know more about deploy keys</a></p>
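<p>If you do not already have a keypair for this, one way to generate it on the Jenkins server is sketched below (the file path and comment are just examples; adjust to your setup):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># generate a passphrase-less RSA keypair for Jenkins to authenticate with GitHub
ssh-keygen -t rsa -b 4096 -C "jenkins-deploy" -f ~/.ssh/jenkins_deploy -N ""
# the public half goes into the repository's deploy keys
cat ~/.ssh/jenkins_deploy.pub
</code></pre></div></div>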
<ul>
<li>Run job on Jenkins master</li>
</ul>
<p>While creating the Jenkins job, we need to specify that it runs on the master, by checking “Restrict where this project can be run” under the General options and setting the “Label Expression” to “master”.</p>
<ul>
<li>Configure the SCM</li>
</ul>
<p>Under Source Code Management, add the git repository URL and select the key as the credential. Jenkins will immediately try to talk to the git repository and check if it can connect.</p>
<p>We are going to use ‘Poll SCM’ and set the Schedule to sync and build the repo every hour:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>H * * * *
</code></pre></div></div>
<p>Under “Build Environment” we will use the Bindings option to assign a variable (SSHKEY) to our credential key file. This is the credential key file that will be used to talk to the Node.js server instance. The credential is added through the Jenkins control panel via “Jenkins » Credentials » Add credentials”.</p>
<ul>
<li>Next is the Build step, where we will execute a few shell commands:</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rsync -avhpzi -e "ssh -i ${SSHKEY}" --delete --exclude ".git" --exclude "logs" --exclude ".session" ./ "user@node-vps-instance:/home/ubuntu/ReportScript/"
ssh -i "${SSHKEY}" user@node-vps-instance "screen -X -S reportscript quit"
ssh -i "${SSHKEY}" user@node-vps-instance "screen -S reportscript -d -m bash -c 'cd /home/user/nodedata; sudo NODE_ENV=prod forever bin/www'"
</code></pre></div></div>
<h2 id="execution">Execution</h2>
<p>Let’s see what happens when we execute the Jenkins job:</p>
<ul>
<li>
<p>Jenkins master is going to connect to the git repository and fetch the changes on the master branch</p>
</li>
<li>
<p>The changes will be synced to the workspace on the Jenkins master</p>
</li>
<li>
<p>Next, the shell commands under Build will be executed by Jenkins</p>
</li>
</ul>
<p>rsync will use the SSHKEY and copy code changes from the workspace directory on the Jenkins master to the mentioned path on the Node.js server.</p>
<p>The second shell command kills any existing screen session on the Node.js server.</p>
<p>The third shell command sets the Node.js environment and starts a forever session to execute the Node.js code under a screen session.</p>
<p>Please note the screen session will be named and started in detached mode.</p>
<h1 id="moving-proxmox-lxc-container">Moving Proxmox-LXC container</h1>
<p>2018-07-24</p>
<p>Proxmox supports migration of containers in shutdown mode as well as taking snapshots of containers. One caveat to consider for the container snapshot feature is that the underlying disk filesystem of the Proxmox host needs to support snapshots. I had a requirement to take a complete backup of a container and move it to another Proxmox host; incidentally, this Proxmox host was not part of the cluster and was configured on a regular LVM filesystem, which does not support container snapshots. This was tackled using the steps below:</p>
<ul>
<li>Container test-move needs to be moved to another proxmox box</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox1:~$pct list
VMID Status Lock Name
111 running app1
114 stopped db-store
119 running consul
120 running test-move
root@proxmox1:~$pct enter 120
root@test-move:~# pwd
/root
root@test-move:~# mkdir test
root@test-move:~# cd test
root@test-move:~/test# echo "test 1 test 1 test 2 test 2" > test-move
root@test-move:~/test# ls
test-move
root@test-move:~/test# cat test-move
test 1 test 1 test 2 test 2
</code></pre></div></div>
<ul>
<li>Shut down the container before making a copy</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox1:~$pct stop 120
</code></pre></div></div>
<ul>
<li>Copy the container image to a shared partition</li>
</ul>
<p>I have an NFS shared partition mounted on all Proxmox hosts. Copy the container image from /var/lib/vz/images/container-id/container-id.raw to the NFS shared partition:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox1:/mnt/bkp/images$cp /var/lib/vz/images/120/vm-120-disk-1.raw .
root@proxmox1:/mnt/bkp/images$ls -lth
total 635M
-rw-r----- 1 root root 5.0G Feb 26 20:48 vm-120-disk-1.raw
</code></pre></div></div>
<ul>
<li>Copy the configuration file for the container to the NFS share</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox1:~$cat /etc/pve/nodes/red/lxc/120.conf
arch: amd64
cores: 1
hostname: test-move
memory: 512
net0: name=eth1,bridge=vmbr3,gw=192.168.56.1,hwaddr=A6:01:FE:E0:67:36,ip=192.168.56.161/16,type=veth
ostype: ubuntu
rootfs: local:110/vm-120-disk-1.raw,size=5G
swap: 512
</code></pre></div></div>
<p>Before restoring the container’s raw image from the NFS share onto the destination Proxmox box, create a dummy placeholder container on the destination Proxmox host with similar CPU, RAM, and disk size, as sketched below.
This should create the necessary files and entries in /etc/pve/ and a file entry at /etc/pve/.rrd</p>
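<p>A minimal sketch of creating such a placeholder with <code class="highlighter-rouge">pct</code> (the VMID 160 and the template name are assumptions; match the CPU, RAM, and disk size of the source container):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># create a 5G placeholder container matching the source container's resources
root@proxmox2:~$pct create 160 local:vztmpl/ubuntu-16.04-standard_16.04-1_amd64.tar.gz \
  --hostname test-move --cores 1 --memory 512 --swap 512 --rootfs local:5
</code></pre></div></div>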
<ul>
<li>Stop this placeholder container and delete/move its raw file</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox2:pct stop 160
root@proxmox2:/var/lib/vz/images$mv 160/vm-160-disk-1.raw /tmp/
</code></pre></div></div>
<ul>
<li>Copy the raw image from nfs share to /var/lib/vz/images/container-id/</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox2:/var/lib/vz/images/160$cp /mnt/bkp/images/vm-120-disk-1.raw vm-160-disk-1.raw
root@proxmox2:/var/lib/vz/images/160$ls
vm-160-disk-1.raw
</code></pre></div></div>
<ul>
<li>Copy the configuration file content from the NFS share to /etc/pve/nodes/studio/lxc/container-id.conf</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox2:/etc/pve$cat nodes/studio/lxc/160.conf
arch: amd64
cores: 1
hostname: test-move
memory: 512
net0: name=eth1,bridge=vmbr3,gw=192.168.56.1,hwaddr=A6:01:FE:E0:67:36,ip=192.168.56.161/16,type=veth
ostype: ubuntu
rootfs: local:120/vm-120-disk-1.raw,size=5G
swap: 512
</code></pre></div></div>
<ul>
<li>Replace the string referencing the old container ID with the new container ID in the configuration file (a one-liner for this follows the example below)</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rootfs: local:120/vm-120-disk-1.raw,size=5G
TO
rootfs: local:160/vm-160-disk-1.raw,size=5G
</code></pre></div></div>
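<p>A quick sketch of doing this replacement in one pass with sed (assuming the config path used above):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox2:~$sed -i 's|local:120/vm-120-disk-1.raw|local:160/vm-160-disk-1.raw|' /etc/pve/nodes/studio/lxc/160.conf
</code></pre></div></div>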
<ul>
<li>Start the container and verify its content</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox2:/etc/pve$pct list
VMID Status Lock Name
130 stopped test-snap
160 running test-move
root@proxmox2:/etc/pve$pct enter 160
root@test-move:/# ls
bin boot dev etc fastboot home lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr var
root@test-move:/# cd
root@test-move:~# ls
test
root@test-move:~# cd test
root@test-move:~/test# ls
test-move
root@test-move:~/test# cat test-move
test 1 test 1 test 2 test 2
</code></pre></div></div>
<p>As the above procedure produces a copy of the container’s raw image file, the same file can also be stored as a backup of the container.</p>
<h1 id="proxmox">Proxmox</h1>
<p>2018-07-03</p>
<p>For the past few months I have been working on the Proxmox container solution, which is based on LXC containers. Proxmox is similar to Docker in terms of its containerization philosophy, but different in terms of how the technology is implemented. Here’s a comparison of Docker, LXC, and virtual machines, which should give some perspective on what I mean by that statement.</p>
<table>
<thead>
<tr>
<th>Docker</th>
<th>LXC</th>
<th>VM</th>
</tr>
</thead>
<tbody>
<tr>
<td>Very quick OS boot/initialization</td>
<td>Very quick OS boot/initialization</td>
<td>OS booting takes time</td>
</tr>
<tr>
<td>Requires custom baked container images</td>
<td>Requires custom baked container images</td>
<td>Standard OS iso can be used for booting</td>
</tr>
<tr>
<td>Shared kernel of the base machine</td>
<td>Shared kernel of the base machine</td>
<td>Full functional kernel for each VM</td>
</tr>
<tr>
<td>Runs a stripped-down OS</td>
<td>Contains a fully functional OS with its own filesystem</td>
<td>Contains a fully functional OS with its own filesystem</td>
</tr>
<tr>
<td>Docker is designed to run one application per container</td>
<td>Multiple applications can be installed within an LXC container</td>
<td>Runs multiple applications</td>
</tr>
<tr>
<td>Lightweight as compared to LXC & VM</td>
<td>Lightweight compared to VM</td>
<td>Higher resource consumption than containers</td>
</tr>
<tr>
<td>Filesystem is based on read-only layers via AUFS</td>
<td>Provides complete independent filesystem</td>
<td>Provides complete independent filesystem</td>
</tr>
<tr>
<td>Ephemeral data storage within the container; persistent data storage is supported using external storage mounts</td>
<td>Persistent data can be stored within LXC</td>
<td>Persistent data can be stored within VM</td>
</tr>
<tr>
<td>Cross platform solution</td>
<td>LXC is a Linux-only solution</td>
<td>Cross platform solution</td>
</tr>
</tbody>
</table>
<p>Implementing a Proxmox standalone server or a cluster is fairly simple and covered in their documentation. Proxmox uses corosync for clustering and has a very good UI for managing its server / cluster. What I liked about Proxmox is that it supports creation of containers as well as creation of VMs. One thing to remember while setting up a Proxmox cluster is to use lvm-thin, ZFS, or any other storage filesystem that supports snapshots for the base server partition. If standard LVM storage is used on the base machine, it will prevent creation of container & VM snapshots, as no metadata is stored with standard LVM. Let’s cover some more of Proxmox through two issues, and their solutions, that I came across while using this container technology.</p>
<h2 id="issue-1--lxc-container-fails-to-start">ISSUE 1 : LXC container fails to start</h2>
<ul>
<li>LXC container fails to start with ERROR entries in /var/log/lxc/&lt;container_id&gt;.log</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
lxc-start 20181016102940.512 ERROR lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 5)
lxc-start 20181016102940.512 ERROR lxc_start - start.c:__lxc_start:1365 - Failed to spawn container "155".
lxc-start 20181016102941.255 ERROR lxc_conf - conf.c:run_buffer:405 - Script exited with status 32.
lxc-start 20181016102941.255 ERROR lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "155".
lxc-start 20181016102946.260 ERROR lxc_start_ui - tools/lxc_start.c:main:366 - The container failed to start.
lxc-start 20181016102946.260 ERROR lxc_start_ui - tools/lxc_start.c:main:368 - To get more details, run the container in foreground mode.
lxc-start 20181016102946.260 ERROR lxc_start_ui - tools/lxc_start.c:main:370 - Additional information can be obtained by setting the --logfile and --logpriority options.
lxc-start 20181016103925.588 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:234 - No such file or directory - failed to change apparmor profile to lxc-container-default-cgns
lxc-start 20181016103925.588 ERROR lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 5)
lxc-start 20181016103925.588 ERROR lxc_start - start.c:__lxc_start:1365 - Failed to spawn container "155".
lxc-start 20181016103926.340 ERROR lxc_conf - conf.c:run_buffer:405 - Script exited with status 32.
lxc-start 20181016103926.340 ERROR lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "155".
lxc-start 20181016103931.344 ERROR lxc_start_ui - tools/lxc_start.c:main:366 - The container failed to start.
lxc-start 20181016103931.344 ERROR lxc_start_ui - tools/lxc_start.c:main:368 - To get more details, run the container in foreground mode.
lxc-start 20181016103931.344 ERROR lxc_start_ui - tools/lxc_start.c:main:370 - Additional information can be obtained by setting the --logfile and --logpriority options.
</code></pre></div></div>
<ul>
<li>The errors indicate an AppArmor issue; check the loaded AppArmor profiles with the aa-status command</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox:~$aa-status
apparmor module is loaded.
0 profiles are loaded.
0 profiles are in enforce mode.
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
</code></pre></div></div>
<p>Trying to load the AppArmor profiles gives an error (note that uppercase -R removes profiles, while lowercase -r, used in the solution below, reloads them):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox:~$apparmor_parser -R /etc/apparmor.d/
apparmor_parser: Unable to remove "/usr/bin/lxc-start". Profile doesn't exist
</code></pre></div></div>
<h2 id="solution-1">SOLUTION 1:</h2>
<ul>
<li>First, load the lxc-containers profile</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox:~$apparmor_parser -r /etc/apparmor.d/lxc-containers
root@proxmox:~$aa-status
apparmor module is loaded.
4 profiles are loaded.
4 profiles are in enforce mode.
lxc-container-default
lxc-container-default-cgns
lxc-container-default-with-mounting
lxc-container-default-with-nesting
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
</code></pre></div></div>
<ul>
<li>
<p>Now try starting the container; it should work</p>
</li>
<li>
<p>Load the rest of the profiles in /etc/apparmor.d/</p>
</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox:/etc/apparmor.d$apparmor_parser -r /etc/apparmor.d/
root@proxmox:/etc/apparmor.d$aa-status
apparmor module is loaded.
6 profiles are loaded.
6 profiles are in enforce mode.
/usr/bin/lxc-start
/usr/sbin/named
lxc-container-default
lxc-container-default-cgns
lxc-container-default-with-mounting
lxc-container-default-with-nesting
0 profiles are in complain mode.
25 processes have profiles defined.
16 processes are in enforce mode.
lxc-container-default-cgns (31021)
lxc-container-default-cgns (31305)
lxc-container-default-cgns (31325)
lxc-container-default-cgns (31364)
lxc-container-default-cgns (31469)
lxc-container-default-cgns (31473)
lxc-container-default-cgns (31491)
lxc-container-default-cgns (31684)
lxc-container-default-cgns (31689)
lxc-container-default-cgns (31697)
lxc-container-default-cgns (31962)
lxc-container-default-cgns (31967)
lxc-container-default-cgns (31968)
lxc-container-default-cgns (32020)
lxc-container-default-cgns (32024)
lxc-container-default-cgns (32025)
0 processes are in complain mode.
9 processes are unconfined but have a profile defined.
/usr/bin/lxc-start (7185)
/usr/bin/lxc-start (11478)
/usr/bin/lxc-start (13471)
/usr/bin/lxc-start (18829)
/usr/bin/lxc-start (24157)
/usr/bin/lxc-start (26359)
/usr/bin/lxc-start (26710)
/usr/bin/lxc-start (30969)
/usr/sbin/named (1402)
</code></pre></div></div>
<h2 id="issue-2-container-fails-to-mount-nfs-path">ISSUE 2: Container fails to mount NFS path</h2>
<p>To share data across containers and the Proxmox host machine, we had set up an NFS box. The mounts on the Proxmox host machine were working flawlessly, but the containers failed to mount the NFS path.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@container1:~$mount 192.168.50.4:/home/bkp /mnt/bkp/
mount: block device 192.168.100.4:/home/bkp is write-protected, mounting read-only
mount: cannot mount block device 192.168.50.4:/home/bkp read-only
</code></pre></div></div>
<h2 id="solution-2">SOLUTION 2:</h2>
<ul>
<li>Tail syslog on the container host</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Jan 29 19:10:10 proxmox kernel: [36785137.211208] audit: type=1400 audit(1548769210.717:50467): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="lxc-container-default-cgns" name="/mnt/bkp/" pid=3535 comm="mount" fstype="nfs" srcname="192.168.50.4:/home/bkp"
</code></pre></div></div>
<p>The error indicates that the AppArmor profile is denying the NFS mount.</p>
<ul>
<li>On the Proxmox host, add the mount options line (the last line inside the profile shown below) to /etc/apparmor.d/lxc/lxc-default-cgns</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox:~$cat /etc/apparmor.d/lxc/lxc-default-cgns
# Do not load this file. Rather, load /etc/apparmor.d/lxc-containers, which
# will source all profiles under /etc/apparmor.d/lxc
profile lxc-container-default-cgns flags=(attach_disconnected,mediate_deleted) {
#include <abstractions/lxc/container-base>
# the container may never be allowed to mount devpts. If it does, it
# will remount the host's devpts. We could allow it to do it with
# the newinstance option (but, right now, we don't).
deny mount fstype=devpts,
mount fstype=cgroup -> /sys/fs/cgroup/**,
mount options=(rw, nosuid, noexec, remount, relatime, ro, bind),
}
</code></pre></div></div>
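<p>After editing the profile, reload it (the same command used in Solution 1 above) and restart the container so the new rule takes effect; the container ID here is a placeholder:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@proxmox:~$apparmor_parser -r /etc/apparmor.d/lxc-containers
root@proxmox:~$pct stop <container-id> && pct start <container-id>
</code></pre></div></div>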
<ul>
<li>Add the mount to fstab
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@container1:~$cat /etc/fstab
192.168.50.4:/home/bkp /mnt/bkp nfs rw,async,hard,intr 0 0
</code></pre></div> </div>
</li>
<li>In case of an error
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@container1:~$mount -a
mount: wrong fs type, bad option, bad superblock on 192.168.50.4:/home/bkp,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail or so
</code></pre></div> </div>
</li>
<li>
<p>Install the nfs-common package and then retry the mount command</p>
</li>
<li><strong><em>Reference Links</em></strong></li>
</ul>
<p><a href="https://forum.proxmox.com/threads/lxc-aa_profile-is-deprecated-and-was-renamed-to-lxc-apparmor-profile.38505/">LXC Apparmor</a></p>
<p><a href="https://unix.stackexchange.com/questions/254956/what-is-the-difference-between-docker-lxd-and-lxc">LXC - DOCKER</a></p>
<p><a href="https://www.linkedin.com/pulse/docker-vs-lxc-virtual-machines-phucsi-nguyen">LXC - DOCKER - VM</a></p>Since past few months I have been working on Proxmox container solution which is based of LXC containers. Proxmox is similar to Docker in terms of its containerzation philisophy, but different in terms of how the technology is implemented. Here’s a comparision of Docker, LXC and Virtual Machine, this should give some prespective on what I mean in my earlier statement.ntopng2018-05-04T00:00:00+00:002018-05-04T00:00:00+00:00https://amoldighe.github.io/2018/05/04/ntopng<p>One of my public facing squid server needed network monitoring, this is when I came across this amazing realtime monitoring tool - ntopng</p>
<p>In this post I am going to talk about the installation of ntopng on Ubuntu 14.04, as my production Squid server runs Ubuntu 14.04. I will also cover some of the
challenges encountered during the installation process. Here are the steps to follow:</p>
<ul>
<li>Setup the repository for ntopng</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget http://apt-stable.ntop.org/14.04/all/apt-ntop-stable.deb
dpkg -i apt-ntop-stable.deb
</code></pre></div></div>
<ul>
<li>Update the ubuntu repo list and install ntopng</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt-get update
apt-get install ntopng
</code></pre></div></div>
<ul>
<li>This gave an error, as ntopng has a dependency on libmaxminddb0</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@squid:~# apt-get install ntopng
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
ntopng : Depends: libmaxminddb0 but it is not installable
Recommends: ntopng-data but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
</code></pre></div></div>
<ul>
<li>Add the following repository and install the dependency</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@squid:~# add-apt-repository ppa:maxmind/ppa
More info: https://launchpad.net/~maxmind/+archive/ubuntu/ppa
Press [ENTER] to continue or ctrl-c to cancel adding it
gpg: keyring `/tmp/tmpdz2i3kct/secring.gpg' created
gpg: keyring `/tmp/tmpdz2i3kct/pubring.gpg' created
gpg: requesting key DE742AFA from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpdz2i3kct/trustdb.gpg: trustdb created
gpg: key DE742AFA: public key "Launchpad PPA for MaxMind" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK
apt-get update
apt-get install libmaxminddb0 libmaxminddb-dev mmdb-bin
apt-get install ntopng
</code></pre></div></div>
<ul>
<li>Set up /etc/ntopng/ntopng.conf by adding the following lines to the configuration file</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--user=ntopng
--interface=eth0
-w=<add server IP here>:3005
--community
--daemon
#--dump-flows=logstash # optional
#--disable-autologout # optional
#--disable-login=1 # optional
</code></pre></div></div>
<p><strong>PLEASE NOTE</strong> the IP used in the configuration file. To get the web interface on a public IP, you need to mention that public IP in the configuration file, as ntopng will bind to the specified IP and port to expose the web interface.</p>
<ul>
<li>While starting ntopng, you might come across this error:</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(Re)Starting ntopng...
* Stopping ntopng
* Missing /etc/ntopng/ntopng.start. Quitting
</code></pre></div></div>
<p>The fix is to touch the file /etc/ntopng/ntopng.start and then start ntopng.</p>
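<p>In commands (on the Ubuntu 14.04 box used here):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@squid:~# touch /etc/ntopng/ntopng.start
root@squid:~# service ntopng start
</code></pre></div></div>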
<p>As I am running ntopng on a proxy server, it allows me to view the real-time traffic flowing through the server. This traffic is displayed as flow talkers on the dashboard.</p>
<p><img src="/img/ntopng-1.png" /></p>
<p>It further allows drilling down the flows to a specific port, in my case the Squid proxy port 3128.</p>
<p><img src="/img/ntopng-2.png" /></p>
<p>Another feature I think is useful is setting up the “Alert Endpoint” by integrating a webhook URL. It’s documented specifically for Slack, but I tried it out with a Flock webhook, which worked perfectly.</p>
<p><img src="/img/ntopng-3.png" /></p>
<p>ntopng started sending me alerts on the Flock channel for Flow Floods, Blacklisted Flows, and Suspicious Activity on the host.</p>
<ul>
<li><strong><em>Reference Links</em></strong></li>
</ul>
<p><a href="https://www.linode.com/docs/networking/diagnostics/install-ntopng-for-network-monitoring-on-debian8/">Install ntopng</a></p>
<p><a href="https://www.ntop.org/guides/ntopng/web_gui/index.html">Ntopng GUI</a></p>One of my public facing squid server needed network monitoring, this is when I came across this amazing realtime monitoring tool - ntopngMy system is slow2018-02-11T00:00:00+00:002018-02-11T00:00:00+00:00https://amoldighe.github.io/2018/02/11/system-slow<p>Investigating a slow performing system</p>
<ul>
<li>Check system resources - review the output of tools displaying CPU, RAM, disk IO, and network IO performance</li>
<li>Ask the customer questions to quantify the slowness they are facing</li>
</ul>
<p>Below are the methodologies to localise and identify the cause of the slowness in a system.</p>
<ul>
<li><strong><em>Problem Statement Method</em></strong></li>
</ul>
<p>Ask the customer questions related to the performance issue - what, when, where, which, and what changed?</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> What makes you think there is a performance problem ?
Has the system ever performed well ?
What has changed recently - hardware, software, load ?
Can the performance degradation be expressed in terms of latency or run time ?
Does the performance affect other people or other applications, or is it just you ?
What is the environment - hardware, software, instance type, version, configuration ?
</code></pre></div></div>
<ul>
<li><strong><em>Workload characterization method</em></strong></li>
</ul>
<p>Who is causing the load?
Get information regarding the PID, UID, IP address.</p>
<p>Why is the load being generated?
Is it related to some code, a directory path, a stack trace?</p>
<p>What is the load?
IOPS, throughput, read/write.</p>
<p>How has the load changed over time?</p>
<ul>
<li><strong><em>USE methodologies</em></strong></li>
</ul>
<p>Check these for every resource:</p>
<p><strong><em>Utilization</em></strong> - The average time the system resource was busy servicing requests.
The metric for utilization can be defined as a percentage over a time interval, e.g. “one of the CPU cores is running at 95% utilization”.</p>
<p><strong><em>Saturation</em></strong> - The degree to which a system resource has extra work that it cannot service and that is being queued,
e.g. the queue length of tasks waiting to be serviced by the CPU.</p>
<p><strong><em>Errors</em></strong> - The count of errors for each system resource.
Each system resource writes error messages to a log file, which helps in investigating an issue.</p>
<ul>
<li><strong><em>Check System Metrics</em></strong>
Check the output of performance tools - metrics or graphs depicting the metrics for:</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> CPUs: cpu load
Memory: memory capacity load
Network interfaces: receive data, transmit data
Storage devices: IOPS, Capacity, Throughput, Latency
</code></pre></div></div>
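<p>A few of the observability tools listed later in this post surface exactly these metrics; a quick sketch (sar and iostat assume the sysstat package is installed):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uptime          # CPU load averages over 1, 5 and 15 minutes
free -m         # memory capacity and usage in MB
sar -n DEV 1    # per-interface receive / transmit rates
iostat -x 1     # per-device IOPS, throughput, latency and utilization
</code></pre></div></div>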
<p>To explain the above terms for storage device metrics, I am going to refer to an article by <a href="http://rickardnobel.se/storage-performance-iops-latency-throughput/">Rickard Nobel</a>; here is an extract from that article which I feel is important for our explanation.</p>
<p>Throughput is usually expressed in Megabytes / Second (MB/s). The maximum throughput for a disk could be for example 170 MB/s.</p>
<p>IOPS means IO operations per second, which means the number of read or write operations that can be done in one second. A certain number of IO operations will also give a certain throughput in megabytes per second, so these two are related. A third factor is however involved: the size of each IO request. Depending on the operating system and the application/service that needs disk access, it will issue requests to read or write a certain amount of data at a time. This is called the IO size and could be for example 4 KB, 8 KB, 32 KB and so on. The minimum amount of data to read/write is the size of one sector, which is only 512 bytes.</p>
<p><code class="highlighter-rouge">
Average IO size x IOPS = Throughput in MB/s
</code></p>
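<p>For example, at an average IO size of 4 KB, a device sustaining 25,600 IOPS delivers 25,600 x 4 KB = 102,400 KB/s, i.e. 100 MB/s of throughput.</p>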
<p>Each IO request will take some time to complete, this is called the average latency. This latency is measured in milliseconds (ms) and should be as low as possible. There are several factors that would affect this time. Many of them are physical limits due to the mechanical constructs of the traditional hard disk.</p>
<p><strong><em>Linux Performance Tools</em></strong></p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Observebility Tools - uptime, top, atop, htop, ps, vmstat, mpstat, iostat, free, strace, tcpdump, netstat, pidstat, lsof, swapon, sar, ss
Benchmarking Tools - fio, dd, sysbench, iperf
Tuning Tools - sysctl, ethtool, ip, route, nice, ulimit, chcpu, tune2fs, ionice, hdparm
Static Tools - df, ip, route, lsblk, lsscsi, swapon, lscpu, lshw, sysctl, lspci, ldd, sysctl
</code></pre></div></div>
<p><strong><em>Profiling</em></strong></p>
<p>Check the output of system components over a certain time period to locate the problem area.</p>
<ul>
<li><strong><em>Reference Links</em></strong></li>
</ul>
<p><a href="http://www.brendangregg.com/usemethod.html">Brendan Gregg USE method</a></p>
<p><a href="https://youtu.be/FJW8nGV4jxY?list=PLwZOquYxKJS5_4UhfSCvOa-89LBuI0IIQ">Brendan Gregg Video</a></p>
<p><a href="http://rickardnobel.se/storage-performance-iops-latency-throughput/">Rickard Nobel</a></p>Investigating a slow performing systemExploring Ceph2018-01-20T00:00:00+00:002018-01-20T00:00:00+00:00https://amoldighe.github.io/2018/01/20/exploring-ceph<p>I have been working on ceph cluster operations since past 3 years - reparing cluster osd issues, adding new osd’s, adding a new node, tweaking pg’s. During this time I came across differenet concepts of Ceph, which I am collating in this post and try to piece together all this information to get a holistic understanding of ceph.</p>
<p><strong><em>Ceph</em></strong> is an open-source, scalable, fault-tolerant, self-managing, software-defined storage cluster. We have been using Ceph as our storage backend on our production cloud since our OpenStack Havana deployment. Underneath, Ceph is an object store, but it has the ability to provide object storage, block storage, and filesystem access to Ceph clients or applications. The Ceph architecture diagram below explains how this is accomplished:</p>
<p><img src="/img/ceph-architecture-1.png" /></p>
<ul>
<li>Underneath, Ceph utilizes RADOS (Reliable Autonomic Distributed Object Store)</li>
<li>An application can directly talk to ceph using LIBRADOS (RADOS Library).</li>
<li>A ceph client application can access the object store using RADOSGW (RADOS Gateway).</li>
<li>The host machine or a VM can utilize ceph block storage capability using RBD (RADOS Block Device).</li>
<li>
<p>File system access can be leveraged using CEPH-FS.</p>
</li>
<li><strong><em>CEPH OSD & CEPH MON</em></strong></li>
</ul>
<p><img src="/img/ceph-architecture-2.png" /></p>
<p>A Ceph storage cluster consists of OSDs (Object Storage Daemons) and ceph-mon (Ceph Monitor) daemons. Each disk on a storage node is recognized as an OSD, which stores data as objects. Ceph OSD Daemons handle the read/write operations on the storage disks. Each OSD will hold several 4MB chunks; any file or block entering the cluster is split into 4MB chunks, written to different OSDs of the cluster, and then replicated to other OSDs. Ceph maintains 3 copies of object data to guarantee redundancy.</p>
<p>The Ceph monitor maintains the cluster state, authentication, logging, the monitor map, manager map, OSD map, and CRUSH map. These maps are used by Ceph daemons to coordinate with each other. A cluster should have at least one Ceph monitor; to avoid a single point of failure, ceph-mon is maintained in a quorum of 3 ceph-mon nodes. In a cluster of monitors, latency and other faults can cause one or more monitors to fall behind the current state of the cluster. For this reason, Ceph must have agreement among the various monitor instances regarding the state of the cluster. Ceph always uses a majority of monitors (e.g., 1, 2:3, 3:5, 4:6, etc.) and the Paxos algorithm to establish a consensus among the monitors about the current state of the cluster.</p>
<p>Ceph’s OSD Daemons and Ceph Clients are cluster aware. Like Ceph clients, each Ceph OSD Daemon knows about other Ceph OSD Daemons in the cluster. This enables Ceph OSD Daemons to interact directly with other Ceph OSD Daemons and Ceph Monitors. Additionally, it enables Ceph Clients to interact directly with Ceph OSD Daemons.</p>
<ul>
<li><strong><em>CRUSH</em></strong></li>
</ul>
<p>The CRUSH map knows the topology of the cluster and is system aware. Data stored in the CRUSH map is organized using logical buckets for each host and their corresponding OSDs. Each host bucket contains a unique negative integer ID, the weight of the bucket, the bucket algorithm used, the hash algorithm, and the OSD IDs with their weights.</p>
<p><code class="highlighter-rouge">
host ceph-002 {
id -2
# weight 2.727
alg straw
hash 0 # rjenkins1
item osd.0 weight 0.909
item osd.2 weight 0.909
item osd.4 weight 0.909
}
</code></p>
<p>Ceph Clients and Ceph OSD Daemons both use the CRUSH algorithm to efficiently compute information about object location, instead of having to depend on a central lookup table. Ceph stores data as objects within logical storage pools. Using the CRUSH (Controlled Replication Under Scalable Hashing) algorithm, Ceph hashes the object to be stored and dynamically calculates which placement group should contain the object. The CRUSH map is then consulted to determine which Ceph OSD Daemon should store the placement group. The CRUSH algorithm enables the Ceph Storage cluster to scale, rebalance, and recover dynamically.</p>
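<p>You can see this computation from the CLI; a quick sketch, with an assumed pool and object name:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># show which PG and which OSDs (primary first) would hold this object
ceph osd map rbd my-object
</code></pre></div></div>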
<p><img src="/img/crush.jpg" /></p>
<ul>
<li><strong><em>Ceph Journal</em></strong></li>
</ul>
<p>Each Ceph OSD has journal data associated with it. Ceph OSD Daemons write a description of each operation to the journal and then apply the operation to the filesystem. Every few seconds the Ceph OSD Daemon stops writes and synchronizes the journal with the filesystem, allowing Ceph OSD Daemons to trim operations from the journal and reuse the space. On failure, Ceph OSD Daemons replay the journal starting after the last synchronization operation. In production environments, it is recommended to store the journal data on SSDs for better performance.</p>
<ul>
<li><strong><em>Ceph Pool & Placement Group (PG)</em></strong></li>
</ul>
<p>A Ceph pool is a logical partition for storing objects. Each pool has a number of placement groups. CRUSH maps PGs to OSDs dynamically. The CRUSH algorithm maps each object to a placement group and then maps each placement group to one or more Ceph OSD Daemons. This layer of indirection allows Ceph to rebalance dynamically when new Ceph OSD Daemons and the underlying OSD devices come online. With a copy of the cluster map and the CRUSH algorithm, the client can compute exactly which OSD to use when reading or writing a particular object.</p>
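<p>A quick sketch of creating a pool and inspecting its placement groups and replica count (the pool name and PG count are assumptions):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># create a pool with 128 placement groups
ceph osd pool create mypool 128
# inspect the PG count and the replica count of the pool
ceph osd pool get mypool pg_num
ceph osd pool get mypool size
</code></pre></div></div>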
<p><img src="/img/ceph-pg-osd.png" /></p>
<ul>
<li><strong><em>Clock Sync</em></strong></li>
</ul>
<p>One of the critical aspects of Ceph is clock sync across the storage cluster. All the mon nodes need to be in time sync: in a scale-out system with synchronous replication, nodes need to have exactly the same time, otherwise bad things can happen. By default the maximum allowed drift between nodes is 0.05 seconds!! In our production cluster we added this to our monitoring to detect Ceph clock skew, and alerts were sent out.</p>
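<p>A simple way to check for skew from the CLI; Ceph reports a HEALTH_WARN with a “clock skew detected” message when monitors drift beyond the allowed limit:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># cluster health detail lists any mons with clock skew
ceph health detail
</code></pre></div></div>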
<ul>
<li><strong><em>OSDs Service Clients Directly</em></strong></li>
</ul>
<p>Ceph Clients contact Ceph OSD Daemons directly, which increases both performance and total system capacity simultaneously, while removing a single point of failure. Ceph Clients can maintain a session when they need to, with a particular Ceph OSD Daemon.</p>
<ul>
<li><strong><em>OSD Membership and Status</em></strong></li>
</ul>
<p>Ceph OSD Daemons join a cluster and report on their status. At the lowest level, the Ceph OSD Daemon status is up or down, reflecting whether or not it is running and able to service Ceph Client requests. The OSDs periodically send messages to the Ceph Monitor; if the Ceph Monitor doesn’t see that message after a configurable period of time, it marks the OSD down. This mechanism is a failsafe, however; normally, Ceph OSD Daemons will determine if a neighboring OSD is down and report it to the Ceph Monitor(s). This ensures that Ceph Monitors remain lightweight processes.</p>
<ul>
<li><strong><em>Rebalancing</em></strong></li>
</ul>
<p>Adding a new OSD to the Ceph cluster causes the cluster to rebalance. The cluster map gets updated with the new OSD; consequently, the object placement changes, which changes the input to the CRUSH calculation. During the rebalance process, some PGs migrate to the new OSDs, but many of the PGs stay on the same OSD. This might lead to an increase or decrease in used disk space on an OSD.</p>
<p>When a Ceph OSD Daemon goes down or a placement group falls into a degraded state, the cluster map gets updated to reflect the current state of the cluster. Additionally, the Ceph Monitor also maintains a history of the prior states of the cluster.</p>
<ul>
<li><strong><em>Replication</em></strong></li>
</ul>
<p><img src="/img/ceph-replication.png" /></p>
<p>Like Ceph Clients, Ceph OSD Daemons use the CRUSH algorithm, but the Ceph OSD Daemon uses it to compute where replicas of objects should be stored (and for rebalancing). A client uses the CRUSH algorithm to compute where to store an object, maps the object to a pool and placement group, then looks at the CRUSH map to identify the primary OSD for the placement group. The client writes the object to the identified placement group in the primary OSD. Then, the primary OSD with its own copy of the CRUSH map identifies the secondary and tertiary OSDs for replication purposes, and replicates the object to the appropriate placement groups in the secondary and tertiary OSDs (as many OSDs as additional replicas), and responds to the client once it has confirmed the object was stored successfully.</p>
<ul>
<li><strong><em>Scrubbing</em></strong></li>
</ul>
<p>As part of maintaining data consistency and cleanliness, Ceph OSDs can also scrub objects within placement groups. That is, Ceph OSDs can compare object metadata in one placement group with its replicas in placement groups stored in other OSDs. Scrubbing (usually performed daily) catches OSD bugs or filesystem errors. OSDs can also perform deeper scrubbing by comparing data in objects bit-for-bit; deep scrubbing is usually performed weekly.</p>
<ul>
<li><strong><em>Ceph Authentication & Session</em></strong></li>
</ul>
<p>The Ceph cluster uses the cephx authentication system to authenticate clients and daemons. Cephx uses shared secret keys for authentication, meaning both the client and the monitor cluster have a copy of the client’s secret key.</p>
<p><img src="/img/ceph-session.png" /></p>
<p>A user/actor invokes a Ceph client to contact a monitor.</p>
<p>The monitor returns an authentication ticket that contains a session key for use in obtaining Ceph services. This session key is itself encrypted with the user’s permanent secret key, so that only the user can request services from the Ceph Monitor(s).</p>
<p>The client then uses the session key to request its desired services from the monitor.</p>
<p>The monitor provides the client with a ticket that will authenticate the client to the OSDs that actually handle data.</p>
<p>Ceph Monitors and OSDs share a secret.</p>
<p>The client uses the ticket provided by the monitor with any OSD or metadata server in the cluster.</p>
<ul>
<li><strong><em>Reference Links</em></strong></li>
</ul>
<p><a href="http://docs.ceph.com/docs/master/architecture/">Ceph Architecture</a></p>
<p><a href="https://www.youtube.com/watch?v=7I9uxoEhUdY">Ceph Architecture Video explaination</a></p>I have been working on ceph cluster operations since past 3 years - reparing cluster osd issues, adding new osd’s, adding a new node, tweaking pg’s. During this time I came across differenet concepts of Ceph, which I am collating in this post and try to piece together all this information to get a holistic understanding of ceph.KVM Networking - Bridge2017-12-30T00:00:00+00:002017-12-30T00:00:00+00:00https://amoldighe.github.io/2017/12/30/kvm-bridge-networking<p>In a bridged network, the guest VM is on the same network as your host machine i.e. if your host machine IP is 192.168.122.25 then your VM will have a IP address like 192.168.122.30. The virtual machine can be accessed by all computers in your host network as it is part of same network IP range.</p>
<p>To explore the bridge interface, I have set up an Ubuntu Desktop VM (hostname = ubuntu-desk) which will act as the HOST machine. This host machine ubuntu-desk has a VM running on it in bridge mode.</p>
<ul>
<li>Add the bridge interface on host machine</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# brctl addbr vmbr0
root@ubuntu-desk:~# ip a | grep vmbr0 -B6
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:d9:22:19 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.192/24 brd 192.168.122.255 scope global dynamic ens3
valid_lft 2254sec preferred_lft 2254sec
inet6 fe80::1a0c:acc4:1487:b3a5/64 scope link
valid_lft forever preferred_lft forever
3: vmbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether de:84:3e:7a:06:4f brd ff:ff:ff:ff:ff:ff
</code></pre></div></div>
<ul>
<li>Attach it to the physical interface</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# brctl addif vmbr0 ens3
root@ubuntu-desk:~# ip a | grep vmbr0 -A 4
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
link/ether 52:54:00:d9:22:19 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.192/24 brd 192.168.122.255 scope global dynamic ens3
valid_lft 2237sec preferred_lft 2237sec
inet6 fe80::1a0c:acc4:1487:b3a5/64 scope link
valid_lft forever preferred_lft forever
3: vmbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 52:54:00:d9:22:19 brd ff:ff:ff:ff:ff:ff
</code></pre></div></div>
<p>You will notice vmbr0 is attached to the physical interface ens3.</p>
<ul>
<li>The bridge interface will also reflect in brctl show</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# brctl show
bridge name bridge id STP enabled interfaces
vmbr0 8000.525400d92219 no ens3
</code></pre></div></div>
<ul>
<li>Run dhclient to get an IP for vmbr0 from the DHCP server</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# dhclient vmbr0
</code></pre></div></div>
<ul>
<li>Start the VM by connecting it to the vmbr0 bridge on host machine</li>
</ul>
<p>I have connected the bridge network interface directly to an existing VM on the host machine using Virtual Machine Manager.
Following is a dump of its XML.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# virsh list
Id Name State
----------------------------------------------------
1 br-node1 running
</code></pre></div></div>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# virsh dumpxml 1 | grep bridge -A5
<interface type='bridge'>
<mac address='52:54:00:4a:f2:00'/>
<source bridge='vmbr0'/>
<target dev='vnet0'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
</code></pre></div></div>
<ul>
<li>brctl will show that the virtual interface attached to the bridge is up</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# brctl show
bridge name bridge id STP enabled interfaces
vmbr0 8000.525400d92219 no ens3
vnet0
</code></pre></div></div>
<ul>
<li>The following interfaces are created on the host machine.</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# ip a | grep vmbr0 -A11
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
link/ether 52:54:00:d9:22:19 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.192/24 brd 192.168.122.255 scope global dynamic ens3
valid_lft 3543sec preferred_lft 3543sec
inet6 fe80::1a0c:acc4:1487:b3a5/64 scope link
valid_lft forever preferred_lft forever
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:d9:22:19 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.192/24 brd 192.168.122.255 scope global vmbr0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fed9:2219/64 scope link
valid_lft forever preferred_lft forever
4: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
link/ether fe:54:00:4a:f2:00 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fc54:ff:fe4a:f200/64 scope link
valid_lft forever preferred_lft forever
</code></pre></div></div>
<ul>
<li>The VM on the host machine has an IP on the same network, 192.168.122.xxx</li>
</ul>
<p><img src="/img/vm-bridge.png" /></p>
<p>What we saw above is a temporary way to build & attach a bridge network.
For a permanent solution, the bridge interface details need to be added to the /etc/network/interfaces file on the host machine.</p>
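<p>For reference, on hosts still using ifupdown, the stanza would look something like this (a sketch reusing the ens3 / vmbr0 names from above; it requires the bridge-utils package):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>auto vmbr0
iface vmbr0 inet dhcp
    bridge_ports ens3
    bridge_stp off
    bridge_fd 0
</code></pre></div></div>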
<p>My host machine, however, is running Ubuntu 17.10 Desktop, where I discovered that network configuration is managed by Netplan instead of the /etc/network/interfaces file. Digging further on Dr. Google led me to a few good webpages, which I have listed below; this gave me an understanding of where to add the new YAML configuration in Netplan for a permanent bridge interface.</p>
<ul>
<li>Add a new network configuration YAML file</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# cat /etc/netplan/01-network-vmbr0.yaml
network:
version: 2
renderer: networkd
ethernets:
ens3:
dhcp4: true
bridges:
vmbr0:
interfaces: [ens3]
dhcp4: true
parameters:
stp: false
forward-delay: 0
</code></pre></div></div>
<p>Ubuntu Desktop uses Network Manager as the renderer for the network configuration YAML. The Network Manager GUI can be accessed through the nm-connection-editor command to configure/add a new network interface. I wanted to set up my configuration the old-school way rather than using the Network Manager GUI, hence I used networkd as the renderer to manage my network configuration and set up a permanent bridge interface.</p>
<ul>
<li>Apply the configuration in yaml file.</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~#netplan apply
</code></pre></div></div>
<ul>
<li>This should bring up the bridge interface after every restart</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ubuntu-desk:~# ip a | grep vmbr0 -A 10
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
link/ether 52:54:00:d9:22:19 brd ff:ff:ff:ff:ff:ff
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 0e:32:57:ff:6f:f5 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.133/24 brd 192.168.122.255 scope global dynamic vmbr0
valid_lft 3267sec preferred_lft 3267sec
inet6 fe80::c32:57ff:feff:6ff5/64 scope link
valid_lft forever preferred_lft forever
4: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
link/ether fe:54:00:4a:f2:00 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fc54:ff:fe4a:f200/64 scope link
valid_lft forever preferred_lft forever
</code></pre></div></div>
<ul>
<li><strong><em>Reference Links</em></strong></li>
</ul>
<p><a href="https://wiki.ubuntu.com/Netplan">https://wiki.ubuntu.com/Netplan</a></p>
<p><a href="https://websiteforstudents.com/configure-static-ip-addresses-on-ubuntu-18-04-beta/">https://websiteforstudents.com/configure-static-ip-addresses-on-ubuntu-18-04-beta/</a></p>
<p><a href="https://www.dedoimedo.com/computers/kvm-bridged.html">https://www.dedoimedo.com/computers/kvm-bridged.html</a></p>In a bridged network, the guest VM is on the same network as your host machine i.e. if your host machine IP is 192.168.122.25 then your VM will have a IP address like 192.168.122.30. The virtual machine can be accessed by all computers in your host network as it is part of same network IP range.KVM Networking - NAT & Host-Only2017-12-20T00:00:00+00:002017-12-20T00:00:00+00:00https://amoldighe.github.io/2017/12/20/kvm-networking<p>As the title suggest, we are going to explore networking for Virtual (guest) machines and host machine, for NAT interface and Host Only interface. I will be covering Bridge networking in a later post. To start with, I am going to list a few concepts for better understanding of kvm networking. Lets go through the type of networks a Virtual Machine can be attached to:</p>
<ul>
<li>
<p>Host-Only: The VM will be assigned an IP which is only accessible from the host machine the VM is running on. Only the host machine can talk to the VM; no other host can access it.</p>
</li>
<li>
<p>NAT: The VM will be assigned an IP on a different subnet than the host machine. The VM can talk to the outside network just like the host machine, but outside hosts cannot access your VM directly.</p>
</li>
<li>
<p>Bridged: The VM will be on the same network as your host; if your host IP is 192.168.100.25 then your VM will have an IP like 192.168.100.30. It can be accessed by all computers in your host network.</p>
</li>
</ul>
<p>The below interfaces will be noticed on the host machine:</p>
<p>virbr is a virtual bridge interface provided by the libvirt library. It acts as a bridge to switch packets (at layer 2) between the interfaces (real or virtual) that are attached to it. There is a gateway IP attached to this virtual bridge which is visible on the host machine. The bridge is connected to a virtual router which provides DHCP, giving the virtual machine an IP address on the subnet the bridge is connected to.</p>
<p>vnet interfaces are a type of virtual interface called tap interfaces. A vnet is attached to a qemu-kvm process for reading/writing data between the host and the VM, and is added to a bridge interface to plug the VM into the virtual network.</p>
<ul>
<li>Let’s first go through the default network on my host machine</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-list --all
Name State Autostart Persistent
----------------------------------------------------------
default active yes yes
</code></pre></div></div>
<ul>
<li>Get details of default network</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-info default
Name: default
UUID: 8f7e6477-61df-4468-83fc-966d41d22302
Active: yes
Persistent: yes
Autostart: yes
Bridge: virbr0
</code></pre></div></div>
<ul>
<li>Let’s see the virtual bridge interface</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# brctl show
bridge name bridge id STP enabled interfaces
virbr0 8000.52540035ac06 yes virbr0-nic
vnet0
vnet1
</code></pre></div></div>
<p>The virbr0 virtual bridge is attached to the interfaces vnet0 & vnet1.</p>
<ul>
<li>Let’s see what type of network is configured on the default network</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-dumpxml default
<network connections='1'>
<name>default</name>
<uuid>8f7e6477-61df-4468-83fc-966d41d22302</uuid>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='virbr0' stp='on' delay='0'/>
<mac address='52:54:00:35:ac:06'/>
<ip address='192.168.122.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.122.2' end='192.168.122.254'/>
</dhcp>
</ip>
</network>
</code></pre></div></div>
<p>The output indicates that the default network is configured to forward in NAT mode. Hence, VMs on this network should be able to connect to outside networks.</p>
<ul>
<li>Check the VMs and IPs assigned on this network</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-dhcp-leases default
Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
-------------------------------------------------------------------------------------------------------------------
2018-01-21 12:56:45 52:54:00:58:e4:7f ipv4 192.168.122.113/24 ubuntu1404 -
2018-01-21 12:56:45 52:54:00:ae:45:66 ipv4 192.168.122.181/24 node-1 -
</code></pre></div></div>
<p><strong>Now lets track steps to create another NAT network and a Host-Only network</strong></p>
<ul>
<li>Create a network configuration xml for NAT network</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# cat nat.xml
<network>
<name>natntw</name>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='virbr1' stp='on' delay='0'/>
<ip address='192.168.100.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.100.10' end='192.168.100.254'/>
</dhcp>
</ip>
</network>
</code></pre></div></div>
<p>We have defined the network name, set the network mode to NAT, named a new bridge virbr1, and specified the network gateway, subnet mask, and DHCP IP range.</p>
<ul>
<li>Define the new network</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-define nat.xml
Network natntw defined from nat.xml
</code></pre></div></div>
<p>The network bridge details are set up as soon as the new network is defined:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-info natntw
Name: natntw
UUID: 0c3b9a99-c14c-4317-84c4-e2fe245becbc
Active: no
Persistent: yes
Autostart: no
Bridge: virbr1
</code></pre></div></div>
<ul>
<li>Start the network</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-start natntw
Network natntw started
</code></pre></div></div>
<p>This should start the virtual bridge virbr1 on the host machine:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>8: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 52:54:00:b8:ff:b7 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.1/24 brd 192.168.100.255 scope global virbr1
valid_lft forever preferred_lft forever
</code></pre></div></div>
<ul>
<li>Set network to autostart</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-autostart natntw
Network natntw marked as autostarted
</code></pre></div></div>
<ul>
<li>I am going to repeat the above steps to create a host-only (isolated) network for VMs using the XML file below; the matching virsh commands follow it</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# cat hostonly.xml
<network>
<name>hostntw</name>
<bridge name='virbr2' stp='on' delay='0'/>
<ip address='192.168.110.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.110.10' end='192.168.110.254'/>
</dhcp>
</ip>
</network>
</code></pre></div></div>
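<p>The define/start/autostart sequence is the same as for the NAT network. A short sketch, assuming the XML above is saved as hostonly.xml:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>virsh net-define hostonly.xml   # create the persistent network definition
virsh net-start hostntw         # bring up the virbr2 bridge
virsh net-autostart hostntw     # start the network automatically on host boot
</code></pre></div></div>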
<ul>
<li>The virtual bridge virbr2 is started after starting the host-only network - hostntw</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# ip a | grep virbr2 -A 7
10: virbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 52:54:00:48:7b:d9 brd ff:ff:ff:ff:ff:ff
inet 192.168.110.1/24 brd 192.168.110.255 scope global virbr2
valid_lft forever preferred_lft forever
11: virbr2-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr2 state DOWN group default qlen 1000
link/ether 52:54:00:48:7b:d9 brd ff:ff:ff:ff:ff:ff
</code></pre></div></div>
<ul>
<li>Let’s list all the virtual networks</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-list
Name State Autostart Persistent
----------------------------------------------------------
default active yes yes
hostntw active yes yes
natntw active yes yes
</code></pre></div></div>
<p><strong>NEXT - Let’s see the behaviour of a VM on a Host-Only Network</strong></p>
<ul>
<li>Spawning a VM on hostntw - our Host-Only network</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virt-install --name node-2 --vcpus=1 --memory 512 --disk path=/var/lib/libvirt/images/trusty-server-cloudimg-amd64-disk1-node2.img,bus=virtio,cache=writeback --disk path=/var/lib/libvirt/images/config-drive-node2.iso,device=cdrom --graphics vnc,listen=0.0.0.0 --network bridge:virbr2,model=virtio --noautoconsole --os-type=linux --boot=hd
WARNING No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.
Starting install...
Creating domain... | 0 B 00:00:00
Domain creation completed.
</code></pre></div></div>
<p>Since virbr2 is the bridge attached to hostntw, we should see a new interface on virbr2 connected to the VM’s qemu-kvm process.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:/var/lib/libvirt/images# brctl show virbr2
bridge name bridge id STP enabled interfaces
virbr2 8000.525400487bd9 yes virbr2-nic
vnet2
</code></pre></div></div>
<p>The vnet2 interface is attached to the VM - node-2</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh dumpxml node-2 | grep vnet -B 3 -A 4
<interface type='bridge'>
<mac address='52:54:00:d7:8d:61'/>
<source bridge='virbr2'/>
<target dev='vnet2'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
</code></pre></div></div>
<p>The IP assigned to the VM on hostntw:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:/var/lib/libvirt/images# virsh net-dhcp-leases hostntw
Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
-------------------------------------------------------------------------------------------------------------------
2018-01-21 14:26:15 52:54:00:d7:8d:61 ipv4 192.168.110.116/24 node-2 -
</code></pre></div></div>
<p>I am able to SSH to the VM, but I cannot reach anything outside the VM’s host-only network; the session below shows this, and a quick check of the cause follows it.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# ssh -i /home/amol/.ssh/id_rsa ubuntu@192.168.110.116
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-139-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Sun Jan 21 07:57:53 UTC 2018
System load: 0.05 Processes: 71
Usage of /: 36.5% of 2.13GB Users logged in: 0
Memory usage: 10% IP address for eth0: 192.168.110.116
Swap usage: 0%
Graph this data and manage this system at:
https://landscape.canonical.com/
Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
0 packages can be updated.
0 updates are security updates.
Last login: Sun Jan 21 07:57:56 2018 from 192.168.110.1
ubuntu@node-2:~$ ping google.com
ping: unknown host google.com
</code></pre></div></div>
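<p>This is expected: the hostntw XML has no forward element, so the host neither routes nor NATs traffic for 192.168.110.0/24. A quick, hedged way to confirm this from the host:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># unlike the NAT networks, no MASQUERADE rules should exist for this subnet
iptables -t nat -S | grep 192.168.110 || echo "no NAT rules for hostntw"
</code></pre></div></div>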
<p><strong>NEXT - Attach a NAT interface to an existing VM to reach outside network</strong></p>
<p>We will attach a NAT interface to the existing VM, node-2, so that it can reach outside networks through that interface.
I will be using the newly created natntw network, which has the 192.168.100.0/24 subnet, to accomplish this.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-info natntw
Name: natntw
UUID: 0c3b9a99-c14c-4317-84c4-e2fe245becbc
Active: yes
Persistent: yes
Autostart: yes
Bridge: virbr1
root@amol-hp-elite:~# virsh net-dhcp-leases natntw
Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
-------------------------------------------------------------------------------------------------------------------
root@amol-hp-elite:~# brctl show virbr1
bridge name bridge id STP enabled interfaces
virbr1 8000.525400b8ffb7 yes virbr1-nic
</code></pre></div></div>
<ul>
<li>Attach natntw to the VM node-2</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh attach-interface --domain node-2 --type bridge --source virbr1 --model virtio --config
Interface attached successfully
</code></pre></div></div>
<p>Please note - I had to restart the VM, as restarting networking alone did not work, possibly because I had SSH’ed into the VM over eth0.</p>
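<p>As an aside, virsh can also hot-plug the NIC into the running domain by adding --live alongside --config. A sketch, with the caveat that the guest may still need its networking reconfigured to use the new interface:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># apply to the running VM and to the persistent definition in one command
virsh attach-interface --domain node-2 --type bridge --source virbr1 --model virtio --live --config
</code></pre></div></div>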
<p>The virtual bridge got a vnet3 interface attached for the VM node-2 on the network natntw:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# brctl show virbr1
bridge name bridge id STP enabled interfaces
virbr1 8000.525400b8ffb7 yes virbr1-nic
vnet3
</code></pre></div></div>
<p>An IP from the natntw network was assigned to the VM:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# virsh net-dhcp-leases natntw
Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
-------------------------------------------------------------------------------------------------------------------
2018-01-21 14:56:36 52:54:00:c6:3c:03 ipv4 192.168.100.197/24 node-2 -
</code></pre></div></div>
<ul>
<li>Test outside connectivity from inside the VM</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-hp-elite:~# ssh -i /home/amol/.ssh/id_rsa ubuntu@192.168.110.116
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-139-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Sun Jan 21 08:26:31 UTC 2018
System load: 0.0 Memory usage: 9% Processes: 52
Usage of /: 36.5% of 2.13GB Swap usage: 0% Users logged in: 0
Graph this data and manage this system at:
https://landscape.canonical.com/
Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
0 packages can be updated.
0 updates are security updates.
Last login: Sun Jan 21 08:09:43 2018 from 192.168.110.1
ubuntu@node-2:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:d7:8d:61 brd ff:ff:ff:ff:ff:ff
inet 192.168.110.116/24 brd 192.168.110.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fed7:8d61/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:c6:3c:03 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.197/24 brd 192.168.100.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fec6:3c03/64 scope link
valid_lft forever preferred_lft forever
ubuntu@node-2:~$ ping google.com
PING google.com (172.217.27.206) 56(84) bytes of data.
64 bytes from bom07s15-in-f14.1e100.net (172.217.27.206): icmp_seq=1 ttl=52 time=30.4 ms
64 bytes from bom07s15-in-f14.1e100.net (172.217.27.206): icmp_seq=2 ttl=52 time=69.5 ms
64 bytes from bom07s15-in-f14.1e100.net (172.217.27.206): icmp_seq=3 ttl=52 time=52.8 ms
^C
--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 30.477/50.960/69.533/16.003 ms
</code></pre></div></div>
<p>Ping to google.com now works, as the VM connects to outside networks using the newly attached NAT interface on eth1.</p>
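<p>To see why the traffic leaves via eth1, you can check the routing table inside the guest. A hedged sketch; the exact output depends on which interface received the default gateway via DHCP:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># inside node-2: the default route should point at natntw's gateway,
# e.g. "default via 192.168.100.1 dev eth1"
ip route
</code></pre></div></div>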
<ul>
<li><strong><em>Reference Links</em></strong></li>
</ul>
<p><a href="https://kashyapc.fedorapeople.org/virt/create-a-new-libvirt-bridge.txt">https://kashyapc.fedorapeople.org/virt/create-a-new-libvirt-bridge.txt</a></p>
<p><a href="https://jamielinux.com/docs/libvirt-networking-handbook/nat-based-network.html">https://jamielinux.com/docs/libvirt-networking-handbook/nat-based-network.html</a></p>
<p><a href="https://libvirt.org/formatnetwork.html#examplesNAT">https://libvirt.org/formatnetwork.html#examplesNAT</a></p>Collectd for VM monitoring2017-11-22T00:00:00+00:002017-11-22T00:00:00+00:00https://amoldighe.github.io/2017/11/22/collectd<p>Here’s an interesting use case - there is a need to check the usage of customers’ VMs without installing an agent on their VMs.</p>
<p>Solution - Use collectd with libvirt plugin enabled on your KVM host.</p>
<p>I have tried this solution on my Ubuntu laptop, which is running 2 KVM VMs:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# virsh list --all
Id Name State
----------------------------------------------------
1 ubuntu14 running
2 ubuntu14-new running
</code></pre></div></div>
<ul>
<li>Install collectd on KVM server</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt-get install collectd
</code></pre></div></div>
<ul>
<li>Collectd ships with several plugins, which are enabled in</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/etc/collectd/collectd.conf
</code></pre></div></div>
<ul>
<li>Configure collectd to load the libvirt plugin, better known as the virt plugin, by creating this config file</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# cat /etc/collectd/collectd.conf.d/libvirt.conf
<LoadPlugin virt>
Globals false
</LoadPlugin>
<Plugin "virt">
Connection "qemu:///system"
RefreshInterval 60
Domain "dom0"
BlockDevice "name:device"
InterfaceDevice "name:interface"
IgnoreSelected true
HostnameFormat "name"
</Plugin>
</code></pre></div></div>
<p>We added this new config file under /etc/collectd/collectd.conf.d/, which is included via the following directive in /etc/collectd/collectd.conf:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><Include "/etc/collectd/collectd.conf.d">
Filter "*.conf"
</Include>
</code></pre></div></div>
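<p>Before restarting, it can be worth checking that the merged configuration parses cleanly; collectd’s -t flag tests the configuration and exits:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># parse the main config (which pulls in collectd.conf.d/*.conf) and exit
collectd -t -C /etc/collectd/collectd.conf
</code></pre></div></div>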
<ul>
<li>Restart collectd to enable virt plugin</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>service collectd restart
</code></pre></div></div>
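<p>Once collectd is running with the virt plugin, per-VM metrics should start appearing under the rrdtool plugin’s data directory. A hedged sketch - /var/lib/collectd/rrd is the usual default on Ubuntu, and the directory names follow HostnameFormat "name":</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># expect one directory per VM name, containing rrd files for cpu, disk, network, etc.
ls /var/lib/collectd/rrd/
ls /var/lib/collectd/rrd/ubuntu14/
</code></pre></div></div>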
<ul>
<li>Now, to view the metrics collected by the virt plugin, we will set up collectd-web using the process below:</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /usr/local/
git clone https://github.com/httpdss/collectd-web.git
cd collectd-web/
chmod +x cgi-bin/graphdefs.cgi
</code></pre></div></div>
<p>The Python script at /usr/local/collectd-web/runserver.py is configured to listen on the localhost address 127.0.0.1:8888:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:/usr/local/collectd-web# ./runserver.py
Collectd-web server running at http://127.0.0.1:8888/
</code></pre></div></div>
<p>To configure it to listen on all interfaces attached to the KVM host, replace 127.0.0.1 with 0.0.0.0 and start the Python server.</p>
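<p>A one-liner sketch for that edit, assuming the address only appears on the line where the server binds:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># bind the bundled web server to all interfaces instead of loopback
sed -i 's/127\.0\.0\.1/0.0.0.0/' /usr/local/collectd-web/runserver.py
</code></pre></div></div>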
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:/usr/local/collectd-web# ./runserver.py &
[1] 3592
root@hp-envy:/usr/local/collectd-web# Collectd-web server running at http://0.0.0.0:8888/
</code></pre></div></div>
<p>My local Wi-Fi router’s DHCP has assigned my laptop the IP 192.168.0.51:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# ip a | grep wlo1
3: wlo1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
inet 192.168.0.51/24 brd 192.168.0.255 scope global dynamic wlo1
</code></pre></div></div>
<p>Now I can hit http://192.168.0.51:8888/ in my local browser to bring up the collectd-web UI.</p>
<p><img src="/img/collectd-web.png" /></p>
<ul>
<li><strong><em>Reference Links</em></strong></li>
</ul>
<p><a href="https://www.tecmint.com/install-collectd-and-collectd-web-to-monitor-server-resources-in-linux/">https://www.tecmint.com/install-collectd-and-collectd-web-to-monitor-server-resources-in-linux/</a></p>
<p><a href="https://syedali.net/monitoring/">https://syedali.net/monitoring/</a></p>
<p><a href="https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_virt">https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_virt</a></p>
<p><a href="https://github.com/httpdss/collectd-web">https://github.com/httpdss/collectd-web</a></p>Cloud Image & CloudInit2017-10-10T00:00:00+00:002017-10-10T00:00:00+00:00https://amoldighe.github.io/2017/10/10/cloud-image-cloud-init<p>In one of our earlier exercises, while booting a cloud image on KVM, we noticed the following warning messages.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2017-11-16 14:22:35,359 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [112/120s]: request error [HTTPConnectionPool(host='192.168.122.1', port=80): Max retries exceeded with url: //latest/meta-data/instance-id (Caused by <class 'socket.error'>: [Errno 111] Connection refused)]
2017-11-16 14:22:42,367 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [119/120s]: request error [HTTPConnectionPool(host='192.168.122.1', port=80): Max retries exceeded with url: //latest/meta-data/instance-id (Caused by <class 'socket.error'>: [Errno 115] Operation now in progress)]
2017-11-16 14:22:49,374 - DataSourceCloudStack.py[CRITICAL]: Giving up on waiting for the metadata from ['http://192.168.122.1//latest/meta-data/instance-id'] after 126 seconds
</code></pre></div></div>
<p>The messages indicate that the above Python scripts are trying to connect to the network gateway IP 192.168.122.1 to fetch meta-data for the VM. As this is a cloud image we are trying to boot on KVM, it has cloud-init prebaked into the image. Cloud-init is a collection of Python scripts configured to fetch metadata - cloud-specific host information like the instance ID, hostname, IP address, etc. User data can also be passed to the VM using cloud-init, which can execute various commands and scripts to configure the VM. More details regarding cloud-init can be found at the links at the end of the post.</p>
<p>Cloud-init is configured to fetch this metadata or userdata from a metadata server OR a config-drive. To run a cloud image locally on KVM, we will set up a config drive to supply the meta-data. Cloud-init is programmed to search for the config-drive before looking for meta-data at 169.254.169.254 (or the gateway IP, as in the case above). We will create a config-drive iso containing the metadata and attach it as a cdrom drive while booting the VM, as follows:</p>
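<p>For reference, when a metadata server is present, cloud-init fetches individual keys over plain HTTP. A sketch of typical EC2-style queries (exact paths vary by datasource):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># examples of meta-data keys cloud-init requests from a metadata service
curl http://169.254.169.254/latest/meta-data/instance-id
curl http://169.254.169.254/latest/meta-data/hostname
</code></pre></div></div>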
<ul>
<li>Create a directory to house the meta-data & user-data files.</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# ls -lh cloud-configdir/
total 4.0K
-rw-r--r-- 1 root root 533 Dec 27 21:13 meta-data
-rw-r--r-- 1 root root 0 Dec 27 21:16 user-data
</code></pre></div></div>
<p>PLEASE NOTE - both these files are required, even if you do not want to pass any user-data to the VM.</p>
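<p>A minimal sketch of preparing that directory from scratch (the meta-data content is filled in below):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir cloud-configdir
touch cloud-configdir/user-data   # required even when left empty
touch cloud-configdir/meta-data   # populated in the next steps
</code></pre></div></div>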
<ul>
<li>Generate a UUID using the uuidgen command</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:/mnt/srcVM# uuidgen
378e5f70-de64-4944-a93f-190e29e49956
</code></pre></div></div>
<ul>
<li>Add the following meta-data for the KVM VM</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# cat cloud-configdir/meta-data
instance-id: 378e5f70-de64-4944-a93f-190e29e49956
hostname: node-1
local-hostname: node-1
public-keys:
- |
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD
</code></pre></div></div>
<ul>
<li>Generate the iso file for the config drive</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# mkisofs -o /var/lib/libvirt/images/config-drv.iso -V cidata -r -J --quiet cloud-configdir
</code></pre></div></div>
<ul>
<li>OR use this script <a href="https://github.com/larsks/virt-utils/blob/master/create-config-drive">create-config-drive.sh</a> to generate the iso file</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@amol-HP-ENVY-15-Notebook-PC:~# bash ./create-config-drive.sh -k /home/amol/.ssh/id_rsa.pub -h cfg01 /var/lib/libvirt/images/config-drv.iso
adding pubkey from /home/amol/.ssh/id_rsa.pub
generating configuration image at /var/lib/libvirt/images/config-drv.iso
</code></pre></div></div>
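<p>Either way, the generated iso can be sanity-checked by loop-mounting it. A sketch, assuming /mnt is a free mount point (the file layout differs slightly between the two methods):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount -o loop -r /var/lib/libvirt/images/config-drv.iso /mnt
ls /mnt      # expect the meta-data and user-data files
umount /mnt
</code></pre></div></div>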
<ul>
<li>Spawn the VM using the cloud image & config drive iso</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:/var/lib/libvirt/images# virt-install --name ubuntu14-new --vcpus 1 --memory 1024 --disk path=/var/lib/libvirt/images/ubuntu-server-1404-NEW.img,bus=virtio,cache=writeback --disk path=/var/lib/libvirt/images/config-drv.iso,device=cdrom --graphics vnc,listen=0.0.0.0 --network bridge:virbr0,model=virtio --noautoconsole --os-type=linux --boot hd
WARNING No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.
Starting install...
Creating domain... | 0 B 00:00:00
Domain creation completed.
</code></pre></div></div>
<ul>
<li>List the new VM</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# virsh list --all
Id Name State
----------------------------------------------------
1 ubuntu14 running
5 ubuntu14-new running
</code></pre></div></div>
<ul>
<li>List the active networks, then the DHCP IPs assigned to the VMs</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# virsh net-list
Name State Autostart Persistent
----------------------------------------------------------
default active yes yes
</code></pre></div></div>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# virsh net-dhcp-leases default
Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
-------------------------------------------------------------------------------------------------------------------
2017-12-27 23:25:04 52:54:00:e4:53:22 ipv4 192.168.122.238/24 cfg01 -
2017-12-27 23:30:53 52:54:00:f4:2f:df ipv4 192.168.122.210/24 node-1 -
</code></pre></div></div>
<ul>
<li>SSH to the new VM</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@hp-envy:~# ssh -i /home/amol/.ssh/id_rsa ubuntu@192.168.122.210
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-137-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Wed Dec 27 15:51:01 UTC 2017
System load: 0.05 Processes: 73
Usage of /: 1.9% of 49.18GB Users logged in: 0
Memory usage: 5% IP address for eth0: 192.168.122.210
Swap usage: 0%
Graph this data and manage this system at:
https://landscape.canonical.com/
Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
0 packages can be updated.
0 updates are security updates.
New release '16.04.3 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Dec 27 15:51:24 2017 from hp-envy
ubuntu@node-1:~$
</code></pre></div></div>
<p>Congratulations again! You have successfully booted a cloud image on KVM.</p>
<ul>
<li><strong><em>Reference Links</em></strong></li>
</ul>
<p><a href="https://www.digitalocean.com/community/tutorials/how-to-use-cloud-config-for-your-initial-server-setup">https://www.digitalocean.com/community/tutorials/how-to-use-cloud-config-for-your-initial-server-setup</a></p>
<p><a href="https://www.cloudsigma.com/an-introduction-to-server-provisioning-with-cloudinit">https://www.cloudsigma.com/an-introduction-to-server-provisioning-with-cloudinit</a></p>
<p><a href="http://ibm-blue-box-help.github.io/help-documentation/nova/Metadata_service_FAQ/">http://ibm-blue-box-help.github.io/help-documentation/nova/Metadata_service_FAQ/</a></p>