January 3, 2019

A lightweight dev system (Debian + LAMP)

Good morning,

to make my web development work a bit smoother, I want to begin using separate virtual machines for my various projects. Each VM will be reserved for a specific project and should contain the complete development environment including a LAMP stack. However, I still want to keep these machines free of unnecessary "stuff". For this reason, I want to set up a baseline Debian system with a UI and a LAMP stack, which can then be customized as needed.

Do bear in mind that this installation is rather insecure and therefore not fit for production use!

Debian Installer: all packages unselected

System installation

The basis of the system is going to be a bare-bones Debian installation. During installation, unselect all software (especially the desktop environment). You may want to keep the basic system utilities; I personally install them only when the need arises (which it usually doesn't).

Once you are logged in to your base system, perform the usual upgrades:

sudo apt update
sudo apt upgrade

Next, install the display server and the LXDE core system:

sudo apt install xorg lxde-core

You now have a complete desktop available at any time via sudo startx. Also, the next time you reboot the system, you will automatically be presented with the graphical login screen.

The LAMP stack

Currently, the Debian package repository does not include a PHP version newer than 7.0. Since I want to use some newer functionality, we will add the respective third-party repository and install PHP 7.2 (if you want to use PHP 7.3, just replace php7.2 with php7.3 in all following commands):

sudo apt install ca-certificates apt-transport-https
wget -q https://packages.sury.org/php/apt.gpg -O- | sudo apt-key add -
echo "deb https://packages.sury.org/php/ stretch main" | sudo tee /etc/apt/sources.list.d/php.list

sudo apt update
sudo apt install php7.2 php7.2-mysql
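
A quick sanity check that the expected version was actually installed:

php -v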

You might also want to run sudo apt upgrade, as the sury repos usually contain newer versions of some other packages.

We can now install Apache and MariaDB:

sudo apt install apache2 libapache2-mod-php7.2 mariadb-server
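
At this point you can already do a quick sanity check of the stack: drop a phpinfo() script into the default web root (/var/www/html on Debian) and open http://localhost/info.php in a browser. The file name info.php is an arbitrary choice; remember to delete the file when you are done:

echo "<?php phpinfo();" | sudo tee /var/www/html/info.php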

Since I typically use mod_rewrite in my projects, I want to configure rewriting for Apache. First, let's enable mod_rewrite:

sudo a2enmod rewrite

Next, we will have to allow rewriting for our web directory (which should be /var/www). You will find the required setting in the Apache configuration /etc/apache2/apache2.conf; change AllowOverride from None to All:

sudo nano /etc/apache2/apache2.conf

<Directory /var/www/>
    Options Indexes FollowSymLinks
    AllowOverride All
    Require all granted
</Directory>

Now, restart Apache to apply the configuration:

sudo service apache2 restart
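
If you want to verify that rewriting actually works, a minimal .htaccess in the web root does the trick. The rule below is just an illustrative example which sends requests for non-existent files to index.php:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]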

Finally, let's install phpMyAdmin to make database administration easier:

sudo apt install phpmyadmin

For my local development machines I use the MariaDB root user without a password (which is obviously an absolute no-go on any production system). However, current MySQL and MariaDB releases do not allow connecting via the root account without at least using sudo, as root authenticates via a socket-based plugin by default. So we need to change the login plugin back to the "normal" password-based method:

sudo mysql -u root
UPDATE mysql.user SET plugin = 'mysql_native_password' WHERE User = 'root';
FLUSH PRIVILEGES;
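
While you are still in the MySQL session, you can verify that the change took effect (the plugin column for root should now read mysql_native_password):

SELECT User, plugin FROM mysql.user WHERE User = 'root';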

phpMyAdmin also does not allow logins without a password by default, which we will change in the configuration:

sudo nano /etc/phpmyadmin/config.inc.php

Find the following line and remove the leading // to uncomment it:

// $cfg['Servers'][$i]['AllowNoPassword'] = TRUE;

You now have a very lightweight development system with a LAMP stack and phpMyAdmin to play around with. Have fun!


September 8, 2018

Virtualization using KVM on Debian Stretch 9.5

KVM on Debian

Good morning,

a short while ago, I purchased a neat little Mini-ITX board and a Pentium G4400 processor on eBay. Put it in a roomy case, add some hard drives and there you go: a nice and efficient home server. What mischief could we get up to with that? Let's virtualize!

Preamble

Much of the following applies to a wide variety of Linux distributions. Since there are subtle differences, e.g. in regards to package management, I will be focusing on Debian here. Accordingly, before we start, you'll need to install Debian 9.5 on your system. If you need guidance for installing Debian, please take a look at the Debian manual.

When we talk about running a VM on KVM, we usually mean a wide array of software modules, most of which are not actually a part of KVM:

  • KVM is the Kernel-based Virtual Machine, which is the virtualization layer included in the Linux kernel.
  • qemu is an emulator, which, thanks to its modular architecture, can emulate a large variety of hardware platforms. In our case, we will be using it to "emulate" our virtual machine hardware using KVM.
  • libvirt is an API built on top of qemu and KVM. It includes a daemon and various tools, which make VM management on qemu a lot easier.
  • virsh is the command line tool to manage libvirt instances (e.g. starting and stopping VMs).
  • virt-install is a command line tool to configure virtual machines for use with libvirt.
  • virt-manager is a GUI client built on top of libvirt. It allows management of VMs using a graphical interface. Since I'd rather use the command line tooling, virt-manager is not part of this article.

I found many tutorials on the setup of KVM and specifically libvirt to be lacking in clarity, specifically in regards to users and access rights. So let me get some basics out of the way first.

Libvirt has two different modes of operation, denoted by the respective session you connect to: using the URL qemu:///system you connect to the libvirt daemon (libvirtd), a system-wide session which typically runs with root privileges. This is the session tools like virt-manager normally connect to. Since libvirtd is running as root, it has all the necessary permissions to access network bridges, block devices and so on. Images, VM configurations, etc. are by default stored in /var/lib/libvirt. Also, autostarting VMs on system boot is only possible in qemu:///system.

Using the URL qemu:///session creates a new libvirt instance for the current user. It is possible to run many user-specific sessions on a single system. The privileges of a user-specific instance depend on the user starting the session. VMs, images, etc. are stored in the user's home directory, thus no special access rights are needed there. You will, however, not be able to access block devices without setting up some udev rules to allow access to /dev/[blockdevice] for the given user. Access to network bridges may also require some additional access rights.

You can execute virsh uri to find out which session you are currently connected to. The default session will most likely be qemu:///session. To change this, you can add the following environment variable to your .bashrc, then log out and log in again (otherwise you will have to add --connect qemu:///system to all virsh and virt-install commands):

export LIBVIRT_DEFAULT_URI="qemu:///system"

Since qemu:///system is widely accepted as the default mode of operation, I will be using it in this article.

Setup

Without further ado, let us start by installing the required packages:

sudo apt install qemu-kvm libvirt-clients libvirt-daemon-system virtinst
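
If you plan to manage the system session with a non-root user, it may also be worth adding that user to the libvirt group, which the libvirt-daemon-system package sets up on Debian (log out and back in for the group change to take effect):

sudo adduser $(whoami) libvirt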

You will typically want your VMs to have network connectivity, for which you need to create a virtual network bridge to your physical NIC (the bridge_* options below are provided by the bridge-utils package, so install that first via apt). To do that, edit the network configuration as follows:

sudo nano /etc/network/interfaces

# The primary network interface
# Make sure the physical NIC itself does not use dhcp
allow-hotplug eth0
iface eth0 inet manual
iface eth0 inet6 manual

# Virtual network bridge - the bridge acquires the address via dhcp instead
auto vmbr0
iface vmbr0 inet dhcp
    bridge_ports eth0    # Replace eth0 with your interface
    bridge_stp off
    bridge_maxwait 0
    bridge_fd 0
Now reboot the system.
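
Once the system is back up, you can check whether the bridge exists and has obtained an address, for example with:

ip addr show vmbr0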

Creating the first VM

With the preparation out of the way, let's download a boot image for our first VM. I'm going to use a Debian 9.5 netinstall image, which you can easily get with

sudo wget "https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-9.5.0-amd64-netinst.iso" -O "/var/lib/libvirt/boot/debian-9.5.iso"

Finally, let us try to create our first VM:

virt-install --virt-type kvm --name debian9 --memory 512 --cdrom /var/lib/libvirt/boot/debian-9.5.iso --disk size=10 --network bridge=vmbr0 --os-variant debian9

You should be seeing something like this:

WARNING  Graphics requested but DISPLAY is not set. Not running virt-viewer.
WARNING No console to launch for the guest, defaulting to --wait -1
Starting install...
Allocating 'debian9.qcow2' | 10 GB 00:00:00
Creating domain... | 0 B 00:00:01
Domain installation still in progress. Waiting for installation to complete.

Here is what the above command means:

--virt-type kvm: The VM is meant for use with KVM

--name debian9: The VM is called "debian9". This is the name displayed using the command virsh list. It is also what you would enter to edit the VM: virsh edit debian9. Note this is NOT the hostname of the VM.

--memory 512: We want to give 512 MB of memory to our VM.

--cdrom /var/...: The ISO image to boot the VM with.

--disk size=10: Creates a new virtual hard drive with a size of 10 GB alongside the VM. The resulting drive is stored in /var/lib/libvirt/images and is named like the VM.

--network bridge=vmbr0: The VM uses our previously created network bridge.

--os-variant debian9: We will be using the VM to run Debian 9. This allows for various system specific optimizations. To get a list of possible values, run osinfo-query os (you may need to install libosinfo-bin). The often mentioned virt-install --os-variant list does not work anymore.

Connecting to the VM

You can now connect to your new VM to... wait, you don't know how to connect? Right, we are running on a headless server. Hmm... let's first stop the domain installation using Ctrl+C. Let us also remove the VM we just created:

virsh destroy debian9
virsh undefine debian9
sudo rm /var/lib/libvirt/images/debian9.qcow2

There are two ways to initialize a VM on a headless server in such a way that one can install an OS on it.

1. Make the VM's console accessible in our terminal session. This is highly OS specific and will not be usable for any Windows guest. There is also an annoying restriction: the -x argument cannot be combined with --cdrom. Once upon a time, it was possible to mount an .iso image using the -l argument; however, this no longer works. You will instead have to provide a valid install tree, which I could only get to work using an Ubuntu netinstall tree:

virt-install --virt-type kvm --name ubuntu --memory 512 -l http://archive.ubuntu.com/ubuntu/dists/bionic/main/installer-amd64/ --disk size=10 --network bridge=vmbr0 --os-variant ubuntu17.10 -x='console=tty0 console=ttyS0,115200n8' --nographics

2. Make the guest accessible via VNC. This is a much better option, because it works with any guest OS and also allows the use of a desktop environment in the guest. Don't forget to insert your host machine's IP address in the argument --graphics vnc,listen=xyz.

virt-install --virt-type kvm --name debian9 --memory 512 --cdrom /var/lib/libvirt/boot/debian-9.5.iso --disk size=10 --network bridge=vmbr0 --os-variant debian9 --graphics vnc,listen=0.0.0.0 --noautoconsole

You can easily find the VNC port of a VM using

virsh vncdisplay debian9

Start your favorite VNC viewer, plug in the IP and port (display :0 corresponds to TCP port 5900, :1 to 5901, and so on) and voilà - a working VM using KVM on Debian Stretch!

June 17, 2018

On 3rd Party Libraries

Good Morning,

One of those things oftentimes preached in the coding community is code reuse. Ideally, a problem should be solved exactly once. Everybody who has the same problem should use this one implementation and contribute to it if changes are necessary.1 There are three main points backing up this idea:

The Good

  • Less development time, because large parts of an application can be made of existing code.
  • Libraries are used and worked on by many developers, so they should have good code quality and should be well tested.
  • Less maintenance, because libraries get maintained by others.

In addition, it is often recommended to concentrate on open source libraries, the reason being that one can access, change or even fork the library if necessary. Open source is a whole different topic which I do not want to go into right now. Just keep in mind that open source does not have only upsides either.

I have been a firm believer in the idea of code reuse for many years. And I do think that you should write and package your own code in a reusable manner and create a set of high-quality in-house libraries to be used in any of your applications.

When it comes to 3rd party libraries, though, I have over time found several disadvantages which I want to bring to your attention.

The Bad

Do you know all those 425 libraries?

  • Any library of decent size has a learning curve. Typically, the library is initially used in the wrong way, maybe even against its design. This leads to bad application design, and in some bad cases even to the addition of more libraries solving the same problem. No one will want to maintain this code.
  • Libraries often get "plugged together" in a more or less random way. Using 3rd party libraries requires special attention to the architecture and the correct integration of those libraries into an application. As architecture is a huge issue in many applications anyway, gluing together a bunch of libraries can only make it worse.
  • Most libraries offer way more features than we actually require. You might say: "What's so bad about having more features than we need?" Well, those features are overhead: code we have to deliver to the customer without them needing it. They also steepen the learning curve, because we are required to understand all those features to be able to decide whether to use them or not.
  • And then there are those pesky developers; in my experience, any feature available will be used at some point. This is especially counterproductive when we believe a feature to be a bad idea. Let me give you an example:
    I am a huge friend of dependency injection. As far as .NET goes, there are various good libraries out there: LightInject, Ninject, Autofac, etc. But I have rolled my own.2 Why?
    Well, I believe that dependency injection requires a very clear statement about the dependencies a class requires. And this statement is best given in the form of a constructor which enforces the provision of all necessary dependencies. I thus want to use constructor injection only (see the sketch after this list). All those libraries I mentioned allow for property injection. This is nice for the developer, because less code is needed. The downside is that I can create and use an object without filling in all properties. Do you want to account, in every single method, for the possibility that your required dependencies are not there? I don't! That's why my DI container only supports constructor injection, nothing more, nothing less.
  • We usually find (or stumble upon) some use cases the library does not account for. To solve this, we typically add to it. This is done with wrappers, custom derivatives, extension methods and so on. While this does not at first sound too bad, I have found most of these cases to lead to extremely bad code and architecture.
    The problem is that we don't choose a library expecting it to lack the features we need.3 So when we do find a missing feature, we are usually stumped and in a hurry to get it implemented. What makes it even worse is that many libraries are not exactly built to be easily extended from the outside.4 While they work fine for what they are designed to do, they often fail when asked to do more.
    This is where the open source argument comes in: just add the feature and contribute it to the library. But does that not defeat our goals? When I use a 3rd party library, I want to rid myself of the responsibility to extend and maintain it. I don't want to understand its code, I want it to just work. What I certainly do not want is to get my hands dirty implementing additional features into the library, or even forking it.
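
Here is the promised sketch of constructor injection. All types in it are made up for illustration; the point is that the constructor makes the dependency mandatory:

using System;

public interface IReportStore { Report GetById(int id); }
public class Report { /* ... */ }

public class ReportService
{
    private readonly IReportStore _store;

    // The constructor enforces the provision of the dependency:
    // a ReportService cannot be created without a store.
    public ReportService(IReportStore store)
    {
        _store = store ?? throw new ArgumentNullException(nameof(store));
    }

    // No null checks needed here - _store is guaranteed to be set.
    public Report Load(int id) => _store.GetById(id);
}

With property injection (e.g. a public IReportStore Store { get; set; }), every single method would have to account for Store still being null.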

The Ugly

You ask me: "What now? Should I be using 3rd party libraries or not?" My answer is a firm and resounding "It depends". As with all things in life, there is no easy answer, no one right way. What I can offer you is this: let us consider some questions one should ask themselves before using a 3rd party library:

  1. Do you believe you can write a maintainable, well tested, reusable piece of software that solves the problem at hand? If the answer is "No", don't even bother. Invest your time into finding the best library out there that fits your needs.
  2. Do you know a library (from deep experience working with it) that fits the bill? If you or one of your more advanced developers know such a library, go for it.5
  3. Do you think (as opposed to "know") there is a library that would solve your problem? This is where it gets complicated. Find out everything there is about the library. Try it out, test it to the core. Try to consider all use cases you may run into (especially those you do not want to run into). And pay extra attention to the correct use of the library in your application.
  4. If none of the above apply, meaning that you cannot find a library that solves your problem, the learning curve is too steep, or you get a headache thinking of all those library features that you do not want in your application, then you should consider rolling your own. But as always, make it reusable for yourself. Implement automated tests. Pay special attention to the interface and architecture in general. Your libraries should be held to an even higher code standard than your applications!

Whenever you use 3rd party libraries, you must only consider those that are "a safe bet". Make sure the license is acceptable, that the library is well maintained and actively developed by a decently sized group, and that the code is of high quality and well tested. And do not look at the price.6 A library usually is a lifetime investment, as far as the application using it is concerned. Never choose the "viable alternative" over the "perfect match" because it is free. Those decisions will come back to haunt you tenfold in your maintenance budget.

I hope I could give you some pointers on the selection of 3rd party libraries. Do not make hasty decisions. Think it through, find the best answer you can, and stick with it. And I am sure you will be successful with your decision.

  1. This is, of course, not possible as there are always multiple ways to solve a complex problem, none of which are perfect. The goal still stands, though.
  2. Which you can find at https://github.com/programmersdigest/Injector
  3. Although we should! Do not expect any library to meet all your needs. Think ahead. Implement ways to add functionality before you need it. It will be worth it quicker than you may think.
  4. I'm looking at you, Microsoft, with your sealed classes.
  5. Of course you should still check it, test it and so on. But having a developer who knows your needs well, whose advice you trust, who has lots of experience with the library in question and will most certainly stay in your team for a long time to come, is a pretty good start.
  6. This argument is to be seen in the context of a company. If you are a private developer, the price may absolutely affect your decision (it does for me). But then again, the investment into a library and the risks attached to a wrong decision are not as high as they are for a company with tens or even hundreds of employees working on the application in question.

June 10, 2018

LINQ: .NET Collection Classes

Good morning,

Various collections

LINQ is one of the most compelling features of C#. Whenever we are dealing with data in lists (which is to say, almost all the time), we require methods to retrieve and manipulate this data. LINQ provides a) an easy and consistent way of working with lists, and b) a functional approach to list manipulation.

This article is the first in a series of articles on LINQ. In this series we will

  • take a look at the collection classes of .NET
  • learn about lambdas, closures, Action<> and Func<>
  • use the power of extension methods
  • explore the IEnumerable<> interface
  • dive into the various LINQ methods
  • and try some PLINQ (parallel LINQ).

Without further ado, let's dive right into the various collection classes available in the .NET Framework1 and explore their uses and limitations. Since most business applications are purely data driven (how many serious programs without a database have you worked on?), knowledge of a language's collection classes is, to me, as fundamental as knowledge of its basic data types.

The most important collections in .NET are contained in the namespace System.Collections.Generic. Their non-generic equivalents in the System.Collections namespace should not be used; they remain only for compatibility reasons. Concurrent collections can be found in the System.Collections.Concurrent namespace.

Array / List<T>

An array (or a List<T>, which is basically a wrapper around the array) in C# is pretty much what you would expect an array to be in any language: a collection of items contained in a single continuous block in memory. As such, access via index is very fast (similar to pointer arithmetic in C). Finding items by attributes, however, requires a full scan of the array.

Appending items to a List<T> is typically cheap, because it reserves capacity in blocks. Only inserting and removing items in the middle is somewhat costly, since all following items have to be moved. If this is your use case, consider using a LinkedList<T> (see below).
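
A short illustration of these cost characteristics:

var numbers = new List<int> { 10, 20, 40, 50 };

var second = numbers[1];   // index access: fast, no scanning required
numbers.Insert(2, 30);     // middle insert: 40 and 50 have to be moved
numbers.Add(60);           // append: cheap thanks to capacity reserved in blocks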

HashSet<T>

As the name implies, the HashSet<T> stores a hash for each item in the collection. As opposed to an array, a HashSet<T> does not guarantee that the order of items is preserved. In addition, a specific item may only be contained once, which makes it optimal in case every item in the collection must be unique. The equality of items in the HashSet<T> is determined using the Equals() method of the items.2

The HashSet<T> shines when it comes to finding an item via instance (or rather, via a "thing" that is considered equal). A good example is a random list of strings. Since strings with identical content are considered equal, converting a List<string> to a HashSet<string> makes for an easy way to make every string in a random list unique.

var list = new List<string> {
    "One", "Two", "Three", "Two", "Three"
};
var set = new HashSet<string>(list); // Output: "One", "Two", "Three"

Dictionary<T, U>

The Dictionary<T, U> is a key-value-collection. Each key (of type T) maps to an item (of type U). Access via key is very efficient, which makes the Dictionary<T, U> excellent for retrieving items by means of an attribute (such as its primary key).

The keys of a Dictionary<T, U> have to be unique. Whereas the HashSet<T> quietly "overlooks" duplicate entries, the Dictionary<T, U> throws an exception when a key is added twice. As with the HashSet<T>, the equality of keys is checked using the Equals() method.

Whenever you find yourself iterating over a collection multiple times to find single items via a specific field (e.g. a person by name in a List<Person>), consider using a Dictionary<T, U>. Even temporarily creating a Dictionary<T, U> for only a few iterations may provide considerable performance gains. LINQ's ToDictionary() method is your friend.3
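
For example (the Person class here is made up for illustration):

public class Person { public string Name { get; set; } }

var people = new List<Person> {
    new Person { Name = "Alice" },
    new Person { Name = "Bob" }
};

// Build the lookup once - the keys must be unique...
var peopleByName = people.ToDictionary(p => p.Name);

// ...afterwards each access by name is a cheap hash lookup instead of a scan.
var alice = peopleByName["Alice"];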

ConcurrentDictionary<T, U>

All concurrent collections allow for parallel read and write access from multiple threads. Since they have to synchronize this access internally, using a concurrent collection in a single-threaded use case adds needless overhead and is not advisable. They are, however, powerful tools in highly parallel scenarios.

The most useful concurrent collection is the ConcurrentDictionary<T, U>. It provides a thread-safe implementation of the Dictionary<T, U> and is especially useful for data caching and multi-threaded service implementations.
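
A typical caching pattern might look like this sketch (User and LoadUser are hypothetical):

var cache = new ConcurrentDictionary<int, User>();

// GetOrAdd is atomic with respect to the dictionary: even if two threads
// race, only one value ends up in the cache. Note, however, that the
// value factory itself may be invoked more than once in that case.
var user = cache.GetOrAdd(42, id => LoadUser(id));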

Even though concurrent collections are in essence thread-safe, not all operations on these collections are thread-safe. The documentation explicitly denies thread-safety for explicit interface implementations, extension methods (LINQ, anyone?) and methods which take delegate parameters.4 Always check the documentation of the specific method, especially when using LINQ against a concurrent collection.

Other useful collections

Of course, there are a number of other useful but more specific collections:

  • Stack<T>: A last in, first out (LIFO) collection
  • Queue<T>: A first in first out (FIFO) collection
  • LinkedList<T>: Your typical linked list. Useful in case you often want to insert or remove items from the middle of a large collection of items.
  • SortedSet<T> and SortedDictionary<T, U>: Variants of HashSet<T> and Dictionary<T, U> which keep the contained items ordered. SortedList<T, U>, despite its name, is also a sorted key-value collection, backed by arrays.5
  • The other collections in the System.Collections.Concurrent namespace: ConcurrentStack<T> and ConcurrentQueue<T> are concurrent implementations of Stack<T> and Queue<T> respectively. ConcurrentBag<T> is somewhat special, as it really is a bag of items: Items are unsorted and not accessible by index or instance. You stuff things into it and iterate over everything whenever you need an item (am I the only one reminded of my drawers?).
  • The various collections from the System.Collections.ObjectModel namespace: They are typically used for UI development, especially the ObservableCollection<T>, which implements INotifyCollectionChanged and is often used for data bindings in WPF.
  • The new and shiny immutable collections from the System.Collections.Immutable NuGet package:6 The idea is simple: collections which cannot be changed can freely be accessed from multiple threads. There are implementations of all the typical collections: ImmutableList<T>, ImmutableDictionary<T, U>, ImmutableQueue<T>, etc. Even though these collections are not directly a part of the .NET Framework, I wanted to mention them because they provide an additional way to use collections across multiple threads (see the short sketch below).7
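
A tiny sketch of the concept (requires the NuGet package and the System.Collections.Immutable namespace):

var original = ImmutableList.Create(1, 2, 3);

// "Modifying" an immutable collection returns a new instance; the original
// stays untouched and can therefore be shared across threads without locks.
var extended = original.Add(4);
// original still contains 1, 2, 3 - extended contains 1, 2, 3, 4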

June 6, 2018

Microsoft buys GitHub

Good morning,

GitHub Octocat

There have been rumors on the net; now it is finally confirmed: Microsoft has bought GitHub1 - and the community is in panic. Many open source projects fear that Microsoft will use its influence to harm them and push its own agenda. Let's take a look at the pros and cons of the "GitHub sellout" and the involved parties' goals.

Why would Microsoft buy GitHub?

I think this one is easy: Microsoft has used GitHub as a code hosting platform for many years. They moved some of their biggest projects to GitHub in 2014/20152 in preparation for closing their own platform CodePlex.3 Today, Microsoft has over 1800 repositories on GitHub in their main account. In addition, Microsoft has embraced Git as their main versioning tool and has moved all of the Windows source code into "the largest Git repo on the planet".4 Strategically, it makes perfect sense to me that Microsoft would buy GitHub. This move allows them to tailor their main code hosting platform to their needs as well as the needs of their customers (= developers on the Microsoft platforms).

Why would GitHub be sold?

In 2012, Chris Wanstrath, one of the founders of GitHub, stepped down to let Tom Preston-Werner (another co-founder) take the lead. In 2014, Chris Wanstrath again became CEO, but by the end of 2017 he wanted to step down again (as of mid-2018, GitHub had still not found a replacement for him). By then, the company had a recurring annual revenue of $200 million and was valued at $2 billion.5 They never turned a profit, though. If you were in this position and Microsoft came to you, saying: "Well, this company is worth $2 billion, we'll give you $7.5 billion for it (and we have a good CEO for it, as well)", what would you do? I know I would sell!

And what about us?

Microsoft logo

So what do we, as users of GitHub, take away from all that? I personally think that Microsoft will not change GitHub in a radical way. There will be changes, don't get me wrong, and some will be great, some not so great. Overall, however, I think GitHub has finally become a stable company (and with that, a stable platform). Microsoft does not need GitHub to make money; they need developers to use their products, they need a community. And that is what they are trying to get out of this acquisition. Also, I am happy to see that there is again a good CEO leading GitHub. Not that Wanstrath was not good (he made GitHub what it is today), but he did not want to continue and there was no replacement in sight. Well, this problem is solved now.

All in all, I look forward to the development of GitHub as part of Microsoft. There are risks, of course, but there are also great opportunities for GitHub to become even better. I hope the latter is what we will get to enjoy.