Friday, September 26, 2014

How to protect against the Bash Bug (ShellShock)

I will not explain in technical detail how yesterday's freshly disclosed Bash Bug works, as there are already countless articles spreading like wildfire on the subject. In a nutshell: bash allows a function to be defined in an environment variable, but a vulnerable bash also executes any code that follows the function definition when it imports that variable.

Here is an easy example to determine whether your Bash version is affected by this vulnerability. If you run the code below and it echoes "Vulnerable", your bash is affected:
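This is the widely circulated test for CVE-2014-6271. It puts a function definition followed by an extra command into an environment variable named `x` and starts a child bash; a vulnerable bash runs the trailing `echo Vulnerable` while importing `x`, before the command it was actually asked to run.

```shell
#!/usr/bin/env bash
# ShellShock (CVE-2014-6271) check.
# x holds a function definition "() { :;}" followed by a trailing command.
# A vulnerable bash executes that trailing command ("echo Vulnerable") while it
# imports x from the environment; a fixed bash only runs "echo this is a test".
env x='() { :;}; echo Vulnerable' bash -c "echo this is a test"
```

On a vulnerable bash the output starts with "Vulnerable" followed by "this is a test"; a patched bash prints "this is a test" (early patched builds may also print a warning about ignoring the function definition).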

The patch was released yesterday, so there's no excuse for not fixing your servers. RedHat/CentOS/Fedora/Debian/Ubuntu and the like have updated their bash packages as well, and once bash is updated the previous code is no longer able to run commands placed after the function definition.
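To apply the fix, upgrade the bash package. A minimal sketch, assuming the host uses either yum (RedHat/CentOS/Fedora) or apt-get (Debian/Ubuntu); `bash_upgrade_cmd` is a hypothetical helper name. It only prints the command, so you can review it before running it as root.

```shell
#!/usr/bin/env bash
# Print the command that upgrades only the bash package on this host.
# Assumption: the host uses yum (RedHat/CentOS/Fedora) or apt-get (Debian/Ubuntu).
bash_upgrade_cmd() {
  if command -v yum >/dev/null 2>&1; then
    echo "yum -y update bash"
  elif command -v apt-get >/dev/null 2>&1; then
    echo "apt-get update && apt-get install -y --only-upgrade bash"
  else
    echo "no yum or apt-get found; upgrade bash with your package manager" >&2
    return 1
  fi
}
```

Usage: `sudo sh -c "$(bash_upgrade_cmd)"`, then repeat the check; it should no longer print "Vulnerable".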

Thursday, September 25, 2014

Read ebooks on Linux

If you are an avid online book reader, an essential tool is an epub reader. Check out FBReader. It is one of the best free tools I have used so far and works on many platforms, including Windows, OSX and Linux.

Friday, September 19, 2014

MySQL with HAProxy for Failover and Load Balancing

As discussed in a previous blog post about different MySQL HA architectures, using HAProxy to fail over and load balance across MySQL clusters can be very effective in most situations where a transparent, manual application failover is required (OK, I coined this term after years of working on Oracle systems). In this article I will explain how to set up an architecture similar to Figure 1.

Figure 1 - Two MySQL "clusters" load balanced by HAProxy


For simplicity, each cluster will consist of a single node (the master), so this setup uses three machines:

  • m-mysql01 - First MySQL cluster 10.250.1.101
  • m-mysql02 - Second MySQL cluster 10.250.1.102
  • mysql-cluster - HAProxy 10.250.1.100

I will assume that you have already installed MySQL on m-mysql01 and m-mysql02 and configured master-master replication between them. Next, we create a user that HAProxy uses to check the status of the MySQL servers (a non-privileged user with a blank password, accessible only from the HAProxy host) and another user that the application uses to connect through HAProxy:
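A sketch of the statements, assuming MySQL 5.x-style `CREATE USER`/`GRANT` syntax; the application user `app_user`, its password `change_me` and the database `app_db` are hypothetical placeholders. Both users are restricted to the HAProxy host because MySQL sees every proxied connection as coming from 10.250.1.100.

```shell
#!/usr/bin/env bash
# Print the SQL that creates the health-check user and the application user.
# Assumptions: HAProxy runs on 10.250.1.100; app_user, change_me and app_db are
# hypothetical placeholders, so change them.
haproxy_users_sql() {
  local haproxy_ip="${1:-10.250.1.100}"
  cat <<EOF
-- Health-check user: no password, no privileges, only from the HAProxy host.
CREATE USER 'haproxy_check'@'${haproxy_ip}';
-- Application user that connects through HAProxy.
CREATE USER 'app_user'@'${haproxy_ip}' IDENTIFIED BY 'change_me';
GRANT ALL PRIVILEGES ON app_db.* TO 'app_user'@'${haproxy_ip}';
EOF
}
```

Run it once on m-mysql01, e.g. `haproxy_users_sql 10.250.1.100 | mysql -u root -p`; replication takes care of m-mysql02.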


Note that these users are automatically replicated to the other node. Before we look at the HAProxy part, let's install the MySQL client on the HAProxy machine and test connectivity to both nodes:
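A sketch of both steps as shell functions, assuming a RHEL/CentOS 6 host (`mysql` package) or a Debian/Ubuntu host (`mysql-client` package) and the hypothetical `app_user` account. Run them on the HAProxy machine, since the new users only accept connections from 10.250.1.100.

```shell
#!/usr/bin/env bash
# Install the MySQL command-line client.
install_mysql_client() {
  if command -v yum >/dev/null 2>&1; then
    sudo yum -y install mysql          # client package name on RHEL/CentOS 6
  else
    sudo apt-get install -y mysql-client
  fi
}

# Log in to each node and list its databases; stop at the first failure.
# "app_user" is a hypothetical application account.
check_nodes() {
  local node
  for node in 10.250.1.101 10.250.1.102; do
    echo "== ${node}"
    mysql -h "${node}" -u app_user -p -e 'SHOW DATABASES;' || return 1
  done
}
```

Usage: `install_mysql_client && check_nodes` (you are prompted for the password once per node).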
If you can list the databases, we can move on to installing HAProxy:
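HAProxy is packaged by the major distributions, so the install step is short. A sketch, assuming a yum host with chkconfig (RHEL/CentOS 6) or an apt host; `install_haproxy` is just an illustrative name.

```shell
#!/usr/bin/env bash
# Install HAProxy from the distribution repositories and enable it at boot.
install_haproxy() {
  if command -v yum >/dev/null 2>&1; then
    sudo yum -y install haproxy && sudo chkconfig haproxy on
  elif command -v apt-get >/dev/null 2>&1; then
    sudo apt-get install -y haproxy
  else
    echo "no yum or apt-get found" >&2
    return 1
  fi
}
```

On older Debian/Ubuntu packages the init script may refuse to start HAProxy until `ENABLED=1` is set in /etc/default/haproxy.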

Create a new HAProxy configuration:
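A configuration matching the explanation that follows could look like this. It is a sketch assuming HAProxy 1.4/1.5 syntax; the `global`/`defaults` values, the long client/server timeouts (so idle pooled MySQL connections are not cut after a few seconds) and the `admin:change_me` stats credentials are assumptions, not values from this setup. The hypothetical `write_haproxy_cfg` wrapper just writes the file.

```shell
#!/usr/bin/env bash
# Write a minimal haproxy.cfg for the two-node MySQL setup in this post.
write_haproxy_cfg() {
  local out="${1:-/etc/haproxy/haproxy.cfg}"
  cat > "${out}" <<'EOF'
global
    log 127.0.0.1 local0 notice
    user haproxy
    group haproxy

defaults
    log global
    mode tcp
    retries 2
    timeout connect 5s
    timeout client 30m
    timeout server 30m

# MySQL traffic from the application, forwarded at TCP level.
listen mysql-cluster
    bind 10.250.1.100:3306
    mode tcp
    option mysql-check user haproxy_check
    balance roundrobin
    server m-mysql01 10.250.1.101:3306 check
    #server m-mysql02 10.250.1.102:3306 check

# Built-in statistics page on port 8080.
listen stats
    bind 10.250.1.100:8080
    mode http
    stats enable
    stats uri /
    stats realm Strictly\ Private
    stats auth admin:change_me
EOF
}
```

Validate the result with `haproxy -c -f /etc/haproxy/haproxy.cfg` before starting anything.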

A bit of explanation of this configuration might be handy for you guys, especially if you're not familiar with HAProxy. The most important blocks are the last two, where we tell HAProxy which network interface to listen on and how to forward the requests. In the first listen block we accept MySQL connections (from the application) on 10.250.1.100. HAProxy does not proxy the MySQL protocol itself, but it understands TCP (hence mode tcp) and simply forwards the traffic. The "option mysql-check" line determines each node's status: rather than a plain TCP connect, HAProxy logs in to the node as the "haproxy_check" user. The next two lines list the MySQL nodes. Because in my case I want only one active server at a time (the application is not robust enough to handle a node crash with asynchronous replication), I have commented out the second server.

In the second listen block I configure the simple statistics page that ships with HAProxy. It is now time to start HAProxy:
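Checking the file before starting saves guesswork later. A sketch, assuming the packaged config path and a SysV-style `service` command (as on CentOS 6); `start_haproxy` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Validate the configuration, then start HAProxy through its init script.
start_haproxy() {
  local cfg=/etc/haproxy/haproxy.cfg
  sudo haproxy -c -f "${cfg}" || return 1   # -c: check the config and exit
  sudo service haproxy start
}
```

Afterwards, `ss -lnt` (or `netstat -lnt`) should show listeners on 10.250.1.100:3306 and 10.250.1.100:8080.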

When I point my browser to http://10.250.1.100:8080 I can see the "cluster" status:
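The same information is available without a browser: HAProxy's stats page also serves CSV when `;csv` is appended to the stats URI, and in that export field 1 is the proxy name, field 2 the server name and field 18 its status. A small parser, assuming the `mysql-cluster` proxy name used in this post (`mysql_backend_status` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Read HAProxy CSV stats on stdin and print "<server> <status>" for every
# server of the mysql-cluster proxy. FRONTEND/BACKEND rows are aggregates,
# so they are skipped.
mysql_backend_status() {
  awk -F, '$1 == "mysql-cluster" && $2 != "FRONTEND" && $2 != "BACKEND" { print $2, $18 }'
}
```

Usage: `curl -s -u admin:change_me 'http://10.250.1.100:8080/;csv' | mysql_backend_status` (the credentials are placeholders; drop `-u` if your stats page has no authentication). A server reported as `UP` corresponds to a green row.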

A green row indicates that HAProxy is able to communicate with the MySQL node. We can also test using the MySQL protocol (i.e. the MySQL client):
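A sketch of that test: connect to HAProxy's address instead of a node and ask MySQL for `@@hostname`, which shows which node actually answered. `app_user` is a hypothetical application account and `query_via_haproxy` a hypothetical helper.

```shell
#!/usr/bin/env bash
# Query MySQL through HAProxy (10.250.1.100:3306); @@hostname reveals which
# backend node served the connection.
query_via_haproxy() {
  mysql -h 10.250.1.100 -P 3306 -u app_user -p -e 'SELECT @@hostname;'
}
```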

And that's it! Now we can test a failover by commenting out m-mysql01 and activating m-mysql02 instead. For stress testing, I use the mysqlslap tool.
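For the failover itself, the haproxy.cfg edit can be scripted. A sketch, assuming GNU sed and that the only `server` lines in the file are the MySQL nodes; `activate_node` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Make NODE the only active "server" line: comment out every server line,
# then uncomment the one belonging to NODE.
activate_node() {
  local cfg="$1" node="$2"
  if ! grep -Eq "^[[:space:]]*#?[[:space:]]*server[[:space:]]+${node}[[:space:]]" "${cfg}"; then
    echo "no server line for '${node}' in ${cfg}" >&2
    return 1
  fi
  sed -i -E \
    -e 's/^([[:space:]]*)server[[:space:]]/\1#server /' \
    -e "s/^([[:space:]]*)#[[:space:]]*server[[:space:]]+(${node}[[:space:]])/\1server \2/" \
    "${cfg}"
}
```

Usage (as root): `activate_node /etc/haproxy/haproxy.cfg m-mysql02 && haproxy -c -f /etc/haproxy/haproxy.cfg && service haproxy reload`. A wrapper like this is also easy to hand to a CI job so that non-DBAs can trigger the failover.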

I ran the same stress test against HAProxy and against one of the MySQL nodes directly:
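A sketch of such a run; the concurrency, iteration and query counts are illustrative assumptions, as is the `app_user` account, which needs privileges on the `mysqlslap` schema that mysqlslap creates and drops for each run. `--auto-generate-sql-load-type=mixed` issues both INSERT and SELECT statements. `slap` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Run one mysqlslap workload against HOST (HAProxy or a MySQL node).
slap() {
  local host="$1"
  mysqlslap --host="${host}" --port=3306 --user=app_user --password \
    --auto-generate-sql --auto-generate-sql-load-type=mixed \
    --concurrency=50 --iterations=10 --number-of-queries=1000
}
```

Usage: `slap 10.250.1.100` goes through HAProxy, `slap 10.250.1.101` hits m-mysql01 directly.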

The stress test, which I ran a number of times on both cold and warm instances, shows that HAProxy actually managed connections better, resulting in faster queries. Note that the stress test issues both INSERT and SELECT statements.

Another cool thing you can do with HAProxy is limit the maximum number of connections to each MySQL server. This not only protects you against DoS attacks but can also improve performance, especially if your data files are not on a multi-disk SAN. I normally set the maximum to 20 connections, but the right value depends on your environment:
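In HAProxy this is the per-server `maxconn` option; connections above the limit are queued inside HAProxy (for up to `timeout queue`, which falls back to `timeout connect` when unset) instead of reaching MySQL. A sketch of a helper that adds or updates the limit in an existing config, assuming GNU sed; `set_server_maxconn` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Set (or replace) "maxconn N" on every active "server" line of an HAProxy
# config. Commented-out lines ("#server ...") are left untouched.
set_server_maxconn() {
  local cfg="$1" limit="$2"
  sed -i -E \
    -e '/^[[:space:]]*server[[:space:]]/s/[[:space:]]+maxconn[[:space:]]+[0-9]+//' \
    -e "/^[[:space:]]*server[[:space:]]/s/[[:space:]]*\$/ maxconn ${limit}/" \
    "${cfg}"
}
```

For example, `set_server_maxconn /etc/haproxy/haproxy.cfg 20` turns `server m-mysql01 10.250.1.101:3306 check` into `server m-mysql01 10.250.1.101:3306 check maxconn 20`; reload HAProxy afterwards.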

A Comparison of MySQL HA Architectures

I was recently asked to design a new MySQL HA architecture for an internal project which currently runs on a master-slave setup. The acceptance criteria were pretty much defined and agreed:

  • Provide High Availability (failover does not need to be automatic)
  • Easy Failover (everyone should be able to do it without being a DBA)
  • Seamless Failover (the application should not be modified on a failover)
  • Scale Reads (for reports and DWH applications)
  • The performance should be reasonably good, or at least not worse than the current setup
  • The application is not robust enough to handle crash failures in an async master-master setup (i.e. distributing writes is out of the question)

With this in mind, we were discussing several setups:

Setup #1 MHA:

This is a very popular setup in the MySQL community, and if set up well it provides you with minimal downtime if one node crashes.

In this architecture we have two elements: an MHA Manager (x1) and MHA Nodes (x3 in our case). The MHA Manager can run on a separate server and can manage multiple master-slave clusters. It polls the nodes of the clusters and, on finding a non-functioning master, promotes the most up-to-date slave to be the new master and then redirects all the other slaves to it.

The failover is transparent to the application. An MHA Node runs on each MySQL server and facilitates failover with scripts that parse and purge the binary and relay logs. The architecture proposed here is shown in Figure 1.
Figure 1 - MHA with 3 nodes


The problem with this setup is that it is very difficult to maintain if something goes wrong. Also I hate Perl.


Setup #2 Manually Managed Replication (Master-Slave-Delayed Slave):

In this architecture we use traditional, self-managed master-slave replication with an additional delayed slave. The application always points to the master, and should the master go down, we have to manually promote the slave to master and point the application to it.

Doing a failover requires DBA knowledge, and the downtime, compared to Setup #1, will be a bit longer. The benefit of this architecture is its simplicity and lack of Perl scripts. This architecture is shown in Figure 2.

Figure 2 - Simple MySQL replica with a delayed Slave

A delayed slave is useful if a MySQL user accidentally drops a table or deletes a row. The problem with this setup is that on a failover the application needs to be changed to point to the new master.
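Since MySQL 5.6, the delay is configured on the slave itself with `MASTER_DELAY`, so the slave deliberately applies events N seconds behind the master. A sketch that prints the statements; the helper name `delayed_slave_sql` and the one-hour value in the usage are assumptions.

```shell
#!/usr/bin/env bash
# Print the SQL that turns an existing slave into a delayed slave.
# MASTER_DELAY requires MySQL 5.6 or later; replication is stopped while
# the setting is changed and restarted afterwards.
delayed_slave_sql() {
  local seconds="$1"
  case "${seconds}" in
    ''|*[!0-9]*) echo "usage: delayed_slave_sql SECONDS" >&2; return 1 ;;
  esac
  cat <<EOF
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = ${seconds};
START SLAVE;
EOF
}
```

Usage on the delayed slave: `delayed_slave_sql 3600 | mysql -u root -p`. After an accidental DROP TABLE on the master, you then have up to an hour to run `STOP SLAVE SQL_THREAD` on the delayed slave before the statement is applied there.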

Setup #3 Percona XtraDB Cluster:

I will talk about this setup in more detail in a future blog post. I installed my first Percona XtraDB/Galera cluster in April 2014 on Rackspace infrastructure. Writes and reads scaled beautifully and the failover was seamless. But I was experiencing random node failures, network partitioning and corruption. Was it a bug in Percona XtraDB or Galera? Was it due to the Rackspace infrastructure? I filed bug reports but never had the time to investigate further, so I ditched this setup completely. I feel this product needs to mature a bit more before it is production ready.

Setup #4 MySQL Clustered by HAProxy:

When designing this architecture, I kept in mind all the pitfalls of the previous setups. Here we use HAProxy to handle all the failover between two clusters of master-slave nodes. The application writes to just one cluster at any point in time, but master-master replication between the two clusters keeps them in sync. To fail over, we point HAProxy at the other cluster, as depicted in Figure 3.

Note that the failover causes no downtime for the application, so this setup can also be used to make live, downtime-free changes to the application-database stack.

Figure 3 - Failover using HAProxy and master-master replication between the clusters

This is personally my favorite setup, thanks to my positive experience with HAProxy. Additionally, it ticks all the boxes for the requirements of the new architecture. As an extra bonus, we can set up one of the slaves as a delayed slave. While writes will not be scaled (to satisfy our acceptance criteria), reads can be scaled if we want to.

How would I make this setup so foolproof that even non-DBAs can fail over? Simple: we can use Jenkins or Bamboo tasks to switch the active cluster in HAProxy, as per the diagram.

In the next blog post I will show in detail how to set up an HAProxy load balanced MySQL cluster.

Tuesday, August 05, 2014

Installing a DigiCert star SSL certificate in AWS Load Balancer

This should be quite a straightforward task, especially since I have installed countless SSL-terminating HAProxy load balancers. When I read that setting up an AWS load balancer with SSL can be a royal pain, I confess my first reaction was 'n00bs!'.

However, I want to be quite clear here: the load balancer dashboard on AWS is a bit buggy. Let me take you through the process of setting up the load balancer for SSL termination, as documented by AWS:



We start by creating the port mapping between the ELB and the instances. If you want both HTTP and HTTPS traffic to end up on port 80, you can set both instance ports to 80. I prefer to map them to different ports so I can make an explicit rewrite from HTTP to HTTPS. For example, I set the instance ports to 80 and 81, the latter receiving the "SSL" traffic (although internally it is plain HTTP, since the ELB terminates SSL). If someone requests a resource over HTTP, a rewrite on the instance redirects them to HTTPS, which the ELB then forwards to port 81.
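The same port mapping can also be created with the AWS CLI for a classic ELB. A sketch in which the load balancer name, availability zone and certificate ARN are placeholders you supply (`create_ssl_elb` is a hypothetical helper); inside a VPC you would pass `--subnets` instead of `--availability-zones`.

```shell
#!/usr/bin/env bash
# Classic ELB with the 80/81 instance-port mapping:
#   HTTP  80  -> instance HTTP 80 (where the HTTP -> HTTPS rewrite lives)
#   HTTPS 443 -> instance HTTP 81 (SSL terminated on the ELB)
create_ssl_elb() {
  local name="$1" cert_arn="$2" zone="$3"
  aws elb create-load-balancer \
    --load-balancer-name "${name}" \
    --availability-zones "${zone}" \
    --listeners \
      "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
      "Protocol=HTTPS,LoadBalancerPort=443,InstanceProtocol=HTTP,InstancePort=81,SSLCertificateId=${cert_arn}"
}
```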

After you follow the next screens (read: click, click, click) you reach the point where you "upload" (read: copy and paste) your SSL certificates. This is the trickiest part, although in reality it should not be, and I do not know whether it is a bug in AWS or an integration issue between DigiCert star certificates and AWS.


The dialog asks you to enter four pieces of information:


  • Certificate Name – The name you want to use to keep track of the certificate within the AWS console.
  • Private Key – The key file you generated as part of your certificate request.
  • Public Key Certificate – The public facing certificate provided by your certificate authority.
  • Certificate Chain – An optional group of certificates to validate your certificate.


The private key is normally called star_<domain_name>.key and the public key certificate star_<domain_name>.crt, and you might expect the Certificate Chain to be a concatenation of those two files and the DigiCertCA.crt intermediate certificate. But here comes the cockup. When you arrive at this screen, just fill in the Private Key and Public Key Certificate fields and click Create.
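Before pasting anything, it is worth confirming that the private key really belongs to the certificate, since a mismatched pair also fails with cryptic errors. A sketch that compares the RSA modulus of both files with openssl, assuming an RSA key; `key_matches_cert` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Compare the RSA modulus of a private key and a certificate.
# Prints "match" or "MISMATCH"; returns 1 on a mismatch, 2 if a file can't be read.
key_matches_cert() {
  local key="$1" crt="$2" k c
  k="$(openssl rsa -noout -modulus -in "${key}" 2>/dev/null)" || return 2
  c="$(openssl x509 -noout -modulus -in "${crt}" 2>/dev/null)" || return 2
  if [ "${k}" = "${c}" ]; then
    echo match
  else
    echo MISMATCH
    return 1
  fi
}
```

Usage: `key_matches_cert star_<domain_name>.key star_<domain_name>.crt`.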

Once the load balancer is created, go to the Listeners tab and click Change SSL certificate. Upload a "new" one by repeating the same process as before, but this time also fill in the Certificate Chain. Unlike a traditional certificate chain, AWS expects just the intermediate certificate here, so paste only the contents of DigiCertCA.crt.

Note: You might ask why, instead of repeating the last step, we don't just paste the Certificate Chain during the ELB setup. This is why I said AWS might be buggy: if you paste the Certificate Chain during the ELB setup, a cryptic error states that the intermediate certificate is not valid. This is the only way I know that works (and I haven't seen it documented anywhere on the interwebs).
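If you prefer the command line, the certificate can also be uploaded to IAM with the AWS CLI, passing only the intermediate certificate as the chain, as in the console workaround. Whether this path avoids the console error is untested here; the file names follow the star_<domain_name> convention and `upload_star_cert` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Upload key, certificate and the DigiCert intermediate (only) as an IAM
# server certificate named star_<domain>.
upload_star_cert() {
  local domain="$1"
  aws iam upload-server-certificate \
    --server-certificate-name "star_${domain}" \
    --certificate-body "file://star_${domain}.crt" \
    --private-key "file://star_${domain}.key" \
    --certificate-chain "file://DigiCertCA.crt"
}
```

The command prints the certificate metadata, including the ARN that the load balancer's HTTPS listener references.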

To check that you have the Chain installed correctly, use curl:
$ curl -v https://<domain_name>.com
* Rebuilt URL to: https://<domain_name>.com/
* Adding handle: conn: 0x23f2970
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x23f2970) send_pipe: 1, recv_pipe: 0
* About to connect() to <domain_name>.com port 443 (#0)
*   Trying 1.2.3.4...
* Connected to <domain_name>.com (1.2.3.4) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_RSA_WITH_AES_128_CBC_SHA
* Server certificate:
* subject: CN=*.<domain_name>.com,OU=IT,O=acme Limited,L=Sliema,C=MT
* start date: Dec 06 00:00:00 2013 GMT
* expire date: Dec 11 12:00:00 2014 GMT
* common name: *.<domain_name>.com
* issuer: CN=DigiCert High Assurance CA-3,OU=www.digicert.com,O=DigiCert Inc,C=US

The last lines should show the negotiated encryption cipher (the "SSL connection using" line) and, under "Server certificate", your certificate's details and the CA that signed it (the "issuer" line).
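curl verifies the chain against its own CA bundle. To see exactly what the load balancer sends, you can count the certificates in the handshake with openssl s_client: with the chain installed there should be at least two (your certificate plus DigiCert's intermediate), while one means the intermediate is missing. A sketch; `count_served_certs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Count the certificates HOST presents during the TLS handshake on port 443.
count_served_certs() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "${host}" -showcerts </dev/null 2>/dev/null |
    grep -c -- '-----BEGIN CERTIFICATE-----'
}
```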


The 12 Factor App

Quoted from 12factor.net, this is how an application infrastructure should be built - no exceptions to the rule!

I. Codebase

One codebase tracked in revision control, many deploys

II. Dependencies

Explicitly declare and isolate dependencies

III. Config

Store config in the environment

IV. Backing Services

Treat backing services as attached resources

V. Build, release, run

Strictly separate build and run stages

VI. Processes

Execute the app as one or more stateless processes

VII. Port binding

Export services via port binding

VIII. Concurrency

Scale out via the process model

IX. Disposability

Maximize robustness with fast startup and graceful shutdown

X. Dev/prod parity

Keep development, staging, and production as similar as possible

XI. Logs

Treat logs as event streams

XII. Admin processes

Run admin/management tasks as one-off processes

Thursday, July 17, 2014

If you can RTFM, we WANT YOU!

So yesterday I was tasked with putting together a job description for a DevOps engineer in our team. This is what I came up with, and to my surprise, even non-techies enjoyed it: