HOWTO

Debugging a (particular) failing boot service on Linux

At work I recently rolled out a newer version of the Dell OpenManage tools, which for the first time included a build of Openwsman. We didn't specifically need this functionality, but it's good to stay current with the OpenManage tools. To load the (unrelated) new kernel on a test machine, I rebooted it using Cobbler's power management functionality on our administrative system, but after 5 minutes the machine was still not responding to pings, so something was broken. I used remote desktop to hop on the one Windows server in the datacenter that we use to get at the interactive consoles of our servers (thankfully the new DRAC6 cards have console applets that work on Macs!), and pulled up the console for this machine.

The boot process was hung on "Starting openwsman" and didn't seem to be doing anything. Doh!

I restarted the machine again, and at the grub boot menu added an "S" to the boot string to start the system in single user mode. From there, "chkconfig openwsman off" disabled the service, and another reboot got the machine back up and running so I could troubleshoot a little better. I took a look in /etc/init.d/openwsman to see what might be hanging, and nothing immediately looked suspicious. It was a pretty standard init script, with the extra feature of generating OpenSSL certificates if they didn't exist already:

if [ ! -f "/etc/openwsman/serverkey.pem" ]; then
        if [ -f "/etc/ssl/servercerts/servercert.pem" \
                -a -f "/etc/ssl/servercerts/serverkey.pem" ]; then
            echo "Using common server certificate /etc/ssl/servercerts/servercert.pem"
            ln -s /etc/ssl/servercerts/server{cert,key}.pem /etc/openwsman/
        else
            echo "Generating Openwsman server public certificate and private key"
            FQDN=`hostname --fqdn`
            if [ "x${FQDN}" = "x" ]; then
                FQDN=localhost.localdomain
            fi
cat << EOF | sh /etc/openwsman/owsmangencert.sh > /dev/null 2>&1
--
SomeState
SomeCity
SomeOrganization
SomeOrganizationalUnit
${FQDN}
root@${FQDN}
EOF
        fi
    fi

It's a little strange, but not an unheard-of practice, and it shouldn't cause any problems. (Puppet and Func, two other systems tools we use, generate their certs in the application itself, which is a lot more common.)

I extracted the only possible culprit from the owsmangencert.sh script and tried running the openssl command manually:

openssl req -days 365 $@ -config /etc/openwsman/ssleay.cnf \
  -new -x509 -nodes -out cert.out \
  -keyout key.out

and it seemed that this was indeed the problem. It just sat there and didn't complete with the speediness I expect from OpenSSL. Time for strace!

cat << EOF | strace openssl req -days 365 -config ./ssleay.cnf.2    -new -x509 -nodes -out cert.out   -keyout key.out
> --
> SomeState
> SomeCity
> SomeOrganization
> SomeOrganizationalUnit
> test
> root@test
> EOF

This ended up doing a long read with output like:

open("/dev/random", O_RDONLY)           = 3
read(3, "\323K\372u_ya'\27\266\320\25\22\373\240\330~'\224\310\243\356\225\350.\245\362\3058\230Zb"..., 1024) = 128
read(3, "K\7:\273Zdr\274\25\227\263\366\260U\337Owp\6y\2333c\361\322\334\217\370.k\375]"..., 896) = 128
read(3, "dH\375V\327\230Bi\221\342\326\26R\301v^Qv5f\347\303g7\2747\345\360\207A!\227"..., 768) = 128
read(3, "X&\254r\331\353<:\36!\333\340\353", 640) = 13
read(3, "\357F\27\347\372atf", 627)     = 8
read(3, "\231\347\232\362\345\215n\227", 619) = 8
read(3, "\324\304\323\30\325\10G\332", 611) = 8

Looks like /dev/random wasn't returning random data nearly fast enough, which makes a whole lot of sense! Notice how the reads in the trace shrink from 128 bytes at a time to 8 bytes at a time as the pool drains. /dev/random is "good" random data because it is based on environmental entropy and the entropy data is only used once, but on modern multi-core systems doing lots of things, there usually isn't much entropy available. That means that while this command would eventually finish, it could take a very long time.
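
A quick way to confirm this kind of starvation before reaching for strace is to look at the kernel's own entropy estimate. This is only a sketch: it assumes a Linux system that exposes /proc/sys/kernel/random/entropy_avail and a kernel where /dev/random blocks when the pool runs low. The function names and any threshold you pass in are illustrative, not something taken from the Openwsman scripts.

```shell
# Print the kernel's current entropy estimate, in bits.
# The optional argument overrides the path (handy for testing).
entropy_avail() {
    cat "${1:-/proc/sys/kernel/random/entropy_avail}"
}

# Succeed (exit 0) when the estimate is below the given threshold.
# Usage: entropy_is_low THRESHOLD [PATH]
entropy_is_low() {
    [ "$(entropy_avail "$2")" -lt "$1" ]
}
```

Running something like `entropy_is_low 200 && echo "reads from /dev/random may block"` on the affected machine would likely have pointed in the right direction.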

The fix: using /dev/urandom instead. It is "not quite as good" random data because the output may have less entropy than /dev/random, and it reuses internal entropy bits to generate its output, but it's "good enough" for generating cryptographic keys. It is also non-blocking, which means that a caller will never have to wait unreasonable amounts of time for enough "random" data. (See http://en.wikipedia.org/wiki//dev/random for a longer explanation.)

I replaced the two occurrences of /dev/random, one in /etc/openwsman/ssleay.cnf and one in /etc/openwsman/owsmangencert.sh, and initial startup of openwsman (including key generation) became pretty much instantaneous. "chkconfig --level 2345 openwsmand on" to turn it back on, and a reboot (after removing the generated keys and certs) to confirm, and the machine booted up as expected. To make this work everywhere, I customized those two config files and added them to our Puppet system so that all Dell servers would get Openwsman set up properly when the update is run globally:

  file {
    "ssleay.cnf":
      path => "/etc/openwsman/ssleay.cnf",
      source => "puppet://$server/dell/ssleay.cnf",
  }

  file {
    "owsmangencert.sh":
      path => "/etc/openwsman/owsmangencert.sh",
      source => "puppet://$server/dell/owsmangencert.sh",
  }

Problem solved and all machines will automatically get the correct fix, so the next time a machine won't finish starting up, it will be a new and different problem to debug.
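
For reference, the /dev/random replacement itself is a one-line substitution per file. A minimal sketch, assuming the files contain the literal string /dev/random and that leaving a .orig copy next to each file is acceptable. The function name is made up for illustration.

```shell
# Replace every /dev/random reference with /dev/urandom in the given files,
# keeping a backup of each original as FILE.orig.
# The pattern cannot match an existing /dev/urandom, so a second run does
# not change the result (it does overwrite the backup, though).
use_urandom() {
    for f in "$@"; do
        sed -i.orig 's|/dev/random|/dev/urandom|g' "$f"
    done
}
```

On the test machine this would be called as `use_urandom /etc/openwsman/ssleay.cnf /etc/openwsman/owsmangencert.sh` before copying the results into Puppet.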

Monitoring Nginx with ZenOSS

If you're using ZenOSS for network monitoring, and you have a few load balancers (or servers of some sort) running Nginx, chances are pretty good that you want to see what they are up to inside of ZenOSS. It requires gluing some things together, but once you know what kind of glue to use, it's a pretty straightforward process. First, you need to add a status page to Nginx. This means that Nginx must be compiled with "--with-http_stub_status_module", but if you're using Nginx from a package provider like EPEL, it already has this included. Add nginx_status to your configuration by adding something like:

location /nginx_status {
    stub_status on;
    access_log   off;
    allow IP.OF.MONITORING.SERVER;
    deny all;
}

to your first server {} block. Reload nginx and visit http://IP.OF.NGINX.SERVER/nginx_status and you should get some stats like:

Active connections: 8 
server accepts handled requests
 455010 455010 781977 
Reading: 0 Writing: 2 Waiting: 6 
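
The three numbers on the third line are running totals since Nginx started: accepted connections, handled connections, and requests. Here is a small sketch of pulling them out by field position, which is the same approach the check_nginx plugin takes with awk. The function name is hypothetical.

```shell
# Read stub_status output on stdin and print the values as key=value pairs.
# Joining the lines first makes the fields countable left to right:
# $3 = active connections, $8 = accepts, $9 = handled, $10 = requests.
parse_stub_status() {
    tr '\n' ' ' | awk '{ printf "active=%s accepts=%s handled=%s requests=%s\n", $3, $8, $9, $10 }'
}
```

Something like `wget -O- -q http://IP.OF.NGINX.SERVER/nginx_status | parse_stub_status` should then print one line with all four values.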

Up next is getting this information into ZenOSS. We'll do this using a Nagios plugin grabbed from here: check_nginx. This plugin does most of the work, but it's not quite enough to use as-is: it reports the difference in the connection and request counts between 2 runs 1 second apart instead of the raw totals, and its output format doesn't work with ZenOSS's input parser, so you'll need to apply this patch:

--- check_nginx.sh	2009-11-24 07:31:35.000000000 -0800
+++ check_nginx.sh.1	2009-11-24 06:45:56.000000000 -0800
@@ -181,17 +181,13 @@
     if [ "$secure" = 1 ]
     then
         wget_opts="-O- -q -t 3 -T 3 --no-check-certificate"
-        out1=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
-        sleep 1
-        out2=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
+        out=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
     else        
         wget_opts="-O- -q -t 3 -T 3"
-        out1=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
-        sleep 1
-        out2=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
+        out=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
     fi
 
-    if [ -z "$out1" -o -z "$out2" ]
+    if [ -z "$out" ]
     then
         echo "UNKNOWN - Local copy/copies of $status_page is empty."
         exit $ST_UK
@@ -199,13 +195,9 @@
 }
 
 get_vals() {
-    tmp1_reqpsec=`echo ${out1}|awk '{print $10}'`
-    tmp2_reqpsec=`echo ${out2}|awk '{print $10}'`
-    reqpsec=`expr $tmp2_reqpsec - $tmp1_reqpsec`
-
-    tmp1_conpsec=`echo ${out1}|awk '{print $9}'`
-    tmp2_conpsec=`echo ${out2}|awk '{print $9}'`
-    conpsec=`expr $tmp2_conpsec - $tmp1_conpsec`
+    reqpsec=`echo ${out}|awk '{print $10}'`
+
+    conpsec=`echo ${out}|awk '{print $9}'`
 
     reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l`
     if [ "$reqpcon" = ".99" ]
@@ -220,7 +212,7 @@
 }
 
 do_perfdata() {
-    perfdata="'reqpsec'=$reqpsec 'conpsec'=$conpsec 'conpreq'=$reqpcon"
+    perfdata="reqpsec=$reqpsec conpsec=$conpsec conpreq=$reqpcon"
 }
 
 # Here we go!
@@ -247,17 +239,17 @@
 then
     if [ "$reqpsec" -ge "$warning" -a "$reqpsec" -lt "$critical" ]
     then
-        echo "WARNING - ${output} | ${perfdata}"
+        echo "WARNING - ${output} | ${perfdata};"
 	exit $ST_WR
     elif [ "$reqpsec" -ge "$critical" ]
     then
-        echo "CRITICAL - ${output} | ${perfdata}"
+        echo "CRITICAL - ${output} | ${perfdata};"
 	exit $ST_CR
     else
-        echo "OK - ${output} | ${perfdata} ]"
+        echo "OK - ${output} | ${perfdata}; ]"
 	exit $ST_OK
     fi
 else
-    echo "OK - ${output} | ${perfdata}"
+    echo "OK - ${output} | ${perfdata};"
     exit $ST_OK
 fi

by saving that to "check_nginx.patch" and running "patch -p0 < check_nginx.patch" from the folder where you have the check_nginx script saved. Make sure that the ZenOSS user can run the script, and make sure it works. The output should look like:

OK - nginx is running. 782995 requests per second, 455656 connections per second (1.71 requests per connection) | reqpsec=782995 conpsec=455656 conpreq=1.71;

With all this working, now you'll just need to create a new Template in ZenOSS, add a COMMAND data source to it with a command like:

/home/zenoss/scripts/check_nginx_ng.sh -N -H ${dev/manageIp}

Then add two datapoints, reqpsec and conpsec. Make sure to set both to the "COUNTER" type, because the patched check_nginx script reports constantly increasing totals instead of the GAUGE-style values it reported before! Bind this new template to any devices or device classes where servers are running Nginx, and create any graphs you like. I have a graph on each device that shows reqpsec and conpsec, as well as a report that shows the aggregate reqpsec and conpsec for all of the load balancers. If your command is named "check_nginx" in the template, you can use the variables in any report by adding "check_nginx_reqpsec" as a data point.
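
The reason COUNTER is the right type: with an ever-increasing total, the useful number is the difference between two samples divided by the time between them. ZenOSS does this arithmetic for you, but as a rough illustration (the function name and the 300-second interval are assumptions for the example, and counter resets are not handled):

```shell
# Per-second rate between two samples of an increasing counter.
# Usage: counter_rate PREVIOUS CURRENT SECONDS
counter_rate() {
    awk -v prev="$1" -v cur="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", (cur - prev) / secs }'
}

# Two request totals from this post, imagined 300 seconds apart:
counter_rate 781977 782995 300   # prints 3.39
```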

Without too much trouble, you now have fancy graphs in ZenOSS for your Nginx statistics, and you can set thresholds for these if you have conditions that should send out alarms.

Hairpinning with a Cisco ASA

What a long battle with Cisco IOS this has been, but after quite a bit of tinkering I've gotten things working the way that I would like. Here's a technical description of the details in hope that this helps someone else.

The Setup

  • Load balancers with private IP addresses like 172.16.0.10 on a /24, running example.com
  • Cisco ASA Firewalls running 7.2(1) or newer, that map public IP addresses (I'll use 192.168.0.193 on a /24 here instead of a real public IP)
  • Internal DNS servers that map loadbalancer.private to 172.16.0.10
  • External DNS servers that map example.com to 192.168.0.193
  • Random application server behind the firewall with no public IP address and a private IP of 172.16.0.20

The Problem

Applications behind the firewall need to access other applications behind the firewall using the public DNS name (example.com) instead of the private one (loadbalancer.private).

Some possible solutions

As an easy-to-set-up solution, we currently have the internal DNS servers set up to map example.com to 172.16.0.10, which works fine except that it requires updating DNS records in multiple places. Our naming scheme slowly got a bit more complex, and I've had to add explicit relay rules to our DNS server configuration files to relay certain lookups from the internal DNS servers to the external DNS server's internal IP address. Sending them to the DNS server's external IP address doesn't work because the Cisco ASA will not send traffic back out on the same interface that it came in on, even after network translations have been done. (For a different portion of our external IP space, I added some static routes to the core router, but when we move those IPs behind this firewall, this ASA feature will break those routes as well.)

The current mapping of public IPs to private IPs looks like:

static (inside,outside) 192.168.0.193 172.16.0.10 netmask 255.255.255.255

One feature that Cisco suggests to solve our problem is "DNS Doctoring", which simply means adding the 'dns' keyword to the end of the mapping like:

static (inside,outside) 192.168.0.193 172.16.0.10 netmask 255.255.255.255 dns

which rewrites DNS replies passing through the firewall to the inside interface, changing the IP from 192.168.0.193 to 172.16.0.10. This would be great if your DNS server is outside of the firewall, which ours is not. Our internal DNS queries never travel through the ASA, so this didn't do anything for us.

Up next was trying out

same-security-traffic permit intra-interface

which "permits communication in and out of the same interface". That sounds like exactly the right solution for the problem, because that was the limitation that broke things. However, adding this in didn't seem to change anything, and traffic still was not permitted in and out of the same interface.

The Solution

After a lot of troubleshooting, which involved an ASA 5510 and a 3524-XL on the floor under my desk, downloading and installing new versions of IOS, a lot of Googling, a lot of cursing, and a lot of sketching possible things out on paper, I finally figured out the missing piece: hairpinning, which is "the process by which traffic is sent back out the same interface on which it arrived." Here is the configuration that finally got traffic flowing from 172.16.0.20 to 192.168.0.193 on the ASA and back out to 172.16.0.10 on the same interface it started on:

!--- Output suppressed.
!
interface Ethernet0/0
 nameif outside
 security-level 0
 ip address 192.168.0.192 255.255.255.0 
!
interface Ethernet0/1
 nameif inside
 security-level 100
 ip address 172.16.0.1 255.255.0.0 
! 
!--- Output suppressed.
!
same-security-traffic permit intra-interface
access-list outside_in extended permit icmp any any 
access-list outside_in extended permit tcp any any 
!
!--- Output suppressed.
!
global (outside) 1 interface
nat (inside) 1 172.16.0.0 255.255.0.0
alias (inside) 192.168.0.193 172.16.0.10 255.255.255.255
alias (inside) 10.0.0.20 172.16.0.20 255.255.255.255
static (inside,outside) 192.168.0.193 172.16.0.10 netmask 255.255.255.255 
access-group outside_in in interface outside
!
!--- Output suppressed.

The trick here was to add the alias lines in combination with "same-security-traffic permit intra-interface". The first one:

alias (inside) 192.168.0.193 172.16.0.10 255.255.255.255

does something sensible and aliases 192.168.0.193 to 172.16.0.10 on the inside interface so any time traffic comes in here matching that IP, it gets rewritten. The second line is also required but doesn't make as much sense:

alias (inside) 10.0.0.20 172.16.0.20 255.255.255.255

This line tells the ASA to take any traffic coming in destined to 10.0.0.20 and map it to 172.16.0.20. However, we don't have any devices on 10.0.0.0/8 and there are no routes for it, so there will never be any traffic coming in to 10.0.0.20. That said, this line has to exist so that there is a mapping back to 172.16.0.20 in the alias table and the ASA knows it's alright to send traffic to it. Using a "real" public IP here would both use up our public IPs and perhaps pose some security risk, so it's safer to use these non-public IPs and add a rule to prevent incoming traffic from the outside from reaching them. If the alias command worked for an IP range instead of one host, this would be pretty much perfect.

The result

Things finally work! Here is a trace of a ping from 172.16.0.20 to 192.168.0.193 (which works now!). The inside interface shows up under the name VLAN100 in this capture:

ICMP echo request from VLAN100:172.16.0.20 to VLAN100:192.168.0.193 ID=12034 seq=0 len=56
ICMP echo request translating VLAN100:172.16.0.20 to VLAN100:10.0.0.20
ICMP echo request untranslating VLAN100:192.168.0.193 to VLAN100:172.16.0.10

So the ASA is doing the translating the proper way and not doing anything with 10.0.0.20. This is good news because it means that our naming and routing architecture can be greatly simplified:

  • All relay rules for external facing domains that have previously required this "split-horizon" DNS can be removed, returning the DNS server configurations to a generic state
  • All crazy static routes for external IP addresses can be removed from our core router
  • All external facing domain zones can be removed from the internal DNS servers, and updates when things are moved only have to be done in one place

The only penalty for this is adding in the alias lines to our ASA configuration for each existing static mapping that we have, as well as adding an alias line for each server that needs to communicate with the external IP addresses of things behind the same ASA which should be limited to the internal DNS servers and a few application servers.
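
Since each existing static mapping needs a matching alias line, the lines can be generated rather than typed. This is a sketch, assuming the static lines look exactly like the single-host example in this post and that the inside interface is named "inside". The function name is hypothetical.

```shell
# Read ASA configuration on stdin and print one alias line for every
# "static (inside,outside) PUBLIC PRIVATE netmask MASK" line found.
static_to_alias() {
    awk '$1 == "static" && $2 == "(inside,outside)" && $5 == "netmask" {
        printf "alias (inside) %s %s %s\n", $3, $4, $6
    }'
}
```

Feeding a saved copy of the running configuration through `static_to_alias` gives a list that can be reviewed and pasted into the ASA. The extra alias lines for internal-only servers like 172.16.0.20 still have to be written by hand.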

EDIT: Another way to do this

After sharing this with some coworkers, it turns out that 'hairpinning' is definitely the key word and one of them stumbled across this article:

Setup U-Turn (Hairpinning) on Cisco ASA

It solves the same issue with a slightly more graceful solution because no alias entries are needed for non-public services. In fact, no aliases are needed at all. To get the exact same functionality as above, here is the working configuration for the same problem using this new approach:

!--- Output suppressed.
!
interface Ethernet0/0
 nameif outside
 security-level 0
 ip address 192.168.0.192 255.255.255.0 
!
interface Ethernet0/1
 nameif inside
 security-level 100
 ip address 172.16.0.1 255.255.0.0 
! 
!--- Output suppressed.
!
same-security-traffic permit intra-interface
access-list outside_in extended permit icmp any any 
access-list outside_in extended permit tcp any any 
!
!--- Output suppressed.
!
global (outside) 1 interface
global (inside) 1 interface
nat (inside) 1 172.16.0.0 255.255.0.0
static (inside,outside) 192.168.0.193 172.16.0.10 netmask 255.255.255.255 
static (inside,inside) 192.168.0.193 172.16.0.10 netmask 255.255.255.255 
access-group outside_in in interface outside
!
!--- Output suppressed.

Windows 7 64-bit on a MacBook Pro

I have a MacBook Pro for work, and have Windows on it in a separate partition for the occasional thing that requires it. I had to get its logic board replaced, and for various reasons it made more sense to reinstall OS X and Windows once everything was replaced and back to me. Mac OS X was easy, but Windows was a bit more of a pain because while I have several legal copies of XP, I've lost track of which license keys go with which ISOs, and Microsoft wants me to call them to activate things. Thankfully, this is a work computer and work has a Microsoft volume license plan, so I grabbed a 64-bit Windows 7 Professional ISO and key, and gave that a shot. First problem, the DVD wouldn't boot: it instead presented "Select CD-ROM Boot Type" and 2 options, and the keyboard was unresponsive.

After a bit of Googling around, the following is the way to get around this, assuming you have another Windows system or VM available:

  1. Grab a copy of oscdimg.exe (here's one) to somewhere like C:\
  2. Put the DVD you already burned into your Windows box
  3. Run: oscdimg -n -m -bd:\boot\etfsboot.com d:\ c:\win7x64.iso
  4. Burn this ISO to DVD
  5. If needed, use Boot Camp Assistant in Mac OS X to partition your disk for Windows
  6. Boot up with this DVD (hold down C when booting, use the Boot Camp Assistant to start it up, etc) and Windows 7 will install!

Once it's up and running and you try and install the BootCamp drivers from a Snow Leopard DVD, you'll run into some more things that try and stop you. The fix for them is to manually run the 64-bit driver installer in Administrator mode. One way to do this:

  1. Open up an Admin mode console by right-clicking on "Command Prompt" in start -> all programs -> accessories, and clicking "Run as administrator"
  2. type in "cd /d D:\Drivers\Apple" to change to the Drivers\Apple folder on the DVD (assuming the DVD drive is D:)
  3. type in "msiexec /i BootCamp64.msi" to launch the 64-bit installer, and it will install and do its thing

After those few steps and not too much time, you'll be up and running in Windows 7 on your shiny Mac laptop. Some things I've noticed:

  • It's faster feeling than Windows XP
  • It's prettier than Windows XP
  • It feels like it wakes and sleeps faster than OS X
  • Steam and Team Fortress 2 work and seem faster than Windows XP
  • The Internet works.

Growing a Software RAID Array in Linux

I've been burned too many times by hardware failures, so I keep all of my data on a pair of hard drives in my desktop mirrored with RAID-1. (Note that I'm not relying on this for backup, I have backups onsite and off, but it takes a long time to restore 100s of GB of data. This mirror is to prevent downtime when hard drives fail.) I started off with 2 300G drives several years ago, and set up a new mirror around 3 years ago with 2 750G drives, but the photos I take and videos I record add up pretty quickly. It was almost time to scale up some more when I stumbled across a great deal online and got 2 1.5T drives for about $160. Sunday, I finally got around to migrating things over and instead of building a new array and copying everything over, I tried something a little different that worked pretty well.

  1. Shut down the machine and pull out one of the 750G drives and replace it with a 1.5T drive and turn things back on. Alternatively, if you have the space, you can put all of the drives in a machine to avoid a reboot or two.
  2. Partition the new drive using fdisk and use mdadm to add the new partition to the array. Replace "sdf1" with whatever the name of your new partition is and "md0" with whatever the name of your raid device is.
    # mdadm /dev/md0 --add /dev/sdf1

    If you have all the drives in the system, you'll need to fail out and remove an existing one first with something like

    # mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
  3. Run
    # watch cat /proc/mdstat

    so you can see when the array is finished rebuilding. (this took ~4 hours on my machine)

  4. Repeat the same process, replacing the other 750G with the other 1.5T drive
  5. Once both 1.5T drives are in the array and it's completely rebuilt (another 4 hours), it will still show up as a 750G RAID-1. Next, bump up the size of the array to the max size of the drives with:
    # mdadm --grow /dev/md0 --size=max
  6. Again, run
    # watch cat /proc/mdstat

    until the array is finished growing. (another 4 hours)

  7. Finally, it's time to resize the filesystem. This may be different if you're using an old version of Linux or a filesystem other than ext3, but on ext3 it's very straightforward and the filesystem can actually be resized while it is mounted and in use:
    # resize2fs -p /dev/md0

    That will grow the filesystem to fill the whole array and when this finishes in about 15 minutes you'll have a shiny new RAID-1 with a lot more space:

    # df -h /dev/md0
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md0              1.4T  607G  699G  47% /mnt/storage
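
Rather than watching /proc/mdstat by eye, the waiting between steps can be scripted. A minimal sketch, assuming in-progress work shows up in /proc/mdstat with one of the words recovery, resync, or reshape. The function names and the 60-second poll interval are arbitrary choices.

```shell
# Succeed while any md array is rebuilding, resyncing, or reshaping.
# The optional argument overrides the status file (useful for testing).
md_busy() {
    grep -Eq 'recovery|resync|reshape' "${1:-/proc/mdstat}"
}

# Block until all arrays are idle, polling once a minute.
wait_for_md() {
    while md_busy "$1"; do
        sleep 60
    done
}
```

With this, a step like the grow can be chained as `wait_for_md && mdadm --grow /dev/md0 --size=max`, so the next command only runs once the rebuild has finished.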

Drupal Upgrades

I manage a few Drupal sites, and have worked out two simple systems for making upgrades as painless as possible, one using diff and patch, the other using subversion. The two examples I'll use here don't really have customized code outside of .htaccess files and things in the sites/ folder, so these processes may require some tinkering if you've done core level customization. Hopefully these will be helpful to someone new to updating PHP applications. If you're running any PHP applications, you need to stay on top of security updates to protect your site and your users' information! Upgrading both of these sites today took me less than 10 minutes, but if you're attempting your first upgrade, you should understand what is going on and maybe even read through the whole Introduction to upgrading Drupal guide. I definitely leave out some of the recommended steps.

ckdake.com - no revision control

This site (ckdake.com) is just my personal things and could be restored from backup pretty easily without too much trouble, and I'm the main user so it's not a big deal if it's down for a few hours. Drupal 6.12 came out and here's what I did to upgrade:

cd /var/www/ckdake.com/
wget http://ftp.drupal.org/files/projects/drupal-6.12.tar.gz
tar -zxf drupal-6.12.tar.gz 
diff -uF^f -r htdocs drupal-6.12 > upgrade.patch
vim upgrade.patch
patch -p0 --dry-run < upgrade.patch 
patch -p0 < upgrade.patch

And that's it! In vim, I removed patch entries that did things I didn't want (like overwriting custom things I have in my .htaccess file), and I also looked at the output of the dry run of patch to make sure it looked right. After doing those things I visited ckdake.com/update.php and walked through Drupal's update wizard (which didn't make any changes this go around). Perhaps I should have done a mysqldump to back up my database first, but I have nightly backups of the database, and in several years of upgrading Drupal this way I have never had a problem updating minor versions. There was no downtime for my site since this all worked right (as it usually does), but I've broken things before. And keep in mind, a 5.x to 6.x upgrade is not going to be this easy...
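
If the same file gets edited out of the patch on every upgrade, diff can be told to skip it up front. This sketch is the diff command from above with the -x option added to leave any file named .htaccess out of the comparison. The wrapper name is hypothetical, and reviewing the patch by hand is still worth doing.

```shell
# Build an upgrade patch between the live tree and a fresh release,
# ignoring any file named .htaccess so local edits to it are never touched.
# Usage: make_upgrade_patch OLD_DIR NEW_DIR > upgrade.patch
make_upgrade_patch() {
    # diff exits 1 when it finds differences, which is the expected case here,
    # so only a real error (exit status 2) is passed on as a failure.
    diff -u -F '^f' -r -x .htaccess "$1" "$2" || [ $? -eq 1 ]
}
```

From /var/www/ckdake.com/ this would be run as `make_upgrade_patch htdocs drupal-6.12 > upgrade.patch`.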

fastermustache.org - revision control

fastermustache.org is used by a lot more people and I try to avoid any downtime there, so this process is a bit more involved and lets me roll back changes a lot more easily. Here's the sequence for that:

cd /var/www/test.fastermustache.org/
mv htdocs drupal-6.12
wget http://ftp.drupal.org/files/projects/drupal-6.12.tar.gz
tar -zxf drupal-6.12.tar.gz
mv drupal-6.12 htdocs
cd htdocs
svn st

I now have a list of all changes for this release, and go through resolving things as needed. `svn st | grep "?"` gives me a list of files that need to be added, and I grep on other svn status codes to verify other changes. Once everything looks good, I update the live site:

svn commit -m "drupal-6.12 upgrade"
cd /var/www/fastermustache.org/htdocs
svn up

And then visit fastermustache.org/update.php to do any database changes required. This process also works for module updating:

cd /var/www/test.fastermustache.org/htdocs/sites/default/modules/
wget http://ftp.drupal.org/files/projects/date-6.x-2.2.tar.gz
tar -zxf date-6.x-2.2.tar.gz
rm date-6.x-2.2.tar.gz
svn st
# resolve all the issues as described above
svn commit -m "date-6.x-2.2 upgrade"
cd /var/www/fastermustache.org/htdocs/sites/default/modules/
svn up

And then one more visit to fastermustache.org/update.php and any needed database changes are made. Again, it would be a really good idea to do mysql database backups before running `svn up` on the live site, but I trust Drupal to treat me well for these sorts of updates. Perhaps once something breaks horribly, I'll post some steps on how to recover from problems.
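
The "files that need to be added" step can be scripted as well. Here is a sketch that pulls the unversioned paths out of `svn st` output. The function name is hypothetical, and anchoring the match to the start of the line avoids picking up paths that merely contain a question mark.

```shell
# Read `svn st` output on stdin and print the paths svn does not know about
# (status column "?"), one per line.
unversioned_paths() {
    sed -n 's/^?[[:space:]]*//p'
}
```

After reviewing the list from `svn st | unversioned_paths`, it can be handed to svn with something like `svn st | unversioned_paths | while IFS= read -r p; do svn add "$p"; done`.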

Priming a MySQL slave in an OpenVZ container using binlogs

At work, I've been migrating a bunch of our internal infrastructure from VMware ESX virtual machines to OpenVZ containers. There is an obvious cost difference here ($$$ to free), but a bigger benefit is that an OpenVZ host can be managed the same way as any of our CentOS servers. Additionally, it's easier to tune resources for containers, which allows us to fit more services on less hardware and move things around without any service outages. I've written about OpenVZ a little bit before in "VLANs in OpenVZ", but since then things have been pretty straightforward. Aside from live migration not working with that VLAN setup, what follows is the first gotcha I ran into, involving OpenVZ and some MySQL trickery.

We use MySQL replication to OpenVZ containers to give us a place to perform backups where the MySQL backups won't affect performance of our servers. The last one of the VMware machines to move was for a pretty busy cluster with several thousand databases. The usual methodology for VMware -> OpenVZ migration was:

  1. Create a new OpenVZ container using the provisioning system, and install MySQL RPMs identical to those on the VMware vm
  2. Stop MySQL on the vm
  3. Copy /var/lib/mysql and /etc/my.cnf from the vm to the container
  4. Change MySQL GRANTs on the master to point to the new slave
  5. Start up MySQL on the container
  6. verify that things are working and then delete the VMware vm

However, for reasons unknown to me, I just couldn't get the files off of the VMware vm. This was the last vm on this particular ESX host, nothing else was running on it, and all the performance numbers looked normal, but I couldn't create a .tgz on the local filesystem, rsync to nfs would stall and timeout, scp to another machine would stall, and any combination of the above things wouldn't work. Frustrating. I'd have to wait until a maintenance window to stop the master to get the data from it (InnoDB doesn't like direct file copies and running mysqldump on all the databases with everything locked to preserve the master_log_pos would cause an outage), so I decided to try something new.

There is a command called LOAD DATA FROM MASTER which is a great idea, but the implementation is pretty busted (and deprecated). Thankfully, this particular master database has all the binlogs it's ever created, which allowed me to do something like "LOAD DATA FROM MASTER" but in a way that works, so here is what I did for this server:

  1. Create a new OpenVZ container using the provisioning system, and install MySQL
  2. Set up /etc/my.cnf in a way that works for the memory allocated to this container and the data we have, as well as being a replication slave
  3. Add MySQL GRANTs on the master pointing to the new slave
  4. Start up MySQL on the container
  5. Run "CHANGE MASTER TO" on the container to set the master_host, master_user, master_password. Also set master_log_file to the name of the first log file on the master, and master_log_pos=0
  6. Run "START SLAVE" on the container
  7. Use "SHOW SLAVE STATUS" to watch as data trickles in!
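
Steps 5 and 6 boil down to two SQL statements. The sketch below only prints them, so they can be reviewed before being piped into the mysql client on the container. The function name is hypothetical, the arguments are placeholders rather than values from our setup, and nothing is escaped, so it assumes values without single quotes in them.

```shell
# Print the statements that point a fresh slave at the start of the
# master's first binlog. Events in a binlog file begin at byte 4 (after
# the file's magic number), so that is the default position here; pass 0
# as the last argument to match step 5 exactly.
# Usage: prime_slave_sql HOST USER PASSWORD FIRST_LOG_FILE [POSITION]
prime_slave_sql() {
    cat <<EOF
CHANGE MASTER TO
  MASTER_HOST='$1',
  MASTER_USER='$2',
  MASTER_PASSWORD='$3',
  MASTER_LOG_FILE='$4',
  MASTER_LOG_POS=${5:-4};
START SLAVE;
EOF
}
```

The name of the first log file comes from running "SHOW BINARY LOGS" on the master.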

This worked pretty well, and MySQL data copied from the master at ~20Mb/s and loaded into the slave a bit slower than that. It'll probably take a day or two to fully catch up, but no downtime for the master is a good thing, and we can still run nightly backups from the VMware vm until this new slave is caught up.

However, I started getting a "Disk quota exceeded" error that would take out MySQL. Strange, `df` reported that / still had plenty of space, and there weren't any mentions of disk in /proc/user_beancounters. After a bit of poking around and trying things, it turns out that OpenVZ has limitations on /dev/simfs relating to disk usage that are not visible in /proc/user_beancounters. The easiest way to check in on these is by running `vzquota stat ${CID}` on the host the container is running on to get something like:

[root@vm-host /]# vzquota stat ${CID}
   resource          usage       softlimit      hardlimit    grace
  1k-blocks        8918860       209715200      209715200         
     inodes          67775          200000         220000  

The 1k-blocks is associated with blocks on /, and inodes is associated with the inodes for /. Inside the container, you can see this with `df`:

[root@container /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/simfs            200G   41G  160G  21% /
none                  3.9G  4.0K  3.9G   1% /dev
[root@container /]# df -ih
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/simfs               20M    236K     19M    2% /
none                    997K      94    997K    1% /dev

The above is from after I trashed the first container and started over, and watched usage there slowly creep up. To get around this limitation, like any other OpenVZ limit, you can run vzctl to bump things up. The relevant thing to increase here is 'diskinodes', which I did by a factor of 100:

vzctl set ${CID} --diskinodes $((200000 * 100)):$((220000 * 100)) --save

Looking in /proc/user_beancounters on the container, it looked like tcprcvbuf was maxing out as well, so I bumped that up, and replication has been humming along. 405 days worth of data to parse through! If you're running MySQL servers and might ever need to change your replication topology, hold on to your binlogs if you have space: they can save you from having to take down your master for changes.
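
Since these limits don't show up in /proc/user_beancounters, it helps to check them directly. Here is a sketch that reads `vzquota stat` output in the layout shown above and reports how much of the inode soft limit is used. The function name is hypothetical and the column positions are taken from that sample output.

```shell
# Read `vzquota stat` output on stdin and print inode usage as a
# whole-number percentage of the soft limit (rounded down).
inode_pct() {
    awk '$1 == "inodes" { printf "%d\n", $2 * 100 / $3 }'
}
```

Run on the host as `vzquota stat ${CID} | inode_pct`. For the sample above (67775 of 200000) it reports 33, and a number creeping toward 100 is the warning sign for the "Disk quota exceeded" error.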