Hosting

ithought. now with a website!

I've finally gotten some things racked at a new datacenter and thought this might be a good time to launch the new www.ithought.org. Everything that used to be at /hosting here has moved there, and there's a lot more new stuff! If you have hosting with me and have questions, or want to refer anyone to me for hosting or consulting, that's the place to send them! If you have hosting with me and want to link to the new site instead of ckdake.com, there are some images at the bottom of the new support page.

More updates to come including a far superior status page.

Scaling with PowerDNS and EveryDNS

Ah DNS, the often overlooked aspect of running websites. Many people I've spoken to bought a domain from Network Solutions, then one from GoDaddy, and maybe one or two from their web-hosting provider. Settings are all over the place, and they use the tools provided by each registrar to manage the DNS for domains purchased there. While this certainly works, it can become quite a hassle to change things around, especially if you want an overview of all of your domains or need to change the IP address of a server.

Several years ago, I found out about EveryDNS, which is a great free DNS hosting service. They've been very solid, and while they have been down a few times from DDoS attacks at 50Mbps+, they can definitely scale better than my little rack of servers. I donated some money to them and currently have about 60 domains with ~600 DNS entries total hosted with them. With EveryDNS, all of my DNS entries are in the same place, and when someone purchases hosting from me, I have them set the authoritative nameservers for their domain to the EveryDNS nameservers. This means that I don't need access to their account information, but I can have quick and easy access to DNS entries if I need to move anything around.

I'm preparing to make some big changes to my servers, and the hassle of the point-and-click interface is becoming a bit too much. ~1800 clicks or so is a lot more complicated than it needs to be! Additionally, for almost everything else I do on the internet, I prefer to own the hardware and software that my information lives on. To address both of these, I installed PowerDNS with a MySQL backend on a server, and then set up DNS replication to EveryDNS (docs on this). PowerDNS with MySQL lets me change the IP address of a server with one SQL statement instead of lots of mouse clicking, regardless of how many domains I have. This setup also allows me to include DNS configuration as part of my web hosting provisioning scripts, which greatly simplifies the process of adding a new website to one of my servers. My DNS server is not listed in the authoritative servers list for domains, so the only queries it answers are the AXFR (zone transfer) queries from EveryDNS. The only downside is that EveryDNS only checks once an hour, so I can't do any tinkering with short TTLs, but that's a price I'm willing to pay for now! Hopefully they will enable DNS NOTIFY support in the future, which would allow for near-instantaneous updates, and if my hosting operation gets big enough, I'll just roll my own live DNS servers.
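As a sketch of what that one-statement change looks like: the PowerDNS gmysql backend keeps every entry in a `records` table with `name`, `type`, and `content` columns, so repointing every site at a new server is a single UPDATE. (The IP addresses below are placeholders, not my real ones.)

```sql
-- PowerDNS gmysql backend: repoint every A record from the old
-- server IP to the new one, across all domains at once.
-- (198.51.100.10 and 203.0.113.10 are placeholder addresses.)
UPDATE records
   SET content = '203.0.113.10'
 WHERE type = 'A'
   AND content = '198.51.100.10';
```

PowerDNS serves answers straight from the table, so EveryDNS picks up the change on its next hourly AXFR with no further work on my end.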

Servers and hosting

I ordered a new server today to take over as the primary web hosting server so I can clean some things up on the other two boxes. Initially I was hesitant to use Dell because I'd had some issues with a recent order with them, but they resolved those (more on that later), and when I was speccing things out today, they had a great coupon: buy a $2800+ server and get $850 off. Cool! My budget was $3k for this thing, and this deal enabled me to get faster CPUs for significantly less money. So Dell it is. (I'm a fan of having one phone number to call for warranty issues for 3 years at a reputable company, which Dell gives me. One of my servers is from Monarch Computer, which went out of business 9 months after I bought from them, and it's having a problem with its RAM or the motherboard :/)

A dual quad-core Intel Xeon E5410 (2x6MB cache) at 2.33GHz with a 1333MHz FSB, 8GB of 667MHz memory (2GB dual-ranked DIMMs), a PERC 6/i Serial-Attached SCSI PCIe RAID controller with 256MB cache, two 146GB 15K RPM Serial-Attached SCSI 3.5in drives, and mounting rails is on the way! (Or will be soon, at least.)

When this shows up, all my websites will move onto it (~40 at last count), and the stuff I outlined in my last post can start to happen. Exciting!

Infrastructure and Switching

Not only is it that time of year again, but due to some externalities, it's time to upgrade my server setup again. Right now I have two 1U servers installed, each doing their own thing, and bandwidth purchased by the GB. I'm switching over to a setup with initially 5U of space: 1U for a switch, 3U of servers for me (including a server I'll be purchasing in the next few weeks), and a 1U server for a friend. I'm also switching my bandwidth billing to 95th-percentile burstable billing, which will allow me to push substantially more traffic for the same amount of money. My monthly cost only increases by 33%, and the cash flow already works out so that it makes sense to do.

This means a few fun things:

  • I'll finally have a new server with a new warranty that I can set up from scratch the "right way" and move sites over to one by one. It will have all the good suexec and APC things that I've talked about on this site before, and will be set up in such a way that I feel comfortable giving my customers direct access to it. And it will have 4x the number of cores and 4x the amount of memory of the current webserver.
  • When that happens, I can finally troubleshoot the faulty memory in aurora and figure out which stick or pair needs replacing, and it can be reborn as a database and mail server with twice the memory of the current db/mail server.
  • Pudge can finally get the reformatting it needs, and will come back as the admin/stats/monitoring/backup server, including acting as a live MySQL slave for aurora as well as a backup mailspool.
  • A switch will be in place so I can move data around between servers without having to go through the colo's network, and can run MySQL on a different machine than the webserver without getting nervous. I'll also be able to talk SNMP to everything and make more fun graphs.
  • Thanks to 95th-percentile billing, I can do complete offsite backups with rsync on weekends by setting rsync's maximum transfer rate to something that will keep me below my bandwidth commit rate. (Right now, I primarily just back up dynamic content like databases.)
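That last item could be a weekend cron entry along these lines; the hostname and paths are made up for illustration, and rsync's `--bwlimit` is specified in KB per second:

```
# /etc/cron.d/offsite-backup (hypothetical)
# Saturdays at 2am, capped at ~2.5MB/s so the weekly full sync
# stays below the bandwidth commit rate
0 2 * * 6 root rsync -az --bwlimit=2500 /var/backups/ backup.example.com:/backups/
```

Since 95th-percentile billing throws out the top 5% of samples, a capped transfer that only runs on weekends shouldn't move the bill at all.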

I'm pretty excited about the new server, and was pretty excited about the switch. Until it arrived today....

I bought a Linksys SRW2016 from Amazon for around $300: 16 Gigabit ports, full VLAN support, web access, and management over serial console, telnet, and ssh. It sounded like a cheaper alternative to a Cisco switch that would still let me configure things easily. It's definitely cheaper. The telnet/serial/ssh interface is completely ridiculous. First of all, the navigation is very awkward. Say you want to log in:

  1. ssh admin@192.168.0.254
  2. press return to activate the login form
  3. type in user, down arrow, password
  4. press escape and wait 1 second to activate the menu
  5. press return to submit the form and wait 1 second for "operation successful"
  6. press any key to get to the next menu

After this there are a shockingly small number of menus that let you ping, upload new firmware, check and change port administrative and operational status, change passwords, restart the switch, and that's about it! Compared to the list of features of the switch... yeah. Nothing about QoS, VLANs, Link Aggregation, SNMP, etc. I wasn't expecting it to be amazing, but it completely lacks the ability to even associate ports with VLANs, and everything takes a lot of typing and pausing. Compare this to a Cisco switch:

  1. ssh admin@192.168.0.254
  2. user, return, password, return
  3. "enable" (or just "ena"), return, password, return
  4. "configure terminal" (or just "conf t")
  5. ANY CONFIGURATION COMMAND YOU WANT

Ah well, I can use the web interface... or so I thought. Firefox on Linux: nope. Firefox on Mac OS X: nope. Safari on Mac: nope. "IE 5.5 or newer recommended". So I fired up Windows Vista in VMware on the Mac and was able to get in that way. Another awkward interface, with the "save" button for each screen below the scroll and hidden by having the same background color as the logo in the bottom right corner. It won't let me set the administrative VLAN to anything other than 1, relies on popups, and is a complete pain to use.

If the plan for this switch weren't to configure it once and then drop it in place, I'd return it. If you're thinking about using this switch for any complex network, or any network that ever changes, don't buy it. It took me about an hour to get all this figured out and set up how I like, and hopefully I'll never have to configure anything on this switch again. At ~$300 for a 16-port Gigabit switch advertised as fully non-blocking, with VLAN support for partitioning, it's still a pretty good deal, but yeah. There's a reason the biggest sticker on the box, and the only paperwork inside it, advertises how you can trade this switch up to a Cisco product right now and get cash back. If only the comparable Cisco switch with a usable command-line interface weren't over $1k. *sigh*

logrotate, munin, and configuration mistakes

I've got several servers and on each of them, I log things and graph things. Munin is one of my favorite graphing tools for keeping an eye on things, and I noticed something odd that was getting odder as the days went by in one of my Munin graphs:

[Munin graph: something odd]

System load on that server was usually low, but every night at 4am when the daily scheduled jobs ran, the CPU usage of those jobs was steadily getting higher. Not good! I poked around at each of the scheduled jobs on that machine, and one of them was taking an unusually long time to run: logrotate. This seemed strange, as my logs folder was still under 1GB for all the logs on a machine that sees a pretty high volume of web and mail hits, but something needed to be fixed. I ran strace on the logrotate process and it seemed to just be calling localtime thousands of times in a row, which indicated it was either broken or checking lots of files. Hrm. All the logrotate configuration files looked fine and had been working for a long time. I poked around the logs folder to see if anything unusual was happening in there, and there it was: the munin log folder was only ~100MB but had over 15,000 files in it. Munin definitely didn't need that many log files... They were named like "munin_node.log.1.gz.2.gz.1.gz.3.gz.1.gz", which meant that logrotate was rotating files it had already rotated, un-gzipping and re-gzipping them each time, and it's pretty obvious how that could get bad fast.

Turns out I made a typo in one of the logrotate config files, telling it to rotate "/var/log/munin/*" instead of "/var/log/munin/*.log". Doh! It took ~10 minutes to remove all of the extraneous files (`rm` complained about too many arguments, so I had to use find: `find ./ -name "*.log.*" -exec rm {} \;`), but now all seems to be well and logrotate is running in the expected amount of time. We'll see what happens at 4am tomorrow :)
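For reference, the corrected stanza looks something like this. The glob is the part that matters; the rotation settings shown are just typical defaults, not necessarily what I run:

```
# /etc/logrotate.d/munin -- the *.log glob is the fix: it no longer
# matches already-rotated .gz files, so nothing gets rotated twice
/var/log/munin/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
```

With the bare "*" glob, every rotation pass matched the .gz files the previous pass produced, which is exactly how names like "munin_node.log.1.gz.2.gz.1.gz" pile up.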

Pudge turns 1.

Today, pudge.ithought.org turns 1 year old. It's lasted the longest of any of my servers without problems, probably because I bought it new, paid a lot for it, and stuck it in a good colo instead of a sketchy one or the basement of an office that floods frequently. Pudge has continued to handle the growth of the services it hosts, primarily my email, this website, and Faster Mustache. This amounts to ~30GB of traffic a month, over 1 million web server hits a month, and over 100 thousand emails a month. Things have gotten a little shaky when all the search engine bots hit at once, primarily due to the swapping that occurs when the 2GB of RAM fills up. I'm looking at putting another 2GB in there sometime in the near future. Then disk IO will be the bottleneck, and it'll be time for a new box with SAS drives.