BroadRiver and SiteSouth

My old server was dying under the increasing popularity of Faster Mustache and some other projects I was working on. It sat in the basement of a friend's dad's office, and the 6M/768k DSL pipe was getting a bit full at times too. That, combined with having to call someone working there to reboot it every 6 hours as it crashed under the load, meant it was new server time. The old 700 MHz P3 with 384 MB of RAM was replaced by a new BOLData dual 2 GHz Opteron with 2 GB of RAM. It came in, I set it up, and it turned out to be too loud with all its cooling fans to go in the office. This meant the inevitable install-server-in-colo time had come, so I did some research.

I ended up picking SiteSouth. They were fairly cheap and located at 55 Marietta Street, the #3 location in the United States in terms of lit fiber bandwidth. They didn't allow IRC because of the usual nasty stuff that happens on IRC, so I couldn't use my server as a proxy to stay connected to Freenode to work on Gallery as I had hoped, but that was okay. I called them to place my order and was directed to their website to fill out an order form. After filling it out I had to call to verify that it went through, and they had to make sure my credit card cleared before letting me come down to install the thing. Understandable perhaps, but this left me with a bad gut feeling and would be the first indication of all the problems I was about to have. NOTE: Keep in mind that this is just my experience with SiteSouth. Yours may certainly vary, and though I'm attempting to stick to the facts of what happened, all of this should be taken as my opinion and my opinion alone.

So later that day I drove over to install the server. Up the elevator and through the clean-looking GNAX cages (GNAX is the provider that SiteSouth purchases from), I got to the SiteSouth cage in the back. It had a bunch of locking server rack cabinets, and one was opened up to put my server in. The insides of the cabinets were a mess of wires and precariously perched pieces of networking equipment. We pulled my server out of the box, and since I had never put sliding rails on a server or in a rack before, I let their guy do it. He screwed them onto the server, couldn't get it into the rack, and gave up, so my server ended up sitting on top of another server with a piece of PVC pipe keeping it from sagging in the back.

I put in the IP information using the keyboard/monitor/mouse on a crash cart and made sure I could connect from an outside machine. Everything seemed fine, so I headed back home, and when I got there 20 minutes later I couldn't connect. Ping was working but nothing else. I called for a reboot and they said it would be taken care of. 15 minutes later it still had not been rebooted. I called again and got no answer, but the server was back up 5 minutes later. I called again and left a message asking what had happened so I could keep it from crashing again, and they never called me back.
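For what it's worth, "ping works but nothing else does" is quick to confirm from outside. A small sketch, with a placeholder hostname rather than my actual one, of what that check looks like:

    # ICMP still answers...
    ping -c 3 server.example.com
    # ...but the services that actually matter time out
    nc -zv -w 5 server.example.com 22
    nc -zv -w 5 server.example.com 80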

I was told that I'd get an email with MRTG and reboot switch login information but after 2 days this still had not shown up. I called again to get this information and finally got the email. The reboot switch worked perfectly but MRTG was weird. The rest of my dealings with SiteSouth were in a series of support tickets.

Ticket #1: 2005.11.23: The MRTG graphs didn't look like mine. They asked for my root password and were not of any help; this issue was later resolved during work on ticket #2.

Ticket #2: 2005.11.23: My server was receiving a lot of traffic that I didn't think it should be receiving:

  • Lots of ARP traffic for the same IPs from a Cisco router (ARP requests should stop once the router gets a reply or the cache entry times out)
  • Requests for things on 192.168 IP addresses (these shouldn't be routable through a router, so a machine on the same switch as me was most likely misconfigured)
  • Samba/SMB file sharing traffic (this should not exist in a hosting environment like this at all, given how insecure it is)
  • Windows Server 2003 network load balancing heartbeat packets (okay for these to exist, but they should not be delivered to my server because I was not running W2K3 and was not part of the load balancing cluster)

All of this seemed to me like a misconfigured router or switch. Their response was that this was from their IDS/IPS system. I asked if they knew of a way to filter this traffic from my graphs and they responded with: "There is no way to filter the traffic as the size will vary depending on information or events. Also, if you do block the traffic we would have to take you out of the IDS system. Two things would could happen, should you be attacked or hacked there would be no way for us to stop or track it. We would also have to block all traffic from your IPs to protect the internal network." That seemed a bit harsh to me, and I responded that I wouldn't be blocking the traffic, I just wanted to know if they could help me keep it from showing up in my MRTG graphs. In the end I gave up and closed the ticket.
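For reference, all four kinds of noise above are easy to see on the wire. A sketch of tcpdump filters that would separate them out (the interface name is an assumption, and I believe the Windows NLB heartbeat ethertype is 0x886f):

    # ARP requests hammering the segment
    tcpdump -n -e -i eth0 arp
    # RFC 1918 traffic that has no business on a public colo segment
    tcpdump -n -i eth0 net 192.168.0.0/16
    # SMB/NetBIOS file sharing chatter
    tcpdump -n -i eth0 'port 137 or port 138 or port 139 or port 445'
    # Windows Server 2003 NLB heartbeats
    tcpdump -n -e -i eth0 ether proto 0x886f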

Ticket #3: 2005.11.27: My server notified me that it was under attack, and when I checked it was a Nessus scan from one of the GNAX machines. I put in a ticket asking for details about the scan, and they replied asking for the ticket where I had asked them to run the scan. Uh... okay. It's apparently their policy to run scans, and that's fine with me now that I knew what to look for, but two strange things were happening:

  • Eicar test virus signatures. There's no way they could see the results of these tests, so why would they run them? I get alerts for every virus that hits my box, and it was just annoying.
  • My network link went up and down several times. No software on a remote server should be able to do that. They replied to this with "We have no records of anyone physically accessing the rack in the last 7 days. This may be something that the IDS system does, though I have no ideal how it would do that. When do you show the NICs were unplugged?" And I sent back my log:
    • Nov 27 03:11:03 pudge tg3: eth0: Link is down.
    • Nov 27 03:11:05 pudge tg3: eth0: Link is up at 100 Mbps, full duplex.
    • Nov 27 03:11:05 pudge tg3: eth0: Flow control is off for TX and off for RX.
    • Nov 27 03:11:24 pudge tg3: eth0: Link is down.
    • Nov 27 03:11:26 pudge tg3: eth0: Link is up at 100 Mbps, full duplex.
    • Nov 27 03:11:26 pudge tg3: eth0: Flow control is off for TX and off for RX.
    • Nov 27 03:11:40 pudge tg3: eth0: Link is down.
    • Nov 27 03:11:45 pudge tg3: eth0: Link is up at 100 Mbps, full duplex.
    • Nov 27 03:11:45 pudge tg3: eth0: Flow control is off for TX and off for RX.
    • Nov 27 03:11:52 pudge tg3: eth0: Link is down.
    • Nov 27 03:11:54 pudge tg3: eth0: Link is up at 10 Mbps, full duplex.
    • Nov 27 03:11:54 pudge tg3: eth0: Flow control is off for TX and off for RX.

They replied that they were escalating the request. A week later I asked for an update and they said "We are sorry we thought this had been answered, You probably saw the network stitches going on and off while the mrtg and other monitoring software was being restarted or updated." MRTG and other monitoring software should be running on a monitoring box; they only query the switches over SNMP, so restarting or updating them should never take a link down, and that software almost never gets updated anyway. I figured this was all just a one-time thing, that they were giving a switch a firmware update or something, so I closed the ticket and let it slide.
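To be clear about why that answer made no sense: MRTG does nothing but poll interface counters over SNMP from wherever it happens to run, roughly like this (the switch IP, community string, and interface index here are made up for illustration):

    # all MRTG does is read octet counters from the switch over SNMP,
    # e.g. for interface index 12 (the mrtg.cfg Target would be 12:public@192.0.2.1)
    snmpget -v 2c -c public 192.0.2.1 IF-MIB::ifInOctets.12 IF-MIB::ifOutOctets.12

Restarting or upgrading the poller just leaves a gap in the graph; it has no way to bounce a switch port.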

Ticket #4: 2005.12.11: The NIC went down in the same pattern again after a Nessus scan, and I opened another ticket, as these two events should be unrelated. They replied with "We are having a deep investigation into your issue and will update you soon." and then "As stated earlier, this is part of our intrusion detection and security system and analysis. It is network wide and can not, nor would it be, turned off on a client by client basis. I'm sorry you seem to be having an issue with it. It is required of all our clients." I asked, "Can you share any more details about how/why the IDS is doing this? I've been a security engineer for a network security company for several years and have used a very large number of IDS/IPS/vulnerability scanning type systems and have never seen one that brought down links on a switch as part of its activities." to which they responded: "sorry, but the information is confidential. The product is proprietary. It was developed by SiteSouth and outside development companies. In additon to our own system we also run two SNORT based systems and various firewall systems." Snort cannot unplug links like this, and neither can firewalls. That was basically a "We aren't going to tell you anything more. Deal with it" kind of answer, so I rescheduled the jobs that had now been interrupted twice to run at a different time of day.
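Rescheduling was the easy part; the change was just a crontab edit along these lines (the script name and times are invented for the example, not my actual jobs):

    # old entry, which kept landing inside the ~03:10 scan window:
    # 10 3 * * * /usr/local/bin/nightly-maintenance.sh
    # moved a few hours later to stay clear of it:
    40 5 * * * /usr/local/bin/nightly-maintenance.sh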

Ticket #5: 2005.12.15: Huge amounts of traffic started pouring in, and tcpdump showed I was receiving 8 Mb/s of PPPoE PADI [Service-Name] [Host-Uniq 0x80AF43DC] [EOL] packets. They appeared to be PPPoE discovery packets from 00:02:17:64:78:c5 being sent to ff:ff:ff:ff:ff:ff. I put in a ticket and they quickly responded with "It looks like you may have been a victim of a smurf attack. We have nulled the incoming IP. Please verify it has stopped at this point." Hooray for the fast response, and the traffic was gone, but I couldn't find anything online suggesting this was indicative of a smurf attack (a smurf attack is a flood of ICMP echo replies triggered by pinging a broadcast IP address with a spoofed source, and these frames weren't even IP traffic, so there was no incoming IP to null). Traffic sent to ff:ff:ff:ff:ff:ff shouldn't reach my machine unless it came from one of the other machines on that same switch anyway. Another one that I closed and let slide.
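Isolating that kind of flood is straightforward with tcpdump. A quick sketch of the filters involved (the interface name is an assumption; PPPoE discovery frames use ethertype 0x8863):

    # all PPPoE discovery traffic (PADI/PADO/PADR/PADS) on the segment
    tcpdump -n -e -i eth0 ether proto 0x8863
    # narrow it to the offending MAC spraying the broadcast address
    tcpdump -n -e -i eth0 'ether src 00:02:17:64:78:c5 and ether broadcast'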

Ticket #6: 2005.12.24: My server was rebooted around 2:30 pm. I put in a fairly to-the-point ticket, as this should not happen: "My server was forcibly rebooted today at 2:30pm est. Someone held down the power button for 10 seconds or pulled the plug. I did not request, authorize, or perform this reboot. Why did this happen?" and they responded with "we are going to offer you a refund of $50.00 for the original installation of your server and that you seek a new host before 1/15/2006 as we no longer wish to offer our hosting services." What? This did not make me any less upset, and I responded that I would find another host and would call them when I was ready to move my server.

I was pretty upset about this, as they gave no explanation for why my server might have been rebooted. I had to go in and rebuild some MySQL databases and repair some log files that got corrupted when the power was cut, which prevented the web and database servers from starting back up. Instead of giving me a reason for the hours I had to spend on the day before Christmas getting my equipment working again, they canceled my service.
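The cleanup itself was the usual MyISAM-era routine after a dirty shutdown; roughly something like this, where the credentials and table path are placeholders rather than the exact commands I ran:

    # check every database and repair whatever the power loss corrupted
    mysqlcheck -u root -p --all-databases --check --auto-repair
    # or, with mysqld stopped, repair a single damaged table directly
    myisamchk -r /var/lib/mysql/somedb/sometable.MYI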

I then emailed BroadRiver, as I had gotten quotes from them before. They are further from my house, so the drive or bike ride over in an emergency is longer, but their prices are comparable. It was a holiday, so I got an auto-response saying to call when sales got back in, and I gave them a call a few days later. Their sales guy emailed me all the forms to print out, sign, and bring with me, and waived my installation fee, perhaps after hearing my story about SiteSouth. I picked up my server from SiteSouth that day and took it to BroadRiver to install.

I got there earlier than expected, and when I showed up they were pulling a new cat5 drop from the router to my spot in the rack. It was a very clean data center with raised flooring, and a space was waiting for me in a rack with power, keyboard, mouse, monitor, and ethernet cables all right there. With one look at my sliding rails, the guy said they probably wouldn't work because they were missing some screws, but he would try anyway. They seemed flimsy, so he went off looking for some other rails for me to use. They were out of spares and were unscrewing a shelf for me to use when I found some screws for the rails in the bag of extra motherboard screws that came with my server. The guy helped me assemble the rails and they went right into the rack. (Turns out the guy at SiteSouth was completely wrong about the rails: they screw together and into the rack first, and the server slides onto them. They do not screw into the server and then get wedged into the rack.) Apparently missing screws and bad rails come up often, and someone there ordered 10 new sets of spare rails while I was standing there.

We then put my server on the rails, plugged it in, and closed up the rack. Looking around, a friend of mine's servers were in the next rack over (Gooley Photography and a Newswire box). That should be a good sign. Instead of using a crash cart to set up and access servers, I was pointed to a console outside of the loud data center. From a comfortable chair with a big monitor, I can press print-screen and then two numbers to connect to my server's external connections. Cool! There are a couple of these consoles, and at any time of day I can go in, connect to my server, and make changes as if I were sitting in front of it. I set up the new IP information and everything worked as it should. I then got a tour of the entire datacenter, and as I was leaving I asked about payments; they said the guy who handles that was out to lunch and I could just call back later to set it up. Cool.

It's been close to a week at BroadRiver and I've had no issues with anything. Ping times to the outside are now 3ms instead of 16ms, there is no more weird incoming traffic, and their monitoring software is interactive and amazingly thorough instead of just MRTG graphs. Their graphs also line up exactly with my own MRTG graphs. I'll post back on how things have gone after a few weeks.
