Programming
GPX GPS trace files and elevation gain
Submitted by ckdake on Mon, 2008-06-23 12:07
I carry a GPS with me on long bike rides and pull the resulting trace into Google Earth and Garmin's MapSource software. Google Earth is nice for looking at, but doesn't provide much useful information, and MapSource is pretty awful to look at (and will only run in Windows so I have to boot up VMware) but does provide elevation maps (as well as the ability to load maps). I recently started using a bike computer with cadence, and a heart rate monitor, and the last missing piece of information was total elevation gain over a ride. This information is nowhere in MapSource or Google Earth.
I can get GPX format (the standard interchangeable format for GPS information) files out of MapSource and it's just XML, so after trying several tools online and several programs I downloaded that didn't work, I wrote a quick Python script to get me the info I want. Hopefully this will help someone else:
from xml.dom import minidom

FEET_PER_METER = 3.2808399

gpx = minidom.parse('./file.gpx')

lowest = float('inf')
highest = float('-inf')
gain = 0.0
loss = 0.0
last = None  # no previous point yet

# GPX stores elevation in meters, one <ele> element per track point
for node in gpx.getElementsByTagName("ele"):
    cur = float(node.childNodes[0].data)
    if cur > highest:
        highest = cur
    if cur < lowest:
        lowest = cur
    # "is not None" rather than "!= 0" so a point at exactly 0m still counts
    if last is not None:
        if cur > last:
            gain += cur - last
        elif cur < last:
            loss += last - cur
    last = cur

print("max: %.2fft" % (highest * FEET_PER_METER))
print("min: %.2fft" % (lowest * FEET_PER_METER))
print("gain: %.2fft" % (gain * FEET_PER_METER))
print("loss: %.2fft" % (loss * FEET_PER_METER))
So for my 43 mile ride on Sunday:
max: 1110.63ft
min: 773.16ft
gain: 3328.98ft
loss: 3232.78ft
Getting those numbers was a lot harder than it should have been! Good ride though.
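If it looks odd that the total gain is roughly ten times the spread between max and min, here is a minimal sketch of why: every uphill step counts, so rolling terrain adds up fast. The elevations are made up for illustration, and gain_and_loss is a hypothetical helper that repeats the same accumulation as the script, not anything from MapSource or a GPX library.

```python
def gain_and_loss(elevations):
    """Sum every uphill and downhill step between consecutive points."""
    gain = 0.0
    loss = 0.0
    for prev, cur in zip(elevations, elevations[1:]):
        if cur > prev:
            gain += cur - prev
        elif cur < prev:
            loss += prev - cur
    return gain, loss


# Made-up elevations in feet: three rolling hills between 800 and 900.
profile = [800, 900, 800, 900, 800, 900, 850]
gain, loss = gain_and_loss(profile)

print("spread: %d" % (max(profile) - min(profile)))  # 100
print("gain: %d, loss: %d" % (gain, loss))           # gain: 300, loss: 250
```

The difference between gain and loss (50 here) is just the end elevation minus the start elevation, which is a handy sanity check on real rides too.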
Griffin PowerMate and Rhythmbox
Submitted by ckdake on Tue, 2008-06-03 17:25
I was going through some drawers and stumbled across my good old Griffin PowerMate that I got back before I started using Linux. It controlled iTunes in Mac OS 10.1 and was great because I could change volume and pause music without having to change programs or anything. These days I use Rhythmbox in Linux to listen to music and there's not a plugin for it. Yet!
Rhythmbox supports plugins written in Python, a guy has some skeleton Python code for talking to the PowerMate, and that means something could work out!
I got the PowerMate working by compiling and loading the powermate module for 2.6 Linux kernels (in 2.6.23 it's in Device Drivers -> Input device support -> Miscellaneous devices -> Griffin PowerMate and Contour Jog support) and adding a udev rule:
# /etc/udev/rules.d/45-powermate.rules
KERNEL=="event*", SYSFS{product}=="Griffin PowerMate", NAME="powermate", GROUP="users", MODE="0660"
I plugged it in, catted /dev/powermate, and with each twist or push it spit out garbage to the screen. Success!
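That "garbage" is a stream of binary input events. Here's a rough sketch of decoding one, assuming the usual Linux input_event layout (a timeval followed by type, code and value); the exact size depends on your architecture and kernel, so treat the format string as an assumption to check on your own box. decode_event and describe are made-up helper names, not part of powermate.py.

```python
import struct

# Assumed layout of struct input_event: a timeval (two native longs),
# then type and code (unsigned 16-bit), then value (signed 32-bit).
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

EV_KEY = 1  # button events
EV_REL = 2  # relative motion (the knob turning)


def decode_event(raw):
    """Unpack one raw event into (seconds, microseconds, type, code, value)."""
    return struct.unpack(EVENT_FORMAT, raw)


def describe(event):
    """Turn a decoded event tuple into a short human-readable string."""
    _, _, ev_type, _, value = event
    if ev_type == EV_KEY:
        return "button down" if value else "button up"
    if ev_type == EV_REL:
        return "turned %+d" % value
    return "other"


# Round-trip a fake event (arbitrary code 7) instead of reading the device.
fake = struct.pack(EVENT_FORMAT, 0, 0, EV_REL, 7, -1)
print(describe(decode_event(fake)))
```

With the real device you would read EVENT_SIZE bytes at a time from /dev/powermate opened in binary mode and pass each chunk to decode_event.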
A quick glance through everything shows that Rhythmbox doesn't support threads and the python code here uses polling so I'd need to delve into the Rhythmbox docs to figure out the best way to do that, but Rhythmbox also exposes itself through DBus and there are some examples of using this around the internet. In a few minutes, I hacked together something dirty to cover the basics and perhaps later on I'll make something that works as a Rhythmbox module. Right now pushing the button is play/pause, turning it adjusts the volume, and the LED shows volume when playing and pulses slowly when paused. Here ya go:
#!/usr/bin/python
import powermate
import dbus

# Event types as reported by the PowerMate (event[2]); event[4] is the value
EVENT_BUTTON_PRESS = 1
EVENT_RELATIVE_MOTION = 2

DBUS_START_REPLY_SUCCESS = 1
DBUS_START_REPLY_ALREADY_RUNNING = 2

# Start Rhythmbox if needed and grab its Player interface over DBus
bus = dbus.SessionBus()
(success, status) = bus.start_service_by_name('org.gnome.Rhythmbox')
proxy_obj = bus.get_object('org.gnome.Rhythmbox', '/org/gnome/Rhythmbox/Player')
player = dbus.Interface(proxy_obj, 'org.gnome.Rhythmbox.Player')

pm = powermate.PowerMate("/dev/powermate")

while True:
    event = pm.WaitForEvent(-1)
    if event[2] == EVENT_BUTTON_PRESS and event[4] == 0:
        # Button event with value 0: toggle play/pause
        player.playPause(1)
        if player.getPlaying():
            # Playing: LED brightness follows the volume
            pm.SetLEDState(int(player.getVolume() * 255), 0, 0, 0, 0)
        else:
            # Paused: pulse the LED slowly
            pm.SetLEDState(255, 252, 1, 1, 1)
    elif event[2] == EVENT_RELATIVE_MOTION and player.getPlaying():
        # Knob turned while playing: nudge the volume and update the LED
        player.setVolumeRelative(event[4] * 0.02)
        pm.SetLEDState(int(player.getVolume() * 255), 0, 0, 0, 0)
Download powermate.py, save the code above as whatever.py next to it, run it, and you'll be able to control Rhythmbox with your PowerMate in Linux!
F-Spot EXIF information mangling
Submitted by ckdake on Tue, 2008-05-13 09:56
I use F-Spot to manage my photographs. It's fast, clean, simple, and does everything in my current workflow which is JPG on camera -> YYYY/MM/DD folders -> Gallery on my website. Once I start shooting RAW it will get a little more complicated, but F-Spot keeps moving forward so hopefully they'll come up with a plan for that.
When uploading images to Gallery, I noticed that my photo timestamps were off. Conveniently, there was a discussion about this on the F-Spot mailing list at the same time and it turns out that every time you import an image in F-Spot, it adjusts the EXIF Timestamp information based on your timezone. Basically, if you're 5 hours away from GMT, on import F-Spot writes to the file that the image was taken 5 hours later than it actually was. Not only does it do this once, but if you re-import images into F-Spot for whatever reason it does this again, again, and again.
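To make the drift concrete, here's a small sketch of the arithmetic, assuming (as described above) that each import pushes the timestamp later by the full offset from GMT. The capture time is made up and timestamp_after_imports is a hypothetical helper, not anything from F-Spot.

```python
from datetime import datetime, timedelta


def timestamp_after_imports(original, utc_offset_hours, imports):
    """Timestamp left in the file if every import adds the offset again."""
    return original + timedelta(hours=utc_offset_hours * imports)


taken = datetime(2007, 12, 31, 23, 59, 0)  # made-up capture time
for n in range(4):
    print(n, timestamp_after_imports(taken, 5, n))
```

With a 5 hour offset, two imports are enough to move a photo taken just before midnight on Dec 31 to 09:59 on Jan 1.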
This was a bit of a surprise because EXIF information written by the camera shouldn't be changed by an import program! I thought I'd lost all the actual capture date/times of my ~30,000 photos, and was getting pretty upset that software would do this, but after digging through EXIF headers from all the cameras I've had, it turns out that the "DateTimeOriginal" was still good! I disabled F-Spot's ability to write metadata to files (which means I'll have to stop tagging images until this is all resolved upstream) and wrote a little script to fix my files. If you've run into this and would like your original EXIF information back so that photos taken on New Year's Eve as the year ticks over aren't at some hour after sunrise on Jan 1st, use this! Just replace $directory with the path to your photo library, save it to a file named "fixer.pl" and run "perl fixer.pl". Note that you'll need find and jhead installed.
EDIT: Note! I looked at this again with my 40D and new version of f-spot. It seems that now the correct EXIF header is "DateTimeDigitized" and _NOT_ "DateTimeOriginal". Please verify things on your setup before running this random script you found on the internet!
#!/usr/bin/perl -w
use strict;

my $directory = "/media/photos/";

# Collect every .jpg/.jpeg file under $directory
my @files = `find $directory -type f -iregex \'.*\\.\\(jpg\\|jpeg\\)\'`;

foreach my $file (@files) {
    chomp $file;
    my $dateline = `jhead -v "$file" | grep DateTimeOriginal`;
    # Only touch the file if a quoted timestamp was actually found
    if (defined($dateline) && $dateline =~ /.*\"(.*)\".*/) {
        my $date = $1;
        # jhead -ts wants the date and time joined by a dash
        $date =~ s/ /-/g;
        # Write the time into the EXIF header, then set the file time to match
        system("jhead -ts$date \"$file\"");
        system("jhead -ft \"$file\"");
    }
}
nice video cards are loud. sometimes.
Submitted by ckdake on Tue, 2008-02-05 17:05
Last summer while in California I ordered a Dell Precision 390 workstation with an nVidia QuadroFX 3450 because I wanted to be able to run 2 24" monitors on DVI (which is my current work setup at home). It's always been a little loud but I figured I would worry about it when getting back to Atlanta. School was busy and I never got around to it, but a few weeks ago when all was quiet in my new house, I could hear the frigging thing running. I tinkered with things and it turns out the culprit was the fan on the video card. Lame!
I tried changing versions of the nvidia-drivers I'm using in Linux because there was some bug with a newer version that pegs the fanspeed on the card to 100%, but nothing helped. Then I stumbled across nvclock. Trying to set my fanspeed to auto resulted in "Error: This card doesn't support automatic fanspeed adjustments." Doh! It ran much quieter at 20% fan speed, but I didn't want to damage my video card if I was using it heavily for something. Enter a little bash scripting and problem is solved. Cron runs the following script every minute and adjusts the fan speed to keep the temperature below a certain threshold. It's not perfect, but I think it's good enough and it sure beats the fan running at 50% all the time!
With this, my fan speed bounces between 20 and 30 with a target temp of 57C which seems about right? I think I could run it a few degrees hotter with no problem but couldn't find any documentation on the card limits.
#!/bin/bash
target=57
minspeed=19

temp=`nvclock -i | grep "GPU temp" | cut -d':' -f2`
temp=${temp/C/ }
fanspeed=`nvclock -i | grep "Fanspeed" | cut -d':' -f2`
fanspeed=${fanspeed/\%/ }
fanspeed=`echo $fanspeed | sed -n 's/^\(..\).*/\1/p'`

if [ $temp -gt $target ]; then
    if [ $fanspeed -lt 91 ]; then
        nvclock -f -F +10 > /dev/null
    fi
elif [ $temp -lt $target ]; then
    if [ $fanspeed -gt $minspeed ]; then
        nvclock -f -F -10 > /dev/null
    fi
fi
Note: As of time of writing, you will need to get a snapshot of nvclock other than the latest release for it to work with these flags and this video card. If you're using Gentoo, this means using ~x86.
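For anyone who wants to poke at the control logic without touching a real card, here's a sketch of the same once-a-minute decision as a plain Python function. It only mirrors the comparisons in the bash script; the temperature readings are made up, and next_fan_speed is a hypothetical name (nothing here talks to nvclock).

```python
def next_fan_speed(temp, fanspeed, target=57, minspeed=19, step=10):
    """One cron tick: step the fan up when hot, down when cool."""
    if temp > target:
        if fanspeed < 91:
            return fanspeed + step
    elif temp < target:
        if fanspeed > minspeed:
            return fanspeed - step
    return fanspeed


# Made-up readings, one per minute, starting from 20% fan speed.
speed = 20
history = []
for temp in [60, 60, 55, 57, 55]:
    speed = next_fan_speed(temp, speed)
    history.append(speed)

print(history)  # [30, 40, 30, 30, 20]
```

Note that nothing changes when the temperature sits exactly on the target, and that in this sketch a reading of 20 still passes the "greater than minspeed" check, so one more cool minute would step it down to 10.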
Python and Graphs
Submitted by ckdake on Sat, 2007-06-16 15:47
Apparently people are reading this because I got enough comments about me not having updated all week and me having "stopped blogging" that I'm here writing something about something.
So a while ago I wrote a little tool to parse my log files from Gaim and make a pretty graph: pretty talking graph. It shows how many lines I've talked to people on AIM in all of my log files, was a bit of a pain to write, and now that it's been around a few years, I look in the code and wonder what some parts are doing. Plus, I wanted newer, cooler, graphs and something that would play nicely with other programs (like Adium on the Mac), and I've been doing a lot of Python at work, so it was time for a rewrite.
So I wrote a Python tool that crawls all of my logs and puts them in a database, and a graphing tool that does things on the database and spits out html for all kinds of graphs. It runs automatically every night and makes a page with all the details: pretty talking graph page. Everything there is done with HTML/CSS (Thanks to the trends graphs on Google Reader for inspiration) and I think it's pretty cool. More graphs in the future (like rates of change, and predictions for when people will pass other people, etc)! I'd also like to figure out some way to graph some e-mail things... We'll see what I can come up with.
Threads in Python
Submitted by ckdake on Thu, 2007-05-31 23:30
I was doing some tinkering around at work in Python to familiarize myself with the way Python works with threads. Why threads? I'm working on something that needs to do lots of little independent things all at the same time and I figured Python would be a better language to use this summer than C++ because I've done so much Perl programming recently (and I had a heck of a lot of trouble just getting a simple "hello world" program to compile and run in C++ with the Google build system!). I hacked out some toy Python code to see how well the thing performed and all I could manage was 100% of one core on one CPU. Uh... With other languages on other boxes I'd been able to peg all the cores on all the CPUs, so it was time for some research. Apparently, Python has a massive lock in the interpreter called the GIL (Global Interpreter Lock). This lock exists because not all of Python is thread-safe and there are bad things that can happen when multiple threads try to access something non-thread-safe at the same time. The effect of this lock is that, even when using threads, Python is only really doing one thing at a time and can thus only use (the equivalent of) 1 CPU.
At first I was pretty annoyed by this because it sounds ridiculous for a modern programming language to have such a limitation, but after some reading around online I've come to a different conclusion. An e-mail on a mailing list by Guido (the creator of Python and a fellow Googler) got me thinking that threads might not actually be the best way to do things. Each thread has overhead of data structures, and with each additional thread you run into more context switches that are required for your program to run for some amount of time. With thousands of threads I'd be wasting all kinds of CPU cycles! As multi-CPU and multi-core machines are becoming more and more dominant, programmers (like me) need to think about more effective ways to make use of the resources available to them. In my particular situation, running several separate processes that communicate via IPC and/or shared memory makes a lot more sense. Each process can handle some portion of the independent actions, but can do them serially per "cycle" so things get done in the same amount of CPU time, but the CPU isn't trying to do thousands of things at the exact same time (and multiple CPUs can be used so more CPU time can be used in less wall time). For very simple operations, this saves all kinds of CPU, but it's still useful for more complicated operations.
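As a rough illustration of that idea (and not what I actually built), here's a sketch that deals the independent tasks out into one batch per process and has each process work through its batch serially. It leans on the standard library's multiprocessing module; small_task, split_into_batches, run_batch and run_parallel are all made-up names, and small_task is just a stand-in for a real unit of work.

```python
import multiprocessing


def split_into_batches(tasks, workers):
    """Deal tasks out round-robin so each worker process gets one batch."""
    tasks = list(tasks)
    return [tasks[i::workers] for i in range(workers)]


def small_task(n):
    """Stand-in for one little independent piece of work (hypothetical)."""
    return n * n


def run_batch(batch):
    """Inside one process: work through the batch serially, no threads."""
    return [small_task(n) for n in batch]


def run_parallel(tasks, workers):
    """Run each batch in its own process and return the per-batch results."""
    batches = split_into_batches(tasks, workers)
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(run_batch, batches)
```

Call something like run_parallel(range(1000), workers=4) from under an if __name__ == "__main__": guard. Because each batch runs in its own interpreter with its own GIL, the batches can keep separate cores busy, at the cost of pickling the tasks and results between processes.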
Python has been discussed as being a lot like Perl but only having one right way to do things, and the consequences of the GIL are an example of that. The right way to use up all the CPU on a machine is to do something other than threads! This should be pretty neat and gives me reason to learn about some parts of Python that are completely new to me. (I never did any IPC/shared memory in Perl or C so this will be completely new!)
Apparently there was some work done on removing the GIL but because of the much finer granularity of locks, it slowed Python down up to 2x on single-CPU machines. Ouch! This was a problem with the first OSs to support SMP, and they got around it by shipping a thread-safe OS and a threadless OS separately. They could do this with Python but it would mean a lot more maintenance overhead for the language and it gets messy real quick when people accidentally mix thread-safe and threadless code together. Google around for "GIL" if you're interested in this. There's a lot more that I read and found interesting!
Python n00b
Submitted by ckdake on Mon, 2007-05-21 23:02
Working at Google, I've had to start using Python. It's their scripting language of choice and even though all of my scripting experience is in Bash, Perl, and PHP, too bad! Using a new language isn't too bad because languages are just languages and once you know a couple you know them all, and Python isn't really any exception. I've had to search around for the basics like string operators, objects, and so on, but it's not too bad. I've probably written about 500 lines so far and things are doing what they are supposed to do. However, today I ran into something that stumped me for a while.
My current project had me making a tree-like data structure (below is an example of something similar). Think boxes containing some number of smaller boxes which contain some number of smaller boxes, etc., which finally contain some number of tiny items in the smallest box. I set up all the classes to do this as well as methods to build test ones of arbitrary size and run some algorithms on them, but something weird was going on.
class Box:
    items = []

a = Box()
for b_label in range(2):
    b = Box()
    a.items.append(b)
    for c_label in range(2):
        c = Box()
        b.items.append(c)
        for d_label in range(2):
            d = Box()
            c.items.append(d)

for each in a.items:
    print('%s' % each)
I expected this to print out the addresses of the two boxes in box a; instead, it prints a list of all 14 boxes. Further exploring the data structure showed me that each of the boxes at each level of the tree has every other box in it, totaling 2744 boxes at the innermost level. Uhm... there should only be 14 boxes! I tried all kinds of things: using a dictionary instead of an array for items, using __new__ and __init__ constructor code in the object, including a random string in the name of each object, including a random number in the constructor of the object that it stored to guarantee uniqueness, etc. But nothing worked! After spending too much time on this, I asked my Host (Google's word for the person interns work for) and she said it looked fine but to ask someone who knew Python better.
Voila! The solution was so simple it would have been almost impossible to find by searching for it. My usage of items was as a class variable. Even though it was inside the class, it acted as a singleton variable across all instances of the Box object and was thus shared by every box. Every time anything was added to any box, it was added to all boxes instead of just its parent box. The solution: declare the items variable in the constructor instead of the class so it would be scoped to the instance of the object:
class Box:
    def __init__(self):
        self.items = []
Problem solved! So if you're new to Python and playing with things, make sure you scope everything properly! The guy who helped me out is pretty solid at Python, and he says he still makes this mistake every now and then and that it's very hard to track down.
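Here's a tiny side-by-side sketch of the difference, in case it helps to see it in isolation. SharedBox is a made-up name for the broken version; the strings are just filler.

```python
class SharedBox:
    items = []  # class attribute: a single list shared by all instances


class Box:
    def __init__(self):
        self.items = []  # instance attribute: a fresh list per object


a, b = SharedBox(), SharedBox()
a.items.append("x")
print(b.items)             # ['x'] -- b sees what was added to a
print(a.items is b.items)  # True -- it's literally the same list

c, d = Box(), Box()
c.items.append("x")
print(d.items)             # [] -- d has its own list
```

The append goes through the instance, finds no items attribute there, falls back to the class, and mutates the one shared list. Assigning in __init__ gives each instance its own attribute so that fallback never happens.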
TFStat - Traffic Flow Statistics
Submitted by ckdake on Tue, 2007-05-01 09:09
Another class, another group project. For CS7260 - Internet Architectures & Protocols, Chris Lewis and I worked together again to build what we think is a pretty neat networking tool. The full details are available on a page here: tfstat, but the abstract is copied below. This was a pretty fun project to work on. I got to learn how to use libpcap and did a lot of multithreading with persistent shared objects in Perl, and I'll likely continue work on this in the fall to submit to a conference. And having pretty graphs as results is always fun!
Traditionally, researchers who wish to look at traffic flow have one option – Netflow. However, Netflow only allows researchers to get a limited view of the big pictures as the data comes from core routers – it is very difficult to get a view of the network from an end user's perspective. TFStat is a set of tools that solve this problem. TFStat allows researchers to get such data from the vantage point of the end user, and has the added benefit of being compatible with Netflow. This paper will present the 2 major components of TFStat, discuss the implementation, look at some experimental results running the tools, and finally note some areas for future research to expand on the project.

