I was doing some tinkering at work in Python to familiarize myself with the way Python handles threads. Why threads? I'm working on something that needs to do lots of little independent things all at the same time, and I figured Python would be a better language to use this summer than C++ because I've done so much Perl programming recently (and I had a heck of a lot of trouble just getting a simple "hello world" program to compile and run in C++ with the Google build system!). I hacked out some toy Python code to see how well it performed, and all I could manage was 100% of one core on one CPU. Uh... With other languages on other boxes I'd been able to peg all the cores on all the CPUs, so it was time for some research. It turns out Python has a massive lock in the interpreter called the GIL (Global Interpreter Lock). The lock exists because not all of Python's internals are thread-safe, and bad things can happen when multiple threads touch something non-thread-safe at the same time. The effect of this lock is that, even when using threads, Python is only really doing one thing at a time and can thus only use (the equivalent of) one CPU.
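If you want to see this for yourself, here's a minimal sketch along the lines of my toy code (the names like `burn_cpu` are mine, and the busy loop is just a stand-in for real work): running CPU-bound work in four threads takes about as long as running the same work serially, because the GIL lets only one thread execute Python bytecode at a time.

```python
import threading
import time

def burn_cpu(n):
    # A pure-Python busy loop: the thread running it holds the GIL
    # the whole time (no I/O, no C extension releasing the lock).
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 1_000_000

# Do the work four times, serially.
start = time.perf_counter()
for _ in range(4):
    burn_cpu(N)
serial = time.perf_counter() - start

# Do the same work in four threads "at once".
threads = [threading.Thread(target=burn_cpu, args=(N,)) for _ in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On a standard (GIL-ful) CPython build, the two times come out roughly
# equal, and top shows a single core pegged either way.
print(f"serial:   {serial:.2f}s")
print(f"threaded: {threaded:.2f}s")
```

Watch `top` while it runs: one core at 100%, the rest idle, exactly what I saw.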
At first I was pretty annoyed by this because it sounds ridiculous for a modern programming language to have such a limitation, but after some reading around online I've come to a different conclusion. An e-mail to a mailing list by Guido (the creator of Python and a fellow Googler) got me thinking that threads might not actually be the best way to do things. Each thread carries the overhead of its own data structures, and with each thread you pay for more context switches before your program gets another slice of time to run. With thousands of threads I'd be wasting all kinds of CPU cycles! As multi-CPU and multi-core machines become more and more dominant, programmers (like me) need to think about more effective ways to make use of the resources available to them. In my particular situation, running several separate processes that communicate via IPC and/or shared memory makes a lot more sense. Each process can handle some portion of the independent actions, but do them serially per "cycle", so the same amount of work gets done in the same amount of CPU time, but the machine isn't trying to juggle thousands of things at the exact same moment (and multiple CPUs can be used, so more CPU time fits into less wall time). For very simple operations this saves all kinds of CPU, but it's still useful for more complicated operations.
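As a sketch of the processes-instead-of-threads idea (using the standard library's multiprocessing pool; `burn_cpu` and `run_pool` are my own names for illustration): each worker is a separate process with its own interpreter, and its own GIL, so four workers really can peg four cores at once.

```python
import multiprocessing as mp

def burn_cpu(n):
    # Same pure-Python busy loop as before.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_pool(workers=4, n=1_000_000):
    # Fan the work out to `workers` separate processes. Each process
    # has its own interpreter and its own GIL, so they run in parallel
    # on separate cores.
    with mp.Pool(processes=workers) as pool:
        return pool.map(burn_cpu, [n] * workers)

if __name__ == "__main__":
    print(run_pool())
```

This time `top` should show all four cores busy. The `if __name__ == "__main__":` guard matters here: on platforms where child processes are spawned rather than forked, each child re-imports the script, and the guard keeps them from recursively starting their own pools.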
Python has been described as being a lot like Perl but with only one right way to do things, and the consequence of the GIL is an example of that: the right way to use up all the CPU on a machine is to do something other than threads! This should be pretty neat and gives me a reason to learn about some parts of Python that are completely new to me. (I never did any IPC/shared memory work in Perl or C, so this will be completely new!)
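For a taste of what that IPC/shared memory stuff looks like in Python (again just a sketch with made-up names: a shared counter plus a queue the workers report back on), the multiprocessing module wraps both primitives:

```python
import multiprocessing as mp

def worker(counter, queue):
    # `Value` is a C int living in shared memory; it comes with its
    # own lock so concurrent increments don't stomp on each other.
    with counter.get_lock():
        counter.value += 1
    # `Queue` is a pipe-backed IPC channel back to the parent.
    queue.put(mp.current_process().name)

def main(workers=4):
    counter = mp.Value("i", 0)
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(counter, queue))
             for _ in range(workers)]
    for p in procs:
        p.start()
    # Drain the queue before joining, per the docs' advice on queues.
    names = [queue.get() for _ in range(workers)]
    for p in procs:
        p.join()
    return counter.value, names

if __name__ == "__main__":
    total, names = main()
    print(total)  # 4 -- every process updated the same shared int
```

Coming from Perl, it's nice that the locking and serialization are handled for you instead of hand-rolling `shmget` and friends.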
Apparently there was some work done on removing the GIL, but because of the much finer granularity of locking it required, it slowed Python down by up to 2x on single-CPU machines. Ouch! The first OSes to support SMP had the same problem, and they got around it by shipping a thread-safe OS and a threadless OS separately. Python could do the same, but it would mean a lot more maintenance overhead for the language, and it gets messy real quick when people accidentally mix thread-safe and threadless code together. Google around for "GIL" if you're interested in this. There's a lot more that I read and found interesting!