Multithreaded spot detector

From early March until now, I've been collaborating with an undergraduate researcher (Salika Dunatunga) on improving the performance of our pollen tube tracker.  Specifically, we are focusing on speeding up part of the inference engine (the brown box in this diagram), using one particular technology, threads.  There is ample room for this kind of improvement!  The module currently takes hours to produce high-quality tracks, which is annoying to anyone in a hurry (i.e., everyone), but fortunately we know a good strategy for making it better.

As I've mentioned before, the inference engine works in two stages, a detector stage followed by a tracker stage.  The detector stage takes images as input -- messy, noisy grids of millions of pixels, each pixel having at an X, Y, Z location and time coordinate plus a corresponding image intensity there.  From that input, it produces a few neat text files of numbers.  Ideally, the text file output catalogs the positions of the pollen tube tips.  (It inevitably makes a few errors of omission and hallucination too, but those mistakes are not in our scope today.)  We are focused on speeding up the detector.

The detector stage, thus, has to wade through hundreds of images, to glean a relative handful of interesting information.  The good news is that the images can be safely treated as independent data.  In other words, the conclusions we draw from image #42 depend solely on its pixels, and not on the pixels of any other image.  Perhaps that sounds obvious, but it need not be true.  Imagine, for example, that image 42 and image 43 are adjacent in Z-depth, and a pollen tube tip is perfectly between them.  The tip would fluoresce equally brightly in both, but an ideal detector would "know" to detect a tube tip once, neither missing the tip nor doubly-detecting it.  To do so, it might have to consider neighboring images.  Ravi's lab avoids this problem by setting the Z-stack spacing to a relatively large value, so the tips tend to appear at most once.  Our tracker (stage 2) also copes by presuming the fallibility of the detector -- it "knows" the detector will occasionally miss its target.

Another piece of good news is that modern computers are trending towards greater and greater parallelism.  It's currently easier to make an eight-core processor than make one processor eight times faster.  And there's a programming technology called multi-threading which lets one in essence multi-execute one program simultaneously -- run multiple inputs through one set of instructions, at the same time.  The details are hairy, and often confusing.  It's dangerously easy for the parallel paths to interfere with each other.  Salika and I are both fairly new to this style of programming, and we are learning it at the same time.  We are using basic pthreads, and have encountered and overcome a fair share of enigmatic bugs (in both our own and in others' code).

We are not done yet, but a couple of weeks ago I think we turned a corner -- we finally have two threads working pretty well.  I think if we can do two, we probably can do twenty with not too much more work (yet more than you might expect -- just like it takes more work to plan and organize a dinner for twenty than for two).  So I predict we will be able to boast of some significant speedups soon.