Blog from October, 2012

Bisque module works

The pollen-tracking module for Bisque is good enough to push into the world.  I believe it works well enough to be useful, and I hope the end users will concur.  I look forward to their feedback about how it can be improved.  I myself know a number of improvements I'd like to see.  But software can eternally be improved, and I know iPlant wants to launch Bisque on November 1, so let's call version 1 complete.  Kobus and I have talked, and we need to coordinate with Nirav and Ravi on how to wrap up v1 nicely.

I have noticed that the browser used for Bisque matters.  The experience is nicer on Chrome than Firefox.  In particular, the graphical rendering and the user interface are a bit quirky in Firefox, but not in Chrome (which I believe is the developers' primary dev platform).

Bisque -- much better now

OK, the Bisque situation is much improved since I last posted.  It seems obvious now that the "right" approach all along was to install a separate Bisque engine on bovary itself.  Wish I'd perceived that earlier.  Anyway, now I can develop modules there without disrupting anyone else, and I don't have to wrestle with firewalls or configuration riddles-of-the-sphinx.

I've got the 4D module alive there, i.e., it starts; but I've encountered (or re-encountered?) a bug in the C++ inference code.  So we begin another round of getting the CVPR code fixed up, with the original developer.  Hope that doesn't take too long.

New SLIC blog post

The past couple of weeks the SLIC team has been busy increasing the security of the interface and ensuring its robustness against SQL injections and other malicious attacks. In the process, we have modularized some of the components, fixed minor bugs and added new features. As soon as we can reliably receive feedback from the site, we will be ready to release this new version to the world.

Currently I'm trying to get my local version of Bisque configured so that its "engine server" speaks to its main "server," both of which interact through different ports (27000 and 8080, respectively).  I think the two processes are not communicating.  I tried the simple test of copying the generic "MyData" and "ShExample" modules into the same directory, named "foobar" and "xyzzy."  With quite a bit of cajoling, I got the engine server to recognize foobar but not xyzzy, so that they turn up if one does a GET from localhost:27000/engine_service.  Unfortunately the main server does not even seem to be heeding what its putative engine is providing.  I've made as many intelligent guesses as I can, restarted the servers over and over and over, and even tried quite a few unintelligent guesses in desperation, but nothing is working.  The documentation is no help, and I'm pretty frustrated.  I could succeed if I had a locally-running Bisque server, but it's a herculean task to get it configured correctly.  I could succeed if I had a locally-running Bisque engine-server that is registered with the one on bovary, but it's impossible to convince anyone with authority to open an external port in the firewall.  There has to be a better way.

Seed affinities

Good news and bad news.  First the good news.  The affinity sampler works pretty OK for different isolated seed silhouettes.  Here are a few examples with lima beans, barley, and wheat (respectively):


   

I've also made a few adjustments from the earlier version, so that (1) magnification in X and Y directions is correlated, and (2) likelihood and prior seem to be a little more in balance, I guess.

  1. Correlated magnification:  this change represents additional prior knowledge that hadn't occurred to me yet.  The affinity model permits magnifying the X and Y axes by different amounts, which I think is appropriate, but I had let those magnifications move completely independently of each other.  That left the door open for the sampler to project one of the seeds into a thin filament.  However, when a seed is small in one direction, doesn't that suggest it is likely to be small in the perpendicular direction?  I added a correlation, therefore, between the two magnifications (which is kind of a pain because independent variables are so much easier to work with).
  2. Balancing prior and likelihood.  This is a subject that Kobus tells me often arises in a statistical model, but is somewhat difficult to specify precisely.  In general, if and when the likelihood and prior get into a tug-of-war, we want them to be roughly matched in strength.  We don't want one to be able to overwhelm the other.  Unfortunately it seems to be something of an empirical art how to achieve this balance. It would be nice if we had a principled way to find the best balance.
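
A minimal sketch of the correlated-magnification idea in item 1, assuming a bivariate Gaussian centered at (1, 1).  The function names and the correlation value rho = 0.8 are made up for illustration, and the 0.4 spread simply echoes the rough 40% figure from the Affinities post; the real sampler may well differ.

```python
import math
import random

def sample_magnifications(rng, sigma=0.4, rho=0.8):
    """Draw (mag_x, mag_y) from a bivariate Gaussian centered at (1, 1).

    sigma and rho are illustrative assumptions, not the sampler's real settings.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    mag_x = 1.0 + sigma * z1
    # Mixing z1 into the second draw gives correlation rho between the axes.
    mag_y = 1.0 + sigma * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return mag_x, mag_y

def pearson(pairs):
    """Empirical correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(2012)
draws = [sample_magnifications(rng) for _ in range(20000)]
print(round(pearson(draws), 2))  # empirical correlation, should sit near rho
```

Setting rho to 0 recovers the old independent behavior, which is the setting that let the sampler squash a seed into a filament.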

To show an example of what I'm talking about, below I show the lima bean example with two alternative models.  In the first one below, the prior does not correlate the two magnifications, so the bean projection gets super-skinny.  Also, the likelihood sort of overpowers the prior, so the sampler is "terrified"* of expanding the proposal outside the bounds of the red bean.  It gets kind of stuck, therefore, and wastes a lot of time on weak proposals, and doesn't get to a plausible magnification until it's too late.  In the second example, the likelihood is underpowered, and so the sampler "barely listens" to the data; the effect is aimless wandering.
 

* hope you don't mind me anthropomorphizing a bit.

So that's the good news.  The bad news is that the model is incomplete.  Kobus has convinced me that I need to add edge information into the likelihood.  The above is working nicely on isolated seeds, but touching seeds will need more clues about how to fit a template to the data -- a lesson that Kobus and Joe Schlect discovered working with furniture images.

Here's a picture of me stressing out a simple statistical inference engine.  I'm asking it to find an affinity to match up a triangle and square.  Of course, you and I know that a triangle and a square cannot ever be related by an affinity:  straight lines in become straight lines out (by definition), so triangles in become triangles out -- never a square.  Thus, the poor program is squirming this way and that to make the best of an impossible situation.  It's proposing all kinds of affinities, each of which projects the blue triangle (the original) into a new location (the green triangle).  The green triangle dances around as the program accepts new proposals.  Eventually it almost stops moving as the acceptance ratio drops.

To restate this:  I didn't animate the green triangle, but I wrote five components that did:

  • I made up a simple likelihood function that measures "quality of fit" by counting uncovered pixels, i.e., the red and green pixels above.  (Black and blue pixels do not count, nor do yellow pixels, which represent alignment.)
  • I chose affine transforms as the relationship between the given data (which are represented above by the red square and blue triangle).  Furthermore, I constructed a prior function that represents a-priori assumptions about the transform: it can magnify or shrink but not too much, it can rotate any amount, it can skew but only a little, etc.  In other words, some affinities are more probable than others.
  • I designed a proposer function that takes an affinity in, and tweaks it randomly.  It makes a small random change to all seven parameters.  It also can randomly flip the shape in the horizontal or vertical direction.
  • I implemented a sampling method called the Metropolis algorithm, which is one of the simplest MCMC sampling algorithms.  Used this way, to hunt for a high-probability affinity, it can be thought of as a stochastic optimization procedure.
  • I typed in the coordinates of the square and triangle, gave them to the program, and said go.
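
Here is a rough sketch of how those pieces fit together.  To keep it short I've assumed a much simpler transform than a full affinity: a pure shift with two parameters instead of seven.  All the names (metropolis, log_target, propose, SOURCE, TARGET) and the numeric settings are hypothetical and not taken from the actual program.

```python
import math
import random

def metropolis(log_target, propose, start, steps, rng):
    """Generic Metropolis loop; returns the best state seen and the acceptance rate."""
    current, current_lp = start, log_target(start)
    best, best_lp = current, current_lp
    accepted = 0
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_lp = log_target(candidate)
        # Accept with probability min(1, p(candidate) / p(current)).
        if math.log(1.0 - rng.random()) < candidate_lp - current_lp:
            current, current_lp = candidate, candidate_lp
            accepted += 1
            if current_lp > best_lp:
                best, best_lp = current, current_lp
    return best, accepted / steps

def square(x0, y0, side):
    """Set of integer pixel locations covering a filled square."""
    return {(x, y) for x in range(x0, x0 + side) for y in range(y0, y0 + side)}

SOURCE = square(0, 0, 6)   # shape to be moved
TARGET = square(5, 3, 6)   # shape to be matched

def log_target(shift):
    """Log of (likelihood * prior), up to an additive constant."""
    tx, ty = shift
    moved = {(x + round(tx), y + round(ty)) for x, y in SOURCE}
    uncovered = len(moved ^ TARGET)   # pixels in one shape but not the other
    log_likelihood = -float(uncovered)
    log_prior = -(tx * tx + ty * ty) / (2.0 * 10.0 ** 2)   # broad Gaussian on the shift
    return log_likelihood + log_prior

def propose(shift, rng):
    """Tweak every parameter by a small random amount."""
    return (shift[0] + rng.gauss(0.0, 1.0), shift[1] + rng.gauss(0.0, 1.0))

best, rate = metropolis(log_target, propose, (0.0, 0.0), 5000, random.Random(0))
print(best, rate)
```

Because the proposal is symmetric, the acceptance test needs only the ratio of target values; that is what separates plain Metropolis from the more general Metropolis-Hastings rule.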

For a little dramatic interest, I initialized the affinity so that the blue triangle maps onto itself (visible only in the first frame), but it's not really a sensible choice for compact shapes.  In practice, one would initialize so that the centroids are aligned.

Affinities

Yesterday and today I've been busy writing code to implement a model and method describing an affinity between two sets of seed pixels.  Here's the basic idea:

  • Two seeds out there, s1 and s2, have somehow produced a list of pixel locations:  s1 = {(x1,y1),(x2,y2), . . . , (xN,yN)} and s2 = {(u1,v1),(u2,v2), . . . , (uM,vM)}.  We don't use color or shape, just pixel locations: s1 and s2 are silhouettes.
  • Our model states that these two data are explained by an affinity A.  An affinity is a restricted kind of transformation on pixels.  We postulate that A (which is unknown) will make s1 look like s2.  In other words,  A(s1) = s2 + noise.
  • Taking a generative, Bayesian approach, we want a statistical model p(A,s1,s2) = (k)L(s1,s2; A) p(A).  L is a function called the likelihood -- in this case, the likelihood of the seed data, presupposing we know A (which we don't).  p(A) represents the prior plausibility of some A.  Finally, k is a constant of proportionality that we ignore.  We take this approach because if we try a sufficient bunch of candidate affinities A1, A2, A3, . . ., plugging each one into p(A,s1,s2), we often can find an affinity A' that yields a relatively high value for p(A',s1,s2).  That best answer A' can be said to "explain" the data; it's as if we've developed an "understanding" of the data.
  • The above story has a few blank sections still:  the likelihood L, the prior p(A), and the proposal process for A1, A2, etc.
  • For the likelihood, let's start from a nice big constant number and deduct 1 every time there's a pixel in s2 that is absent in A(s1), or vice versa.  So if all their pixels align, there will be a high score (no deductions).
  • For the prior p(A), I've decomposed A into seven more-or-less independent factors (discussed earlier) -- A might magnify or shrink in 2D, but not by a lot.  "Not by a lot" sounds vague, but we can be more quantitative using Gaussians -- say, along an axis, by a factor that on average is 1 and 68% of the time is no more than, say, 40% above or below unity.  Where did I get that?  I made it up, but I'm trying to draw from empirical observations of seeds, and I think I can be a bit sloppy in my guessing.  Likewise I can roughly model my prior knowledge of seed variation by casting it in terms of shear, translation, and rotation.  Rotation is easiest to model, because it's a "known unknown":  the seed could be rotated in absolutely any direction.
  • One annoying detail is that if A magnifies, and if the pixels of s1 are at least 1 unit apart, then some pixels of A(s1) will be more than one unit apart!  This could make A(s1) look like stripes or splinters!  So I wrote code to augment s1 with pseudopixels between the actual pixels, which will fill in the gaps and prevent splintering.
  • The last major piece is the proposal process.  We will use an approach in the large family of Markov-chain Monte Carlo methods, i.e., its proposals are a mixture of memory and randomness.  A Markov chain partially retains what else has been tried recently, and how well it worked, but then proposes a stochastic increment from that state -- it can be described as an "exploration" of a space.  In this case, a seven-dimensional space.
  • As is always the case with real code implementations, there are other chores:  developing a "language" for representing seeds in a text file, writing code to interpret and produce that language, writing test code to make sure the individual components actually do what I intended, and writing code to display what is happening in text or graphics.
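
The likelihood score and the splintering problem from the list above can be sketched like this.  It is only one possible way to do it: the function names, the 2x3 matrix layout, the ceiling of 10000, and the sub-pixel sampling used as "pseudopixels" are all my assumptions for illustration, not the actual code.

```python
import math

def apply_affinity(matrix, pixels, subsamples=1):
    """Map pixels through a 2x3 affine matrix and snap the results to the grid.

    With subsamples > 1, each source pixel is replaced by a small grid of
    evenly spaced "pseudopixels" so a magnified shape has no gaps.
    """
    (a, b, tx), (c, d, ty) = matrix
    image = set()
    for x, y in pixels:
        for i in range(subsamples):
            for j in range(subsamples):
                # Offsets stay inside the unit cell centered on (x, y).
                px = x + (i + 0.5) / subsamples - 0.5
                py = y + (j + 0.5) / subsamples - 0.5
                u = math.floor(a * px + b * py + tx + 0.5)
                v = math.floor(c * px + d * py + ty + 0.5)
                image.add((u, v))
    return image

def score(image_of_s1, s2, ceiling=10000):
    """Start from a big constant; deduct 1 per pixel present in only one set."""
    return ceiling - len(image_of_s1 ^ s2)

s1 = {(x, y) for x in range(3) for y in range(3)}   # 3x3 block
s2 = {(x, y) for x in range(6) for y in range(6)}   # 6x6 block
double = ((2.0, 0.0, 0.0), (0.0, 2.0, 0.0))         # magnify both axes by 2

sparse = apply_affinity(double, s1)                 # splintered: gaps between pixels
dense = apply_affinity(double, s1, subsamples=2)    # gaps filled by pseudopixels
print(score(sparse, s2), score(dense, s2))
```

Without the extra samples, a perfect magnification still gets penalized for every gap, so the sampler would be steered away from the right answer.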

Two things

I worked on two iPlant-related things today, Bisque and seed counting.

  • The big Bisque goal is to get the pollen-tube tracking algorithm, tentatively dubbed "ebtrack," running as a Bisque module.  To that end, I've got the back-end C++ code running on Vision, my local installation of Bisque (the one I got running yesterday).  This meant resolving a few known software dependencies (boost and lapack), and discovering a few previously-unacknowledged software dependencies (imagemagick and ncurses).
  • For seed counting, I worked on a distribution over random "affinities" (affine transforms) that could model the difference between the shapes of two isolated, clean seeds.  I think we can factor this into seven independent dimensions (six Gaussian, one Uniform).  Specifically:  two shear parameters, two dilation parameters, two translate parameters (all Gaussian) and one rotation angle (uniform).  This implies a so-called "proposal distribution" that will be easy to draw from, and I can use a technique like Stochastic Dynamics to find a parameter set that will give a good alignment between two seeds.
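
A sketch of the seven-parameter factorization, under assumptions of my own: the standard deviations are placeholders, the parameter names are invented, and the composition order (scale, then shear, then rotate, then translate) is just one reasonable choice rather than a settled design.

```python
import math
import random

def sample_parameters(rng):
    """Draw the seven independent factors: six Gaussian, one uniform."""
    return {
        "shear_x": rng.gauss(0.0, 0.1),
        "shear_y": rng.gauss(0.0, 0.1),
        "scale_x": rng.gauss(1.0, 0.4),
        "scale_y": rng.gauss(1.0, 0.4),
        "shift_x": rng.gauss(0.0, 5.0),
        "shift_y": rng.gauss(0.0, 5.0),
        "angle": rng.uniform(0.0, 2.0 * math.pi),
    }

def to_matrix(p):
    """Compose rotation * shear * scale, plus a translation, as a 2x3 matrix."""
    cos_t, sin_t = math.cos(p["angle"]), math.sin(p["angle"])
    sx, sy = p["scale_x"], p["scale_y"]
    hx, hy = p["shear_x"], p["shear_y"]
    # shear * scale = [[sx, hx * sy], [hy * sx, sy]]; then rotate on the left.
    a = cos_t * sx - sin_t * hy * sx
    b = cos_t * hx * sy - sin_t * sy
    c = sin_t * sx + cos_t * hy * sx
    d = sin_t * hx * sy + cos_t * sy
    return ((a, b, p["shift_x"]), (c, d, p["shift_y"]))

params = sample_parameters(random.Random(1))
print(to_matrix(params))
```

Since the seven factors are independent, drawing a random affinity is just seven one-dimensional draws, which is what makes this distribution easy to sample from.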

Local Bisque is finally running

With the generous help of Kris Kvilekval, I've just managed to get a Bisque Server and Bisque Engine up and running on a local machine.  This is very good news, because it means I'll be able to develop user interfaces for our pollen-tube-tracking modules.

The user interface part of a Bisque module requires stopping and restarting the Bisque server with every major change, so that that change gets noticed.  Consequently, the development would be very disruptive to a live server.  That's why we wanted a local instance (my own private Bisque).  But we had a number of headaches, mostly due to the firewalls around this machine.  I've been working on this installation off and on since mid September, so it's a relief to get it going now.


Blank slate:  One user, no images.