~robcee/ more than just sandwiches

Posted
6 June 2008 @ 9am

Tagged
Infrastructure, Mozilla

Firefox 3 Unittest Architecture

Now that we’re on the verge of shipping Firefox 3 to hundreds of millions of users around the planet, and getting ready to move our development efforts into new territory on a new repository, it’s a good time to talk about the tireless robots that have diligently checked every source change for nearly a year and a half. They’ve worked hard, been yelled at, called names and had tense discussions with developers and IT personnel during that entire period. Some have passed on, retired to the quieter pastures of staging or have been melted down for scrap. A couple of die-hards survived relatively unscathed and are looking forward to more action, their scars worn like badges of honor…

[Image: unittest diagram]

Ok, that’s laying it on pretty thick, but I have a fond affection for a few of these machines. For instance, qm-xserve01 was the original Mac unittest Xserve and nothing else has been able to touch it for compile and test time. We recently dropped qm-xserve06 onto the farm and, at best, it’s about 10 minutes slower end-to-end than xserve01. I don’t know why this is, and it shouldn’t be: xserve06 is newer, sporting more recent processors and, I believe, a faster bus. See also qm-win2k3-01, our first physical box running Windows unittests. It’s been there since April 26th of last year. As Shaver once quipped, “qm-win2k3-01 taught me to love”. We’re still waiting for the t-shirts.

Why are there three of everything? We borrowed a trick from Talos. One of the difficulties of running a testing box is that you sometimes get funny results. There are a lot of moving pieces on these machines and we try to keep them as tight as possible, running as few extra components as we can on identical operating systems (across a single platform), but sometimes things just go a little funny. The thinking was that having sets of three machines (triads) would introduce some error-checking into the runs. One orange run out of three can be discarded. Two out of three and you might have something a little harder to reproduce. Three out of three and you close the tree and back out. I think that system works pretty well in most cases.
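For the curious, the rule of thumb above can be written down as a tiny Python sketch. To be clear, this is only an illustration of the reasoning: the names triage and Verdict are made up for this example, and nothing like this is assumed to exist in buildbot or on the actual machines, where the call is made by people watching the waterfall.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels for illustration; not part of buildbot or Tinderbox.
    GREEN = "all three machines passed"
    DISCARD = "one orange run: likely machine noise, discard it"
    INVESTIGATE = "two orange runs: possibly a hard-to-reproduce failure"
    CLOSE_TREE = "three orange runs: close the tree and back out"


def triage(orange_flags):
    """Map the results of three same-platform machines to a verdict.

    orange_flags: three booleans, True meaning that machine's run went orange.
    """
    flags = list(orange_flags)
    if len(flags) != 3:
        raise ValueError("expected results from exactly three machines")
    # Count how many of the three runs went orange (0 through 3).
    orange_count = sum(bool(flag) for flag in flags)
    verdicts = [
        Verdict.GREEN,
        Verdict.DISCARD,
        Verdict.INVESTIGATE,
        Verdict.CLOSE_TREE,
    ]
    return verdicts[orange_count]


if __name__ == "__main__":
    # One machine out of three went orange on the same change.
    print(triage([False, True, False]).value)
```

Only the count matters here, not which machine went orange, which matches the idea that the three boxes on a platform are meant to be interchangeable.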

The one sad point we’ve really struggled with during this period is the Linux platform. We’ve been running Linux unittests on our CentOS 5 ReferencePlatform, which is solid and well-maintained by Ben Hearsum and Nick Thomas. Running it on VMs introduces some variability that has been really hard to shake down. Tests involving timers are almost guaranteed to fail at some point. Sometimes, if disk access gets a little busy, we see errors in reftests loading some resources, usually PNGs of even the smallest size. Then there are the random mochi* failures. These problems are hard to reproduce and even harder to diagnose, usually resulting in a shrug and someone shakily pushing the “force build” button on the waterfall page in the hopes that the problem will go away on the next build. It often does.

If you can shed any light on these on-going issues, we could use some help. Please see bugs:

Moving forward to mozilla-central, we’ve got a few machines chugging along there and I hear qm-win2k3-03 picked up its first real regression yesterday, so things seem to be progressing nicely. Over the next few weeks, we’re going to give the Firefox3/Gecko1.9 machines a spit-polish, upgrading all of them to buildbot 0.7.7 and matching their installations to what we’re using on the new machinery. As for what the new machinery will look like, I have a sneaking suspicion it’s going to look a little bit like Talos.


4 Comments

Posted by
Coop
6 June 2008 @ 11am

> We’re still waiting for the t-shirts

How about this:

qm-win2k<3-01


Posted by
Coop
6 June 2008 @ 11am

Well, the blog ate my styling. Try to imagine the <3 above as red.


Posted by
robcee
6 June 2008 @ 12pm

d’aw.

we should get it something nice to commemorate the release. Maybe a spiffy new paint job!


Posted by
morgamic
6 June 2008 @ 7pm

Hey robcee, nice post — keep it coming.

