~robcee/ field notes of a pyroentomologist, now 33% snappier!

Posted
14 January 2008 @ 3pm

Tagged
Build, Infrastructure, Mozilla, Testing

Tinderbox Remixed (qa vs. build mashup)

There’s been some talk around the water cooler recently about some improvements to the build farm. I think this is great and will go a long way towards making the build systems easier to work on with a minimum of localized soreness.

First, a picture. This is the way stuff happens now:

[Diagram: Build-Talos-Unittest-1]

(Edit: the X axis is time. Think of these diagrams as Gantt charts with some extra lines showing where things come from and where they go.)

A few notes about this group of largely disconnected systems:

  • Builds happen all the time.
  • Nightlies are produced regardless of what the other machines are reporting, i.e., there is no feedback from any of the testing machines to the build system that produces nightlies.
  • There is no direct route from the Try server to the build system. Worse, try builds are not being put through unittests, leaving the burden of testing on the developer who writes the patch.

In a short while, this picture will morph (slightly) into this:

[Diagram: Build-Talos-Unittest-2]

In this slightly better world:

  • Builds happen on checkin.
  • Nightlies will still be produced from the last build at a certain hour.
  • Try server builds will be unit tested and run through Talos for performance testing.

In this view, the build machines and unittest machines work on checked-in patches in parallel. Talos picks up the builds once they've landed on staging from the build machines. This picture is somewhat simplified, as we may have parallel build machines, and we already have parallel Talos machines taking in builds as they become available.

Another benefit we should see is Talos results lining up more closely with the builds they're associated with. Currently, because the Tinderboxes keep churning out builds, Talos runs at a lag, sometimes with a build or two in the queue as the Talos boxes struggle to catch up. Once builds happen only on checkin, there should be gaps during the day that let the Talos boxes catch up and maintain parity with the main build machines.
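To make the lag concrete, here is a toy model in Python. It is an illustration, not anything Talos actually runs: it assumes a single hypothetical Talos box taking builds first-in, first-out (in reality there are several), and the 30- and 45-minute figures are made up. When builds land faster than a Talos run takes, the wait grows with every build; a quiet gap lets it drain back to zero.

```python
from typing import List


def talos_lag(arrivals: List[float], talos_minutes: float) -> List[float]:
    """Minutes each build waits before one (hypothetical) Talos box starts on it.

    Builds are handled first-in, first-out: a build starts as soon as it has
    landed on staging and the box has finished the previous build.
    """
    lags: List[float] = []
    free_at = 0.0  # time at which the Talos box next becomes idle
    for landed in arrivals:
        start = max(float(landed), free_at)
        lags.append(start - landed)
        free_at = start + talos_minutes
    return lags


# Made-up numbers: a build lands every 30 minutes, a Talos run takes 45.
print(talos_lag([0, 30, 60, 90], talos_minutes=45))  # [0.0, 15.0, 30.0, 45.0]
# Builds only on checkin, with a quiet stretch before the third one.
print(talos_lag([0, 30, 180], talos_minutes=45))     # [0.0, 15.0, 0.0]
```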

The notable difference here is what happens when a patch gets submitted to the Try Server: it runs through the full gamut of unittests and, at the end of that, runs a "make package" on the objdir, producing a light(ish) build and putting it into the staging area for consumption. At that point, the Talos try server can pick it up and run it through the full performance tests for analysis. While that's going on, people can download the try build and play with it to see if they've broken anything. That's step one in the testing lifecycle of a patch. This should be reality very soon, thanks in part to some help from the good people at Seneca College.
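The try flow can be sketched as a small producer/consumer pair. This is a hedged Python illustration, not the real Try Server or Talos code: `try_server`, `talos_try`, the `TryBuild` record, the package file name and the "ts" metric value are all hypothetical, and the post doesn't say whether a patch that fails its unittests still gets packaged, so this sketch packages it either way and records the result.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class TryBuild:
    patch_id: str
    unittests_passed: bool
    package: str  # hypothetical file name for the "make package" output


def try_server(patch_id: str,
               run_unittests: Callable[[str], bool],
               staging: Deque[TryBuild]) -> TryBuild:
    # 1. Build the patch and run the full set of unittests against it.
    passed = run_unittests(patch_id)
    # 2. "make package" on the objdir yields a light(ish) build.
    #    Assumption: it is packaged whether or not the unittests passed.
    build = TryBuild(patch_id, passed, f"{patch_id}-try.tar.bz2")
    # 3. Put it in the staging area for Talos (and developers) to fetch.
    staging.append(build)
    return build


def talos_try(staging: Deque[TryBuild],
              run_perf: Callable[[str], Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    # The Talos try server drains the staging area in arrival order.
    results: Dict[str, Dict[str, float]] = {}
    while staging:
        build = staging.popleft()
        results[build.patch_id] = run_perf(build.package)
    return results


staging: Deque[TryBuild] = deque()
try_server("patch-1", lambda patch: True, staging)
print(talos_try(staging, lambda pkg: {"ts": 1000.0}))  # {'patch-1': {'ts': 1000.0}}
```

Keeping staging as a plain queue is what decouples the two sides: the Try Server never waits on Talos, and developers can grab the same package while the performance run is still going.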

But we can still do better.

[Diagram: Build-Talos-Unittest-3]

In this future, utopian landscape, the unittest and build machines can pick up a patch and start churning and testing builds before handing them over to Talos. At a specified time of day, the nightly machines, which have been waiting on the results of the test boxes, can pick up the day's patches and produce a proper build devoid of any extraneous testing code and optimized for userland. This build then gets picked up by the Talos servers and run through the performance tests again. If and only if all of these testing stages complete successfully can the build be pushed to the update servers for wider distribution. This should severely limit the number of bad builds we're able to ship to the world.
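The "if and only if" gate is the heart of this picture, so here is a minimal Python sketch of it. Everything in it is hypothetical: `Checkin`, `nightly_cycle` and the callbacks stand in for the real test boxes, nightly build machines and update servers. It also assumes that a bad day of test results means the nightly isn't built at all, which the plan above doesn't actually specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Checkin:
    changeset: str
    unittests_passed: bool  # result reported by the unittest boxes
    talos_passed: bool      # result reported by Talos on the test build


def nightly_cycle(checkins: List[Checkin],
                  build_optimized: Callable[[List[str]], str],
                  run_talos: Callable[[str], bool],
                  push_to_updates: Callable[[str], None]) -> bool:
    """Return True only if the nightly made it to the update servers."""
    # Gate 1: the nightly machines wait on the test boxes, and every
    # checkin of the day must have passed unittests and Talos.
    if not all(c.unittests_passed and c.talos_passed for c in checkins):
        return False
    # Build the day's changesets as an optimized build without test code.
    nightly = build_optimized([c.changeset for c in checkins])
    # Gate 2: the optimized nightly itself goes through Talos again.
    if not run_talos(nightly):
        return False
    # Every stage passed, so the build can go out for wider distribution.
    push_to_updates(nightly)
    return True


def join_changesets(changesets: List[str]) -> str:
    return "nightly-" + "+".join(changesets)


pushed: List[str] = []
good_day = [Checkin("a1", True, True), Checkin("b2", True, True)]
bad_day = [Checkin("c3", False, True)]
print(nightly_cycle(good_day, join_changesets, lambda b: True, pushed.append))  # True
print(nightly_cycle(bad_day, join_changesets, lambda b: True, pushed.append))   # False
print(pushed)  # ['nightly-a1+b2']
```

The point of the structure is that pushing is the last step and only reachable when every earlier check returned success; a failure anywhere just leaves yesterday's nightly in place.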

Ok, I admit that these pictures aren't wildly different from one another. The differences are in the connections and how we use them. By making the nightlies depend on the testing machines, we can ensure that we're building the most robust nightlies possible. There are a few caveats on the way to this awesome future:

  • The Windows test machines must become more solid. We've been struggling with these for almost a year now and the time has come to do something about it.
  • Better at-a-glance displays of tree information are becoming increasingly important as we cram more information onto the main Firefox Tinderbox page. This will only grow over the year as the JS testing machines come online, possibly requiring a separate tests-only page.
  • As more tests are added to Talos, it will become harder to keep on top of the results as builds come in, so we will need some rapidly accessible view onto that data.

It’s 2008. Do you believe in the future?


3 Comments

Posted by
kev
14 January 2008 @ 4pm

remind me I need to ply you with beer and go over my plans to take over the world, because I think I see some direct impacts/gains with this. like, maybe wed night, if you’re free.

and yessir, I believe.


Posted by
Ted Mielczarek
15 January 2008 @ 7am

If we’re in the future, where are our benevolent robot masters?


Posted by
robcee
15 January 2008 @ 10am

that future ceased to exist when kev travelled back in time and retrieved the SkyNet device.

