Archive for the ‘Build’ Category
Moved March 17th, 2008
A while back, we made a slight organizational change. Alice Nodelman, Mikeal Rogers, Chris Cooper and I made a lateral shift to join forces with John O’Duinn, Rob Helmer, Ben Hearsum, Nick Thomas and Chris Cooper on the Build & Release team to create a larger, many-limbed crime-fighting robot. (Astute readers will note that Coop is actually two people.) The reason for this was simple: Build & Release uses the same tools we’ve been using to drive the automation systems behind Talos, JS Testing and the Unittest machines. I think we’ve settled on the name “Release Engineering” to describe this new beast, and John O’Duinn has created some new Bugzilla components to reflect this.
What does this mean to you, the developer/tester/user? If you find a bug in one of the automated systems, you should now file that under “Mozilla.org:Release Engineering” in Bugzilla. I’ll still be monitoring Core:Testing but that should be used predominantly for Mozilla-related test harnesses and tools. We’re currently in the process of triaging Build & Release bugs and relevant bugs under Core:Testing and hope to have them re-assigned and re-bucketed within a week or two.
In other moving-related news, I’ve managed to reposition myself “closer to Greenland” as some have quipped. This shouldn’t really affect my availability for those of you in the West as I’ve shifted my hours up to compensate – this just means I get to sleep in a little more. What has been an issue is my ongoing struggle with Aliant.net to get my internet up to the speeds I’m actually paying for. They have a few more days to get this fixed and then I’m going to open up the bidding.
To the shamrock-wearers, happy St. Paddy’s Day!
Tags: Build
Posted in Build, Mozilla | Comments (0)
Tinderbox Remixed (qa vs. build mashup) January 14th, 2008
There’s been some talk around the water cooler recently about some improvements to the build farm. I think this is great and will go a long way towards making the build systems easier to work on with a minimum of localized soreness.
First, a picture. This is the way stuff happens now:

(edit - X axis is time, think of these diagrams as gantt charts with some extra lines showing what happens, i.e., where things come from and where they go)
A few notes about this group of largely disconnected systems:
- Builds happen all the time.
- Nightlies are produced regardless of what the other machines are reporting, i.e., there is no feedback from any of the testing machines to the build system that produces nightlies.
- There is no direct route from the Try server to the build system. Worse, try builds are not being put through unittests, leaving the burden of testing on the developer who writes the patch.
In a short while, this picture will morph (slightly) into this:

In this slightly better world:
- Builds happen on checkin.
- Nightlies will still be produced from the last build at a certain hour.
- Try server builds will be unit tested and run through Talos for performance testing.
In this view, the build machines and unittest machines work on checked-in patches, in parallel. Talos picks up the builds once they’ve landed on staging from the build machines. This picture is somewhat simplified as we may have parallel build machines and we already have parallel Talos machines taking in builds as they become available.
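As a rough illustration of the "nightly from the last build at a certain hour" idea, here is a minimal sketch. The build identifiers, timestamps and the `pick_nightly` helper are all made up for the example; this is not how the real build farm is scripted.

```python
from datetime import datetime


def pick_nightly(builds, cutoff):
    """Return the most recent (finished_at, build_id) pair at or before cutoff.

    Returns None when no build finished in time.
    """
    eligible = [build for build in builds if build[0] <= cutoff]
    return max(eligible, key=lambda build: build[0]) if eligible else None


# Hypothetical checkin-triggered builds, each tagged with its finish time.
builds = [
    (datetime(2008, 1, 14, 1, 10), "build-a"),
    (datetime(2008, 1, 14, 2, 45), "build-b"),
    (datetime(2008, 1, 14, 4, 5), "build-c"),
]

# Hypothetical nightly hour: 03:00.
cutoff = datetime(2008, 1, 14, 3, 0)
print(pick_nightly(builds, cutoff))
```

With checkin-triggered builds there may be quiet stretches with no new build, so the helper simply falls back to whatever finished last before the cutoff.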
Another benefit we should see is Talos results lining up more closely with the build they’re associated with. Currently, because the Tinderboxes keep churning builds, Talos runs at a lag, sometimes with a build or two in queue as they struggle to catch up. This should go away with some gaps during the day allowing the Talos boxes to maintain parity with the main build machines.
The notable difference here is that when a patch gets submitted to the Try Server, it runs through the full gamut of unittests and, at the end of that, runs a “make package” on the objdir, producing a light(ish) build and putting it into the staging area for consumption. At that point, the Talos try server can pick it up and run it through the full performance tests for analysis. While that’s going on, people can download the try build and play with it to see if they’ve broken anything. That’s step one in the testing lifecycle of a patch. This should be reality very soon, thanks in part to some help from the good people at Seneca College.
But we can still do better.

In this future, utopian landscape, the unittest and build machines can pick up a patch and start churning and testing builds before handing them over to Talos. At a specified time of day, the nightly machines which have been waiting on the results of the test boxes can now pick up and run the day’s patches and produce a proper build devoid of any extraneous testing code and optimized for userland. This build then gets picked up by the Talos servers and run through the performance tests again. If and only if all of these testing stages complete, the build can be pushed to the update servers for wider dispersion. This should severely limit the number of bad builds we’re able to ship to the world.
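The "if and only if" gate is the important part, so here is a small sketch of it. The stage names and the `Candidate` class are assumptions made for illustration; the real systems would report through Tinderbox and buildbot rather than a dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical stage names, in the order described above.
REQUIRED_STAGES = ("unittest", "talos_dep", "nightly_build", "talos_nightly")


@dataclass
class Candidate:
    build_id: str
    # Maps stage name -> True (passed) or False (failed); missing means not run yet.
    results: dict = field(default_factory=dict)


def can_publish(candidate):
    """Allow the push to the update servers only if every stage has passed."""
    # A stage that has not reported counts the same as a failure.
    return all(candidate.results.get(stage) is True for stage in REQUIRED_STAGES)


good = Candidate("nightly-1", {stage: True for stage in REQUIRED_STAGES})
pending = Candidate("nightly-2", {"unittest": True, "talos_dep": True})
print(can_publish(good), can_publish(pending))
```

Treating a missing result as a failure is a deliberate choice in this sketch: a build whose tests never ran is no safer to ship than one whose tests failed.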
Ok, I admit that these pictures aren’t wildly different from one another. The differences are in the connections and how we use them. By adding some dependency on the testing machines, we can ensure that we’re building the most robust nightlies possible. There are a few caveats to get to this awesome future. The Windows test machines must become more solid. We’ve been struggling with these for almost a year now and the time has come to do something about it. Better displays for gathering at-a-glance tree information are becoming increasingly important as we cram more information onto the main Firefox Tinderbox page. This is going to increase over the year as the JS testing machines come online, possibly requiring a separate tests-only page. As more tests are added to Talos, it will become harder to keep on top of the results as builds come in, so this will require some rapidly-accessible view onto that data.
It’s 2008. Do you believe in the future?
Posted in Build, Infrastructure, Mozilla, Testing | Comments (3)
Hello and Welcome August 22nd, 2007
After a year of me slugging away at buildbot-related stuff, I’ve finally got some serious help. Chris Cooper (aka Coop, not to be confused with The Coop) has been digging into the buildbot setups and machinery running the unittest boxes and is helping out with the performance stuff. He’s been able to make sense of my haphazard documentation and run an install on the currently sluggish qm-xserve01, so a big thank you for that.
While I’m on the subject, we’ve gone through a number of attempts to repair qm-xserve01. The most recent and most drastic measure was a full reinstall of the OS and build tools. The machine’s still running slowly and we’re looking into getting the hardware checked. In the meantime, please use bm-xserve07 on the Firefox page. Another big thank you to Nick Thomas (cf) and John O’Duinn for donating that. I swear we’ll give it back Real Soon Now!
Last but definitely not least, Ben Hearsum has signed on as a permanent resident and is rocking the buildbot talos stuff. He’s currently embroiled in the setup and installation of the talos linux and mac ports as well as bringing up some new Windows hardware. He’s also recently put some finishing touches on the buildbot try server which should help take some of the orangeness off of the main unittest machines. A heartfelt welcome and thank you goes out to him too.
As proof of all the awesomeness that’s going on, talos is about to begin reporting on the main Firefox tree for your amusement. Expect a few small fires along the way.
So, what am I going to do now that I have all this excellent help? Am I going to take a bunch of long vacations and begin a series of complex, time-consuming hobbies? Oh, dear me, no… I’ll be helping Bob Clary get his JavaScript testing frameworks automated under (you guessed it) buildbot. This is a long-overdue task that’s been taunting me from my to-do list since I started here.
Posted in Build, Quality | Comments (0)
qm-centos5-01 a little orange July 10th, 2007
I mentioned earlier today that I was bringing up a new machine for Great Justice: the sparky new qm-centos5-01, sporting a new Reference Platform VM. Well, it turns out there’s enough of a difference in the base libraries that a number of tests are failing out of the gate.
Ted M. noted that three of the four failing reftests appear to be kerning differences. The fourth, “(actually the first in the list) looks like it’s off by one pixel in size or so”.
The mochitests and chrome tests I haven’t looked at closely, but they appear to be a mixture of timeout, textarea and focus bugs.
If you want to take a look, grep the full log for instances of ERROR FAIL. Your help in getting qm-centos5-01 to turn green will be greatly appreciated.
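If grep isn’t handy, the same search can be sketched in a few lines of Python. Only the ERROR FAIL marker comes from the logs; the sample lines and the `find_failures` helper below are invented for the example.

```python
def find_failures(log_lines, marker="ERROR FAIL"):
    """Return (line_number, text) pairs for every line containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if marker in line
    ]


# Made-up log excerpt; real log lines will look different.
sample_log = [
    "starting test run\n",
    "ERROR FAIL | made-up test one\n",
    "PASS | made-up test two\n",
    "ERROR FAIL | made-up test three\n",
]

for number, text in find_failures(sample_log):
    print(number, text)
```

To run it against a downloaded log, pass an open file object instead of the sample list, since iterating over a file yields its lines.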
Also, an apology to Paul Reed: Apparently twm is the fallback window manager after all other options have been exhausted. I deleted the standard build user and was left with a partially-configured user. With his help I was able to turn on gdm. IOU five bucks, Paul!
Posted in Build, Quality, Testing | Comments (4)
Please welcome… July 10th, 2007
While the tree is closed for some network storage tests, I took the opportunity to insert a new member into the unit testing family. Please give qm-centos5-01 dep unit test a warm welcome. With his arrival, I hope we can get a newer Cairo installed and working and if we’re really lucky, passing tests.
Don’t fear him. He’s here to help.
This adds an extra bit of width to the Firefox tinderbox tree. Over the next week or so, I’m going to monitor qm-centos5-01’s progress and eventually, we’ll probably pull qm-rhel02 from the page, putting it to work elsewhere.
Also, props to Paul Reed for choosing the coolest window manager 1993 had to offer!
Posted in Build, Quality, Testing | Comments (0)
One more thing… May 3rd, 2007
The Mac unit test machine, qm-xserve01, is currently set on a rapid-cycling debug mode in an attempt to catch a stack trace on exit. Feel free to take a look through the logs and fix any assertions you see there.
Posted in Build, Quality, Testing | Comments (0)
Rhymes with Orange May 3rd, 2007
Alternatively, “what’s that smell?”
There has been an awful lot of orange on the Windows 2003 and OS X unit test boxes lately, mostly due to shutdown errors. If you see one of these boxes turn orange after a checkin, but no obvious failures in the suites (pass/fail/todo in mochitest and reftest), it’s likely because of one of these exit failures which you can see at the end of the log.
On Windows 2003, the failure looks like:
FAIL Exited with code 1280 during test run
It turns out that error code 1280 is specific to Windows 2003 and happens when a thread is converted to a fiber one too many times. To help track down the cause of the error, I’m working with Ted Mielc… Ted Milcza… Milk… luser on IRC and setting up an Airbreak/Breakpad/crash reporter processor to try to capture and debug this error. Expect some downtime on qm-win2k3-01 while I’m setting this up.
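To tell an exit failure apart from a real test failure, you can check the tail of the log for that line. A minimal sketch follows: the message format is the one quoted above, while the `exit_failure_code` helper and the 20-line tail are assumptions.

```python
import re

# Matches the exit-failure line quoted above and captures the numeric code.
EXIT_FAILURE = re.compile(r"FAIL Exited with code (\d+) during test run")


def exit_failure_code(log_lines, tail=20):
    """Return the exit code from the last `tail` lines of the log, or None."""
    for line in reversed(log_lines[-tail:]):
        match = EXIT_FAILURE.search(line)
        if match:
            return int(match.group(1))
    return None


log = [
    "suite finished\n",
    "FAIL Exited with code 1280 during test run\n",
]
print(exit_failure_code(log))
```

A return value of None means the orange came from somewhere else, so the suite results are the next place to look.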
PS, apologies to the always-charming Ted Mielczarek, whose name I had to look up for proper spelling.
PPS, kudos to jwatt for landing a set of SVG tests into the reftest suite this morning!
Posted in Build, Quality, Testing | Comments (0)
ATTACK OF THE ROBOTS April 26th, 2007
The new box is all set up and chugging along happily. WINNT 5.2 qm-win2k3-01 dep unit test landed yesterday around 12:38 in a sea of burning fire. The new machine is now reporting to the Firefox tree on Tinderbox.
But what does this mean to you, Dear Developer? I’m glad you asked. The new machine is now cycling Windows builds and test runs in about 20 minutes, approximately 3 times faster than its Windows XP VM counterpart, WINNT 5.1 qm-winxp01 dep unit test. We’ll leave them both running for a while before eventually moving qm-winxp01 to lighter duty elsewhere.
Also, the build environment on qm-win2k3-01 (can I just call it Robby?) is updated with the latest version of mozilla-build and contains Zero Cygwin. This should keep this machine more compatible with future changes to the build system on Windows and cut down on excess cruft.
Posted in Build, Quality, Testing | Comments (5)
New ‘Bots Coming April 25th, 2007
I’m just putting the finishing touches on a new machine that will eventually replace the qm-winxp01 unit test VM running on the Firefox Tinderbox Tree. The new machine will be called qm-win2k3-01 dep unit test and will have some beefier hardware behind it, hopefully reducing cycle times and providing a better platform for some of the other testing frameworks we’ll be adding in the near future.
I’ll try to keep the other machines running as much as possible during the update, but there may be a few hiccups. I’ll be monitoring #developers on irc.mozilla.org just in case something goes Horribly Horribly Wrong.
Posted in Build, Quality, Testing | Comments (0)
a correction to CLOBBERin’ time March 28th, 2007
The correct path for the clobber files is mozilla/tools/tinderbox-configs/firefox/$platform/CLOBBER, not mozilla/testing/… as I posted earlier.
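For anyone scripting against this, here is a tiny sketch of filling in $platform. The `clobber_path` helper and the "linux" platform name are placeholders for illustration; use whatever directory names actually exist under tinderbox-configs/firefox.

```python
import posixpath


def clobber_path(platform):
    """Build the repository-relative CLOBBER path for one platform directory."""
    return posixpath.join(
        "mozilla", "tools", "tinderbox-configs", "firefox", platform, "CLOBBER"
    )


# "linux" is a placeholder platform name, not a confirmed directory.
print(clobber_path("linux"))
```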
Watch this space for further updates!
Posted in Build | Comments (0)