Build Sheriff
The sheriff is the person whose job is to make sure the Chandler tree stays in good condition and everyone plays by the rules.
Schedule
We have a rotating sheriff schedule. When your name appears for a day, you have the responsibilities and powers of the sheriff for all of that day (although you are not required to watch the tree outside office hours).
Suggested shifts
December
| Mo | Tu | We | Th | Fr |
| | | | 1 | 2 |
| 5 | 6 | 7 | 8 alecf | 9 aparna |
| 12 jeffrey | 13 bkirsch | 14 capps | 15 davids | 16 donn |
| 19 heikki | 20 grant | 21 jed | 22 bear | 23 john |
| 26 | 27 morgen | 28 pbossut | 29 pje | 30 rae |
January
| Mo | Tu | We | Th | Fr |
| 2 stearns | 3 twl | 4 | 5 dan | 6 |
| 9 | 10 | 11 | 12 | 13 |
| 16 | 17 | 18 | 19 | 20 |
| 23 | 24 | 25 | 26 | 27 |
At all other times the sheriffs are implicitly MikeT and HeikkiToivonen, although as you can guess availability will be spotty.
Tasks
As a rule of thumb, if there is anything you can't do, or don't know what to do, contact MikeT (bear on IRC), or if he is not available, HeikkiToivonen (heikki on IRC).
Availability and messages
- Be on IRC, in the #chandler channel.
- Whenever you notice a problem on Tinderbox, add a message to the Tinderbox notice board (the little star icons) to let others know you are on it.
Builds
- Make sure Tinderbox has a column for every machine it should. If any are missing, contact MikeT, or if he is not available, HeikkiToivonen. As of 12/6/2005 the columns should be:
| ahukini-full-osx | alanui-full-linux | haleakala-full-win | kona-win | maunaloa-linux | molokini-osx | p_linux | p_osx | p_win |
- Make sure any branch Tinderboxes are also in order. Until 0.6 is released you also need to watch the 0.6 Tinderbox. It should have the following columns: kona-0.6-win, maunaloa-0.6-linux, molokini-0.6-osx.
- Make sure each column has recent results. *-full-* and p_* machines can take up to two hours to return results. All others can take up to one hour. If some are taking longer to appear, contact MikeT, or if he is not available, HeikkiToivonen.
- Make sure each column is green (builds and tests successful). In case of orange (tests failed) or red (builds failed), first see if there were any checkins that could have caused it. If not, contact MikeT, or if he is not available, HeikkiToivonen. If there are checkins, read the error and try to determine whose checkin(s) could have caused it, then inform them of the error. If they cannot get the error fixed within a day, consider backing out the change. Here are some instructions on how to track down the bad checkin(s):
  - Click on the L link in the orange/red cell and select view full log (you can configure Firefox so that middle click on L will open the full link in a new tab).
  - Near the top of the log you can see which revision the build is at and which files Svn updated for that cycle.
  - Either click the error link at the top of the window or scroll down to the error's location. If you clicked the link, you will usually need to scroll up a little to see the actual error.
  - Note that compiler/linker errors may not be visible in the log on Mac and Linux (this is Bug:2818) - in that case you can only find the last file/command that was being executed when the failure happened.
  - Note that a log that contains errors ends in a make realclean, so if you want to see the last error you need to scroll up a little from the end of the log, to just before the make realclean output begins.
  - You can query Bonsai for a list of checkins in a time period to help track down who checked in what during that period. The Tinderbox page has a link to show the last 6 hours, which corresponds to the visible area in Tinderbox. You can write pretty complex queries with Bonsai.
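The column and freshness checks above can be sketched as a small script. This is only an illustration, not an existing Chandler or Tinderbox tool: the names `check_columns` and `last_results` are hypothetical, and the script assumes you have already collected the time of each column's latest result by some means (for example by reading the page by hand). The column names and time limits are the ones listed above.

```python
from datetime import datetime, timedelta

# Columns expected on the main Tinderbox as of 12/6/2005 (from the list above).
EXPECTED_COLUMNS = [
    "ahukini-full-osx", "alanui-full-linux", "haleakala-full-win",
    "kona-win", "maunaloa-linux", "molokini-osx",
    "p_linux", "p_osx", "p_win",
]


def max_wait(column):
    """Longest time a column may go without results, per the limits above."""
    if "-full-" in column or column.startswith("p_"):
        return timedelta(hours=2)
    return timedelta(hours=1)


def check_columns(last_results, now):
    """Return (missing, stale) lists of column names.

    last_results is a hypothetical input: a dict mapping a column name to
    the datetime of its latest result, however you gathered it.
    """
    missing = [c for c in EXPECTED_COLUMNS if c not in last_results]
    stale = [c for c in EXPECTED_COLUMNS
             if c in last_results and now - last_results[c] > max_wait(c)]
    return missing, stale
```

Anything returned in either list is a reason to contact MikeT or HeikkiToivonen, as described above.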
Performance
- Make sure there are recent performance numbers in the performance table above the Tinderbox columns. The first header shows when the latest results came in, and the three lines at the end of the table show when the latest results came in for each platform. Check that those numbers reflect the latest p_win, p_osx and p_linux results (you can see when they finished from the Tinderbox table below). Note that the performance table is regenerated every 15 minutes rather than continuously. If some results are taking longer to appear, contact MikeT, or if he is not available, HeikkiToivonen.
- Make sure performance numbers don't regress. The table above the Tinderbox columns shows the latest performance results for each platform. The top row shows the difference between the last two runs. Clicking on the links in the performance table takes you to a detailed history page (currently for the last day only, see Bug:4131). Scan through the graphs to see if some checkin caused a "jump up", i.e. some test consistently started taking longer to run. Spikes can be ignored. It's best to confirm that the same checkin caused a jump on all platforms, although there are platform-specific regressions as well. If you notice a regression, find the person who checked in and inform them of it. If there is no fix within a day, make sure a bug is filed. If possible, take a final look at the graphs as the last thing in the evening, and file bugs for any performance regressions you see.
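The difference between a "jump up" and a spike can be made concrete with a short sketch. It assumes you have a test's run times as a plain list of numbers, oldest first (the history pages do not necessarily offer this; typing the values in by hand works too). The function name `find_jump`, the window of 3 runs and the 10% threshold are all illustrative assumptions, not project policy. Comparing medians means a single slow run does not count, while a sustained slowdown does.

```python
from statistics import median


def find_jump(times, window=3, threshold=0.10):
    """Return the index of the first run that starts a sustained slowdown.

    A run counts as the start of a jump when both the run itself and the
    median of the `window` runs starting there are more than `threshold`
    (as a fraction) slower than the median of the `window` runs before it.
    Returns None when no such run exists.
    """
    for i in range(window, len(times) - window + 1):
        limit = median(times[i - window:i]) * (1 + threshold)
        if times[i] > limit and median(times[i:i + window]) > limit:
            return i
    return None
```

Running it once per platform on the same test gives a quick way to see whether a jump lines up across p_win, p_osx and p_linux or is specific to one platform.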
Checkins
- Make sure all checkins follow current checkin practices. If you notice discrepancies, first contact the person who checked in. Depending on the kind of mistake, the follow-up could include backing out the checkin, sending a comment to dev (for example if the checkin comment was wrong/misleading/incomplete) or something else. Some things to check:
  - Was this type of checkin (or any checkin) allowed? There are times when the tree is closed for all checkins, for example when spinning a release.
  - Was there a bug number (if bug numbers are required for that type of checkin)?
  - If there was a bug number, was it the correct bug number?
  - Was there a reviewer (if reviewers are required for that type of checkin)?
  - Did the checkin comment explain what issue was fixed and why the checkin fixes it?
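The mechanical part of this checklist (is a bug number present, is a reviewer named) could be pre-screened with a few lines of code. The sketch below is only an illustration: the comment conventions it looks for ("Bug 1234" and "r=name") are assumptions, not the documented Chandler checkin format, and `check_comment` is a hypothetical helper. Whether the bug number is the correct one and whether the comment explains the fix still need a human reader.

```python
import re

# Assumed conventions, not confirmed project rules:
#   a bug reference looks like "Bug 1234", "bug:1234" or "bug #1234"
#   a reviewer is marked as "r=name"
BUG_RE = re.compile(r"\bbug[ :#]*(\d+)", re.IGNORECASE)
REVIEWER_RE = re.compile(r"\br=(\w+)")


def check_comment(comment, need_bug=True, need_reviewer=True):
    """Return a list of problems found in a checkin comment (empty if none)."""
    problems = []
    if need_bug and not BUG_RE.search(comment):
        problems.append("no bug number")
    if need_reviewer and not REVIEWER_RE.search(comment):
        problems.append("no reviewer")
    return problems
```

The `need_bug` and `need_reviewer` flags are there because, as noted above, bug numbers and reviewers are only required for some types of checkin.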