Busy Developer's Guide to Chandler Performance Optimization
Does some action you do in Chandler feel too slow? If so, first check whether our existing performance tests cover that scenario.
If there is no test, start by developing one; copying one of the existing tests is the easiest way to begin. This way you and others will be able to run the same test easily and repeatedly, with the same settings. Also, if we add that test to Tinderbox we get automatic monitoring of how the performance changes over time.
Once you have found or written a test, run it a few times to get a baseline number. When you make improvements, make sure the new numbers beat the baseline by clearly more than the normal run-to-run variation.
To find out what exactly is too slow, you MUST profile the code. Just by looking at the code it is practically impossible to know.
See how you can run the performance tests.
Workflow:
1. Identify slow usage scenario
2. Find or create a test
3. Get baseline performance numbers
4. Get profile
5. Analyze profile
6. Create potential fix
7. Repeat from step 4 until satisfied
8. Verify results
9. Communicate
Getting Times, Reading Performance Test Output
Whether you are running the test for the first time to get a baseline number or after some changes, you should first close down all other programs to reduce the normal random performance variation. You should also run the test several times to see how much actual variance there is between the runs.
IMPORTANT: Always measure and profile using optimized release builds.
An easy way to get the numbers is to use rt.py, which takes an optional --repeat option to specify how many times to run the performance test; in practice 5 is a reasonably good number of repetitions:
./tools/rt.py -t PerfStampEvent --repeat=5
Note: If you're doing one of the large-data tests (that is, named "PerfLargeDataSomethingOrOther"), you'll need to generate the large repository (which rt.py will automatically tell the large-data tests to restore when it runs them). The PerfImportCalendar test creates this repository, so just run it once to make that happen:
./tools/rt.py -t PerfImportCalendar
rt.py pretty-prints the results and calculates the standard deviation for you. The output looks like this:
PerfStampEvent.py 4.62 4.52 4.76 4.54 5.69 | 4.62 ± 0.49
The numbers before | are the individual runs, the number immediately after | is the median of the values and the last number is the standard deviation.
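To make the summary line concrete, here is a minimal sketch that recomputes it from the five sample runs above using Python's standard statistics module. It assumes rt.py reports the sample standard deviation (n - 1 denominator), which reproduces the 0.49 shown; rt.py's own implementation is not quoted here and may differ.

```python
import statistics

# Run times in seconds, copied from the sample rt.py output above.
runs = [4.62, 4.52, 4.76, 4.54, 5.69]

median = statistics.median(runs)
# Assumption: sample standard deviation (n - 1 denominator).
stdev = statistics.stdev(runs)

summary = "%.2f ± %.2f" % (median, stdev)
print(summary)  # 4.62 ± 0.49
```

Note that the single slow run (5.69) barely moves the median but accounts for most of the standard deviation, which is one reason to look at both numbers.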
WARNING: The pretty printed results show only the last measured test from each file. See the catsProfile command line argument for more information.
Below the pretty-printed values (and also when you run without rt.py) you will find all of the results, including lines such as the following:
OSAF_QA: Perf_Stamp_as_Event.Note_creation | 13845 | 1.299634
OSAF_QA: Perf_Stamp_as_Event.Change_the_Event_stamp | 13845 | 0.221906
OSAF_QA: Perf_Stamp_as_Event | 13845 | 2.324717
The text after OSAF_QA: names the test, the next field is the revision number, and the last field is the time in seconds. So which line is the actual test result?
perf.py has the official list, but it is generally easy to determine. For example, in the above case the middle line shows the time it took to stamp.
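If you want to collect these numbers in a script, the line format described above is simple to parse. The following is a rough sketch only: parse_osaf_qa is a hypothetical helper (it is not part of rt.py or perf.py), and it assumes every result line has exactly the three "|"-separated fields shown in the sample, with an integer revision.

```python
def parse_osaf_qa(line):
    """Return (test_name, revision, seconds), or None for any other line."""
    prefix = "OSAF_QA:"
    if not line.startswith(prefix):
        return None
    fields = [field.strip() for field in line[len(prefix):].split("|")]
    if len(fields) != 3:
        return None
    name, revision, seconds = fields
    # Assumption: the revision is an integer and the time is a float.
    return name, int(revision), float(seconds)

# Sample lines copied from the output shown above.
sample_output = """\
OSAF_QA: Perf_Stamp_as_Event.Note_creation | 13845 | 1.299634
OSAF_QA: Perf_Stamp_as_Event.Change_the_Event_stamp | 13845 | 0.221906
OSAF_QA: Perf_Stamp_as_Event | 13845 | 2.324717
"""

parsed = [parse_osaf_qa(line) for line in sample_output.splitlines()]
results = [entry for entry in parsed if entry is not None]
for name, revision, seconds in results:
    print("%-45s r%d  %.3fs" % (name, revision, seconds))
```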
Getting a Profile
There are several ways to profile Chandler code.
catsProfile command line argument
WARNING: --catsProfile will not work correctly with test files that contain several tests. These include at least the following: PerfLargeDataJumpWeek.py, PerfLargeDataOverlayCalendar.py, PerfLargeDataSwitchCalendar.py, PerfLargeDataSwitchToAllView.py, PerfSwitchToAllView.py and PerfLargeDataSharing.py.
If you add the --catsProfile=<filename> command line argument when you run Chandler, the scripting and testing framework will automatically generate a hotshot profile for you and save it in filename. The framework will try its best to skip profiling code that is part of the test framework and would not be run in the real world scenario.
rt.py can make this even easier. For example, to get a profile of PerfStampEvent.py, run this:
./tools/rt.py -Pt PerfStampEvent
This will create the hotshot profile in ./test_profile/PerfStampEvent.hotshot.
Modifying code to run in hotshot profiler
If --catsProfile does not work for you, you can also manually make hotshot profile the call you want to test:
# we want to profile foobar()
import hotshot
prof = hotshot.Profile("foobar.prof")
prof.runcall(foobar)
prof.close()  # close the log file so the profile is completely written
There is more documentation about hotshot on the Python website, including interactive samples on how to run and analyze a hotshot profile.
Using quickprofile
Alec Flett wrote a quickprofile module. This is how you can get a profile for a function:
from util.easyprof import QuickProfile
@QuickProfile('foobar.prof')
def foobar():
    pass  # body of the function you want to profile goes here
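To show the general idea behind such a decorator, here is a minimal sketch of how one could be written. This is not Alec Flett's implementation: quick_profile is a hypothetical name, and the sketch uses the standard cProfile module instead of hotshot (hotshot was removed in Python 3), so the file it writes is a cProfile/pstats dump rather than a hotshot log.

```python
import cProfile
import functools
import os
import tempfile

def quick_profile(filename):
    """Hypothetical decorator: profile each call and save the stats to filename."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            try:
                # runcall() profiles just this call and returns its result.
                return profiler.runcall(func, *args, **kwargs)
            finally:
                # The file is overwritten on every call.
                profiler.dump_stats(filename)
        return wrapper
    return decorator

# Illustrative output location; any writable path would do.
profile_path = os.path.join(tempfile.gettempdir(), "foobar.prof")

@quick_profile(profile_path)
def foobar(n):
    return sum(i * i for i in range(n))

result = foobar(1000)
print(result, os.path.exists(profile_path))
```

The advantage of the decorator form is that only the decorated function is measured, so test setup and teardown stay out of the profile.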
Analysing profile
Whatever tool you use, you are looking for code that is too slow.
There are three interesting things per function: cumulative time, individual time, and number of times called. You should first sort by cumulative time.
Look for functions and methods that account for at least 1% of the profile (preferably much more than that), starting with the highest % of course.
Some functions are slow because they are called often. Some others are simply slow. The worst are those that are slow and are called often.
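The three views described above can be sketched with the standard pstats module. The example below profiles a toy scenario with cProfile so that it runs on a current Python; the function names slow_helper and scenario are made up for illustration. With a hotshot profile under Python 2 you would instead obtain the same kind of pstats.Stats object from hotshot.stats.load(filename) and sort it the same way.

```python
import cProfile
import io
import pstats

def slow_helper():
    return sum(i * i for i in range(2000))

def scenario():
    # slow_helper is "called often": 50 calls per run of the scenario.
    return [slow_helper() for _ in range(50)]

profiler = cProfile.Profile()
profiler.runcall(scenario)

def report(sort_key, limit=5):
    """Return the top `limit` profile entries sorted by sort_key, as text."""
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return out.getvalue()

by_cumulative = report("cumulative")  # time in the function plus its callees
by_own_time = report("tottime")       # time spent in the function itself
by_calls = report("ncalls")           # number of times called

print(by_cumulative)
```

Sorting by cumulative time shows where to start digging from the top of the call tree; the other two sort orders then help tell apart functions that are slow in themselves from functions that are merely called very often.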
When looking at the profile, think about what you are seeing. Are all the function calls actually needed? In the best case you find code that should not be called at all and you can remove it, or can remove the call to it in the scenario you are profiling. In some cases you will find code that is called too often, for example creating an object in a loop when it could be created once outside the loop.
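As a small illustration of the "created in a loop" case, the sketch below builds a lookup set either on every iteration or once before the loop. The function names and data are invented for this example and are not taken from Chandler; the timings depend on your machine, so none are quoted.

```python
import timeit

def count_known_slow(words, vocabulary):
    count = 0
    for word in words:
        known = set(vocabulary)  # rebuilt on every pass through the loop
        if word in known:
            count += 1
    return count

def count_known_fast(words, vocabulary):
    known = set(vocabulary)  # built once, outside the loop
    count = 0
    for word in words:
        if word in known:
            count += 1
    return count

# Made-up data: 500 known words, 200 lookups of which 100 are known.
vocabulary = ["word%d" % i for i in range(500)]
words = ["word%d" % i for i in range(0, 1000, 5)]

slow_time = timeit.timeit(lambda: count_known_slow(words, vocabulary), number=20)
fast_time = timeit.timeit(lambda: count_known_fast(words, vocabulary), number=20)
print("inside loop: %.4fs   outside loop: %.4fs" % (slow_time, fast_time))
```

Both functions return the same answer; in a profile, the slow variant would show up as an unexpectedly high call count for the object's constructor.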
One typical performance optimization is trading space for speed. In other words, caching the results of slow calculations.
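A minimal sketch of such a cache, using functools.lru_cache from the standard library (available in Python 3; in the Python 2 era a plain dictionary keyed by the arguments served the same purpose). The function slow_square is a stand-in for an expensive calculation and is purely illustrative.

```python
import functools

calls = {"count": 0}  # counts how often the slow calculation really runs

def slow_square(n):
    calls["count"] += 1
    return n * n  # imagine something expensive here

@functools.lru_cache(maxsize=None)
def cached_square(n):
    # Results are remembered per argument, at the cost of extra memory.
    return slow_square(n)

values = [cached_square(n) for n in (3, 4, 3, 3, 4)]
print(values, calls["count"])  # [9, 16, 9, 9, 16] 2
```

Caching like this is only safe when the result depends solely on the arguments; if the underlying data can change, you also need a way to invalidate the cache.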
In perhaps the majority of cases there are no simple oopsies in the code that can be fixed. You will need to think about a different implementation, perhaps using a different algorithm.
You should also read Python Performance Tips.
KCachegrind
By far the easiest way to analyze a profile is to use a visual profile analyzer. Currently we know of only one, and it is available on Linux only: KCachegrind.
To use KCachegrind you first need to convert the hotshot profile to KCachegrind format, and then launch KCachegrind:
$ hotshot2calltree filename -o filename.prof
$ kcachegrind filename.prof
easyprofileanalyzer
The second easiest method is probably to use easyprofileanalyzer, written by Alec Flett.
Read the script to see how to use it.
hotshot
Read the manual.
timeit module
Python has a handy timeit module which makes it easy to compare the performance of (usually) small and fast blocks of code. timeit is especially well suited to cases where you have tight loops executing hundreds or thousands of times and need to find the fastest implementation.
An example:
>>> import timeit
>>> t = timeit.Timer('1 > 0 and 1 < 2')
>>> t.timeit()
0.24477005004882812
>>> t = timeit.Timer('0 < 1 < 2')
>>> t.timeit()
0.22036290168762207
Verifying results
Once you have identified slow code and think you have fixed it, there are a few verification steps. Reprofile, and make sure that the function you optimized now takes a smaller percentage of the profile. Then run the performance test to make sure that the wall clock agrees that the test got faster.
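One rough way to check the wall-clock numbers is to compare the medians of the baseline and the new runs against the spread of the runs. The sketch below is only a heuristic, not a statistical test and not something rt.py does for you; looks_faster is a hypothetical helper, and the "candidate" and "noise" run times are made-up numbers for illustration (only the baseline runs come from the sample output earlier in this guide).

```python
import statistics

def summarize(runs):
    """Return (median, sample standard deviation) of a list of run times."""
    return statistics.median(runs), statistics.stdev(runs)

def looks_faster(baseline_runs, new_runs):
    """Heuristic: the median improved by more than the larger spread."""
    base_median, base_sd = summarize(baseline_runs)
    new_median, new_sd = summarize(new_runs)
    return (base_median - new_median) > max(base_sd, new_sd)

baseline = [4.62, 4.52, 4.76, 4.54, 5.69]   # sample runs from this guide
candidate = [3.91, 3.88, 4.02, 3.95, 3.90]  # made-up: a real improvement
noise = [4.58, 4.49, 4.71, 4.60, 4.66]      # made-up: within normal variation

print(looks_faster(baseline, candidate))  # True
print(looks_faster(baseline, noise))      # False
```

If the improvement is smaller than the spread, run more repetitions or reduce other activity on the machine before drawing conclusions.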
You will of course need to run unit and functional tests before check-in, but since performance optimization is notorious for introducing regressions, you should also consider playing with the app a bit manually to make sure that everything is still in order. Asking for code review prior to check-in is also highly recommended.
When you check in, mention the expected speedup. Finally, after check-in, monitor the performance numbers to make sure that the expected performance gains materialized. It would be nice to note the actual gains in the bug.
Communicate
Finally, let other people know what you find, and what you are working on. Letting others know of the problems you find, and how you solved them, will often help them realize similar improvements in areas they are working on. And of course letting others know what you are working on will avoid duplicate work.