Pilot Benchmark Framework

The Pilot Framework

Pilot is a framework designed for collecting precise benchmark results in the shortest possible time. This is useful when a designer or administrator needs to evaluate many candidate parameters. Pilot analyzes the benchmark's time-series data in real time and tells you when the desired confidence interval width has been reached. Pilot can also automate many benchmark chores, such as measuring the overhead, detecting warm-up and tear-down phases, discovering the bottleneck of the system, and comparing very close benchmark results. It comes with an easy-to-use scriptable interface with C/C++/Python bindings.
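The core idea of "stop once the confidence interval is narrow enough" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pilot's API: `run_until_precise`, `ci_half_width`, and `t_quantile` are hypothetical names, the samples are assumed to be independent (no autocorrelation or warm-up handling), and `t_quantile` approximates the Student t quantile with a Cornish-Fisher expansion so that the example needs only the standard library.

```python
import random
import statistics
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion,
    Abramowitz & Stegun 26.7.5); close to exact for two-sided 95%
    intervals when df >= 5. Prefer scipy.stats.t.ppf if available."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def ci_half_width(samples, confidence=0.95):
    """Half-width of the t-based confidence interval of the mean."""
    n = len(samples)
    t = t_quantile(1 - (1 - confidence) / 2, n - 1)
    return t * statistics.stdev(samples) / n**0.5


def run_until_precise(run_once, target_width, confidence=0.95,
                      min_samples=10, max_samples=100_000):
    """Hypothetical helper: call run_once() until the CI of the mean is
    no wider than target_width. Assumes independent samples."""
    samples = []
    while len(samples) < max_samples:
        samples.append(run_once())
        if (len(samples) >= min_samples
                and 2 * ci_half_width(samples, confidence) <= target_width):
            break
    return statistics.fmean(samples), samples


# Usage with a simulated benchmark: run times drawn from N(100 ms, 5 ms).
rng = random.Random(42)
mean, samples = run_until_precise(lambda: rng.gauss(100.0, 5.0), target_width=2.0)
print(f"mean = {mean:.2f} ms after {len(samples)} samples")
```

Recomputing the standard deviation after every sample keeps the sketch short but costs O(n) per step; a streaming estimator such as Welford's algorithm would avoid that.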


Pilot Helps To Answer These Questions

  • How long should I run this benchmark to get precise results?
  • Is my new algorithm really 3% faster than baseline or is that an error?
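For the second question, one standard approach (a swapped-in technique, not necessarily what Pilot does internally) is a Welch confidence interval for the difference of two means: if the interval excludes zero, the difference is unlikely to be measurement noise. The sketch assumes independent samples; `diff_ci`, the `t_quantile` approximation, and the run times are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion,
    Abramowitz & Stegun 26.7.5); df may be fractional."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def diff_ci(baseline, candidate, confidence=0.95):
    """Welch CI for mean(candidate) - mean(baseline)."""
    v_b = statistics.variance(baseline) / len(baseline)
    v_c = statistics.variance(candidate) / len(candidate)
    se = math.sqrt(v_b + v_c)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v_b + v_c) ** 2 / (v_b**2 / (len(baseline) - 1)
                             + v_c**2 / (len(candidate) - 1))
    half = t_quantile(1 - (1 - confidence) / 2, df) * se
    diff = statistics.fmean(candidate) - statistics.fmean(baseline)
    return diff - half, diff + half


# Hypothetical run times (ms); the candidate is exactly 3% faster.
baseline = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0, 99.0, 100.0]
candidate = [0.97 * x for x in baseline]
lo, hi = diff_ci(baseline, candidate)
if hi < 0:
    verdict = "candidate is faster"
elif lo > 0:
    verdict = "candidate is slower"
else:
    verdict = "no significant difference"
print(f"difference CI = [{lo:.2f}, {hi:.2f}] ms: {verdict}")
```

With fewer samples or noisier runs the same 3% gap can yield an interval that straddles zero, which is exactly the case where more measurements are needed before drawing a conclusion.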

Pilot Is Designed To

  • help testers who may not have much knowledge of statistics
  • get accurate, precise, repeatable results
  • get results in the shortest possible time while still meeting statistical requirements

Pilot Can Be Used By

  • researchers, engineers, testers, users

Pilot Is

a lightweight C++ library (BSD 3-clause or GPLv2+ license) with C/C++ macro, library, and CLI interfaces

Pilot’s Function

  • can handle the statistics so that your benchmark results are statistically valid
    • measure and reduce the autocorrelation among samples
    • calculate the confidence interval (CI)
    • use the t-distribution to calculate the number of samples required to achieve the desired CI
  • can detect (and remove) warm-up and cool-down phases using multiple methods
  • can compare benchmarks that have very close results
  • is optimized to get results in the shortest possible time (for both measurement and comparison)
  • provides many easy-to-use interfaces
  • is extensible through plug-ins
  • and more …
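The autocorrelation and sample-size items above can be illustrated with a generic batch-means sketch. It assumes, rather than reproduces, Pilot's method: samples are grouped into non-overlapping batches whose size doubles until the lag-1 autocorrelation of the batch means drops below a hypothetical threshold of 0.1, and the required count then follows from n >= (t * s / h)^2, where s is the sample standard deviation and h the target CI half-width. All function names are illustrative, and the input is a synthetic AR(1) series.

```python
import math
import random
import statistics
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion)."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation coefficient of a series."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


def decorrelate(xs, threshold=0.1, min_batches=10):
    """Replace xs by non-overlapping batch means, doubling the batch size
    until |lag-1 autocorrelation| of the means is <= threshold."""
    size, batches = 1, list(xs)
    while abs(lag1_autocorrelation(batches)) > threshold:
        size *= 2
        n_batches = len(xs) // size
        if n_batches < min_batches:
            raise ValueError("too few samples to remove the autocorrelation")
        batches = [statistics.fmean(xs[i * size:(i + 1) * size])
                   for i in range(n_batches)]
    return batches, size


def required_samples(samples, target_half_width, confidence=0.95):
    """Estimated total count n with t * s / sqrt(n) <= target_half_width.
    t uses the current degrees of freedom, which is slightly conservative
    because t shrinks as n grows."""
    n = len(samples)
    t = t_quantile(1 - (1 - confidence) / 2, n - 1)
    needed = math.ceil((t * statistics.stdev(samples) / target_half_width) ** 2)
    return max(n, needed)


# Synthetic autocorrelated series: AR(1) with coefficient 0.5.
rng = random.Random(1)
xs, prev = [], 0.0
for _ in range(50_000):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)
    xs.append(prev)

batches, size = decorrelate(xs)
print(f"raw lag-1 r = {lag1_autocorrelation(xs):.2f}, batch size = {size}")
# The count below is in batch means; multiply by size for raw samples.
print("batch means needed:", required_samples(batches, target_half_width=0.01))
```

Computing the CI on the raw, correlated samples would understate the variance of the mean; working on decorrelated batch means is one common way to keep the interval honest.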


Our MASCOTS’16 paper contains technical details of Pilot.


Join Slack or the mailing lists to receive future release announcements or to share your experiences.


Pilot was a research project from the Storage Systems Research Center at UC Santa Cruz from 2015 to 2016. This research was supported in part by the National Science Foundation under awards IIP-1266400, CCF-1219163, CNS-1018928, CNS-1528179, by the Department of Energy under award DE-FC02-10ER26017/DESC0005417, by a Symantec Graduate Fellowship, by a grant from Intel Corporation, and by industrial members of the Center for Research in Storage Systems.

This project does not reflect the opinion or endorsement of the sponsors listed above.