2 releases
Version | Date |
---|---|
0.1.1 | Feb 4, 2025 |
0.1.0 | Jan 28, 2025 |
cpu_timer
A library to support timing execution of code using a high precision, low overhead CPU clock tick, with a fallback to std::time where the CPU architecture does not support a high precision timer.
This provides a suite of timer types, from simple elapsed-ticks, through various accumulation and occurrence counted elapsed timers, and traces.
Usage
Add this to your Cargo.toml:
[dependencies]
cpu-timer = "0.1.0"
Releases
Release notes are available in RELEASES.md.
License
Licensed under either of
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
lib.rs:
This library provides architecture/implementation specific CPU counters for high precision timing, backed up by a std::time implementation where an architecture has no explicit CPU support.
The timers are really CPU tick counters, so they are not resilient to threads being descheduled or moved between CPU cores; the library is designed for precise timing of short code sections where these constraints are understood. Consequently the timer values are not in seconds but in arbitrary units: useful for comparing the execution of different parts of code, but requiring another mechanism to determine the mapping from ticks to seconds.
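One way to obtain the mapping from ticks to seconds is to count ticks across a known wall-clock interval. The following is a minimal sketch of that idea and is not part of this library: `read_ticks` and `ticks_per_second` are hypothetical names, the x86_64 path assumes the standard `_rdtsc` intrinsic, and every other architecture simply falls back to nanoseconds from std::time.

```rust
use std::time::{Duration, Instant};

/// Hypothetical tick source for this sketch (not the crate's API).
/// On x86_64 it reads the time stamp counter.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
fn read_ticks() -> u64 {
    // RDTSC has no memory-safety preconditions.
    unsafe { core::arch::x86_64::_rdtsc() }
}

/// Fallback for other architectures: nanoseconds since first use.
#[cfg(not(target_arch = "x86_64"))]
fn read_ticks() -> u64 {
    use std::sync::OnceLock;
    static EPOCH: OnceLock<Instant> = OnceLock::new();
    EPOCH.get_or_init(Instant::now).elapsed().as_nanos() as u64
}

/// Estimate how many ticks elapse per second by sleeping for `window`
/// and comparing the tick delta with the wall-clock time that passed.
fn ticks_per_second(window: Duration) -> f64 {
    let wall = Instant::now();
    let t0 = read_ticks();
    std::thread::sleep(window);
    let t1 = read_ticks();
    let secs = wall.elapsed().as_secs_f64();
    t1.wrapping_sub(t0) as f64 / secs
}

fn main() {
    let rate = ticks_per_second(Duration::from_millis(50));
    println!("approximately {rate:.0} ticks per second");
}
```

A longer window gives a steadier estimate, and the result is only meaningful while the thread stays on a core whose counter runs at a constant rate.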
Precision
For some architectures a real CPU assembly instruction is used to get the tick count. For x86_64 this returns (in an unvirtualized world) the real CPU tick counter, with a fine precision. For aarch64 on macOS the precision is no better than that of std::time, at about 40 ticks. However, the asm implementation has a lower overhead there, so it is still worth using.
The library does not attempt to take into account any overheads of using the timers; that is for the user. Normally the overheads will be small compared to the times being measured.
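Since accounting for overhead is left to the user, a rough estimate of both the clock granularity and the cost of a read can be taken from back-to-back reads. The sketch below does this with std::time::Instant only; `ClockProfile`, `profile_clock` and `corrected` are hypothetical helpers for illustration, not part of this library.

```rust
use std::time::Instant;

/// Summary of back-to-back clock reads, in nanoseconds.
#[derive(Debug)]
struct ClockProfile {
    /// Smallest delta between two consecutive reads (a floor on overhead).
    min_delta: u64,
    /// Smallest non-zero delta seen (an estimate of the granularity), if any.
    granularity: Option<u64>,
}

/// Read the clock `reads` times in a row and record the deltas.
/// With `reads == 0` nothing is measured and `min_delta` stays at u64::MAX.
fn profile_clock(reads: usize) -> ClockProfile {
    let mut min_delta = u64::MAX;
    let mut granularity: Option<u64> = None;
    let mut prev = Instant::now();
    for _ in 0..reads {
        let now = Instant::now();
        let delta = now.duration_since(prev).as_nanos() as u64;
        min_delta = min_delta.min(delta);
        if delta > 0 {
            granularity = Some(granularity.map_or(delta, |g| g.min(delta)));
        }
        prev = now;
    }
    ClockProfile { min_delta, granularity }
}

/// Remove a measured overhead from a raw reading without underflowing.
fn corrected(raw: u64, overhead: u64) -> u64 {
    raw.saturating_sub(overhead)
}

fn main() {
    let profile = profile_clock(10_000);
    println!("{profile:?}");
    println!("a raw reading of 500 becomes {}", corrected(500, profile.min_delta));
}
```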
CPU support (for non-experimental Rustc target architectures)
For the stable Rustc-supported architectures, CPU implementations are provided for:
- x86
- x86_64
- aarch64
- wasm32
Unsupported architectures fall back to the std::time::Instant 'now' method instead (which can be perfectly adequate).
Types
The types in the library are all generic on UseAsm, a bool that selects whether the CPU architecture specific version of the timer (if provided) should be used, or whether std::time should be used instead. For architectures without a CPU implementation, the std::time version is used whatever the value of the generic.
Timer
The base type provided by this library is [Timer], which simply has a start method and an elapsed method, to deliver the ticks (as a u64) since the last start. It uses a generic UseAsm bool; if true then the CPU specific timer implementation is used, otherwise it uses std::time.
There is an additional method elapsed_and_update, which restarts the timer as well as returning the elapsed time, in a single operation.
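To illustrate how a const generic bool can select the clock source at compile time, here is a self-contained sketch of a timer with the same three method names. It is an assumption-laden stand-in, not the library's implementation: `SketchTimer`, `cpu_now` and `std_now` are hypothetical, and the CPU path is only provided for x86_64 (other targets reuse the std::time path).

```rust
use std::sync::OnceLock;
use std::time::Instant;

/// std::time based clock: nanoseconds since first use.
fn std_now() -> u64 {
    static EPOCH: OnceLock<Instant> = OnceLock::new();
    EPOCH.get_or_init(Instant::now).elapsed().as_nanos() as u64
}

/// CPU tick counter on x86_64 (assumed here; other targets fall back).
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
fn cpu_now() -> u64 {
    unsafe { core::arch::x86_64::_rdtsc() }
}

#[cfg(not(target_arch = "x86_64"))]
fn cpu_now() -> u64 {
    std_now()
}

/// Hypothetical stand-in for a timer that is generic on a UseAsm-style bool.
#[derive(Default)]
struct SketchTimer<const USE_ASM: bool> {
    start: u64,
}

impl<const USE_ASM: bool> SketchTimer<USE_ASM> {
    /// The branch is on a constant, so only one path remains after compilation.
    fn now() -> u64 {
        if USE_ASM { cpu_now() } else { std_now() }
    }

    fn start(&mut self) {
        self.start = Self::now();
    }

    /// Ticks since the last start.
    fn elapsed(&self) -> u64 {
        Self::now().wrapping_sub(self.start)
    }

    /// Return the elapsed ticks and restart from the same clock reading.
    fn elapsed_and_update(&mut self) -> u64 {
        let now = Self::now();
        let delta = now.wrapping_sub(self.start);
        self.start = now;
        delta
    }
}

fn main() {
    let mut cpu = SketchTimer::<true>::default();
    let mut std_timer = SketchTimer::<false>::default();
    cpu.start();
    std_timer.start();
    let work: u64 = std::hint::black_box((0..10_000u64).sum());
    println!("sum {work}: {} ticks, {} ns", cpu.elapsed(), std_timer.elapsed_and_update());
}
```

Using one clock reading for both the result and the new start point means no time is lost between consecutive measurements.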
DeltaTimer
The [DeltaTimer] allows for recording the delta in CPU ticks between the entry to a region of code and the exit from it. It uses a generic UseAsm bool.
let mut t = DeltaTimer::<true>::default();
t.start();
// do something! - timed using CPU ticks
t.stop();
println!("That took {} cpu 'ticks'", t.value());
let mut t = DeltaTimer::<false>::default();
t.start();
// do something! - timed using std::time
t.stop();
println!("That took {} nanoseconds", t.value());
AccTimer
Frequently one will want to repeatedly time a piece of code, to attain an average, or to just accumulate the time taken in some code whenever it is called to determine if it is a 'hotspot'. The [AccTimer] accumulates the time delta between start and stop.
let mut t = AccTimer::<true>::default();
for i in 0..100 {
    t.start();
    // do something!
    t.stop();
    println!("Iteration {i} took {} ticks", t.last_delta());
}
println!("That took an average of {} ticks", t.acc_value()/100);
AccArray
An [AccArray] is used to accumulate timer values, storing not just the times but also (optionally) the number of occurrences.
It is used as AccArray<A, T, C, N>; A is a bool; T the time accumulator type; C the counter type; N the number of accumulators.
- A is true if the CPU-specific timer should be used, false if std::time should be used
- T is the type used for accumulating time deltas (u8, u16, u32, u64, u128, usize, f32, f64, or () to not accumulate times)
- C is the type used for counting occurrences (u8, u16, u32, u64, u128, usize, f32, f64, or () to not count occurrences)
- N can be any usize; the space for the accumulators and counters is statically held within the type, so N affects the size of the AccArray
The array can be cleared - clearing the accumulators.
A use is to first invoke start and then later acc_n with a specific index which identifies the code just executed; the time elapsed since the last start is accumulated and the occurrences counted.
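The start / acc_n pattern can be pictured with a small stand-in type. This sketch is not the library's AccArray: `SketchAccArray` is hypothetical, it fixes the accumulator type to u64 and the counter type to u32, it times with std::time::Instant, and it assumes that acc_n does not restart the timer.

```rust
use std::time::Instant;

/// Hypothetical fixed-size set of accumulators and occurrence counters.
struct SketchAccArray<const N: usize> {
    started: Instant,
    acc: [u64; N],
    cnt: [u32; N],
}

impl<const N: usize> SketchAccArray<N> {
    fn new() -> Self {
        Self { started: Instant::now(), acc: [0; N], cnt: [0; N] }
    }

    /// Mark the beginning of a timed region.
    fn start(&mut self) {
        self.started = Instant::now();
    }

    /// Add the time since the last start to slot `index` and count it.
    /// Panics if `index >= N`, like any out-of-bounds array access.
    fn acc_n(&mut self, index: usize) {
        let delta = self.started.elapsed().as_nanos() as u64;
        self.acc[index] = self.acc[index].wrapping_add(delta);
        self.cnt[index] = self.cnt[index].wrapping_add(1);
    }

    /// Reset every accumulator and counter.
    fn clear(&mut self) {
        self.acc = [0; N];
        self.cnt = [0; N];
    }
}

fn main() {
    let mut timers = SketchAccArray::<2>::new();
    for i in 0..10u64 {
        timers.start();
        // Two code paths of different cost, identified by index 0 and 1.
        let index = if i % 2 == 0 {
            std::hint::black_box((0..100u64).sum::<u64>());
            0
        } else {
            std::hint::black_box((0..10_000u64).sum::<u64>());
            1
        };
        timers.acc_n(index);
    }
    for (i, (t, c)) in timers.acc.iter().zip(timers.cnt.iter()).enumerate() {
        println!("path {i}: {t} ns over {c} occurrences");
    }
    timers.clear();
}
```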
AccVec
An [AccVec] is a less static version of [AccArray], using an array backed by a Vec. It has the same methods, and additional push related methods.
Trace
The [Trace] type supports tracing the execution path through some logic, getting deltas along the way:
let mut t = Trace::<true, u32, 3>::default();
t.start();
// do something!
t.next();
// do something else!
t.next();
// do something else!
t.next();
println!("The three steps took {:?} ticks", t.trace());
The trace will have three entries, which are the delta times for the three operations.
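As a way of seeing where the three entries come from, the following self-contained sketch records one delta per call to next. It is a hypothetical stand-in (`SketchTrace`), not the library's Trace: it times with std::time::Instant, stores nanoseconds as u32, and assumes that calls beyond the N-th are ignored.

```rust
use std::time::Instant;

/// Hypothetical trace of up to N consecutive deltas.
struct SketchTrace<const N: usize> {
    last: Instant,
    index: usize,
    deltas: [u32; N],
}

impl<const N: usize> SketchTrace<N> {
    fn new() -> Self {
        Self { last: Instant::now(), index: 0, deltas: [0; N] }
    }

    /// Begin a new trace at the first slot.
    fn start(&mut self) {
        self.last = Instant::now();
        self.index = 0;
    }

    /// Record the time since the previous start/next in the next free slot.
    fn next(&mut self) {
        let now = Instant::now();
        if self.index < N {
            let ns = now.duration_since(self.last).as_nanos();
            // Saturate rather than wrap if the delta does not fit in a u32.
            self.deltas[self.index] = u32::try_from(ns).unwrap_or(u32::MAX);
            self.index += 1;
        }
        self.last = now;
    }

    fn trace(&self) -> &[u32; N] {
        &self.deltas
    }
}

fn main() {
    let mut t = SketchTrace::<3>::new();
    t.start();
    std::hint::black_box((0..1_000u64).sum::<u64>());
    t.next();
    std::hint::black_box((0..10_000u64).sum::<u64>());
    t.next();
    std::hint::black_box((0..100_000u64).sum::<u64>());
    t.next();
    println!("The three steps took {:?} ns", t.trace());
}
```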
AccTrace
The [AccTrace] accumulates a number of iterations of a Trace:
struct MyThing {
    // things ...
    /// For timing (perhaps only if #[cfg(debug_assertions)] )
    acc: AccTrace<true, u32, 4>,
}
impl MyThing {
    fn do_something_complex(&mut self) {
        self.acc.start();
        // .. do first complex thing
        self.acc.next();
        // .. do second complex thing
        self.acc.next();
        // .. do third complex thing
        self.acc.next();
        // .. do fourth complex thing
        self.acc.next();
        self.acc.acc();
    }
}
let mut t = MyThing { // ..
    acc: AccTrace::<true, u32, 4>::default()
};
for _ in 0..100 {
    t.do_something_complex();
}
println!("After 100 iterations the accumulated times for the four steps are {:?} ticks", t.acc.acc_trace());
t.acc.clear();
// ready to be complex all over again
The trace will have four entries, which are the accumulated delta times for the four complex things.
OS-specific notes
These outputs are generated by the test_timer_values test in tests/cpu_timer.rs.
The tables will have a rough granularity of the precision of the tick counter. Average time taken is calculated using the fastest 95% of 10,000 calls, as beyond that the outliers should be ignored.
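The statistics described above can be sketched as follows. This is an illustration of a "fastest 95%" mean and a nearest-rank percentile, not the code used by the test itself; `fastest_mean` and `percentile` are hypothetical helpers, and the exact percentile definition used for the tables is an assumption.

```rust
/// Mean of the fastest `keep_percent` percent of the samples,
/// or None if that leaves no samples.
fn fastest_mean(samples: &[u64], keep_percent: usize) -> Option<f64> {
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let keep = sorted.len() * keep_percent.min(100) / 100;
    if keep == 0 {
        return None;
    }
    let sum: u128 = sorted[..keep].iter().map(|&s| s as u128).sum();
    Some(sum as f64 / keep as f64)
}

/// Nearest-rank percentile of an already sorted slice.
fn percentile(sorted: &[u64], p: usize) -> Option<u64> {
    if sorted.is_empty() {
        return None;
    }
    // Integer form of ceil(p / 100 * len), clamped to a valid index.
    let rank = (p.min(100) * sorted.len() + 99) / 100;
    Some(sorted[rank.saturating_sub(1)])
}

fn main() {
    // Stand-in data; real input would be the measured tick deltas.
    let mut samples: Vec<u64> = (1..=100).collect();
    samples.sort_unstable();
    println!("mean of fastest 95%: {:?}", fastest_mean(&samples, 95));
    for p in [10, 25, 50, 75, 90, 95, 99, 100] {
        println!("{p} | {:?}", percentile(&samples, p));
    }
}
```

Dropping the slowest 5% keeps a single descheduled call, such as those visible in the 100 rows of the tables, from dominating the average.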
macOS aarch64
MacBook Pro M4 Max OS 15.1 rustc 1.84
The granularity of the clock appears to be 41 or 42 ticks, and the asm implementation seems to match the std::time implementation for this precision.
For asm, the average time taken for a call is 3 ticks in release, 9 ticks in debug.
For std::time, the average time taken for a call is 8 ticks in release, 17 ticks in debug, so there is clearly an overhead for using std::time.
Percentile | arch release | arch debug | std debug | std release |
---|---|---|---|---|
10 | 0 | 0 | 41 | 0 |
25 | 0 | 0 | 42 | 0 |
50 | 0 | 0 | 42 | 0 |
75 | 0 | 41 | 83 | 41 |
90 | 42 | 41 | 83 | 41 |
95 | 42 | 41 | 83 | 41 |
99 | 42 | 42 | 84 | 42 |
100 | 27084 | 2498 | 2166 | 1125 |
macOS x86_64
MacBook Pro 2018 OS 15.0 rustc 1.84 2.2GHz i7
The granularity of the clock appears to be 2 ticks, and the asm implementation is better than using the std::time implementation.
The average time taken for a call is 15 ticks in release, 78 (but sometimes 66!) ticks in debug.
Percentile | arch release | arch debug | std debug | std release |
---|---|---|---|---|
10 | 12 | 62 | 72 | 38 |
25 | 12 | 64 | 74 | 38 |
50 | 12 | 64 | 79 | 39 |
75 | 14 | 66 | 81 | 39 |
90 | 14 | 68 | 83 | 39 |
95 | 14 | 70 | 83 | 40 |
99 | 16 | 82 | 132 | 41 |
100 | 42918 | 65262 | 17101 | 24560 |