137 releases
0.21.9 | Jan 8, 2025 |
---|---|
0.21.8 | Dec 5, 2024 |
0.21.7 | Sep 23, 2024 |
0.21.6 | Jul 24, 2024 |
0.2.9 | Mar 28, 2019 |
#986 in Machine learning
11,866 downloads per month
Used in 38 crates
(2 directly)
1MB
26K
SLoC
tract-linalg
linalg stands for "linear algebra". This is a misnamer. This crates contains low-level, architecture dependant optimisations used by tract-core.
Functions
- MatMatMul: Extended matrix*matrix product:
- inspired by Gotoblass and BLIS micro kernel approach
- extended for convolution friendly addressing (fused img2col)
- fused output pipeline (min, max, and a few more simple, fast ops)
- f32*f32 -> f32 (à la sgemm)
- i8*i8 -> i32 accumulator -> i32 storage
- i8*i8 -> i32 accumulator -> i8 (with channel zeropoint and scale, and re-quantization pipeline)
- f32 sigmoid and f32 tanh: at f32 precision, by a rationale function (no exponentiation)
- byte-to-byte lookup table
Implementations
generic fallback | armv6, vfp | armv7 neon | armv8 simd | x64 FMA | |
---|---|---|---|---|---|
MatMatMul f32 | 4x4 | 8x4 | 8x8 | 16x6 | |
MatMatMul i8->i8 | 8x4 | 8x8 | |||
MatMatMul i8->i32 | 8x8 | ||||
sigmoid f32 | 4n | 4n | |||
tanh f32 | 4n | 4n | |||
byte lookup |
Dependencies
~10–18MB
~248K SLoC