23 releases (13 breaking)

0.18.0 Oct 10, 2023
0.17.2 Dec 12, 2021
0.16.0 Jul 20, 2021
0.15.0 Feb 22, 2021
0.5.1 Mar 24, 2019

#405 in Parser implementations


427 downloads per month
Used in 8 crates (6 directly)

MIT/Apache

290KB
6.5K SLoC

Introduction


finalfusion is a crate for reading, writing, and using embeddings in Rust. finalfusion primarily works with its own format, which supports a large variety of features. The fastText, floret, GloVe, and word2vec file formats are also supported.

finalfusion has been API stable since 0.11.0. However, we cannot tag version 1 yet, because several dependencies that are exposed through the API have not reached version 1 (particularly ndarray and rand). Future 0.x releases of finalfusion will be used to accommodate updates of these dependencies.

Heads-up: there is a small API change between finalfusion 0.11 and 0.12. The Error type has been moved from finalfusion::io to finalfusion::error. The separate ErrorKind enum has been merged with Error. Error is now marked as non-exhaustive, so that new error variants can be added in the future without changing the API.

Usage

To make finalfusion available in your crate, simply place the following in your Cargo.toml:

finalfusion = "0.16"

Loading embeddings and querying them is as simple as:

use std::fs::File;
use std::io::BufReader;

use finalfusion::prelude::*;

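fn main() {
    // Open the embeddings file (finalfusion format) with buffered reads.
    let mut reader = BufReader::new(File::open("embeddings.fifu").unwrap());
    // Read the vocabulary and the embedding matrix into memory.
    let embeds = Embeddings::<VocabWrap, StorageWrap>::read_embeddings(&mut reader).unwrap();
    // Look up the embedding of a word; `unwrap` panics if none can be found.
    embeds.embedding("Query").unwrap();
}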

Features

finalfusion supports a variety of formats:

  • Vocabulary
    • Subwords
    • No subwords
  • Storage
    • Array
    • Memory-mapped
    • Quantized
  • Format

Moreover, finalfusion provides:

  • Similarity queries
  • Analogy queries
  • Quantizing embeddings through reductive
  • Conversion to the following formats:
    • finalfusion
    • word2vec
    • GloVe

For more information, please consult the API documentation.
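
To make the idea behind similarity and analogy queries concrete, here is a minimal, self-contained sketch. It does not use the finalfusion API: the function names (`cosine`, `most_similar`, `analogy`, `toy_vocab`), the three-dimensional toy vectors, and the choice of cosine similarity as the measure are all assumptions made for illustration. The crate's actual query interface is described in the API documentation.

```rust
/// Cosine similarity between two vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Find the embedding of `word` in a toy vocabulary (linear scan).
fn lookup<'v>(vocab: &'v [(&str, Vec<f32>)], word: &str) -> Option<&'v [f32]> {
    vocab.iter().find(|(w, _)| *w == word).map(|(_, v)| v.as_slice())
}

/// Return the `limit` words most similar to `query`, ignoring words in `skip`.
fn most_similar<'a>(
    vocab: &[(&'a str, Vec<f32>)],
    query: &[f32],
    skip: &[&str],
    limit: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = vocab
        .iter()
        .filter(|(word, _)| !skip.contains(word))
        .map(|(word, vec)| (*word, cosine(query, vec)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.truncate(limit);
    scored
}

/// Analogy query "a is to b as c is to ?": rank words by similarity to b - a + c.
fn analogy<'a>(
    vocab: &[(&'a str, Vec<f32>)],
    a: &str,
    b: &str,
    c: &str,
    limit: usize,
) -> Option<Vec<(&'a str, f32)>> {
    let (va, vb, vc) = (lookup(vocab, a)?, lookup(vocab, b)?, lookup(vocab, c)?);
    let target: Vec<f32> = vb
        .iter()
        .zip(va)
        .zip(vc)
        .map(|((b, a), c)| b - a + c)
        .collect();
    // The query words themselves are excluded from the results.
    Some(most_similar(vocab, &target, &[a, b, c], limit))
}

/// Hypothetical toy embeddings; the dimensions are (male, female, royal).
fn toy_vocab() -> Vec<(&'static str, Vec<f32>)> {
    vec![
        ("man", vec![1.0, 0.0, 0.0]),
        ("woman", vec![0.0, 1.0, 0.0]),
        ("king", vec![1.0, 0.0, 1.0]),
        ("queen", vec![0.0, 1.0, 1.0]),
        ("apple", vec![0.1, 0.1, -1.0]),
    ]
}

fn main() {
    let vocab = toy_vocab();
    let king = lookup(&vocab, "king").unwrap();
    println!("{:?}", most_similar(&vocab, king, &["king"], 2));
    println!("{:?}", analogy(&vocab, "man", "woman", "king", 1));
}
```

In this toy space, the analogy "man is to woman as king is to ?" ranks "queen" first, because woman - man + king coincides with the vector for "queen".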

Getting embeddings

Embeddings trained with finalfrontier starting with version 0.4 are in finalfusion format and compatible with this crate. A growing set of pretrained embeddings is offered on our website and we have converted the fastText Wikipedia and Common Crawl embeddings to finalfusion. More information can also be found at https://finalfusion.github.io.

Which type of storage should I use?

Quantized embeddings

Quantized embeddings store embeddings as discrete representations. Imagine that, for a given embedding space, you find 256 prototypical embeddings. Each embedding could then be stored as a 1-byte pointer to one of these prototypical embeddings. Of course, with only 256 possible representations, this quantized embedding space would be very coarse-grained.

Product quantizers (pq) solve this problem by splitting each embedding evenly into q subvectors and finding prototypical vectors for each set of subvectors. If we use 256 prototypical representations for each subspace, 256^q different word embeddings can be represented. For instance, if q = 150, we could represent 256^150 different embeddings. Each embedding would then be stored as 150 byte-sized pointers.

Optimized product quantizers (opq) additionally apply a linear map to the embedding space to distribute variance across embedding dimensions.

By quantizing an embedding matrix, its size can be reduced both on disk and in memory.
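
The product quantization scheme described above can be sketched in a few lines. This is an illustration only, not the implementation used by finalfusion or reductive: the `ProductQuantizer` type, its methods, and `toy_quantizer` are hypothetical names, and the codebooks are hard-coded rather than learned from data. The toy setup uses 4-dimensional embeddings, q = 2 subvectors, and 2 prototypical vectors per subspace.

```rust
/// Squared Euclidean distance between two vectors of equal length.
fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// A toy product quantizer with one codebook per subspace.
/// Each codebook holds the prototypical vectors (centroids) of its subspace;
/// with at most 256 centroids, every code fits into one byte.
struct ProductQuantizer {
    codebooks: Vec<Vec<Vec<f32>>>, // indexed as [subspace][centroid][dimension]
}

impl ProductQuantizer {
    /// Length of each subvector.
    fn sub_len(&self) -> usize {
        self.codebooks[0][0].len()
    }

    /// Encode an embedding as one byte per subvector: the index of the
    /// nearest centroid in the corresponding codebook.
    fn encode(&self, embedding: &[f32]) -> Vec<u8> {
        embedding
            .chunks(self.sub_len())
            .zip(&self.codebooks)
            .map(|(sub, codebook)| {
                let mut best = 0usize;
                let mut best_dist = f32::INFINITY;
                for (idx, centroid) in codebook.iter().enumerate() {
                    let dist = squared_distance(sub, centroid);
                    if dist < best_dist {
                        best = idx;
                        best_dist = dist;
                    }
                }
                best as u8
            })
            .collect()
    }

    /// Reconstruct an approximate embedding by concatenating the centroids
    /// that the codes point to.
    fn decode(&self, codes: &[u8]) -> Vec<f32> {
        codes
            .iter()
            .zip(&self.codebooks)
            .flat_map(|(&code, codebook)| codebook[code as usize].iter().copied())
            .collect()
    }
}

/// Hypothetical hand-written codebooks: 2 subspaces with 2 centroids each.
fn toy_quantizer() -> ProductQuantizer {
    ProductQuantizer {
        codebooks: vec![
            vec![vec![0.0, 0.0], vec![1.0, 1.0]],
            vec![vec![0.0, 1.0], vec![1.0, 0.0]],
        ],
    }
}

fn main() {
    let pq = toy_quantizer();
    let embedding = [0.9, 1.1, 0.2, 0.8];
    let codes = pq.encode(&embedding);
    // 4 floats (16 bytes) are stored as 2 one-byte codes.
    println!("codes: {:?}", codes);
    println!("reconstruction: {:?}", pq.decode(&codes));
}
```

The reconstruction is only an approximation of the original embedding; this loss of precision is the price paid for the smaller representation.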

Memory-mapped embeddings

Normally, we read embeddings into memory. However, as an alternative the embeddings can be memory mapped. Memory mapping makes the on-disk embedding matrix available as pages in virtual memory. The operating system will then (transparently) load these pages into physical memory as necessary.

Memory mapping speeds up the initial loading time of word embeddings, since only the vocabulary needs to be read. The operating system will then load (parts of) the embedding matrix on an as-needed basis. The operating system can additionally free up the memory again when no embeddings are looked up and other processes require memory.

Empirical comparison

The following empirical comparison of embedding types uses an embedding matrix with 2,807,440 embeddings (710,288 word, 2,097,152 subword) of dimensionality 300. The embedding lookup timings were done on an Intel Core i5-8259U CPU, 2.30GHz.

Known lookup and Unknown lookup measure lookups of words that are inside and outside the vocabulary, respectively. Lookup contains a mixture of known and unknown words.

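| Storage    | Lookup | Known lookup | Unknown lookup | Memory   | Disk     |
|------------|--------|--------------|----------------|----------|----------|
| array      | 449 ns | 232 ns       | 18 μs          | 3213 MiB | 3213 MiB |
| array mmap | 833 ns | 494 ns       | 23 μs          | Variable | 3213 MiB |
| opq        | 40 μs  | 21 μs        | 962 μs         | 402 MiB  | 402 MiB  |
| opq mmap   | 41 μs  | 21 μs        | 960 μs         | Variable | 402 MiB  |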

Note: two units are used: nanoseconds (ns) and microseconds (μs).
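
The memory figures can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes 32-bit floats for the array storage. For the quantized storage it assumes q = 150 subquantizers with 256 centroids each, as in the example given earlier; the settings that were actually used for the comparison are not stated here. The vocabulary and any additional data (such as the linear map used by opq) are ignored, and all function names are hypothetical.

```rust
const MIB: f64 = 1024.0 * 1024.0;

/// Size in bytes of a dense matrix of 32-bit floats.
fn array_bytes(n_embeddings: u64, dims: u64) -> u64 {
    n_embeddings * dims * 4
}

/// Approximate size in bytes of a product-quantized matrix:
/// one byte per subquantizer for every embedding, plus the codebooks.
fn pq_bytes(n_embeddings: u64, dims: u64, n_subquantizers: u64, n_centroids: u64) -> u64 {
    let codes = n_embeddings * n_subquantizers;
    let codebooks = n_subquantizers * n_centroids * (dims / n_subquantizers) * 4;
    codes + codebooks
}

/// Convert bytes to mebibytes.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / MIB
}

fn main() {
    let n_embeddings = 2_807_440; // 710,288 words + 2,097,152 subwords
    let dims = 300;

    // Roughly 3213 MiB for the dense array.
    println!("array: {:.0} MiB", to_mib(array_bytes(n_embeddings, dims)));
    // Roughly 402 MiB under the assumed quantizer settings.
    println!("pq:    {:.0} MiB", to_mib(pq_bytes(n_embeddings, dims, 150, 256)));
}
```

Under these assumptions, both estimates are in line with the reported sizes: quantization shrinks each embedding from 1200 bytes to 150 bytes, while the codebooks add well under 1 MiB.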

Using a BLAS or LAPACK library

If you are using finalfusion in a binary crate, you can compile ndarray with BLAS support to speed up certain functionality in finalfusion-rust. In order to do so, enable the ndarray/blas feature and add one of the following crates as a dependency to select a BLAS/LAPACK implementation:

  • netlib-src: Use reference BLAS/LAPACK (slow, not recommended)
  • openblas-src: Use OpenBLAS
  • intel-mkl-src: Use Intel Math Kernel Library

If you want to quantize an embedding matrix using optimized product quantization, you must enable the reductive/opq-train feature in addition to adding a BLAS/LAPACK implementation.

The Cargo.toml file of finalfusion-utils can be used as an example of how to use BLAS in a binary crate.

Example: embedding lookups in quantized matrices

Embedding lookups in embedding matrices that were quantized using the optimized product quantizer can be sped up using a good BLAS implementation. The following table compares lookup times on an Intel Core i5-8259U CPU, 2.30GHz with finalfusion compiled with and without MKL/OpenBLAS support:

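| Storage             | Lookup | Known lookup | Unknown lookup |
|---------------------|--------|--------------|----------------|
| opq                 | 40 μs  | 21 μs        | 962 μs         |
| opq mmap            | 41 μs  | 21 μs        | 960 μs         |
| opq (MKL)           | 14 μs  | 7 μs         | 309 μs         |
| opq mmap (MKL)      | 14 μs  | 7 μs         | 309 μs         |
| opq (OpenBLAS)      | 15 μs  | 7 μs         | 336 μs         |
| opq mmap (OpenBLAS) | 15 μs  | 7 μs         | 342 μs         |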

Where to go from here

Dependencies

~6MB
~113K SLoC