cifar-ten

Parses the CIFAR-10 dataset binary files, with methods for downloading and ndarray conversion

8 releases (5 breaking)

0.6.0 Jan 7, 2025
0.5.1 Oct 13, 2022
0.5.0 Apr 1, 2022
0.4.0 Feb 26, 2022
0.1.0 Sep 20, 2020

Used in tsuga

MIT license

21KB
334 lines

cifar-ten

This library parses the binary files of the CIFAR-10 dataset and returns their contents as a tuple struct:

  • CifarResult: (Vec<u8>, Vec<u8>, Vec<u8>, Vec<u8>) which is organized as (train_data, train_labels, test_data, test_labels)
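To make the four byte vectors more concrete, here is a minimal sketch of how a raw CIFAR-10 batch file can be split into pixel bytes and label bytes. It relies only on the published CIFAR-10 binary layout (each record is one label byte followed by 3072 pixel bytes: 1024 red, 1024 green, 1024 blue). The helper `split_records` is hypothetical and is not part of cifar-ten's API; whether the crate stores its data in exactly this order is an assumption.

```rust
/// Sizes taken from the published CIFAR-10 binary layout.
const IMAGE_BYTES: usize = 3 * 32 * 32; // 1024 red, then 1024 green, then 1024 blue
const RECORD_BYTES: usize = 1 + IMAGE_BYTES; // one label byte precedes each image

/// Hypothetical helper (not part of cifar-ten): split a raw CIFAR-10 batch
/// into a flat vector of pixel bytes and a vector of label bytes.
fn split_records(raw: &[u8]) -> Result<(Vec<u8>, Vec<u8>), String> {
    // A valid batch is a whole number of fixed-size records.
    if raw.len() % RECORD_BYTES != 0 {
        return Err(format!(
            "buffer length {} is not a multiple of {}",
            raw.len(),
            RECORD_BYTES
        ));
    }
    let count = raw.len() / RECORD_BYTES;
    let mut data = Vec::with_capacity(count * IMAGE_BYTES);
    let mut labels = Vec::with_capacity(count);
    for record in raw.chunks_exact(RECORD_BYTES) {
        labels.push(record[0]); // class index in 0..=9
        data.extend_from_slice(&record[1..]); // 3072 planar pixel bytes
    }
    Ok((data, labels))
}
```

Applied to one of the standard 10,000-image batch files, this would yield 10,000 labels and 30,720,000 pixel bytes.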

Convenience methods are provided for converting these vectors to Rust ndarray numeric arrays (enabled by a to_ndarray_0xx feature flag) and for automatically downloading the binary data from a remote URL (enabled by the download feature flag).

// $ cargo build --features=download,to_ndarray_015
use cifar_ten::*;

fn main() {
    let (train_data, train_labels, test_data, test_labels) = Cifar10::default()
        .download_and_extract(true)
        .encode_one_hot(true)
        .build()
        .unwrap()
        .to_ndarray::<f32>()
        .expect("Failed to build CIFAR-10 data");
}
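For readers unfamiliar with one-hot encoding, the following is a minimal sketch of what an option like encode_one_hot(true) conceptually does to the labels: each class index from 0 to 9 becomes a row of ten values with a single 1. The helper `one_hot` is hypothetical and written only for illustration; the crate's own implementation and output type may differ.

```rust
const NUM_CLASSES: usize = 10; // CIFAR-10 has ten classes

/// Hypothetical helper (not part of cifar-ten): expand class indices
/// into flat one-hot rows of length NUM_CLASSES.
fn one_hot(labels: &[u8]) -> Result<Vec<u8>, String> {
    let mut encoded = vec![0u8; labels.len() * NUM_CLASSES];
    for (row, &label) in labels.iter().enumerate() {
        let class = label as usize;
        if class >= NUM_CLASSES {
            return Err(format!("label {} is out of range", label));
        }
        // Set the single "hot" position in this row.
        encoded[row * NUM_CLASSES + class] = 1;
    }
    Ok(encoded)
}
```

With this sketch, a label of 3 maps to the row 0 0 0 1 0 0 0 0 0 0.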

Various ndarray versions can be used with the following feature flags:

ndarray version   feature flag
0.16              to_ndarray_016
0.15              to_ndarray_015
0.14              to_ndarray_014
0.13              to_ndarray_013

A tar.gz file with the original binaries can be found here. The crate's author also provides several ML data mirrors here, which are used for running this library's tests. Feel free to use them, but if you expect to make heavy use of these files, please consider creating your own mirror.

If you'd like to verify that the correct images and labels are being provided, the examples/preview_images.rs file uses show-image to preview an RGB representation of a given image along with its one-hot formatted label.
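When previewing an image by hand, note that the CIFAR-10 binary format stores each image as three separate color planes, while many image viewers expect interleaved pixels. The sketch below converts one 32x32 planar image into interleaved RGB bytes. The helper `planar_to_interleaved` is hypothetical, and this is not necessarily how examples/preview_images.rs prepares its data.

```rust
const WIDTH: usize = 32;
const HEIGHT: usize = 32;
const PLANE: usize = WIDTH * HEIGHT; // bytes per color plane

/// Hypothetical helper (not part of cifar-ten): convert one planar image
/// (all red bytes, then all green, then all blue) into interleaved
/// pixels (R, G, B, R, G, B, ...). Returns None on a wrong-sized input.
fn planar_to_interleaved(planar: &[u8]) -> Option<Vec<u8>> {
    if planar.len() != 3 * PLANE {
        return None;
    }
    let (red, rest) = planar.split_at(PLANE);
    let (green, blue) = rest.split_at(PLANE);
    let mut rgb = Vec::with_capacity(3 * PLANE);
    for i in 0..PLANE {
        // Pixel i takes one byte from each plane, in row-major order.
        rgb.push(red[i]);
        rgb.push(green[i]);
        rgb.push(blue[i]);
    }
    Some(rgb)
}
```

The resulting buffer holds 3072 bytes in row-major order, which is the layout most 8-bit RGB display routines accept.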

Note: Early commits included the dataset, which makes the full repository history large to download. For development, a shallow clone is suggested:

$ git clone --depth=1 https://github.com/quietlychris/cifar-ten.git

Dependencies

~0.2–11MB
~134K SLoC