2 unstable releases
0.2.0 | May 15, 2024 |
---|---|
0.1.0 | May 15, 2024 |
#1013 in Math
32 downloads per month
32KB
522 lines
markovgen
A library for building markov chain graphs from text datasets and performantly generating text sequences by traversing them.
Features
- Simple API for building and traversing graphs
- Configurable minimum sequence length
- An example CLI application (markovcli) that supports building graphs from datasets and writing them to the disk, as well as sampling such graphs with customizable sequence length.
- Capable of generating >2.5 million names per second with default settings (and cli_no_print feature set to avoid IO overhead) on my machine from the first_names benchmark dataset (see
benches/
) - Try it using
cargo run -r -F serde --bin markovcli
- Capable of generating >2.5 million names per second with default settings (and cli_no_print feature set to avoid IO overhead) on my machine from the first_names benchmark dataset (see
Example
src/bin/example.rs
:
use std::sync::Arc;
use markovgen::*;
const NAME_DATASET: &str = "Tim\nTom\nThomas\nNathan\nNina\nTiara\nTyra\nTyrone";
const SEQUENCE_START: char = '\x01';
const SEQUENCE_END: char = '\x02';
fn main() {
let mut constructor = GraphConstructor::new();
NAME_DATASET.lines().for_each(|l| {
l.chars().fold(SEQUENCE_START, |acc, x| {
constructor.register_sequence(acc, x);
x
});
constructor.register_sequence(l.chars().last().unwrap(), SEQUENCE_END);
});
let graph = Arc::new(constructor.construct());
let mut stepper = GraphStepper::new(
graph,
GraphStepperConfiguration {
start_char: Some(SEQUENCE_START),
min_length: Some(3),
},
)
.unwrap();
// Step until reaching a "dead end" vertex, with a timeout of 16 steps.
println!("{}", stepper.step_until_end_state(16).unwrap());
}
Running this should yield something like:
$ cargo run --bin example
Ninathom
Plans for 1.0
- This was one of my first Rust projects, which I just cleaned up a little. I'll probably be changing the API to make it a little more ergonomic before the 1.0.0 release
- Proper multi-threading support (I think just cloning GraphSteppers and using them in different tasks should already work, but haven't actually tried it)
- Generic implementation to allow for String and char vertices (currently only chars are supported since this fit my original use-case of generating names)
Dependencies
~2–2.8MB
~53K SLoC