7 unstable releases (3 breaking)
Version | Released |
---|---|
0.3.0 | Mar 3, 2024 |
0.2.0 | Feb 24, 2024 |
0.1.2 | Feb 17, 2024 |
0.0.8 | Feb 10, 2024 |
0.0.7 | Jan 27, 2024 |
#852 in Text processing
79KB, 2K SLoC
Say It!
String replacements using regex.
Originally based on the Python project pink-accents and primarily developed for the ssnt game.
Overview
Provides a way to define a set of rules for replacing text in a string. Each rule consists of a regex pattern and a Tag trait object. The original use case is to simulate mispronunciations in speech accents via text.
See the docs.rs documentation for an API overview.
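The idea of rules grouped into passes can be sketched with the standard library alone. This is not the crate's API: `Rule`, `Pass`, `demo_passes` and `apply_passes` are hypothetical names, plain substrings stand in for regex patterns, fixed strings stand in for Tag trait objects, and running the rules of one pass one after another is an assumption of the sketch.

```rust
/// One rule: a pattern and its replacement. Plain substrings stand in for
/// regexes and fixed strings stand in for tags (assumptions of this sketch).
struct Rule {
    pattern: &'static str,
    replacement: &'static str,
}

/// A named group of rules, loosely mirroring a pass in the serialized format.
struct Pass {
    name: &'static str,
    rules: Vec<Rule>,
}

/// Applies every pass in order; within a pass, rules run top to bottom.
fn apply_passes(passes: &[Pass], input: &str) -> String {
    let mut text = input.to_string();
    for pass in passes {
        for rule in &pass.rules {
            text = text.replace(rule.pattern, rule.replacement);
        }
    }
    text
}

/// Two small passes used for the demonstration below.
fn demo_passes() -> Vec<Pass> {
    vec![
        Pass {
            name: "words",
            rules: vec![Rule { pattern: "windows", replacement: "spyware" }],
        },
        Pass {
            name: "ending",
            rules: vec![Rule { pattern: "!", replacement: " HONK!" }],
        },
    ]
}

fn main() {
    let passes = demo_passes();
    for pass in &passes {
        println!("pass: {}", pass.name);
    }
    // Prints "I use spyware HONK!"
    println!("{}", apply_passes(&passes, "I use windows!"));
}
```

The real library matches with regexes and lets a tag decide the replacement at run time; the sketch only shows the ordering of passes.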
Serialized format
Full reference:
(
// Consists of named blocks called "passes" that are applied in top to bottom order.
// Pass names must be unique. They are used if you want to extend the accent
accent: {
// First pass
"words": (
// This optional field instructs all regexes inside this pass to be wrapped in
// regex word boundaries
format: r"\<{}\>",
// Pairs of (regex, tag)
rules: {
// Simplest rule: replaces all occurrences of the word "windows" with "spyware"
"windows": {"Literal": "spyware"},
// This replaces the word "os" with one of the tags, with equal probability
"os": {"Any": [
{"Literal": "Ubuntu"},
{"Literal": "Arch"},
{"Literal": "Gentoo"},
]},
// `Literal` supports regex templating:
// https://docs.rs/regex/latest/regex/struct.Regex.html#example-9
// This will swap "a" and "b" using named and numbered groups
r"(a)(?P<b_group>b)": {"Literal": "$b_group$1"},
},
),
// Second pass
"patterns": (
// Both rules use "(?-i)" which opts out of case insensitivity
rules: {
// Lowercases all `P` letters
"(?-i)P": {"Lower": {"Original": ()}},
// Uppercases all `m` letters
"(?-i)m": {"Upper": {"Original": ()}},
},
),
// Third pass. Note that ^ and $ may overlap with words at the beginning and
// end of strings. These should be defined separately
"ending": (
rules: {
// Selects honks using relative weights. A higher weight means a higher chance
"$": {"Weights": {
32: {"Literal": " HONK!"},
16: {"Literal": " HONK HONK!"},
08: {"Literal": " HONK HONK HONK!"},
// Ultra rare sigma honk - 1 / 57 chance
01: {"Literal": " HONK HONK HONK HONK!!!!!!!!!!!!!!!"},
}},
},
),
},
// An accent can be used with an intensity (a non-negative value). Higher
// intensities can either extend the lower level or completely replace it.
// Default intensity (rules above) is 0. Higher ones are defined here
intensities: {
// Extends previous intensity (base one in this case), adding additional
// rules and overwriting passes that have the same names.
1: Extend({
"words": (
format: r"\<{}\>",
rules: {
// Will overwrite "windows" pattern in "words" pass
"windows": {"Literal": "bloatware"},
},
),
// Extend "patterns", adding 1 more rule with new pattern
"patterns": (
name: "patterns",
rules: {
"(?-i)[A-Z]": {"Weights": {
// 50% chance to replace a capital letter with one of the Es
1: {"Any": [
{"Literal": "E"},
{"Literal": "Ē"},
{"Literal": "Ê"},
{"Literal": "Ë"},
{"Literal": "È"},
{"Literal": "É"},
]},
// 50% chance to do nothing, no replacement
1: {"Original": ()},
}},
},
),
}),
// Replace intensity 1 entirely. In this case with nothing
2: Replace({}),
},
)
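The relative weights used by "Weights" turn into probabilities by dividing each weight by the sum of all weights. For the honk example the sum is 32 + 16 + 8 + 1 = 57, so the rarest entry has a 1 in 57 chance. A minimal sketch of that arithmetic follows; `pick_weighted` is a hypothetical helper, not the crate's implementation, and the random roll is passed in as a plain number so the sketch stays deterministic.

```rust
/// Returns the index of the entry selected by `roll`, where `roll` is
/// expected to lie in `0..total_weight` (in real use it would come from a
/// random number generator). Returns `None` when `roll` is out of range.
fn pick_weighted(weights: &[u64], roll: u64) -> Option<usize> {
    let mut upper: u64 = 0;
    for (index, weight) in weights.iter().enumerate() {
        // Each entry owns the half-open range [previous upper, new upper).
        upper += *weight;
        if roll < upper {
            return Some(index);
        }
    }
    None
}

fn main() {
    // Weights copied from the "ending" pass in the reference above.
    let weights = [32u64, 16, 8, 1];
    let total: u64 = weights.iter().sum();
    for w in &weights {
        let percent = 100.0 * (*w as f64) / (total as f64);
        println!("weight {}: {}/{} = {:.1}%", w, w, total, percent);
    }
    // Rolls 0..=31 select the first entry; only roll 56 selects the last one.
    println!("{:?}", pick_weighted(&weights, 56));
}
```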
See more examples in the examples folder.
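How an intensity level combines with the previous one can be sketched as follows. This is an illustration under assumptions, not the crate's code: `Pass`, `Intensity`, `pass` and `resolve` are hypothetical names, replacements are simplified to plain strings, and `Extend` is assumed to merge passes by name, overwriting rules whose pattern already exists and appending the rest.

```rust
/// A pass with its rules as (pattern, replacement) pairs. Replacements are
/// plain strings here, a simplification of the real tags.
#[derive(Clone, Debug, PartialEq)]
struct Pass {
    name: String,
    rules: Vec<(String, String)>,
}

/// The two ways a higher intensity can relate to the previous one.
enum Intensity {
    Extend(Vec<Pass>),
    Replace(Vec<Pass>),
}

/// Small constructor to keep the example readable.
fn pass(name: &str, rules: &[(&str, &str)]) -> Pass {
    Pass {
        name: name.to_string(),
        rules: rules
            .iter()
            .map(|(p, r)| (p.to_string(), r.to_string()))
            .collect(),
    }
}

/// Computes the passes of a level from the passes of the previous level.
fn resolve(previous: &[Pass], level: &Intensity) -> Vec<Pass> {
    match level {
        // Replace ignores everything defined before.
        Intensity::Replace(passes) => passes.clone(),
        // Extend merges by pass name (assumed semantics, see above).
        Intensity::Extend(extra) => {
            let mut merged = previous.to_vec();
            for new_pass in extra {
                if let Some(i) = merged.iter().position(|p| p.name == new_pass.name) {
                    for (pattern, replacement) in &new_pass.rules {
                        let rules = &mut merged[i].rules;
                        match rules.iter().position(|rule| &rule.0 == pattern) {
                            Some(j) => rules[j].1 = replacement.clone(),
                            None => rules.push((pattern.clone(), replacement.clone())),
                        }
                    }
                } else {
                    merged.push(new_pass.clone());
                }
            }
            merged
        }
    }
}

fn main() {
    let base = vec![
        pass("words", &[("windows", "spyware")]),
        pass("patterns", &[("(?-i)P", "p")]),
    ];
    let level1 = Intensity::Extend(vec![
        pass("words", &[("windows", "bloatware")]),
        pass("patterns", &[("(?-i)[A-Z]", "E")]),
    ]);
    let level2 = Intensity::Replace(vec![]);

    let at_1 = resolve(&base, &level1);
    let at_2 = resolve(&at_1, &level2);
    println!("{:#?}", at_1);
    println!("{:#?}", at_2);
}
```

A vector keeps the passes in their original order, which matters because passes are applied from top to bottom.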
CLI tool
This library comes with a simple command line tool you can install with:
cargo install sayit --features=cli
Interactive session:
sayit --accent examples/scotsman.ron
Apply to file:
cat filename.txt | sayit --accent examples/french.ron > newfile.txt
Dependencies
~2.7–4.5MB
~78K SLoC