8 releases

0.1.7 Sep 8, 2020
0.1.6 Jul 3, 2019
0.1.5 Jun 26, 2019

#390 in Authentication

2,419 downloads per month
Used in internment

MIT/Apache

455KB
33K SLoC

Rust 16K SLoC JavaScript 16K SLoC Python 187 SLoC // 0.3% comments

Memorable wordlist

This is a list of words that are intended to be memorable (and unobjectionable) for use in "correct horse battery staple" style passphrase generators. The word list is generated from a number of sources using a simple Python script, and is intended to be improved upon over time.

The word list is provided as a Rust crate, which also includes a few simple passphrase generators using the standard cryptographically secure random number generator. See the documentation for details.
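As a rough illustration of the idea (this is not the crate's API), here is a minimal Python sketch that draws words uniformly from a list using the standard library's secrets module until a target entropy is reached. The tiny WORDS list and the words_needed and passphrase functions are hypothetical stand-ins for illustration only.

```python
import math
import secrets

# Hypothetical stand-in list; the real word list is far longer.
WORDS = ["dog", "cat", "carrot", "castle", "rabbit", "lamp", "otter", "tulip"]


def words_needed(bits, list_size):
    """Number of uniformly chosen words needed to reach at least `bits` of entropy."""
    bits_per_word = math.log2(list_size)
    return math.ceil(bits / bits_per_word)


def passphrase(bits, words=WORDS, sep=" "):
    """Join enough randomly chosen words to reach `bits` of entropy."""
    count = words_needed(bits, len(words))
    # secrets.choice draws from the operating system's CSPRNG.
    return sep.join(secrets.choice(words) for _ in range(count))


print(passphrase(12))
```

Each uniformly chosen word contributes log2(list size) bits. With the 8-word list above that is 3 bits per word, so 12 bits takes four words. If one assumes a 2048-word list, each word contributes 11 bits and 44 bits also takes four words.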

Process

The algorithm to generate the list is in memorable-wordlist.py. Basically, I came up with a hokey value for words based on wanting words to be short, familiar, not bad, and ideally either concrete (meaning something you can visualize) or exciting. I expect I can do better, and intend to do so over time. Patches are most welcome, as are bug reports (e.g. if two words are significantly out of order on a subjective basis).
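To make the idea of a per-word value concrete, here is a purely illustrative sketch. The field names, weights, and ratings are invented for this example and are not taken from memorable-wordlist.py; they only show one way that shortness, familiarity, concreteness, and excitement could be combined, and how bad words could be excluded.

```python
from dataclasses import dataclass


@dataclass
class WordInfo:
    word: str
    prevalence: float    # hypothetical: fraction of people who know the word, 0..1
    concreteness: float  # hypothetical rating, 1 (abstract) .. 5 (concrete)
    arousal: float       # hypothetical rating, 1 (calm) .. 9 (exciting)
    bad: bool = False    # True if the word is on the "bad words" list


def value(info):
    """Higher is better; bad words always sort last."""
    if info.bad:
        return float("-inf")
    # Reward either concreteness or excitement, whichever is stronger.
    vivid = max(info.concreteness / 5, info.arousal / 9)
    # Penalize length so that shorter words rank higher.
    return info.prevalence + vivid - 0.1 * len(info.word)


# Made-up ratings, for illustration only.
candidates = [
    WordInfo("notwithstanding", 0.9, 1.2, 3.0),
    WordInfo("heck", 1.0, 2.0, 6.0, bad=True),
    WordInfo("dog", 1.0, 4.9, 5.0),
]
ranked = sorted(candidates, key=value, reverse=True)
print([c.word for c in ranked])
```

Under these made-up numbers a short, familiar, concrete word such as dog ranks first, a long abstract word ranks below it, and anything flagged as bad sorts to the end where it can be cut off.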

Note that my set of sources is the same as that of the EFF wordlist, but the EFF does not publish its algorithm, and the algorithm used is obviously screwed up, since neither dog nor cat appears in its lists. What could be more concrete than those?

Sources

The memorable-wordlist.py script downloads and uses a variety of lexical data to decide which words will be most appropriate. Several of the sources come from lexical research data compiled by The Center for Reading Research at Ghent. This data includes word ratings of age of acquisition, concreteness, valence and arousal. There is also a file full of word prevalence data (i.e. what fraction of people know the words).

We further use the frequency tables from the New General Service List, which is a set of words for English language learners to learn. In addition, we use a list of word frequencies compiled from the OpenSubtitles database.

Finally, we download a list of "bad words" from Luis von Ahn's research group, covering words that might be considered offensive in randomly generated output. Note that some of these aren't actually bad words, but are words that in combination might seem bad or irreverent.

Example output with 44 bits of entropy

space_delimited:

jaw carrot granddad scale
walrus stage sunshine flashlight
caramel drawer door snout
giant field rabbit handbook
actress toffee cola hear
tip pianist bike engineer
running house steamboat cash
one fossil leather waist
mail date castle quiz

snake_case:

deodorant_patch_alarm_steak
pig_butler_jukebox_cod
helmet_hockey_photographer_cushion
tweezers_snowy_sandwich_motel
square_neck_school_engine
buyer_cookie_treasure_telescope
bourbon_chick_wrap_paintbrush
schoolgirl_horizon_pupil_blender
laundry_oak_pint_job

kebab_case:

dining-equipment-bank-yogurt
jacket-boot-tulip-rodeo
poultry-tower-vegetable-backyard
porridge-mist-twig-eyelash
meatball-strawberry-chamber-gunshot
boot-thigh-news-post
office-otter-skirt-cent
town-magazine-menu-cracker
lamp-vinegar-laser-butterfly

camel_case:

StarDeodorantInsectNostril
CombMoustacheFountainPlum
HillHandshakeChairShadow
LaneOnionManagerPlayer
RavenPeakHandwritingInk
PickMeatballBottlePath
GroomHeartRackDuckling
LeverPatternRibFirefly
SnoutSleighThreadCarrot
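
The four styles above differ only in how the chosen words are joined. A small sketch with a hypothetical format_words helper (an illustration, not the crate's API) makes this explicit:

```python
def format_words(words, style):
    """Join words in one of the four styles shown in the examples."""
    if style == "space_delimited":
        return " ".join(words)
    if style == "snake_case":
        return "_".join(words)
    if style == "kebab_case":
        return "-".join(words)
    if style == "camel_case":
        # Capitalize each word and join without a separator.
        return "".join(w.capitalize() for w in words)
    raise ValueError(f"unknown style: {style}")


words = ["jaw", "carrot", "granddad", "scale"]
for style in ("space_delimited", "snake_case", "kebab_case", "camel_case"):
    print(format_words(words, style))
```

Since the separator carries no randomness, all four styles have the same entropy for the same number of words.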
