#markup-language #parser #html #markdown-parser #kami

bin+lib kami-parser

Kami tries to be a machine-first human-also-first markup language

22 releases (7 breaking)

0.8.1 Aug 5, 2022
0.7.0 Aug 2, 2022
0.6.4 Jul 26, 2022

#2589 in Parser implementations


57 downloads per month

Custom license

50KB
1K SLoC

KAMI


Kami is a machine-first human-also-first markup language, designed not for human readability, but for usability as an intermediary between our design desires and a machine's logic. Kami takes inspiration from Pandoc's Markdown flavor, and Textile, the latter of which is my favorite markup language. Its name stands for Katie Ampersand's Markup Instrument. I chose Instrument and not Language just because KAMI is a very cute acronym.

Kami files should use the .km extension, as in file.km.

Usage (In A Rust Crate)

The only public function is syntax::parse(), which takes a string of Kami source and returns an HTML string.

use kami_parser::syntax;

fn main() {
	println!("{}", syntax::parse("*bold text*")); // <b >bold text</b>
}

Philosophy

Seeing the similarities between Markdown and Kami, you might wonder why I'd bother making this. The reason is simple: Markdown is too human-centric. Of course, there is no one Markdown flavor, but the ones I've seen just focus too much on being something you can guess and read, and not something you can use. It's not necessarily a bad goal, but it's not one that works well with the way I like my things to function, as I've found it too limiting.

Kami is designed under the idea that markup languages are not like programming languages in the ways that they achieve their goals. A programming language is designed for humans. If it were more machine-centric, you'd get an assembly language, or something closer to it. A markup language should be somewhere in the middle. HTML sucks to use because of how machine-centric it is, but Markdown or Textile can be limiting because they focus too much on human readability and use-cases. In a world of these extremes, Kami tries to stand in the middle.

(This also leads me to wonder what something more in the middle of programming languages and assembly code would look like.)

Human readability is still a goal, but it's a goal that should never limit what can be done. Kami is strictly an intermediate between thoughts and HTML, and is not meant to be read by anyone, which, to me, is what markup languages should strive for.

Kami is ultimately meant to fulfill my needs. It is what I want out of a markup language, and while it tries to be for general use, it doesn't always care if it isn't. It might be worse for your specific goals, and that's okay. That's why there are so many markup languages - I'm simply not the first one to be dissatisfied with what already exists.

Specification

Bold italians and strong empaths (Bold, Italic, Strong and Emphasis)

Kami distinguishes Bold from Strong and Italic from Emphasis. This is because screen readers care about the distinction. Bold is surrounded with asterisks (*) and Italics are surrounded by underscores (_). Strong and emphasis are the same, just doubled (** and __). I used to struggle remembering this (Textile does it), so I came up with the mnemonic you see in the title of this section. I just memorized it along with this sequence * _ ** __. Hopefully that can help you too.

Subscript and Superscript

Subscript is surrounded with ~ and superscript is surrounded with a ^. They can contain spaces.

Underline and Strikethrough

Strikethrough text is surrounded by dashes (-) and underlined text is surrounded by double dashes (--). This syntax is subject to change because I think it sucks.

Hyperlinks

Hyperlinks use Markdown format: [Visible text](destination). The visible text part can contain any other inline tokens (like bold or images).

Images

Images are surrounded with exclamation marks, like this: !example.png!

To give them a hyperlink, simply put them in a Kami hyperlink like this: [!img.png!](example.net)

And to give them an alt text, simply give them an attribute: !img.png!{alt="A monkey eating a burrito as the sun illuminates them, making them look angelic"}

Spans

Spans are to be surrounded with at signs (@).

Inline Code

Inline code is to be surrounded with backticks, as is done in Markdown.

Headers

Headers are done the same way as in Markdown, with sequences of hashtags (#).

Lists

Unordered lists are marked with a * at the beginning of a paragraph. The space after the asterisk is important, and is part of the token. Ordered lists are marked with a #. at the beginning of the paragraph. The space after the dot is part of the token.

To nest lists inside each other, simply add more asterisks or hashtags, for example:

* Main list element
** Sublist element

Lists can be arbitrarily nested, which means you can nest an ordered list inside an unordered list and vice versa, in whatever configuration you wish, as many times as you wish.

Attributes

Everything mentioned here can have id, class, and any HTML attribute you might care about. Simply do this {#id .class1 .class2 attribute="value"} after the affected part, without a space in between. Note that tokens that have spaces as their last character (like in the case of lists) don't get that space removed. They keep that space.

For example, **text**{#hey} would be parsed into <strong id="hey">text</strong>, and [link](ampersandia.net){rel="me"} would be parsed into <a href="ampersandia.net" rel="me">link</a>.
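The mapping from an attribute sequence to HTML attributes can be sketched roughly as follows. This is a minimal illustration, not kami_parser's actual code: the function name attrs_to_html is hypothetical, merging all classes into a single class attribute is an assumption, and values containing spaces (such as a long alt text) are not handled.

```rust
/// Turns a Kami-style attribute sequence into a string of HTML attributes.
/// Illustrative only: not kami_parser's implementation, and values that
/// contain spaces are not supported by this sketch.
fn attrs_to_html(sequence: &str) -> String {
    // Drop the surrounding braces if present.
    let inner = sequence
        .trim()
        .trim_start_matches('{')
        .trim_end_matches('}');

    let mut id: Option<&str> = None;
    let mut classes: Vec<&str> = Vec::new();
    let mut others: Vec<&str> = Vec::new();

    for part in inner.split_whitespace() {
        if let Some(name) = part.strip_prefix('#') {
            id = Some(name); // #id
        } else if let Some(name) = part.strip_prefix('.') {
            classes.push(name); // .class
        } else {
            others.push(part); // already in name="value" form
        }
    }

    let mut out: Vec<String> = Vec::new();
    if let Some(name) = id {
        out.push(format!("id=\"{}\"", name));
    }
    if !classes.is_empty() {
        // Assumption: several classes end up in one class attribute.
        out.push(format!("class=\"{}\"", classes.join(" ")));
    }
    out.extend(others.iter().map(|s| s.to_string()));
    out.join(" ")
}
```

With this sketch, attrs_to_html("{#hey}") yields the id attribute seen in the strong example above, and the rel="me" sequence passes through unchanged.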

To give attributes to a paragraph simply start the paragraph with an attribute sequence.

To give attributes to format blocks (like the <ul> <ol> parts of lists) just put an attribute sequence before any of the elements of the block, like this:

{#id .class}
* list element

Inline HTML

Inline HTML is done simply by writing HTML in the file. If a line starts with an HTML tag, the line will not be treated as a paragraph (it won't be surrounded by the HTML <p> tag). If you want it to be surrounded, just add an empty attribute sequence at the beginning of the line.

To keep a line from being treated as a paragraph even if it doesn't start with an HTML tag, just make it start with a <>.

<title></title>
{} <iframe>
<> text

Would be parsed into

<title></title>
<p><iframe></p>
text
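The three cases in the example above can be sketched as a small per-line decision. This is only an illustration under stated assumptions: render_line is a hypothetical helper, not part of kami_parser, it only recognizes the empty attribute sequence {}, and it does no inline parsing.

```rust
/// Renders one line following the three cases shown in the example above.
/// Hypothetical helper, not kami_parser's code: it only understands the
/// empty attribute sequence `{}` and leaves inline tokens untouched.
fn render_line(line: &str) -> String {
    if let Some(rest) = line.strip_prefix("<>") {
        // `<>` marks a line that must not become a paragraph.
        rest.trim_start().to_string()
    } else if let Some(rest) = line.strip_prefix("{}") {
        // An empty attribute sequence forces the paragraph wrapper.
        format!("<p>{}</p>", rest.trim_start())
    } else if line.starts_with('<') {
        // A line that starts with an HTML tag is passed through as-is.
        line.to_string()
    } else {
        // Anything else is an ordinary paragraph.
        format!("<p>{}</p>", line)
    }
}
```

The `<>` check has to come before the generic `<` check, otherwise the marker would be mistaken for the start of an HTML tag.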

Escaping

Every token can be escaped with backslashes. Backslashes can be escaped, too. Escaping will make it so that the parser interprets a character as being just a character, and not a token. You can also escape entire inline sections by surrounding them \=like this=.
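A minimal sketch of single-character backslash escaping, assuming a pass that tags each character as literal or not before tokens are recognized. The function mark_escapes is hypothetical and not taken from kami_parser; how a trailing backslash is treated is an assumption, and escaping whole inline sections is not covered.

```rust
/// Tags each character as escaped (literal) or not.
/// Returns (character, is_literal) pairs with the escaping backslashes
/// removed. Hypothetical helper, not kami_parser's implementation.
fn mark_escapes(input: &str) -> Vec<(char, bool)> {
    let mut out = Vec::new();
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                // The character after a backslash is taken literally,
                // including another backslash.
                Some(next) => out.push((next, true)),
                // Assumption: a trailing backslash is kept as plain text.
                None => out.push(('\\', true)),
            }
        } else {
            out.push((c, false));
        }
    }
    out
}
```

A later tokenizing step would then skip every pair whose flag is true, so an escaped asterisk never opens bold text.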

Tables

KAMI tables are, for the most part, quite simple.

| Data         | More data  | Some other data |
| Lots of data | You get it |                 |

(nvim forced me to use spaces for indentation and I just kinda allowed it, but you can prettify your tables with tabs instead. Spaces will be put in the final output.)

You can make a cell be a header by starting it with |*. Any cell can be a header, not only the top ones. This allows for vertical tables.

You can set a cell's colspan and rowspan with |cXrY, where X is colspan and Y is rowspan. If you only want rowspan, only do |rY, and if you only want colspan, do |cX. cXrY is as valid as rXcY.

You can set a cell's attributes like this |{attr}.

Attributes, rowspan, colspan and the header mark can go in any order: |r5*{#id}c1 is a valid cell starter. Just try to make them readable for yourself. I personally do |rXcY*{attrs}.
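Reading a cell starter in any order can be sketched as a loop that keeps consuming markers until none matches. This is a rough illustration, not kami_parser's code: CellStart and parse_cell_start are hypothetical names, and the sketch would misread cell text that itself begins with c or r followed by digits.

```rust
/// What a cell starter such as `|r5*{#id}c1` says about its cell.
#[derive(Debug, Default, PartialEq)]
struct CellStart {
    header: bool,
    colspan: Option<u32>,
    rowspan: Option<u32>,
    attrs: Option<String>, // raw text between `{` and `}`
}

/// Reads the markers after the leading `|`, in any order, and returns
/// them together with the remaining cell text.
/// Hypothetical helper, not kami_parser's implementation.
fn parse_cell_start(cell: &str) -> (CellStart, &str) {
    let mut info = CellStart::default();
    let mut rest = cell.strip_prefix('|').unwrap_or(cell);

    loop {
        if let Some(after) = rest.strip_prefix('*') {
            info.header = true;
            rest = after;
        } else if rest.starts_with('{') {
            match rest.find('}') {
                Some(end) => {
                    info.attrs = Some(rest[1..end].to_string());
                    rest = &rest[end + 1..];
                }
                None => break, // unterminated sequence: treat as cell text
            }
        } else if rest.starts_with('c') || rest.starts_with('r') {
            // Count the ASCII digits that follow the `c` or `r`.
            let digits = rest[1..]
                .chars()
                .take_while(|ch| ch.is_ascii_digit())
                .count();
            if digits == 0 {
                break; // a word starting with c or r, not a span marker
            }
            let value = rest[1..1 + digits].parse::<u32>().ok();
            if rest.starts_with('c') {
                info.colspan = value;
            } else {
                info.rowspan = value;
            }
            rest = &rest[1 + digits..];
        } else {
            break;
        }
    }
    (info, rest)
}
```

Because each marker is handled independently inside the loop, |c2r3 and |r3c2 produce the same result.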

To put attributes on a row, put an attribute sequence after the last cell in the row.

To put attributes on a table, put an attribute sequence before the table starts, as you would do with lists.

Dependencies

~0.5–1MB
~25K SLoC