16 releases
Version | Date |
---|---|
0.1.14 | Jul 9, 2024 |
0.1.13 | Feb 9, 2024 |
0.1.12 | Jan 30, 2024 |
0.1.8 | Nov 15, 2023 |
0.1.4 | Nov 6, 2022 |
gigtag
A lightweight, textual tagging system aimed at DJs for managing custom metadata.
Structure
A gig tag is a flat structure with the following pre-defined fields (components):
- Label
- Facet (including an optional calendar date)
- Prop(ertie)s
All components are optional with the following restrictions:
- A valid gig tag must have a label or a facet.
- A gig tag with only a facet and neither a label nor props is valid only if the facet has a date suffix.
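The two restrictions above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the `GigTag` struct, its field names, and the helper `has_date_suffix` are hypothetical and not taken from the crate's API, and the content rules for the individual components (described in the following sections) are not checked here.

```rust
/// Hypothetical in-memory model of a gig tag (illustrative names only).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct GigTag {
    pub label: Option<String>,
    pub facet: Option<String>,
    /// Ordered name/value pairs; duplicate names are allowed.
    pub props: Vec<(String, String)>,
}

/// Returns true if `facet` ends with '@' followed by 8 ASCII digits.
fn has_date_suffix(facet: &str) -> bool {
    let bytes = facet.as_bytes();
    if bytes.len() < 9 {
        return false;
    }
    let tail = &bytes[bytes.len() - 9..];
    tail[0] == b'@' && tail[1..].iter().all(u8::is_ascii_digit)
}

impl GigTag {
    /// Applies only the two structural restrictions listed above.
    pub fn is_valid(&self) -> bool {
        match (&self.label, &self.facet) {
            // Neither a label nor a facet: never valid.
            (None, None) => false,
            // A label is sufficient on its own.
            (Some(_), _) => true,
            // Facet without a label: needs props or a date suffix.
            (None, Some(facet)) => !self.props.is_empty() || has_date_suffix(facet),
        }
    }
}
```

With this sketch, a tag that carries only the facet `spotify` is rejected, while a tag that carries only the facet `played@20220625` is accepted.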
Label
A label is a non-empty string that contains arbitrary text without leading/trailing whitespace.
Labels are supposed to be edited by users and are displayed verbatim in the UI.
Examples
Label | Comment |
---|---|
Wishlist | a single word |
FloorFiller | multiple words concatenated in PascalCase |
Floor Filler | multiple words separated by whitespace |
Facet
The same content rules that apply to labels also apply to facets. Moreover, facets must not start with a slash (/), which would otherwise interfere with the serialization format (see below).
Facets serve a different semantic purpose than labels. They are used for categorizing, namespacing or grouping a set of labels or for defining the context of associated properties.
Facets are supposed to represent pre-defined identifiers that are neither editable nor directly displayed in the UI.
Date-like facets
A reserved suffix can be used to encode a calendar date into a facet. Facets that end with an @ character followed by 8 decimal digits are considered date-like facets. The digits are supposed to encode an ISO 8601 calendar date without a time zone in the format yyyyMMdd.
Facets are considered date-like even if the 8 decimal digits do not encode a valid date. These less restrictive constraints have been chosen deliberately to allow using regular expressions for recognizing date-like facets.
The @ character of the date suffix must follow the preceding text without any intermediate whitespace. Thus the prefix that remains after stripping the date-like suffix is still a valid facet.
The following regular expressions could be used:
Regex | Description |
---|---|
(^|[^\s])@\d{8}$ | Recognize date-like facets |
[\s]+@\d{8}$ | Reject facets with a date-like suffix if preceded by whitespace |
Valid examples
Facet | Description |
---|---|
spotify | a tag for encoding properties related to Spotify |
@20220625 | date-like facet without a prefix that denotes the calendar day 2022-06-25 in any time zone |
wishlist@20220625 | date-like facet with prefix wishlist that denotes the calendar day 2022-06-25 in any time zone |
@00000000 | date-like facet without a prefix and an invalid date |
abc xyz@99999999 | date-like facet with prefix abc xyz and an invalid date |
Invalid examples
Facet | Description |
---|---|
played @20220625 | invalid date-like facet with a prefix containing trailing whitespace before the date-like suffix |
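The date-suffix rules can also be checked without a regular expression engine. The following sketch uses only the Rust standard library; `FacetKind` and `classify_facet` are hypothetical names chosen for illustration and are not part of the crate. Like the regular expressions above, it does not verify that the digits form a real calendar date.

```rust
/// Outcome of inspecting a facet for a date-like suffix (illustrative names).
#[derive(Debug, PartialEq)]
pub enum FacetKind<'a> {
    /// The facet does not end with '@' plus 8 digits.
    Plain,
    /// Date-like facet: the (possibly empty) prefix and the 8 digits.
    DateLike { prefix: &'a str, digits: &'a str },
    /// A date-like suffix that is preceded by whitespace.
    Invalid,
}

pub fn classify_facet(facet: &str) -> FacetKind<'_> {
    // The suffix occupies the last 9 bytes: '@' plus 8 ASCII digits.
    let split = match facet.len().checked_sub(9) {
        Some(split) if facet.is_char_boundary(split) => split,
        _ => return FacetKind::Plain,
    };
    let (prefix, suffix) = facet.split_at(split);
    let digits = match suffix.strip_prefix('@') {
        Some(d) if d.bytes().all(|b| b.is_ascii_digit()) => d,
        _ => return FacetKind::Plain,
    };
    // Whitespace directly before '@' makes the facet invalid.
    if prefix.ends_with(char::is_whitespace) {
        return FacetKind::Invalid;
    }
    FacetKind::DateLike { prefix, digits }
}
```

Applied to the examples above, `wishlist@20220625` yields the prefix `wishlist` and the digits `20220625`, `@00000000` is still date-like, and `played @20220625` is classified as invalid.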
Prop(ertie)s
Custom properties, abbreviated as props, can be attached to tags.
Properties are represented as a non-empty, ordered list of name/value pairs.
Names are non-empty strings that contain arbitrary text without leading/trailing whitespace. There are no restrictions regarding the uniqueness of names, i.e. duplicate names are permitted.
Values are arbitrary strings without any restrictions. Empty values are permitted.
Applications are responsible for interpreting the names and values in their respective context. Facets could be used for defining this context.
Serialization
Single tag
Individual tags are encoded as URIs:
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
authority = [userinfo "@"] host [":" port]
Only the path, query, and fragment components may be present. All other components must be absent, i.e. the URI string must contain neither a scheme nor an authority component.
The following table defines the component mapping:
Tag component | URI component | Percent-encoded character set |
---|---|---|
label | fragment | fragment percent-encode set + '%' |
facet | path | path percent-encode set + '%' |
props (name/value) | query | query percent-encode set + '%' + '&' + '=' |
Tags, or rather their URIs, are serialized as text, and the components are percent-encoded according to RFC 2396/1738. The table above specifies which characters need to be encoded for each tag component. Property names and values are encoded separately.
Empty components are considered as absent when parsing a gig tag from a URI string.
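The component mapping can be illustrated with a small encoder. This is a sketch under a loudly stated assumption: instead of the exact percent-encode sets from the table, it escapes every byte except ASCII letters, digits, and `-`, `.`, `_`, `~` (plus `@` in facets so that the date suffix stays readable). That is a conservative superset chosen for brevity, so it may escape more characters than the crate does. The function names `percent_encode` and `encode_tag` are hypothetical.

```rust
/// Percent-encodes every byte except ASCII letters, digits, '-', '.', '_', '~'
/// and the bytes in `extra_safe`. ASSUMPTION: a conservative superset of the
/// percent-encode sets named in the table, not the exact sets.
fn percent_encode(input: &str, extra_safe: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        if b.is_ascii_alphanumeric() || b"-._~".contains(&b) || extra_safe.contains(&b) {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Builds `facet ["?" name "=" value ("&" ...)] ["#" label]`.
/// Absent or empty components are simply omitted.
pub fn encode_tag(label: Option<&str>, facet: Option<&str>, props: &[(&str, &str)]) -> String {
    let mut uri = String::new();
    if let Some(facet) = facet {
        // facet -> path; '@' is kept literal for the date suffix.
        uri.push_str(&percent_encode(facet, b"@"));
    }
    if !props.is_empty() {
        // props -> query; names and values are encoded separately.
        let pairs: Vec<String> = props
            .iter()
            .map(|(name, value)| {
                format!("{}={}", percent_encode(name, b""), percent_encode(value, b""))
            })
            .collect();
        uri.push('?');
        uri.push_str(&pairs.join("&"));
    }
    if let Some(label) = label {
        // label -> fragment.
        uri.push('#');
        uri.push_str(&percent_encode(label, b""));
    }
    uri
}
```

For instance, `encode_tag(Some("For you"), Some("wishlist@20220625"), &[])` produces `wishlist@20220625#For%20you`. Because over-escaped characters are still ordinary percent-escapes, a decoder that percent-decodes each component should recover the original text.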
Examples
The following examples show variations of the encoded string with empty components that are ignored when decoding the URI.
Encoded | Facet | Date | Label | Props: Names | Props: Values |
---|---|---|---|---|---|
#MyTag ?#MyTag | | | MyTag | | |
wishlist@20220625#For%20you | wishlist@20220625 | 2022-06-25 | For you | | |
played@20220625 played@20220625? played@20220625# played@20220625?# | played@20220625 | 2022-06-25 | | | |
audio-features?energy=0.78&valence=0.61 audio-features?energy=0.78&valence=0.61# | audio-features | | | energy valence | 0.78 0.61 |
Examples (invalid)
The following tokens do not represent valid gig tags:
Encoded | Comment |
---|---|
https://#MyTag | URL scheme/authority are present |
My%20Tag | Only a facet without a date, neither a label nor props |
/my-facet#Label | Facet starts with a / |
wishlist%20@20220625#Label | Date suffix in facet is prefixed by whitespace |
?=val#Label | Empty property name |
?name=my+val#My label | Special characters like + and whitespace are not percent-encoded |
# | Empty label is considered as absent |
? | Empty facet and props are considered as absent |
?# | Empty facet, props, and label are considered as absent |
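The valid and invalid examples can be reproduced with a simplified decoder. This is a sketch, not the crate's parser: `Decoded`, `decode_tag`, and the helpers are hypothetical names, a `:` in the raw path is used as a rough stand-in for detecting a scheme, and the exact percent-encode sets are not verified, so a stray `+` alone would slip through.

```rust
#[derive(Debug, PartialEq)]
pub struct Decoded {
    pub label: Option<String>,
    pub facet: Option<String>,
    pub props: Vec<(String, String)>,
}

/// Decodes %XX escapes; fails on malformed escapes or invalid UTF-8.
fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hex = input.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

/// Returns the prefix before a trailing '@' plus 8 ASCII digits, if present.
fn date_prefix(facet: &str) -> Option<&str> {
    let split = facet.len().checked_sub(9)?;
    let digits = facet.get(split..)?.strip_prefix('@')?;
    digits.bytes().all(|b| b.is_ascii_digit()).then(|| &facet[..split])
}

fn non_empty(s: String) -> Option<String> {
    if s.is_empty() { None } else { Some(s) }
}

pub fn decode_tag(token: &str) -> Option<Decoded> {
    // Whitespace must have been percent-encoded.
    if token.chars().any(char::is_whitespace) {
        return None;
    }
    let (rest, fragment) = token.split_once('#').unwrap_or((token, ""));
    let (path, query) = rest.split_once('?').unwrap_or((rest, ""));
    // ASSUMPTION: a ':' in the raw path is treated as a scheme indicator.
    if path.starts_with('/') || path.contains(':') {
        return None;
    }
    let facet = non_empty(percent_decode(path)?);
    let label = non_empty(percent_decode(fragment)?);
    let mut props = Vec::new();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        let (name, value) = pair.split_once('=').unwrap_or((pair, ""));
        let name = percent_decode(name)?;
        if name.is_empty() || name.trim() != name {
            return None;
        }
        props.push((name, percent_decode(value)?));
    }
    // Labels and facets must not have leading/trailing whitespace.
    let trimmed = |s: &Option<String>| s.as_deref().map_or(true, |t| t.trim() == t);
    if !trimmed(&label) || !trimmed(&facet) {
        return None;
    }
    let has_date = match facet.as_deref().and_then(|f| date_prefix(f)) {
        Some(prefix) if prefix.ends_with(char::is_whitespace) => return None,
        Some(_) => true,
        None => false,
    };
    let valid = label.is_some() || (facet.is_some() && (!props.is_empty() || has_date));
    valid.then(|| Decoded { label, facet, props })
}
```

In this sketch `decode_tag("#MyTag")` and `decode_tag("?#MyTag")` yield the same result, while every token from the table of invalid examples returns `None`.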
Multiple tags
Formatting
Multiple tags are formatted and stored as text by concatenating the corresponding, encoded URIs. Subsequent URIs are separated by whitespace, e.g. a single ASCII space character.
Retro-fitting
Often it is not possible to store the encoded gig tags in a reserved field. In this case gig tags can be appended to any text field by separating them with arbitrary whitespace from the preceding text.
Parsing
Text is split into tokens that are separated by whitespace. Parsing starts with the last token and continues from back to front. It stops when encountering a token that could not be parsed as a valid gig tag.
Retro-fitting
The first token that could not be parsed as a valid gig tag is considered the last token of the preceding text. The preceding text including this token and the whitespace until the first valid gig tag token must be preserved as an undecoded prefix.
When re-encoding the gig tags the undecoded prefix that was captured during parsing must be prepended to the re-encoded gig tags string. This rule ensures that only whitespace characters could get lost during a decode/re-encode roundtrip, i.e. when unintentionally parsing arbitrary words from the preceding text as valid gig tags (false positives).
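The back-to-front parsing and the preserved prefix can be sketched as follows. The functions `split_trailing_tags` and `rejoin` are hypothetical names, and the single-tag parser is passed in as a closure so that the sketch stays independent of any concrete decoder.

```rust
/// Splits `text` into the undecoded prefix and the trailing run of tokens
/// that `parse` accepts, scanning tokens from back to front.
pub fn split_trailing_tags<'a, T>(
    text: &'a str,
    parse: impl Fn(&str) -> Option<T>,
) -> (&'a str, Vec<T>) {
    let mut tags = Vec::new();
    // Everything before `prefix_end` belongs to the undecoded prefix.
    let mut prefix_end = text.len();
    loop {
        let head = text[..prefix_end].trim_end();
        // The last token starts right after the last whitespace character.
        let token_start = head
            .char_indices()
            .rev()
            .find(|(_, c)| c.is_whitespace())
            .map_or(0, |(i, c)| i + c.len_utf8());
        let token = &head[token_start..];
        if token.is_empty() {
            break;
        }
        match parse(token) {
            Some(tag) => {
                tags.push(tag);
                prefix_end = token_start;
            }
            // First token that is not a gig tag: stop scanning.
            None => break,
        }
    }
    // Tags were collected back to front; restore their original order.
    tags.reverse();
    (&text[..prefix_end], tags)
}

/// Prepends the undecoded prefix to the re-encoded tags.
pub fn rejoin(prefix: &str, encoded_tags: &[String]) -> String {
    let needs_separator = !encoded_tags.is_empty()
        && !prefix.is_empty()
        && !prefix.ends_with(char::is_whitespace);
    let separator = if needs_separator { " " } else { "" };
    format!("{}{}{}", prefix, separator, encoded_tags.join(" "))
}
```

With a toy parser that accepts only tokens starting with `#`, the text `Deep House  #Wishlist #FloorFiller` splits into the prefix `Deep House  ` (including both spaces) and two tags. Only the whitespace between the tags themselves is normalized when they are joined again.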
Storage
File metadata
The text with the encoded gig tags is appended (separated by whitespace) to the Content Group field of audio files:
- ID3v2: GRP1 (primary/preferred) / TIT1 (traditional/fallback)
- Vorbis: GROUPING
- MPEG-4: ©grp
License
Licensed under the Mozilla Public License 2.0 (MPL-2.0) (see MPL-2.0.txt or https://www.mozilla.org/MPL/2.0/).
Permissions of this copyleft license are conditioned on making available source code of licensed files and modifications of those files under the same license (or in certain cases, one of the GNU licenses). Copyright and license notices must be preserved. Contributors provide an express grant of patent rights. However, a larger work using the licensed work may be distributed under different terms and without source code for files added in the larger work.
Contribution
Any contribution intentionally submitted for inclusion in the work by you shall be licensed under the Mozilla Public License 2.0 (MPL-2.0).
It is required to add the following header with the corresponding SPDX short identifier to the top of each file:
// SPDX-License-Identifier: MPL-2.0