POM Language Specification, v. 0.1.0

Introduction

POM is a “markup” language, primarily intended for software configuration, and designed to be easy to parse without any third-party libraries (e.g. for looking up Unicode classes or matching regular expressions), while still being terse and legible. The POM specification is quite strict, to avoid cases where dubious files can be accepted by some parsers while being rejected by others.

Every file describes a configuration, which is a mapping from keys to values. A key is a string consisting of components separated by dots (.). A value is a string whose interpretation is entirely decided by the application. Notably there is no distinction in POM’s syntax between, say, the number 5 and the string 5. A configuration, such as the one obtained from the POM file

[ingredients.sugar]
amount = 50 g
type = brown

[ingredients.flour]
amount = 100 g
type = all-purpose

[baking]
temperature = 150 °C
time = 35 min

can either be seen as a simple mapping from keys to values:

Key                         Value
ingredients.sugar.amount    50 g
ingredients.sugar.type      brown
ingredients.flour.amount    100 g
ingredients.flour.type      all-purpose
baking.temperature          150 °C
baking.time                 35 min

or a tree of keys, with a value associated to each leaf node:

(root)
  ingredients
    sugar
      type
        brown
      amount
        50 g
    flour
      type
        all-purpose
      amount
        100 g
  baking
    temperature
      150 °C
    time
      35 min
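
The relationship between the two views can be sketched in Python. This is only an illustration, not part of the specification: the function name to_tree is hypothetical, and the sketch assumes that no key is also a prefix of another key (for example, both a and a.b being assigned).

```python
def to_tree(config: dict[str, str]) -> dict:
    """Turn a flat {dotted key: value} mapping into nested dictionaries."""
    root: dict = {}
    for key, value in config.items():
        # Every component except the last names an interior node.
        *parents, leaf = key.split(".")
        node = root
        for part in parents:
            node = node.setdefault(part, {})
        # ASSUMPTION: `leaf` is not already an interior node.
        node[leaf] = value
    return root


flat = {
    "ingredients.sugar.amount": "50 g",
    "ingredients.sugar.type": "brown",
    "ingredients.flour.amount": "100 g",
    "ingredients.flour.type": "all-purpose",
    "baking.temperature": "150 °C",
    "baking.time": "35 min",
}
tree = to_tree(flat)
```

Here tree["baking"]["time"] holds the same string as flat["baking.time"]; the values themselves are untouched.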

Error handling

All error conditions are described in this specification. A general-purpose POM parser should not reject a file in any other case, outside of exceptional circumstances such as running out of memory. When an error occurs, it should be reported, ideally with information about the file name and line number, and the file must be entirely rejected (i.e. parsers must not attempt to preserve only the correct parts of an erroneous file). Warnings may also be issued according to the judgment of the parser author.

Text encoding

All POM files are encoded using UTF-8. Both LF and CRLF line endings may be used (see below). If invalid UTF-8 is encountered, including overlong sequences and UTF-16 surrogate halves (U+D800–U+DFFF), an error occurs.
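
As an illustration, a parser written in Python could lean on the built-in strict UTF-8 decoder, which rejects overlong sequences and encoded surrogate halves. The function name decode_pom_bytes is hypothetical, and a parser in another language would need an equivalent check of its own.

```python
def decode_pom_bytes(data: bytes) -> str:
    """Decode the raw bytes of a POM file, rejecting invalid UTF-8."""
    try:
        # The strict handler raises on overlong sequences and on
        # encoded surrogate halves, as well as on truncated sequences.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 at byte offset {exc.start}") from exc
```

Note that this decoder keeps a leading byte order mark as U+FEFF, so it still has to be skipped during parsing.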

Valid keys/values

Keys in a POM file may contain the following characters

A non-empty string containing only these characters is a valid key if and only if it does not start or end with a dot and does not contain two dots in a row (..).
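
The structural rule above can be sketched as follows. The set of permitted characters used here is an assumption inferred from the example keys in this document (ASCII letters and digits, the characters * - _ / and the dot, plus any non-ASCII character); it is not taken from the normative list. The names is_key_char and is_valid_key are hypothetical.

```python
import string

# ASSUMPTION: inferred from example keys such as 0-*/_description_/*-0
# and œufs.quantité; consult the normative character list before relying on it.
ASSUMED_ASCII_KEY_CHARS = set(string.ascii_letters + string.digits + "*-_/.")


def is_key_char(ch: str) -> bool:
    return ch in ASSUMED_ASCII_KEY_CHARS or ord(ch) >= 0x80


def is_valid_key(key: str) -> bool:
    # Must be non-empty and made only of permitted characters.
    if not key or not all(is_key_char(ch) for ch in key):
        return False
    # Must not start or end with a dot, or contain two dots in a row.
    return not (key.startswith(".") or key.endswith(".") or ".." in key)
```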

Any string of non-zero Unicode scalar values (U+0001–U+10FFFF, but not U+D800–U+DFFF) is a valid value.

Parsing

If a “byte order mark” of EF BB BF appears at the start of the file, it is ignored. Every carriage return character (U+000D) which immediately precedes a line feed (U+000A) is deleted. Then, if any control characters in the range U+0000 to U+001F other than the line feed and horizontal tab (U+0009) are present in the file, an error occurs.
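
A minimal sketch of this preprocessing step, assuming the file has already been decoded from UTF-8 into a Python string (so the byte order mark appears as U+FEFF). The function name preprocess is hypothetical.

```python
def preprocess(text: str) -> str:
    """Apply the steps that precede line-by-line parsing."""
    # Ignore a byte order mark at the very start of the file.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Delete every carriage return that immediately precedes a line feed.
    text = text.replace("\r\n", "\n")
    # Any remaining control character other than LF and tab is an error.
    for offset, ch in enumerate(text):
        if ord(ch) < 0x20 and ch not in "\n\t":
            raise ValueError(
                f"control character U+{ord(ch):04X} at offset {offset}"
            )
    return text
```

A carriage return that is not directly followed by a line feed survives the replacement and is then reported as an error by the control-character check.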

The current-section is a string variable which should be maintained during parsing. It is initially equal to the empty string.

An accepted-space is either a space (U+0020) or horizontal tab (U+0009) character.

Parsing now proceeds line-by-line, with lines being delimited by line feed characters. For each line:

  1. Any accepted-spaces that appear at the start of the line are removed.
  2. If the line begins with #%disable warnings or #%enable warnings, warnings should be disabled/enabled if any are implemented.
  3. If the line is empty or begins with #, parsing proceeds to the next line.
  4. If the line begins with [, it is interpreted as a section header. In this case:
    1. If the line does not end with ] optionally followed by any number of accepted-spaces, an error occurs.
    2. The current-section is set to the text in between the initial [ and final ] (white space after the [ and before the ] is not trimmed).
    3. If the new current-section is not empty and not a valid key (see above), an error occurs.
  5. Otherwise, the line is now interpreted as a key-value assignment. In this case:
    1. If the line does not contain an equal sign (=), an error occurs.
    2. The relative-key is the text preceding the =, not including any space or horizontal tab characters immediately before the =.
    3. If the relative-key is not a valid key (see above), an error occurs.
    4. Let c be the first character after the = and any succeeding accepted-spaces.
    5. If c is " (U+0022 QUOTATION MARK) or ` (U+0060 GRAVE ACCENT), the value is quoted. In this case,
      1. The value spans from the first character after c to (but not including) the next unescaped instance of c in the file (which may be on a different line).
      2. Escape sequences in the value are replaced as described below.
      3. After the closing c, there may be any number of accepted-spaces, then a line feed or the end of the file must follow; otherwise, an error occurs.
    6. Otherwise, the value is unquoted. In this case,
      1. accepted-spaces at the end of the line are removed.
      2. The value is the exact text starting from c and going to the end of the line (escape sequences are not processed).
    7. If the value is not a valid value (i.e. it contains a null character), an error occurs.
    8. The key is equal to the relative-key if the current-section is empty; otherwise, it is equal to the concatenation of current-section, a dot, and the relative-key.
    9. The value is assigned to the key.
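
The steps above can be sketched in Python as follows. This is a deliberately partial illustration, not a conforming parser: it assumes the text has already been through BOM, CRLF, and control-character handling, it covers section headers and unquoted values only (quoted values raise NotImplementedError), and it implements no warnings. The names PomError, key_ok, and parse_unquoted are hypothetical, and key_ok uses a character set inferred from this document's examples rather than the normative list. The duplicate-key check follows the "Duplicate key" case in the Errors examples.

```python
class PomError(Exception):
    """Hypothetical error type; the message includes the line number."""


def key_ok(key: str) -> bool:
    # ASSUMPTION: permitted characters inferred from this document's examples.
    def char_ok(ch: str) -> bool:
        return ord(ch) >= 0x80 or ch.isalnum() or ch in "*-_/."

    if not key or not all(char_ok(ch) for ch in key):
        return False
    return not (key.startswith(".") or key.endswith(".") or ".." in key)


def parse_unquoted(text: str) -> dict[str, str]:
    config: dict[str, str] = {}
    section = ""  # the current-section
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.lstrip(" \t")  # step 1
        if line == "" or line.startswith("#"):  # steps 2 and 3
            continue
        if line.startswith("["):  # step 4
            header = line.rstrip(" \t")
            if not header.endswith("]"):
                raise PomError(f"line {number}: no closing ]")
            section = header[1:-1]
            if section and not key_ok(section):
                raise PomError(f"line {number}: invalid section name")
            continue
        if "=" not in line:  # step 5.1
            raise PomError(f"line {number}: missing equals")
        relative, _, rest = line.partition("=")
        relative = relative.rstrip(" \t")  # step 5.2
        if not key_ok(relative):  # step 5.3
            raise PomError(f"line {number}: invalid key")
        rest = rest.lstrip(" \t")  # step 5.4
        if rest[:1] in ('"', "`"):  # step 5.5, not covered here
            raise NotImplementedError("quoted values are outside this sketch")
        value = rest.rstrip(" \t")  # step 5.6
        if "\0" in value:  # step 5.7
            raise PomError(f"line {number}: null character in value")
        key = f"{section}.{relative}" if section else relative  # step 5.8
        if key in config:  # listed as an error in the Errors examples
            raise PomError(f"line {number}: duplicate key {key!r}")
        config[key] = value  # step 5.9
    return config
```

For instance, parse_unquoted("[baking]\ntime = 35 min") maps the key baking.time to the string 35 min.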

Escape sequences

POM defines the following escape sequences, which may appear in quoted values. If a backslash character occurs in a quoted value but does not form a defined escape sequence, an error occurs.

Escape sequence   Value
\n                Line feed (U+000A)
\r                Carriage return (U+000D)
\t                Horizontal tab (U+0009)
\\                Literal \ (U+005C)
\"                Literal " (U+0022)
\'                Literal ' (U+0027)
\`                Literal ` (U+0060)
\,                Literal \, (U+005C U+002C)
\xNM              ASCII character with code NM, interpreted as hexadecimal (must be in the range 01–7F).
\u{digits}        Unicode code point digits, interpreted as hexadecimal. digits must be 1–6 characters long, and may contain leading zeros, but must not be zero and must not be a UTF-16 surrogate half D800–DFFF.
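
The table above can be sketched as a decoding routine in Python. This is an illustration under stated assumptions: the function name unescape is hypothetical, hexadecimal digits are accepted in either case (the examples in this document use both), and a \u value above 10FFFF is treated as an error because it is not a Unicode code point.

```python
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "\\": "\\",
    '"': '"', "'": "'", "`": "`",
    ",": "\\,",  # \, stays as backslash + comma (see the table above)
}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def unescape(raw: str) -> str:
    """Replace escape sequences in the raw text of a quoted value."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] != "\\":
            out.append(raw[i])
            i += 1
            continue
        kind = raw[i + 1:i + 2]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind == "x":
            digits = raw[i + 2:i + 4]
            if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
                raise ValueError("invalid \\x escape")
            code = int(digits, 16)
            if not 0x01 <= code <= 0x7F:
                raise ValueError("\\x escape out of range 01-7F")
            out.append(chr(code))
            i += 4
        elif kind == "u" and raw[i + 2:i + 3] == "{":
            end = raw.find("}", i + 3)
            digits = raw[i + 3:end] if end != -1 else ""
            if not 1 <= len(digits) <= 6 or not set(digits) <= HEX_DIGITS:
                raise ValueError("invalid \\u escape")
            code = int(digits, 16)
            if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
                raise ValueError("\\u escape is not an allowed code point")
            out.append(chr(code))
            i = end + 1
        else:
            raise ValueError("invalid escape sequence")
    return "".join(out)
```

The digit characters are checked explicitly before calling int, since int(..., 16) on its own would also accept signs, underscores, and surrounding whitespace.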

Lists

Although POM does not have a way of specially designating a value as being a list, there is a recommended syntax for encoding them. Specifically, a value can be treated as a list by first splitting it into comma-delimited parts, treating \, as a literal comma in a list entry and \\ as a literal backslash, then removing any accepted-spaces and line feeds surrounding list entries.

List entries may be empty, but if the last entry in a list is empty, it is removed (if there are two or more empty entries at the end of a list, only one is removed). As a consequence, an empty string is considered to be an empty list.

If a list’s order is irrelevant and it might be large or benefit from labelling its entries, a key prefix should be used instead (see the ingredients “list” in the opening example).

Examples

The following lines describe 3-entry lists.

POM line                                Entry 1      Entry 2      Entry 3
fonts = monospace, sans-serif, serif    monospace    sans-serif   serif
files = " foo.txt, weird\,name,z "      foo.txt      weird,name   z
things = \,,,76                         ,            (empty)      76
empties = ,,,                           (empty)      (empty)      (empty)
escapees = \\,\a,\,                     \            \a           ,
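
The recommended list syntax can be sketched in Python as follows. The function name split_list is hypothetical, and the input is assumed to be a value as produced by parsing (that is, after any escape sequences in a quoted value have been replaced).

```python
def split_list(value: str) -> list[str]:
    """Split a parsed POM value into list entries."""
    entries = []
    current = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and value[i + 1:i + 2] in (",", "\\"):
            # \, is a literal comma and \\ is a literal backslash.
            current.append(value[i + 1])
            i += 2
        elif ch == ",":
            # An unescaped comma ends the current entry.
            entries.append("".join(current))
            current = []
            i += 1
        else:
            # Anything else, including a backslash before another
            # character, is kept as-is.
            current.append(ch)
            i += 1
    entries.append("".join(current))
    # Remove accepted-spaces and line feeds surrounding each entry.
    entries = [entry.strip(" \t\n") for entry in entries]
    # A single empty entry at the end of the list is removed.
    if entries[-1] == "":
        entries.pop()
    return entries
```

With this sketch, an empty value yields an empty list, and a value of three commas yields three empty entries.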

Merging configurations

A configuration B can be merged into another configuration A by parsing both of them and setting the value associated with a key k to be

  1. The value associated with k in B, if any.
  2. Otherwise, the value associated with k in A, if any.

(Likewise, an ordered series of configurations A1, …, An can be merged by merging A2 into A1, then A3 into the resulting configuration, etc.)

This is useful, for example, when you want to have a global configuration for a piece of software installed on a multi-user machine where individual settings can be overridden by each user (in this case, the user configuration would be merged into the global configuration).
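
A minimal sketch of merging, assuming each parsed configuration is held as a Python dictionary from keys to values. The function names merge and merge_all, and the sample settings, are hypothetical.

```python
from functools import reduce


def merge(a: dict[str, str], b: dict[str, str]) -> dict[str, str]:
    """Merge configuration b into configuration a, returning a new mapping."""
    merged = dict(a)   # start from the values in a
    merged.update(b)   # values in b take precedence
    return merged


def merge_all(configs: list[dict[str, str]]) -> dict[str, str]:
    """Merge an ordered series A1, ..., An from left to right."""
    return reduce(merge, configs, {})


# Hypothetical global and per-user settings.
global_config = {"tab-size": "4", "font-size": "18"}
user_config = {"tab-size": "8"}
effective = merge(global_config, user_config)
```

Here effective takes tab-size from the user configuration and font-size from the global one; neither input dictionary is modified.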

API recommendations

The following functions are recommended in any general-purpose library for parsing POM files. Their exact names and signatures can be adapted to fit the style of the host language. The key point is that the functions get_int, get_uint, get_float, and get_bool must accept exactly the formats described below for integers, floating-point numbers, and booleans; otherwise, switching between libraries could subtly change which configurations are valid.

Examples

This section lists some examples of POM files. For more examples, see the tests/ directory in the main POM repository.

All syntax

This is a configuration which demonstrates almost all of the syntactic forms of POM.

title = 'Crème brûlée'
0-*/_description_/*-0 =`A 'beautiful' crème br\u{FB}l\u{0000e9}e recipe
that\'s sure to delight your friends!`
author == `Jean\0\\"P." D'Martingale
[ingredients.flour]
	quantity= "100 g"
	type="all-purpose"
[ingredients.sugar]
	quantity		=	   50 g
	type = "br\x6f\u{77}n"
[ingrédients]
	œufs.quantité=3
	œufs.type = "extra large\,farm fresh\\,free-range"
[]
DIRECTIONS.en_CA.version.5 = "
1. Separate the egg yolks from the \"whites\".
2. Mix the yolks in a bowl with the sugar.
…
59. Enjoy!
"

This configuration has the following mapping of keys to values:

Key                           Value
title                         'Crème brûlée'
0-*/_description_/*-0         A 'beautiful' crème brûlée recipe
                              that's sure to delight your friends!
author                        = `Jean\0\\"P." D'Martingale
ingredients.flour.quantity    100 g
ingredients.flour.type        all-purpose
ingredients.sugar.quantity    50 g
ingredients.sugar.type        brown
ingrédients.œufs.quantité     3
ingrédients.œufs.type         extra large\,farm fresh\,free-range
DIRECTIONS.en_CA.version.5
                              1. Separate the egg yolks from the "whites".
                              2. Mix the yolks in a bowl with the sugar.
                              …
                              59. Enjoy!

Configuration for a text editor


indentation-type = tabs
show-line-numbers = yes
tab-size = 4
font-size = "18"

[file-extensions]
C = .c
Cpp = .cpp, .h, .hpp

[plug-in.edit-over-ssh]
path = ~/misc/edit-over-ssh.so
enabled = yes

[plug-in.edit-over-ssh.settings]
favourite-host = my-web-server

[plug-in.edit-over-ssh.settings.hosts.my-web-server]
address = example.org
port = 22
ssh-key = ~/.ssh/id_ed25519

This configuration has the following mapping of keys to values:

Key                                                            Value
indentation-type                                               tabs
show-line-numbers                                              yes
tab-size                                                       4
font-size                                                      18
file-extensions.C                                              .c
file-extensions.Cpp                                            .cpp, .h, .hpp
plug-in.edit-over-ssh.path                                     ~/misc/edit-over-ssh.so
plug-in.edit-over-ssh.enabled                                  yes
plug-in.edit-over-ssh.settings.favourite-host                  my-web-server
plug-in.edit-over-ssh.settings.hosts.my-web-server.address     example.org
plug-in.edit-over-ssh.settings.hosts.my-web-server.port        22
plug-in.edit-over-ssh.settings.hosts.my-web-server.ssh-key     ~/.ssh/id_ed25519

Errors

This section lists some erroneous lines that might appear in a POM file:


# Invalid key character '!'
cool-key! = 23
# Invalid key character ' '
fun times = yes
# Missing equals
music is on
# No closing ]
[my.section
# Invalid key character ' '
[ my.section ]
# Invalid escape sequence "\?"
no_trigraph = "a?\?=b"
# Invalid escape sequence "\xCE" — even though "\xCE\x92" is valid UTF-8.
#     ("\u{392}" should be used instead)
capital_beta = "\xCE\x92"
# Invalid escape sequence "\x00" / Invalid character in value (null character)
C_string = "Hello, world!\x00"
# Stray characters after closing "
name = "Andy" B
# Duplicate key 'tab-size'
tab-size = 4
tab-size = 8