POM is a “markup” language, primarily intended for software configuration, and designed to be easy to parse without any third-party libraries (e.g. for looking up Unicode classes or matching regular expressions), while still being terse and legible. The POM specification is quite strict, to avoid cases where dubious files can be accepted by some parsers while being rejected by others.
Every file describes a configuration, which is a mapping from keys to values.
A key is a string consisting of components separated by dots (.).
A value is a string whose interpretation is entirely decided by the application.
Notably there is no distinction in POM’s syntax between, say, the number 5 and the string 5.
A configuration, such as the one obtained from the POM file
[ingredients.sugar]
amount = 50 g
type = brown
[ingredients.flour]
amount = 100 g
type = all-purpose
[baking]
temperature = 150 °C
time = 35 min
can either be seen as a simple mapping from keys to values:
Key | Value |
---|---|
ingredients.sugar.amount | 50 g |
ingredients.sugar.type | brown |
ingredients.flour.amount | 100 g |
ingredients.flour.type | all-purpose |
baking.temperature | 150 °C |
baking.time | 35 min |
or a tree of keys, with a value associated to each leaf node:
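As a rough illustration of the tree view, the flat mapping can be folded into nested dictionaries by splitting each key at its dots. The helper name to_tree and the use of Python dicts are assumptions made for this sketch; they are not part of POM.

```python
def to_tree(flat: dict) -> dict:
    """Fold a flat {dotted key: value} mapping into nested dicts whose leaves are values."""
    tree = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
            # Simplification of this sketch (not a POM rule): a component
            # cannot be both a leaf with a value and a branch with children.
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} in {key!r} is already a leaf")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{key!r} is already a branch")
        node[leaf] = value
    return tree


flat = {
    "ingredients.sugar.amount": "50 g",
    "ingredients.sugar.type": "brown",
    "ingredients.flour.amount": "100 g",
    "ingredients.flour.type": "all-purpose",
    "baking.temperature": "150 °C",
    "baking.time": "35 min",
}
tree = to_tree(flat)
```

Here tree["ingredients"]["sugar"] is the mapping with the leaves amount and type, and tree["baking"]["time"] is the string 35 min.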
All error conditions are described in this specification. A general-purpose POM parser should not reject a file in any other case, outside of exceptional circumstances such as running out of memory. When an error occurs, it should be reported, ideally with information about the file name and line number, and the file must be entirely rejected (i.e. parsers must not attempt to preserve only the correct parts of an erroneous file). Warnings may also be issued according to the judgment of the parser author.
All POM files are encoded using UTF-8. Both LF and CRLF line endings may be used (see below). If invalid UTF-8 is encountered, including overlong sequences and UTF-16 surrogate halves (U+D800-DFFF), an error occurs.
Keys in a POM file may contain the following characters: a–z, A–Z, 0–9, as well as each of the characters ./-*_ (dot, slash, hyphen, asterisk, underscore).
A non-empty string containing only these characters is a valid key if and only if it does not start or end with a dot and does not contain two dots in a row (..).
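The key rule above is small enough to check without regular expressions. The following is a minimal sketch in Python; the function name is_valid_key is hypothetical, and the sketch treats only the characters listed in this paragraph as key characters, so anything else (including non-ASCII letters) is rejected by it.

```python
# Characters listed above as permitted in keys.
KEY_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "./-*_"
)


def is_valid_key(key: str) -> bool:
    """Return True if key satisfies the validity rule stated above."""
    if not key:
        return False  # keys must be non-empty
    if any(ch not in KEY_CHARS for ch in key):
        return False  # contains a character outside the listed set
    if key.startswith(".") or key.endswith("."):
        return False  # must not start or end with a dot
    return ".." not in key  # must not contain two dots in a row
```

For example, is_valid_key("ingredients.sugar.amount") is true, while is_valid_key("fun times") and is_valid_key("a..b") are false.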
Any string of non-zero Unicode scalar values (U+0001–10FFFF, but not U+D800–U+DFFF) is a valid value.
If a “byte order mark” of EF BB BF
appears at the start of the file,
it is ignored.
Every carriage return character (U+000D) which immediately
precedes a line feed (U+000A) is deleted.
Then, if any control characters in the range U+0000 to U+001F
other than the line feed and horizontal tab (U+0009) are
present in the file, an error occurs.
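The preparation steps above (UTF-8 validation, byte order mark, CRLF handling, control characters) can be sketched as follows. This is only an illustration: the name preprocess and the use of ValueError for errors are assumptions. Python's strict UTF-8 decoder is relied on to reject overlong sequences and surrogate halves.

```python
def preprocess(data: bytes) -> str:
    """Turn raw file bytes into text ready for line-by-line parsing."""
    # A byte order mark at the very start of the file is ignored.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    # Strict decoding raises on any invalid UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8: {exc}") from exc
    # Delete every carriage return that immediately precedes a line feed.
    text = text.replace("\r\n", "\n")
    # Any remaining control character other than LF and tab is an error.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            if ch < " " and ch != "\t":
                raise ValueError(
                    f"line {line_number}: control character U+{ord(ch):04X}"
                )
    return text
```

A lone carriage return that is not followed by a line feed survives the replacement step and is then reported as a control character.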
The current-section is a string variable which should be maintained during parsing. It is initially equal to the empty string.
An accepted-space is either a space (U+0020) or horizontal tab (U+0009) character.
Parsing now proceeds line-by-line, with lines being delimited by line feed characters. For each line:
- If the line is #%disable warnings or #%enable warnings, warnings should be disabled/enabled if any are implemented.
- If the line starts with #, parsing proceeds to the next line.
- If the line starts with [, it is interpreted as a section header. In this case:
  - If the line does not end with ] optionally succeeded by any number of accepted-spaces, an error occurs.
  - The current-section is set to the text between the initial [ and final ] (white space after the [ and before the ] is not trimmed).
- Otherwise, if the line does not contain an equals sign (=), an error occurs.
- The key is the text before the first =, not including any space or horizontal tab characters immediately before the =.
- The value begins after the = and any succeeding accepted-spaces.
- If the first character c of the value is " (U+0022 QUOTATION MARK) or ` (U+0060 GRAVE ACCENT), the value is quoted, and spans from the first character after c to the next unescaped instance of c in the file (which may be on a different line).
In this case,
POM defines the following escape sequences, which may appear in quoted values. If a backslash character occurs in a quoted value but does not form a defined escape sequence, an error occurs.
Escape sequence | Value |
---|---|
\n | Line feed (U+000A) |
\r | Carriage return (U+000D) |
\t | Horizontal tab (U+0009) |
\\ | Literal \ (U+005C) |
\" | Literal " (U+0022) |
\' | Literal ' (U+0027) |
\` | Literal ` (U+0060) |
\, | Literal \, (U+005C U+002C) |
\xNM | ASCII character with code NM, interpreted as hexadecimal (must be in the range 01–7F). |
\u{digits} | Unicode code point digits, interpreted as hexadecimal. digits must be 1–6 characters long, and may contain leading zeros, but must not be zero and must not be a UTF-16 surrogate half D800–DFFF. |
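To make the table concrete, here is a hedged sketch of decoding the escape sequences in the body of a quoted value (the text between the opening and closing quote characters). The name decode_escapes and the use of ValueError are assumptions. The upper bound of 10FFFF on \u{…} is taken from the definition of a valid value rather than from the table itself.

```python
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t",
    "\\": "\\", '"': '"', "'": "'", "`": "`",
    ",": "\\,",  # \, stays as backslash + comma
}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_escapes(body: str) -> str:
    """Decode the escape sequences in the body of a quoted value."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        kind = body[i + 1:i + 2]  # empty string if the backslash is last
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
                raise ValueError("malformed \\x escape")
            code = int(digits, 16)
            if not 0x01 <= code <= 0x7F:
                raise ValueError("\\x escape outside 01-7F")
            out.append(chr(code))
            i += 4
        elif kind == "u":
            end = body.find("}", i + 2)
            if body[i + 2:i + 3] != "{" or end == -1:
                raise ValueError("malformed \\u escape")
            digits = body[i + 3:end]
            if not 1 <= len(digits) <= 6 or not set(digits) <= HEX_DIGITS:
                raise ValueError("malformed \\u escape")
            code = int(digits, 16)
            if code == 0 or 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
                raise ValueError("\\u escape is not a non-zero scalar value")
            out.append(chr(code))
            i = end + 1
        else:
            raise ValueError(f"invalid escape sequence \\{kind}")
    return "".join(out)
```

Keeping \, as two characters means a later list split can still tell an escaped comma from a separator.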
Although POM does not have a way of specially designating a value as being a list, there is a recommended syntax for encoding them. Specifically, a value can be treated as a list by first splitting it into comma-delimited parts, treating \, as a literal comma in a list entry and \\ as a literal backslash, then removing any accepted-spaces surrounding list entries.
List entries may be empty, but if the last entry in a list is empty, it is removed (if there are two or more empty entries at the end of a list, only one is removed). As a consequence, an empty string is considered to be an empty list.
If a list’s order is irrelevant and it might be large or benefit from labelling its entries, a key prefix should be used instead (see the ingredients “list” in the opening example).
The following lines describe 3-entry lists.
POM line | Entry 1 | Entry 2 | Entry 3 |
---|---|---|---|
fonts = monospace, sans-serif, serif | monospace | sans-serif | serif |
files = " foo.txt, weird\,name,z " | foo.txt | weird,name | z |
things = \,,,76 | , | | 76 |
empties = ,,, | | | |
escapees = \\,\a,\, | \ | \a | , |
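The recommended list syntax can be sketched as below. The input is a value as stored in the configuration, that is, after any quoting and escape processing has already happened. The name split_list is hypothetical, and trimming entries after unescaping is a choice made for this sketch.

```python
def split_list(value: str) -> list[str]:
    """Split a configuration value into list entries using the recommended syntax."""
    entries = []
    current = []
    i = 0
    while i < len(value):
        ch = value[i]
        following = value[i + 1:i + 2]
        if ch == "\\" and following in (",", "\\"):
            # \, is a literal comma and \\ is a literal backslash.
            current.append(following)
            i += 2
        elif ch == ",":
            # An unescaped comma ends the current entry.
            entries.append("".join(current))
            current = []
            i += 1
        else:
            # Everything else, including other backslashes, is kept as-is.
            current.append(ch)
            i += 1
    entries.append("".join(current))
    # Remove accepted-spaces (space and tab) surrounding each entry.
    entries = [entry.strip(" \t") for entry in entries]
    # If the last entry is empty, drop it (only one).
    if entries[-1] == "":
        entries.pop()
    return entries
```

Applied to the values in the table above, split_list(",,,") gives three empty entries and split_list("") gives an empty list.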
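A configuration B can be merged into another configuration A by parsing both of them and setting the value associated with a key k to be the value associated with k in B if there is one, and otherwise the value associated with k in A (if any).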
(Likewise, an ordered series of configurations A1, …, An can be merged by merging A2 into A1, then A3 into the resulting configuration, etc.)
This is useful, for example, when you want to have a global configuration for a piece of software installed on a multi-user machine where individual settings can be overridden by each user (in this case, the user configuration would be merged into the global configuration).
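A minimal sketch of merging, assuming configurations are held as plain Python dicts from keys to values and, as the override example suggests, that values from the merged-in configuration win. The names merge and merge_all are illustrative only.

```python
def merge(conf_a: dict, conf_b: dict) -> dict:
    """Merge conf_b into conf_a: values from conf_b win where both define a key."""
    merged = dict(conf_a)  # copy, so neither input is modified
    merged.update(conf_b)
    return merged


def merge_all(configs: list) -> dict:
    """Merge an ordered series of configurations, later ones taking precedence."""
    result = {}
    for conf in configs:
        result = merge(result, conf)
    return result


global_conf = {"tab-size": "4", "font-size": "18"}
user_conf = {"tab-size": "8"}
effective = merge(global_conf, user_conf)
```

Here effective keeps the global font-size and takes tab-size from the user configuration.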
If needed for a particular domain, a parser may accept an extended form of the POM syntax.
Ideally, extensions should use lines containing invalid key characters (e.g. !&%) before the = (if any) so that there is no ambiguity, and the file cannot be parsed without the extension.
The following functions are (lightly) recommended in any general-purpose library for parsing POM files (their exact names/signatures can be changed to fit the style of the language).
load(filename: String, file: File) -> Configuration
Load a configuration from file. (filename is used for reporting error locations.)
load_string(filename: String, string: String) -> Configuration
Load a configuration from string (this can be an overload of load if the language supports it). (filename is used for reporting error locations.)
load_path(path: String) -> Configuration
print(conf: Configuration)
Print key = value on a separate line for each key in conf, sorted alphabetically by key. The value format doesn’t need to match POM’s format exactly, since this function should only be used for debugging anyway.
has(conf: Configuration, key: String) -> Bool
Returns true if key is associated with any value.
keys(conf: Configuration) -> List<String>
all_values(conf: Configuration) -> List<Pair<String, String>>
location(conf: Configuration, key: String) -> Optional<Location>
Returns the location of the definition of key in the configuration (file and line number). Useful for reporting invalid values. If a key k isn’t given a value in the configuration, but a key of the form k.j is, then the location of the definition of an arbitrary such key should be considered the location of k.
get(conf: Configuration, key: String) -> Optional<String>
Get the value associated with key, if any exists.
get_or_default(conf: Configuration, key: String, default: String) -> String
Get the value associated with key, if any exists, returning default if not.
get_int(conf: Configuration, key: String) -> Optional<Int>
get_int_or_default(conf: Configuration, key: String, default: Int) -> Int
Get the value associated with key, if any exists, and parse it as a signed integer (returning default if the key doesn’t exist). The integer’s absolute value must be strictly less than 2^53, written in decimal or 0x/0X-prefixed hexadecimal. Leading zeroes are not permitted for decimal integers. White space around or within the integer is not permitted. A leading + (or, of course, -) is permitted. If the key exists but its value does not follow these rules, an error is returned.
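The integer rules above map onto a short pattern. The following is a sketch, not a reference implementation: the name parse_int and the use of ValueError are assumptions, and it allows a sign in front of hexadecimal values and leading zeros after the 0x prefix, since only decimal leading zeros are ruled out above.

```python
import re

# Optional sign, then either a decimal integer without leading zeros
# or a 0x/0X-prefixed hexadecimal integer.
INT_PATTERN = re.compile(r"([+-]?)(?:(0|[1-9][0-9]*)|0[xX]([0-9a-fA-F]+))")


def parse_int(text: str) -> int:
    """Parse a value as a signed integer following the rules above."""
    match = INT_PATTERN.fullmatch(text)  # no white space or stray characters
    if match is None:
        raise ValueError(f"not a valid integer: {text!r}")
    sign, decimal, hexadecimal = match.groups()
    magnitude = int(decimal) if decimal is not None else int(hexadecimal, 16)
    if magnitude >= 2**53:
        raise ValueError(f"integer out of range: {text!r}")
    return -magnitude if sign == "-" else magnitude
```

A get_int wrapper would return nothing when the key is absent and call parse_int on the value otherwise.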
get_uint(conf: Configuration, key: String) -> Optional<UInt>
get_uint_or_default(conf: Configuration, key: String, default: UInt) -> UInt
Get the value associated with key, if any exists, and parse it as an unsigned integer (returning default if the key doesn’t exist). The integer must be at least 0 and strictly less than 2^53, written in decimal or 0x or 0X-prefixed hexadecimal. A leading + is permitted, but -0 is not. Leading zeroes are not permitted for decimal integers. White space around or within the integer is not permitted. If the key exists but its value does not follow these rules, an error is returned.
get_float(conf: Configuration, key: String) -> Optional<Float>
get_float_or_default(conf: Configuration, key: String, default: Float) -> Float
Get the value associated with key, if any exists, and parse it as a 64-bit IEEE-754 double precision floating-point number (returning default if the key doesn’t exist). The number must be written in ordinary decimal (e.g. -1.234, 7., 265) or in C-like scientific notation (e.g. 3e5, 3.E-5, -3.7e+005). Excessive leading zeroes are not permitted (0.0 is allowed, but not 00.0). Values which overflow to ±∞ are allowed (e.g. 1e999), but NaN and explicit inf/Infinity are not. White space around or within the number is not permitted. The decimal point (if one is present) must be preceded and succeeded by digits. A leading + (or, of course, -) is permitted. Returns an error if the key exists but its value is not a valid floating-point number.
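A hedged sketch of the floating-point rules: the text is checked against a pattern first, then handed to Python's float, which produces an IEEE-754 double and lets out-of-range values overflow to infinity. The name parse_float is hypothetical. The sketch takes the stricter reading in which a decimal point must have digits on both sides; the pattern would need relaxing if forms such as 7. are to be accepted.

```python
import re

# Optional sign, integer part without excessive leading zeros,
# optional fraction (digits required after the point), optional exponent.
FLOAT_PATTERN = re.compile(
    r"[+-]?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
)


def parse_float(text: str) -> float:
    """Parse a value as a double following the rules above (strict reading)."""
    if FLOAT_PATTERN.fullmatch(text) is None:
        # Rejects white space, NaN, inf/Infinity, and malformed numbers.
        raise ValueError(f"not a valid floating-point number: {text!r}")
    return float(text)
```

Because the pattern never matches letters other than the exponent marker, explicit inf, Infinity and NaN are rejected before float is called, while 1e999 passes the pattern and overflows to infinity.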
get_bool(conf: Configuration, key: String) -> Optional<Bool>
get_bool_or_default(conf: Configuration, key: String, default: Bool) -> Bool
Get the value associated with key, if any exists, and parse it as a boolean, taking true, on, yes to be true, and false, off, no to be false (case-sensitive). Returns an error if the key exists but its value is not one of those values.
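Boolean parsing reduces to an exact, case-sensitive lookup. A minimal sketch, with parse_bool as a hypothetical name and ValueError standing in for the returned error:

```python
BOOL_WORDS = {
    "true": True, "on": True, "yes": True,
    "false": False, "off": False, "no": False,
}


def parse_bool(text: str) -> bool:
    """Parse a value as a boolean; matching is exact and case-sensitive."""
    try:
        return BOOL_WORDS[text]
    except KeyError:
        raise ValueError(f"not a valid boolean: {text!r}") from None
```

Values such as Yes, TRUE or 1 are errors under this rule, not false.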
get_list(conf: Configuration, key: String) -> Optional<List<String>>
get_list_or_default(conf: Configuration, key: String, default: List<String>) -> List<String>
Get the value associated with key, if any exists, and parse it as a list (returning default if the key isn’t present).
section(conf: Configuration, key: String) -> Configuration
Returns the configuration consisting of all descendants of key (i.e. keys starting with key followed by a dot), with that initial prefix removed, and their corresponding values. Returns an empty configuration if there are no descendants of key defined.
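As an illustration of section, assuming a configuration is a plain Python dict from full keys to values (the dict representation is an assumption of this sketch):

```python
def section(conf: dict, key: str) -> dict:
    """Return the descendants of key, with the leading 'key.' prefix removed."""
    prefix = key + "."
    return {
        full_key[len(prefix):]: value
        for full_key, value in conf.items()
        if full_key.startswith(prefix)
    }


conf = {
    "plug-in.edit-over-ssh.path": "~/misc/edit-over-ssh.so",
    "plug-in.edit-over-ssh.enabled": "yes",
    "plug-in.edit-over-ssh.settings.favourite-host": "my-web-server",
    "tab-size": "4",
}
plugin = section(conf, "plug-in.edit-over-ssh")
```

Requiring the dot in the prefix means a key such as plug-in.edit-over-sshd.path would not be treated as a descendant of plug-in.edit-over-ssh.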
merge(conf_a: Configuration, conf_b: Configuration) -> Configuration
Merge conf_b into conf_a.
unread_keys(conf: Configuration) -> List<String>
Returns a list of all keys which have not been the target of a get / get_* call (does not include has), either directly or through a section obtained from the section function, in an arbitrary order. When configurations are merged, the gotten-ness of the values is preserved. Whether or not getting values from the merged configuration affects the original configurations’ gotten-nesses is unspecified (and should rarely matter).
This section lists some examples of POM files. For more examples, see the tests/ directory in the main POM repository.
title = 'Crème brûlée'
0-*/_description_/*-0 =`A 'beautiful' crème br\u{FB}l\u{0000e9}e recipe
that\'s sure to delight your friends!`
author == `Jean\0\\"P." D'Martingale
[ingredients.flour]
quantity= "100 g"
type="all-purpose"
[ingredients.sugar]
quantity = 50 g
type = "br\x6f\u{77}n"
[ingrédients]
œufs.quantité=3
œufs.type = "extra large\,farm fresh\\,free-range"
[]
DIRECTIONS.en_CA.version.5 = "
1. Separate the egg yolks from the \"whites\".
2. Mix the yolks in a bowl with the sugar.
…
59. Enjoy!
"
This configuration has the following mapping of keys to values:
Key | Value |
---|---|
title | 'Crème brûlée' |
0-*/_description_/*-0 | A 'beautiful' crème brûlée recipe that's sure to delight your friends! |
author | = `Jean\0\\"P." D'Martingale |
ingredients.flour.quantity | 100 g |
ingredients.flour.type | all-purpose |
ingredients.sugar.quantity | 50 g |
ingredients.sugar.type | brown |
ingrédients.œufs.quantité | 3 |
ingrédients.œufs.type | extra large\,farm fresh\,free-range |
DIRECTIONS.en_CA.version.5 | 1. Separate the egg yolks from the "whites". 2. Mix the yolks in a bowl with the sugar. … 59. Enjoy! |
indentation-type = tabs
show-line-numbers = yes
tab-size = 4
font-size = "18"
[file-extensions]
C = .c
Cpp = .cpp, .h, .hpp
[plug-in.edit-over-ssh]
path = ~/misc/edit-over-ssh.so
enabled = yes
[plug-in.edit-over-ssh.settings]
favourite-host = my-web-server
[plug-in.edit-over-ssh.settings.hosts.my-web-server]
address = example.org
port = 22
ssh-key = ~/.ssh/id_ed25519
This configuration has the following mapping of keys to values:
Key | Value |
---|---|
indentation-type | tabs |
show-line-numbers | yes |
tab-size | 4 |
font-size | 18 |
file-extensions.C | .c |
file-extensions.Cpp | .cpp, .h, .hpp |
plug-in.edit-over-ssh.path | ~/misc/edit-over-ssh.so |
plug-in.edit-over-ssh.enabled | yes |
plug-in.edit-over-ssh.settings.favourite-host | my-web-server |
plug-in.edit-over-ssh.settings.hosts.my-web-server.address | example.org |
plug-in.edit-over-ssh.settings.hosts.my-web-server.port | 22 |
plug-in.edit-over-ssh.settings.hosts.my-web-server.ssh-key | ~/.ssh/id_ed25519 |
This section lists some erroneous lines that might appear in a POM file:
# Invalid key character '!'
cool-key! = 23
# Invalid key character ' '
fun times = yes
# Missing equals
music is on
# No closing ]
[my.section
# Invalid key character ' '
[ my.section ]
# Invalid escape sequence "\?"
no_trigraph = "a?\?=b"
# Invalid escape sequence "\xCE" — even though "\xCE\x92" is valid UTF-8.
# ("\u{392}" should be used instead)
capital_beta = "\xCE\x92"
# Invalid escape sequence "\x00" / Invalid character in value (null character)
C_string = "Hello, world!\x00"
# Stray characters after closing "
name = "Andy" B
# Duplicate key 'tab-size'
tab-size = 4
tab-size = 8