From 6f2eea28605381fadf41ec49d05c664c407b9b94 Mon Sep 17 00:00:00 2001 From: pommicket Date: Sun, 7 Sep 2025 03:37:30 -0400 Subject: Initial spec --- site/404.html | 11 ++ site/index.html | 10 + site/main.css | 7 + site/spec.html | 583 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 611 insertions(+) create mode 100644 site/404.html create mode 100644 site/index.html create mode 100644 site/main.css create mode 100644 site/spec.html (limited to 'site') diff --git a/site/404.html b/site/404.html new file mode 100644 index 0000000..3f9e5ac --- /dev/null +++ b/site/404.html @@ -0,0 +1,11 @@ + + + + + + + +

Page not found

+

Main page

+ + diff --git a/site/index.html b/site/index.html new file mode 100644 index 0000000..026778b --- /dev/null +++ b/site/index.html @@ -0,0 +1,10 @@ + + + + + + + +

Coming soon

+ + diff --git a/site/main.css b/site/main.css new file mode 100644 index 0000000..b771e2a --- /dev/null +++ b/site/main.css @@ -0,0 +1,7 @@ +td, th { + border: 2px solid black; + padding: 0.5em; +} +table { + border-collapse: collapse; +} diff --git a/site/spec.html b/site/spec.html new file mode 100644 index 0000000..c3ae3fa --- /dev/null +++ b/site/spec.html @@ -0,0 +1,583 @@ + + + + + + + POM Language Specification + + + + +

+ POM homepage +

+

POM Language Specification, v. 0.1.0

+

Introduction

+

+ POM is a “markup” language, primarily intended for software configuration, + and designed to be easy to parse and use. + The POM specification is quite strict, to avoid cases where dubious files can + be accepted by some parsers while being rejected by others. + POM files should use the .pom file extension to identify themselves. +

+

+ Every file describes a configuration, which is a mapping from keys to values. + A key is a string consisting of components separated by dots (.). + A value is a string whose interpretation is entirely decided by the application. + Notably there is no distinction in POM’s syntax between, say, the number 5 and the string 5. + A configuration, such as the one obtained from the POM file +

+
[ingredients.sugar]
+amount = 50 g
+type = brown
+
+[ingredients.flour]
+amount = 100 g
+type = all-purpose
+
+[baking]
+temperature = 150 °C
+time = 35 min
+
+

+ can either be seen as a simple mapping from keys to values: +

+ + + + + + + + + + + + +
Key | Value
ingredients.sugar.amount | 50 g
ingredients.sugar.type | brown
ingredients.flour.amount | 100 g
ingredients.flour.type | all-purpose
baking.temperature | 150 °C
baking.time | 35 min
+

+ or a tree of keys, with a value associated to each leaf node: +

+
(root)
  ingredients
    sugar
      type: brown
      amount: 50 g
    flour
      type: all-purpose
      amount: 100 g
  baking
    temperature: 150 °C
    time: 35 min
+

Error handling

+

+ All error conditions are described in this specification. A compliant POM parser should not + reject any file in any other case, outside of exceptional circumstances such as running out of memory. + When an error occurs, it should be reported, ideally with information about the file name and line number, + and the file must be entirely rejected (i.e. parsers must not attempt to preserve only the correct parts of an erroneous file). + Warnings may also be issued according to the judgment of the library author. +

+

Text encoding

+

+ All POM files are encoded using UTF-8. Both LF and CRLF line endings may be used (see below). + If invalid UTF-8 is encountered, including overlong sequences and UTF-16 surrogate halves (U+D800-DFFF), + an error occurs. +
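A minimal sketch of the encoding check in Python, assuming the file has already been read as raw bytes. The function name `decode_pom_bytes` is hypothetical and not part of the specification; the sketch relies on Python's strict UTF-8 decoder, which rejects overlong sequences and encoded surrogate halves.

```python
def decode_pom_bytes(data: bytes) -> str:
    """Decode raw file contents, raising ValueError on invalid UTF-8.

    Hypothetical helper: the name and the error type are illustrative only.
    """
    try:
        # Strict decoding rejects overlong sequences, encoded surrogate
        # halves (U+D800-U+DFFF), and truncated or stray bytes.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 at byte offset {exc.start}") from exc
```

Line-ending and byte-order-mark handling are separate steps, described under Parsing.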

+

Valid keys/values

+

+ Keys in a POM file may contain the following characters +

+ +

+ A non-empty string containing only these characters is a valid key if and only if it does not start or end with a dot + and does not contain two dots in a row (..). +

+

+ Any string of non-zero Unicode code points (U+0001–U+10FFFF) is a valid value. +
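The dot rules for keys and the rule for values can be sketched in Python as follows. This is only a partial check: it does not test which characters a key contains, so a full validator would need the allowed-character list as well. The function names are hypothetical.

```python
def has_valid_dot_structure(key: str) -> bool:
    """Check only the dot rules for a key.

    Assumption: the caller separately checks that every character
    is one of the characters allowed in keys.
    """
    if not key:
        return False  # a valid key is non-empty
    if key.startswith(".") or key.endswith("."):
        return False  # must not start or end with a dot
    return ".." not in key  # must not contain two dots in a row


def is_valid_value(value: str) -> bool:
    """A value is valid if it contains no U+0000 code point."""
    return "\x00" not in value
```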

+

Parsing

+

+ If a “byte order mark” of EF BB BF appears at the start of the file, + it is ignored. + Every carriage return character (U+000D) which immediately precedes a line feed (U+000A) is deleted. + Then, if any control characters in the range U+0000 to U+001F other than the line feed and horizontal + tab (U+0009) are present in the file, an error occurs. +
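These preprocessing steps might look like this in Python, assuming the input has already been decoded from UTF-8 (so the byte order mark appears as U+FEFF). The name `preprocess` and the use of `ValueError` are illustrative choices, not part of the specification.

```python
def preprocess(text: str) -> str:
    """Strip a leading BOM, delete CR before LF, reject other control characters."""
    # The bytes EF BB BF decode to the single code point U+FEFF.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Delete every carriage return that immediately precedes a line feed.
    text = text.replace("\r\n", "\n")
    # Any remaining control character other than LF and TAB is an error.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            if ch < " " and ch != "\t":
                raise ValueError(
                    f"line {line_number}: control character U+{ord(ch):04X}"
                )
    return text
```

A carriage return that is not followed by a line feed survives the replacement step and is then rejected by the control-character check.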

+

+ The current-section is a string variable which should be maintained during parsing. It is initially + equal to the empty string.

+

+ An accepted-space is either a space (U+0020) or horizontal tab (U+0009) character. +

+

+ Parsing now proceeds line-by-line, with lines being delimited by line feed characters. For each line: +

+
  1. Any accepted-spaces that appear at the start of the line are removed.
  2. If the line begins with #%disable warnings or #%enable warnings, warnings should be disabled/enabled if any are implemented.
  3. If the line is empty or begins with #, parsing proceeds to the next line.
  4. If the line begins with [, it is interpreted as a section header. In this case:
    1. If the line does not end with ] optionally succeeded by any number of accepted-spaces, an error occurs.
    2. The current-section is set to the text in between the initial [ and final ] (white space between the [ and ] is not trimmed).
    3. If the new current-section is not empty and not a valid key (see above), an error occurs.
  5. Otherwise, the line is now interpreted as a key-value assignment. In this case:
    1. If the line does not contain an equal sign (=), an error occurs.
    2. The relative-key is the text preceding the =, not including any space or horizontal tab characters immediately before the =.
    3. If the relative-key is not a valid key (see above), an error occurs.
    4. Let c be the first character after the = and any succeeding accepted-spaces.
    5. If c is one of "'` (U+0022 QUOTATION MARK, U+0027 APOSTROPHE, U+0060 GRAVE ACCENT), the value is quoted, and spans from the first character after c to the next unescaped instance of c in the file (which may be on a different line). In this case,
      1. Escape sequences are processed as described below.
      2. Following the closing instance of c, there must be a line feed, optionally preceded by any number of accepted-spaces; otherwise an error occurs.
    6. Otherwise, accepted-spaces at the end of the line are removed; then, the value is the text starting from c and ending at the next line feed.
    7. If the value is not a valid value (see above), an error occurs.
    8. The key is equal to the relative-key if the current-section is empty; otherwise, it is equal to the concatenation of current-section, a dot, and the relative-key.
    9. The key is assigned to the value.
+ +
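The line-by-line procedure above can be sketched in Python as follows. This is a deliberately reduced illustration under stated assumptions, not a compliant parser: quoted values are not handled (the sketch raises `NotImplementedError` for them), key validation covers only the dot rules and not the allowed-character list, and `#%` warning directives are treated like ordinary comments. The names `PomError`, `dots_ok` and `parse_unquoted` are hypothetical, and the input is assumed to be already decoded and preprocessed.

```python
ACCEPTED_SPACES = " \t"


class PomError(ValueError):
    """Hypothetical error type; the name is not part of the specification."""


def dots_ok(key: str) -> bool:
    # Partial key check: dot rules only (allowed characters are not checked).
    return bool(key) and not key.startswith(".") and not key.endswith(".") \
        and ".." not in key


def parse_unquoted(text: str) -> dict[str, str]:
    """Parse section headers and unquoted assignments into a key-value dict."""
    config: dict[str, str] = {}
    current_section = ""
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.lstrip(ACCEPTED_SPACES)
        if line == "" or line.startswith("#"):
            continue  # blank line, comment, or warning directive
        if line.startswith("["):
            header = line.rstrip(ACCEPTED_SPACES)
            if not header.endswith("]"):
                raise PomError(f"line {number}: section header must end with ]")
            current_section = header[1:-1]  # inner white space is not trimmed
            if current_section and not dots_ok(current_section):
                raise PomError(f"line {number}: invalid section name")
            continue
        if "=" not in line:
            raise PomError(f"line {number}: expected = in assignment")
        raw_key, _, rest = line.partition("=")
        relative_key = raw_key.rstrip(ACCEPTED_SPACES)
        if not dots_ok(relative_key):
            raise PomError(f"line {number}: invalid key")
        rest = rest.lstrip(ACCEPTED_SPACES)
        if rest[:1] in ('"', "'", "`"):
            raise NotImplementedError("quoted values are outside this sketch")
        value = rest.rstrip(ACCEPTED_SPACES)
        if "\x00" in value:
            raise PomError(f"line {number}: invalid value")
        if current_section:
            key = f"{current_section}.{relative_key}"
        else:
            key = relative_key
        config[key] = value
    return config
```

Applied to the recipe file from the introduction, this yields the flat key-to-value mapping shown in the first table.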

Escape sequences

+ +

+ POM defines the following escape sequences, which may appear in quoted values. + If a backslash character occurs in a quoted value but does not form + a defined escape sequence, an error occurs. +

+ + + + + + + + + + + + + + + + + + + + + +
Escape sequence | Value
\n | Line feed (U+000A)
\r | Carriage return (U+000D)
\t | Horizontal tab (U+0009)
\\ | Literal \ (U+005C)
\" | Literal " (U+0022)
\' | Literal ' (U+0027)
\` | Literal ` (U+0060)
\, | Literal \, (U+005C U+002C)
\xNM | ASCII character with code NM, interpreted as hexadecimal (must be in the range 01–7F).
\u{digits} | Unicode code point digits, interpreted as hexadecimal. digits must be 1–6 characters long, and may contain leading zeros, but must not be zero.
+ +
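One way to process the escape sequences in the table above is sketched below in Python. It operates on the text between the opening and closing quote characters. Two points are assumptions not stated in the table: hexadecimal digits are accepted in either case, and a `\u{...}` value above 10FFFF is treated as an error. The name `unescape` is hypothetical.

```python
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "\\": "\\",
    '"': '"', "'": "'", "`": "`",
    ",": "\\,",  # \, stays as backslash + comma for later list splitting
}
HEX_DIGITS = set("0123456789abcdefABCDEF")  # assumption: both cases accepted


def unescape(body: str) -> str:
    """Replace escape sequences in the body of a quoted value."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        kind = body[i + 1:i + 2]  # empty string if the backslash is last
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
                raise ValueError("\\x needs exactly two hexadecimal digits")
            code = int(digits, 16)
            if not 0x01 <= code <= 0x7F:
                raise ValueError("\\x code must be in the range 01-7F")
            out.append(chr(code))
            i += 4
        elif kind == "u" and body[i + 2:i + 3] == "{":
            end = body.find("}", i + 3)
            digits = body[i + 3:end] if end != -1 else ""
            if not 1 <= len(digits) <= 6 or not set(digits) <= HEX_DIGITS:
                raise ValueError("\\u{...} needs 1-6 hexadecimal digits")
            code = int(digits, 16)
            if code == 0 or code > 0x10FFFF:  # upper bound is an assumption
                raise ValueError("\\u{...} code point out of range")
            out.append(chr(code))
            i = end + 1
        else:
            raise ValueError("undefined escape sequence")
    return "".join(out)
```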

Lists

+

+ Although POM does not have a way of specially designating a value as being a list, + there is a recommended syntax for encoding them. Specifically, a value can be treated as a list + by first splitting it into comma-delimited parts (treating \, as a literal comma + in a list entry), then removing any accepted-spaces surrounding list entries. List entries may be empty. +

+

+ An empty string is considered to be an empty list. +

+

+ If a list's order is irrelevant and it might be large or benefit from labelling its entries, + a key prefix should be used instead + (see the ingredients “list” in the opening example). +

+

Examples

+ + + + + + + + + + + + + + + + + + + + + + + + +
POM line | Entry 1 | Entry 2 | Entry 3
fonts = monospace, sans-serif, serif | monospace | sans-serif | serif
files = " foo.txt, weird\,name,z " | foo.txt | weird,name | z
things = `\,,,76` | , | (empty) | 76
+ + +
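The recommended list syntax can be sketched in Python as below. The function takes a value as it comes out of the parser (after escape processing, so `\,` is still a backslash followed by a comma). The name `split_list` is hypothetical.

```python
def split_list(value: str) -> list[str]:
    """Split a value into list entries using the recommended list syntax."""
    if value == "":
        return []  # an empty string is an empty list
    entries = []
    current = []
    i = 0
    while i < len(value):
        if value.startswith("\\,", i):
            current.append(",")  # \, is a literal comma inside an entry
            i += 2
        elif value[i] == ",":
            entries.append("".join(current))  # unescaped comma ends the entry
            current = []
            i += 1
        else:
            current.append(value[i])
            i += 1
    entries.append("".join(current))
    # Remove accepted-spaces (space and tab) surrounding each entry.
    return [entry.strip(" \t") for entry in entries]
```

The three rows of the examples table correspond to `split_list("monospace, sans-serif, serif")`, `split_list(" foo.txt, weird\\,name,z ")` and `split_list("\\,,,76")`.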

Merging configurations

+

+ A configuration B can be merged into another configuration A by parsing both of them + and setting the value associated with a key k to be +

+
    +
  1. The value associated with k in B, if any.
  2. +
  3. Otherwise, the value associated with k in A, if any.
  4. +
+

+ (Likewise, an ordered series of configurations A_1, …, A_n + can be merged by merging A_{n-1} into A_n, then A_{n-2} into + the resulting configuration, etc.)

+

+ This is useful, for example, when you want to have a global configuration for a piece of software + installed on a multi-user machine where individual settings can be overridden by each user (in this case, + the user configuration would be merged into the global configuration).
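Assuming each parsed configuration is represented as a plain dictionary from keys to values, merging might be sketched like this in Python. The names `merge` and `merge_series` are hypothetical.

```python
def merge(a: dict[str, str], b: dict[str, str]) -> dict[str, str]:
    """Merge configuration b into configuration a; values in b take priority."""
    merged = dict(a)
    merged.update(b)
    return merged


def merge_series(configs: list[dict[str, str]]) -> dict[str, str]:
    """Merge A_1, ..., A_n: A_{n-1} into A_n, then A_{n-2} into the result, etc."""
    if not configs:
        return {}
    result = dict(configs[-1])
    for config in reversed(configs[:-1]):
        result = merge(result, config)
    return result
```

For the multi-user case, `merge(global_config, user_config)` keeps every global setting that the user has not set and takes the user's value otherwise. In a series, the earliest configuration ends up with the highest priority.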

+ +

Extensions

+

+ If needed for a particular domain, a parser may accept an extended form of the POM syntax. + Ideally, extensions should use lines beginning with invalid key characters (e.g. !&%) + so that there is no ambiguity, and the file cannot be interpreted without the extension.

+ +

Schemas

+

+ A schema is a POM file that describes how other POM files should be formatted (i.e. what keys they should + include, and what values they can be associated with). A configuration can be said to follow a schema when it obeys + all of the schema’s rules. +

+

+ Every schema key is of the form k.rule, where k is a valid key, + and rule is one of the rule names listed below. +

+

+ For any valid key k the value of the rule rule is determined for k as follows: +

+ + +

type rule

+

+ Default: String. +

+

+ This describes what values a key is allowed to be associated with. + The following types are defined: +

+ + +

allow_unknown rule

+

+ Default: inherited from parent (i.e. if k = j.component, + look up the allow_unknown rule for j), or yes if k has no parent (does not contain a dot). +

+

+ This describes whether or not the key k is allowed if it is not described in the schema. + It must be set to either yes or no.

+

+ If a key is encountered in a configuration + and the value of its allow_unknown rule is no, the configuration does not follow the schema. +
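The default-by-inheritance behaviour can be illustrated with a small Python sketch. It is a simplification under a strong assumption: `explicit` maps exact keys to an explicitly set `allow_unknown` value, and keys with `*` components are not matched at all. The function name is hypothetical.

```python
def allow_unknown(key: str, explicit: dict[str, str]) -> bool:
    """Resolve allow_unknown for key by walking up to its parents.

    Assumption: `explicit` holds only exact keys; wildcard (*) schema
    keys are not handled by this sketch.
    """
    while True:
        if key in explicit:
            return explicit[key] == "yes"
        if "." not in key:
            return True  # no parent: the default is yes
        key = key.rsplit(".", 1)[0]  # drop the last component
```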

+

min, max rules

+

+ This schema key's value sets the minimum/maximum value for the key's value. This must not be set if type + does not explicitly allow numeric values (i.e. it does not contain a type Int/UInt/Float). +

+

maxlength rule

+

+ The value of this rule must be a non-negative integer no greater than 2^31−1. + Specifies that the value of a key can be no longer than that number of UTF-8 bytes.
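Because the limit counts UTF-8 bytes rather than characters, a check might look like the following Python sketch. It assumes the rule's value has already been converted to an integer; the name `within_maxlength` is hypothetical.

```python
MAX_RULE_VALUE = 2**31 - 1  # largest permitted maxlength


def within_maxlength(value: str, maxlength: int) -> bool:
    """Return True if value is no longer than maxlength UTF-8 bytes."""
    if not 0 <= maxlength <= MAX_RULE_VALUE:
        raise ValueError("maxlength must be between 0 and 2^31-1")
    return len(value.encode("utf-8")) <= maxlength
```

For example, the value `°C` is two characters but three UTF-8 bytes, so it passes a limit of 3 and fails a limit of 2.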

+

default rule

+

+ Sets the default value for a key. +

+

Missing values

+

+If there is a schema key k.type, where k does not contain any *-components, +and the type does not allow unset values (None), and there is no schema key k.default, +then a configuration must contain the key k to follow the schema. +

+

+Additionally, if there is a schema key j.*.k.type that does not allow unset values +and no corresponding default schema key, where k does not contain any *-components, then a configuration +which contains a key x matching j.* must also contain the key x.k.

+

API recommendations

+

+ The following functions are (lightly) recommended + in any general-purpose library for parsing POM files + (their exact names/signatures can be changed to fit the style of the language). +

+ + +

Examples

+ +

A schema for a text editor's configuration

+

+# don't allow unknown keys by default
+*.allow_unknown = no
+
+# must put this in your config! I can't make the decision for you!
+indent-using-spaces.type = Bool
+
+show-line-numbers.type = Bool
+show-line-numbers.default = on
+
+[tab-size]
+type = UInt
+min = 1
+default = 4
+
+[font-size]
+# allow fractional font sizes; why not!
+type = Float
+min = 0.5
+max = 100
+default = 14
+
+[plug-in]
+*.path.type = String
+# everyone be nice to the Microsoft Windows
+*.path.maxlength = 260
+# allow arbitrary keys in plug-ins' settings
+*.settings.allow_unknown = yes
+
+[file-extensions]
+C.type = List[String]
+C.default = .c, .h
+C++.type = List[String]
+C++.default = .cpp, .hpp, .cc, .hh
+C-sharp.type = List[String]
+C-sharp.default = .cs
+
+ + + + -- cgit v1.2.3