Rick

Thursday, April 17, 2014

TOML: what if PLIST, JSON and Windows INI files had a baby?

YAML to me is a bastard child. It is an 80-page spec, and it just seems so damn unwieldy. I have seen people use it for config, but they use only a mere fraction of what it can do.

Recently (last year), I wrote a JSON parser.

I also wrote a loose JSON parser and an ASCII PLIST (JSON-ish) parser.
The loose format allows comments: /* */, // and #.
The PLIST format allows plist style.
Then I have several strict JSON parsers that parse directly from bytes, from a char[], or from a char[] stream for really large files.
The JSON parser that I wrote is about 4x faster than common variants. (I was aiming for parity and I overshot.) The idea behind the loose JSON and the PLIST parsers is very similar to TOML.
YAML seemed overkill for what I wanted to do with config. PLIST seemed more in line.
I did not know about TOML. I like TOML. Stupid name, but then again so is YAML.
TOML and PLIST are similar.
TOML looks like ASCII PLIST and Windows INI files had a baby.
YAML looks like a dog's breakfast.
My one issue with TOML, other than the stupid name, is that arrays are strongly typed. No, that is not right. Hmmm... Arrays are actually arrays and not a bag of object stuff like in JSON. So arrays are uniformly typed in TOML. OK. That is actually a good thing, but since arrays are actually arrays and not bags of object stuff, we still need something to hold a bag of object stuff.

Add one more type: a tuple with parens. Pretty please.

A tuple is a bag to hold object stuff.

Add to table, int, float, date, boolean, one more type... tuple, aka a bag to hold object sh1t.
You have table, which is a k/v map.
You have array, in which all items have to be of one type.
Add a tuple (you can call it a list or a bag of stuff, bos).
(1, "foo", 1979-05-27T07:32:00Z)
Sometimes things make sense in tuples. Disparate types are good for expressing many concepts. No, I can't think of a good example right now.
Boon will have the 6th TOML implementation for Java.
I am writing a lot of config files, and I find JSON aggravating, even loose JSON.

And I agree with the author of TOML that YAML has jumped the shark.

Boon will have tuple. I might call it list. I guess if they don't add tuple to TOML, I will call what I write TONL (text object notation language).

Also, I tend to marshal JSON arrays instead of JSON objects to reduce the footprint of the JSON feed, which matters when you have a 10,000,000 user app. I can see using toml/tonl in places where I normally might use JSON, not just config.
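
To make the footprint point concrete, here is a minimal Python sketch (not Boon code). The record, its field names and the FIELDS ordering are all made up for illustration; the only assumption is that both sides of the feed agree on the field order ahead of time.

```python
import json

# Hypothetical user record; the field names are invented for this example.
user = {"id": 12345, "name": "Rick", "active": True}

# Marshal as a JSON object: every record repeats the key names.
as_object = json.dumps(user, separators=(",", ":"))

# Marshal as a JSON array: both sides agree on the field order up front,
# so the keys never travel over the wire.
FIELDS = ("id", "name", "active")
as_array = json.dumps([user[f] for f in FIELDS], separators=(",", ":"))


def from_array(text):
    """Rebuild the record from its positional (array) form."""
    return dict(zip(FIELDS, json.loads(text)))
```

The array form drops the repeated keys, and the saving grows with the number of records in the feed. The trade-off is that the payload is no longer self-describing, which is exactly where a typed tuple would fit.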

Most of the rest of this post is lifted right from the toml page.
Toml's home page is here:
https://github.com/mojombo/toml/

Tom's Obvious, Minimal Language.
By Tom Preston-Werner.
"TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages."

Example

# This is a TOML document. Boom.

title = "TOML Example"

[owner]
name = "Tom Preston-Werner"
organization = "GitHub"
bio = "GitHub Cofounder & CEO\nLikes tater tots and beer."
dob = 1979-05-27T07:32:00Z # First class dates? Why not?

[database]
server = "192.168.1.1"
ports = [ 8001, 8001, 8002 ]
connection_max = 5000
enabled = true

[servers]

  # You can indent as you please. Tabs or spaces. TOML don't care.
  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [servers.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

[clients]
data = [ ["gamma", "delta"], [1, 2] ]

# Line breaks are OK when inside arrays
hosts = [
  "alpha",
  "omega"
]

Comments

# I am a comment. Hear me roar. Roar.
key = "value" # Yeah, you can do this.

Integers 

42
-17

Float

3.1415
-0.01

Truth


true
false

Date


1979-05-27T07:32:00Z

Array

Arrays are square brackets with other primitives inside. Whitespace is ignored. Elements are separated by commas. Data types may not be mixed.
[ 1, 2, 3 ]
[ "red", "yellow", "green" ]
[ [ 1, 2 ], [3, 4, 5] ]
[ [ 1, 2 ], ["a", "b", "c"] ] # this is ok
[ 1, 2.0 ] # note: this is NOT ok
Arrays can also be multiline. So in addition to ignoring whitespace, arrays also ignore newlines between the brackets. Terminating commas are ok before the closing bracket.
key = [
  1, 2, 3
]

key = [
  1,
  2, # this is ok
]
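
The "data types may not be mixed" rule above can be sketched as a small check in Python. This is only an illustration of the rule as quoted here, not code from any TOML parser; the function names kind and is_valid_array are made up, and it assumes the values have already been parsed into Python objects.

```python
from datetime import datetime


def kind(value):
    """Return a type tag for a parsed value; any nested array is just 'array'."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime):
        return "datetime"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"unsupported value: {value!r}")


def is_valid_array(items):
    """True if all items share one kind and every nested array is itself valid."""
    kinds = {kind(item) for item in items}
    if len(kinds) > 1:
        return False
    return all(is_valid_array(item) for item in items if isinstance(item, list))
```

Because every nested array gets the same "array" tag, [ [ 1, 2 ], ["a", "b", "c"] ] passes while [ 1, 2.0 ] does not, which matches the examples above.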

Table

Tables (also known as hash tables or dictionaries) are collections of key/value pairs. They appear in square brackets on a line by themselves. You can tell them apart from arrays because arrays are only ever values.
[table]
Under that, and until the next table or EOF are the key/values of that table. Keys are on the left of the equals sign and values are on the right. Keys start with the first non-whitespace character and end with the last non-whitespace character before the equals sign. Key/value pairs within tables are unordered.
[table]
key = "value"
You can indent keys and their values as much as you like. Tabs or spaces. Knock yourself out. Why, you ask? Because you can have nested tables. Snap.
Nested tables are denoted by table names with dots in them. Name your tables whatever crap you please, just don't use a dot. Dot is reserved. OBEY.
[dog.tater]
type = "pug"
In JSON land, that would give you the following structure:
{ "dog": { "tater": { "type": "pug" } } }
You don't need to specify all the super-tables if you don't want to. TOML knows how to do it for you.
# [x] you
# [x.y] don't
# [x.y.z] need these
[x.y.z.w] # for this to work
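
One way to picture how a parser might handle dotted table names and implicit super-tables is the Python sketch below. The helper name open_table is hypothetical, and the sketch assumes every name along the path is either missing or already a table; it does no error checking.

```python
def open_table(root, dotted_name):
    """Walk the nested dicts named by 'a.b.c', creating any that are missing."""
    table = root
    for part in dotted_name.split("."):
        table = table.setdefault(part, {})
    return table


doc = {}
open_table(doc, "dog.tater")["type"] = "pug"  # [dog.tater] then type = "pug"
open_table(doc, "x.y.z.w")                    # [x.y.z.w] with no keys
```

The super-tables x, x.y and x.y.z come into existence as a side effect of walking the path, which is why they do not need to be declared.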
Empty tables are allowed and simply have no key/value pairs within them.
As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it.
[a.b]
c = 1

[a]
d = 2
You cannot define any key or table more than once. Doing so is invalid.
# DO NOT DO THIS

[a]
b = 1

[a]
c = 2
# DO NOT DO THIS EITHER

[a]
b = 1

[a.b]
c = 2
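
The duplicate rules above can be sketched as follows. This is a simplified illustration, not a real parser: it assumes the document has already been split into (table name, key/value pairs) tuples, and the names DuplicateError and load_tables are invented for the example.

```python
class DuplicateError(ValueError):
    """Raised when a key or table is defined more than once."""


def load_tables(headers_and_keys):
    """Build nested dicts from (table_name, {key: value}) pairs, rejecting duplicates."""
    root = {}
    defined = set()  # table names that appeared explicitly in brackets
    for name, pairs in headers_and_keys:
        if name in defined:
            raise DuplicateError(f"table [{name}] defined more than once")
        defined.add(name)
        table = root
        for part in name.split("."):
            table = table.setdefault(part, {})
            if not isinstance(table, dict):
                # The path runs into a plain key, e.g. [a.b] after a.b = 1.
                raise DuplicateError(f"[{name}] collides with key '{part}'")
        for key, value in pairs.items():
            if key in table:
                raise DuplicateError(f"key '{key}' already defined in [{name}]")
            table[key] = value
    return root
```

Writing [a.b] and then [a] is accepted because [a] was only created implicitly the first time, while the two "DO NOT DO THIS" cases both raise.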

Array of Tables

The last type that has not yet been expressed is an array of tables. These can be expressed by using a table name in double brackets. Each table with the same double bracketed name will be an element in the array. The tables are inserted in the order encountered. A double bracketed table without any key/value pairs will be considered an empty table.
[[products]]
name = "Hammer"
sku = 738594937

[[products]]

[[products]]
name = "Nail"
sku = 284758393
color = "gray"
In JSON land, that would give you the following structure.
{
  "products": [
    { "name": "Hammer", "sku": 738594937 },
    { },
    { "name": "Nail", "sku": 284758393, "color": "gray" }
  ]
}
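
A minimal Python sketch of what a double-bracketed header does, assuming top-level names only. The helper append_table is hypothetical and is not taken from any TOML implementation.

```python
def append_table(root, name):
    """Handle one [[name]] header: append a fresh table and return it."""
    tables = root.setdefault(name, [])
    tables.append({})
    return tables[-1]


doc = {}
append_table(doc, "products").update(name="Hammer", sku=738594937)
append_table(doc, "products")  # [[products]] with no keys: an empty table
append_table(doc, "products").update(name="Nail", sku=284758393, color="gray")
```

Each header appends a new table, so the order in the array is simply the order in the file.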
You can create nested arrays of tables as well. Just use the same double bracket syntax on sub-tables. Each double-bracketed sub-table will belong to the most recently defined table element above it.
[[fruit]]
  name = "apple"

  [fruit.physical]
    color = "red"
    shape = "round"

  [[fruit.variety]]
    name = "red delicious"

  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"
The above TOML maps to the following JSON.
{
  "fruit": [
    {
      "name": "apple",
      "physical": {
        "color": "red",
        "shape": "round"
      },
      "variety": [
        { "name": "red delicious" },
        { "name": "granny smith" }
      ]
    },
    {
      "name": "banana",
      "variety": [
        { "name": "plantain" }
      ]
    }
  ]
}
Attempting to define a normal table with the same name as an already established array must produce an error at parse time.
# INVALID TOML DOC
[[fruit]]
  name = "apple"

  [[fruit.variety]]
    name = "red delicious"

  # This table conflicts with the previous table
  [fruit.variety]
    name = "granny smith"
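
The nesting rule ("belongs to the most recently defined table element above it") and the conflict rule can be sketched together in Python. All three function names are made up for illustration, and the sketch assumes the only things found along a path are tables and arrays of tables.

```python
def resolve_parent(root, parts):
    """Follow a dotted path; at an array of tables, step into its last element."""
    node = root
    for part in parts:
        node = node.setdefault(part, {})
        if isinstance(node, list):
            node = node[-1]  # most recently defined table element
    return node


def open_array_table(root, dotted_name):
    """Handle a [[a.b]] header: append a new table to the array and return it."""
    *parents, last = dotted_name.split(".")
    parent = resolve_parent(root, parents)
    existing = parent.setdefault(last, [])
    if not isinstance(existing, list):
        raise ValueError(f"[[{dotted_name}]] conflicts with an existing table")
    existing.append({})
    return existing[-1]


def open_plain_table(root, dotted_name):
    """Handle a [a.b] header; refuse a name already used by an array of tables."""
    *parents, last = dotted_name.split(".")
    parent = resolve_parent(root, parents)
    existing = parent.setdefault(last, {})
    if isinstance(existing, list):
        raise ValueError(f"[{dotted_name}] conflicts with array [[{dotted_name}]]")
    return existing


doc = {}
open_array_table(doc, "fruit")["name"] = "apple"
open_plain_table(doc, "fruit.physical").update(color="red", shape="round")
open_array_table(doc, "fruit.variety")["name"] = "red delicious"
open_array_table(doc, "fruit.variety")["name"] = "granny smith"
open_array_table(doc, "fruit")["name"] = "banana"
open_array_table(doc, "fruit.variety")["name"] = "plantain"
```

Calling open_plain_table(doc, "fruit.variety") after this raises ValueError, which is the parse-time error the invalid document above is meant to trigger.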


Implementations


If you have an implementation, send a pull request adding to this list. Please note the commit SHA1 or version tag that your parser supports in your Readme.


My initial thoughts on TOML taken from some G+ comments.

Someone posted "Why?"

And I wrote:

JSON is a bit verbose, and harder to human edit.

JSON is better for serialization, REST, websocket (I think). This is better for config (I think). PLIST is not bad either (better than JSON). Loose JSON is not bad either (Jackson, JSON smart, Boon all have loose JSON that allows comments, no quote keys, single quote keys, etc.)

The part that gives me heartburn is that arrays have to be all of the same type.

I think that is ok, but if you have that then you also need another type that is like a list that takes arbitrary types (easier for marshaling).

double instead of arbitrary Number is nice.
Int is nice.

I will probably write a parser for this. I have one for PLIST and one for JSON. 

Andrew
 4:01 PM
Is it like a Windows INI file?
James
 4:13 PM
It's the same idea and at first glance has some structural similarities.

Rick Hightower
Yesterday 8:55 PM
Right. No problem. As long as it is never the windows registry.

Windows has ini files.

Mac has plist files, which were cool until the XML versions, which are f'ing XML, so the opposite of cool.

Then everyone switched to XML config. I did too. XML had an X in it, for Christ's sake; who could not switch? I mean... You had Xtreme Programming. You have to use things that start with an X. I am X generation... I think. Or am I Y? I might be GenX. Who the hell knows. XML sounded cool. Then they added, like, namespaces, and XSDs, and SOAP, and working groups, and XML became a sh1t fest where the headers and namespaces were longer than the damn config... I digress.

Then JSON...

Then that was not good enough either: it is missing comments, so not so great for config...

YAML, but that is just nuts, so... YAML is nuts. 80 pages. I don't care if it is ported to 90 languages. If you can't hold the entire syntax in your head, then you can't use it for human-readable config files.

Let's reinvent Windows ini files and call them TOML, and we are back to 1988, but hey, no problem: most people who know what an ini file is are managers.

Pretend like you don't know it's an ini file, or you are outing yourself, and you have to be a manager, and they won't let you write code anymore.

I am hip. I am young. TOML is new. Nothing quite like it. All is new. What about ASCII PLIST? LALalalala... I am not listening. What about ini files? lalalalalala... I can't hear you.

+1 TOML
-1 YAML
-1 XML config
+0.5 ASCII PLIST
+0.5 JSON loose
