Anders Riggelsen

Introduction

The file's entry point is defined with the format keyword in front of a data-block specifier.

A file's structure is defined with these primitives (data-blocks): struct, rel_offset, abs_offset, enum and bitfield.

Data blocks

struct

A struct is the most basic data block and contains other data types within it.
They are defined as follows:

This struct is called "myStruct" and contains a signed byte named "somebyte" followed by a signed long value named "alongvalue".
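
To illustrate what such a struct means at the byte level, here is a minimal Python sketch that reads the two fields in declaration order. It is not part of the schema language: the function name read_my_struct and the 4-byte width of the long are assumptions made for the example, and little-endian ordering is used because it is the documented default.

```python
import struct

def read_my_struct(data: bytes, offset: int = 0) -> dict:
    """Read a signed byte followed by a signed long (assumed to be 4 bytes)."""
    # "<" = little-endian with no padding, "b" = signed 8-bit, "i" = signed 32-bit
    somebyte, alongvalue = struct.unpack_from("<bi", data, offset)
    return {"somebyte": somebyte, "alongvalue": alongvalue}
```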

rel_offset and abs_offset

Relative offsets and absolute offsets behave much like a struct, but define data that is stored at an offset from some origin.
rel_offset blocks are stored relative to the end of the block that contains them.
abs_offset blocks are stored relative to the beginning of the file.

Relative and absolute offset blocks take a primitive type that holds the offset value:

Here the struct "myFile" contains "somedata" of the type byte and "datablock" of the type long which is in fact a reference to the data-block defined in "DataChunk".
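
The two origins described above can be sketched in Python as follows. The function resolve_offset and its parameters are hypothetical, and how a real parser determines the end of the containing block is an assumption here; the sketch only shows the arithmetic.

```python
def resolve_offset(kind: str, offset_value: int, container_end: int) -> int:
    """Return the file position at which the referenced data block starts."""
    if kind == "abs_offset":
        # Measured from the beginning of the file.
        return offset_value
    if kind == "rel_offset":
        # Measured from the end of the containing block.
        return container_end + offset_value
    raise ValueError(f"unknown offset kind: {kind}")
```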

enum

An enumerator is a special case of a primitive type with extra rules applied to it. The most common uses for this data-block are validation and setting local variables or flags for later conditional parsing.

Enumerators look like the following example:

Enums support both single values and ranges of values (both closed and open brackets are allowed).
The enum labels ("NewBorn", "Child", "Teenager", etc.) are optional.

Note: Overlapping ranges are not allowed, nor are single values that fall within an existing range.

The else keyword is a special case that is chosen if the read value does not match any of the given values and does not fall into any of the ranges. In the above case, an error is thrown.

If no 'else' case is given, then any value read that does not match a value or range is implicitly treated as an error.
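
The matching rules can be sketched in Python. This is a hedged illustration, not the parser's implementation: match_enum and AGE_CASES are hypothetical names, the age ranges are invented for the example, and only closed (inclusive) ranges are modelled. A single value is written as a range whose two bounds are equal.

```python
# Hypothetical cases: (low, high, label), both bounds inclusive.
AGE_CASES = [(0, 0, "NewBorn"), (1, 12, "Child"), (13, 19, "Teenager")]

def match_enum(value, cases, else_label=None):
    """Return the label of the case containing value, or fall back to else."""
    for low, high, label in cases:
        if low <= value <= high:
            return label
    if else_label is not None:
        return else_label
    # No 'else' case given: an unmatched value is an error.
    raise ValueError(f"value {value} matches no enum case")
```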

bitfield

Bitfields of a primitive type are used to easily parse boolean types and do certain actions if specific bits are set.

In the above example there is a bitfield of type byte (8 bits).
An action can be performed depending on whether a bit is set or not. Non-negative integers from 0 to the size of the bitfield (in bits) are legal bit indexes. To perform an action when a bit is not set, prefix the bit index with an exclamation mark.
Bits that are not listed in the bitfield block are still parsed, but no action is performed for them.
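
A minimal Python sketch of this bit test follows. The names are hypothetical, and two points are assumptions: bit index 0 is taken to be the least significant bit, and indexes 0 to width - 1 are treated as valid. Rules are keyed by the bit index as a string, with a leading "!" meaning "bit not set".

```python
def triggered_actions(value: int, rules: dict, width: int = 8) -> list:
    """Return the actions whose bit condition holds for value."""
    fired = []
    for key, action in rules.items():
        negate = key.startswith("!")
        index = int(key.lstrip("!"))
        if not 0 <= index < width:
            raise ValueError(f"bit index {index} out of range")
        is_set = bool((value >> index) & 1)
        # Fire on a set bit for "n", on a cleared bit for "!n".
        if is_set != negate:
            fired.append(action)
    return fired
```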

Local fields

Local fields are readable and writable variables that you can use to store information about the data you are parsing.
Local fields are not read from the file, nor do they affect the write index or the size of the data structures.
They can only be read or modified through actions.

In the above example the parser sets a boolean value if the read value is a prime number and less than or equal to 17.
Otherwise it causes a warning and stores the read value minus one, divided by two.
Any expression using simple operators like +, -, * and /, as well as parentheses, is allowed.

In struct-block types you can use already parsed data in your expressions. This is handy if you need to calculate the size of another field based on previously parsed data.

The value keyword returns the same value as the read data.

Any of the data blocks (structs, enums, etc.) can contain local variables.

Actions

Any data parsed inside enums and bitfields can trigger actions.
Actions are put inside square brackets [] separated by commas.
The different types of legal actions are:

Example of triggering of actions:

Any number of actions can be defined.

Conditionals

Conditionals are checks that are performed before data is read from the file.
The truth-value of the expression determines if the parser should parse the following data-block.

In the above example the parser looks at the 'type' field read from the file and checks what type it is before deciding what kind of data to read next.
If the type resolves to 'Leaf' then 'nodeA' and 'nodeB' are not expected in that block.
If it resolves to 'Node' then no 'leaf' value is expected in the block.
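
The effect of such conditionals can be sketched in Python. Everything concrete here is an assumption made for illustration: the tag values for 'Leaf' and 'Node', the 1-byte width of 'type', the 4-byte signed 'leaf' value, and the idea that 'nodeA' and 'nodeB' are themselves nested blocks of the same kind.

```python
import struct

LEAF, NODE = 0, 1  # hypothetical tag values for the 'type' field

def read_block(data: bytes, pos: int = 0):
    """Parse one block and return (fields, position after the block)."""
    block_type = data[pos]
    pos += 1
    fields = {"type": block_type}
    if block_type == LEAF:
        # Condition holds for leaves only: read 'leaf', expect no child nodes.
        fields["leaf"] = struct.unpack_from("<i", data, pos)[0]
        pos += 4
    elif block_type == NODE:
        # Condition holds for nodes only: read both children, expect no 'leaf'.
        fields["nodeA"], pos = read_block(data, pos)
        fields["nodeB"], pos = read_block(data, pos)
    else:
        raise ValueError(f"unknown type {block_type}")
    return fields, pos
```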

Arrays of data

Known lengths

A file format can contain a contiguous sequence of the same type of data.
To represent this in the schema, you supply the expected length of the sequence in square brackets.
Arrays can be of any type (data-blocks or primitive types).

The example above shows how a constant length or a length read from the file can be used.
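
Both kinds of length can be sketched in Python. The layout is hypothetical: a fixed array of four bytes, then a 1-byte count, then that many signed 16-bit values.

```python
import struct

def read_arrays(data: bytes):
    """Read a constant-length array, then an array whose length is in the file."""
    # Constant length: always four bytes.
    magic = list(data[0:4])
    # Length read from the file: the count precedes the elements.
    count = data[4]
    values = list(struct.unpack_from(f"<{count}h", data, 5))
    return magic, count, values
```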

Unknown lengths

The length of a block can't always be known in advance. This is often the case when the file contains null-terminated strings, or when data should simply be read until the end of the file. To allow this, arrays can be defined as follows:

In this case the 'name' array of bytes is read until it encounters a null-character.

The 'until' cases are checked only when the parser is about to read the next block of data.
This means that the parser will not stop expanding the array if a stop-case occurs inside the data that is currently being read, only if the stop-case is the next data the parser encounters.

By default the cursor is moved past the 'until'-case that matched.
If the cursor should not be moved, use the stay keyword after the case (and after any inclusive/exclusive keyword).
This is useful when the 'until'-case is part of the data that should be parsed afterwards.

The inclusive keyword (the default behavior) will in the above case store the null-character "\0" in the array as well.
The exclusive keyword does not include the 'until'-case in the array.
EOF stands for "End Of File". If the end of the file is encountered, the parser stops adding elements to the array.

Note: For inclusive to be legal, the 'until'-case must be of compatible type with the array-type.
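
The interaction of inclusive, exclusive and stay can be sketched in Python for a byte array with a single stop-case. The function read_until is hypothetical, and it simplifies one point: it always stops at the end of the data, whereas in the schema EOF is a case you list explicitly.

```python
def read_until(data: bytes, pos: int, stop: bytes,
               inclusive: bool = True, stay: bool = False):
    """Return (array, cursor) after reading bytes until stop is next."""
    out = bytearray()
    while pos < len(data):  # end of data also ends the array
        # The stop-case is checked before the next element is read.
        if data.startswith(stop, pos):
            if inclusive:
                out += stop  # store the stop-case in the array
            if not stay:
                pos += len(stop)  # default: move the cursor past it
            return bytes(out), pos
        out.append(data[pos])
        pos += 1
    return bytes(out), pos
```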

Note: The 'until' cases are checked in the order they are specified, and checking stops at the first one that matches. This means that if some cases are prefixes of other cases, the longest case should generally come first. Otherwise the longer cases would never be matched.

Consider this example:

data :: byte[] until "a" or "ab" or "abc" inclusive;

This would lead to incorrect behavior if "abc" was expected to be included in the array before stopping. The parser is lazy and stops at the first 'until'-case that matches, so only "a" would be included and "bc" would be the next data to be parsed.
The correct way to get that behavior is:

data :: byte[] until "abc" or "ab" or "a" inclusive;
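
The effect of the ordering can be reproduced with a small Python sketch of first-match checking. The function first_matching_case is a hypothetical stand-in for the parser's check, not its actual implementation.

```python
def first_matching_case(data: bytes, pos: int, cases: list):
    """Return the first case, in listed order, that occurs at pos, else None."""
    for case in cases:
        if data.startswith(case, pos):
            return case  # lazy: later cases are never tried
    return None

# Shortest first: "a" wins, and "bc" is left for the next read.
short_first = first_matching_case(b"xxabcyy", 2, [b"a", b"ab", b"abc"])
# Longest first: the whole "abc" is matched.
long_first = first_matching_case(b"xxabcyy", 2, [b"abc", b"ab", b"a"])
```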

Skipped data

If only a subset of a file format is needed after parsing, then there is no need to store the extra data.
Data-blocks or primitive types in struct-types can be prefixed with the skip keyword to leave them out.
The parser still expects the data to be in the file, but it skips over it quickly without validating it or storing it in memory.
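
In Python terms, skipping amounts to moving the read position forward without reading the bytes. The sketch below is an assumption-laden illustration: read_after_skip is a hypothetical name, and the value after the skipped region is taken to be a 4-byte signed little-endian integer.

```python
import io
import struct

def read_after_skip(stream, skip_len: int) -> int:
    """Skip skip_len bytes, then read one 4-byte signed integer."""
    # Advance the cursor; the skipped bytes are neither validated nor stored.
    stream.seek(skip_len, io.SEEK_CUR)
    return struct.unpack("<i", stream.read(4))[0]
```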


Big endian / little endian

By default little-endian ordering is expected when parsing data.
This can be overridden on a per-schema basis by placing one of the following lines as the very first line in the file:
byteOrdering :: little-endian;
byteOrdering :: big-endian;
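
The difference between the two settings can be seen by decoding the same four bytes both ways. The helper read_u32 is hypothetical; it simply maps the two byteOrdering values onto the byte-order prefixes of Python's struct module.

```python
import struct

def read_u32(raw: bytes, byte_ordering: str = "little-endian") -> int:
    """Decode four bytes as an unsigned 32-bit integer in the given ordering."""
    prefix = {"little-endian": "<", "big-endian": ">"}[byte_ordering]
    return struct.unpack(prefix + "I", raw)[0]

raw = b"\x01\x02\x03\x04"
little = read_u32(raw)             # least significant byte first
big = read_u32(raw, "big-endian")  # most significant byte first
```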