Intersec Object Packer Part 1: the basics

This post is an introduction to a useful tool here at Intersec, a tool that we call IOP: the Intersec Object Packer.

IOP is our take on the IDL (Interface Description Language) approach. It is a method to serialize structured data for use in our communication protocols or data storage technologies. It is used to transmit data over the network in a safe manner, to exchange data between different programming languages, or to provide a generic interface to store (and load) C data on disk. IOP provides data integrity checking and backward compatibility.

The concept of IDL is not new. There are a lot of different languages available, such as Google Protocol Buffers or Thrift. IOP itself isn’t new either: its initial version was written in 2008, and it has evolved a lot during its almost decade-long life. However, IOP has proven to be solid and sufficiently well designed that it has not needed any backward-incompatible change during that period.

IOP package description

The first thing to do with IOP is to declare the data structures in the IOP description language. With those definitions, our IOP compiler will automatically create all the helpers needed to use these IOP data structures in different languages and to allow serialization and deserialization.

Data structure declaration is done in a C-like syntax (actually, it is almost the D language syntax) and lives inside a .iop file. As a convention, we use CamelCase in our .iop files (which is different from the coding rules of our .c files).

Let’s look at a quick example:

struct User {
    int    id;
    string name;
};

Here we are. An IOP object with two fields: an id (as an integer) and a name (as a string). Obviously, it is possible to create much more complex structures. To do so, here is the list of available types for our structure fields.

Basic types

IOP allows several low-level types to be used to define object members. One can use the classics:

int / uint (32-bit signed/unsigned integer)
long / ulong (64-bit signed/unsigned integer)
byte / ubyte (8-bit signed/unsigned integer)
short / ushort (16-bit signed/unsigned integer)
bool
double
string

and also the types:

bytes (a binary blob)
xml (for an XML payload)
void (to specify a lack of data).
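
To get a feel for these sizes on the C side, here is a small sketch that pairs each numeric IOP type with the fixed-width C type of the same width and signedness. Only the ulong / uint64_t pairing is confirmed by the generated code shown later in this post; the other pairings are assumptions, and every sketch_* name is hypothetical (these are not identifiers produced by the IOP compiler).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aliases named after the IOP keywords, for illustration only.
 * They are NOT the identifiers generated by the IOP compiler. */
typedef int8_t   sketch_byte_t;    /* byte   */
typedef uint8_t  sketch_ubyte_t;   /* ubyte  */
typedef int16_t  sketch_short_t;   /* short  */
typedef uint16_t sketch_ushort_t;  /* ushort */
typedef int32_t  sketch_int_t;     /* int    */
typedef uint32_t sketch_uint_t;    /* uint   */
typedef int64_t  sketch_long_t;    /* long   */
typedef uint64_t sketch_ulong_t;   /* ulong (uint64_t in the generated code) */
typedef bool     sketch_bool_t;    /* bool   */
typedef double   sketch_double_t;  /* double */

/* Width in bits of a type, to compare against the list above. */
#define SKETCH_BITS(type) (sizeof(type) * CHAR_BIT)
```

For instance, SKETCH_BITS(sketch_long_t) evaluates to 64, which matches the "64-bit" entry of the list.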

Complex types

Four complex data types are also available for our fields.

Structures

The structure describes a record containing one or more fields. Each field has a name and a type. To see what it looks like, let’s add an address to our user data structure:

struct Address {
    int    number;
    string street;
    int    zipCode;
    string country;
};

struct User {
    int     id;
    string  name;
    Address address;
};

Of course, there is no theoretical limit on the number of struct “levels”. A struct can have a struct field which itself contains a struct field, and so on.

Classes

A class is an extendable structure type. A class can inherit from another class, creating a new type that adds new fields to the ones present in its parent class.

We will see classes in more detail in a separate article.

Unions

A union is a list of possibilities. Its description is very similar to that of a structure: it has typed fields, but only one of the fields is defined at a time. The name union is inherited from C, since the concept is very similar to C unions; however, IOP unions are tagged, which means we always know which of the fields is defined.

Example:

union MyUnion {
    int    wantInt;
    string wantString;
    User   wantUser;
};
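
To make the "tagged" part concrete, here is a minimal C sketch of a tagged union: a plain C union wrapped with a tag that records which member is defined. This is not the code generated by the IOP compiler; all the type and function names are hypothetical, and the wantUser member is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag: one value per field of MyUnion. */
typedef enum {
    MY_UNION_WANT_INT,
    MY_UNION_WANT_STRING,
} my_union_tag_t;

/* The tag tells which member of the inner C union is currently defined. */
typedef struct {
    my_union_tag_t tag;
    union {
        int         want_int;
        const char *want_string;
    } u;
} my_union_sketch_t;

my_union_sketch_t my_union_from_int(int v)
{
    my_union_sketch_t mu = { .tag = MY_UNION_WANT_INT, .u.want_int = v };
    return mu;
}

my_union_sketch_t my_union_from_string(const char *s)
{
    my_union_sketch_t mu = { .tag = MY_UNION_WANT_STRING, .u.want_string = s };
    return mu;
}

/* Returns a pointer to the integer only if it is the defined member,
 * NULL otherwise: the tag prevents reading the wrong member. */
const int *my_union_get_int(const my_union_sketch_t *mu)
{
    return mu->tag == MY_UNION_WANT_INT ? &mu->u.want_int : NULL;
}

/* Same idea for the string member. */
const char *my_union_get_string(const my_union_sketch_t *mu)
{
    return mu->tag == MY_UNION_WANT_STRING ? mu->u.want_string : NULL;
}
```

With a bare C union, nothing stops the caller from reading want_string after storing an integer. Here the getters check the tag first, which is the property the text above describes.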

Enumeration

The last type that can be used is the enumeration. Here again, an enum is similar to the C enum. It defines several literal keys associated with integer values. Just like the C enum, the IOP enum supports the whole integer range for its values.

Example:

enum MyEnum {
    VALUE_1 = 1,
    VALUE_2 = 2,
    VALUE_3 = 3,
};

Member constraints

Now that we have all the types we need for our custom data structure fields, it’s time to add some features to them in order to gain flexibility. These features are called constraints, and they act as qualifiers for IOP fields. For now, we have four different constraints: optional, repeated, with a default value, and the implicit mandatory constraint.

Mandatory

By default, a member of an IOP structure is mandatory. This means it must be set to a valid value in order for the structure instance to be valid. In particular, you must guarantee the field is set before serializing/deserializing the object. By default, mandatory fields are value fields in the generated C structure: this means the value is inlined in the structure type and is copied. There are, however, some exceptions to this rule, but we will see that later.

The example is pretty simple:

struct Foo {
    int mandatoryInteger;
};

Optional members

An optional member is indicated by a ? following the data type. The packers/unpackers allow these members to be absent without generating an error.

struct Foo {
    int? optionalMember;
    Bar? optionalMember2;
    int  mandatoryInteger;
};
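
On the C side, one way to picture an optional integer is a value paired with a presence flag, which is how the generated opt_i32_t type is described later in this post. The sketch below is only an illustration of that idea: the type name, the field names and the helpers are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an optional 32-bit integer:
 * the value plus a flag telling whether the value is set. */
typedef struct {
    int32_t v;
    bool    has_field;
} opt_i32_sketch_t;

/* Builds an optional that is present and holds v. */
opt_i32_sketch_t opt_i32_sketch_set(int32_t v)
{
    opt_i32_sketch_t o = { .v = v, .has_field = true };
    return o;
}

/* Builds an optional that is absent. */
opt_i32_sketch_t opt_i32_sketch_clear(void)
{
    opt_i32_sketch_t o = { .v = 0, .has_field = false };
    return o;
}

/* Reads the value if present, otherwise returns the caller's fallback. */
int32_t opt_i32_sketch_get_or(opt_i32_sketch_t o, int32_t fallback)
{
    return o.has_field ? o.v : fallback;
}
```

The flag is what lets an absent field be told apart from a field that is present and equal to 0.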

Repeated members

A repeated member is a field that can appear zero or more times in the structure (often represented by an array in programming languages). As such, a repeated field is optional (it can be present 0 times). A repeated member is indicated by “[]” following the data type.

In the next example, you can consider the repeatedInteger field as a list of integers.

struct Foo {
    int[] repeatedInteger;
    int?  optionalMember;
    Bar?  optionalMember2;
    int   mandatoryInteger;
};
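
In C, a repeated field is naturally pictured as a pointer to the elements plus a length, with a length of 0 meaning "present 0 times". The sketch below assumes that layout; this post does not show the type the IOP compiler actually generates for repeated fields, so the names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of "int[] repeatedInteger":
 * a view over len consecutive 32-bit integers. */
typedef struct {
    const int32_t *tab;
    size_t         len;
} i32_array_sketch_t;

/* Sums the elements; an empty array (len == 0) simply yields 0,
 * so no special case is needed for the "absent" situation. */
int64_t i32_array_sketch_sum(i32_array_sketch_t a)
{
    int64_t sum = 0;

    for (size_t i = 0; i < a.len; i++) {
        sum += a.tab[i];
    }
    return sum;
}
```

Because an empty array is a valid value, code that walks a repeated field does not need a separate presence check.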

With default value

A field with a default value is a kind of mandatory member that is nevertheless allowed to be absent. When the member is absent, the packer/unpacker always sets the member to its default value.

A member with a default value is indicated by setting the default value after the field declaration.

struct Foo {
    int   defaultInteger = 42;
    int[] repeatedInteger;
    int?  optionalMember;
    Bar?  optionalMember2;
    int   mandatoryInteger;
};

Moreover, arithmetic expressions are allowed in the default values of integer (and enum) members, like this:

struct Foo {
    int   defaultInteger = 2 * (256 << 20) + 42;
    int[] repeatedInteger;
    int?  optionalMember;
    Bar?  optionalMember2;
    int   mandatoryInteger;
};
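
To check what this default value works out to, the same expression can be written in C. This assumes the IOP compiler gives the operators their usual C meaning, which the C-like syntax suggests; the constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the IOP default value above:
 *   256 << 20      = 2^8 * 2^20 = 2^28 = 268435456
 *   2 * 268435456  = 536870912
 *   536870912 + 42 = 536870954
 * The result is below INT32_MAX (2147483647), so it fits in an "int" field. */
enum { DEFAULT_INTEGER_SKETCH = 2 * (256 << 20) + 42 };

int32_t default_integer_sketch(void)
{
    return DEFAULT_INTEGER_SKETCH;
}
```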

IOP packages

The last thing to know to be able to write our first IOP file is about packages.

An IOP file corresponds to an IOP package. Basically, the package is a kind of namespace for the data structures you are declaring. The filename must match the package name. Every IOP file must define its package name like this:

package foo; /*< package name of the file foo.iop */

struct Foo {
    [...]
};

[...]

A package can also be a sub-package, like this:

package foo.bar; /*< package name of the file foo/bar.iop */

struct Bar {
    [...]
};

[...]

Finally, you can import objects from another package by specifying the package name before the type:

package plop; /*< package name of the file plop.iop */

struct Plop {
    foo.bar.Bar bar;
};

[...]

How to use IOP

Before moving on to more complicated IOP features, let’s see a simple example of how to use the custom data structures that we just declared.

When compiling our code, a first pass is done on our IOP files using our own compiler. This compiler parses the .iop files and generates the corresponding C source files, which provide helpers to serialize/deserialize our data structures. Here again, we will see it in more detail soon 🙂

Let’s see an example of code which is using IOP. First, let’s assume we have declared a new IOP package:

package user;

struct UserAddress {
    string street;
    int?   zipCode;
    string city;
};

struct User {
    ulong       id = 1;
    string      login;
    UserAddress addr;
};

This will create several C files containing the type descriptors used for data serialization/deserialization as well as the C type declarations:

struct user__user_address__t {
    char*     street;  /*< Actually a slightly more complicated type is used for
                        *  strings, but no need to be too specific here :)
                        */
    opt_i32_t zip_code;
    char*     city;
};

struct user__user__t {
    uint64_t                     id;
    char *                       login;
    struct user__user_address__t addr;
};

Not very different from the IOP file, right? Still, we can notice a few unusual things:

The opt_i32_t type for zip_code. This is how we handle optional fields. It is a structure containing a 32-bit integer plus a boolean indicating whether the field is set.
The structure names are now in snake_case instead of CamelCase. The package name is added as a prefix to each structure, and there is a __t suffix too. This helps us recognize IOP structures when we meet one in our C code.
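
The naming rule can be reproduced with a few lines of C. The function below is a hypothetical helper written for this post, not part of IOP: it lowercases the package name, converts the CamelCase structure name to snake_case, joins them with "__" and appends "__t". It only handles a simple package name and names without runs of capitals (an acronym such as "XML" would be split letter by letter).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Appends one character, keeping room for the terminating NUL.
 * Returns 0 on success, -1 if the buffer is too small. */
static int put_char(char *dst, size_t cap, size_t *len, char c)
{
    if (*len + 1 >= cap) {
        return -1;
    }
    dst[(*len)++] = c;
    return 0;
}

/* Hypothetical helper: writes "<pkg>__<snake_case(name)>__t" into dst.
 * Returns 0 on success, -1 if dst is too small. */
int iop_c_type_name_sketch(char *dst, size_t cap,
                           const char *pkg, const char *name)
{
    size_t len = 0;

    /* Package prefix, lowercased. */
    for (const char *p = pkg; *p; p++) {
        if (put_char(dst, cap, &len, (char)tolower((unsigned char)*p)) < 0) {
            return -1;
        }
    }
    if (put_char(dst, cap, &len, '_') < 0 || put_char(dst, cap, &len, '_') < 0) {
        return -1;
    }

    /* CamelCase -> snake_case: an underscore before each capital letter
     * except the first character, then the lowercased letter. */
    for (const char *p = name; *p; p++) {
        unsigned char c = (unsigned char)*p;

        if (isupper(c) && p != name && put_char(dst, cap, &len, '_') < 0) {
            return -1;
        }
        if (put_char(dst, cap, &len, (char)tolower(c)) < 0) {
            return -1;
        }
    }

    /* "__t" suffix. */
    for (const char *s = "__t"; *s; s++) {
        if (put_char(dst, cap, &len, *s) < 0) {
            return -1;
        }
    }
    dst[len] = '\0';
    return 0;
}
```

Called with the package "user" and the structure "UserAddress", it produces "user__user_address__t", the name used in the generated declarations above.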

All the code generated by our compiler will be available through a user.iop.h file.

Now let’s play with it in our code:

#include <assert.h>

#include "user.iop.h"

[...]

int my_func(void) {
    user__user__t user;

    /* This function will initialize all the fields (and sub-fields) of the
     * structure, according to the IOP declarations. Here, everything will be set
     * to 0/NULL except the field "id", which will contain the value "1". The
     * first argument indicates the package + structure name of our IOP object.
     */
    iop_init(user__user, &user);

    /* This function will pack our IOP structure into an IOP binary format and
     * return a pointer to the created buffer containing the packed structure.
     * The structures will be packed in order to use as little memory as possible.
     * Let's put aside the memory management questions for this post.
     */
    void *binary_data = iop_bpack(user__user, &user);

    /* This call must have failed. Our constraints are not respected, as several
     * mandatory fields were not correctly filled.
     */
    assert(binary_data == NULL);

    user.addr.street = "221B Baker Street";
    user.addr.city   = "London";
    user.login = "SH";

    binary_data = iop_bpack(user__user, &user);

    /* This one should be the good one. Even if the "id" and "addr.zip_code"
     * fields are not filled, it is not a problem, as the first one has a default
     * value and the second one is an optional field.
     */
    assert(binary_data != NULL);

    /* Now we can do whatever we want with this data (writing it to disk, for
     * example). But for now, let's just try to unpack it. Here again, let's
     * turn a blind eye to memory management.
     */
    user__user__t user2;
    int res = iop_bunpack(binary_data, user__user, &user2);

    /* Unpacking should have been successful, and we now have a "user2" struct
     * identical to "user" struct.
     */
    assert(res >= 0);

    return 0;
}

Here we are. IOP gave us the superpower of packing/unpacking data structures in a binary format in two simple function calls. These binary packed structures can be used for disk storage but, as we will see in a future article, we also use them for our network communications.
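
To see why the first iop_bpack call in the example fails and the second one succeeds, here is a toy model of the constraint check. It is in no way the real IOP implementation: the sketch_* types and functions are hypothetical, and the check is reduced to "every mandatory string must be non-NULL".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy mirror of the UserAddress structure of the example. */
typedef struct {
    const char *street;        /* mandatory */
    int32_t     zip_code;      /* optional value ...       */
    bool        has_zip_code;  /* ... and its presence flag */
    const char *city;          /* mandatory */
} sketch_address_t;

/* Toy mirror of the User structure of the example. */
typedef struct {
    uint64_t         id;       /* has a default value: 1 */
    const char      *login;    /* mandatory */
    sketch_address_t addr;     /* mandatory */
} sketch_user_t;

/* Plays the role of iop_init() in this toy model: everything is set to
 * 0/NULL, except "id", which receives its default value. */
void sketch_user_init(sketch_user_t *u)
{
    *u = (sketch_user_t){ .id = 1 };
}

/* Toy constraint check: a user can only be packed once all its
 * mandatory strings are set. The optional zip code is never checked. */
bool sketch_user_is_valid(const sketch_user_t *u)
{
    return u->login != NULL
        && u->addr.street != NULL
        && u->addr.city != NULL;
}
```

Right after sketch_user_init, the check fails because login, street and city are still NULL. Once these three are filled it passes, even though zip_code was never set and id was never touched, which is the behaviour asserted in the example above.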

Next time, we will talk about inheritance for our IOP objects!

Posted on Aug 16, 2017 at 10:36