Convert complex YAML to .NET types with custom YamlDotNet type converters

When it comes to YAML serialization and deserialization in .NET, YamlDotNet is a go-to library with over 100 million downloads on NuGet. Microsoft and the .NET team also use it in various projects, since there is no official Microsoft YAML library for .NET.

In this blog post, we will write a custom YAML serializer and deserializer using YamlDotNet. To illustrate the concepts, we'll work through a specific use case: parsing the environment variables section of a Docker Compose file.

The Docker Compose environment variables use case

Docker Compose allows the definition of environment variables in two distinct formats. The first, known as the object format, appears as follows:

environment:
  RACK_ENV: development
  SHOW: "true"
  SESSION_SECRET:

This object format can be directly deserialized into a dictionary of strings. However, Docker Compose also supports an array format:

environment:
  - RACK_ENV=development
  - SHOW=true
  - SESSION_SECRET

The array format is harder to deserialize: it is an array of strings, and each string must be split into a key and an optional value. If we want to consistently deserialize both formats into a dictionary of strings, we need a custom type converter. This can be done by implementing the IYamlTypeConverter interface.

Before jumping into the code, let's look at the three kinds of YAML tokens (YamlDotNet calls them parsing events) that we will encounter when parsing this part of a YAML document:

  • The Scalar token represents a scalar value. It can be a string, a number, a boolean, etc.
  • The MappingStart and MappingEnd tokens mark the start and end of a YAML object, an enumeration of key-value pairs. In a Docker Compose environment section, the keys are always scalars.
  • The SequenceStart and SequenceEnd tokens mark the start and end of a YAML array, an enumeration of values.

The annotated YAML below shows the tokens produced for an object and for an array:

# If we were to parse the "myobject" YAML value, we would encounter:
# MappingStart, Scalar (foo), Scalar (bar), MappingEnd
myobject:
  foo: bar

# If we were to parse the "myarray" YAML value, we would encounter:
# SequenceStart, Scalar (foo), SequenceEnd
myarray:
  - foo
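
To see these tokens for ourselves, we can ask YamlDotNet's low-level parser to list them. The following is a minimal sketch rather than part of the converter: DescribeEvents is a hypothetical helper name, and it only relies on the Parser class together with its MoveNext method and Current property.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using YamlDotNet.Core;
using YamlDotNet.Core.Events;

// Hypothetical helper (not part of YamlDotNet): lists the tokens the parser produces.
static List<string> DescribeEvents(string yaml)
{
    var parser = new Parser(new StringReader(yaml));
    var described = new List<string>();

    // MoveNext advances to the next token; Current exposes it.
    while (parser.MoveNext())
    {
        described.Add(parser.Current is Scalar scalar
            ? $"Scalar ({scalar.Value})"
            : parser.Current!.GetType().Name);
    }

    return described;
}

foreach (var description in DescribeEvents("myobject:\n  foo: bar\nmyarray:\n  - foo\n"))
{
    Console.WriteLine(description);
}
```

Besides the mapping, sequence and scalar tokens described above, the output should also contain the StreamStart, DocumentStart, DocumentEnd and StreamEnd tokens that wrap every document, as well as a MappingStart and MappingEnd pair for the root object that holds myobject and myarray. A type converter never sees the wrapping tokens because it is only handed the part of the stream that belongs to its value.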

Implementing our custom IYamlTypeConverter

With this in mind, let's implement the IYamlTypeConverter interface, which has three methods. (The signatures below match YamlDotNet releases before version 16; to my knowledge, later releases pass an additional root deserializer or serializer argument to ReadYaml and WriteYaml.) The first method, Accepts, determines whether the converter can handle a given type. In our case, we want to handle EnvironmentVariables, a type I've created to represent a dictionary of environment variables:

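using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using YamlDotNet.Core;
using YamlDotNet.Core.Events;
using YamlDotNet.Serialization;

public class EnvironmentVariables : Dictionary<string, string>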
{
}

public class EnvironmentVariablesTypeConverter : IYamlTypeConverter
{
    public bool Accepts(Type type)
    {
        return type == typeof(EnvironmentVariables);
    }

    // [...]
}

The second method, ReadYaml, is used to deserialize a YAML document into a .NET object. In our scenario, we aim to deserialize a YAML object or array into an EnvironmentVariables:

public object? ReadYaml(IParser parser, Type type)
{
    // We'll implement the deserialization logic very soon
    return new EnvironmentVariables();
}

The third method, WriteYaml, is used to serialize a .NET object back into a YAML document. For our purposes, we'll serialize an EnvironmentVariables object into a YAML object:

public void WriteYaml(IEmitter emitter, object? value, Type type)
{
    // We'll implement the serialization logic very soon
    var dict = (EnvironmentVariables)value!;
}

Now that we've outlined the structure of our custom serializer, let's delve into the deserialization logic. Given that the Docker Compose YAML schema for environment variables supports both an object format and a string array format, we can evaluate the first YAML token to find out whether it marks the beginning of a YAML object (MappingStart) or a YAML array (SequenceStart):

public object? ReadYaml(IParser parser, Type type)
{
    if (parser.TryConsume<MappingStart>(out _))
    {
        return ParseMapping(parser); // We're parsing a YAML object
    }

    if (parser.TryConsume<SequenceStart>(out _))
    {
        return ParseSequence(parser); // We're parsing a YAML array
    }

    throw new InvalidOperationException("Expected a YAML object or array");
}

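The TryConsume method, as its name suggests, attempts to consume a YAML token from the document. If the token is of the expected type, it's consumed, and the method returns true. If not, the method returns false, and the parser doesn't move forward. Two related methods appear below: Accept checks the type of the current token without consuming it, and Consume reads the current token and throws if it is not of the expected type. Let's implement the two parsing methods: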

private static EnvironmentVariables ParseMapping(IParser parser)
{
    var envvars = new EnvironmentVariables();

    // Read all the key-value pairs until we reach the end of the YAML object
    while (!parser.Accept<MappingEnd>(out _))
    {
        var key = parser.Consume<Scalar>();
        var value = parser.Consume<Scalar>();
        envvars[key.Value] = value.Value;
    }

    // Consume the mapping end token
    parser.MoveNext();
    return envvars;
}

// Regex that parses a key-value pair in the array format (e.g. "FOO=BAR" or "FOO")
private static readonly Regex EnvironmentVariableLineRegex = new Regex("^(?<key>[^=]*)(=(?<value>.*))?$", RegexOptions.Compiled);

private static EnvironmentVariables ParseSequence(IParser parser)
{
    var envvars = new EnvironmentVariables();

    // Read all the array values until we reach the end of the YAML array
    while (!parser.Accept<SequenceEnd>(out _))
    {
        var scalar = parser.Consume<Scalar>();

        if (EnvironmentVariableLineRegex.Match(scalar.Value) is { Success: true } match)
        {
            var key = match.Groups["key"].Value;
            var value = match.Groups["value"].Success ? match.Groups["value"].Value : string.Empty;
            envvars[key] = value;
        }
        else
        {
            throw new InvalidOperationException("Invalid key value mapping: " + scalar.Value);
        }
    }

    // Consume the sequence end token
    parser.MoveNext();
    return envvars;
}
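
To get a feel for how the regular expression splits array entries, here is a small standalone sketch that does not need YamlDotNet. It reuses the same pattern; SplitEntry is a hypothetical helper introduced only for this illustration, and it returns null rather than an empty string when the entry has no equals sign so that the two cases are easy to tell apart.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as EnvironmentVariableLineRegex above.
var lineRegex = new Regex("^(?<key>[^=]*)(=(?<value>.*))?$");

// Hypothetical helper: splits one array-format entry into its key and value.
// A null value means the entry contained no '=' at all.
(string Key, string? Value) SplitEntry(string line)
{
    var match = lineRegex.Match(line);
    var valueGroup = match.Groups["value"];
    return (match.Groups["key"].Value, valueGroup.Success ? valueGroup.Value : null);
}

foreach (var entry in new[] { "RACK_ENV=development", "SESSION_SECRET", "URL=a=b", "=orphan" })
{
    var (key, value) = SplitEntry(entry);
    var shownValue = value ?? "<none>";
    Console.WriteLine($"{entry} -> key: '{key}', value: '{shownValue}'");
}
```

Only the first equals sign separates the key from the value, so URL=a=b keeps a=b as its value. Note that the pattern also accepts an entry with an empty key such as =orphan. If that should be rejected, one option is to replace [^=]* with [^=]+ so that the exception branch in ParseSequence is reached.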

With these methods in place, the deserialization logic is complete and supports both the object and array formats. Next, let's tackle the serialization logic. We always serialize an EnvironmentVariables object using the YAML object format. The emitter's Emit method writes YAML tokens to the output:

private static readonly char[] KeyCharactersThatRequireQuotes = { ' ', '/', '\\', '~', ':', '$', '{', '}' };

public void WriteYaml(IEmitter emitter, object? value, Type type)
{
    var envvars = (EnvironmentVariables)value!;

    // We start a new YAML object
    emitter.Emit(new MappingStart(AnchorName.Empty, TagName.Empty, isImplicit: true, MappingStyle.Block));

    foreach (var entry in envvars)
    {
        // Quote the key if it contains special characters
        var keyScalar = entry.Key.IndexOfAny(KeyCharactersThatRequireQuotes) >= 0
            ? new Scalar(AnchorName.Empty, TagName.Empty, entry.Key, ScalarStyle.DoubleQuoted, isPlainImplicit: false, isQuotedImplicit: true)
            : new Scalar(AnchorName.Empty, TagName.Empty, entry.Key, ScalarStyle.Plain, isPlainImplicit: true, isQuotedImplicit: false);

        // Write the key, then the value
        emitter.Emit(keyScalar);
        emitter.Emit(new Scalar(AnchorName.Empty, TagName.Empty, entry.Value, ScalarStyle.DoubleQuoted, isPlainImplicit: false, isQuotedImplicit: true));
    }

    // We end the YAML object
    emitter.Emit(new MappingEnd());
}

With these methods, we've built a custom YAML serializer and deserializer for the EnvironmentVariables type. We can now register the converter on both the serializer and deserializer builders, then round-trip a YAML document:

var envvarTypeConverter = new EnvironmentVariablesTypeConverter();

var deserializer = new DeserializerBuilder()
    .WithTypeConverter(envvarTypeConverter)
    .IgnoreUnmatchedProperties() // don't throw an exception if there are unknown properties
    .Build();

var serializer = new SerializerBuilder()
    .WithTypeConverter(envvarTypeConverter)
    .DisableAliases() // don't use anchors and aliases (references to identical objects)
    .Build();

// Returns:
// environment:
//   foo: "bar"
var yamlText = serializer.Serialize(new DockerComposeService
{
    Environment = new EnvironmentVariables
    {
        ["foo"] = "bar"
    }
});

// Parses the YAML text back into a DockerComposeService
var yamlObj = deserializer.Deserialize<DockerComposeService>(yamlText);
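
The DockerComposeService type used in the snippet above is not shown in this post. A minimal sketch might look like the following. The property layout and the YamlMember alias are assumptions, chosen so that the property is written and read under the lowercase environment key without configuring a naming convention on the builders.

```csharp
using System.Collections.Generic;
using YamlDotNet.Serialization;

// Repeated from earlier in the post so the snippet compiles on its own.
public class EnvironmentVariables : Dictionary<string, string>
{
}

// Assumed shape of the service type; only the "environment" section is mapped.
public class DockerComposeService
{
    // The alias maps the PascalCase property to the lowercase YAML key.
    [YamlMember(Alias = "environment")]
    public EnvironmentVariables Environment { get; set; } = new EnvironmentVariables();
}
```

Because the deserializer above is built with IgnoreUnmatchedProperties, other keys of a Compose service (image, ports, and so on) are skipped rather than causing an exception. Another option would be to drop the alias and configure a camel-case naming convention on both builders instead.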