Parsing YAML to Values in C with libyaml

Learn how to effectively parse YAML configuration files and extract specific values using the libyaml library in C, covering event-driven parsing and data extraction.

YAML (YAML Ain't Markup Language) has become a popular human-friendly data serialization standard for configuration files, data exchange, and more. When working with C applications, parsing YAML can seem daunting due to C's low-level nature. Fortunately, libyaml provides a robust, event-driven API for parsing YAML documents. This article will guide you through the process of setting up libyaml, understanding its event model, and extracting specific values from a YAML file into your C program.

Understanding libyaml's Event-Driven Parsing

The libyaml event API used in this article does not build a tree of your YAML document (the library offers a separate document API, yaml_parser_load(), for that). Instead, it works as a pull parser: each call to yaml_parser_parse() returns the next event in the stream, such as document start, mapping start, scalar, or sequence end. Your C program handles these events, typically in a switch statement, to reconstruct the data structure or extract the values it needs.

sequenceDiagram
    participant Application
    participant libyaml Parser
    participant YAML File

    Application->>libyaml Parser: yaml_parser_initialize()
    Application->>libyaml Parser: yaml_parser_set_input_file(file)
    loop Until STREAM_END event
        Application->>libyaml Parser: yaml_parser_parse()
        libyaml Parser->>YAML File: Read input as needed
        YAML File-->>libyaml Parser: Characters
        libyaml Parser-->>Application: Event (e.g., SCALAR, MAPPING_START)
        Application->>Application: Process event, then yaml_event_delete()
    end
    Application->>libyaml Parser: yaml_parser_delete()

Sequence diagram illustrating the event-driven parsing flow of libyaml.

Setting Up libyaml and Basic Parsing

Before you can parse YAML, include the libyaml header (yaml.h), link against the library (typically with -lyaml), initialize a parser object, and open the YAML file for reading. The core loop calls yaml_parser_parse() to fetch the next event and then processes it. Release each event with yaml_event_delete() once you have handled it, and when parsing is finished, delete the parser and close the file.

#include <yaml.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    FILE *fp = fopen("config.yaml", "r");
    if (!fp) {
        perror("Failed to open config.yaml");
        return 1;
    }

    yaml_parser_t parser;
    yaml_event_t event;

    if (!yaml_parser_initialize(&parser)) {
        fputs("Failed to initialize parser!\n", stderr);
        fclose(fp); // Do not leak the file handle on early exit
        return 1;
    }

    yaml_parser_set_input_file(&parser, fp);

    do {
        if (!yaml_parser_parse(&parser, &event)) {
            fprintf(stderr, "Parser error: %s\n", parser.problem);
            yaml_parser_delete(&parser);
            fclose(fp);
            return 1;
        }

        // Process event here (e.g., print event type)
        switch (event.type) {
            case YAML_STREAM_START_EVENT: puts("STREAM START"); break;
            case YAML_STREAM_END_EVENT:   puts("STREAM END");   break;
            case YAML_DOCUMENT_START_EVENT: puts("DOCUMENT START"); break;
            case YAML_DOCUMENT_END_EVENT:   puts("DOCUMENT END");   break;
            case YAML_MAPPING_START_EVENT:  puts("MAPPING START");  break;
            case YAML_MAPPING_END_EVENT:    puts("MAPPING END");    break;
            case YAML_SEQUENCE_START_EVENT: puts("SEQUENCE START"); break;
            case YAML_SEQUENCE_END_EVENT:   puts("SEQUENCE END");   break;
            // scalar.value is a yaml_char_t * (unsigned char *), so cast it for %s
            case YAML_SCALAR_EVENT:         printf("SCALAR: %s\n", (const char *)event.data.scalar.value); break;
            // Add other event types as needed
            default: break;
        }

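        // yaml_event_delete() clears the event, so the final STREAM-END event is
        // kept until the loop condition has read its type; it is freed after the loop.
        if (event.type != YAML_STREAM_END_EVENT) {
            yaml_event_delete(&event);
        }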

    } while (event.type != YAML_STREAM_END_EVENT);

    yaml_event_delete(&event);
    yaml_parser_delete(&parser);
    fclose(fp);

    return 0;
}

Basic libyaml parsing loop that prints each event type and every scalar value.

Extracting Specific Values from YAML

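To extract specific values, you need to maintain state within your parsing logic. libyaml reports mapping keys and mapping values with the same SCALAR event, so nothing in the event itself says which one you are looking at; your code has to infer it from position. This usually means tracking the current 'path' within the YAML document (e.g., which mapping key you're currently inside) and acting when a SCALAR event arrives in the position of the value you want. For nested structures, you might use a stack to keep track of mapping keys or sequence indices.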

# config.yaml
server:
  host: localhost
  port: 8080
database:
  type: postgres
  credentials:
    user: admin
    password: securepassword
features:
  - logging
  - caching

Sample config.yaml read by the extraction example.

#include <yaml.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Simple state tracking for demonstration
enum ParseState {
    STATE_NONE,
    STATE_SERVER,
    STATE_HOST,
    STATE_PORT,
    STATE_DATABASE,
    STATE_DB_TYPE,
    STATE_CREDENTIALS,
    STATE_DB_USER,
    STATE_DB_PASSWORD
};

int main() {
    FILE *fp = fopen("config.yaml", "r");
    if (!fp) {
        perror("Failed to open config.yaml");
        return 1;
    }

    yaml_parser_t parser;
    yaml_event_t event;
    enum ParseState state = STATE_NONE;

    char *server_host = NULL;
    int server_port = 0;
    char *db_user = NULL;

    if (!yaml_parser_initialize(&parser)) {
        fputs("Failed to initialize parser!\n", stderr);
        fclose(fp); // Do not leak the file handle on early exit
        return 1;
    }
    yaml_parser_set_input_file(&parser, fp);

    do {
        if (!yaml_parser_parse(&parser, &event)) {
            fprintf(stderr, "Parser error: %s\n", parser.problem);
            free(server_host); // Release any values captured before the error
            free(db_user);
            yaml_parser_delete(&parser);
            fclose(fp);
            return 1;
        }

        switch (event.type) {
            case YAML_MAPPING_START_EVENT:
                // Push current state onto a stack if needed for complex parsing
                break;
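            case YAML_MAPPING_END_EVENT:
                // Return to the enclosing mapping's state; a general parser would pop a stack here.
                // "credentials" is nested inside "database"; every other mapping is top-level.
                state = (state == STATE_CREDENTIALS) ? STATE_DATABASE : STATE_NONE;
                break;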
            case YAML_SCALAR_EVENT:
                if (state == STATE_NONE) {
                    if (strcmp((char *)event.data.scalar.value, "server") == 0) {
                        state = STATE_SERVER;
                    } else if (strcmp((char *)event.data.scalar.value, "database") == 0) {
                        state = STATE_DATABASE;
                    }
                } else if (state == STATE_SERVER) {
                    if (strcmp((char *)event.data.scalar.value, "host") == 0) {
                        state = STATE_HOST;
                    } else if (strcmp((char *)event.data.scalar.value, "port") == 0) {
                        state = STATE_PORT;
                    }
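                } else if (state == STATE_DATABASE) {
                    if (strcmp((char *)event.data.scalar.value, "credentials") == 0) {
                        state = STATE_CREDENTIALS;
                    } else if (strcmp((char *)event.data.scalar.value, "type") == 0) {
                        state = STATE_DB_TYPE;
                    }
                } else if (state == STATE_DB_TYPE) {
                    // Consume the value of "type" so it is not mistaken for a key
                    state = STATE_DATABASE;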
                } else if (state == STATE_CREDENTIALS) {
                    if (strcmp((char *)event.data.scalar.value, "user") == 0) {
                        state = STATE_DB_USER;
                    } else if (strcmp((char *)event.data.scalar.value, "password") == 0) {
                        state = STATE_DB_PASSWORD;
                    }
                } else if (state == STATE_HOST) {
                    free(server_host); // Avoid a leak if the key appears more than once
                    server_host = strdup((char *)event.data.scalar.value);
                    state = STATE_SERVER; // Go back to parent state
                } else if (state == STATE_PORT) {
                    // strtol lets us reject non-numeric or out-of-range input, which atoi cannot
                    char *text = (char *)event.data.scalar.value;
                    char *end = NULL;
                    long port = strtol(text, &end, 10);
                    if (end != text && *end == '\0' && port > 0 && port <= 65535) {
                        server_port = (int)port;
                    }
                    state = STATE_SERVER; // Go back to parent state
                } else if (state == STATE_DB_USER) {
                    free(db_user); // Avoid a leak if the key appears more than once
                    db_user = strdup((char *)event.data.scalar.value);
                    state = STATE_CREDENTIALS; // Go back to parent state
                } else if (state == STATE_DB_PASSWORD) {
                    // Never print or log secrets; only note that the key was present
                    puts("Found DB password (value not stored or printed)");
                    state = STATE_CREDENTIALS; // Go back to parent state
                }
                break;
            default: break;
        }

        if (event.type != YAML_STREAM_END_EVENT) {
            yaml_event_delete(&event);
        }

    } while (event.type != YAML_STREAM_END_EVENT);

    yaml_event_delete(&event);
    yaml_parser_delete(&parser);
    fclose(fp);

    printf("\n--- Parsed Values ---\n");
    if (server_host) {
        printf("Server Host: %s\n", server_host);
        free(server_host);
    }
    printf("Server Port: %d\n", server_port);
    if (db_user) {
        printf("Database User: %s\n", db_user);
        free(db_user);
    }

    return 0;
}

C code to parse config.yaml and extract specific values using state tracking.

This example demonstrates a basic state machine that identifies and captures specific scalar values. Because the states are hard-coded for one known layout, it does not scale: every new key needs a new state, and arbitrary nesting or sequences cannot be expressed at all. For more complex YAML structures, you would typically implement a stack that records the current context (e.g., which mapping key or sequence index you are currently processing). This lets you identify the 'parent' of each scalar value correctly and store it in the appropriate data structure.