Parsing YAML to Values in C with libyaml

Learn how to effectively parse YAML configuration files and extract specific values using the libyaml library in C, covering event-driven parsing and data extraction.
YAML (YAML Ain't Markup Language) has become a popular human-friendly data serialization standard for configuration files, data exchange, and more. When working with C applications, parsing YAML can seem daunting due to C's low-level nature. Fortunately, libyaml provides a robust, event-driven API for parsing YAML documents. This article guides you through setting up libyaml, understanding its event model, and extracting specific values from a YAML file into your C program.
Understanding libyaml's Event-Driven Parsing
libyaml's event API doesn't build a DOM-like tree structure of your YAML document. Instead, it operates on an event-driven model. As the parser reads the YAML file, it emits an event for each structural element it encounters: document start, mapping start, scalar value, sequence end, and so on. Your C program then needs to handle these events, typically using a switch statement, to reconstruct the data structure or extract the desired values.
sequenceDiagram
    participant Application
    participant libyaml Parser
    participant YAML File
    Application->>libyaml Parser: yaml_parser_initialize()
    Application->>libyaml Parser: yaml_parser_set_input_file(file)
    loop Parse Events
        libyaml Parser->>YAML File: Read next token
        YAML File-->>libyaml Parser: Token data
        libyaml Parser->>Application: Emit Event (e.g., SCALAR, MAPPING_START)
        Application->>Application: Process Event
    end
    Application->>libyaml Parser: yaml_parser_delete()
Sequence diagram illustrating the event-driven parsing flow of libyaml.
Setting Up libyaml and Basic Parsing
Before you can parse YAML, you need to include the libyaml header (yaml.h) and initialize a parser object. You'll also need to open the YAML file for reading. The core loop calls yaml_parser_parse() to get the next event and then processes it. Remember to clean up resources: delete each event once it has been processed, delete the parser when parsing is finished, and close the file.
#include <yaml.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp = fopen("config.yaml", "r");
    if (!fp) {
        perror("Failed to open config.yaml");
        return 1;
    }

    yaml_parser_t parser;
    yaml_event_t event;
    int done = 0;

    if (!yaml_parser_initialize(&parser)) {
        fputs("Failed to initialize parser!\n", stderr);
        fclose(fp); // Don't leak the file handle on this error path
        return 1;
    }
    yaml_parser_set_input_file(&parser, fp);

    while (!done) {
        if (!yaml_parser_parse(&parser, &event)) {
            fprintf(stderr, "Parser error: %s\n",
                    parser.problem ? parser.problem : "unknown");
            yaml_parser_delete(&parser);
            fclose(fp);
            return 1;
        }

        // Process event here (e.g., print event type)
        switch (event.type) {
        case YAML_STREAM_START_EVENT:   puts("STREAM START"); break;
        case YAML_STREAM_END_EVENT:     puts("STREAM END"); break;
        case YAML_DOCUMENT_START_EVENT: puts("DOCUMENT START"); break;
        case YAML_DOCUMENT_END_EVENT:   puts("DOCUMENT END"); break;
        case YAML_MAPPING_START_EVENT:  puts("MAPPING START"); break;
        case YAML_MAPPING_END_EVENT:    puts("MAPPING END"); break;
        case YAML_SEQUENCE_START_EVENT: puts("SEQUENCE START"); break;
        case YAML_SEQUENCE_END_EVENT:   puts("SEQUENCE END"); break;
        case YAML_SCALAR_EVENT:
            // scalar.value is a yaml_char_t * (unsigned char *), so cast for %s
            printf("SCALAR: %s\n", (const char *)event.data.scalar.value);
            break;
        // Add other event types as needed
        default: break;
        }

        // Read the type before deleting: yaml_event_delete() clears the event
        done = (event.type == YAML_STREAM_END_EVENT);
        yaml_event_delete(&event);
    }

    yaml_parser_delete(&parser);
    fclose(fp);
    return 0;
}
Basic libyaml parsing loop to print all event types.
When compiling, remember to link against libyaml. For example: gcc -o myparser myparser.c -lyaml
Extracting Specific Values from YAML
To extract specific values, you need to maintain state within your parsing logic. This often involves tracking the current 'path' within the YAML document (e.g., which mapping key you're currently inside) and then acting when a SCALAR event matches your desired key. For nested structures, you might use a stack to keep track of mapping keys or sequence indices.
# config.yaml
server:
  host: localhost
  port: 8080
database:
  type: postgres
  credentials:
    user: admin
    password: securepassword
features:
  - logging
  - caching
#include <yaml.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Simple state tracking for demonstration
enum ParseState {
    STATE_NONE,
    STATE_SERVER,
    STATE_HOST,
    STATE_PORT,
    STATE_DATABASE,
    STATE_DB_TYPE,
    STATE_CREDENTIALS,
    STATE_DB_USER,
    STATE_DB_PASSWORD
};

int main(void) {
    FILE *fp = fopen("config.yaml", "r");
    if (!fp) {
        perror("Failed to open config.yaml");
        return 1;
    }

    yaml_parser_t parser;
    yaml_event_t event;
    enum ParseState state = STATE_NONE;
    char *server_host = NULL;
    int server_port = 0;
    char *db_user = NULL;
    int done = 0;

    if (!yaml_parser_initialize(&parser)) {
        fputs("Failed to initialize parser!\n", stderr);
        fclose(fp); // Don't leak the file handle on this error path
        return 1;
    }
    yaml_parser_set_input_file(&parser, fp);

    while (!done) {
        if (!yaml_parser_parse(&parser, &event)) {
            fprintf(stderr, "Parser error: %s\n",
                    parser.problem ? parser.problem : "unknown");
            free(server_host); // Release anything captured before the error
            free(db_user);
            yaml_parser_delete(&parser);
            fclose(fp);
            return 1;
        }

        switch (event.type) {
        case YAML_MAPPING_START_EVENT:
            // Push current state onto a stack if needed for complex parsing
            break;
        case YAML_MAPPING_END_EVENT:
            // A general parser would pop the saved state from a stack here.
            // For this file it is enough to return to the enclosing section.
            state = (state == STATE_CREDENTIALS) ? STATE_DATABASE : STATE_NONE;
            break;
        case YAML_SCALAR_EVENT: {
            // scalar.value is a yaml_char_t * (unsigned char *)
            const char *value = (const char *)event.data.scalar.value;

            if (state == STATE_NONE) {
                if (strcmp(value, "server") == 0) {
                    state = STATE_SERVER;
                } else if (strcmp(value, "database") == 0) {
                    state = STATE_DATABASE;
                }
            } else if (state == STATE_SERVER) {
                if (strcmp(value, "host") == 0) {
                    state = STATE_HOST;
                } else if (strcmp(value, "port") == 0) {
                    state = STATE_PORT;
                }
            } else if (state == STATE_DATABASE) {
                if (strcmp(value, "type") == 0) {
                    state = STATE_DB_TYPE;
                } else if (strcmp(value, "credentials") == 0) {
                    state = STATE_CREDENTIALS;
                }
            } else if (state == STATE_CREDENTIALS) {
                if (strcmp(value, "user") == 0) {
                    state = STATE_DB_USER;
                } else if (strcmp(value, "password") == 0) {
                    state = STATE_DB_PASSWORD;
                }
            } else if (state == STATE_HOST) {
                free(server_host); // Avoid a leak if the key appears twice
                server_host = strdup(value);
                state = STATE_SERVER; // Go back to parent state
            } else if (state == STATE_PORT) {
                server_port = atoi(value);
                state = STATE_SERVER; // Go back to parent state
            } else if (state == STATE_DB_TYPE) {
                // Value is skipped, so it can't be mistaken for a key
                state = STATE_DATABASE; // Go back to parent state
            } else if (state == STATE_DB_USER) {
                free(db_user); // Avoid a leak if the key appears twice
                db_user = strdup(value);
                state = STATE_CREDENTIALS; // Go back to parent state
            } else if (state == STATE_DB_PASSWORD) {
                // For security, don't store passwords directly in plain text in real apps
                printf("Found DB Password (not stored): %s\n", value);
                state = STATE_CREDENTIALS; // Go back to parent state
            }
            break;
        }
        default: break;
        }

        // Read the type before deleting: yaml_event_delete() clears the event
        done = (event.type == YAML_STREAM_END_EVENT);
        yaml_event_delete(&event);
    }

    yaml_parser_delete(&parser);
    fclose(fp);

    printf("\n--- Parsed Values ---\n");
    if (server_host) {
        printf("Server Host: %s\n", server_host);
        free(server_host);
    }
    printf("Server Port: %d\n", server_port);
    if (db_user) {
        printf("Database User: %s\n", db_user);
        free(db_user);
    }
    return 0;
}
C code to parse config.yaml and extract specific values using state tracking.
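The example above converts the port with atoi(), which reports no errors: it returns 0 for non-numeric text, and its behavior is undefined when the value is out of range. A stricter conversion can be sketched with strtol(). The helper name parse_port() and the accepted range of 1 to 65535 are assumptions made for this illustration, not something libyaml provides.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts scalar text such as "8080" to a TCP port.
   Returns 0 and stores the port in *out on success. Returns -1 for empty
   input, trailing characters, or a value outside 1..65535. */
int parse_port(const char *text, int *out) {
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return -1;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1; /* overflow, no digits, or junk after the number */
    if (value < 1 || value > 65535)
        return -1; /* not a usable port number */

    *out = (int)value;
    return 0;
}
```

In the STATE_PORT branch, the scalar text would be passed to parse_port() and a failure reported as a configuration error instead of silently leaving the port at 0.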
For complex documents, consider a library (for example yaml-cpp, which is a C++ library) that builds a DOM tree.
This example demonstrates a basic state machine to identify and capture specific scalar values. For more complex YAML structures, especially those with arbitrary nesting or sequences, you would typically implement a stack to keep track of the current context (e.g., which mapping key or sequence index you are currently processing). This allows you to correctly identify the 'parent' of a scalar value and store it in the appropriate data structure.
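The stack idea can be sketched independently of the parsing loop. The block below is one possible design, not a libyaml facility: the type names key_path and path_level, the functions path_open(), path_close() and path_scalar(), and the fixed limits are all hypothetical. It assumes mapping keys are plain scalars, and it omits sequence indices from the generated path.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16 /* assumed nesting limit for this sketch */
#define MAX_KEY   64 /* assumed maximum key length, longer keys are truncated */

typedef struct {
    int is_sequence;   /* 1 for a sequence, 0 for a mapping */
    int expect_key;    /* mapping only: the next scalar is a key */
    char key[MAX_KEY]; /* mapping only: the most recent key */
} path_level;

typedef struct {
    path_level levels[MAX_DEPTH];
    int depth; /* number of currently open collections */
} key_path;

/* A value was completed at the current level, so a mapping expects a key again. */
static void value_done(key_path *p) {
    if (p->depth > 0 && !p->levels[p->depth - 1].is_sequence)
        p->levels[p->depth - 1].expect_key = 1;
}

/* Call on MAPPING-START (is_sequence = 0) or SEQUENCE-START (is_sequence = 1).
   Returns -1 if the document nests deeper than MAX_DEPTH. */
int path_open(key_path *p, int is_sequence) {
    if (p->depth >= MAX_DEPTH)
        return -1;
    path_level *lv = &p->levels[p->depth++];
    lv->is_sequence = is_sequence;
    lv->expect_key = !is_sequence;
    lv->key[0] = '\0';
    return 0;
}

/* Call on MAPPING-END or SEQUENCE-END. */
void path_close(key_path *p) {
    if (p->depth > 0)
        p->depth--;
    value_done(p); /* the closed collection was the parent's value */
}

/* Call on SCALAR. Returns 0 if the scalar was consumed as a mapping key.
   Returns 1 if it is a value, with its dotted key path written to `out`. */
int path_scalar(key_path *p, const char *text, char *out, size_t out_size) {
    path_level *top = p->depth > 0 ? &p->levels[p->depth - 1] : NULL;
    size_t used = 0;

    if (top && !top->is_sequence && top->expect_key) {
        snprintf(top->key, sizeof top->key, "%s", text);
        top->expect_key = 0;
        return 0;
    }

    if (out_size > 0)
        out[0] = '\0';
    for (int i = 0; i < p->depth; i++) {
        if (p->levels[i].is_sequence)
            continue; /* sequence levels contribute no key */
        int n = snprintf(out + used, out_size - used, "%s%s",
                         used ? "." : "", p->levels[i].key);
        if (n < 0 || (size_t)n >= out_size - used)
            break; /* path truncated to fit `out` */
        used += (size_t)n;
    }
    value_done(p);
    return 1;
}
```

In the parsing loop, the start and end events would call path_open() and path_close(), and each SCALAR event would call path_scalar(). For the config.yaml shown earlier, the scalar admin would be reported with the path database.credentials.user, and the items logging and caching with the path features, so the application can compare one path string instead of maintaining a separate enum value per key.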