How can I parse a YAML file in Python
Parsing YAML Files in Python: A Comprehensive Guide

Learn how to effectively read, parse, and manipulate YAML data in Python using the PyYAML library, covering basic loading, error handling, and common use cases.
YAML (YAML Ain't Markup Language) is a human-friendly data serialization standard often used for configuration files, data exchange between languages, and object persistence. Its clean, readable syntax makes it a popular alternative to formats like XML or JSON for many applications. Python, with its rich ecosystem, provides excellent support for working with YAML files, primarily through the PyYAML library. This article will guide you through the process of parsing YAML files, handling different data structures, and managing potential errors.
Getting Started: Installing PyYAML
Before you can parse YAML files, you need to install the PyYAML library. It's the de facto standard for YAML processing in Python and can be easily installed using pip.
pip install PyYAML
Install PyYAML using pip
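After installing, a quick import confirms that PyYAML is available. PyYAML can optionally be built against the LibYAML C library, in which case it exposes faster loaders such as yaml.CSafeLoader. The sketch below prefers that C loader and falls back to the pure-Python yaml.SafeLoader when it is missing; the helper name pick_safe_loader is hypothetical.

```python
import yaml


def pick_safe_loader():
    """Return the C-accelerated safe loader if available, else the pure-Python one."""
    # yaml.CSafeLoader only exists when PyYAML was built with LibYAML bindings.
    return getattr(yaml, "CSafeLoader", yaml.SafeLoader)


Loader = pick_safe_loader()
print("PyYAML version:", yaml.__version__)
print("Using loader:", Loader.__name__)

# Both loaders accept the same safe subset of YAML.
print(yaml.load("answer: 42", Loader=Loader))
```

Either loader returns the same Python objects for ordinary documents; the C version is simply faster on large files.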
Basic YAML Parsing: Loading Data
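The most common operation is loading YAML data from a string or a file. PyYAML provides yaml.load() and yaml.safe_load() for this purpose. It's highly recommended to use yaml.safe_load(): it only builds standard YAML types such as dictionaries, lists, strings, numbers, booleans, and None, so a document cannot make it instantiate arbitrary Python objects or execute code. That matters whenever you parse YAML from untrusted sources. Since PyYAML 6.0, yaml.load() also requires an explicit Loader argument.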
flowchart TD
    A[Start] --> B{YAML Source?}
    B -->|String| C["yaml.safe_load(string)"]
    B -->|File| D[Open File]
    D --> E["yaml.safe_load(file_object)"]
    C --> F[Python Object]
    E --> F
    F --> G[End]
Basic YAML Loading Process
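To make the security difference concrete, the sketch below checks that yaml.safe_load(text) gives the same result as yaml.load(text, Loader=yaml.SafeLoader), and that a document using a Python-specific tag (here pointing at the harmless os.getcwd) is rejected with an error instead of being executed. The function name rejected_by_safe_load is hypothetical.

```python
import yaml


def rejected_by_safe_load(text):
    """Return True if yaml.safe_load refuses to construct the document."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        # ConstructorError (a YAMLError subclass) is raised for unknown tags.
        return True
    return False


plain = "name: demo\nports: [80, 443]\n"
# safe_load is shorthand for load with the SafeLoader.
assert yaml.safe_load(plain) == yaml.load(plain, Loader=yaml.SafeLoader)

# A tag asking PyYAML to call a Python function while loading.
tagged = "!!python/object/apply:os.getcwd []"
print(rejected_by_safe_load(plain))   # False
print(rejected_by_safe_load(tagged))  # True
```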
Let's look at an example of a simple YAML file and how to parse it.
# config.yaml
application: MyWebApp
version: 1.0.0
database:
  host: localhost
  port: 5432
  user: admin
features:
  - authentication
  - logging
  - caching
Example config.yaml file
import yaml

# --- Parsing from a string ---
yaml_string = """
name: John Doe
age: 30
cities:
  - New York
  - London
"""

data_from_string = yaml.safe_load(yaml_string)
print("Data from string:", data_from_string)

# --- Parsing from a file ---
try:
    with open('config.yaml', 'r') as file:
        config_data = yaml.safe_load(file)
    print("\nData from file:", config_data)
    print("Application name:", config_data['application'])
    print("Database host:", config_data['database']['host'])
    print("First feature:", config_data['features'][0])
except FileNotFoundError:
    print("Error: config.yaml not found. Please create the file.")
except yaml.YAMLError as exc:
    print(f"Error parsing YAML: {exc}")
Python code to parse YAML from a string and a file
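One detail the example above doesn't cover: yaml.safe_load returns None for an empty document (or one containing only comments), and a file whose top level is a list or a single scalar is also valid YAML. A minimal sketch of a reusable loader, assuming your configuration must be a mapping at the top level; the function name load_config and that requirement are illustrative, not part of PyYAML.

```python
from pathlib import Path

import yaml


def load_config(path):
    """Load a YAML file and return its top-level mapping as a dict."""
    text = Path(path).read_text(encoding="utf-8")
    data = yaml.safe_load(text)
    if data is None:
        # An empty file, or one with only comments, parses to None.
        return {}
    if not isinstance(data, dict):
        # Assumption: configs must be mappings, so reject lists and scalars.
        raise ValueError(
            f"{path}: expected a mapping at the top level, got {type(data).__name__}"
        )
    return data
```

With this helper, config = load_config('config.yaml') gives a dict you can index exactly as in the earlier example, e.g. config['database']['host'].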
Handling Different YAML Data Types
YAML supports various data types, which PyYAML automatically converts to their Python equivalents:
- Mappings (Dictionaries): Key-value pairs become Python dictionaries.
- Sequences (Lists): Ordered lists of items become Python lists.
- Scalars: Strings, numbers (integers, floats), booleans (true/false, yes/no), and null (null, ~) are converted to their respective Python types.
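A quick way to see these conversions is to load a small inline document and print each value with its type. PyYAML follows YAML 1.1, which is why unquoted yes/no become booleans; quoting a value keeps it a string. The keys below are made up for illustration.

```python
import yaml

doc = """
enabled: true
debug: no
timeout: 2.5
retries: 3
owner: ~
comment: null
answer: "yes"
"""

values = yaml.safe_load(doc)
for key, value in values.items():
    # Show each converted value alongside its Python type name.
    print(f"{key:<8} -> {value!r} ({type(value).__name__})")
```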
# data_types.yaml
string_example: Hello, YAML!
integer_example: 123
float_example: 3.14
boolean_true: true
boolean_false: no
null_example: null
list_example:
  - apple
  - banana
  - cherry
dictionary_example:
  key1: value1
  key2: value2
YAML file demonstrating various data types
import yaml

with open('data_types.yaml', 'r') as file:
    data = yaml.safe_load(file)

print(f"String: {data['string_example']} (Type: {type(data['string_example'])})")
print(f"Integer: {data['integer_example']} (Type: {type(data['integer_example'])})")
print(f"Float: {data['float_example']} (Type: {type(data['float_example'])})")
print(f"Boolean True: {data['boolean_true']} (Type: {type(data['boolean_true'])})")
print(f"Boolean False: {data['boolean_false']} (Type: {type(data['boolean_false'])})")
print(f"Null: {data['null_example']} (Type: {type(data['null_example'])})")
print(f"List: {data['list_example']} (Type: {type(data['list_example'])})")
print(f"Dictionary: {data['dictionary_example']} (Type: {type(data['dictionary_example'])})")
Python code to parse and inspect YAML data types
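Because safe_load hands back ordinary Python objects, you can check the shape of the data with plain isinstance tests right after loading. A minimal sketch: the EXPECTED_TYPES table and the check_types helper are hypothetical and cover only a few keys from data_types.yaml. Keep in mind that bool is a subclass of int in Python, so an int check also accepts true/false.

```python
import yaml

# Hypothetical expectations for a few keys in data_types.yaml.
EXPECTED_TYPES = {
    "integer_example": int,
    "float_example": float,
    "boolean_false": bool,
    "list_example": list,
    "dictionary_example": dict,
}


def check_types(data, expected):
    """Return a list of human-readable type problems (empty if all keys match)."""
    problems = []
    for key, expected_type in expected.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            actual = type(data[key]).__name__
            problems.append(f"{key}: expected {expected_type.__name__}, got {actual}")
    return problems


# The quoted '3.14' loads as a string, so it should be reported.
sample = yaml.safe_load(
    "integer_example: 123\nfloat_example: '3.14'\nboolean_false: no\n"
    "list_example: [apple]\ndictionary_example: {key1: value1}\n"
)
print(check_types(sample, EXPECTED_TYPES))
```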
Error Handling During Parsing
YAML files can sometimes be malformed, leading to parsing errors. Handle these errors explicitly so that a bad file doesn't crash your application. When PyYAML cannot parse a document, it raises an exception derived from yaml.YAMLError, so catching yaml.YAMLError covers all parsing issues.
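In practice PyYAML raises more specific subclasses of yaml.YAMLError, such as yaml.scanner.ScannerError and yaml.parser.ParserError, which is why a single except yaml.YAMLError clause catches them. The sketch below parses two illustrative broken snippets from strings and reports which class was raised; the helper name error_class_name is hypothetical.

```python
import yaml


def error_class_name(text):
    """Return the name of the exception yaml.safe_load raises for text, or None if it parses."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return type(exc).__name__
    return None


broken_snippets = [
    'title: "unterminated string',  # input ends inside a quoted scalar
    "key: value\n- stray_item",     # a sequence entry where a mapping key is expected
]

for text in broken_snippets:
    print(repr(text), "->", error_class_name(text))

# The specific classes share yaml.YAMLError as a base class.
print(issubclass(yaml.scanner.ScannerError, yaml.YAMLError))
print(issubclass(yaml.parser.ParserError, yaml.YAMLError))
```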
# malformed.yaml
key: value
- bad_indentation # This line is incorrectly indented
An example of a malformed YAML file
import yaml

try:
    with open('malformed.yaml', 'r') as file:
        malformed_data = yaml.safe_load(file)
    print("Successfully parsed malformed YAML (this shouldn't happen).")
except FileNotFoundError:
    print("Error: malformed.yaml not found. Please create the file.")
except yaml.YAMLError as exc:
    print(f"Caught a YAML parsing error: {exc}")
    # Scanner/parser errors carry a problem_mark, but it can be None.
    mark = getattr(exc, 'problem_mark', None)
    if mark is not None:
        # Marks are 0-based, so add 1 for human-readable positions.
        print(f"Error at line {mark.line + 1}, column {mark.column + 1}")
Python code demonstrating error handling for malformed YAML
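When the YAML comes from a string (for example, a request body) rather than a file, the same problem_mark information can be wrapped in a reusable helper. A sketch; the name yaml_error_position is hypothetical, and errors without position information are re-raised rather than hidden.

```python
import yaml


def yaml_error_position(text):
    """Return a 1-based (line, column) for the parse error in text, or None if it parses."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        mark = getattr(exc, "problem_mark", None)
        if mark is None:
            raise  # no position info available; let the caller handle it
        # Marks are 0-based internally.
        return (mark.line + 1, mark.column + 1)
    return None


print(yaml_error_position("key: value\n- bad_indentation\n"))
print(yaml_error_position("key: value\n"))
```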
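Always use yaml.safe_load() instead of yaml.load() when parsing YAML from untrusted sources. yaml.load() with an unsafe loader such as yaml.Loader or yaml.UnsafeLoader can construct arbitrary Python objects from tags embedded in the YAML, which allows code execution and poses a significant security vulnerability.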