Specifying package data in pyproject.toml
Mastering Package Data Specification in pyproject.toml
Learn how to effectively define and manage package data, including non-code files and assets, using pyproject.toml for modern Python projects.
The pyproject.toml file has become the central configuration point for modern Python projects, replacing various older files like setup.py and setup.cfg for many common tasks. While it's widely used for build system configuration and dependency management, specifying package data (non-code files like images, static assets, configuration files, and documentation) is crucial for distributing functional packages. This article delves into how pyproject.toml handles package data, focusing on best practices and common pitfalls, ensuring your Python packages include all necessary assets.
Understanding Package Data and its Importance
Package data refers to any files that are not Python source code but are essential for your package to function correctly or be fully usable. This can include:
- Static assets: CSS, JavaScript, images for web frameworks.
- Templates: HTML templates for web applications.
- Configuration files: Default settings or examples.
- Data files: CSVs, JSONs, or other data formats used by your package.
- Documentation: Markdown or reStructuredText files distributed with the package.
Failing to include these files means your package might not work as expected when installed by others, leading to FileNotFoundError or incomplete functionality. Modern Python packaging, particularly with setuptools, provides robust mechanisms to declare these files within pyproject.toml.
Figure: Conceptual flow of package data through pyproject.toml into an installed package.
Specifying Package Data with [tool.setuptools.package-data]
For projects using setuptools (which is common even with pyproject.toml), the primary way to specify package data is through the [tool.setuptools.package-data] table. This table allows you to associate data files with specific Python packages within your project. The keys in this table are the import names of your Python packages, and the values are lists of glob patterns or file paths relative to the package directory.
Consider a project structure like this:
my_package/
├── __init__.py
├── core.py
├── data/
│   └── config.json
└── static/
    ├── image.png
    └── style.css
README.md
pyproject.toml
To include config.json, image.png, and style.css in my_package, your pyproject.toml would look like the example below. Note that paths are relative to the package root (e.g., my_package/), not the project root.
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"
version = "0.1.0"

# Keys are package names; patterns are relative to each package directory.
[tool.setuptools.package-data]
my_package = ["data/*.json", "static/*"]
Example pyproject.toml showing package data specification for my_package.
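To get a feel for how patterns resolve relative to the package directory, the following minimal sketch previews which files a list of glob patterns would match. The helper name preview_package_data is hypothetical, and pathlib globbing only approximates the matching setuptools performs, so treat the result as a sanity check rather than a guarantee.

```python
from pathlib import Path


def preview_package_data(package_dir, patterns):
    """Return the files under package_dir matched by the given glob patterns.

    Patterns are resolved relative to the package directory, the same way
    package data patterns are written. This is a hypothetical helper that
    only approximates setuptools' own matching.
    """
    root = Path(package_dir)
    matched = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():  # a directory itself is not a data file
                matched.add(path.relative_to(root).as_posix())
    return sorted(matched)
```

Run from the project root against the layout above, preview_package_data("my_package", ["data/*.json", "static/*"]) should list data/config.json, static/image.png, and static/style.css.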
Paths in [tool.setuptools.package-data] are relative to the package directory itself, not the project root. If your package is my_package and it has a data folder inside it, you'd specify data/*.json.
Including Top-Level Files
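Sometimes you need to ship files that are not inside a Python package directory but sit at the project's root level, such as README.md, LICENSE, or CHANGELOG.md. Note that setuptools has no include or exclude option directly under the [tool.setuptools] table; keys with those names exist only under [tool.setuptools.packages.find], where they filter which packages are discovered, not which files are shipped.
Top-level files reach the source distribution (sdist) in other ways. The file named by readme in the [project] table and license files matching setuptools' default patterns (such as LICENSE*) are added automatically, and anything else, for example CHANGELOG.md, can be listed in a MANIFEST.in file at the project root with a line like include CHANGELOG.md. These files are not importable package data: license files are copied into the installed my_package-X.Y.Z.dist-info/ directory, the README text is embedded in the package metadata, and other top-level files are not installed at all, so importlib.resources cannot reach them.
Here's how you might include README.md and LICENSE: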
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"
version = "0.1.0"
# Declaring the README adds it to the sdist and uses it as the long description.
readme = "README.md"

# LICENSE needs no entry: it matches setuptools' default license-file patterns.

[tool.setuptools.package-data]
my_package = ["data/*.json", "static/*"]
Including README.md and LICENSE at the project root level.
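One way to confirm which top-level files actually ended up in a source distribution is to inspect the archive itself. The sketch below assumes a gzip-compressed sdist such as dist/my_package-0.1.0.tar.gz (a hypothetical path) and a hypothetical helper named sdist_top_level_files; it lists the files that sit directly in the archive's root folder.

```python
import tarfile


def sdist_top_level_files(sdist_path):
    """List files that sit directly in the root folder of an sdist archive.

    An sdist unpacks into a single "<name>-<version>/" folder, so top-level
    project files appear as "<name>-<version>/README.md" and so on.
    """
    names = set()
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            parts = member.name.split("/")
            # Exactly two path components means "root folder / file name".
            if member.isfile() and len(parts) == 2:
                names.add(parts[1])
    return sorted(names)
```

If README.md or LICENSE is missing from the returned list after a build, the configuration is not picking the file up.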
Be careful with include and exclude at the [tool.setuptools] level: setuptools does not define those keys there, and the similarly named keys under [tool.setuptools.packages.find] only control which packages are discovered. If you're trying to include files within your Python package structure, [tool.setuptools.package-data] is the correct approach. Confusing the two can lead to unexpected files being bundled or critical ones being missed.
Accessing Package Data at Runtime
Once your package is installed, you can't simply open files using relative paths like open('data/config.json'), because such paths are resolved against the current working directory rather than the package's installation location (e.g., site-packages). Python's importlib.resources module, and in particular its files() function (available since Python 3.9), provides a standard, cross-platform way to access data files within installed packages.
This method works reliably whether your package is installed from a wheel, an editable install, or directly from a source distribution.
Here's an example of how to read the config.json file we specified earlier:
import importlib.resources
import json


def load_config():
    # files() returns a Traversable; use .joinpath() to build paths within the package
    config_path = importlib.resources.files('my_package').joinpath('data/config.json')
    # Open through the Traversable itself instead of the built-in open()
    with config_path.open('r', encoding='utf-8') as f:
        return json.load(f)


if __name__ == '__main__':
    config = load_config()
    print(f"Loaded configuration: {config}")
Python code to access config.json using importlib.resources.files.
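Some libraries insist on a real filesystem path rather than a file object. The following sketch, built around two hypothetical helpers (read_resource_text and resource_size_on_disk), shows one way to read a resource directly and one way to obtain a concrete path with importlib.resources.as_file. It assumes Python 3.9 or newer.

```python
import importlib.resources


def _locate(package, *path_parts):
    """Walk from the package root to a resource, one path component at a time."""
    resource = importlib.resources.files(package)
    for part in path_parts:
        resource = resource / part
    return resource


def read_resource_text(package, *path_parts):
    """Return the text content of a resource bundled inside a package."""
    return _locate(package, *path_parts).read_text(encoding="utf-8")


def resource_size_on_disk(package, *path_parts):
    """Return a resource's size in bytes via a real filesystem path.

    as_file() yields a concrete path and, if the package is not stored as
    plain files, extracts the resource to a temporary file for the duration
    of the with block.
    """
    with importlib.resources.as_file(_locate(package, *path_parts)) as real_path:
        return real_path.stat().st_size
```

With the package from this article installed, read_resource_text("my_package", "data", "config.json") would return the raw JSON text.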
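For simple cases, consider importlib.resources.read_text or importlib.resources.read_binary, which directly return file content as strings or bytes, respectively, without needing to open a file handle yourself. The object returned by importlib.resources.files offers equivalent read_text() and read_bytes() methods. On Python versions older than 3.9, the importlib_resources backport library offers the same functionality as importlib.resources.files, including directory traversal.
Best Practices for Package Data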
Adhering to best practices ensures maintainable and robust package data handling:
- Keep Data with Code: Store data files logically alongside the Python modules that use them. This improves readability and makes it easier to manage related assets.
- Use Glob Patterns Carefully: While * is convenient, be specific with your glob patterns (e.g., data/*.json instead of data/*) to avoid accidentally including unwanted files.
- Test Your Package Installation: Always install your package in a clean virtual environment with pip install . and verify that all data files are present and accessible using importlib.resources. An editable install (pip install -e .) reads files straight from your source tree, so it will not reveal data files missing from the built package.
- Avoid Absolute Paths: Never hardcode absolute paths for data files; always rely on importlib.resources for runtime access.
- Document Data Files: Clearly document which data files your package expects and where they should be located within the package structure for users who might want to inspect or modify them.
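The installation check above can be complemented by inspecting the built wheel directly, since a wheel is an ordinary zip archive. This is a minimal sketch with a hypothetical helper name (missing_from_wheel); the wheel path and expected file list are whatever applies to your project.

```python
import zipfile


def missing_from_wheel(wheel_path, expected_files):
    """Return the expected archive paths that are absent from a wheel.

    Paths use forward slashes and start at the archive root, for example
    "my_package/data/config.json".
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        present = set(wheel.namelist())
    return [name for name in expected_files if name not in present]
```

An empty list means every expected data file made it into the wheel; anything returned points at a pattern that needs adjusting.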
1. Define Project Structure: Organize your project with a clear separation of Python code and data files. For example, place configuration files in a data/ subdirectory within your package.
2. Configure pyproject.toml: Use [tool.setuptools.package-data] to specify data files relative to your Python package directory. For top-level files like README.md, declare the file via readme in the [project] table, and list any other extra files in MANIFEST.in.
3. Access Data at Runtime: Implement importlib.resources in your Python code to reliably load package data, ensuring your package works correctly after installation.
4. Build and Test: Build the source distribution and wheel (python -m build) and install the result in a clean virtual environment. Test all functionalities that rely on package data to confirm everything is included and accessible.