How to speed up shutil.copy()?
Categories: Optimizing shutil.copy(), Performance in Python

Learn how to significantly speed up file copy operations in Python beyond plain shutil.copy() and explore alternative methods for better performance.
The shutil module in Python provides high-level operations on files and collections of files, with shutil.copy() being a commonly used function for copying files. While convenient, its performance can become a bottleneck when dealing with large files or a high volume of smaller files. This article delves into understanding shutil.copy()'s behavior and explores various strategies and alternatives to optimize file copy speeds in Python.
Understanding shutil.copy() and Its Limitations
The shutil.copy() function is essentially a wrapper around shutil.copyfile() and shutil.copymode(): it copies the contents of the source file to the destination file and preserves the file's permission bits. For many use cases it is perfectly adequate. However, its generic implementation reads the source in fixed-size chunks into a user-space buffer and writes them back out, so every byte passes through your Python process. Recent Python versions (3.8+) try platform-specific fast-copy system calls where possible, but still fall back to this chunked loop in many situations. Either way, the operation is largely I/O bound, and the extra user-space copying adds CPU overhead, especially on systems with slow disk I/O or when copying across network shares.

Default shutil.copy() process flow
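Before trying anything fancier, it helps to establish a baseline. The minimal sketch below creates a throwaway 10 MB test file (the name source.txt is just an illustrative choice) and simply times a plain shutil.copy() with time.perf_counter(), so the alternatives in the rest of this article have a number to beat.
import os
import shutil
import time

# Create a throwaway 10MB test file
with open('source.txt', 'wb') as f:
    f.write(os.urandom(10 * 1024 * 1024))

# Time a plain shutil.copy() call as a baseline
start = time.perf_counter()
shutil.copy('source.txt', 'destination_baseline.txt')
elapsed = time.perf_counter() - start
print(f"shutil.copy() took {elapsed:.3f} seconds")

# Clean up the test files
os.remove('source.txt')
os.remove('destination_baseline.txt')
Timing a baseline shutil.copy() call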
Strategies for Faster File Copying
Optimizing file copy speed often involves reducing I/O operations, leveraging system-level utilities, or using more efficient Python libraries. Here are several approaches to consider:
1. Adjusting the Buffer Size with shutil.copyfileobj()
Neither shutil.copy() nor shutil.copyfile() exposes a buffer-size parameter, but shutil.copyfileobj() (the generic fallback that shutil's copy functions use internally) does. By default it copies in fairly small chunks (16 KiB in older Python versions; 64 KiB, or 1 MiB on Windows, since Python 3.8). Increasing this buffer size can sometimes improve performance, especially for large files, as it reduces the number of read/write calls to the operating system. However, setting it too large consumes memory without further gains, and can even degrade performance once the buffer no longer fits in CPU caches.
import shutil
import os

def copy_with_custom_buffer(src, dst, buffer_size=1024 * 1024):  # 1MB buffer
    """Copies a file with a custom buffer size."""
    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            shutil.copyfileobj(fsrc, fdst, buffer_size)
    shutil.copymode(src, dst)

# Example usage:
# create a dummy file for testing
with open('source.txt', 'wb') as f:
    f.write(os.urandom(10 * 1024 * 1024))  # 10MB file

copy_with_custom_buffer('source.txt', 'destination_buffered.txt')
print("File copied using custom buffer size.")

os.remove('source.txt')
os.remove('destination_buffered.txt')
Using shutil.copyfileobj() with a custom buffer size
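The optimal buffer size depends on your disks and filesystem, so it is worth measuring rather than guessing. The rough benchmark below reuses copy_with_custom_buffer() from the snippet above with a few candidate sizes; keep in mind that the operating system's page cache makes repeat reads of the same file faster, so run it a few times before drawing conclusions.
import os
import time

# Create a throwaway 50MB test file
with open('source.txt', 'wb') as f:
    f.write(os.urandom(50 * 1024 * 1024))

# Time a copy with a few candidate buffer sizes
for buffer_size in (16 * 1024, 64 * 1024, 1024 * 1024, 8 * 1024 * 1024):
    start = time.perf_counter()
    copy_with_custom_buffer('source.txt', 'destination_buffered.txt', buffer_size)
    elapsed = time.perf_counter() - start
    print(f"buffer = {buffer_size // 1024:>5} KiB: {elapsed:.3f} s")
    os.remove('destination_buffered.txt')

os.remove('source.txt')
Benchmarking a few buffer sizes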
2. Leveraging OS-Level Copy Utilities
For maximum performance, especially on Linux/Unix systems, delegating the copy operation to the operating system's native utilities (like cp or rsync) can be significantly faster. These utilities are highly optimized and can often use kernel-level optimizations that Python's shutil cannot. The subprocess module allows you to execute these commands from Python.
Note: Using subprocess to call external commands introduces a platform dependency. The commands (cp, rsync) might not be available or might not behave identically across operating systems (e.g., Windows vs. Linux).
import subprocess
import os

def copy_with_os_cp(src, dst):
    """Copies a file using the OS 'cp' command."""
    try:
        # -p preserves mode, ownership, and timestamps
        subprocess.run(['cp', '-p', src, dst], check=True)
        print(f"File '{src}' copied to '{dst}' using 'cp'.")
    except subprocess.CalledProcessError as e:
        print(f"Error copying file with 'cp': {e}")
    except FileNotFoundError:
        print("Error: 'cp' command not found. Are you on a Unix-like system?")

def copy_with_os_rsync(src, dst):
    """Copies a file using the OS 'rsync' command."""
    try:
        # -a for archive mode (preserves permissions, timestamps, etc.)
        # -h for human-readable output (optional)
        # -P for progress and partial transfers (optional)
        subprocess.run(['rsync', '-ahP', src, dst], check=True)
        print(f"File '{src}' copied to '{dst}' using 'rsync'.")
    except subprocess.CalledProcessError as e:
        print(f"Error copying file with 'rsync': {e}")
    except FileNotFoundError:
        print("Error: 'rsync' command not found.")

# Example usage (ensure source.txt exists):
# with open('source.txt', 'wb') as f:
#     f.write(os.urandom(10 * 1024 * 1024))
# copy_with_os_cp('source.txt', 'destination_cp.txt')
# copy_with_os_rsync('source.txt', 'destination_rsync.txt')
# os.remove('source.txt')
# os.remove('destination_cp.txt')
# os.remove('destination_rsync.txt')
Using subprocess to call cp or rsync
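Because of the platform dependency noted above, it is common to wrap the external-command path in a small dispatcher that falls back to shutil on other systems. The sketch below is one way to do it; the name fast_copy is just an illustrative choice, and shutil.copy2() (which also preserves file metadata) serves as the portable fallback.
import os
import shutil
import subprocess

def fast_copy(src, dst):
    """Copy src to dst, preferring the native 'cp' on POSIX systems.

    Falls back to shutil.copy2() on Windows or wherever 'cp' is unavailable.
    """
    if os.name == 'posix' and shutil.which('cp'):
        # -p preserves mode, ownership, and timestamps
        subprocess.run(['cp', '-p', src, dst], check=True)
    else:
        shutil.copy2(src, dst)

# Example usage:
# fast_copy('source.txt', 'destination_fast.txt')
A portable wrapper that prefers cp when available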
3. Using os.sendfile() (Linux/Unix Specific)
For Linux and Unix-like systems, os.sendfile() provides a highly efficient way to copy data between two file descriptors. It performs the copy entirely within the kernel, avoiding costly user-space buffer copies. This is often the fastest method for local file copies on supported systems.
import os
import shutil

def copy_with_sendfile(src, dst):
    """Copies a file using os.sendfile (Linux/Unix only)."""
    if not hasattr(os, 'sendfile'):
        print("os.sendfile is not available on this system. Falling back to shutil.copy.")
        shutil.copy(src, dst)
        return
    try:
        with open(src, 'rb') as fsrc:
            with open(dst, 'wb') as fdst:
                # sendfile copies up to 'count' bytes from the input file descriptor
                # to the output file descriptor, starting at 'offset'. It returns the
                # number of bytes actually sent, which may be less than requested,
                # so loop until the whole file has been transferred.
                offset = 0
                remaining = os.fstat(fsrc.fileno()).st_size
                while remaining > 0:
                    sent = os.sendfile(fdst.fileno(), fsrc.fileno(), offset, remaining)
                    if sent == 0:
                        break
                    offset += sent
                    remaining -= sent
        shutil.copymode(src, dst)
        print(f"File '{src}' copied to '{dst}' using os.sendfile.")
    except OSError as e:
        print(f"Error copying file with os.sendfile: {e}. Falling back to shutil.copy.")
        shutil.copy(src, dst)

# Example usage (ensure source.txt exists):
# with open('source.txt', 'wb') as f:
#     f.write(os.urandom(10 * 1024 * 1024))
# copy_with_sendfile('source.txt', 'destination_sendfile.txt')
# os.remove('source.txt')
# os.remove('destination_sendfile.txt')
Implementing file copy using os.sendfile()
4. Asynchronous I/O for Concurrent Copies
If your bottleneck isn't the speed of a single file copy, but rather the need to copy many files concurrently without blocking your main program, asynchronous I/O with asyncio can be beneficial. This doesn't necessarily make individual copies faster, but it allows your application to do other work while copies are in progress, or to run multiple copies in parallel. Because shutil.copy() is a blocking call, the example below offloads it to a worker thread with asyncio.to_thread() (available in Python 3.9+), keeping the event loop responsive.
import asyncio
import shutil
import os

async def async_copy_file(src, dst):
    """Asynchronously copies a file using shutil.copy."""
    print(f"Starting async copy of {src} to {dst}")
    # Use asyncio.to_thread to run the blocking shutil.copy in a separate thread
    await asyncio.to_thread(shutil.copy, src, dst)
    print(f"Finished async copy of {src} to {dst}")

async def main():
    # Create dummy files for testing
    for i in range(3):
        with open(f'source_{i}.txt', 'wb') as f:
            f.write(os.urandom(5 * 1024 * 1024))  # 5MB each
    tasks = [
        async_copy_file('source_0.txt', 'destination_async_0.txt'),
        async_copy_file('source_1.txt', 'destination_async_1.txt'),
        async_copy_file('source_2.txt', 'destination_async_2.txt')
    ]
    await asyncio.gather(*tasks)
    # Clean up dummy files
    for i in range(3):
        os.remove(f'source_{i}.txt')
        os.remove(f'destination_async_{i}.txt')

if __name__ == "__main__":
    asyncio.run(main())
    print("All async copies completed and cleaned up.")
Asynchronous file copying using asyncio.to_thread
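Note that asyncio.to_thread() requires Python 3.9 or newer. On older interpreters the same pattern can be expressed with the lower-level loop.run_in_executor(); a minimal sketch of that variant, mirroring the async_copy_file coroutine above:
import asyncio
import shutil

async def async_copy_file_compat(src, dst):
    """Same idea as async_copy_file above, but works on Python < 3.9."""
    loop = asyncio.get_running_loop()
    # Passing None uses the event loop's default ThreadPoolExecutor
    await loop.run_in_executor(None, shutil.copy, src, dst)

# Example usage inside a running event loop:
# await async_copy_file_compat('source.txt', 'destination_compat.txt')
Equivalent thread offloading for Python older than 3.9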
Conclusion
While shutil.copy() is a convenient and generally sufficient tool for file copying in Python, performance-critical applications may require more optimized approaches. By understanding the underlying mechanisms and leveraging alternatives such as larger buffer sizes, OS-level commands, or os.sendfile() for kernel-level copies, you can significantly improve the speed of your file transfer operations. Always choose the method that best fits your specific operating system, performance requirements, and complexity tolerance.