How to speed up shutil.copy()?

Learn how to speed up shutil.copy()? with practical examples, diagrams, and best practices. Covers python, shutil development techniques with visual explanations.

Optimizing shutil.copy() Performance in Python

Hero image for How to speed up shutil.copy()?

Learn how to significantly speed up file copy operations in Python using shutil.copy() and explore alternative methods for better performance.

The shutil module in Python provides high-level operations on files and collections of files, with shutil.copy() being a commonly used function for copying files. While convenient, its performance can become a bottleneck when dealing with large files or a high volume of smaller files. This article delves into understanding shutil.copy()'s behavior and explores various strategies and alternatives to optimize file copy speeds in Python.

Understanding shutil.copy() and Its Limitations

The shutil.copy() function is a wrapper around shutil.copyfile() and shutil.copymode(). It copies the contents of the source file to the destination file, preserving the file's permission bits. For many use cases, it's perfectly adequate. However, its default implementation often involves reading the entire source file into memory (or in chunks) and then writing it to the destination. This process can be I/O bound and CPU-bound, especially on systems with slow disk I/O or when copying across network shares.

Hero image for How to speed up shutil.copy()?

Default shutil.copy() Process Flow

Strategies for Faster File Copying

Optimizing file copy speed often involves reducing I/O operations, leveraging system-level utilities, or using more efficient Python libraries. Here are several approaches to consider:

1. Adjusting Buffer Size for shutil.copyfile()

While shutil.copy() doesn't directly expose a buffer size parameter, shutil.copyfile() (which shutil.copy() uses internally) does. By default, shutil.copyfile() uses a buffer size of 16KB. Increasing this buffer size can sometimes improve performance, especially for large files, as it reduces the number of read/write calls to the operating system. However, setting it too large can consume excessive memory without further performance gains, or even degrade performance due to cache misses.

import shutil
import os

def copy_with_custom_buffer(src, dst, buffer_size=1024*1024): # 1MB buffer
    """Copies a file with a custom buffer size."""
    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            shutil.copyfileobj(fsrc, fdst, buffer_size)
    shutil.copymode(src, dst)

# Example usage:
# create a dummy file for testing
with open('source.txt', 'wb') as f:
    f.write(os.urandom(10 * 1024 * 1024)) # 10MB file

copy_with_custom_buffer('source.txt', 'destination_buffered.txt')
print("File copied using custom buffer size.")

os.remove('source.txt')
os.remove('destination_buffered.txt')

Using shutil.copyfileobj() with a custom buffer size

2. Leveraging OS-Level Copy Utilities

For maximum performance, especially on Linux/Unix systems, delegating the copy operation to the operating system's native utilities (like cp or rsync) can be significantly faster. These utilities are highly optimized and can often perform kernel-level optimizations that Python's shutil cannot. The subprocess module allows you to execute these commands from Python.

import subprocess
import os

def copy_with_os_cp(src, dst):
    """Copies a file using the OS 'cp' command."""
    try:
        # -p preserves mode, ownership, and timestamps
        subprocess.run(['cp', '-p', src, dst], check=True)
        print(f"File '{src}' copied to '{dst}' using 'cp'.")
    except subprocess.CalledProcessError as e:
        print(f"Error copying file with 'cp': {e}")
    except FileNotFoundError:
        print("Error: 'cp' command not found. Are you on a Unix-like system?")

def copy_with_os_rsync(src, dst):
    """Copies a file using the OS 'rsync' command."""
    try:
        # -a for archive mode (preserves permissions, timestamps, etc.)
        # -h for human-readable output (optional)
        # -P for progress and partial transfers (optional)
        subprocess.run(['rsync', '-ahP', src, dst], check=True)
        print(f"File '{src}' copied to '{dst}' using 'rsync'.")
    except subprocess.CalledProcessError as e:
        print(f"Error copying file with 'rsync': {e}")
    except FileNotFoundError:
        print("Error: 'rsync' command not found.")

# Example usage (ensure source.txt exists):
# with open('source.txt', 'wb') as f:
#     f.write(os.urandom(10 * 1024 * 1024))

# copy_with_os_cp('source.txt', 'destination_cp.txt')
# copy_with_os_rsync('source.txt', 'destination_rsync.txt')

# os.remove('source.txt')
# os.remove('destination_cp.txt')
# os.remove('destination_rsync.txt')

Using subprocess to call cp or rsync

3. Using os.sendfile() (Linux/Unix specific)

For Linux and Unix-like systems, os.sendfile() provides a highly efficient way to copy data between two file descriptors. It performs the copy entirely within the kernel, avoiding costly user-space buffer copies. This is often the fastest method for local file copies on supported systems.

import os
import shutil

def copy_with_sendfile(src, dst):
    """Copies a file using os.sendfile (Linux/Unix only)."""
    if not hasattr(os, 'sendfile'):
        print("os.sendfile is not available on this system. Falling back to shutil.copy.")
        shutil.copy(src, dst)
        return

    try:
        with open(src, 'rb') as fsrc:
            with open(dst, 'wb') as fdst:
                # sendfile copies up to count bytes from file descriptor in to file descriptor out
                # offset is the starting position in the input file
                # We copy the entire file, so count is file size, offset is 0
                os.sendfile(fdst.fileno(), fsrc.fileno(), 0, os.fstat(fsrc.fileno()).st_size)
        shutil.copymode(src, dst)
        print(f"File '{src}' copied to '{dst}' using os.sendfile.")
    except OSError as e:
        print(f"Error copying file with os.sendfile: {e}. Falling back to shutil.copy.")
        shutil.copy(src, dst)

# Example usage (ensure source.txt exists):
# with open('source.txt', 'wb') as f:
#     f.write(os.urandom(10 * 1024 * 1024))

# copy_with_sendfile('source.txt', 'destination_sendfile.txt')

# os.remove('source.txt')
# os.remove('destination_sendfile.txt')

Implementing file copy using os.sendfile()

4. Asynchronous I/O for Concurrent Copies

If your bottleneck isn't the speed of a single file copy, but rather the need to copy many files concurrently without blocking your main program, asynchronous I/O with asyncio can be beneficial. This doesn't necessarily make individual copies faster, but it allows your application to do other work while copies are in progress, or to initiate multiple copies in parallel.

import asyncio
import shutil
import os

async def async_copy_file(src, dst):
    """Asynchronously copies a file using shutil.copy."""
    print(f"Starting async copy of {src} to {dst}")
    # Use run_in_executor to run blocking shutil.copy in a separate thread
    await asyncio.to_thread(shutil.copy, src, dst)
    print(f"Finished async copy of {src} to {dst}")

async def main():
    # Create dummy files for testing
    for i in range(3):
        with open(f'source_{i}.txt', 'wb') as f:
            f.write(os.urandom(5 * 1024 * 1024)) # 5MB each

    tasks = [
        async_copy_file('source_0.txt', 'destination_async_0.txt'),
        async_copy_file('source_1.txt', 'destination_async_1.txt'),
        async_copy_file('source_2.txt', 'destination_async_2.txt')
    ]
    await asyncio.gather(*tasks)

    # Clean up dummy files
    for i in range(3):
        os.remove(f'source_{i}.txt')
        os.remove(f'destination_async_{i}.txt')

if __name__ == "__main__":
    asyncio.run(main())
    print("All async copies completed and cleaned up.")

Asynchronous file copying using asyncio.to_thread

Conclusion

While shutil.copy() is a convenient and generally sufficient tool for file copying in Python, performance-critical applications may require more optimized approaches. By understanding the underlying mechanisms and leveraging alternatives like adjusting buffer sizes, utilizing OS-level commands, or employing os.sendfile() for kernel-level copies, you can significantly improve the speed of your file transfer operations. Always choose the method that best fits your specific operating system, performance requirements, and complexity tolerance.