Batch Image Optimization with Python: Using Pillow & OpenCV

Use Python, Pillow and OpenCV for batch image optimization that standardizes e-commerce visuals and speeds up web publishing.

2024-11-20

An e-commerce client came to us with the following situation: 12,000 product images, all in different sizes and formats, and all of them needed to be standardized. A graphic designer was doing it by hand — it took days, and the same work had to be repeated every catalog update. When we automated it with Python, the same job came down to 20 minutes. Zero human intervention, repeatable, free.

This post explains the technical approach behind that image optimization solution.

Why Batch Image Processing Is Needed

E-commerce platforms, content management systems, and media archives accumulate thousands — even millions — of images over time. Processing these images manually one by one is both a waste of time and a process prone to human error.

The most common use cases for batch image processing are:

  • Standardizing e-commerce product photos to a uniform size
  • Optimizing heavy images as part of website speed optimization
  • Converting archive photos to WebP format
  • Cropping to different aspect ratios for social media posts

Python + Pillow + OpenCV Architecture

The two libraries have complementary strengths:

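Pillow is strong at format conversion, basic resizing, and metadata management. It supports dozens of formats including JPEG, PNG, WebP, TIFF, and BMP. EXIF data can be preserved during processing, but only if you pass it along explicitly when saving; by default Pillow does not copy it to the output file.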

OpenCV is preferred for advanced operations such as pixel-level manipulation, face/object detection, and smart cropping. For content-aware cropping (smart crop) in particular, OpenCV’s detection features let the crop follow the main subject instead of blindly taking the center of the frame.

Core Architecture

from PIL import Image
import cv2
import json
import os
from pathlib import Path

class ImageProcessor:
    def __init__(self, config_path: str):
        # Load all processing parameters from the JSON configuration file
        with open(config_path, encoding="utf-8") as f:
            self.config = json.load(f)

    def process_directory(self, input_dir: str, output_dir: str, dry_run: bool = False):
        # Walk the input tree recursively; keep only regular files with a supported extension
        supported = set(self.config["supported_formats"])
        images = [
            p for p in Path(input_dir).rglob("*")
            if p.is_file() and p.suffix.lower() in supported
        ]

        for img_path in images:
            if dry_run:
                # Report what would change without writing anything
                self._preview(img_path)
            else:
                self._process(img_path, output_dir)

JSON-Based Configuration

For flexible settings management, we keep all parameters in a JSON file. This way, even non-technical team members can change basic settings:

{
  "supported_formats": [".jpg", ".jpeg", ".png", ".webp", ".tiff"],
  "output_format": "webp",
  "quality": 85,
  "resize": {
    "enabled": true,
    "max_width": 1920,
    "max_height": 1080,
    "maintain_aspect_ratio": true
  },
  "crop": {
    "enabled": false,
    "width": 800,
    "height": 600,
    "smart_crop": true
  },
  "strip_metadata": false,
  "naming": "{original_name}_optimized"
}
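As an illustration of how such a file might be consumed, here is a minimal sketch that merges the JSON with defaults, checks the quality value, and expands the naming template. The names load_config, output_path, and DEFAULTS are hypothetical and not part of the original tool; the keys mirror the configuration shown above.

```python
import json
from pathlib import Path

# Hypothetical defaults; the keys mirror the JSON file shown above.
DEFAULTS = {
    "supported_formats": [".jpg", ".jpeg", ".png"],
    "output_format": "webp",
    "quality": 85,
    "naming": "{original_name}_optimized",
}

def load_config(text: str) -> dict:
    """Parse JSON config text, fill in defaults, and sanity-check the values."""
    config = {**DEFAULTS, **json.loads(text)}
    if not 1 <= config["quality"] <= 100:
        raise ValueError(f"quality must be between 1 and 100, got {config['quality']}")
    # Normalize extensions so ".JPG" and ".jpg" are treated the same
    config["supported_formats"] = [ext.lower() for ext in config["supported_formats"]]
    return config

def output_path(src: Path, output_dir: Path, config: dict) -> Path:
    """Build the destination path from the naming template and output format."""
    stem = config["naming"].format(original_name=src.stem)
    return output_dir / f"{stem}.{config['output_format']}"

config = load_config('{"quality": 80, "output_format": "webp"}')
destination = output_path(Path("in/shoe.JPG"), Path("out"), config)
print(destination.name)
```

Validating at load time means a typo in the JSON fails immediately instead of halfway through a run over thousands of files.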

Dry-Run: Preview Before Processing

Seeing the result before applying it to thousands of images is critically important. Dry-run mode reports what will happen without performing the actual operation:

def _preview(self, img_path: Path):
    # Image.open is lazy: pixel data is not decoded until something needs it
    with Image.open(img_path) as img:
        # Current file size on disk, in kilobytes
        original_size = img_path.stat().st_size / 1024
        # _estimate_size and _target_size are helper methods of the class (not shown here)
        estimated_output = self._estimate_size(img)

        print(f"[DRY RUN] {img_path.name}")
        print(f"  Size: {img.size[0]}x{img.size[1]} -> {self._target_size(img)}")
        print(f"  File: {original_size:.1f}KB -> ~{estimated_output:.1f}KB")
        print(f"  Format: {img.format} -> {self.config['output_format'].upper()}")
You can also write this output to a CSV — then review it in Excel, approve it, and launch the actual process.
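A minimal sketch of the CSV variant, using only the standard library. The column names and the sample row are illustrative assumptions, not the tool's actual report format.

```python
import csv
import io

# Hypothetical column layout for the dry-run report
FIELDS = ["file", "original_size", "target_size", "original_kb", "estimated_kb", "target_format"]

def write_report(rows, stream):
    """Write one CSV line per previewed image to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Illustrative row; in practice each dict would be produced by the preview step
rows = [{
    "file": "shoe.jpg",
    "original_size": "4000x3000",
    "target_size": "1440x1080",
    "original_kb": 2150.4,
    "estimated_kb": 180.0,
    "target_format": "WEBP",
}]

buffer = io.StringIO()
write_report(rows, buffer)
report = buffer.getvalue()
print(report)
```

When writing to a real file instead of a StringIO, open it with newline="" as the csv module documentation recommends, so that no blank lines appear between rows on Windows.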

Supported Formats and Operations

Formats

  • JPEG — lossy compression with quality parameter
  • PNG — transparency support, lossless
  • WebP — modern web standard, 25–35% smaller than JPEG
  • TIFF — high-quality archive for publishing and print

Operations

Resize: Either fits the image within the maximum dimensions while maintaining the aspect ratio, or fills an exact size and crops the overflow. We use Lanczos interpolation for downscaling and bicubic interpolation for upscaling.
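The aspect-ratio arithmetic and the filter choice can be sketched as follows. The function names and the allow_upscale flag are hypothetical; Image.Resampling requires Pillow 9.1 or newer, and Pillow is imported inside resize only so that the size arithmetic can be used and tested without it.

```python
def fit_within(width, height, max_width, max_height, allow_upscale=False):
    """Return the largest (width, height) that fits the box and keeps the aspect ratio."""
    scale = min(max_width / width, max_height / height)
    if scale > 1 and not allow_upscale:
        scale = 1.0  # never enlarge unless explicitly requested
    return max(1, round(width * scale)), max(1, round(height * scale))

def filter_name(old_size, new_size):
    """Lanczos when shrinking, bicubic when enlarging, as described above."""
    return "LANCZOS" if new_size[0] <= old_size[0] else "BICUBIC"

def resize(img, max_width, max_height, allow_upscale=False):
    """Resize a Pillow image; returns the image unchanged if it already fits."""
    from PIL import Image  # imported lazily, see the note above

    new_size = fit_within(img.width, img.height, max_width, max_height, allow_upscale)
    if new_size == img.size:
        return img
    resample = getattr(Image.Resampling, filter_name(img.size, new_size))
    return img.resize(new_size, resample=resample)

print(fit_within(4000, 3000, 1920, 1080))
```

For a 4000x3000 source and the 1920x1080 limits from the configuration, the height is the binding constraint (scale 0.36), so the target is 1440x1080.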

Crop: Standard center-crop or smart crop with OpenCV. Smart crop attempts to center the main subject in the frame.
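Whichever detector is used, the crop itself comes down to placing a window around a point and keeping it inside the image. The sketch below assumes the subject arrives as an (x, y, w, h) rectangle, the shape OpenCV detectors such as cv2.CascadeClassifier.detectMultiScale return; crop_window is a hypothetical name and the detection step is deliberately left out.

```python
def crop_window(img_w, img_h, crop_w, crop_h, subject_box=None):
    """Return (left, top, right, bottom) for a crop_w x crop_h window.

    The window is centered on subject_box (x, y, w, h) when one is given,
    otherwise on the image center, and is clamped to stay inside the image.
    """
    # The window can never be larger than the image itself
    crop_w, crop_h = min(crop_w, img_w), min(crop_h, img_h)

    if subject_box is None:
        cx, cy = img_w / 2, img_h / 2  # plain center-crop
    else:
        x, y, w, h = subject_box
        cx, cy = x + w / 2, y + h / 2  # center of the detected subject

    # Shift the window back inside the image if the subject sits near an edge
    left = int(min(max(cx - crop_w / 2, 0), img_w - crop_w))
    top = int(min(max(cy - crop_h / 2, 0), img_h - crop_h))
    return left, top, left + crop_w, top + crop_h

print(crop_window(2000, 1500, 800, 600))
print(crop_window(2000, 1500, 800, 600, subject_box=(1700, 100, 200, 200)))
```

The returned tuple uses the same (left, upper, right, lower) order that Pillow's Image.crop expects, so the two libraries can be combined without converting coordinates.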

Format Conversion: Converts from any supported source format to the target format. When migrating from PNG to WebP, file sizes shrink by an average of 40%, though the exact figure depends on image content and the quality setting.
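A minimal sketch of a single conversion that also reports the saving, so that a figure like the 40% above can be checked against your own images. The function names are hypothetical, and Pillow is imported inside convert_to_webp only so that the percentage helper works without it.

```python
from pathlib import Path

def saved_percent(before_bytes: int, after_bytes: int) -> float:
    """Percentage by which the file shrank (negative if it grew)."""
    return 100 * (before_bytes - after_bytes) / before_bytes

def convert_to_webp(src: Path, dst: Path, quality: int = 85) -> float:
    """Convert one image to WebP and return the percentage saved."""
    from PIL import Image  # imported lazily, see the note above

    with Image.open(src) as img:
        # Normalize palette, grayscale, and similar modes to RGB or RGBA,
        # keeping an alpha channel when the source has transparency
        if img.mode not in ("RGB", "RGBA"):
            has_alpha = img.mode in ("LA", "PA") or "transparency" in img.info
            img = img.convert("RGBA" if has_alpha else "RGB")
        img.save(dst, "WEBP", quality=quality)
    return saved_percent(src.stat().st_size, dst.stat().st_size)

# A 500 KB PNG that becomes a 300 KB WebP is a 40% saving
print(saved_percent(500_000, 300_000))
```

Note that a quality setting makes the WebP output lossy; for PNG sources that must stay pixel-exact, Pillow's WebP writer also accepts lossless=True, at the cost of a smaller saving.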

Performance: Optimization for Thousands of Images

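We parallelize the process using Python’s concurrent.futures module. Threads help here despite the GIL because Pillow releases it during much of its native decoding, resizing, and encoding work: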

from concurrent.futures import ThreadPoolExecutor, as_completed

def process_bulk(self, images: list, output_dir: str):
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
        futures = {executor.submit(self._process, img, output_dir): img for img in images}

        for future in as_completed(futures):
            img = futures[future]
            try:
                result = future.result()
                print(f"[OK] {img.name} -> {result['saved_kb']:.1f}KB saved")
            except Exception as e:
                print(f"[ERROR] {img.name}: {e}")

Processing time for 10,000 images on a typical 8-core machine:

  • Sequential: ~45 minutes
  • Parallel (8 threads): ~7 minutes
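To put those two figures in perspective, the speedup and the per-worker efficiency work out as follows. This is plain arithmetic on the numbers above, not an additional measurement.

```python
# Figures from the benchmark above: 10,000 images on an 8-core machine
sequential_min = 45
parallel_min = 7
workers = 8

speedup = sequential_min / parallel_min  # how many times faster the parallel run is
efficiency = speedup / workers           # fraction of the ideal 8x actually achieved

print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
# prints "speedup 6.4x, efficiency 80%"
```

The gap to a perfect 8x is expected: disk I/O and the parts of the pipeline that run in pure Python do not scale with the number of threads.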

Practical Decision Guide: Which Tool for Which Situation?

A simple guide for deciding when to use this solution:

Is Pillow sufficient? For standard resizing, format conversion, and EXIF management — yes. Product photo standardization, archive conversion, and web optimization are all solved with Pillow.

Should you add OpenCV? Yes, if you need content-aware cropping (automatically centering a product object), face-detection-based profile photo cropping, or image quality analysis.

When is parallel processing essential? Above roughly 1,000 images. Below that you may not notice the difference; above it the gap grows quickly, as in the 10,000-image benchmark above, where sequential processing took about 45 minutes against about 7 in parallel.

Conclusion

The Python + Pillow + OpenCV combination provides a powerful and flexible solution that meets enterprise-scale image processing needs. With JSON-based configuration you can create and manage different profiles for different projects. Thanks to dry-run mode, you can test safely before applying to large archives.

This approach is a concrete example of the automation solutions Barlas Dijital develops for SMEs: a custom, reusable solution tailored to the need rather than an off-the-shelf tool. In Defys, where e-commerce and automation come together, this kind of standardization work also reduces operational load significantly. Completing the job in 20 minutes, instead of spending days on 12,000 images by hand with every catalog update, makes a measurable difference in both cost and team capacity. If you have a similar automation need or want to optimize your existing image processing workflows, get in touch with us.