Batch Image Optimization with Python: Using Pillow & OpenCV
Use Python, Pillow and OpenCV for batch image optimization that standardizes e-commerce visuals and speeds up web publishing.
2024-11-20
An e-commerce client came to us with 12,000 product images in different sizes and formats, all of which needed to be standardized. A graphic designer was doing the work by hand: it took days, and it had to be repeated with every catalog update. After we automated it with Python, the same job took 20 minutes. It needs no human intervention, is fully repeatable, and costs nothing to run.
This post explains the technical approach behind that image optimization solution.
Why Batch Image Processing Is Needed
E-commerce platforms, content management systems, and media archives accumulate thousands, even millions, of images over time. Processing them manually, one by one, is slow and prone to human error.
The most common use cases for batch image processing are:
- Standardizing e-commerce product photos to a uniform size
- Optimizing heavy images as part of website speed optimization
- Converting archive photos to WebP format
- Cropping to different aspect ratios for social media posts
Python + Pillow + OpenCV Architecture
The two libraries have complementary strengths:
Pillow is strong at format conversion, basic resizing, and metadata management. It supports dozens of formats including JPEG, PNG, WebP, TIFF, and BMP. You can process images while preserving EXIF data.
OpenCV is preferred for advanced operations such as pixel-level manipulation, face/object detection, and smart cropping. For content-aware cropping (smart crop) in particular, OpenCV's detection features noticeably improve results compared with a fixed center crop.
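On the Pillow side, EXIF handling is where most surprises come from: phone photos often store their rotation as an EXIF Orientation tag rather than in the pixels. The sketch below is a minimal illustration, assuming JPEG output and a hypothetical `save_with_metadata` helper that is not part of the original tool. It applies the orientation with `ImageOps.exif_transpose`, then either keeps the remaining EXIF or drops it, mirroring the `strip_metadata` switch in the JSON configuration.

```python
from PIL import Image, ImageOps


def save_with_metadata(src: str, dst: str, strip_metadata: bool = False) -> None:
    """Hypothetical helper: save an upright JPEG copy, keeping or dropping EXIF."""
    with Image.open(src) as img:
        # Rotate/flip the pixels according to the EXIF Orientation tag; the returned
        # copy has that tag removed, so viewers will not rotate the image twice
        upright = ImageOps.exif_transpose(img)
    if upright.mode not in ("RGB", "L"):
        upright = upright.convert("RGB")  # this sketch writes JPEG, which has no alpha
    if strip_metadata:
        # Pillow writes EXIF into a JPEG only when it is passed explicitly
        upright.save(dst, "JPEG", quality=85)
    else:
        upright.save(dst, "JPEG", quality=85, exif=upright.getexif())
```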
Core Architecture
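```python
import json
import os  # needed by process_bulk (os.cpu_count) in the parallel section
from pathlib import Path

import cv2  # used by the smart-crop step
from PIL import Image


class ImageProcessor:
    def __init__(self, config_path: str):
        # All processing parameters come from the JSON configuration file
        with open(config_path, encoding="utf-8") as f:
            self.config = json.load(f)

    def process_directory(self, input_dir: str, output_dir: str, dry_run: bool = False):
        supported = {ext.lower() for ext in self.config["supported_formats"]}
        # Recurse into subfolders; keep regular files with a supported extension only
        images = [p for p in Path(input_dir).rglob("*")
                  if p.is_file() and p.suffix.lower() in supported]
        if not dry_run:
            Path(output_dir).mkdir(parents=True, exist_ok=True)
        for img_path in images:
            if dry_run:
                self._preview(img_path)  # report only, nothing is written
            else:
                self._process(img_path, output_dir)  # resize/crop/convert; not shown here
```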
JSON-Based Configuration
For flexible settings management, we keep all parameters in a JSON file. This way, even non-technical team members can change basic settings:
```json
{
  "supported_formats": [".jpg", ".jpeg", ".png", ".webp", ".tiff"],
  "output_format": "webp",
  "quality": 85,
  "resize": {
    "enabled": true,
    "max_width": 1920,
    "max_height": 1080,
    "maintain_aspect_ratio": true
  },
  "crop": {
    "enabled": false,
    "width": 800,
    "height": 600,
    "smart_crop": true
  },
  "strip_metadata": false,
  "naming": "{original_name}_optimized"
}
```
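Because non-technical team members will edit this file, it helps to fill in missing keys with defaults and reject obvious mistakes before any image is touched. A minimal sketch, assuming a hypothetical `load_config` function and a `DEFAULTS` dict that mirrors the example above; the validation rules (quality between 1 and 100, unknown keys rejected to catch typos, `{original_name}` required in `naming`) are our own assumptions.

```python
import json

# Hypothetical defaults that mirror the example configuration
DEFAULTS = {
    "supported_formats": [".jpg", ".jpeg", ".png", ".webp", ".tiff"],
    "output_format": "webp",
    "quality": 85,
    "resize": {"enabled": True, "max_width": 1920, "max_height": 1080,
               "maintain_aspect_ratio": True},
    "crop": {"enabled": False, "width": 800, "height": 600, "smart_crop": True},
    "strip_metadata": False,
    "naming": "{original_name}_optimized",
}


def load_config(text: str) -> dict:
    user = json.loads(text)
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        # A misspelled key would otherwise be silently ignored
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    config = {}
    for key, default in DEFAULTS.items():
        value = user.get(key, default)
        # Merge nested sections so a partial "resize" block keeps the other defaults
        if isinstance(default, dict):
            value = {**default, **(value or {})}
        config[key] = value
    if not 1 <= config["quality"] <= 100:
        raise ValueError("quality must be between 1 and 100")
    if "{original_name}" not in config["naming"]:
        raise ValueError("naming must contain {original_name}")
    return config
```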
Dry-Run: Preview Before Processing
Seeing the result before applying it to thousands of images is critically important. Dry-run mode reports what will happen without performing the actual operation:
```python
from pathlib import Path

from PIL import Image


# Method of ImageProcessor, shown on its own for brevity
def _preview(self, img_path: Path):
    original_kb = img_path.stat().st_size / 1024
    with Image.open(img_path) as img:
        # _estimate_size and _target_size are helper methods not shown in this post
        estimated_kb = self._estimate_size(img)
        print(f"[DRY RUN] {img_path.name}")
        print(f"  Size: {img.size[0]}x{img.size[1]} -> {self._target_size(img)}")
        print(f"  File: {original_kb:.1f}KB -> ~{estimated_kb:.1f}KB")
        print(f"  Format: {img.format} -> {self.config['output_format'].upper()}")
```
You can also write the dry-run report to a CSV file, review it in Excel, approve it, and only then launch the actual run.
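A minimal sketch of the CSV step using only the standard library. The `write_dry_run_report` name and the column set are assumptions: each row is a dict collected during the dry run instead of (or in addition to) printing.

```python
import csv


def write_dry_run_report(rows: list, report_path: str) -> None:
    """Write one row per planned change so the batch can be reviewed before it runs."""
    fieldnames = ["file", "original_size", "target_size",
                  "original_kb", "estimated_kb", "format"]
    # utf-8-sig adds a BOM so Excel detects the encoding of non-ASCII file names
    with open(report_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

`DictWriter` raises a `ValueError` if a row contains a key that is not in `fieldnames`, which catches mismatches between the preview code and the report early.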
Supported Formats and Operations
Formats
- JPEG — lossy compression with quality parameter
- PNG — transparency support, lossless
- WebP — modern web standard, 25–35% smaller than JPEG
- TIFF — high-quality archive for publishing and print
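Each format takes different `Image.save()` options in Pillow, so it is convenient to keep them in one table. A minimal sketch with a hypothetical `save_as` helper; the option values are illustrative assumptions, not tuned recommendations. Because JPEG cannot store transparency, images with an alpha channel are composited onto a white background before saving.

```python
from PIL import Image

# Per-format keyword arguments for Image.save(); the values are illustrative defaults
SAVE_OPTIONS = {
    "JPEG": {"quality": 85, "optimize": True, "progressive": True},
    "PNG": {"optimize": True},             # lossless, so there is no quality setting
    "WEBP": {"quality": 85, "method": 6},  # method 0-6: higher is slower but smaller
    "TIFF": {"compression": "tiff_lzw"},   # lossless LZW compression for archive copies
}


def save_as(img: Image.Image, path: str, fmt: str) -> None:
    fmt = fmt.upper()
    if fmt == "JPG":
        fmt = "JPEG"
    if fmt == "JPEG" and img.mode in ("RGBA", "LA", "P"):
        # JPEG cannot store transparency: composite onto a white background
        rgba = img.convert("RGBA")
        background = Image.new("RGB", rgba.size, (255, 255, 255))
        background.paste(rgba, mask=rgba.getchannel("A"))
        img = background
    img.save(path, fmt, **SAVE_OPTIONS.get(fmt, {}))
```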
Operations
Resize: Either preserves the aspect ratio or fills an exact target size by cropping the overflow. We use Lanczos interpolation for downscaling and bicubic interpolation for upscaling.
Crop: Standard center-crop or smart crop with OpenCV. Smart crop attempts to center the main subject in the frame.
Format Conversion: Converts any supported source format to the target format. When migrating from PNG to WebP, file sizes shrink by an average of 40%.
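Savings vary a lot with image content, so it is worth checking the figure on your own catalog before committing. A minimal sketch with hypothetical `conversion_savings` and `average_savings` helpers: each file is re-encoded in memory (nothing is written to disk) and the percentage reduction is reported.

```python
import io
import statistics

from PIL import Image


def conversion_savings(path: str, target: str = "WEBP", **save_kwargs) -> float:
    """Percentage size reduction from re-encoding one file, measured in memory."""
    with open(path, "rb") as f:
        original = f.read()
    with Image.open(io.BytesIO(original)) as img:
        buf = io.BytesIO()
        img.save(buf, target, **save_kwargs)
    return 100 * (1 - len(buf.getvalue()) / len(original))


def average_savings(paths, target: str = "WEBP", **save_kwargs) -> float:
    # Run on a representative sample; a negative value means the output grew
    return statistics.mean(conversion_savings(p, target, **save_kwargs) for p in paths)
```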
Performance: Optimization for Thousands of Images
We parallelize the process using Python’s concurrent.futures module:
```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


# Method of ImageProcessor; _process is expected to return a dict with a "saved_kb" key
def process_bulk(self, images: list, output_dir: str):
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
        # Map each future back to its image so errors can be attributed
        futures = {executor.submit(self._process, img, output_dir): img for img in images}
        for future in as_completed(futures):
            img = futures[future]
            try:
                result = future.result()
                print(f"[OK] {img.name} -> {result['saved_kb']:.1f}KB saved")
            except Exception as e:
                # One bad file must not stop the whole batch
                print(f"[ERROR] {img.name}: {e}")
```
Processing time for 10,000 images on a typical 8-core machine:
- Sequential: ~45 minutes
- Parallel (8 threads): ~7 minutes
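These timings depend heavily on hardware, disk speed, and image sizes, so it is worth measuring on a sample of your own catalog. A minimal sketch with a hypothetical `benchmark` helper: it runs any one-argument task sequentially and then on a `ThreadPoolExecutor`, and reports the speedup. Threads help here on the assumption that Pillow releases the GIL during much of its C-level decoding, resizing, and encoding; if the measured speedup stays close to 1, `ProcessPoolExecutor` is the usual alternative.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(task, items, workers=None) -> dict:
    """Time `task` over `items` sequentially and with a thread pool."""
    start = time.perf_counter()
    for item in items:
        task(item)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        list(pool.map(task, items))  # list() waits for completion and re-raises errors
    parallel = time.perf_counter() - start
    return {"sequential_s": sequential, "parallel_s": parallel,
            "speedup": sequential / parallel if parallel else float("inf")}
```

For example, `benchmark(lambda p: process_image(p, "out", cfg), sample_paths)` on a couple of hundred files gives a per-image time that can be extrapolated to the full catalog (`process_image`, `cfg`, and `sample_paths` being whatever your own pipeline uses).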
Practical Decision Guide: Which Tool for Which Situation?
A simple guide for deciding when to use this solution:
Is Pillow sufficient? For standard resizing, format conversion, and EXIF management, yes. Product photo standardization, archive conversion, and web optimization can all be handled with Pillow alone.
Should you add OpenCV? Yes, if you need content-aware cropping (automatically centering a product object), face-detection-based profile photo cropping, or image quality analysis.
When is parallel processing essential? Above 1,000 images. Below that you may not notice the difference; above it, sequential processing can turn a morning into an all-day affair.
Conclusion
The Python + Pillow + OpenCV combination provides a powerful and flexible solution that meets enterprise-scale image processing needs. With JSON-based configuration you can create and manage different profiles for different projects. Thanks to dry-run mode, you can test safely before applying to large archives.
This approach is a concrete example of the automation solutions Barlas Dijital develops for SMEs: instead of buying an off-the-shelf tool, we build a custom, reusable solution tailored to the need. In Defys, where e-commerce and automation come together, this kind of standardization work also reduces operational load significantly. Rather than spending days processing 12,000 images with every catalog update, completing the same job in 20 minutes creates a measurable difference in both cost and team capacity. If you have a similar automation need or want to optimize your existing image processing workflows, get in touch with us.