in code

Batch compressing PDF files on Mac and Linux

The Problem

As part of optimizing workflows at Telary Studio, I’ve been improving various operational processes, including bookkeeping. I recently started using Cakedesk to write proposals and invoices. One of its standout features is the ability to create fully custom PDF templates—perfect for maintaining a professional and branded appearance in client documents. However, I ran into a problem: the generated PDFs are extremely large due to embedded fonts and lack of image compression. This made sharing and archiving invoices cumbersome, so I set out to find a solution to batch compress these PDFs.

Given the sensitivity of client data, uploading files to online “compress PDF” services was not an option. I needed a tool that would meet the following requirements:

Compressing PDF files with Ghostscript

Ghostscript could easily be called the “ffmpeg of PostScript and PDF.” It’s a powerful command-line utility that allows for extensive processing of PDF files, including compression. Its flexibility and local-first nature made it an ideal choice.

Compressing a Single PDF with Ghostscript

To compress a single PDF file, you can use the following command:

gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dPDFSETTINGS=/${3:-"screen"} -dCompatibilityLevel=1.4 -dDownsampleColorImages=true -dColorImageResolution=300 -dDownsampleGrayImages=true -dGrayImageResolution=300 -dDownsampleMonoImages=true -dMonoImageResolution=300 -sOutputFile="$2" "$1"

To avoid remembering all these parameters every time, I wrote a simple Bash script that can be saved into the .zshrc (or else) file to streamline the process:

# Usage: compresspdf [input file] [output file] [screen*|ebook|printer|prepress]

compressPdf() {
    gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH \
        -dCompatibilityLevel=1.4 \
        -dPDFSETTINGS=/${3:-"screen"} \
        -dDownsampleColorImages=true -dColorImageResolution=300 \
        -dDownsampleGrayImages=true -dGrayImageResolution=300 \
        -dDownsampleMonoImages=true -dMonoImageResolution=300 \
        -sOutputFile="$2" "$1"
}

Batch compressing multiple PDF files

Since I usually just want to compress all PDF files in a folder, I created a simple script that can be run from the terminal to compress all PDF files in a folder. This script will create a new folder called compressed_pdfs in the same directory as the input folder and compress all PDF files in the input folder.

# Usage: compressPdfs [input_directory]

compressPdfs() {
  # Check if an input directory is provided
  if [ -z "$1" ]; then
      echo "Usage: compressPdfs <input_directory>"
      return 1
  fi

  # Define input and output directories
  local input_dir="$1"
  local output_dir="${input_dir}/compressed_pdfs"

  # Create the output directory if it doesn't exist
  mkdir -p "$output_dir"

  # Loop through all PDF files in the input directory
  for input_file in "$input_dir"/*.pdf; do
      # Check if any PDFs exist in the input directory
      if [ ! -e "$input_file" ]; then
          echo "No PDF files found in $input_dir."
          return 1
      fi

      local base_name=$(basename "$input_file")
      local output_file="${output_dir}/${base_name}"

      # Run the Ghostscript command with corrected parameters
      gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH \
          -dCompatibilityLevel=1.4 \
          -dPDFSETTINGS=/screen \
          -dDownsampleColorImages=true -dColorImageResolution=300 \
          -dDownsampleGrayImages=true -dGrayImageResolution=300 \
          -dDownsampleMonoImages=true -dMonoImageResolution=300 \
          -sOutputFile="$output_file" "$input_file" \
      || echo "Failed to process $input_file"

      # Feedback for each processed file
      if [ -e "$output_file" ]; then
          echo "Compressed $input_file -> $output_file"
      else
          echo "Error: Output not created for $input_file"
      fi
  done

  echo "Processing completed. Check $output_dir for compressed PDFs."
}

This batch script compresses 10 PDF files that are ~2MB each to ~100KB each, within just two seconds. Of course, the results vary and depend on the parameters used.