Paperless office; document/image processing @sopuli.xyz

TIFF → DjVu conversion produces bigger file from bilevel doc than color

I would like to work out what I am doing wrong: my black and white documents end up with a bigger file size than color ones.

My process for a color TIFF is like this:

① tiff2pdf
② ocrmypdf
③ pdf2djvu
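For anyone wanting to reproduce this, the three steps (tiff2pdf, ocrmypdf, pdf2djvu) could be chained roughly as below. This is only a sketch, not the exact commands used in the post: the function names `run` and `tiff_to_djvu`, the `DRY_RUN` switch, and the intermediate file names are all made up for illustration, and no extra options are passed to any tool.

```shell
#!/bin/sh
# Sketch of the pipeline: TIFF -> PDF -> OCR'd PDF -> DjVu.
# All file names are derived from the input name and are placeholders.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the three steps; stops at the first failure.
tiff_to_djvu() {
    src=$1                 # input TIFF, e.g. scan.tif
    base=${src%.*}         # input name without its extension

    # Step 1: wrap the TIFF in a PDF.
    run tiff2pdf -o "$base.raw.pdf" "$src" || return 1
    # Step 2: add an OCR text layer (this is the intermediate PDF).
    run ocrmypdf "$base.raw.pdf" "$base.ocr.pdf" || return 1
    # Step 3: convert the OCR'd PDF to DjVu.
    run pdf2djvu -o "$base.djvu" "$base.ocr.pdf"
}

# Only convert when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
    tiff_to_djvu "$1"
fi
```

Running it with `DRY_RUN=1` prints the three commands without executing them, which makes it easy to compare the color and black and white runs side by side.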

The resulting color DjVu file is ~56k. Running pdfimages -all on the intermediate PDF shows that a CCITT (fax) image is inside.

My process for a black and white TIFF is the same:

① tiff2pdf
② ocrmypdf
③ pdf2djvu

The resulting black and white DjVu file is ~145k (about 2.6× the color size). Running pdfimages -all on the intermediate PDF shows that a PNG image is inside. If I replace step ① with ImageMagick’s convert, the first PDF is 10 MB, but the resulting DjVu file is still ~145k, and a PNG is still inside the intermediate PDF.
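The pdfimages checks described above can also be done without extracting anything, using `pdfimages -list`, which prints one row per embedded image. The sketch below condenses that table to the fields that matter here. It assumes poppler's usual column order (encoding in the ninth column, after a two-line header); `summarize_images` and `intermediate.pdf` are illustrative names only. As far as I know, a `.png` from `pdfimages -all` usually means the image is stored as a plain (Flate-compressed) raster rather than in a fax or JBIG2 encoding, and `-list` would then show `image` in the enc column.

```shell
#!/bin/sh
# Reads `pdfimages -list` output on stdin and prints, for each image:
# page, pixel size, color space, bits per component, and encoding.
# Assumes poppler's column layout: page=1, width=4, height=5,
# color=6, bpc=8, enc=9, preceded by two header lines.
summarize_images() {
    awk 'NR > 2 { print "page " $1 ": " $4 "x" $5 ", " $6 ", " $8 " bpc, enc=" $9 }'
}

# Usage (intermediate.pdf is a placeholder name):
#   pdfimages -list intermediate.pdf | summarize_images
```

Comparing this output for the color and black and white intermediate PDFs should show directly whether the bilevel page is being stored at 1 bit per component or has been re-encoded as something larger.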

I can get the bitonal (bilevel) image smaller by using cjb2 -clean, which goes straight from TIFF to DjVu, but then I can’t OCR it because there is no intermediate PDF. And the size is still bigger than t