This article covers free software solutions for Linux that can run OCR on PDF documents and convert them to searchable PDFs. Many OCR tools are available, but most have a major drawback: they can only export the plain text of the OCR’ed image and do not support embedding that text into the PDF to make it searchable. By searchable PDF, we mean a scanned PDF document that contains invisible OCR’ed text laid over the scanned image.
The text must be the right size so that it sits over the text portions of the image: every word in the text layer should overlay exactly the part of the image that contains that word. Here are two software solutions that can create searchable PDFs. One is a native Linux OCR engine and the other is a free PDF reader with OCR capabilities running in Wine. The only problem with the OCR engine is that it accepts only image input.
So you can’t feed it a PDF document. Things get complicated if you already have a PDF document that you want to make searchable: the pages first have to be extracted as images, and to do this you must know the resolution of the scanned image. This can be a problem if you didn’t scan the document yourself and have no idea what its resolution is. The conversion tool not only extracts all pages from the PDF as images, but also pre-processes them for OCR using multiple threads. You can download its DEB package from the website and install it with GDebi. Sometimes the PDF already contains processed images and you don’t want any further processing.
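To see why the resolution matters, here is a minimal Python sketch of the arithmetic involved. It is not taken from any of the tools discussed here. It relies on the fact that a PDF's default unit is 1/72 inch, and it assumes the scanned image fills the whole page with no margins. The function names and the `(left, top, right, bottom)` box format are hypothetical, chosen only for illustration.

```python
POINTS_PER_INCH = 72.0  # PDF default user space: 1 point = 1/72 inch


def estimate_dpi(image_width_px, page_width_pt):
    """Estimate the scan resolution from the image width in pixels and
    the PDF page width in points.

    Assumption: the image covers the full page width (no margins).
    """
    page_width_in = page_width_pt / POINTS_PER_INCH
    return image_width_px / page_width_in


def pixel_box_to_pdf(box_px, image_height_px, dpi):
    """Convert a word box from image pixels to PDF points.

    box_px is a hypothetical (left, top, right, bottom) tuple with the
    origin in the top-left corner of the image. The result is
    (x, y, width, height) with the origin in the bottom-left corner of
    the page, which is the default PDF coordinate system.
    """
    left, top, right, bottom = box_px
    x = left * POINTS_PER_INCH / dpi
    # Flip the vertical axis: image rows grow downward, PDF y grows upward.
    y = (image_height_px - bottom) * POINTS_PER_INCH / dpi
    width = (right - left) * POINTS_PER_INCH / dpi
    height = (bottom - top) * POINTS_PER_INCH / dpi
    return x, y, width, height


# Example: a 2550 x 3300 pixel scan on a US Letter page (612 x 792 points).
dpi = estimate_dpi(2550, 612.0)
word_box = pixel_box_to_pdf((300, 150, 600, 200), 3300, dpi)
print(dpi, word_box)
```

The height of the converted box is a reasonable starting point for the size of the invisible text. If the assumed resolution is wrong, every box is scaled by the same wrong factor, and the text layer drifts away from the words in the image.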