Use gImageReader to Extract Text From Images and PDFs on Linux

Brief: gImageReader is a GUI tool to utilize tesseract OCR engine for extracting texts from images and PDF files in Linux.

gImageReader is a front-end for Tesseract Open Source OCR Engine. Tesseract was originally developed at HP and then was open-sourced in 2006.

Basically, the OCR (Optical Character Recognition) engine lets you scan texts from a picture or a file (PDF). It can detect several languages by default and also supports scanning through Unicode characters.

However, the Tesseract by itself is a command-line tool without any GUI. So, here, gImageReader comes to the rescue to let any user utilize it to extract text from images and files.

Let me highlight a few things about it while mentioning my experience with it for the time I tested it out.

gImageReader: A Cross-Platform Front-End to Tesseract OCR

To simplify things, gImageReader comes in handy to extract text from a PDF file or an image that contains any kind of text.

Whether you need it for spellcheck or translation, it should be useful for a specific group of users.

To sum up the features in a list, here’s what you can do with it:

Add PDF documents and images from disk, scanning devices, clipboard and screenshots
Ability to rotate images
Common image controls to adjust brightness, contrast, and resolution
Scan images directly through the app
Ability to process multiple images or files in one go
Manual or automatic recognition area definition
Recognize to plain text or to hOCR documents
Editor to display the recognized text
Can spellcheck the text extracted
Convert/Export to PDF documents from hOCR document
Export extracted text as a .txt file
Cross-platform (Windows)

Installing gImageReader on Linux

Note: You need to explicitly install Tesseract language packs to detect from images/files from your software manager.

You can find gImageReader in the default repositories for some Linux distributions like Fedora and Debian.

For Ubuntu, you need to add a PPA and then install it. To do that, here’s what you need to type in the terminal:

sudo add-apt-repository ppa:sandromani/gimagereader
sudo apt update
sudo apt install gimagereader

You can also find it for openSUSE from its build service and AUR will be the place for Arch Linux users.

All the links to the repositories and the packages can be found in their GitHub page.

Experience with gImageReader

gImageReader is a quite useful tool for extracting texts from images when you need them. It works great when you try from a PDF file.

For extracting images from a picture shot on a smartphone, the detection was close but a bit inaccurate. Maybe when you scan something, recognition of characters from the file could be better.

So, you’ll have to try it for yourself to see how well it works for your use-case. I tried it on Linux Mint 20.1 (based on Ubuntu 20.04).

I just had an issue to manage languages from the settings and I didn’t get a quick solution for that. If you encounter the issue, you might want to troubleshoot it and explore more about it how to fix it.

Other than that, it worked just fine.

Do give it a try and let me know how it worked for you! If you know of something similar (and better), do let me know about it in the comments below.

Mira Murati, along with two other key people, leaves OpenAI OpenAI’s CTO, Mira Murati, just left the company. She’s been one of the key people involved in getting ChatGPT, GPT-4, DALL-E and more out to the world. According to Mira’s note to the OpenAI team, she’s stepping away to “create time and space to do

The Growing Demand for Specialized Linux Solutions As the Linux market is set to soar to nearly USD 100 billion by 2032,1 businesses are facing mounting challenges in managing increasingly complex workloads spanning from the cloud to the edge. Traditional Linux distributions are not built to meet the specific demands of these modern use cases, creating

WordPress widgets are small blocks of content that can be added to your website’s sidebars, footers, or other widget-ready areas. They allow you to add various features and functionalities to your website without having to know coding or editing your website’s theme. These widgets are an essential aspect of WordPress, and knowing how to use… […]

15 Essential WordPress Plugins For Every Site

WordPress is the most widely used content management system (CMS) in the world, powering over 40% of all websites. One of the key reasons for its popularity is its vast selection of plugins. WordPress plugins are pieces of software that can be added to your site to enhance its functionality, design, and performance. Plugins are… […]

In the competitive landscape of WordPress hosting, delivering fast, secure, and reliable services is crucial. But… The post Unlock the Full Potential of Your WordPress Hosting: Webinar on AccelerateWP + Imunify360 appeared first on LinuxPunx.

Everything from global policy choices to targeted marketing is powered by data in today’s society. But… The post Data Ethics and Society: The Human Impact of Data Collection – nandbox Native App Builder appeared first on LinuxPunx.

Use gImageReader to Extract Text From Images and PDFs on Linux

gImageReader: A Cross-Platform Front-End to Tesseract OCR

Installing gImageReader on Linux

Experience with gImageReader

GhostBSD 23.10.1 Quick Overview #shorts

Get started with Linux containers in Docker on WSL 2

the EASIEST way to build a website

Run your own AI (but private)

Cut, Copy and Paste in Vim

Nobara Project 38 overview | a modified version of Fedora Linux with user-friendly fixes added to it

Design

Magento eCommerce

Fast Domains

Cloud Cluster

gImageReader: A Cross-Platform Front-End to Tesseract OCR

Installing gImageReader on Linux

Experience with gImageReader

Similar Posts