
TextRipper (aka T-Rip)

Version 2.0

Nautilus Script

Score 66%

Downloads:  1044
Submitted:  Sep 18 2010
Updated:  Jan 14 2011

Description:

An OCR (optical character recognition) GUI application or CLI script.
# Supports the Tesseract engine by default!
# Optionally supports the Ocrad engine for multi-column text.
# These recognition engines have a very high character recognition success rate compared with other OCR programs, including proprietary software.
# New: multi-page and multiple file selection support!
# Enhanced XSANE output and TIFF compatibility.
# New: now handles nearly any format out there!
# This script converts any image of text into editable and indexable text. (For a full list of compatible file formats, see the first filter in the script.)
#
# REM: The cleaner, higher-contrast and higher-resolution your image or scan, the better the results.
#
# Dependencies: libtiff-dev (or libtiff-devel; install it FIRST), tesseract-2.04 (latest stable version), the Tesseract language data (2.00 and up) for your chosen language(s) *1,
# ImageMagick, ghostscript, Zenity, and OpenOffice or another text editor *2
# This version of Tesseract can be downloaded from here: http://code.google.com/p/tesseract-ocr/downloads/list
# Warning: This script will not work with the latest beta version (Tesseract 3.00 pre-release) because the structure of its language databases has changed.
#
# Optional dependency: ocrad, an alternative recognition engine.
# If initial results are unsatisfactory, this engine may do better. Most importantly, it supports basic page-format recognition. *3
# The latest version of ocrad can be downloaded off the GNU mirror list here: http://www.gnu.org/software/ocrad/
#
# Also: Make sure to select Unicode UTF-8 in OpenOffice's pop-up window (or text editor of your choice).
#
#
#
# *1 Install Tesseract after libtiff-dev. Then extract all the language databases you need into the "wherever_you_installed/tesseract-2.04/tessdata" directory.
# This happens automatically if you extract the language databases from WITHIN the "tesseract-2.04" directory (and allow overwriting).
# This script allows the use of multiple language databases. The default is English and French. For adding others, see the comments in the script.
# You NEED at least one language database or Tesseract will not work.
# *2 Simply change the occurrence of "soffice -writer" in the script to a text editor of your choice, e.g. gedit, KWrite, etc.
# Some systems launch OpenOffice Writer differently. If unsure, check the properties tab of your Writer launcher.
# E.g. on customized versions of OOo (such as the ones provided by Linux Mandrake or Gentoo), you start Writer with: oowriter
# *3 If you also install ocrad, TextRipper detects it and prompts you to choose between the two engines: Tesseract for better recognition, or Ocrad for page-format support.
#
# Troubleshooting:
# If this script ends by saying your text editor can't open "OCR output-editable text.txt",
# or, when run from the CLI, reports: Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset
# do (as superuser):
# echo /usr/local/share /usr/share | xargs -n 1 cp -R wherever_you_installed/tesseract-2.04/tessdata
# Explanation: Tesseract may look for the tessdata directory under the share directories of your filesystem (/usr/local/share or /usr/share),
# so you need to make your language databases available from there.
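The pipeline described above (convert the input, run the recognition engine, open the result in an editor) could be sketched in shell roughly as follows. This is not the TextRipper source: the function names `output_base`, `to_tiff`, `ocr_tesseract` and `pick_engine`, the `EDITOR_CMD` variable and the output file name are all hypothetical. It assumes ImageMagick's `convert`, Tesseract 2.04's `tesseract image outbase -l lang` invocation (which writes `outbase.txt`), and Zenity for the engine prompt. Converting to an uncompressed grayscale TIFF first is an assumption about what Tesseract 2.04 reads most reliably.

```shell
#!/bin/bash
# Hypothetical sketch of a TextRipper-style pipeline, NOT the original script.
# Assumes ImageMagick's `convert`, Tesseract 2.04 and (optionally) Zenity/ocrad.

# Command used to open the result; replace with gedit, kwrite, oowriter, ...
EDITOR_CMD="${EDITOR_CMD:-soffice -writer}"

# Output base name next to the input (Tesseract appends ".txt" itself).
# Assumes the input file name has an extension such as .jpg or .pdf.
output_base() {
    printf '%s\n' "${1%.*} (editable text)"
}

# Convert any ImageMagick-readable file (PDF is rasterized via ghostscript)
# into an 8-bit grayscale, uncompressed TIFF.
to_tiff() {
    convert -density 300 "$1" -colorspace Gray -depth 8 -compress none "$2"
}

# OCR one file with Tesseract 2.04 and print the text file's path on success.
ocr_tesseract() {
    local in="$1" lang="${2:-eng}" base tmpdir
    base="$(output_base "$in")"
    tmpdir="$(mktemp -d)" || return 1
    if to_tiff "$in" "$tmpdir/page.tif"; then
        tesseract "$tmpdir/page.tif" "$base" -l "$lang"
    fi
    rm -rf "$tmpdir"
    [ -s "$base.txt" ] && printf '%s\n' "$base.txt"
}

# Ask which engine to use, but only when ocrad is actually installed.
pick_engine() {
    if command -v ocrad >/dev/null 2>&1; then
        zenity --list --radiolist --title="Choose an OCR engine" \
            --column="Use" --column="Engine" TRUE tesseract FALSE ocrad
    else
        echo tesseract
    fi
}
```

For example, `f=$(ocr_tesseract ~/scan.png fra) && $EDITOR_CMD "$f"` would OCR a French scan and open the result; `$EDITOR_CMD` is deliberately left unquoted so that `soffice -writer` splits into a command and its option.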




License: GPL



ocrad not so hot.... :(
by chric on: Sep 19 2010

You might want to check out Tesseract OCR as your recognition engine. Just using the data in Ubuntu's repo, I was able to recognise stuff that ocrad completely failed with...

Re: ocrad not so hot.... :(
by kickass on: Sep 20 2010

Chris:
Thanks for your input.
You'll be happy with version 1.1.
It's made for Tesseract but still lets you use ocrad if you like.
Cheers,
d.

This doesn't work for me...
by inameiname on: Sep 21 2010

When I right-click an image file and select "English", I only get this message:

/home/me/Tmp/OCR output-editable text.txt does not exist.

This occurred with both versions of your script. I don't understand.

Thanks in advance.

Re: This doesn't work for me...
by kickass on: Sep 21 2010

tjc:
There are only two possible causes for this error message.
The first is covered, concisely but clearly, under "Troubleshooting" in the script's own header comments.
The second is an incompatible image format, either because 1) you are missing libraries such as libtiff-dev, or 2) the Tesseract engine just can't handle that particular file. In that case the conversion usually fails, and you must rescan, preferably to a different format such as PNM. There have also been reports of success in such cases after switching to the ocrad engine.
Your pick.
Cheers,
d.
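The workaround in this reply (convert to PNM, then try ocrad) could be scripted along these lines. It is a hypothetical helper, not part of TextRipper: `needs_conversion` and `ocr_with_ocrad` are illustrative names, and it assumes ImageMagick's `convert` and ocrad's `-o` option for naming the output file. Ocrad reads the netpbm formats (PBM/PGM/PPM); converting into a temporary directory keeps a `.pnm` input from ever being overwritten.

```shell
#!/bin/bash
# Hypothetical fallback helper, not part of TextRipper.

# Succeeds (returns 0) when the file is NOT already PNM/PBM/PGM/PPM,
# i.e. when it must be converted before ocrad can read it.
needs_conversion() {
    case "${1##*.}" in
        pnm|pbm|pgm|ppm|PNM|PBM|PGM|PPM) return 1 ;;
        *) return 0 ;;
    esac
}

# Convert (if needed) with ImageMagick, then run ocrad, writing text to $2.
ocr_with_ocrad() {
    local in="$1" out="$2" tmpdir src status
    tmpdir="$(mktemp -d)" || return 1
    src="$in"
    if needs_conversion "$in"; then
        src="$tmpdir/page.pnm"
        if ! convert "$in" "$src"; then
            echo "conversion failed: $in" >&2
            rm -rf "$tmpdir"
            return 1
        fi
    fi
    ocrad -o "$out" "$src"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, `ocr_with_ocrad scan.jpg scan.txt` converts the JPEG to a temporary PNM and leaves ocrad's output in `scan.txt`.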

Re: Re: This doesn't work for me...
by inameiname on: Sep 23 2010

Thanks for the info. I believe my problem was that tesseract-ocr and tesseract-ocr-eng weren't installed. I just installed them and voilà. I wasn't aware of any dependencies, as none were mentioned on this page. Hehe. Thanks, I've figured it out now. I'm not sure about libtiff-dev, though: it's not in Ubuntu's repos, but it is in Debian's. It doesn't seem I need it installed anyway.
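Since the missing pieces here were packages rather than the script itself, a small pre-flight check can help. This is a hypothetical sketch, not part of TextRipper: `missing_deps` and `has_lang_data` are illustrative names, and the command names (`tesseract`, `convert` from ImageMagick, `gs` from ghostscript, `zenity`) are assumptions about what the script calls.

```shell
#!/bin/bash
# Hypothetical pre-flight check, not part of TextRipper.

# Print each command from the argument list that is not on the PATH.
missing_deps() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Succeed if <lang>.unicharset exists in any of the given tessdata directories.
has_lang_data() {
    local lang="$1" dir
    shift
    for dir in "$@"; do
        [ -f "$dir/$lang.unicharset" ] && return 0
    done
    return 1
}
```

`missing_deps tesseract convert gs zenity` prints whatever is not installed, and `has_lang_data eng /usr/local/share/tessdata /usr/share/tessdata` checks the two locations named in the troubleshooting notes; distribution packages may keep their language data in a different directory.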

Download Link Broken
by kayce on: Dec 4 2010

Hello,

The download link is broken. Can somebody fix it please? Thanks.

-k

Re: Download Link Broken
by kickass on: Dec 4 2010

Uh, it's not really. Try again.
HOWEVER, soon, very soon, I'll release the new version of Text Recognition, now renamed TextRipper. It can rip text off anything!
Till then,
d.

Re: Re: Download Link Broken
by kayce on: Dec 4 2010

Hi D,

When I click "Download", a new page comes up saying that the download pop-up should appear soon. But instead, no pop-up appears and it redirects me to the following link:

http://gtk-apps.org/CONTENT/content-files/132759-Text%20Recognition

Any insight regarding this? Thanks,

-k

Re: Re: Re: Download Link Broken
by kickass on: Dec 5 2010

Just copy and paste it into a file, add execute permissions, then read through the comments to make sure you have the dependencies, etc.
Otherwise, like I said above, I'll release TextRipper in about a week.
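The "copy/paste into a file and add execute permissions" step could be wrapped up like this to install the file as a Nautilus script. A hypothetical sketch: `install_nautilus_script` is an illustrative name, and the default directory is an assumption. GNOME 2's Nautilus reads scripts from `~/.gnome2/nautilus-scripts`, while newer Nautilus 3 releases use `~/.local/share/nautilus/scripts`, which you can pass as the second argument instead.

```shell
#!/bin/bash
# Hypothetical installer sketch, not part of TextRipper: copy the saved
# script into a Nautilus scripts directory and make it executable.
install_nautilus_script() {
    local src="$1" dest_dir="${2:-$HOME/.gnome2/nautilus-scripts}"
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")"
}
```

After `install_nautilus_script ~/TextRipper`, the script should show up in the Scripts submenu when you right-click a file in Nautilus.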

Re: Re: Re: Re: Download Link Broken
by kayce on: Dec 5 2010

Thanks for the fast reply, I appreciate it. Guess I'll wait for the product if it really can rip any text!

Btw, does it handle handwriting? Most OCR software out there (including Tesseract) cannot handle handwritten characters. From my understanding, Tesseract expects well-segmented, well-defined fonts.

Cheers

-k

Re: Handwritten text
by kickass on: Dec 6 2010

Hello Kayce:
The main difficulty in recognizing handwritten text has less to do with the "font" (calligraphy) than with whether the letters are joined or distinctly separate. Tesseract does a pretty fine job if the letters aren't linked. Try it out for yourself. If you find an engine that beats Tesseract at this, please let me know.
dave

Re: Re: Re: Re: Re: Download Link Broken
by kickass on: Dec 10 2010

I've uploaded TextRipper and just wanted to let you know. I hope it satisfies you as much as it satisfies most everybody.
d.

Best. Script. Ever.
by agentkiller4 on: Dec 10 2010

I can finally edit scanned papers and PDFs! Thank you, flawless even for PDF and TIFF.

Re: Best. Script. Ever.
by kickass on: Dec 10 2010

Glad to hear your raves.

Unfort I get an error
by polardude1983 on: Apr 23 2011

I am getting this error:

/home/christoph/Downloads/dog_petition10001.jpg (editable and indexable 001.txt does not exist

And I believe I installed everything correctly.

I have Zenity, Tesseract-ocr, Tesseract-ocr-eng, imagemagick, libtiff4-dev and ghostscript.

Any help would be appreciated. I have tried it on different images in different formats (JPG, PNG, PDF), with the same error for all.

Re: Unfort I get an error
by kickass on: May 3 2011

Hey polardude,
try this (as superuser):
# echo /usr/local/share /usr/share | xargs -n 1 cp -R wherever_you_installed/tesseract-2.04/tessdata

Explanation: Tesseract may look for the tessdata directory under the share directories of your filesystem (/usr/local/share or /usr/share),
so you need to make your language databases available from there.

Let me know if this was it.
d.
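The `xargs -n 1` one-liner above runs one `cp -R <source>/tessdata <target>` per target directory. An equivalent loop, written as a hypothetical `copy_tessdata` helper (not part of TextRipper), makes that explicit and stops at the first failed copy:

```shell
#!/bin/bash
# Loop form of:
#   echo /usr/local/share /usr/share | xargs -n 1 cp -R SRC/tessdata
# Copies the tessdata directory into each target, yielding TARGET/tessdata.
copy_tessdata() {
    local src="$1" target
    shift
    for target in "$@"; do
        cp -R "$src" "$target/" || return 1
    done
}
```

In a root shell, `copy_tessdata wherever_you_installed/tesseract-2.04/tessdata /usr/local/share /usr/share` would then place the language databases in both `/usr/local/share/tessdata` and `/usr/share/tessdata`.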

Re: Re: Unfort I get an error
by I4C on: Feb 20 2013

I installed all the dependencies and the optional one as well, but how do I install the script?

Re: Unfort I get an error
by kickass on: Mar 1 2013

Just run the script from the command line once you've installed the dependencies.
P.S. to all users: there's a great new one out there called YAGF. Check it out.


Copyright 2003-2014 GNOME-Look.org Team  