Update: The paperless office
#4
Posted 28 November 2008 - 11:56 AM
Just a quick note that DEVONthink Pro Office comes with built-in OCR as well as Image-Capture-based scanning, TWAIN support, and pro-grade document management. Whoever is serious about going paperless may want to have a look: http://www.devon-tec...cts/devonthink/
Disclaimer: I am the president of DEVONtechnologies, the maker of DEVONthink.
#5
Posted 29 November 2008 - 03:45 AM
PowerBook190 said:
Thanks for the update, Joe. Would PDFpen work better for scripting? OCR was recently added to its feature list.
I haven't tried PDFpen's new OCR capability yet, so I can't say, but it's certainly worth a try if you have it, and have the necessary AppleScripting skills.
Joe
#7
Posted 29 November 2008 - 03:51 AM
eboehnisch said:
Just a quick note that DEVONthink Pro Office comes with built-in OCR as well as Image-Capture-based scanning, TWAIN support, and pro-grade document management. Whoever is serious about going paperless may want to have a look: http://www.devon-tec...cts/devonthink/
The original article that this update goes with does mention DEVONthink Pro Office, which is what I use personally for OCR on scanned documents. It's great because it doesn't require any scripting, but it's more expensive if you already have a copy of Acrobat Standard/Pro or ReadIris that was bundled with your scanner.
Joe
#9
Posted 01 December 2008 - 07:09 AM
FYI, you don't need Acrobat to get the text from a PDF document. Here's a free solution that comes with Mac OS X.
Automator has a built-in set of PDF actions, including one for extracting text from a PDF document, in addition to actions for rendering watermarks, inserting metadata, merging documents, extracting pages, and even rendering PDF pages as images.
To make a text extraction solution, create a new workflow with the Extract PDF Text action and then save the workflow as either a self-running application that will process dragged-on PDF files, or as a Folder Actions plug-in to run automatically when PDF files are placed in an attached folder.
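For readers who would rather see the "run something whenever a PDF lands in a folder" idea outside Automator, here is a rough Python sketch. It is not what Automator or Folder Actions actually run; the function names (`find_new_pdfs`, `process_folder`) and the `handler` callback are made up for illustration, and the text extraction itself is left to whatever tool you plug in as the handler.

```python
from pathlib import Path
from typing import Callable, List, Set


def find_new_pdfs(folder: Path, seen: Set[Path]) -> List[Path]:
    """Return PDFs in `folder` that are not yet in `seen`, and record them in `seen`."""
    new_files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and p not in seen
    )
    seen.update(new_files)
    return new_files


def process_folder(folder: Path, seen: Set[Path],
                   handler: Callable[[Path], None]) -> int:
    """Call `handler` once for each newly seen PDF; return how many were handled."""
    new_files = find_new_pdfs(folder, seen)
    for pdf in new_files:
        handler(pdf)  # hypothetical hook: text extraction, OCR, filing, etc.
    return len(new_files)
```

Calling `process_folder` repeatedly (from a timer, say) with the same `seen` set handles each PDF only once, which is roughly the behavior a Folder Action gives you for free.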
For more information about Automator, visit:
#10
Posted 01 December 2008 - 07:27 AM
Nyhthawk said:
FYI, you don't need Acrobat to get the text from a PDF document. Here's a free solution that comes with Mac OS X.
We're not talking here about extracting text from a PDF; we're talking about doing OCR to put text into the PDF. Nothing built into Mac OS X can do that, I'm afraid.
Joe
#11
Posted 10 December 2008 - 01:49 PM
Joe,
Thanks for writing these articles; this is going to make my life a lot easier. The Fujitsu scanner you recommended is amazing.
I followed your directions but I am running into a problem. I think the script is running too fast on my computer. Acrobat attempts to open the pdf before the scanner has finished saving the file and then I get an error message and the script stops. How can I fix this problem?
Your help would be appreciated.
Thanks,
Jehad
#12
Posted 10 December 2008 - 02:11 PM
syclone171 said:
I followed your directions but I am running into a problem. I think the script is running too fast on my computer. Acrobat attempts to open the pdf before the scanner has finished saving the file and then I get an error message and the script stops. How can I fix this problem?
Erm...that's a good question, and I don't have an answer. It sounds like what's happening is that Folder Actions (part of Mac OS X) is deciding that a new item has appeared in the folder and launching the appropriate script before the item in question has finished being saved - which sounds like an Apple bug to me. Presumably, the file should remain open until it's finished saving, and the Folder Action shouldn't trigger until the file closes. In any case, it's not something you can control by modifying the AppleScript, I don't think, because by the time the AppleScript kicks in it's already too late (or, rather, too early!).
Your best bet, therefore, may be to skip the scripts altogether and go with either ABBYY FineReader (which comes with newer ScanSnap models) or DEVONthink Pro Office. Both of those do OCR, and neither requires the help of a script.
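For anyone who wants to keep experimenting with scripts anyway, a common general workaround for this kind of race is to wait until the file's size stops changing before touching it. The Python sketch below shows only that idea. It is an assumption that it would help here: it has not been tested with Folder Actions, the ScanSnap software, or Acrobat, and the function name `wait_until_stable` and its parameters are made up for illustration.

```python
import os
import time


def wait_until_stable(path, checks=3, interval=1.0, timeout=60.0):
    """Return True once the file is non-empty and its size has stayed the
    same for `checks` consecutive polls; return False if `timeout` passes."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file does not exist yet, or is not readable
        if size > 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # size changed (or file missing): start counting again
        last_size = size
        time.sleep(interval)
    return False
```

A script would call `wait_until_stable` on the new PDF and hand the file to the OCR step only if it returns True. A size that holds steady for a few seconds is a heuristic, not proof that the scanner software has closed the file.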
#14
Posted 11 December 2008 - 08:31 PM
Everyone - this might seem like a stupid question - but why OCR each scan? Do you need to edit the text? I have been moving to a paperless office myself and have not been OCR'ing unless there is a real need to get access to the content itself. Please provide some insights.
Thanks,
Matthew


