June 26, 2017

Convert a PDF or Scanned File into Editable Text, Using Google Drive

Apply OCR to Previously Saved or Scanned Documents;
Easily Convert PDF Files into Editable Formats

 




The articles in this blog are inspired by real-life issues. This weekend, I received a call from a less-skilled user. All of his electronic files had "disappeared" from his PC, and he was busy trying to recreate them from hard copies. He had already scanned a number of them before he discovered the scanned files were not editable. Rather, each file had basically been scanned into one giant graphic. Was there anything he could do?

We discussed that, in the future, he needed to look for a "Scan to OCR" option in his scanning software. Most scanners include a Scan to OCR plug-in. (We also had a long conversation about disappearing files, and what that might mean, as well as setting up cloud backup through Backblaze to prevent these issues in the future.) But believe it or not, all was NOT lost. It is NOT too late to convert those scans. There is an easy way to apply OCR to saved files, using Google Drive and Google Docs.

What is OCR?



OCR stands for Optical Character Recognition. OCR takes electronic information stored in an image-based format (like GIF, JPG, PNG, or a scanned PDF) and converts it into characters your programs can understand. When you copy or scan something, the device is basically taking a picture of whatever you lay on its bed. Electronically, it saves a series of dots, rather than a series of letters and numbers. If you wish to edit a scanned document, you MUST first convert it from a bunch of dots back into characters that software can recognize. That conversion is what OCR does.
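To make the "dots versus characters" idea concrete, here is a toy Python sketch. It is only an illustration: real OCR software (including whatever Google uses) is far more sophisticated, and the tiny 3x3 letter patterns and the `recognize` function are made up for this example. The scanned letter is stored as a grid of dots, and the code picks the known letter whose dots match most closely.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each letter is a 3x3 grid of dots (1 = ink, 0 = blank paper).
TEMPLATES = {
    "T": ["111",
          "010",
          "010"],
    "L": ["100",
          "100",
          "111"],
    "I": ["010",
          "010",
          "010"],
}


def recognize(glyph):
    """Return the template letter whose dots differ least from the glyph."""
    def mismatches(a, b):
        # Count the positions where the two grids disagree.
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

    return min(TEMPLATES, key=lambda letter: mismatches(TEMPLATES[letter], glyph))


# A slightly smudged "T": one extra dot in the bottom-right corner.
scanned = ["111",
           "010",
           "011"]
print(recognize(scanned))  # prints T
```

Until a step like this runs, the computer only has the grid of dots. Afterward, it has the letter "T," which a word processor can edit.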

Most popular scanners come with OCR software.  If you are scanning a document, and you will want to edit it later, you should look for the “scan to OCR” option in your scanning software.  (It may also be called "Scan to Doc.")

If you have a document that has already been scanned to an image-based format, you will need to convert it using OCR BEFORE you can edit it. This is also true for files you have stored on drives and other media.

OCR Using Google Docs


Google Drive contains a free and easy way to convert image-based files to editable text. This is especially handy for PDF files, as well as files received as e-mail attachments. Google Docs has built-in OCR capabilities that are easy to miss. When you open an image or PDF file stored in Google Drive with Google Docs, the file is run through OCR as it opens. The steps are outlined below.


Convert PDF and photo files to text


You can convert image files to text with Google Drive. You MUST do this from a computer; OCR is not available on mobile platforms (Android, iPhone, iPad, etc.) at the time of publication.

Conversion Steps:

 

1. Prepare the file:
  • Format: You can convert JPEG, PNG, GIF, or PDF (multipage document) files.
  • File size: The file should be 2 MB or less. 
  • Resolution: Text should be at least 10 pixels high. 
  • Orientation: Documents must be right-side up. If your image is facing the wrong way, rotate it before uploading it to Google Drive. 
  • Languages: Google Drive will detect the language of the document. 
  • Font and character set: For best results, use common fonts such as Arial or Times New Roman. 
  • Image quality: Sharp images with even lighting and clear contrasts work best. You cannot OCR handwritten text.

2. Upload and Convert the File
  • On your computer, go to drive.google.com.
    This will bring up a list of documents you have previously created or stored in Google Drive.
  • If you need to upload your file, select “New,” then “File Upload.”
    Browse to the location where you stored the scanned files (usually the Documents folder, or a folder called “Scans” within the User folder) and select the file(s) to upload. You can also drag and drop the file onto the upload dialog box.
  • Once uploaded, you will see the files in your Google Drive list.
  • Right-click on the file you want to convert.
  • Click “Open with,” then “Google Docs.”
  • The image file will be converted to a Google Doc, and it will open in a Google Docs window.
  • You may edit the document within the Google Docs interface, or “save” it to your computer and open/edit it like you would any local file.
  • Google Docs can save files in formats compatible with all major office packages, including Microsoft Office (Word, Excel, PowerPoint), LibreOffice, WordPerfect, and OpenOffice.

3. Conversion Considerations
  • Some formatting might not transfer:
    Bold, italics, font size, font type, and line breaks are most likely to be retained.
    Lists, tables, columns, footnotes, and endnotes most likely will not be detected.
  • Clear, high contrast documents will OCR better; light or fuzzy documents will not OCR as well.
  • You cannot OCR handwritten text.
  • Some text may not “translate” correctly, so it is important to proofread the converted document.
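
If you have a lot of scans to get through, the first two items on the step 1 checklist (format and file size) can be checked before you upload anything. Here is a minimal Python sketch, assuming the limits listed above. The function name `check_for_ocr` and the two constants are made up for this illustration and are not part of any Google tool. It does not check resolution, orientation, or image quality.

```python
import os

# Limits taken from the step 1 checklist; adjust them if Google's limits change.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf"}
MAX_BYTES = 2 * 1024 * 1024  # 2 MB


def check_for_ocr(path):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Compare the extension case-insensitively, so "SCAN.PDF" is accepted.
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")

    # os.path.getsize raises an error if the file does not exist.
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 2 MB")

    return problems
```

For example, calling `check_for_ocr` on a small scanned PDF returns an empty list, while an oversized or unsupported file returns a short description of each problem, so you know what to fix before uploading.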

Another Way to Convert Files to Different Formats

Zamzar.com is a free, online document conversion service. You can upload all types of files to Zamzar, and they will convert them for you. With the free service, you upload a file, and when it has made its way through the queue, Zamzar sends you a link to download the converted file. Premium plans let you upload by e-mail and store converted files on their server. Zamzar is especially handy for converting formats other than PDF. For example, it is one of the few ways to convert old MS Publisher files into a Word-compatible format. If a file fails to convert using Google Docs, sending it to Zamzar for conversion is another option.



Try It Yourself!

We've uploaded a PDF version of these instructions to Google Drive. You can access it here: https://drive.google.com/file/d/0B6XRbo2aGTmjZWFOWURFTmt6b1k/view?usp=sharing. You can save a copy of the file to your own Google Drive (using the Hamburger Menu) and convert it back to editable text using Google Docs.

Comments:

Do you have any questions? Do you use Google Drive or Google Docs? Did you know about the OCR feature of Google Docs? Have you used, or will you use, this OCR conversion? Let us know in the comments, or hit us up on Facebook or Twitter. And as always, thanks for reading.
