Breedlove: OCR - Tess4J doOCR() for *First Page* of PDF / TIF

Tuesday, 15 March 2011


Is there a way to tell Tess4J to OCR only a certain number of pages / characters?

I will potentially be working with 200+ page PDFs, and I want to OCR only the first page, if that!

As far as I understand, the standard example

```java
package net.sourceforge.tess4j.example;

import java.io.File;
import net.sourceforge.tess4j.*;

public class TesseractExample {

    public static void main(String[] args) {
        File imageFile = new File("eurotext.tif");
        Tesseract instance = Tesseract.getInstance();  // JNA Interface Mapping
        // Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping

        try {
            // OCRs every page of the image file and returns all text as one string
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

would attempt to OCR the entire 200+ page document into a single string.

For my particular case, that is way more than I need, and I'm worried it will take a very long time if I let it process all 200+ pages and then take a substring of the first 500 characters or so.
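One way to avoid OCR-ing all 200+ pages and then taking a substring is to OCR page by page and stop as soon as a character budget is reached. The sketch below is engine-agnostic and assumes Java 8 or later: `BudgetedOcr` and `ocrUpTo` are hypothetical names, pages are any (ideally lazily produced) `Iterable`, and the OCR step is a plain `Function`, so it runs without Tess4J.

```java
import java.util.function.Function;

public final class BudgetedOcr {

    private BudgetedOcr() {}

    /**
     * Hypothetical helper: OCRs pages in order and stops once at least
     * maxChars characters have been collected, so the remaining pages
     * are never passed to the OCR function.
     */
    public static <P> String ocrUpTo(Iterable<P> pages, Function<P, String> ocr, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        StringBuilder text = new StringBuilder();
        for (P page : pages) {
            if (text.length() >= maxChars) {
                break; // budget reached: skip all remaining pages
            }
            text.append(ocr.apply(page));
        }
        // Trim to exactly maxChars, like taking a substring of the first N characters.
        return text.length() > maxChars ? text.substring(0, maxChars) : text.toString();
    }
}
```

With Tess4J, the function could wrap `instance.doOCR(image)` for each page image, rethrowing the checked `TesseractException` as an unchecked exception. With a budget of about 500 characters, a text-heavy first page would typically mean only one page is ever OCR'd.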

The library has a PdfUtilities class that can extract pages from a PDF, so you can split off the first page and run doOCR() on that alone.
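The answer above concerns PDFs. For the TIFF half of the question, a JDK-only option is to decode just the first frame with ImageIO and pass the resulting `BufferedImage` to Tess4J's `doOCR(BufferedImage)` overload instead of `doOCR(File)`. This is a sketch assuming JDK 9 or later (which bundles a TIFF ImageIO plugin); `FirstPageReader` is a hypothetical helper name. For PDFs, newer Tess4J releases expose `PdfUtilities.splitPdf(File, File, int, int)` in `net.sourceforge.tess4j.util`, which could write page 1 to a separate file before OCR; the package and signatures have changed across versions, so check the Javadoc of the release you use.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public final class FirstPageReader {

    private FirstPageReader() {}

    /**
     * Hypothetical helper: decodes only frame 0 of a possibly multi-page
     * image (e.g. a TIFF). Later pages are never decoded.
     */
    public static BufferedImage readFirstPage(File file) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(file)) {
            if (in == null) {
                throw new IOException("Cannot open " + file);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader for " + file);
            }
            ImageReader reader = readers.next();
            try {
                // seekForwardOnly = true, ignoreMetadata = true: only frame 0 is needed
                reader.setInput(in, true, true);
                return reader.read(0);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Usage with the sample above might then look like `String result = instance.doOCR(FirstPageReader.readFirstPage(imageFile));`.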

