Tuesday 15 March 2011

ocr - Tess4J doOCR() for *First Page* of pdf / tif -




Is there a way to tell Tess4J to OCR only a certain number of pages / characters?

I will potentially be working with 200+ page PDFs, but I only want to OCR the first page, if that!

As far as I understand, the common sample

    package net.sourceforge.tess4j.example;

    import java.io.File;
    import net.sourceforge.tess4j.*;

    public class TesseractExample {
        public static void main(String[] args) {
            File imageFile = new File("eurotext.tif");
            Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
            // Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping

            try {
                // OCRs every page of the file into one String
                String result = instance.doOCR(imageFile);
                System.out.println(result);
            } catch (TesseractException e) {
                System.err.println(e.getMessage());
            }
        }
    }

would attempt to OCR the entire 200+ page document into a single String.

For this particular case, that is way more than I need it to do, and I'm worried it will take a very long time if I allow it to process all 200+ pages and then just substring the first 500 characters or so.

The library has a PdfUtilities class that can extract certain pages of a PDF.
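For the TIFF side of the question, a rough sketch of one way to avoid touching the other pages is to decode only page index 0 with the JDK's own ImageIO and hand that single image to the OCR engine. The class name FirstPageReader and the method readFirstPage are made up for this illustration, and it assumes Java 9 or later, where ImageIO ships with a TIFF reader; it is not taken from the Tess4J samples.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper (not part of Tess4J): decodes only the first page
// of a possibly multi-page image file such as a TIFF.
public class FirstPageReader {

    public static BufferedImage readFirstPage(File imageFile) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(imageFile)) {
            if (in == null) {
                throw new IOException("Cannot open " + imageFile);
            }
            // Pick a reader that recognises the file format from its header.
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader found for " + imageFile);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // Index 0 is the first page; later pages are never decoded.
                return reader.read(0);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

The returned BufferedImage can then be passed to the doOCR(BufferedImage) overload instead of the File one, in Tess4J versions that provide it. For PDFs, the PdfUtilities route mentioned above is still needed to cut the document down to its first page before OCR.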

pdf ocr tess4j
