ocr - Tess4J doOCR() for *First Page* of pdf / tif
Is there a way to tell Tess4J to OCR only a certain number of pages / characters?
I will potentially be working with 200+ page PDFs, but I only want to OCR the first page, if that!
As far as I understand, the common sample:
package net.sourceforge.tess4j.example;

import java.io.File;
import net.sourceforge.tess4j.*;

public class TesseractExample {

    public static void main(String[] args) {
        File imageFile = new File("eurotext.tif");
        Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
        // Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping

        try {
            // Runs OCR over the whole input file and returns all recognized text
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}
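would attempt to OCR the entire 200+ page document into a single String.
For my particular case, that is way more than I need it to do, and I'm worried it could take a very long time if I let it process all 200+ pages and then just substring the first 500 or so characters.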
The library has a PdfUtilities class that can extract specific pages of a PDF, so you can split off the first page and pass only that file to doOCR().
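A minimal sketch of that split-then-OCR idea follows. The PageSplitter and OcrEngine interfaces and the FirstPagesOcr class are hypothetical names introduced here so the flow can be shown without depending on a particular Tess4J version. The wiring in the trailing comment assumes PdfUtilities.splitPdf(File, File, int, int) and Tesseract.doOCR(File) are available in your release; older releases may expose different signatures, so check before relying on it. The file name big.pdf is also just a placeholder.

```java
import java.io.File;

/** Hypothetical seam: copy pages firstPage..lastPage of a PDF into a new file. */
interface PageSplitter {
    void split(File input, File output, int firstPage, int lastPage) throws Exception;
}

/** Hypothetical seam: run OCR on a file and return the recognized text. */
interface OcrEngine {
    String recognize(File file) throws Exception;
}

final class FirstPagesOcr {
    private FirstPagesOcr() {}

    /**
     * Splits pages 1..pageCount into a temporary PDF and runs OCR on that
     * file only, so the OCR engine never sees the rest of the document.
     */
    static String ocrFirstPages(File pdf, int pageCount,
                                PageSplitter splitter, OcrEngine engine) throws Exception {
        if (pageCount < 1) {
            throw new IllegalArgumentException("pageCount must be at least 1");
        }
        File firstPages = File.createTempFile("first-pages-", ".pdf");
        try {
            splitter.split(pdf, firstPages, 1, pageCount);
            return engine.recognize(firstPages);
        } finally {
            firstPages.delete(); // best-effort cleanup of the temporary file
        }
    }

    /** Caps the recognized text at maxChars characters. */
    static String truncate(String text, int maxChars) {
        return text.length() <= maxChars ? text : text.substring(0, maxChars);
    }
}

// Assumed Tess4J wiring (verify against the version you use):
//   Tesseract instance = Tesseract.getInstance();
//   String text = FirstPagesOcr.ocrFirstPages(
//       new File("big.pdf"), 1,
//       (in, out, first, last) -> PdfUtilities.splitPdf(in, out, first, last),
//       instance::doOCR);
//   System.out.println(FirstPagesOcr.truncate(text, 500));
```

Splitting before recognition means the cost scales with the pages you keep rather than with the full 200+ pages; the truncate step is only there if you still want to cap the output at roughly 500 characters.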
pdf ocr tess4j