Tesseract for iOS

Introduction

Tesseract is an optical character recognition (OCR) engine for various operating systems. It is free software, released under the Apache License, and its development has been sponsored by Google since 2006 (as of this writing in 2020). It is one of the most popular and accurate open-source OCR libraries. It uses artificial intelligence (AI) to locate and recognize text in images. It supports macOS, Windows, and Linux, and it can also be compiled for iOS and Android.

The source code is hosted in this repository:

https://github.com/tesseract-ocr/tesseract

More than 100 languages are supported; each .traineddata file is the trained model for one language.

https://github.com/tesseract-ocr/tessdata 

Now I would like to describe how to integrate Tesseract into an iOS app.

Development Environment

  • macOS Catalina 10.15.2
  • Xcode 11.3, Swift 5
  • Tesseract 4.1.1
  • Leptonica 1.79.0
  • OpenCV 4.2.0

Download and include dependencies

First, create a new Xcode project using the Single View App template.

Download Tesseract 4.1.1 built for iOS

(This is a build of https://github.com/tesseract-ocr/tesseract compiled for iOS ONLY.)

https://github.com/kang298/Tesseract-builds-for-iOS/tree/tesseract-4.1.1

After downloading and unzipping, you will have two folders, “include” and “lib”. Drag and drop both into your Xcode project.

Download the OpenCV iOS framework

https://opencv.org/releases/ and then drag and drop it into the Xcode project.

Press Command + R to build and make sure there are no errors.

Download the trained model files for your languages

https://github.com/tesseract-ocr/tessdata

In this tutorial, we will test three languages: English, Japanese, and Vietnamese. Download the following model files and save them in a folder named tessdata:

  • eng.traineddata
  • jpn.traineddata
  • vie.traineddata

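Then drag and drop that folder into the Xcode project. NOTE: choose “Create folder references” instead of “Create groups” when adding the folder, so that the tessdata directory is copied into the app bundle as a real folder; the wrapper code later looks for the models at that path.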
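
Tesseract's language argument can name several models joined by `+` (for example `eng+vie`), and each code needs a matching .traineddata file in the tessdata folder. The sketch below, in plain C++, lists the files that a given language string expects. The helper name `requiredModelFiles` is hypothetical and not part of Tesseract's API; it only illustrates the file naming convention.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a Tesseract function): split a language string
// such as "eng+jpn" on '+' and return the model file each code expects.
std::vector<std::string> requiredModelFiles(const std::string& languages) {
    std::vector<std::string> files;
    std::stringstream stream(languages);
    std::string code;
    while (std::getline(stream, code, '+')) {
        if (!code.empty()) {  // skip empty pieces such as in "eng++jpn"
            files.push_back(code + ".traineddata");
        }
    }
    return files;
}
```

For the three languages used here, `requiredModelFiles("eng+jpn+vie")` yields the three file names listed above, which is a quick way to check that nothing is missing from the folder before shipping the app.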

 

 

Coding

Because Tesseract is written in C++, the code that calls it must also be C++. Create a C++ file named tesseract_wrapper.cpp in the project.

Remember to check “Also create a header file” so that Xcode creates a header file (tesseract_wrapper.hpp) for your C++ file.

tesseract_wrapper.hpp

//
//  tesseract_wrapper.hpp
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

#ifndef tesseract_wrapper_hpp
#define tesseract_wrapper_hpp

#include "opencv2/imgproc.hpp"
#include "stdio.h"

using namespace cv;
String ocrUsingTesseractCPP(String image_path,String data_path,String language);

#endif /* tesseract_wrapper_hpp */

tesseract_wrapper.cpp

//
//  tesseract_wrapper.cpp
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

#include "allheaders.h"
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"
#include "baseapi.h"
#include "tesseract_wrapper.hpp"

using namespace cv;
using namespace tesseract;

/*
 matToPix():
    convert from OpenCV Image Container to Leptonica's Pix Struct
 Params:
    mat: OpenCV Mat image Container
 Output
    Leptonica's Pix Struct
 */
Pix* matToPix(Mat *mat){
    int image_depth = 8;
    //create a Leptonica's Pix Struct with width, height of OpenCV Image Container
    Pix *pixd = pixCreate(mat->size().width, mat->size().height, image_depth);
    for(int y=0; y<mat->rows; y++) {
        for(int x=0; x<mat->cols; x++) {
            pixSetPixel(pixd, x, y, (l_uint32) mat->at<uchar>(y,x));
        }
    }
    return pixd;
}

/*
 ocrUsingTesseractCPP():
    Using Tesseract engine to read text from image
 Params:
    image_path: path to image
    data_path: path to folder containing .traineddata files
    language: expected language to detect (eng,jpn,..)
 Output:
    String detected from image
 */
String ocrUsingTesseractCPP(String image_path,String data_path,String language){
    //load a Mat Image Container from image's path in grayscale mode
    Mat image = imread(image_path,IMREAD_GRAYSCALE);
    //imread returns an empty Mat if the file cannot be read
    if(image.empty()){
        return String();
    }
    TessBaseAPI* tessEngine = new TessBaseAPI();
    //Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns, in this tutorial we just focus on LSTM only
    OcrEngineMode mode = tesseract::OEM_LSTM_ONLY;
    
    //init Tesseract engine; Init() returns 0 on success
    if(tessEngine->Init(data_path.c_str(), language.c_str(), mode) != 0){
        delete tessEngine;
        return String();
    }
    
    //Set the page layout analysis mode; see the link below for all supported modes
    //https://tesseract.patagames.com/help/html/T_Patagames_Ocr_Enums_PageSegMode.htm
    PageSegMode pageSegMode = tesseract::PSM_SINGLE_BLOCK;
    tessEngine->SetPageSegMode(pageSegMode);
    
    //increase accuracy for japanese
    if(language.compare("jpn") == 0){
        tessEngine->SetVariable("chop_enable", "true");
        tessEngine->SetVariable("use_new_state_cost", "false");
        tessEngine->SetVariable("segment_segcost_rating", "false");
        tessEngine->SetVariable("enable_new_segsearch", "0");
        tessEngine->SetVariable("language_model_ngram_on", "0");
        tessEngine->SetVariable("textord_force_make_prop_words", "false");
        tessEngine->SetVariable("edges_max_children_per_outline", "40");
    }
    
    
    //convert from OpenCV Image Container to Leptonica's Pix Struct
    Pix *pixImage = matToPix(&image);
    //set Leptonica's Pix Struct to Tesseract engine
    tessEngine->SetImage(pixImage);
    
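    //get recognized text in UTF8 encoding
    char *text = tessEngine->GetUTF8Text();
    //copy the text before freeing the buffer (it may be NULL on failure)
    String result = text ? String(text) : String();
    
    //release everything: the text buffer must be freed with delete[]
    delete[] text;
    tessEngine->End();
    delete tessEngine;
    pixDestroy(&pixImage);
    
    return result;
}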
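
A note on memory ownership: `GetUTF8Text()` returns a buffer that the caller must free with `delete[]`. One way to make that automatic on every exit path is `std::unique_ptr<char[]>`. The sketch below uses a stand-in function, `fakeGetUTF8Text` (hypothetical, standing in for the real Tesseract call), so that it compiles without the library; `takeText` is likewise an illustrative name.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Stand-in for TessBaseAPI::GetUTF8Text(): returns a new[]-allocated,
// NUL-terminated buffer owned by the caller. Hypothetical, for illustration.
char* fakeGetUTF8Text() {
    const char* recognized = "hello\n";
    char* buffer = new char[std::strlen(recognized) + 1];
    std::strcpy(buffer, recognized);
    return buffer;
}

// Copy the text into a std::string. The unique_ptr<char[]> calls delete[]
// on the buffer when it goes out of scope, including on the early return.
std::string takeText() {
    std::unique_ptr<char[]> owned(fakeGetUTF8Text());
    if (!owned) {
        return std::string();
    }
    return std::string(owned.get());
}
```

The same pattern could wrap the engine pointer itself, so that an early return after a failed step does not leak it.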

Because Swift cannot call C++ functions directly, we will add an Objective-C++ wrapper to bridge the two.

  • TesseractWrapper.h
  • TesseractWrapper.mm (not .m, because this file must be compiled as Objective-C++)

TesseractWrapper.h

//
//  TesseractWrapper.h
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

#import <Foundation/Foundation.h>
#import <UIKit/UIKit.h>
@interface TesseractWrapper : NSObject
+(NSString*)ocrUsingTesseractObjectiveC:(UIImage*)image language:(NSString*)language;
@end

TesseractWrapper.mm

//
//  TesseractWrapper.mm
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

#import "TesseractWrapper.h"

#include "tesseract_wrapper.hpp"
@implementation TesseractWrapper

/*
ocrUsingTesseractObjectiveC()
    call ocrUsingTesseractCPP() to recognize  text from image
 params:
    image: image to recognize text
    language: eng/jpn/vie
 output:
    recognized string
 */
+(NSString*)ocrUsingTesseractObjectiveC:(UIImage*)image language:(NSString*)language{
    //get path of folder containing .traineddata files
    NSString* data_path = [NSString stringWithFormat:@"%@/tessdata/",[[NSBundle mainBundle] bundlePath]];
    //save image to app's cache directory
    NSString* cache_dir = [NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) lastObject];
    NSString* image_path = [NSString stringWithFormat:@"%@/image.jpeg",cache_dir];
    NSData* data = UIImageJPEGRepresentation(image, 0.5);
    NSURL* url = [NSURL fileURLWithPath:image_path];
    [data writeToURL:url atomically:YES];
    
    //get text from image using ocrUsingTesseractCPP() from file tesseract_wrapper.hpp
    String str = ocrUsingTesseractCPP([image_path UTF8String], [data_path UTF8String], [language UTF8String]);
    NSString* result_string = [NSString stringWithCString:str.c_str()
    encoding:NSUTF8StringEncoding];
    //remove cached image
    [[NSFileManager defaultManager] removeItemAtURL:url error:nil];
    return result_string;
}
@end

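Create a simple screen containing only a text view and a button, and connect them to ViewController.swift. For Swift to see the wrapper, import TesseractWrapper.h in the project's bridging header. The code below also uses the third-party CropViewController library, which must be added to the project.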

 

ViewController.swift

//
//  ViewController.swift
//  TestTesseract
//
//  Created by Briswell on 1/13/20.
//  Copyright © 2020 Briswell. All rights reserved.
//

import UIKit
import CropViewController

class ViewController: UIViewController {

    @IBOutlet weak var txt: UITextView!
    
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
    }

    @IBAction func ocr(_ sender: Any) {
        //if camera not supported
        if !UIImagePickerController.isSourceTypeAvailable(.camera){
            return
        }
        //present camera to take image
        let pickerController = UIImagePickerController()
        pickerController.delegate = self as UIImagePickerControllerDelegate & UINavigationControllerDelegate
        pickerController.sourceType = .camera
        self.present(pickerController, animated: true, completion: nil)
    }
    
}

extension ViewController: UIImagePickerControllerDelegate,UINavigationControllerDelegate{
    func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
        picker.dismiss(animated: true) {
            guard let image = info[.originalImage] as? UIImage else { return  }
            //present a crop image frame to focus on text content
            let cropViewController = CropViewController.init(image: image)
            cropViewController.delegate = self
            self.present(cropViewController, animated: true, completion: nil)
        }
    }

    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        picker.dismiss(animated: true, completion: nil)
    }
}

extension ViewController: CropViewControllerDelegate {
    func cropViewController(_ cropViewController: CropViewController, didCropToImage image: UIImage, withRect cropRect: CGRect, angle: Int) {
        cropViewController.dismiss(animated: true) {
            //call objective-c wrapper with expected language
            //(Swift imports ocrUsingTesseractObjectiveC:language: under this name)
            let str = TesseractWrapper.ocr(usingTesseractObjectiveC: image, language: "jpn")
            self.txt.text = str
        }
    }

    func cropViewController(_ cropViewController: CropViewController, didFinishCancelled cancelled: Bool) {
        cropViewController.dismiss(animated: true, completion: nil)
    }
}

Here is the test result for Japanese. You can test English and Vietnamese in the same way.

 

 

Conclusion

Recognizing text in images is achievable, but there are some difficulties. The main problem is image quality (size, lighting, contrast). Each image has different problems, so adding a filter tool that lets the user adjust the image manually is also an option. Refer to the link below for ways to improve image quality:

https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
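
As a small illustration of the contrast point, the following sketch applies min-max contrast stretching to an 8-bit grayscale buffer in plain C++. The function name `stretchContrast` and the flat `std::vector` layout are assumptions made for this example. In the project above the same idea would be applied to the grayscale OpenCV `Mat` before recognition, and it is only one of several possible preprocessing steps.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: linearly rescale 8-bit grayscale pixel values so the
// darkest pixel becomes 0 and the brightest becomes 255.
std::vector<std::uint8_t> stretchContrast(const std::vector<std::uint8_t>& pixels) {
    if (pixels.empty()) {
        return {};
    }
    const auto range = std::minmax_element(pixels.begin(), pixels.end());
    const int low = *range.first;
    const int high = *range.second;
    if (low == high) {
        return pixels;  // flat image: nothing to stretch
    }
    std::vector<std::uint8_t> result;
    result.reserve(pixels.size());
    for (std::uint8_t value : pixels) {
        // Integer arithmetic; the result always falls in 0..255.
        result.push_back(static_cast<std::uint8_t>((value - low) * 255 / (high - low)));
    }
    return result;
}
```

A low-contrast photo whose pixel values all sit between, say, 100 and 200 would be spread over the full 0 to 255 range, which can make faint text easier to separate from the background.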