Translation-Inspired OCR

Dmitriy Genzel

Ashok C. Popat

Nemanja Spasojevic

Michael Jahr

Andrew Senior

Eugene Ie

Frank Yung-Fong Tang

ICDAR-2011

Abstract

Optical character recognition is carried out using techniques
borrowed from statistical machine translation. In particular, the
use of multiple simple feature functions in linear combination,
along with minimum-error-rate training, integrated decoding, and
N-gram language modeling is found to be remarkably effective
across several scripts and languages. Results are presented using
both synthetic and real data in five languages.
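
The abstract names the ingredients without specifying them. The sketch below illustrates the general idea only and is not the authors' system. It scores each candidate transcription as a linear combination of feature functions, one of which is a toy character bigram language model, and keeps the highest-scoring candidate. The feature names `optical` and `lm`, the optical scores, the training strings, and every function name are hypothetical. Two simplifications are deliberate. First, the language model rescores a fixed candidate list instead of being integrated into the search. Second, weight tuning is a plain grid search over one weight that minimizes total character edit distance on a tiny development set. This grid search stands in for minimum-error-rate training, which in statistical machine translation performs a line search along each weight direction.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A candidate transcription paired with per-candidate feature values
# (hypothetical: e.g. an optical-model log-likelihood from a recognizer).
Hypothesis = Tuple[str, Dict[str, float]]


def train_char_bigram(corpus: List[str]) -> Callable[[str], float]:
    """Toy character bigram LM with add-one smoothing; returns log P(text)."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    vocab = set()
    for line in corpus:
        chars = ["<s>"] + list(line) + ["</s>"]
        vocab.update(chars)
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    v = len(vocab)

    def logprob(text: str) -> float:
        chars = ["<s>"] + list(text) + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
                   for a, b in zip(chars, chars[1:]))

    return logprob


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear combination of feature values: sum_i w_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())


def decode(hyps: List[Hypothesis], lm: Callable[[str], float],
           weights: Dict[str, float]) -> str:
    """Pick the candidate with the highest combined score (LM added as a feature)."""
    best_text, _ = max(hyps, key=lambda h: score({**h[1], "lm": lm(h[0])}, weights))
    return best_text


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as the character error count."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def tune_lm_weight(dev, lm, base_weights, grid):
    """Simplified error-driven tuning: grid search over the LM weight only."""
    best_w, best_err = None, math.inf
    for w in grid:
        weights = {**base_weights, "lm": w}
        err = sum(edit_distance(decode(hyps, lm, weights), ref) for hyps, ref in dev)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Tiny illustrative example (all data hypothetical).
lm = train_char_bigram(["the cat", "the hat", "that"])
hyps = [("tne cat", {"optical": -1.0}), ("the cat", {"optical": -1.5})]
dev = [(hyps, "the cat")]

print(decode(hyps, lm, {"optical": 1.0, "lm": 0.0}))  # optical model alone: tne cat
print(decode(hyps, lm, {"optical": 1.0, "lm": 1.0}))  # with the LM feature: the cat
print(tune_lm_weight(dev, lm, {"optical": 1.0}, [0.0, 0.5, 1.0]))  # (0.5, 0)
```

In this toy case the optical feature alone slightly prefers the misread "tne cat". Once the language-model feature receives nonzero weight, it outweighs that preference, and the tuning loop selects the smallest grid weight that yields zero character errors on the development example.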

Research Areas

Machine Translation