Abstract
Optical character recognition is carried out using techniques
borrowed from statistical machine translation. In particular, the
use of multiple simple feature functions in linear combination,
along with minimum-error-rate training, integrated decoding, and
$N$-gram language modeling, is found to be remarkably effective
across several scripts and languages. Results are presented using
both synthetic and real data in five languages.