Published on Nov 30, 2023
This project automates the identification of the script in which a document is written, using online handwriting data. There have been many attempts at handwritten script identification in offline documents.
The proposed system uses features of connected components to classify six different scripts (Arabic, Han, Cyrillic, Devnagari, Hebrew, and Roman), with a reported classification accuracy of 88 percent on document pages. Online documents have a few important properties that allow them to be processed in a fundamentally different way from offline documents.
The most important characteristic of online documents is that they capture the temporal sequence of strokes as the document is written. This allows us to analyze individual strokes and to use the additional temporal information for both script identification and text recognition.
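To make "capturing the temporal sequence of strokes" concrete, here is a minimal sketch of one possible representation. It assumes each stroke is stored as an ordered list of (x, y, timestamp) samples recorded between pen-down and pen-up, with timestamps in milliseconds. The names `OnlineStrokes`, `Point`, `Stroke`, `duration`, and `averageSpeed` are hypothetical and not taken from the project. The code also assumes a modern JDK (16 or later, for records) rather than the JDK 1.5 listed in the requirements.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of online handwriting: strokes made of timed pen samples. */
public class OnlineStrokes {

    /** One sampled pen position and its capture time (assumed to be milliseconds). */
    record Point(double x, double y, long t) {}

    /** A stroke is the ordered list of points between pen-down and pen-up. */
    record Stroke(List<Point> points) {
        Stroke {
            if (points.isEmpty()) throw new IllegalArgumentException("a stroke needs at least one point");
            points = List.copyOf(points); // defensive, immutable copy
        }

        Point first() { return points.get(0); }

        Point last() { return points.get(points.size() - 1); }

        /** Time from pen-down to pen-up. */
        long duration() { return last().t() - first().t(); }

        /** Total path length traced by the pen. */
        double length() {
            double sum = 0;
            for (int i = 1; i < points.size(); i++) {
                Point a = points.get(i - 1);
                Point b = points.get(i);
                sum += Math.hypot(b.x() - a.x(), b.y() - a.y());
            }
            return sum;
        }

        /** Average pen speed (length units per millisecond); 0 for zero-duration strokes. */
        double averageSpeed() {
            long d = duration();
            return d == 0 ? 0 : length() / d;
        }
    }

    public static void main(String[] args) {
        List<Stroke> strokes = new ArrayList<>();
        strokes.add(new Stroke(List.of(new Point(0, 0, 0), new Point(3, 4, 10), new Point(6, 8, 20))));
        strokes.add(new Stroke(List.of(new Point(10, 0, 50), new Point(10, 5, 60))));

        for (Stroke s : strokes) {
            System.out.printf("duration=%d ms, length=%.1f, speed=%.2f%n",
                    s.duration(), s.length(), s.averageSpeed());
        }

        // Time the pen spent lifted between the two strokes.
        long gap = strokes.get(1).first().t() - strokes.get(0).last().t();
        System.out.println("pen-up gap=" + gap + " ms"); // prints "pen-up gap=30 ms"
    }
}
```

Stroke duration, writing speed, and the pen-up gap between strokes are not present in a scanned image. This is the extra information that online data makes available for script identification.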
The system first collects handwriting samples in the six scripts listed above. Samples written in each script are stored in a file. Using this collection, the script of an individual word can be identified from the characteristic properties of each script. The classifier uses the k-Nearest Neighbor (k-NN) method. The procedure consists of the following stages:
• Data Collection and Preprocessing
• Line and Word Detection
• Feature Extraction
• Recognition
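The recognition stage relies on a k-Nearest Neighbor classifier. The sketch below shows the general technique under a few assumptions: each word has already been reduced to a fixed-length numeric feature vector, distance is Euclidean, and the predicted script is the majority label among the k closest training samples. The class `ScriptKnn`, the two-dimensional feature values in `main`, and the choice k = 3 are hypothetical placeholders, not the project's actual features or parameters. It also assumes JDK 16 or later (for records).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal k-Nearest Neighbor classifier for word-level script identification (illustrative sketch). */
public class ScriptKnn {

    /** A labeled training sample: a feature vector and the script it was written in. */
    record Sample(double[] features, String script) {}

    private final List<Sample> training = new ArrayList<>();
    private final int k;

    public ScriptKnn(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        this.k = k;
    }

    public void add(double[] features, String script) {
        training.add(new Sample(features, script));
    }

    /** Squared Euclidean distance; it ranks neighbors in the same order as Euclidean distance. */
    private static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the majority script among the k training samples closest to the query. */
    public String classify(double[] query) {
        List<Sample> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(s -> distance(s.features(), query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Sample s : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(s.script(), 1, Integer::sum);
            // On a tie, the label that reached the winning count first (via nearer neighbors) is kept.
            if (v > bestVotes) {
                bestVotes = v;
                best = s.script();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical 2-D feature vectors; real features would come from the feature extraction stage.
        ScriptKnn knn = new ScriptKnn(3);
        knn.add(new double[] {0.9, 0.1}, "Devnagari");
        knn.add(new double[] {0.8, 0.2}, "Devnagari");
        knn.add(new double[] {0.2, 0.7}, "Roman");
        knn.add(new double[] {0.3, 0.8}, "Roman");
        knn.add(new double[] {0.1, 0.9}, "Roman");
        System.out.println(knn.classify(new double[] {0.85, 0.15})); // prints "Devnagari"
    }
}
```

Sorting the whole training set costs O(n log n) per query. That is acceptable for a small sample collection, but a larger one would usually call for a bounded priority queue or a spatial index.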
Hardware requirements:
• Hard disk: 40 GB
• RAM: 512 MB
• Processor speed: 3.00 GHz
• Processor: Pentium IV

Software requirements:
• JDK 1.5 or later
• Java Swing (front end)