Google Summer of Code 2010: Sahana OCR

Executive Summary

  • Abstract : Collecting and entering data by hand is one of the most painful tasks during a large-scale disaster, and Sahana OCR is recognised as a valuable tool for easing it. For an OCR module, reliability and consistency are the major areas to address. By improving these two characteristics, the Sahana OCR module can be used effectively whenever and wherever a disaster occurs.
  • Current Status : In past disaster situations, data was collected on forms distributed to the victims, which is the most successful method of data collection in a disaster situation. The Sahana OCR module scans these forms using ScannerManager and produces form images. The form images are then processed with FormProcessor and ImageProcessor to extract the data fields, and then the letter boxes within each data field. Character recognition is currently done by a neural network built with the FANN library, but its recognition accuracy is very poor because the network has not been trained sufficiently.
  • Student : Thilanka Kaushalya.
  • Mentor(s) : Gihan Chamara, Jo Fonseka, Chamindra de Silva, and Hayesha Somorathne.
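
The step from a data field to its letter boxes, described under Current Status, can be illustrated with a small sketch. This is not the FormProcessor or ImageProcessor code: the function names, the binary pixel representation, and the assumption that a field is divided into equal-width boxes are all hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical representation: an image is a list of pixel rows,
# where 1 marks ink and 0 marks blank paper.
Image = List[List[int]]


def split_into_letter_boxes(field: Image, box_count: int) -> List[Image]:
    """Cut a data-field image into box_count equal-width letter boxes.

    Assumes the boxes are evenly spaced across the field, which may not
    hold for a real scanned form.
    """
    if box_count <= 0:
        raise ValueError("box_count must be positive")
    width = len(field[0]) if field else 0
    if width % box_count != 0:
        raise ValueError("field width must be a multiple of box_count")
    box_width = width // box_count
    return [
        [row[i * box_width:(i + 1) * box_width] for row in field]
        for i in range(box_count)
    ]


def is_blank(box: Image) -> bool:
    """Return True when a letter box contains no ink at all."""
    return not any(pixel for row in box for pixel in row)


if __name__ == "__main__":
    # A 2x6 field holding three 2x2 letter boxes; the middle one is empty.
    field = [
        [1, 0, 0, 0, 1, 1],
        [0, 1, 0, 0, 1, 0],
    ]
    boxes = split_into_letter_boxes(field, 3)
    print([is_blank(box) for box in boxes])
```

Skipping blank boxes before recognition would keep empty cells from being passed to the recogniser, though whether the existing module does this is not stated here.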

Code

Progress

  • I have tested Tesseract with the existing system and managed to get good accuracy in recognizing the data. Sample Code
  • I have followed the Tesseract training process to measure how well it can be trained for handwritten letters. Testing Results
  • The weekly meetings are scheduled on Saturdays at 1530 UTC. Calendar
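
One simple way to put a number on recognition accuracy for letter-boxed forms is to compare the recognised text with the expected text position by position, since each box holds one character. The sketch below is only an assumed metric for illustration; it is not the method used to produce the linked testing results, and the function name is hypothetical.

```python
def character_accuracy(expected: str, recognised: str) -> float:
    """Fraction of character positions where the recognised text matches.

    Positions are compared one to one because each letter box holds a
    single character. Missing or extra characters count as errors.
    """
    if not expected:
        raise ValueError("expected text must not be empty")
    matches = sum(1 for e, r in zip(expected, recognised) if e == r)
    return matches / max(len(expected), len(recognised))


if __name__ == "__main__":
    # One of the seven characters is misread ("O" read as "0").
    print(round(character_accuracy("COLOMBO", "C0LOMBO"), 3))
```

A per-character score like this makes it easier to compare two recognisers on the same set of scanned forms than a pass or fail result per field.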
Project plan and Timeline
