Google Summer of Code 2010: Sahana OCR

Executive Summary

  • Abstract : Collecting and entering data by hand is one of the most painful tasks during a large disaster. Sahana OCR is therefore recognised as a valuable tool for solving such problems. For an OCR module, reliability and consistency are the major areas to address. By focusing on and improving these two characteristics, the Sahana OCR module can be used to full effect whenever and wherever a disaster occurs.
  • Current Status : In past disaster situations, data has been collected on forms distributed to the victims, which is the most successful method of data collection in a disaster situation. The Sahana OCR module then scans these forms using ScannerManager and produces the form images. The form images are then processed using FormProcessor and ImageProcessor to extract the data fields, and then the letter boxes within each data field. Currently, character recognition is done by a neural network developed with the FANN library, but the recognition accuracy is very poor because the neural network has not been trained sufficiently.
  • Student : Thilanka Kaushalya.
  • Mentor(s) : Gihan Chamara, Jo Fonseka, Chamindra de Silva, and Hayesha Somorathne.
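
The letter-box extraction step described under Current Status can be sketched as follows. This is only a minimal illustration, not the actual FormProcessor or ImageProcessor code: it assumes a data field is available as rows of grayscale pixel values and that its letter boxes are equally wide. The name `split_letter_boxes` and the `Image` type are hypothetical.

```python
from typing import List

# Hypothetical representation: an image is a list of pixel rows.
Image = List[List[int]]


def split_letter_boxes(field: Image, n_boxes: int) -> List[Image]:
    """Cut a data-field image into n_boxes equally wide letter boxes."""
    if n_boxes <= 0:
        raise ValueError("n_boxes must be positive")
    width = len(field[0]) if field else 0
    if width % n_boxes != 0:
        raise ValueError("field width must be a multiple of n_boxes")
    box_width = width // n_boxes
    # For each box index, take the same column slice out of every row.
    return [
        [row[i * box_width:(i + 1) * box_width] for row in field]
        for i in range(n_boxes)
    ]


if __name__ == "__main__":
    field = [[0, 1, 2, 3],
             [4, 5, 6, 7]]
    for box in split_letter_boxes(field, 2):
        print(box)
```

Each returned box could then be passed to whichever recogniser is in use, one character at a time.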

Code

Progress

  • I have tested Tesseract with the existing system and managed to get good accuracy in recognising the data. (Sample Code)
  • I have followed the Tesseract training process to measure how well it can be trained for handwritten letters. (Testing Results)
  • The weekly meetings are scheduled on Saturdays at 1530 UTC. (Calendar)
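
One simple way to put a number on recognition accuracy is to compare the recognised text of a field against the text that was actually written, box by box. The sketch below is an assumption about how such a measurement could be made, not the method used for the test results on this page; `character_accuracy` is a hypothetical helper.

```python
def character_accuracy(recognised: str, expected: str) -> float:
    """Fraction of letter-box positions where the recognised character
    matches the expected one. Missing or extra characters count as errors."""
    if not expected:
        raise ValueError("expected text must not be empty")
    matches = sum(1 for r, e in zip(recognised, expected) if r == e)
    return matches / max(len(recognised), len(expected))


if __name__ == "__main__":
    print(character_accuracy("SAHAMA", "SAHANA"))
```

Because the forms use one letter per box, a position-by-position comparison is enough here; free-running text would need an edit-distance measure instead.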
Project plan and Timeline
  • My project plan is to organise the SahanaOCR module into a complete module that can handle the whole data entry process with high accuracy.

The basic project ideas are as follows.

  • Make the system platform independent.
  • Integrate the Tesseract code into the project.
  • Differentiate the forms and pages from each other and have the system identify them itself, so that sending the data to the corresponding modules can be automated.
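
The last idea, sending recognised data to the module that owns the form, can be pictured as a lookup keyed on a form identifier and page number. The sketch below is illustrative only: the form identifiers, handler functions, and the way an identifier is read from a page are all hypothetical and not taken from the Sahana code.

```python
from typing import Callable, Dict, Tuple

FormKey = Tuple[str, int]           # (form identifier, page number), hypothetical
Handler = Callable[[dict], str]     # receives extracted fields, returns a status


def build_router(handlers: Dict[FormKey, Handler]) -> Callable[[str, int, dict], str]:
    """Return a function that sends extracted fields to the matching handler."""
    def route(form_id: str, page: int, fields: dict) -> str:
        try:
            handler = handlers[(form_id, page)]
        except KeyError:
            raise ValueError(f"unknown form/page: {form_id!r}, {page}") from None
        return handler(fields)
    return route


# Hypothetical handlers standing in for the receiving modules.
def missing_person_page1(fields: dict) -> str:
    return f"missing person recorded: {fields['name']}"


def shelter_page1(fields: dict) -> str:
    return f"shelter recorded: {fields['location']}"


if __name__ == "__main__":
    route = build_router({
        ("MP", 1): missing_person_page1,   # "MP" and "SH" are made-up identifiers
        ("SH", 1): shelter_page1,
    })
    print(route("MP", 1, {"name": "A. Perera"}))
```

Keeping the mapping in one table means a new form or page only needs a new entry, and an unrecognised form fails loudly instead of being sent to the wrong module.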

The timeline allocated to the specific tasks is as follows.

