Sahana XForm

Introduction

Sahana is a disaster, crisis, and emergency management system that provides timely access to comprehensive, relevant, and reliable information that is critical to humanitarian operations. The faster the humanitarian community can collect, analyze, disseminate, and act on key information, the more effective the response will be, the better needs will be met, and the greater the benefit to the affected population.

However, although IT is invaluable for managing the analysis of data, data captured on forms still has to be re-entered manually, which adds a step and a potential bottleneck to the processing of information. Handwritten Character Recognition (HCR) technology can automate this process to a great extent. For error-free data capture with an HCR-based approach, the documents should be structured according to well-defined layouts.

This approach extracts the elements of an active Web form and transforms the XHTML elements of the active page into a printer-friendly format (an off-line usable form), leaving out the images and unwanted XHTML elements.
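The extraction idea can be illustrated with a minimal sketch. It assumes Python and its standard `html.parser` module purely for illustration; the XForm library itself works client-side in the browser, and the class name `FormFieldExtractor`, the function `extract_pairs`, and the sample markup are hypothetical. The sketch keeps label/field pairs and ignores images and submit buttons. A page that yields no pairs can be treated as a text-only page.

```python
from html.parser import HTMLParser

# Tags treated as form fields (assumption: matches the element types named in the text).
FIELD_TAGS = {"input", "select", "textarea"}
# Input types that have no place on a printed form (assumption).
SKIPPED_INPUT_TYPES = {"hidden", "submit", "button", "reset"}


class FormFieldExtractor(HTMLParser):
    """Collect form fields and the text of their labels; ignore everything else."""

    def __init__(self):
        super().__init__()
        self.labels = {}          # label "for" attribute -> label text
        self.fields = []          # (tag, id) in document order
        self._current_for = None  # "for" value of the label being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._current_for = attrs.get("for")
            if self._current_for is not None:
                self.labels[self._current_for] = ""
        elif tag in FIELD_TAGS:
            if tag == "input" and attrs.get("type") in SKIPPED_INPUT_TYPES:
                return
            self.fields.append((tag, attrs.get("id")))
        # Any other tag, including <img>, is simply left behind.

    def handle_endtag(self, tag):
        if tag == "label":
            self._current_for = None

    def handle_data(self, data):
        if self._current_for is not None:
            self.labels[self._current_for] += data


def extract_pairs(xhtml):
    """Return (label text, field tag) pairs in the order the fields appear."""
    parser = FormFieldExtractor()
    parser.feed(xhtml)
    parser.close()
    return [(parser.labels.get(fid, "").strip(), tag) for tag, fid in parser.fields]


SAMPLE = """
<form>
  <img src="logo.png" alt="logo" />
  <label for="name">Full name</label><input type="text" id="name" />
  <label for="notes">Notes</label><textarea id="notes"></textarea>
  <input type="submit" value="Save" />
</form>
"""

print(extract_pairs(SAMPLE))
```

Matching labels to fields through the `for`/`id` attributes is one possible pairing rule; a form that nests fields inside their labels would need a different one.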

HCR Form Design Process

The Joomla Content Management System has adopted a methodology of using a separate page to handle a user's request to print the content of the active page. However, it depends on server-side script execution to handle the request, and the approach has several limitations in terms of time, resource utilization, traffic, reading of XHTML DOM elements, etc. This server-side dependency can be overcome by adopting a client-side mechanism: client-side technologies such as XHTML, JavaScript, and CSS are used to read the required XHTML DOM elements on the Web form and to organize each element pair (a label and its input, select, textarea, etc.) to match the existing layout of the active Web form. The following were the main requirements identified for the initial development phase (as illustrated in Figure 1):

  • Placement of square markers on the four corners of the page: two at the top [1], two at the bottom [2], and one more between the top two markers [3]. These markers are mandatory for identifying the correct layout of the page.
  • Repetition of this layout on every page, e.g. the pattern should be repeated on every A4 page.
  • Identification of the Web pages that contain XHTML form elements, as opposed to text-only pages.
  • Extraction of the required components (XHTML elements) from the active Web form.
  • Generation of a suitable background layout for each field, such as date fields, detailed description fields, multiple selection fields, etc. [4]
  • Grouping of entities (fields) according to the categories found on the Web form [5].

Figure 1: The basic format of each layout displayed in the generated form
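The marker placement described in the requirements can be sketched as a small calculation. This is only an illustration in Python: the A4 dimensions (210 mm × 297 mm) are standard, but the margin, the marker size, and the function name `marker_positions` are assumptions and are not taken from the XForm library.

```python
A4_WIDTH_MM = 210.0
A4_HEIGHT_MM = 297.0


def marker_positions(page_w=A4_WIDTH_MM, page_h=A4_HEIGHT_MM, margin=10.0, size=5.0):
    """Return the top-left (x, y) of each square marker in millimetres.

    The origin is the top-left corner of the page. The margin and marker
    size defaults are hypothetical values chosen for the example.
    """
    left = margin
    right = page_w - margin - size
    top = margin
    bottom = page_h - margin - size
    centre = (page_w - size) / 2  # horizontally centred between the top markers
    return {
        "top_left": (left, top),
        "top_centre": (centre, top),
        "top_right": (right, top),
        "bottom_left": (left, bottom),
        "bottom_right": (right, bottom),
    }


# The same pattern is repeated on every page of the generated form.
for name, (x, y) in marker_positions().items():
    print(f"{name}: x={x} mm, y={y} mm")
```

Because the fifth marker appears only on the top edge, the pattern is not symmetric top to bottom, which is one plausible reason it helps identify the page layout.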

HCR Form Design Best Practices

Recognition of handwritten characters in forms is challenging. To achieve higher accuracy from the HCR engine, it should be designed for a specific form layout that adheres to a common set of rules. The effectiveness of the design is vital to the accuracy of the recognized characters. In most cases, a proper form layout eliminates most HCR errors and reduces the number of times characters have to undergo the verification process.

The following are some guidelines that can be followed when designing a form layout:

  • Define fields to encourage answers in the correct format such as MM/DD/YYYY for date fields and (###) ### - #### for telephone numbers.
  • Design your form with all lines and labels printed in a drop-out color. The scanner uses a colored light that eliminates one color from the resulting image file.
  • If you cannot use a drop-out color, ensure that the labels identifying fields do not constrain the area in which the respondent will write. Ideally, use rule lines or enclose the labels within strict boundaries to separate each label from its field.
  • Make check boxes designed for processing large enough and far enough apart to keep marks in one box from spilling over into the next.
  • To avoid confusion, use as few methods as possible to collect the information, such as multiple-choice or yes/no questions, constrained answers, and unconstrained answers.
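The first guideline, encouraging answers in a fixed format, can be sketched as turning a format pattern into a row of one-character boxes. This is a minimal text-only illustration in Python; the function name `box_template` and the choice of placeholder characters are assumptions, and an actual form would draw the boxes with CSS rather than brackets.

```python
def box_template(pattern, placeholders="MDY#"):
    """Replace each placeholder character with an empty box and keep separators.

    Every character in `placeholders` stands for one handwritten character;
    anything else (slashes, brackets, spaces, dashes) is printed as-is so
    the respondent sees the expected format.
    """
    return "".join("[ ]" if ch in placeholders else ch for ch in pattern)


print(box_template("MM/DD/YYYY"))
print(box_template("(###) ### - ####"))
```

One box per character keeps handwritten characters separated, which is the kind of constrained answer the guidelines favor.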

HowTo

The main objective of this library is to provide easy-to-use functionality for generating an HCR-friendly form from the current HTML-based forms. Because this feature is disabled in the default configuration, the user first has to complete the following steps:

  • To enable this feature, go to Administrator » Config Utils » Config Values.
  • Scroll down the list until you find the variable shn_xform_enabled.
  • Change the existing value from False to True.
  • Once the value is set to True, you will find a link called XForm in the footer section.
  • Navigate to the web form from which you want to generate the printable form, and click the XForm link in the footer.
  • The page is converted into a printable form.
  • To print the page, use the browser's default printing feature (File » Print).

Sample Data

You can use the following link to retrieve the sample data sheets that can be used to train the Artificial Neural Network of the Handwritten Character Recognition module.

Link : sahanaocrdata

Limitations

The functionality is at an initial stage of development and is currently available in the developer code base. The following features are currently supported:

  • This feature currently works well with Mozilla Firefox and Internet Explorer version 6.
  • Page formatting is currently available for the A4 page size.
  • Full functionality of the library is only available with the locale en_us.

Preview

  • Normal Sahana web form (screenshot)

  • Generated OCR friendly form (screenshot)

