CASE STUDY - Configuring and Customizing Sahana for the Lost Person Finder (LPF) Project

Populating the “Location Hierarchy” for a Region within a Large Nation; also the Country List

Original author and date: Glenn Pearson, May, 2009

Background

Location Hierarchy

In the Sahana Administration module, the administrator defines a location hierarchy, by number of levels and their names. A default installation would have this in a portion of the field_options table:

field_name = opt_location_type

field_name option_code option_description
opt_location_type 1 Country
opt_location_type 2 State
opt_location_type 3 City

Subsequently, the administrator must populate the hierarchy, which goes into the location table. The default install is empty. “Add New Location” explicitly gathers the name, description, iso_code, and level in the hierarchy (aka location type). The ID fields are filled in automatically: loc_uuid and parent_id values are library-generated unique values. The “iso_code” can be any unique string up to 20 characters. The “description” field is for the benefit of the administrator, and is not seen by the end users.

loc_uuid parent_id name description iso_code opt_location_type
(empty by default)
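
To make the shape of one row concrete, here is a minimal Python sketch of a location record. Only the column names and the 20-character limit come from the description above. The Location class and new_location helper are hypothetical names, uuid.uuid4() merely stands in for whatever library Sahana uses to generate IDs, and we assume that parent_id holds the loc_uuid of the parent entry.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

MAX_ISO_CODE_LEN = 20  # iso_code may be any unique string up to 20 characters


@dataclass(frozen=True)
class Location:
    loc_uuid: str
    parent_id: Optional[str]  # assumption: loc_uuid of the parent, None at the top level
    name: str
    description: str
    iso_code: str
    opt_location_type: int


def new_location(name, iso_code, level, parent=None, description=""):
    """Hypothetical helper: the caller supplies name, description, iso_code
    and level; the ID fields are filled in automatically."""
    if not 0 < len(iso_code) <= MAX_ISO_CODE_LEN:
        raise ValueError("iso_code must be 1 to 20 characters")
    return Location(
        loc_uuid=str(uuid.uuid4()),  # stand-in for Sahana's own ID generator
        parent_id=parent.loc_uuid if parent else None,
        name=name,
        description=description,
        iso_code=iso_code,
        opt_location_type=level,
    )


country = new_location("United States", "USA", level=1)
state = new_location("Maryland", "US-MD", level=2, parent=country)
```

The sketch does not enforce uniqueness of iso_code across records; in a real installation that check would belong with the database.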

Country List

Separately, field_options contains a related setting, opt_country, which in the default install has:

field_name = opt_country

option_code option_description
uk United Kingdom
lanka Sri Lanka

This describes the values allowed in the corresponding column of the person_details table. Presumably populating opt_country codes and values is done by direct database manipulation or scripted import. A GUI control to select a country is found in the DVR and Volunteer modules. In general, potential uses for the country list might be for place of birth, citizenship, or country where now living. Since this is not a hierarchy, it does not handle states/provinces.

Problem with the Location Hierarchy’s Meaning in Light of MPR Module Usage

Because the purposes and use cases for the location hierarchy information are hard to discover, figuring out how to structure it is difficult. Furthermore, there are conflicting uses, for which the best structure will differ. The administrator might choose to enter a hierarchy:

(1) localized around the regional disaster, or

(2) reflective of a local organization’s area of geographic coverage (particularly if configured in advance of a disaster).

But consider the ambiguous “Origin” control, encountered when registering a new missing person in the Missing Person Registry, and populated by the foregoing hierarchy. For any meaning of “Origin” that restricts it to the disaster zone, then the hierarchy of form (1) will work fine, as will form (2) if the disaster zone is contained within the organization coverage area. Examples of such meanings will be given further below.

However, suppose the interpretation of “Origin” is either:

(3) where a person was born, or

(4) where their home is (from which they might be traveling).

Then a hierarchy of form (1) or (2) in many cases may be insufficiently broad and perhaps too deep in the localized region. A common world-wide hierarchy would be more appropriate (or the existing more limited country list).

In Sahana, the administrator can predefine more levels of the hierarchy than are to be exposed in a particular disaster, showing only those between the “starting level” and the “ending level”. However, that will not help with the problem here, since there is only one set of “starting level/ending level” filters for all uses.
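
The limitation can be illustrated with a small Python sketch. The function and variable names are hypothetical and this is not Sahana's code; it only models a single pair of level settings being applied to every control that draws on the hierarchy.

```python
# Level definitions as in the default opt_location_type table.
DEFAULT_LEVELS = {1: "Country", 2: "State", 3: "City"}


def exposed_levels(levels, starting_level, ending_level):
    """Return only the levels between the starting and ending level, inclusive."""
    return {
        code: name
        for code, name in sorted(levels.items())
        if starting_level <= code <= ending_level
    }


# One global pair of settings (hypothetical names) shared by all uses.
STARTING_LEVEL, ENDING_LEVEL = 2, 3

origin_control = exposed_levels(DEFAULT_LEVELS, STARTING_LEVEL, ENDING_LEVEL)
birthplace_control = exposed_levels(DEFAULT_LEVELS, STARTING_LEVEL, ENDING_LEVEL)
```

Because both controls read the same two settings, narrowing the range for one use narrows it for the other as well.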

Configuring the Hierarchy and MPR Usage to be Compatible

Let us assume that the existing structure is only to support uses compatible with forms (1) or (2). Then the “Origin” field in MPR might be interpreted (and renamed to make this clearer) in one of three ways, as where the missing person:

  • encountered the disaster; or
  • was living/staying within the disaster region; or
  • was last seen within the disaster region.

Because there is already a separate free-text field for “last seen” in MPR, either of the first two is probably best.

Furthermore, wherever the GUI needs a birthplace or home address, we will not use the opt_location as a basis for a control. This information should be entered through the country list, text fields, or a new, separately-maintained hierarchy.

Example of a Local Hierarchy – Customization for the Washington, DC Metro Area

Our use case (with hierarchical form (2) above) calls for support centered on Bethesda, MD, which lies on the northwest side of the Capital Beltway in southern Montgomery County. Taking an hour’s commute in normal traffic as a guide, the support region would include DC, the counties comprising the inner metro suburbs of Maryland and Virginia, and the counties adjoining Montgomery in Maryland and Virginia. The region has the “strong county” form of local government, as opposed to, say, New England with its “strong town” governance. Thus, people very much know which county they are in.

Because we are only interested in one country, and a few states within that, and a handful of counties, we flatten all this into the top level of our hierarchy. By “County or Equivalent”, we include the counties of interest, plus only those embedded or adjoining jurisdictions that are not considered within a county. DC is the obvious example. In Maryland, the City of Baltimore is the only city not considered part of a county. In Virginia, all “cities” (even small ones) are considered outside of counties. Other incorporated Maryland cities (as well as incorporated and unincorporated towns and villages in either state) will be in level 2.

Field code: opt_location_type

option_code option_description Comments
1 County or Equivalent County choice will include state, e.g., “Montgomery County, MD”
2 Town or Neighborhood

Location table, with opt_location_type = 1

Iso_code name description
1702382 District of Columbia (There is also a City of Washington code: 2390665)
1712500 Montgomery County, MD
1714670 Prince George’s County, MD
1711211 Frederick County, MD
1709077 Howard County, MD
1695314 Baltimore County, MD
1702381 Baltimore (city), MD
1710958 Anne Arundel County, MD
1498415 Alexandria (city), VA
1480097 Arlington County, VA
1498422 Falls Church (city), VA
1480119 Fairfax County, VA
1789070 Fairfax (city), VA
1480141 Loudoun County, VA
1498430 Manassas (city), VA
1498431 Manassas Park (city), VA

The foregoing shows the top-level field; there will also be a second-level field with several thousand items. Sources for the BGN codes are discussed below. The sources mentioned also have a nominal latitude and longitude. (If the often-controversial place boundaries are needed, one must look elsewhere. For example, a DC map with 127 neighborhoods tiled was found at http://en.wikipedia.org/wiki/List_of_neighborhoods_of_the_District_of_Columbia_by_ward.)

We choose to use the BGN codes here for both levels. For level 1, the codes are always associated with the “Civil” location category (i.e., “”). For level 2, “Populated Places” may provide most codes. (Of the other systems discussed below, FIPS would be an alternative, but NCIC 2000 does not code to the town/neighborhood level.) NOTE: Our project’s actual customization is subject to change. The level-1 codes were entered by hand using “Add New Location” (with the BGN code going in the ISO Code field as shown). Values for the level-2 data were

  • retrieved from the BGN site as delimited text files;
  • imported into Excel for manual data inspection and filtering;
  • exported as comma-delimited text files, one for each level-1 category;
  • batch imported into LPF/Sahana by a customization to the “Add New Location” page, that provides the appropriate loc_uuid and parent_id to each new entry. This customization will be offered back to Sahana.
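
The last step can be sketched in Python as follows. This is not the actual customization to the “Add New Location” page; it is an illustration under stated assumptions. The function name rows_for_parent is hypothetical, the CSV layout (a header row with iso_code and name columns) is assumed, and uuid.uuid4() stands in for Sahana's own ID generator.

```python
import csv
import io
import uuid


def rows_for_parent(csv_text, parent_uuid, level=2):
    """Turn one per-county CSV export into location rows.

    Assumes a header row with the columns iso_code,name (hypothetical layout).
    Every row receives a fresh loc_uuid and the parent's ID as parent_id.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "loc_uuid": str(uuid.uuid4()),  # stand-in for Sahana's ID generator
            "parent_id": parent_uuid,
            "name": record["name"].strip(),
            "description": "",
            "iso_code": record["iso_code"].strip(),
            "opt_location_type": level,
        })
    return rows


# In practice the parent ID would be looked up from the existing level-1 entry.
prince_georges_uuid = str(uuid.uuid4())
sample = (
    "iso_code,name\n"
    '2390567,"Bowie (city), MD"\n'
    '2390578,"College Park (city), MD"\n'
)
level2_rows = rows_for_parent(sample, prince_georges_uuid)
```

Names that contain commas, such as “Bowie (city), MD”, must be quoted in the exported file, which Excel does by default for comma-delimited output.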

A few example entries are shown below.

Location table, with opt_location_type = 2

Iso_code name Comments
583184 Bethesda, MD Example of a “Populated Place” code; parent = Montgomery County, MD
2390562 Annapolis (city), MD Example of a “Civil” code, for parent = Anne Arundel County, MD
2390567 Bowie (city), MD
2390578 College Park (city), MD Parent = Prince George’s County, MD
Lots more not shown

Country List to Cover the World – LPF Implementation

For the country list, while it is not required that the place codes and names reflect some standard, it would seem highly desirable. We pick one such standard here. Other alternatives are sketched further below.

ISO-3166 Country Codes

ISO 3166 defines standardized codes for both countries and their subdivisions such as states. For countries, there are both 2- and 3-letter encodings. For readability, ISO-3166-1 Alpha-3 is preferable (e.g., “USA”; http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3). This could well be useful as part of the default installation.
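
As a sketch of what a scripted import of such a list might prepare, the Python below builds (field_name, option_code, option_description) rows for opt_country from alpha-3 codes. The helper name opt_country_rows and the three-country sample are illustrative assumptions; writing the rows to the database is left out.

```python
def opt_country_rows(countries):
    """Build field_options rows for opt_country from (alpha-3 code, name) pairs.

    Rejects codes that are not three ASCII letters and rejects duplicates.
    """
    rows = []
    seen = set()
    for code, name in countries:
        code = code.strip().upper()
        if len(code) != 3 or not (code.isascii() and code.isalpha()):
            raise ValueError(f"not an alpha-3 code: {code!r}")
        if code in seen:
            raise ValueError(f"duplicate code: {code}")
        seen.add(code)
        rows.append(("opt_country", code, name))
    return rows


# A tiny sample; a full import would cover the whole ISO 3166-1 list.
sample_rows = opt_country_rows([
    ("GBR", "United Kingdom"),
    ("LKA", "Sri Lanka"),
    ("usa", "United States"),
])
```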

Hypothetical Discussion - Hierarchy to Cover the World

I am not making the argument here that Sahana SHOULD provide this hierarchy, but will explore what form it might take. There are practical reasons (e.g., minimize Pootle translation work) to restrict it to just two levels. Let’s invent a new field code:

Field code: opt_world_location_type

option_code option_description Comments
1 Country
2 State or Province Will be empty for some countries. Include major cities?

Again, while it is not required that the place codes and names reflect some standard, it would seem highly desirable.

ISO-3166 Country Subdivision Codes

ISO 3166 defines standardized codes for both countries (discussed above) and their subdivisions such as states. For subregions, ISO-3166-2 is useful; its codes generally consist of the 2-letter country code followed by a dash and a subregion code, e.g., “US-VA” for Virginia (see en.wikipedia.org/wiki/ISO_3166-2:US).
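
Because the country prefix is embedded in each subdivision code, a level-2 entry can be tied to its level-1 parent by splitting the code at the dash. A minimal Python sketch, with a hypothetical function name and only a loose format check (it does not verify that the code is actually assigned):

```python
def split_subdivision(code):
    """Split an ISO 3166-2 style code such as "US-VA" into (country, subdivision)."""
    country, dash, subdivision = code.strip().partition("-")
    if (
        not dash
        or len(country) != 2
        or not (country.isascii() and country.isalpha())
        or not (subdivision.isascii() and subdivision.isalnum())
    ):
        raise ValueError(f"not a country-subdivision code: {code!r}")
    return country.upper(), subdivision.upper()


country, subdivision = split_subdivision("US-VA")
```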

The preceding populated hierarchy could well be useful as part of the default installation.

There can be a case for having additional level(s) specific to the installation that provide, e.g., county, town, and neighborhoods. We discuss this a bit more in the context of our customization examples.

Example of Extending a Global Hierarchy – Customization for the US or North America

Hypothetically, suppose we wish to support a global hierarchy, and add a level 3 of detail. It would be possible to extend this level of detail to the world, but it would be a lot of work. Let’s consider just extending this 3rd level content to North America (to the town but probably not neighborhood level). The top 2 levels can be coded as discussed above, or we could substitute IDs that match those used in the 3rd level. We restrict our discussion to candidate US coding systems.

Field code: opt_world_location_type

option_code option_description Comments
1 Country
2 State or Province Will be empty for some countries
3 Town or City May be hard to populate fully. May restrict this level of detail to North America.

BGN Codes

In the US, overall responsibility for defining official place names rests with the U.S. Board on Geographic Names (BGN). Responsibility for defining and promulgating domestic names lies with the U.S. Geological Survey (USGS). The USGS Geographic Names Information System (GNIS) now hosts the official codes for US populated places (defined in FIPS 55). Similarly, responsibility for official names for foreign places lies with the National Geospatial-Intelligence Agency (NGA). Its public GEOnet Names Server hosts codes for the world's countries and their subdivisions (defined in FIPS 10-4). It is fair to say that BGN and NGA base their work on input from the official place naming authorities of other countries.

All populated places, political jurisdictions, and significant geographic features are assigned by BGN a unique positive integer value of up to 10 digits. It is referred to as the GNIS Feature ID or BGN Unique Feature ID. This can serve as the code in Sahana (at just the 3rd level, or all levels, as mentioned above). An alternative to consider for non-US locations is the BGN Unique Name ID, which identifies the placename text (and its possibly many variants) with individual codes.
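
Since the Feature ID is described as a positive integer of up to 10 digits, a value can be given a basic format check before it is stored in the iso_code field. The Python sketch below uses a hypothetical function name; it checks the format only and does not confirm that the ID exists in GNIS.

```python
def looks_like_feature_id(text):
    """True if text is a positive integer written with at most 10 ASCII digits."""
    text = text.strip()
    return (
        text.isascii()
        and text.isdigit()
        and len(text) <= 10
        and int(text) > 0
    )


bethesda_ok = looks_like_feature_id("583184")  # the Bethesda, MD code used earlier
```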

For US government purposes, BGN codes and names are now the gold standard.

FIPS Codes (US Only) – Now ANSI

Earlier, the role now held by BGN Codes was held by a series of Federal Information Processing Standards (FIPS) that formalized codes developed by the Census, US Post Office, and other agencies (see Table).

Table: FIPS Standards, Passed on to ANSI. In 2008, the U.S. National Institute of Standards and Technology (NIST) withdrew these as government standards (see http://www.itl.nist.gov/fipspubs/withdraw.htm), to allow maintenance to be assumed by industry/government standards bodies, namely the ANSI-accredited INCITS/L1, Geographic Information Systems (GIS), which is the U.S. representative to the corresponding ISO standards body.

FIPS Standard ID | FIPS Standard Name | ANSI Standard, from INCITS/L1 (GIS) work | Comments
FIPS 5-2 | Codes for the Identification of the States, the District of Columbia and the Outlying Areas of the United States, and Associated Areas | Adopted as ANSI X3.38 | FIPS 5-2 is the standardization of 2-letter US postal codes.
FIPS 6-4 | Counties and Equivalent Entities of the U.S., Its Possessions, and Associated Areas | Uses ANSI X3.31 coding rules |
FIPS 55-3 | Named Populated Places, Primary County Divisions and other Entities of the U.S. and Its Outlying Areas for Information Interchange | Uses ANSI X3.47 coding rules | Has 5-digit integer place code, continues to be used internally by the US Census
FIPS 10-4 | Countries, Dependencies, Areas of Special Sovereignty, and Their Principal Administrative Divisions | |
FIPS 8-6 | Metropolitan Areas (Including MSAs, CMSAs, PMSAs, and NECMAs) | |

Sources –

For INCITS/L1, GIS: www.fgdc.gov/participation/coordination-group/meeting-minutes/2003%20meeting%20minutes/april/L1_Activities.pdf

Links in table from http://laits.gmu.edu/~achen/girm.html

NCIC 2000 Codes – North America

Another place coding to consider is that of the FBI’s National Crime Information Center (NCIC). The “NCIC 2000” encoding is part of a system to exchange information among law enforcement entities at the local, state, and federal level. Like Sahana’s MPR, the FBI system also deals with missing persons, although as fugitives, kidnap victims, or runaways rather than disaster victims. NCIC 2000 contains an extensive list of 2-letter codes that cover countries (and their dependencies and territories), US states, US territories, US Indian nations, Canadian provinces, and Mexican states. Unsurprisingly, the effort to pack all these into 2 letters means that there are non-obvious encodings (e.g., some US states are not given their customary postal codes). Except for the Indian nations, there would seem to be little advantage to using this encoding over ISO-3166, unless interchange with law enforcement were a dominant consideration.

(For our customization, the LPF deployment target in the mid-Atlantic region would not seem to require Canadian provinces or Mexican states to be enumerated. Treating US territories in the same way as states is probably reasonable. The case for including Indian nations as part of “place of birth” is unclear.)

