Table of Contents
Geonames for Haitian Departments Impacted by the Earthquake
In a Format Suitable for Populating Sahana's “Location Hierarchy”
Original author and date: Glenn Pearson, Jan, 2010
Background
Location Hierarchy
In the Sahana Administration module, the administrator (using Admin/Location) defines a location hierarchy, by number of levels and their names. A default installation would have this in a portion of the field_options table:
field_name = opt_location_type
field_name | option_code | option_description |
---|---|---|
opt_location_type | 1 | Country |
opt_location_type | 2 | State |
opt_location_type | 3 | City |
Subsequently, the administrator must populate the hierarchy, which goes into the location table. The default install is empty. Using “Add New Location” will explicitly gather the name, description, and iso_code, and level in the hierarchy (aka location type). The ID fields are filled in automatically. The “iso_code” can be any unique string up to 20 characters. The “description” field is for the benefit of the administrator, and is not seen by the end users. Loc_uuid and parent_id values are library-generated unique values.
loc_uuid | parent_id | name | description | iso_code | opt_location_type |
---|---|---|---|---|---|
(empty by default) |
Once the Admin/Location info is populated, it is exposed as the Origin pulldown, e.g., when reporting a missing person.
Example of Customization for the Haitian Earthquake
For Haiti, we could skip the country code, and start at the state level. A 2 level hierarchy is about all the available data source (described below) easily supports:
field_name | option_code | option_description |
---|---|---|
opt_location_type | 1 | Departement |
opt_location_type | 2 | City/Town/Neighborhood |
A Source of Geonames and Unique ID Codes
In the US, overall responsibility for defining official place names rests with the U.S. Board of Geographic Names (BGN). Responsibility for official names for non-US places lies with the National Geospatial-Intelligence Agency (NGA). Its public GEONames server hosts codes for the world's countries and their subdivisions (defined in FIPS 10-4). It is fair to say that BGN and NGA base their work on input from the official place naming authorities of other countries.
All populated places, political jurisdictions, and significant geographic features are assigned by a BGN a unique positive integer value, up to 10 digits. It is referred to as the BGN Unique Feature (or Name) ID, or simply Geoname_ID in our data source.
Initial Geoname Processing
The data source considered here is http://download.geonames.org/export/dump/. For Haiti, choose file HT.zip. This source includes many columns beyond the focus here, including latitudes and longitudes for some placenames (sometimes more than one geolocation per geoname). It also includes row information on, for example, rivers, that are not of interest. The types of features are described by a code, explained here
The set configures a 2-level hierarchy. (There is partial information suggestive of 3 levels, but insufficient; nor does this data source have polygonal bounding regions) The top of the hierarchy is the Haitian Departements (i.e., states), and all other populated places (including administrative regions) is a second level below a given state. An assumption is being made that an exhaustive listing of all possible populated locations is desirable. (Even at that, it may not be sufficient for some mappings to PFIF “neighborhood” fields, if the author gives the name of a major street instead. The data source does not include streets and their polylines.)
The “geoname_ID” is a unique value (although in practice there are often more than 1 associated with a given place), and just serves Sahana as an arbitrary unique ID (in the absence of ISO ID at the town/village level/neighborhood level. To repeat, we choose to take the Geonames_ID as the Sahana placename ID. Here, we show the column with that heading, but it maps to the ISO_ID in the Sahana table.
Intermediate processing is done to provide 3 tab-separated formats (because I don't know which will import easiest into the database). The “UTF8” and “Latin 1” formats have equivalent info; “ASCII” has less & should not be used unless all else fails. While some builds of Sahana can bulk import certain text files, I believe you'll have to use database tools to import directly to your instance. Just be sure to associate each file with the correct parent Departement.
The intermediate processing approach adopted was to import into an Excel spreadsheet, then create a separate worksheet for the list of Haitian Departments and each Departement.
In a first step, only rows of type ADM* (admininistrative regions) or PPL* (populated locations) were retained. Most excess columns were removed (but initially latitude & longitude retained).
This Excel file was copied, and in a second step, column renaming and column reordering was done. From this, a copy of each Department's data was made into an individual Excel file.
Top-Level of the Hierarchy - List of Departements
The overall list of Departements (of type ADM1), in a tab-separated format suitable for import into Sahana, are (with UTF8 encoding of accents):
Name | Description | Geoname_ID |
---|---|---|
Département du Nord-Ouest | Region 3 | 3719536 |
Département de l' Artibonite | Region 6 | 3731053 |
Département du Centre | Region 7 | 3728069 |
Département du Nord | Region 9 | 3719543 |
Département du Nord-Est | Region 10 | 3719540 |
Département de l' Ouest | Region 11 | 3719432 |
Département du Sud | Region 12 | 3716952 |
Département du Sud-Est | Region 13 | 3716950 |
Département de la Grand'Anse | Region 14 | 3724613 |
Département de Nippes | Region 15 | 7115999 |
This is shown in tabular form here for clarity. (Longer files below are given in tab-separated form, suitable for bulk import.) The source also provides an ASCII representation of placenames:
Name | Description | Geoname_ID |
---|---|---|
Departement du Nord-Ouest | Region 3 | 3719536 |
Departement de l' Artibonite | Region 6 | 3731053 |
Departement du Centre | Region 7 | 3728069 |
Departement du Nord | Region 9 | 3719543 |
Departement du Nord-Est | Region 10 | 3719540 |
Departement de l' Ouest | Region 11 | 3719432 |
Departement du Sud | Region 12 | 3716952 |
Departement du Sud-Est | Region 13 | 3716950 |
Departement de la Grand'Anse | Region 14 | 3724613 |
Departement de Nippes | Region 15 | 7115999 |
We have arbitrarily placed the source's Region Number into Sahana's Description field.
A further complication is that in 2003, the Nippes region was split off from Grand'Anse, but the data in the source was not subsequently segregated among the two regions. That is, Region 15 data is essentially empty. For that reason, it is expedient to combine the two for Sahana:
Name | Description | Geoname_ID |
---|---|---|
Département du Nord-Ouest | Region 3 | 3719536 |
Département de l' Artibonite | Region 6 | 3731053 |
Département du Centre | Region 7 | 3728069 |
Département du Nord | Region 9 | 3719543 |
Département du Nord-Est | Region 10 | 3719540 |
Département de l' Ouest | Region 11 | 3719432 |
Département du Sud | Region 12 | 3716952 |
Département du Sud-Est | Region 13 | 3716950 |
Département des Grand'Anse et Nippes | Region 14+15 | 3724613 |
Priority of Departements Based on Earthquake Event
The epicenter was near and southwest of Port-au-Prince. The faultline runs through the southern arm of the country, approximately parallel to the equator. Consequently, this priority was (eventually) estimated:
Highest Priority:
- Ouest 11
High Priority (adjacent to Quest, or in southern arm):
- Artibonite 6
- Centre 7
- Sud-Est 13
- Sud 12
- Grand'Anse 14 + Nippes 15
Low (skip for now):
- Nord 9
- Nord-Est 10
- Nord-Ouest 3
Finalizing Files for Individual Haitian Departements
Processing when we're down to 5 columns:
GeonameID, UTF8_Name, ASCII_Name, Alternate_Names, Feature_Code
1) Move Geoname_ID column to rightmost column (cut, paste, delete empty column A)
2) Rename Alternate_Names to Description
3) Select a name column, sort alphabetically (and accept expand to all columns when so prompted). (Initially, we sorted by UTF8_Name; later, for Regions 6,14+15, ASCII_Name was used, which is better in terms of final ordering.) Then remove rows with duplicate UTF8 place names (but see details below). Choose one row per name to keep, delete others (rightclick on start of row, delete). (We tended to retain both rows if the UTF8 place names differed [sometimes one with, one without accents], but ASCII was the same. Or if there was a difference due to capitalization or to hyphens.)
Note: Here is the somewhat arbitrary method of choosing geoname code (i.e., appropriate row) when there are multiple codes for same place name. The Geoname_ID doesn't matter all that much. Rules here are just for some consistency, but don't worry if you must deviate from them.
a) If Feature_Code = ADM2, choose that. Fill in Description with “Arrondissement ” (followed by “de”,“du”,“des”,“d'”), and delete that prefix from UTF8 and ASCII place name. If the other rows (typically ADM3) with the same name have alternate names, append “; Autre: ” & those alt names to the Description, then delete the other rows.
b) If it's of ADM3 type, prefer that to PPL* type (i.e., “PPL” or any code beginning with that). If PPL* type has alt names, move them to ADM3 row, then delete PPL* row(s). (This rule was not followed for first region practice-processed, region 3, Dept. Nord-Est)
c) If it's of PPL type, prefer that to PPLX type (& move alt names)
d) If there are multiple PPL types, choose one furtherest down list with highest geoname_ID, unless one has alt names, then choose that (this is for speed of manual processing)
4) Put “Autre: ” prefix before remaining alternative name lists in Description.
5) Once rows are processed, delete the Feature_Code column, then copy Description column, rename the two columns UTF8_Description, ASCII_Description.
6) In ASCII_Description cells, delete place names (and excess commas) that are UTF8 encodes. These are easily recognizable by odd characters like copyright symbol.
7) Copy whole worksheet to separate instance, label one “…UTF *”, the other “…ASCII”.
8) Delete 2 columns from each worksheet to make it either all UTF8 or all ASCII. Rename columns to remove “UTF8_” or “ASCII_” prefix.
9) Save
Next, make 3 different formats to make it easy to import into Sahana under different circumstances.
10) Go to ASCII worksheet, save as type “Text (tab delimited) .txt”, say “Yes” to prompts, call file “Region <whatever> (ASCII)”. Exit (say “No” to save change, because don't want to save .txt version of workbook as such), restart Step 9 .xls
11) Go to UTF8 worksheet, save as type “tab delimited .txt”, call it “Region <whatever> (Latin 1)”. Exit, “No” to change, restart .xls
12) Go to UTF8 worksheet, save as type “Unicode Text (tab delimited) .txt”, call it “Region <whatever> (UTF8)”. Exit, “No” to change.
Finalized Files for Individual Haitian Departements
Download files are read-only, with header, tab-separated. Fields whose content contains commas are wrapped in double quotes (blame Microsoft).
Region | Departement | # of Places | Data as UTF8 | Data as Latin 1 | Data as ASCII |
---|---|---|---|---|---|
3 | Nord-Ouest | skip | skip | skip | |
6 | Artibonite | Reg 6 Artibonite (UTF8).txt | Reg 6 Artibonite (Latin 1).txt | Reg 6 Artibonite (ASCII).txt | |
7 | Centre | Reg 7 Dept Centre (UTF8).txt | Reg 7 Dept Centre (Latin 1).txt | Reg 7 Dept Centre (ASCII).txt | |
9 | Nord | skip | skip | skip | |
10 | Nord-Est | skip | skip | skip | |
11 | Ouest | appx. 1600 | Reg 11 Dept Ouest (UTF8).txt | Reg 11 Ouest (Latin 1).txt | Reg 11 Ouest (ASCII).txt |
12 | Sud | Reg 12 Dept Sud (UTF8).txt | Reg 12 Dept Sud(Latin 1).txt | Reg 12 Dept Sud(ASCII).txt | |
13 | Sud-Est | 800+ | Reg 13 Dept Sud-Est (UTF8).txt | Reg 13 Dept Sud-Est (Latin 1).txt | Reg 13 Dept Sud-Est (ASCII).txt |
14+15 | Grand'Anse+Nippes | Grand'Anse,Nippes(UTF8).txt | Grand'Anse,Nippes(Latin 1).txt | Grand'Anse,Nippes(ASCII).txt |