Geonames for Haitian Departments Impacted by the Earthquake

Geonames for Haitian Departments Impacted by the Earthquake

In a Format Suitable for Populating Sahana's “Location Hierarchy”

Original author and date: Glenn Pearson, Jan, 2010

Background

Location Hierarchy

In the Sahana Administration module, the administrator (using Admin/Location) defines a location hierarchy, by number of levels and their names. A default installation would have this in a portion of the field_options table:

field_name = opt_location_type

field_name	option_code	option_description
opt_location_type	1	Country
opt_location_type	2	State
opt_location_type	3	City

Subsequently, the administrator must populate the hierarchy, which goes into the location table. The default install is empty. Using “Add New Location” will explicitly gather the name, description, and iso_code, and level in the hierarchy (aka location type). The ID fields are filled in automatically. The “iso_code” can be any unique string up to 20 characters. The “description” field is for the benefit of the administrator, and is not seen by the end users. Loc_uuid and parent_id values are library-generated unique values.

loc_uuid	parent_id	name	description	iso_code	opt_location_type
			(empty by default)

Once the Admin/Location info is populated, it is exposed as the Origin pulldown, e.g., when reporting a missing person.

Example of Customization for the Haitian Earthquake

For Haiti, we could skip the country code, and start at the state level. A 2 level hierarchy is about all the available data source (described below) easily supports:

field_name	option_code	option_description
opt_location_type	1	Departement
opt_location_type	2	City/Town/Neighborhood

A Source of Geonames and Unique ID Codes

In the US, overall responsibility for defining official place names rests with the U.S. Board of Geographic Names (BGN). Responsibility for official names for non-US places lies with the National Geospatial-Intelligence Agency (NGA). Its public GEONames server hosts codes for the world's countries and their subdivisions (defined in FIPS 10-4). It is fair to say that BGN and NGA base their work on input from the official place naming authorities of other countries.

All populated places, political jurisdictions, and significant geographic features are assigned by a BGN a unique positive integer value, up to 10 digits. It is referred to as the BGN Unique Feature (or Name) ID, or simply Geoname_ID in our data source.

Initial Geoname Processing

The data source considered here is http://download.geonames.org/export/dump/. For Haiti, choose file HT.zip. This source includes many columns beyond the focus here, including latitudes and longitudes for some placenames (sometimes more than one geolocation per geoname). It also includes row information on, for example, rivers, that are not of interest. The types of features are described by a code, explained here

The set configures a 2-level hierarchy. (There is partial information suggestive of 3 levels, but insufficient; nor does this data source have polygonal bounding regions) The top of the hierarchy is the Haitian Departements (i.e., states), and all other populated places (including administrative regions) is a second level below a given state. An assumption is being made that an exhaustive listing of all possible populated locations is desirable. (Even at that, it may not be sufficient for some mappings to PFIF “neighborhood” fields, if the author gives the name of a major street instead. The data source does not include streets and their polylines.)

The “geoname_ID” is a unique value (although in practice there are often more than 1 associated with a given place), and just serves Sahana as an arbitrary unique ID (in the absence of ISO ID at the town/village level/neighborhood level. To repeat, we choose to take the Geonames_ID as the Sahana placename ID. Here, we show the column with that heading, but it maps to the ISO_ID in the Sahana table.

Intermediate processing is done to provide 3 tab-separated formats (because I don't know which will import easiest into the database). The “UTF8” and “Latin 1” formats have equivalent info; “ASCII” has less & should not be used unless all else fails. While some builds of Sahana can bulk import certain text files, I believe you'll have to use database tools to import directly to your instance. Just be sure to associate each file with the correct parent Departement.

The intermediate processing approach adopted was to import into an Excel spreadsheet, then create a separate worksheet for the list of Haitian Departments and each Departement.

In a first step, only rows of type ADM* (admininistrative regions) or PPL* (populated locations) were retained. Most excess columns were removed (but initially latitude & longitude retained).

This Excel file was copied, and in a second step, column renaming and column reordering was done. From this, a copy of each Department's data was made into an individual Excel file.

Top-Level of the Hierarchy - List of Departements

The overall list of Departements (of type ADM1), in a tab-separated format suitable for import into Sahana, are (with UTF8 encoding of accents):

Name	Description	Geoname_ID
DÃ©partement du Nord-Ouest	Region 3	3719536
DÃ©partement de l' Artibonite	Region 6	3731053
DÃ©partement du Centre	Region 7	3728069
DÃ©partement du Nord	Region 9	3719543
DÃ©partement du Nord-Est	Region 10	3719540
DÃ©partement de l' Ouest	Region 11	3719432
DÃ©partement du Sud	Region 12	3716952
DÃ©partement du Sud-Est	Region 13	3716950
DÃ©partement de la Grand'Anse	Region 14	3724613
DÃ©partement de Nippes	Region 15	7115999

This is shown in tabular form here for clarity. (Longer files below are given in tab-separated form, suitable for bulk import.) The source also provides an ASCII representation of placenames:

Name	Description	Geoname_ID
Departement du Nord-Ouest	Region 3	3719536
Departement de l' Artibonite	Region 6	3731053
Departement du Centre	Region 7	3728069
Departement du Nord	Region 9	3719543
Departement du Nord-Est	Region 10	3719540
Departement de l' Ouest	Region 11	3719432
Departement du Sud	Region 12	3716952
Departement du Sud-Est	Region 13	3716950
Departement de la Grand'Anse	Region 14	3724613
Departement de Nippes	Region 15	7115999

We have arbitrarily placed the source's Region Number into Sahana's Description field.

A further complication is that in 2003, the Nippes region was split off from Grand'Anse, but the data in the source was not subsequently segregated among the two regions. That is, Region 15 data is essentially empty. For that reason, it is expedient to combine the two for Sahana:

Name	Description	Geoname_ID
DÃ©partement du Nord-Ouest	Region 3	3719536
DÃ©partement de l' Artibonite	Region 6	3731053
DÃ©partement du Centre	Region 7	3728069
DÃ©partement du Nord	Region 9	3719543
DÃ©partement du Nord-Est	Region 10	3719540
DÃ©partement de l' Ouest	Region 11	3719432
DÃ©partement du Sud	Region 12	3716952
DÃ©partement du Sud-Est	Region 13	3716950
DÃ©partement des Grand'Anse et Nippes	Region 14+15	3724613

Priority of Departements Based on Earthquake Event

The epicenter was near and southwest of Port-au-Prince. The faultline runs through the southern arm of the country, approximately parallel to the equator. Consequently, this priority was (eventually) estimated:

Highest Priority:

Ouest 11

High Priority (adjacent to Quest, or in southern arm):

Artibonite 6
Centre 7
Sud-Est 13
Sud 12
Grand'Anse 14 + Nippes 15

Low (skip for now):

Nord 9
Nord-Est 10
Nord-Ouest 3

Finalizing Files for Individual Haitian Departements

Processing when we're down to 5 columns:

GeonameID, UTF8_Name, ASCII_Name, Alternate_Names, Feature_Code

1) Move Geoname_ID column to rightmost column (cut, paste, delete empty column A)

2) Rename Alternate_Names to Description

3) Select a name column, sort alphabetically (and accept expand to all columns when so prompted). (Initially, we sorted by UTF8_Name; later, for Regions 6,14+15, ASCII_Name was used, which is better in terms of final ordering.) Then remove rows with duplicate UTF8 place names (but see details below). Choose one row per name to keep, delete others (rightclick on start of row, delete). (We tended to retain both rows if the UTF8 place names differed [sometimes one with, one without accents], but ASCII was the same. Or if there was a difference due to capitalization or to hyphens.)

Note: Here is the somewhat arbitrary method of choosing geoname code (i.e., appropriate row) when there are multiple codes for same place name. The Geoname_ID doesn't matter all that much. Rules here are just for some consistency, but don't worry if you must deviate from them.

a) If Feature_Code = ADM2, choose that. Fill in Description with “Arrondissement ” (followed by “de”,“du”,“des”,“d'”), and delete that prefix from UTF8 and ASCII place name. If the other rows (typically ADM3) with the same name have alternate names, append “; Autre: ” & those alt names to the Description, then delete the other rows.

b) If it's of ADM3 type, prefer that to PPL* type (i.e., “PPL” or any code beginning with that). If PPL* type has alt names, move them to ADM3 row, then delete PPL* row(s). (This rule was not followed for first region practice-processed, region 3, Dept. Nord-Est)

c) If it's of PPL type, prefer that to PPLX type (& move alt names)

d) If there are multiple PPL types, choose one furtherest down list with highest geoname_ID, unless one has alt names, then choose that (this is for speed of manual processing)

4) Put “Autre: ” prefix before remaining alternative name lists in Description.

5) Once rows are processed, delete the Feature_Code column, then copy Description column, rename the two columns UTF8_Description, ASCII_Description.

6) In ASCII_Description cells, delete place names (and excess commas) that are UTF8 encodes. These are easily recognizable by odd characters like copyright symbol.

7) Copy whole worksheet to separate instance, label one “…UTF *”, the other “…ASCII”.

8) Delete 2 columns from each worksheet to make it either all UTF8 or all ASCII. Rename columns to remove “UTF8_” or “ASCII_” prefix.

9) Save

Next, make 3 different formats to make it easy to import into Sahana under different circumstances.

10) Go to ASCII worksheet, save as type “Text (tab delimited) .txt”, say “Yes” to prompts, call file “Region <whatever> (ASCII)”. Exit (say “No” to save change, because don't want to save .txt version of workbook as such), restart Step 9 .xls

11) Go to UTF8 worksheet, save as type “tab delimited .txt”, call it “Region <whatever> (Latin 1)”. Exit, “No” to change, restart .xls

12) Go to UTF8 worksheet, save as type “Unicode Text (tab delimited) .txt”, call it “Region <whatever> (UTF8)”. Exit, “No” to change.

Finalized Files for Individual Haitian Departements

Download files are read-only, with header, tab-separated. Fields whose content contains commas are wrapped in double quotes (blame Microsoft).

Region	Departement	# of Places	Data as UTF8	Data as Latin 1	Data as ASCII
3	Nord-Ouest		skip	skip	skip
6	Artibonite		Reg 6 Artibonite (UTF8).txt	Reg 6 Artibonite (Latin 1).txt	Reg 6 Artibonite (ASCII).txt
7	Centre		Reg 7 Dept Centre (UTF8).txt	Reg 7 Dept Centre (Latin 1).txt	Reg 7 Dept Centre (ASCII).txt
9	Nord		skip	skip	skip
10	Nord-Est		skip	skip	skip
11	Ouest	appx. 1600	Reg 11 Dept Ouest (UTF8).txt	Reg 11 Ouest (Latin 1).txt	Reg 11 Ouest (ASCII).txt
12	Sud		Reg 12 Dept Sud (UTF8).txt	Reg 12 Dept Sud(Latin 1).txt	Reg 12 Dept Sud(ASCII).txt
13	Sud-Est	800+	Reg 13 Dept Sud-Est (UTF8).txt	Reg 13 Dept Sud-Est (Latin 1).txt	Reg 13 Dept Sud-Est (ASCII).txt
14+15	Grand'Anse+Nippes		Grand'Anse,Nippes(UTF8).txt	Grand'Anse,Nippes(Latin 1).txt	Grand'Anse,Nippes(ASCII).txt