[common] Extend City<T> with country, AdminRegion and type; enrich cities.xml #189

Open
comoglu wants to merge 8 commits into SeisComP:main from comoglu:feature/cityd-state-type

@comoglu
Contributor

@comoglu comoglu commented Apr 13, 2026

Extends City<T> with three new fields to enable richer location descriptions in SeisComP applications.

Changes

AdminRegion struct (replaces flat state/stateFull strings)

A dedicated struct serialised as a child element:

<state abbr="NSW"><name>New South Wales</name></state>
  • abbr — ISO 3166-2 alphabetic subdivision suffix (e.g. NSW, CA); empty where only numeric codes exist (e.g. Japan, China, Türkiye)
  • name — full subdivision name
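As a rough illustration of the element layout above, the following sketch models the region as a plain struct and formats it by hand. The struct layout, the `toXml` helper and the choice to omit the `abbr` attribute when it is empty are assumptions made for this example; the actual implementation serializes through the SeisComP archive rather than building strings.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the AdminRegion struct described above.
struct AdminRegion {
    std::string abbr;  // ISO 3166-2 alphabetic suffix, may be empty
    std::string name;  // full subdivision name
};

// Formats the region as a <state> child element.
// Assumption: the abbr attribute is left out when no alphabetic code exists.
// No XML escaping is performed, so this is for illustration only.
std::string toXml(const AdminRegion &region) {
    std::ostringstream os;
    os << "<state";
    if ( !region.abbr.empty() ) {
        os << " abbr=\"" << region.abbr << "\"";
    }
    os << "><name>" << region.name << "</name></state>";
    return os.str();
}
```

Calling `toXml({"NSW", "New South Wales"})` yields the element shown in the description.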

country field

Full English country name serialised as a <country> child element:

<country>Australia</country>

Allows applications to display a human-readable country name without maintaining a separate ISO A2 → name lookup table.

type field

Location type derived from GeoNames feature codes. One of (or empty if unknown):

| Value | Meaning |
| --- | --- |
| city | Capital or administrative centre (PPLC, PPLA, PPLA2) |
| town | Populated place or minor admin centre (PPL, PPLA3, PPLA4) |
| village | Small settlement (PPLF, PPLL, PPLR, PPLS, etc.) |
| suburb | Section of a populated place (PPLX) |
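The mapping in this table could be expressed as a small lookup function. This is only a sketch: `typeFromFeatureCode` is a hypothetical name, and the codes covered by the table's "etc." are not listed here, so they fall through to the empty (unknown) value.

```cpp
#include <string>

// Maps a GeoNames feature code to the type value from the table.
// Assumption: codes the table does not name explicitly return an
// empty string, which stands for "unknown".
std::string typeFromFeatureCode(const std::string &code) {
    if ( code == "PPLC" || code == "PPLA" || code == "PPLA2" ) {
        return "city";
    }
    if ( code == "PPL" || code == "PPLA3" || code == "PPLA4" ) {
        return "town";
    }
    if ( code == "PPLF" || code == "PPLL" || code == "PPLR" || code == "PPLS" ) {
        return "village";
    }
    if ( code == "PPLX" ) {
        return "suburb";
    }
    return "";
}
```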

Updated cities.xml

Re-enriched from freely available sources:

| Field | Source | Coverage |
| --- | --- | --- |
| countryID | Natural Earth 10m country polygons | 71,682 / 71,685 |
| country | Natural Earth SQLite (ne_10m_admin_0_countries) | 70,242 |
| state | Natural Earth ne_10m_admin_1_states_provinces (ISO 3166-2 abbreviations) | 71,654 |
| type | GeoNames cities500.txt feature codes | 56,860 |

All fields are optional and additive — existing applications ignoring them are unaffected.

Enrichment tooling

The pipeline script used to produce the enriched cities.xml is available at
https://github.com/comoglu/cities-xml-update — allowing operators to add their own cities or re-enrich from updated upstream data.

cla-bot (bot) added the cla-signed label ("The CLA has been signed by all contributors") Apr 13, 2026
Comment thread libs/seiscomp/math/coord.h Outdated
std::string _countryID;
double _population;
std::string _category;
std::string _type;
Contributor

Wouldn't a type enumeration be a better approach? Is there a defined set of values for type?

Contributor Author

Good point @gempa-jabe, the values are indeed fixed. GeoNames feature codes map to exactly four types: city (PPLC/PPLA/PPLA2), town (PPL/PPLA3/PPLA4), village (PPLF/PPLL/PPLR etc.) and suburb (PPLX), plus Unknown for entries where the attribute is absent.

I've replaced the std::string with a CityType enum class and added parseCityType()/toString() helpers for the XML round-trip. The cities.xml format is unchanged: serialization still uses the lowercase string attribute, so no migration is needed.
comoglu@2b7623a
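A minimal sketch of what such an enum and its helpers might look like is given below. The names `CityType`, `parseCityType()` and `toString()` come from the comment above; the function bodies and the choice to map unrecognized input to `Unknown` are assumptions, and a later commit in this PR replaces the hand-written helpers with MAKEENUM.

```cpp
#include <string>

// Illustrative only: the names follow the comment above, the bodies are assumed.
enum class CityType { Unknown, City, Town, Village, Suburb };

// Lowercase string form as stored in cities.xml; Unknown maps to "".
std::string toString(CityType type) {
    switch ( type ) {
        case CityType::City:    return "city";
        case CityType::Town:    return "town";
        case CityType::Village: return "village";
        case CityType::Suburb:  return "suburb";
        case CityType::Unknown: break;
    }
    return "";
}

// Anything unrecognized, including an absent (empty) attribute, is Unknown.
CityType parseCityType(const std::string &text) {
    if ( text == "city" )    return CityType::City;
    if ( text == "town" )    return CityType::Town;
    if ( text == "village" ) return CityType::Village;
    if ( text == "suburb" )  return CityType::Suburb;
    return CityType::Unknown;
}
```

With these two functions, `parseCityType(toString(t))` returns `t` for every enumerator, which is the round-trip property the XML format relies on.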

Comment thread libs/seiscomp/math/coord.h Outdated
@gempa-jabe
Contributor

I would be interested in how big that will grow. All the redundancy in state, type and also country cannot come for free. I haven't checked, but maybe there is already a standard out there to describe such administrative information.

@comoglu
Contributor Author

comoglu commented Apr 13, 2026

cities_enriched.xml

@comoglu
Contributor Author

comoglu commented Apr 13, 2026

15MB

@gempa-jabe
Contributor

OK, that is not too much of a difference. The schema itself would be something I would like to address. I personally am not a big fan of "stateFull" which is actually "state". Unfortunately I do not have the time right now.

@comoglu
Contributor Author

comoglu commented Apr 13, 2026

Thanks for the feedback. To address the stateFull naming concern: in the updated implementation I have already replaced both the state and stateFull flat strings with a dedicated AdminRegion struct that has two clear fields: abbr (ISO 3166-2 suffix, e.g. "NSW") and name (full subdivision name, e.g. "New South Wales"). This serializes as a child element:

<state abbr="NSW"><name>New South Wales</name></state>

@comoglu
Contributor Author

comoglu commented Apr 13, 2026

Updating cities.xml for the entire world with a single rule is indeed hard work.

@comoglu
Contributor Author

comoglu commented Apr 13, 2026

cities_enriched.xml

@comoglu
Contributor Author

comoglu commented Apr 13, 2026

I've shared the enriched cities.xml as a preview. Enriching city types globally with a single rule set is harder than I expected: GeoNames feature codes are inconsistently applied (e.g. a 1M+ population city classified as PPL → "town" based on Natural Earth data (shapefile)).

Before going further, I'd appreciate your direction:

1. Is the type field worth pursuing, or should I drop it?
2. For state: the ISO 3166-2 abbreviation approach works well for ~40 countries (e.g. US, AU, CA, IN), but many countries only have numeric codes. I would go with partial coverage if that is OK. Or would you prefer that I omit state for countries without alphabetic abbreviations?

comoglu changed the title from "Feature/cityd state type" to "[common] Extend City<T> with country, AdminRegion and type; enrich cities.xml" Apr 13, 2026
comoglu added a commit to comoglu/common that referenced this pull request Apr 29, 2026
Addresses @Jabe request in PR SeisComP#189: the location type was a free
std::string accepting any value, but the set of valid values derived
from GeoNames feature codes is fixed and well-defined. Introduce a
CityType enum class (Unknown, City, Town, Village, Suburb) together
with parseCityType()/toString() helpers. Serialization still uses the
lowercase string form in XML so cities.xml requires no changes.
comoglu added a commit to comoglu/common that referenced this pull request Apr 29, 2026
Addresses @gempa-jabe request in PR SeisComP#189: the location type was a free
std::string accepting any value, but the set of valid values derived
from GeoNames feature codes is fixed and well-defined. Introduce a
CityType enum class (Unknown, City, Town, Village, Suburb) together
with parseCityType()/toString() helpers. Serialization still uses the
lowercase string form in XML so cities.xml requires no changes.
@comoglu comoglu force-pushed the feature/cityd-state-type branch from 4a08572 to 2b7623a Compare April 29, 2026 13:30
@gempa-jabe
Contributor

Actually I do not have strong feelings about it. Maybe we can start extending the city model with the new attributes. You can then test and optimize your cities.xml locally. Updating the cities.xml can be done in a separate PR. What do you think?

@comoglu comoglu force-pushed the feature/cityd-state-type branch from 2b7623a to 66f68fc Compare April 30, 2026 09:38
Comment thread libs/seiscomp/math/coord.cpp Outdated

// >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
void AdminRegion::serialize(Core::BaseObject::Archive& ar) {
abbr = ""; name = "";
Contributor

When writing an object, this would reset your attributes. Checking isReading() could help in that regard. Why would you want to do that anyway? Is there any use case you have in mind that requires this reset?
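To make the concern in this comment concrete, here is a self-contained sketch of the failure mode. The `Archive` type below is a hypothetical stand-in that only models the read/write direction; it is not the real `Core::BaseObject::Archive` interface, and `Region`, `transfer()` and the two serialize variants are names invented for this example.

```cpp
#include <string>

// Hypothetical minimal archive holding a single stored value.
struct Archive {
    bool reading;
    std::string stored;

    bool isReading() const { return reading; }

    // Reading copies the stored value out, writing copies the value in.
    void transfer(std::string &value) {
        if ( reading ) value = stored;
        else stored = value;
    }
};

struct Region {
    std::string name;

    // Unconditional reset: the member is cleared before a write as well,
    // so an empty string is written and the in-memory value is lost.
    void serializeWithReset(Archive &ar) {
        name = "";
        ar.transfer(name);
    }

    // Reset only when reading, as suggested in the review comment.
    void serializeGuarded(Archive &ar) {
        if ( ar.isReading() ) name = "";
        ar.transfer(name);
    }
};
```

Since `std::string` members already default to empty, dropping the reset altogether, as the thread goes on to do, avoids the problem without needing the guard.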

Contributor Author

You're right, I just followed the same pattern already in the file. Since the strings default to empty, I'll remove it. Should the pre-existing resets in City::serialize() be cleaned up as well, or left as-is?

Contributor

I will remove them in the main branch as this is just wrong. Since we are just reading those classes, it might not be important but it is still wrong.

Contributor Author

Done, removed in 5f11d4a.

@comoglu
Contributor Author

comoglu commented Apr 30, 2026

> Actually I do not have strong feelings about it. Maybe we can start extending the city model with the new attributes. You can then test and optimize your cities.xml locally. Updating the cities.xml can be done in a separate PR. What do you think?

I agree, that makes sense. The model extensions (country, AdminRegion and CityType) are already in this PR. I can split the updated cities.xml into a separate PR. Do you think then this PR can be merged after you finish reviewing it? That also gives me time to test and optimise the data locally before submitting it.

For future attributes, elevation could be a natural addition to the dataset. I'll think about what else might be useful during local testing.

@gempa-jabe
Contributor

> Do you think then this PR can be merged after you finish reviewing it?

Yep, this is actually just an extension to the current model.

@comoglu
Contributor Author

comoglu commented Apr 30, 2026

> > Do you think then this PR can be merged after you finish reviewing it?
>
> Yep, this is actually just an extension to the current model.

Great, thanks.

comoglu added a commit to comoglu/main that referenced this pull request Apr 30, 2026
city.type() now returns a CityType enum wrapper (MAKEENUM) rather than
std::string following the change in SeisComP/common#189.
Comment thread libs/seiscomp/math/coord.cpp Outdated
void City<T>::serialize(Core::BaseObject::Archive& ar) {
NamedCoord<T>::serialize(ar);
- _category = ""; _countryID = ""; _country = ""; _type = "";
+ _category = ""; _countryID = ""; _country = ""; _type = CITYTYPE_UNKNOWN;
Contributor

Please remove that as well.

Contributor

I have removed all other occurrences with e3a8924.

comoglu added 6 commits April 30, 2026 22:19
Two new options for controlling the map view when an event arrives,
applied in priority order:

1. display.lonmin/max/latmin/max (already existing) — if set, the map
   is anchored to that region on every event popup, regardless of where
   the event is located. Useful for regional networks monitoring a fixed
   area (e.g. an scesv alias per region).

2. display.defaultEventRadius — configurable radius in degrees, centred
   on the event epicentre (mirrors olv.map.event.defaultRadius in scolv).
   A negative value (default) restores the original automatic behaviour
   of using the maximum arrival distance capped at 30 degrees.

If neither option is set the existing behaviour is fully preserved.
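The priority order described in this commit message could be sketched roughly as follows. `MapRegion` and `selectView` are hypothetical names, the bounding box is built naively without handling the date line or the poles, and treating a radius of exactly zero as "configured" is an assumption of this example.

```cpp
#include <algorithm>

struct MapRegion {
    double lonMin, lonMax, latMin, latMax;
};

// fixedRegion:        non-null when display.lonmin/max/latmin/max are set
// defaultEventRadius: display.defaultEventRadius in degrees, negative = unset
// maxArrivalDistance: largest arrival distance of the event in degrees
MapRegion selectView(const MapRegion *fixedRegion,
                     double defaultEventRadius,
                     double eventLat, double eventLon,
                     double maxArrivalDistance) {
    // 1. A configured region always wins, wherever the event is located.
    if ( fixedRegion != nullptr ) {
        return *fixedRegion;
    }

    // 2. Configured radius, else the automatic value capped at 30 degrees.
    double radius = defaultEventRadius >= 0.0
                  ? defaultEventRadius
                  : std::min(maxArrivalDistance, 30.0);

    return { eventLon - radius, eventLon + radius,
             eventLat - radius, eventLat + radius };
}
```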

Extends the City<T> template class with three new optional attributes
to carry richer location metadata sourced from GeoNames and Natural Earth:

  type      — location type, e.g. "city", "town", "village", "suburb"
  state     — ISO 3166-2 alphabetic subdivision abbreviation, e.g. "NSW", "CA"
               Only present for countries that use alphabetic codes.
  stateFull — full administrative region name, e.g. "New South Wales"

All three fields are serialized as XML attributes (NAMED_OBJECT),
consistent with the existing countryID and category fields. They default
to empty strings and are fully backwards-compatible — existing cities.xml
files without these attributes parse without error.

This allows applications such as scolv to display type and state
information via SCApp->cities() without requiring a separate supplementary
locations file.

- City<T>: add country() / setCountry() serialised as <country> child element
- City<T>: replace state/stateFull strings with AdminRegion { abbr, name }
  serialised as <state abbr="NSW"><name>New South Wales</name></state>
- Document fixed type values: city, town, village, suburb
- Regenerate cities.xml with country names (Natural Earth), admin region
  (ISO 3166-2 abbreviation + full name via NE admin1 shapefile), and type
  (GeoNames feature codes); 70242 country names, 71654 states, 56860 types

Replace the manual enum class and custom string conversion with
MAKEENUM as suggested by @gempa-jabe. The archive now handles
string serialization automatically.

std::string members default to empty so the explicit reset is redundant
and would corrupt data on write.

CityType and AdminRegion are optional in cities.xml; using plain
types caused the archive to mark the entire City object invalid
when those attributes/elements were absent, silently dropping all
cities.  Wrap both fields in OPT() (std::optional) so the archive
skips missing fields gracefully.

type() and adminRegion() return safe defaults when the optional is
not set.
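The accessor behavior described in this commit message can be illustrated with plain `std::optional`. This is a sketch under assumptions: the `City` class below is a simplified stand-in, the setters are invented for the example, and neither the `OPT()` wrapper nor the archive logic that skips missing fields is reproduced.

```cpp
#include <optional>
#include <string>

enum class CityType { Unknown, City, Town, Village, Suburb };

struct AdminRegion {
    std::string abbr;
    std::string name;
};

// Hypothetical City-like class with two optional members.
class City {
    public:
        void setType(CityType type) { _type = type; }
        void setAdminRegion(const AdminRegion &region) { _adminRegion = region; }

        // Unset optionals fall back to safe defaults instead of failing.
        CityType type() const {
            return _type.value_or(CityType::Unknown);
        }
        AdminRegion adminRegion() const {
            return _adminRegion.value_or(AdminRegion{});
        }

    private:
        std::optional<CityType>    _type;
        std::optional<AdminRegion> _adminRegion;
};
```

A `City` built from an entry without these fields then reports `CityType::Unknown` and an empty `AdminRegion`, so callers do not need to check for presence first.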
@comoglu comoglu force-pushed the feature/cityd-state-type branch from 5f11d4a to 6be58f4 Compare April 30, 2026 12:25
comoglu added a commit to comoglu/main that referenced this pull request Apr 30, 2026
city.type() now returns a CityType enum wrapper (MAKEENUM) rather than
std::string following the change in SeisComP/common#189.
@comoglu
Contributor Author

comoglu commented Apr 30, 2026

Rebased onto current main to incorporate upstream commit e3a89244 ([math] Remove variable resets in serialize) and 23 other upstream commits. Conflicts in coord.cpp serialize methods were resolved by dropping the reset lines (aligned with the upstream change). Functional diff is unchanged.

@gempa-stephan
Contributor

One minor change request: Could you please use American English in your source code as well as in the package documentation and description.xml files?

Replace British spellings: honoured→honored, unrecognised→unrecognized,
centre→center.
@comoglu
Contributor Author

comoglu commented May 1, 2026 via email

comoglu added a commit to comoglu/main that referenced this pull request May 14, 2026
city.type() now returns a CityType enum wrapper (MAKEENUM) rather than
std::string following the change in SeisComP/common#189.
3 participants