FastAPI service (v2.0.0) that parses and standardizes US physical addresses per USPS Publication 28.
All API routes live under /api/v1/ and return an api_version field
in every response body plus an API-Version: 1 response header.
- Parse a raw address string into labelled components using the usaddress library.
- Standardize addresses to USPS format: all-caps, official suffix abbreviations (Avenue → AVE), directional abbreviations (South → S), state abbreviations (Illinois → IL), secondary unit designators (Suite → STE), and ZIP code normalization.
- Intersection support —
"Hollywood Blvd and Vine St"→"HOLLYWOOD BLVD & VINE ST". - Country validation — requests accept an optional
countryfield (ISO 3166-1 alpha-2, default"US"). Invalid codes are rejected using thepycountrylibrary; onlyUSis currently supported. - API key authentication —
/api/v1/*endpoints require anX-API-Keyheader; docs remain open. - CORS enabled — cross-origin requests are allowed from any origin.
- Health check —
GET /api/v1/healthfor liveness probes.
| Method | Path | Auth | Description |
|---|---|---|---|
POST |
/api/v1/parse |
🔒 | Parse raw address string into components |
POST |
/api/v1/standardize |
🔒 | Standardize address to USPS Pub 28 format |
GET |
/api/v1/health |
Service health check | |
GET |
/docs |
Interactive Swagger UI | |
GET |
/redoc |
ReDoc API documentation |
All POST endpoints accept and return application/json. Address
inputs are limited to 1000 characters.
Request:
{
"address": "1600 Pennsylvania Avenue NW, Washington, DC 20500",
"country": "US"
}The country field is optional (defaults to "US").
Response:
{
"input": "1600 Pennsylvania Avenue NW, Washington, DC 20500",
"country": "US",
"components": {
"spec": "usps-pub28",
"spec_version": "unknown",
"values": {
"address_number": "1600",
"street_name": "Pennsylvania",
"street_name_post_type": "Avenue",
"street_name_post_directional": "NW",
"city": "Washington",
"state": "DC",
"zip_code": "20500"
}
},
"type": "Street Address",
"warnings": [],
"api_version": "1"
}The type field is one of "Street Address", "Intersection", or
"Ambiguous". The warnings list is populated whenever input is
silently modified during parsing — for example when parenthesized text
is stripped, when repeated address numbers are joined into a range, or
when a unit designator is recovered from a mis-tagged field. It is
empty on clean input.
The components field is a ComponentSet containing:
spec— machine identifier for the component schema (e.g."usps-pub28").spec_version— edition of the spec the values conform to.values— the labelled address component key/value pairs.
Accepts either a raw address string or pre-parsed components. When
both are provided, components takes precedence and address is
ignored.
Request (string):
{
"address": "350 Fifth Ave Suite 3300, New York, NY 10118",
"country": "US"
}Request (components):
{
"components": {
"address_number": "350",
"street_name": "Fifth",
"street_name_post_type": "Ave",
"occupancy_type": "Suite",
"occupancy_identifier": "3300",
"city": "New York",
"state": "NY",
"zip_code": "10118"
},
"country": "US"
}Response:
{
"address_line_1": "350 FIFTH AVE",
"address_line_2": "STE 3300",
"city": "NEW YORK",
"region": "NY",
"postal_code": "10118",
"country": "US",
"standardized": "350 FIFTH AVE STE 3300 NEW YORK, NY 10118",
"components": {
"spec": "usps-pub28",
"spec_version": "unknown",
"values": {
"address_number": "350",
"street_name": "FIFTH",
"street_name_post_type": "AVE",
"occupancy_type": "STE",
"occupancy_identifier": "3300",
"city": "NEW YORK",
"state": "NY",
"zip_code": "10118"
}
},
"warnings": [],
"api_version": "1"
}The response uses geography-neutral field names: region (not state)
and postal_code (not zip_code). The standardized field uses
two-space separators between address lines, matching the USPS
single-line format convention.
Response:
{"status": "ok", "api_version": "1"}No authentication required.
All /api/v1/* endpoints (except /api/v1/health) require an
X-API-Key header. Set the expected key via the API_KEY environment
variable:
export API_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')Requests without a valid key receive 401 or 403. Swagger (/docs)
and ReDoc (/redoc) remain open.
curl -X POST http://localhost:8000/api/v1/standardize \
-H 'Content-Type: application/json' \
-H 'X-API-Key: YOUR_KEY' \
-d '{"address": "350 Fifth Ave, New York, NY 10118"}'Requires Python ≥ 3.12. Dependencies are managed with uv.
uv sync # install dependencies into .venv/
export API_KEY="your-secret-key"
uv run uvicorn address_validator.main:app --host 0.0.0.0 --port 8000A systemd unit file (infra/address-validator.service) is included for
persistent deployment. The API key is stored in
/etc/address-validator/.env and loaded via EnvironmentFile=.
src/address_validator/
main.py # FastAPI app, lifespan, exception handlers
auth.py # API key authentication dependency
models.py # Pydantic request/response models (API contract)
logging_filter.py # RequestIdFilter — injects request_id into logs
middleware/
api_version.py # Appends API-Version header on /api/v1/ and /api/v2/ responses
audit.py # Records every API request to audit_log (fire-and-forget)
request_id.py # ULID generation, X-Request-ID header
core/
address_format.py # Canonical single-line address string builder
countries.py # SUPPORTED_COUNTRIES, check_country(), check_country_v2()
errors.py # APIError, api_error_response()
routers/
deps.py # Shared FastAPI dependencies (registry, libpostal client)
v1/ # USPS Pub 28 surface (parse, standardize, validate, countries, health)
v2/ # ISO 19160-4 surface; component_profile query param
admin/ # Admin dashboard (Jinja2 + HTMX, exe.dev auth)
queries/ # SQLAlchemy Core query helpers (audit, candidates, batches, dashboard)
services/
parser.py # usaddress (US) + libpostal (CA) dispatcher
standardizer.py # USPS Pub 28 standardization logic
country_format.py # i18naddress → CountryFormatResponse
audit.py # Audit ContextVars + write_audit_row
training_candidates.py # Training candidate ContextVars + write_training_candidate
training_batches.py # Batch lifecycle state machine + CRUD
libpostal_client.py # Async httpx client for libpostal sidecar (port 4400)
street_splitter.py # Bilingual CA street component splitter
spec.py # ISO 19160-4 spec identifiers
component_profiles.py # ISO 19160-4 ↔ USPS Pub 28 key translation
validation/ # Provider abstraction (USPS, Google, chain, cache, pipeline)
db/
engine.py # AsyncEngine singleton + Alembic migration runner
tables.py # SQLAlchemy Core Table definitions
usps_data/ # Pub 28 lookup tables (suffixes, directionals, states, units)
canada_post_data/ # Canada Post lookup tables (provinces, suffixes, directionals)
alembic/ # Database migrations
docs/ # Architecture docs, USPS/ISO research, design plans
infra/ # systemd units, timer files, and archive_audit.py for VM deployment
scripts/
build/ # Tailwind CLI build and pre-commit hook
db/ # DB maintenance and one-time migration scripts
model/ # Training pipeline scripts (identify → contribute)
tests/
unit/ # Unit tests
integration/ # Integration tests (HTTP endpoints)
js/ # Vitest + jsdom tests for admin JS
training/ # Per-batch training artifacts and upstream usaddress data
skills/ # local skill overrides and symlinks into skills-vendor/
skills-vendor/ # vendored skill repo submodules
pyproject.toml # Project metadata, dependencies, tool config
- All output uppercased
- Parenthesized text removed (USPS Pub 28 §354 — not valid in standardized addresses; typically wayfinding notes)
- Trailing commas, semicolons, and stray punctuation stripped
- Street suffixes abbreviated (USPS Pub 28 Appendix C)
- Directionals abbreviated (N, S, E, W, NE, NW, SE, SW)
- State names converted to two-letter abbreviations
- Secondary unit designators abbreviated (Suite → STE, Apartment → APT, Building/Bldg/Bld → BLDG, etc.)
- Unit identifiers without a designator default to
# - Designator words folded into identifiers are extracted
(e.g.
NO. 16→# 16) - Both occupancy and subaddress designators preserved when present
- Dual address numbers joined with hyphen (
1804 & 1810→1804-1810) - Periods removed from all components
- ZIP codes normalized to 5-digit or 5+4 format
- Unit designators mis-tagged as city by the parser are recovered
(e.g.
BASEMENT, FREELAND→ line 2BSMT, cityFREELAND) - Non-address wayfinding words (e.g.
YARD) dropped from city - Line 2 ordering: larger container (BLDG) before specific unit (STE)
- Intersections formatted as
STREET1 & STREET2
uv run pytest # full suite with coverage
uv run pytest --no-cov # fast, no coverage
uv run pytest tests/unit/services/test_parser.py # single file
uv run ruff check . # lint
uv run ruff format . # formatCoverage floor: 80% line + branch (enforced by --cov-fail-under=80).