chchchanges

The missing API for OSM changeset metadata queries.

The OpenStreetMap API lets you query changeset metadata, but its usefulness is limited by a maximum of 100 results per request.

This API lets you query by entire days, hours, minutes, or ID ranges, and supports additional output formats like jsonl for efficient streaming processing.

The code is open source (ISC); you can find it on Codeberg.

About Changesets

Each time an OpenStreetMap contributor uploads their edits, a changeset gets created. A changeset has two components: metadata and map changes. The metadata part, which is what this service provides, tells you useful things about the changeset: when it was opened and closed, by whom, which editor was used, and more about the context of the changes made to the map. The map changes are the actual edits the contributor made to OSM: any combination of adding new features, changing existing ones, or deleting features. There are some links below if you want to learn more.

API Endpoints

GET /health - Health check

/health

GET /changesets/day/{YYYYMMDD} - All changesets on a date

# XML format (default)
/changesets/day/20240115

# JSON format
/changesets/day/20240115.json

# JSONL format (one changeset per line, useful for large results)
/changesets/day/20240115.jsonl
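Because JSONL carries one complete JSON object per line, a client can process arbitrarily large responses without loading them whole. A minimal sketch of consuming such a response line by line (the field names in the sample are illustrative, not the service's exact schema):

```python
import json

def iter_changesets(jsonl_lines):
    """Yield one changeset dict per non-empty JSONL line."""
    for line in jsonl_lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Two changesets as they might appear in a .jsonl response body.
sample = [
    '{"id": 100001, "user": "alice"}',
    '{"id": 100002, "user": "bob"}',
]
changesets = list(iter_changesets(sample))
```

In practice you would pass the lines of a streamed HTTP response instead of a list, keeping memory use constant regardless of result size.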

GET /changesets/day/{YYYYMMDD}/hour/{HH} - All changesets in a specific hour

/changesets/day/20240115/hour/23.json

GET /changesets/day/{YYYYMMDD}/hour/{HH}/minute/{MM} - All changesets in a specific minute

/changesets/day/20240115/hour/23/minute/59.json

GET /changesets/id/from/{start_id}/to/{end_id} - All changesets in an ID range

# XML format (default)
/changesets/id/from/100000/to/100500

# JSON format
/changesets/id/from/100000/to/100500.json

# With filters
/changesets/id/from/100000/to/200000.json?user=alice&limit=100

Note: ID ranges are limited to 1,000,000 IDs maximum per query. Results are ordered by changeset ID ascending.
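To cover an ID range wider than the 1,000,000-ID limit, split it into compliant sub-ranges client-side. A small helper sketching that (the URL shape matches the endpoint above; the chunking itself is just arithmetic):

```python
MAX_IDS_PER_QUERY = 1_000_000

def id_range_chunks(start_id, end_id, chunk=MAX_IDS_PER_QUERY):
    """Split the inclusive range [start_id, end_id] into sub-ranges of at most `chunk` IDs."""
    ranges = []
    lo = start_id
    while lo <= end_id:
        hi = min(lo + chunk - 1, end_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

urls = [f"/changesets/id/from/{a}/to/{b}.json" for a, b in id_range_chunks(1, 2_500_000)]
```

Since results within each sub-range are ordered by ID ascending, concatenating the responses in order yields a globally ordered result.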

GET /changesets/date/from/{start_date}/to/{end_date} - All changesets in a date range

# XML format (default)
/changesets/date/from/20240115/to/20240116

# JSON format
/changesets/date/from/20240115/to/20240116.json

# JSONL format
/changesets/date/from/20240115/to/20240116.jsonl

Note: Date ranges are limited to 1 day maximum (i.e., consecutive dates only). For larger ranges, make multiple requests.
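Covering a longer period therefore means one request per consecutive day pair. A sketch of generating those request URLs with the standard library:

```python
from datetime import date, timedelta

def date_range_urls(start, end):
    """One URL per consecutive day pair, respecting the 1-day range limit."""
    urls = []
    d = start
    while d < end:
        nxt = d + timedelta(days=1)
        urls.append(f"/changesets/date/from/{d:%Y%m%d}/to/{nxt:%Y%m%d}.json")
        d = nxt
    return urls

urls = date_range_urls(date(2024, 1, 15), date(2024, 1, 18))
```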

GET /meta/state - Dataset statistics and replication status

/meta/state

Returns comprehensive dataset information, including replication status fields such as sequences_behind and seconds_since_update.

Use this endpoint to monitor data freshness and detect replication issues. If sequences_behind is consistently high or seconds_since_update is large, the replication worker may be down.
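A monitoring check built on those two fields might look like the following sketch. The field names come from the description above; the thresholds are arbitrary examples, not recommendations from the service:

```python
def replication_is_healthy(state, max_behind=5, max_stale_seconds=300):
    """Flag replication problems from a parsed /meta/state response.

    Thresholds are illustrative defaults; tune them for your own alerting.
    """
    return (state.get("sequences_behind", 0) <= max_behind
            and state.get("seconds_since_update", 0) <= max_stale_seconds)

ok = replication_is_healthy({"sequences_behind": 2, "seconds_since_update": 90})
bad = replication_is_healthy({"sequences_behind": 120, "seconds_since_update": 7200})
```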

Format Options

All endpoints support multiple output formats: XML (default), JSON, and JSONL.

Specify format using file extension (e.g., .json) or query parameter (e.g., ?format=json)

Query Parameters

Batch endpoints (day/hour/minute/date range/ID range) support filtering and pagination:

# Example: Get first 50 changesets by user 'alice' on a specific day
/changesets/day/20240115.json?user=alice&limit=50

# Example: Find long-running changesets (open > 1 hour)
/changesets/day/20240115.json?min_duration_seconds=3600

# Example: Find large mapping projects (area > 5 km²) sorted by size
/changesets/day/20240115.json?min_bbox_area_km2=5.0&order_by=bbox_area_km2&order_direction=DESC

Remember that OSM usernames are case-sensitive, so alice is probably not the same mapper as Alice.
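Building filtered URLs by hand invites escaping mistakes (usernames can contain spaces and non-ASCII characters). A small sketch using the standard library's urlencode, with the parameter names taken from the examples above:

```python
from urllib.parse import urlencode

def day_url(yyyymmdd, fmt="json", **filters):
    """Build a day-endpoint URL with optional, properly escaped filter parameters."""
    url = f"/changesets/day/{yyyymmdd}.{fmt}"
    if filters:
        url += "?" + urlencode(filters)
    return url

url = day_url("20240115", user="alice", limit=50)
```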

Data Schema

Each changeset includes its ID, open and close timestamps, the user who created it, its bounding box, and its tags (editor used, comment, and so on).

Note: In XML format, changeset comments/descriptions appear as a comment tag. In JSON/JSONL, the same information is in tags.comment.

Technical details

The system consists of three components: a converter for bulk importing planet dump files, a server that serves the API, and a worker that continuously fetches and processes minutely replication data from OpenStreetMap.

Database-less Architecture

The API has no traditional database backend. Data is stored as hive-partitioned Parquet files (year=YYYY/month=MM/*.parquet), queried directly using DuckDB's embedded SQL engine. Some benefits (not all of which we actually use today) are: no database server to operate, columnar compression, partition pruning for date-based queries, and backups that are plain file copies.
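The partition layout makes date queries cheap: a query for one day only has to touch one month's directory. DuckDB can read such layouts directly via glob patterns with hive partitioning enabled; the sketch below just derives the pattern for a given day (the path shape matches the layout stated above):

```python
from datetime import date

def partition_glob(d):
    """Glob pattern for the hive partition holding a given day's changesets."""
    return f"year={d.year}/month={d.month:02d}/*.parquet"

pattern = partition_glob(date(2024, 1, 15))
```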

Lambda Architecture

Because Parquet files are immutable (no in-place updates or deletes), the system uses a lambda architecture with two layers: a base layer of bulk-imported planet data and a delta layer of replication updates.

The query engine reads both layers simultaneously and deduplicates using the replication_sequence field (higher = newer, NULL = base). This provides up-to-the-minute freshness without rebuilding multi-GB base files.
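The deduplication rule is small enough to sketch in full. This is an illustrative re-implementation of the rule as stated (higher replication_sequence wins, NULL marks the base layer), not the service's actual SQL; the row fields besides replication_sequence are assumptions:

```python
def deduplicate(rows):
    """Keep one row per changeset ID, preferring the highest replication_sequence."""
    best = {}
    for row in rows:
        key = row["id"]
        current = best.get(key)
        if current is None:
            best[key] = row
            continue
        seq = row.get("replication_sequence")
        cur_seq = current.get("replication_sequence")
        # None (base layer) loses to any numbered replication row;
        # among numbered rows, the higher sequence is newer.
        if cur_seq is None or (seq is not None and seq > cur_seq):
            best[key] = row
    return list(best.values())

rows = [
    {"id": 1, "replication_sequence": None, "user": "alice"},
    {"id": 1, "replication_sequence": 42, "user": "alice_renamed"},
    {"id": 2, "replication_sequence": None, "user": "bob"},
]
latest = deduplicate(rows)
```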

Smart Batching

The replication worker doesn't write one file per minute; that would create millions of tiny files. Instead, it aggregates minutely updates and flushes them into hourly Parquet files.

This balances write efficiency (fewer files, better compression) with query freshness (max 1 hour lag during catch-up).
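The grouping step can be sketched as bucketing minutely timestamps by hour. This shows only the batching key, not the worker's actual flush logic; the key format is an assumption:

```python
from datetime import datetime

def hourly_bucket(ts):
    """Hour-granularity key used to group minutely updates into one output file."""
    return ts.strftime("%Y%m%dT%H")

# Three minutes of one hour plus the first minute of the next hour
# land in two separate buckets, i.e. two output files.
minutes = [datetime(2024, 1, 15, 23, m) for m in (57, 58, 59)] + [datetime(2024, 1, 16, 0, 0)]
buckets = {}
for ts in minutes:
    buckets.setdefault(hourly_bucket(ts), []).append(ts)
```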

Automatic Updates

The API server periodically scans for new partition files (every 30 seconds). When new Parquet files appear in the data directory, they're automatically discovered and become available for queries.
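Discovery of this kind amounts to comparing a directory glob against the set of already-known files. A self-contained sketch, using a throwaway directory in place of the real data directory (the scan logic is illustrative, not the server's code):

```python
import glob
import os
import tempfile

def discover_partitions(data_dir, known):
    """Return newly appeared Parquet files not yet in `known`."""
    found = set(glob.glob(os.path.join(data_dir, "year=*", "month=*", "*.parquet")))
    return sorted(found - known)

# Demo: create one partition file, then scan for it.
with tempfile.TemporaryDirectory() as root:
    part = os.path.join(root, "year=2024", "month=01")
    os.makedirs(part)
    open(os.path.join(part, "000.parquet"), "w").close()
    new_files = discover_partitions(root, known=set())
```

Running such a scan on a timer and unioning the result into the known set is enough to pick up new partitions without restarting the server.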

Trade-offs and quirks

See also


Data from OpenStreetMap © OpenStreetMap contributors, available under the Open Database License.