chchchanges

The missing API for OSM changeset metadata queries.

The OpenStreetMap API lets you query changeset metadata, but its usefulness is limited by a maximum of 100 results per request.

This API lets you query by entire days, hours, minutes, or ID ranges, and supports additional output formats like jsonl for efficient streaming processing.

The code is open source (ISC); you can find it on Codeberg.

About Changesets

Each time an OpenStreetMap contributor uploads their edits, a changeset gets created. A changeset has two components: metadata and map changes. The metadata part, which is what this service provides, tells you useful things about the changeset: when it was opened and closed, by whom, which editor was used, and more about the context of the changes made to the map. The map changes are the actual edits the contributor made to OSM: any combination of adding new features, changing existing ones, or deleting features. There are some links below if you want to learn more.

API Endpoints

GET /health - Health check

/health

GET /changesets/day/{YYYYMMDD} - All changesets on a date

# XML format (default)
/changesets/day/20240115

# JSON format
/changesets/day/20240115.json

# JSONL format (one changeset per line, useful for large results)
/changesets/day/20240115.jsonl
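Because JSONL carries one complete JSON object per line, a client can process arbitrarily large responses without loading them whole. A minimal sketch of consuming such a response line by line (the field names in the sample are illustrative, not the service's exact schema):

```python
import json

def iter_changesets(jsonl_lines):
    """Yield one changeset dict per non-empty JSONL line."""
    for line in jsonl_lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Two changesets as they might appear in a .jsonl response body.
sample = [
    '{"id": 100001, "user": "alice"}',
    '{"id": 100002, "user": "bob"}',
]
changesets = list(iter_changesets(sample))
```

In practice you would pass the lines of a streamed HTTP response instead of a list, keeping memory use constant regardless of result size.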

GET /changesets/day/{YYYYMMDD}/hour/{HH} - All changesets in a specific hour

/changesets/day/20240115/hour/23.json

GET /changesets/day/{YYYYMMDD}/hour/{HH}/minute/{MM} - All changesets in a specific minute

/changesets/day/20240115/hour/23/minute/59.json

GET /changesets/id/from/{start_id}/to/{end_id} - All changesets in an ID range

# XML format (default)
/changesets/id/from/100000/to/100500

# JSON format
/changesets/id/from/100000/to/100500.json

# With filters
/changesets/id/from/100000/to/200000.json?user=alice&limit=100

Note: ID ranges are limited to 1,000,000 IDs maximum per query. Results are ordered by changeset ID ascending.
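To cover an ID range wider than the 1,000,000-ID limit, split it into compliant sub-ranges client-side. A small helper sketching that (the URL shape matches the endpoint above; the chunking itself is just arithmetic):

```python
MAX_IDS_PER_QUERY = 1_000_000

def id_range_chunks(start_id, end_id, chunk=MAX_IDS_PER_QUERY):
    """Split the inclusive range [start_id, end_id] into sub-ranges of at most `chunk` IDs."""
    ranges = []
    lo = start_id
    while lo <= end_id:
        hi = min(lo + chunk - 1, end_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

urls = [f"/changesets/id/from/{a}/to/{b}.json" for a, b in id_range_chunks(1, 2_500_000)]
```

Since results within each sub-range are ordered by ID ascending, concatenating the responses in order yields a globally ordered result.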

GET /changesets/date/from/{start_date}/to/{end_date} - All changesets in a date range

# XML format (default)
/changesets/date/from/20240115/to/20240116

# JSON format
/changesets/date/from/20240115/to/20240116.json

# JSONL format
/changesets/date/from/20240115/to/20240116.jsonl

Note: Date ranges are limited to 1 day maximum (i.e., consecutive dates only). For larger ranges, make multiple requests.
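Covering a longer period therefore means one request per consecutive day pair. A sketch of generating those request URLs with the standard library:

```python
from datetime import date, timedelta

def date_range_urls(start, end):
    """One URL per consecutive day pair, respecting the 1-day range limit."""
    urls = []
    d = start
    while d < end:
        nxt = d + timedelta(days=1)
        urls.append(f"/changesets/date/from/{d:%Y%m%d}/to/{nxt:%Y%m%d}.json")
        d = nxt
    return urls

urls = date_range_urls(date(2024, 1, 15), date(2024, 1, 18))
```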

GET /meta/state - Dataset statistics and replication status

/meta/state

Returns comprehensive dataset information, including replication status fields such as sequences_behind and seconds_since_update.

Use this endpoint to monitor data freshness and detect replication issues. If sequences_behind is consistently high or seconds_since_update is large, the replication worker may be down.
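A monitoring check built on those two fields might look like the following sketch. The field names come from the description above; the thresholds are arbitrary examples, not recommendations from the service:

```python
def replication_is_healthy(state, max_behind=5, max_stale_seconds=300):
    """Flag replication problems from a parsed /meta/state response.

    Thresholds are illustrative defaults; tune them for your own alerting.
    """
    return (state.get("sequences_behind", 0) <= max_behind
            and state.get("seconds_since_update", 0) <= max_stale_seconds)

ok = replication_is_healthy({"sequences_behind": 2, "seconds_since_update": 90})
bad = replication_is_healthy({"sequences_behind": 120, "seconds_since_update": 7200})
```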

Format Options

All endpoints support multiple output formats: XML (default), JSON, and JSONL.

Specify format using file extension (e.g., .json) or query parameter (e.g., ?format=json)

Query Parameters

Batch endpoints (day/hour/minute/date range/ID range) support filtering and pagination:

# Example: Get first 50 changesets by user 'alice' on a specific day
/changesets/day/20240115.json?user=alice&limit=50

# Example: Find long-running changesets (open > 1 hour)
/changesets/day/20240115.json?min_duration_seconds=3600

# Example: Find large mapping projects (area > 5 km²) sorted by size
/changesets/day/20240115.json?min_bbox_area_km2=5.0&order_by=bbox_area_km2&order_direction=DESC

Remember that OSM usernames are case-sensitive, so alice is probably not the same mapper as Alice.
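Building filtered URLs by hand invites escaping mistakes (usernames can contain spaces and non-ASCII characters). A small sketch using the standard library's urlencode, with the parameter names taken from the examples above:

```python
from urllib.parse import urlencode

def day_url(yyyymmdd, fmt="json", **filters):
    """Build a day-endpoint URL with optional, properly escaped filter parameters."""
    url = f"/changesets/day/{yyyymmdd}.{fmt}"
    if filters:
        url += "?" + urlencode(filters)
    return url

url = day_url("20240115", user="alice", limit=50)
```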

Data Schema

Each changeset includes its ID, open and close timestamps, the user who created it, its bounding box, and its tags (editor used, comment, and so on).

Note: In XML format, changeset comments/descriptions appear as a comment tag. In JSON/JSONL, the same information is in tags.comment.

Technical details

The system consists of three components: a converter for bulk importing planet dump files, a server that serves the API, and a worker that continuously fetches and processes minutely replication data from OpenStreetMap.

Database-less Architecture

The API has no traditional database backend. Data is stored as hive-partitioned Parquet files (year=YYYY/month=MM/*.parquet), queried directly using DuckDB's embedded SQL engine. Some benefits (not all of which we actually use today) are: no database server to operate, columnar compression, partition pruning for date-based queries, and backups that are plain file copies.
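The partition layout makes date queries cheap: a query for one day only has to touch one month's directory. DuckDB can read such layouts directly via glob patterns with hive partitioning enabled; the sketch below just derives the pattern for a given day (the path shape matches the layout stated above):

```python
from datetime import date

def partition_glob(d):
    """Glob pattern for the hive partition holding a given day's changesets."""
    return f"year={d.year}/month={d.month:02d}/*.parquet"

pattern = partition_glob(date(2024, 1, 15))
```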

Lambda Architecture

Because Parquet files are immutable (no in-place updates or deletes), the system uses a lambda architecture with two layers: a base layer of bulk-imported planet data and a delta layer of replication updates.

The query engine reads both layers simultaneously and deduplicates using the replication_sequence field (higher = newer, NULL = base). This provides up-to-the-minute freshness without rebuilding multi-GB base files.
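The deduplication rule is small enough to sketch in full. This is an illustrative re-implementation of the rule as stated (higher replication_sequence wins, NULL marks the base layer), not the service's actual SQL; the row fields besides replication_sequence are assumptions:

```python
def deduplicate(rows):
    """Keep one row per changeset ID, preferring the highest replication_sequence."""
    best = {}
    for row in rows:
        key = row["id"]
        current = best.get(key)
        if current is None:
            best[key] = row
            continue
        seq = row.get("replication_sequence")
        cur_seq = current.get("replication_sequence")
        # None (base layer) loses to any numbered replication row;
        # among numbered rows, the higher sequence is newer.
        if cur_seq is None or (seq is not None and seq > cur_seq):
            best[key] = row
    return list(best.values())

rows = [
    {"id": 1, "replication_sequence": None, "user": "alice"},
    {"id": 1, "replication_sequence": 42, "user": "alice_renamed"},
    {"id": 2, "replication_sequence": None, "user": "bob"},
]
latest = deduplicate(rows)
```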

Smart Batching

The replication worker doesn't write one file per minute; that would create millions of tiny files. Instead, it aggregates minutely updates and flushes them into hourly Parquet files.

This balances write efficiency (fewer files, better compression) with query freshness (max 1 hour lag during catch-up).
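The grouping step can be sketched as bucketing minutely timestamps by hour. This shows only the batching key, not the worker's actual flush logic; the key format is an assumption:

```python
from datetime import datetime

def hourly_bucket(ts):
    """Hour-granularity key used to group minutely updates into one output file."""
    return ts.strftime("%Y%m%dT%H")

# Three minutes of one hour plus the first minute of the next hour
# land in two separate buckets, i.e. two output files.
minutes = [datetime(2024, 1, 15, 23, m) for m in (57, 58, 59)] + [datetime(2024, 1, 16, 0, 0)]
buckets = {}
for ts in minutes:
    buckets.setdefault(hourly_bucket(ts), []).append(ts)
```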

Automatic Updates

The API server periodically scans for new partition files (every 30 seconds). When new Parquet files appear in the data directory, they're automatically discovered and become available for queries.
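Discovery of this kind amounts to comparing a directory glob against the set of already-known files. A self-contained sketch, using a throwaway directory in place of the real data directory (the scan logic is illustrative, not the server's code):

```python
import glob
import os
import tempfile

def discover_partitions(data_dir, known):
    """Return newly appeared Parquet files not yet in `known`."""
    found = set(glob.glob(os.path.join(data_dir, "year=*", "month=*", "*.parquet")))
    return sorted(found - known)

# Demo: create one partition file, then scan for it.
with tempfile.TemporaryDirectory() as root:
    part = os.path.join(root, "year=2024", "month=01")
    os.makedirs(part)
    open(os.path.join(part, "000.parquet"), "w").close()
    new_files = discover_partitions(root, known=set())
```

Running such a scan on a timer and unioning the result into the known set is enough to pick up new partitions without restarting the server.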

Trade-offs and quirks

See also


Data from OpenStreetMap © OpenStreetMap contributors, available under the Open Database License.