Automating Nightly GeoJSON Rebuilds with GitHub Actions

Part of the Scheduled Map Rebuild Workflows guide.

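Operative rule: commit the GeoJSON only when git diff --staged --quiet exits non-zero, which means the staged file differs from HEAD. On nights when nothing changed, git commit would fail with "nothing to commit" and mark the step as failed, and skipping the commit and push also keeps history meaningful and avoids unnecessary downstream cache invalidation.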

How It Works

GitHub Actions evaluates cron expressions in UTC and queues a runner to execute your workflow at the scheduled time. Scheduled runs use the workflow file on the default branch and can start late when the service is under heavy load. The runner checks out your repository, runs your transformation script, and, if the output differs from the last committed version, pushes a new commit back via the default GITHUB_TOKEN. The entire pipeline is serverless: no persistent VM, no cron daemon, no external scheduler service.
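Because the schedule is expressed in UTC, it helps to work out when the nominal trigger falls in local time. The sketch below is one way to compute the next 02:00 UTC trigger; next_run_utc is a hypothetical helper, and the fixed UTC-5 offset merely stands in for a local zone (it does not model daylight saving time). Actual start times may be later than the nominal time.

```python
from datetime import datetime, timedelta, timezone


def next_run_utc(now: datetime, hour: int = 2, minute: int = 0) -> datetime:
    """Return the next daily trigger at hour:minute UTC strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now_utc:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


# Assumed example zone: a fixed UTC-5 offset (no daylight saving handling).
local = timezone(timedelta(hours=-5))
now = datetime(2024, 3, 1, 22, 30, tzinfo=local)  # 03:30 UTC on 2 March
run = next_run_utc(now)
print("next run (UTC):  ", run.isoformat())
print("next run (local):", run.astimezone(local).isoformat())
```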

The critical mechanism is the idempotency guard. Raw spatial sources sometimes return identical data on consecutive nights, and in that case a naive git add && git commit exits non-zero with "nothing to commit", so the run is reported as failed even though nothing went wrong. (Forcing the commit with --allow-empty would instead clutter history with commits that carry no information.) Checking git diff --staged --quiet before committing makes the workflow safe to run as often as needed. This pairs naturally with cache invalidation strategies: your CDN or tile cache only needs to be busted when the committed file actually changes.
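The guard hinges on the exit status of git diff --staged --quiet: 0 when the index matches HEAD, 1 when staged changes exist, and other values on error. A minimal sketch of that decision follows; should_commit and staged_changes_exist are hypothetical helper names, and the second function assumes git is available on PATH.

```python
import subprocess


def should_commit(exit_code: int) -> bool:
    """Interpret the exit status of `git diff --staged --quiet`."""
    if exit_code == 0:
        return False  # index matches HEAD: nothing to commit
    if exit_code == 1:
        return True   # staged changes exist
    # Anything else (for example 128 outside a repository) is a real error.
    raise RuntimeError(f"git diff failed with exit code {exit_code}")


def staged_changes_exist(repo_dir: str = ".") -> bool:
    """Run the guard inside repo_dir and report whether a commit is needed."""
    result = subprocess.run(
        ["git", "diff", "--staged", "--quiet"],
        cwd=repo_dir,
        check=False,  # a non-zero status is an answer here, not a failure
    )
    return should_commit(result.returncode)
```

Treating unexpected exit codes as errors, rather than as "changed", keeps a broken checkout from producing a spurious commit attempt.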

The diagram below shows the full execution path from cron trigger to a dashboard consuming the committed file.

Figure: Nightly GeoJSON rebuild workflow. Flowchart with six stages: cron trigger (02:00 UTC), fetch & transform (source API → GeoJSON), validate (geojsonhint / ajv), diff check (git diff --staged), commit & push (GITHUB_TOKEN), and CDN cache refresh. When the diff check detects no change, the run skips straight to the end.

Production-Ready Workflow Configuration

Place the following YAML in .github/workflows/nightly-geojson.yml. It triggers at 02:00 UTC daily, accepts a manual dispatch for debugging, installs dependencies from a lockfile for reproducible builds, runs the transformation, validates output, and pushes only when the file has changed.

name: Nightly GeoJSON Rebuild
on:
  schedule:
    - cron: '0 2 * * *'   # 02:00 UTC every day
  workflow_dispatch:        # manual trigger for debugging
permissions:
  contents: write
jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: true
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Fetch & Transform
        env:
          SOURCE_API_URL: ${{ secrets.SOURCE_API_URL }}
          API_TOKEN: ${{ secrets.API_TOKEN }}
        run: node scripts/rebuild-geojson.js
      - name: Validate GeoJSON
        run: npx @mapbox/geojsonhint data/output.geojson
      - name: Commit & Push
        run: |
          git config user.name  "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add data/output.geojson
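          # Commit and push only when the staged file differs from HEAD
          if git diff --staged --quiet; then
            echo "No changes in data/output.geojson; skipping commit."
          else
            git commit -m "chore: nightly GeoJSON rebuild [skip ci]"
            git push
          fi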

Key points in this configuration:

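  • persist-credentials: true (the checkout default, spelled out here for clarity) leaves the GITHUB_TOKEN in the local git configuration so the later git push can authenticate. Write access itself comes from permissions: contents: write.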
  • npm ci installs from package-lock.json, ensuring the same @turf/turf version runs in CI as locally.
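  • [skip ci] in the commit message is a safeguard against other pipelines reacting to the data commit. This workflow has no push trigger, and pushes made with the default GITHUB_TOKEN do not start new workflow runs, so the marker mainly matters if you later push with a personal access token or have external CI watching the repository.
  • Secrets are passed through env rather than interpolated into the run command line, which keeps their values out of the shell command text. GitHub masks registered secret values in logs either way.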

Transformation Script

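The script fetches, maps, and writes the FeatureCollection. Coordinate truncation to five decimal places can reduce file size by 30–50%, depending on how many decimal places the source coordinates carry, with no perceptible impact on web map rendering. That directly benefits dashboard load times and CDN cache efficiency. The script uses ES module import syntax, so package.json must set "type": "module" (or the file must use the .mjs extension).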

// scripts/rebuild-geojson.js
import fs from 'fs/promises';
import * as turf from '@turf/turf';
import { fileURLToPath } from 'url';
import { dirname, join } from 'path';

const __dirname = dirname(fileURLToPath(import.meta.url));
const OUT_PATH   = join(__dirname, '../data/output.geojson');

async function rebuild() {
  // 1. Fetch source data
  const res = await fetch(process.env.SOURCE_API_URL, {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` }
  });
  if (!res.ok) throw new Error(`Source API returned ${res.status}`);
  const raw = await res.json();

  // 2. Build FeatureCollection (RFC 7946: [lng, lat] coordinate order)
  const features = raw.items.map(item => ({
    type: 'Feature',
    properties: {
      id:       item.id,
      category: item.category
      // No per-run timestamp here: new Date() would change every feature
      // on every run and defeat the diff check in the workflow.
    },
    geometry: {
      type: 'Point',
      coordinates: [item.lng, item.lat]   // longitude first
    }
  }));
  const collection = { type: 'FeatureCollection', features };

  // 3. Truncate to 5 decimal places (~1 m precision at the equator)
  const optimized = turf.truncate(collection, { precision: 5 });

  // 4. Write — fs.writeFile truncates the file before writing
  await fs.writeFile(OUT_PATH, JSON.stringify(optimized, null, 2), 'utf8');
  console.log(`Wrote ${features.length} features to ${OUT_PATH}`);
}

rebuild().catch(err => {
  console.error('GeoJSON rebuild failed:', err.message);
  process.exit(1);   // non-zero exit halts the workflow before commit
});

The catch block calls process.exit(1), which causes the workflow step to fail and prevents the commit step from running — no corrupt or empty file ever reaches the repository.
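To get a feel for what the truncation step does to file size, the effect can be reproduced without any geospatial library. The following is a stdlib-only sketch, not the turf.truncate implementation; round_coords and truncate_collection are hypothetical names, and the sample feature is made-up data. It assumes every feature has a geometry with a coordinates array (no GeometryCollection, no null geometry).

```python
import json


def round_coords(node, places=5):
    """Recursively round every number in a nested coordinate array."""
    if isinstance(node, (int, float)):
        return round(node, places)
    return [round_coords(child, places) for child in node]


def truncate_collection(collection, places=5):
    """Return a copy of a FeatureCollection with rounded coordinates."""
    features = []
    for feature in collection["features"]:
        geometry = dict(feature["geometry"])  # shallow copy; input stays untouched
        geometry["coordinates"] = round_coords(geometry["coordinates"], places)
        features.append({**feature, "geometry": geometry})
    return {**collection, "features": features}


# Made-up sample with more decimal places than a web map can use.
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"id": 1},
        "geometry": {
            "type": "Point",
            "coordinates": [-122.41941550123456, 37.77492950654321],
        },
    }],
}

before = len(json.dumps(sample))
after = len(json.dumps(truncate_collection(sample)))
print(f"serialized size: {before} -> {after} characters")
```

The saving grows with the share of the file taken up by coordinates, so polygon-heavy datasets benefit more than sparse point layers with large property objects.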

Alternative: Python Variant

Teams using GeoPandas in the same repository can replace the Node.js script with the following. It uses geopandas 0.14+ and writes output.geojson via the built-in to_file driver.

# scripts/rebuild_geojson.py
import os
import sys
import requests
import geopandas as gpd
from shapely.geometry import Point

SOURCE_URL = os.environ["SOURCE_API_URL"]
API_TOKEN  = os.environ["API_TOKEN"]
OUT_PATH   = "data/output.geojson"

def rebuild() -> None:
    resp = requests.get(
        SOURCE_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["items"]

    gdf = gpd.GeoDataFrame(
        [{"id": i["id"], "category": i["category"]} for i in items],
        geometry=[Point(i["lng"], i["lat"]) for i in items],
        crs="EPSG:4326",   # always store in WGS 84 for web maps
    )

    # Round coordinates to 5 decimal places before export.
    # This rebuilds each geometry as a Point, so it only suits point data.
    gdf.geometry = gdf.geometry.apply(
        lambda geom: Point(round(geom.x, 5), round(geom.y, 5))
    )

    gdf.to_file(OUT_PATH, driver="GeoJSON")
    print(f"Wrote {len(gdf)} features to {OUT_PATH}")

if __name__ == "__main__":
    try:
        rebuild()
    except Exception as exc:
        print(f"Rebuild failed: {exc}", file=sys.stderr)
        sys.exit(1)

If your source data uses a projected coordinate system, set the crs parameter to that source CRS and call gdf.to_crs("EPSG:4326") before writing; GeoJSON destined for a web map should always be in EPSG:4326. CRS & Projection Management covers the reprojection step in detail.

Verification Steps

Run these checks locally before relying on the scheduled run:

  1. Copy .env.example to .env and populate SOURCE_API_URL and API_TOKEN.
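  2. Run npm ci && node --env-file=.env scripts/rebuild-geojson.js (the --env-file flag needs Node.js 20.6 or later). For the Python variant, export the two variables into your shell first, then run pip install -r requirements.txt && python scripts/rebuild_geojson.py.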
  3. Validate the output: npx @mapbox/geojsonhint data/output.geojson — a clean exit means no structural violations.
  4. Inspect the file: confirm type is "FeatureCollection", that features is a non-empty array, and that coordinates are [longitude, latitude] (not reversed).
  5. Stage and diff: git add data/output.geojson && git diff --staged — verify only the expected properties changed.
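  6. Merge the workflow file into the default branch, then start a run from the Actions tab with the Run workflow button (workflow_dispatch) and confirm it succeeds before relying on the nightly cron. GitHub only offers manual dispatch, and only fires schedules, for workflow files that exist on the default branch.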
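The manual inspection in step 4 can be partly automated. Below is a minimal sketch under stated assumptions: check_feature_collection and check_file are hypothetical helpers, only Point geometries are examined, and the range test can only catch swapped coordinates when the longitude's absolute value exceeds 90.

```python
import json


def check_feature_collection(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if data.get("type") != "FeatureCollection":
        problems.append("top-level type is not FeatureCollection")
    features = data.get("features")
    if not isinstance(features, list) or not features:
        problems.append("features is missing or empty")
        return problems
    for index, feature in enumerate(features):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # this sketch only checks Point coordinates
        lng, lat = geometry["coordinates"][:2]
        if not -180 <= lng <= 180:
            problems.append(f"feature {index}: longitude {lng} out of range")
        if not -90 <= lat <= 90:
            problems.append(f"feature {index}: latitude {lat} out of range (swapped?)")
    return problems


def check_file(path: str) -> list[str]:
    """Load a GeoJSON file and run the checks above."""
    with open(path, encoding="utf-8") as handle:
        return check_feature_collection(json.load(handle))
```

Calling check_file("data/output.geojson") after a local rebuild and printing the result gives a quick pass or fail signal; it complements geojsonhint rather than replacing it.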

Common Errors & Fixes

The workflow runs but no commit appears in the repository

The git diff --staged --quiet guard exited with code 0, meaning the fetched data is identical to the last committed version. This is correct behaviour. If you expected a change, add a debug step — git diff HEAD data/output.geojson — to confirm the file content before the diff check runs.

geojsonhint reports “right-hand rule violation”

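RFC 7946 requires polygon rings to follow the right-hand rule: exterior rings wind counter-clockwise and holes wind clockwise. In the Node.js script, call turf.rewind(collection) before turf.truncate; the default options produce the RFC 7946 winding, whereas { reverse: true } produces the opposite. In the Python variant, import orient from shapely.ops and apply gdf.geometry = gdf.geometry.apply(lambda g: orient(g, 1.0)), where a sign of 1.0 gives counter-clockwise exteriors. The scripts above emit only Point features, so this error appears only once you add polygon data.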
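The winding test behind the right-hand rule is the shoelace formula: a ring's signed area is positive when its vertices run counter-clockwise. The sketch below illustrates that check in plain Python; it is not how turf or shapely implement it. The function names are hypothetical, rings are assumed to be closed lists of 2D [x, y] positions, and the planar formula is assumed adequate (no polygons crossing the antimeridian).

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def enforce_right_hand_rule(polygon_coords):
    """Return the polygon's rings with a CCW exterior and CW holes."""
    fixed = []
    for index, ring in enumerate(polygon_coords):
        is_ccw = signed_area(ring) > 0
        want_ccw = index == 0  # the first ring is the exterior
        fixed.append(list(ring) if is_ccw == want_ccw else list(reversed(ring)))
    return fixed


# A unit square listed clockwise, which violates RFC 7946.
clockwise_square = [[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]
print(signed_area(clockwise_square))
print(enforce_right_hand_rule([clockwise_square]))
```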

The runner times out fetching from the source API

The default job timeout is 360 minutes (6 hours), so a hung network call can hold a runner for a long time. Add timeout-minutes: 10 to the rebuild job. In the fetch call, pass signal: AbortSignal.timeout(25_000) (Node.js 18+); the Python variant already sets timeout=30 in requests.get. Either way the process fails fast rather than hanging.

Duplicate features appear after several nightly runs

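This usually means the script is adding to the existing file instead of replacing it. fs.appendFile and Python's open(path, 'a') both add to existing content, and merging new features into the previously written collection without de-duplicating by id has the same effect. Use fs.writeFile (Node.js) or open(path, 'w') (Python), both of which truncate the file before writing, and build the FeatureCollection from the fresh API response only.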
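The difference between the two write modes is easy to reproduce in a temporary directory. This is a small sketch with made-up data; write_collection and is_valid_json are hypothetical helpers. It simulates two clean nightly writes followed by one buggy append.

```python
import json
import os
import tempfile


def write_collection(path, collection, mode="w"):
    """Serialize the collection; mode 'w' truncates, mode 'a' appends."""
    with open(path, mode, encoding="utf-8") as handle:
        json.dump(collection, handle)


def is_valid_json(text):
    """Return True when text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


collection = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"id": 1},
        "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
    }],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.geojson")
    write_collection(path, collection)  # night 1
    write_collection(path, collection)  # night 2: overwrites night 1
    with open(path, encoding="utf-8") as handle:
        overwritten = handle.read()
    write_collection(path, collection, mode="a")  # simulated bug: append
    with open(path, encoding="utf-8") as handle:
        appended = handle.read()

print("overwrite parses:", is_valid_json(overwritten))
print("append parses:   ", is_valid_json(appended))
```

With append mode the file ends up holding two JSON documents back to back, which is why a validation step before the commit is a useful backstop for this class of bug.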