FIELD REPORT 2026 · §6

Dataset of record

How the daily NDJSON archives are produced, what's in them, and how to consume them

IN DEVELOPMENT
FILED · 2026-05-07
TAGS · DATASET · ARCHIVE · NDJSON · METHODOLOGY
ABSTRACT
ON FILE DRAFT · 2026-05-07

The methodology dataset and the public dataset are the same dataset. This chapter describes the schema, the daily archival cadence, the gap-honest treatment per ADR-010, and how a downstream researcher can reproduce any analysis from the archive bucket alone.

§6.1 · The bucket

plantir.garden/data/
├── latest.json                       # 5-min snapshot — current state
├── recent-24h.json                   # rolling window — last 24h
└── archive/
    ├── index.json                    # listing — keys + sizes + sha256
    ├── 2026-04-23.ndjson.gz          # day archive — immutable once written
    ├── 2026-04-24.ndjson.gz
    └── ...

latest.json and recent-24h.json are fully replaced every 5 minutes. Each archive/<YYYY-MM-DD>.ndjson.gz is finalised once, after that day’s UTC midnight plus a 30-minute grace period, and is never rewritten afterwards.
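The finalisation rule can be sketched in a few lines. This is a minimal illustration, assuming that "that day's UTC midnight" means the midnight that ends the day (00:00 UTC on the following date). The names finalised_at and is_final are hypothetical and not part of the exporter.

```python
from datetime import date, datetime, time, timedelta, timezone

GRACE = timedelta(minutes=30)  # grace period stated in the text


def finalised_at(day: date) -> datetime:
    """Earliest UTC instant at which the archive for `day` is finalised."""
    # Assumption: the UTC midnight that ends `day` is the start of the next date.
    end_of_day = datetime.combine(
        day + timedelta(days=1), time.min, tzinfo=timezone.utc
    )
    return end_of_day + GRACE


def is_final(day: date, now: datetime) -> bool:
    """True once the archive for `day` should no longer change."""
    return now >= finalised_at(day)
```

Under this reading, the archive for 2026-04-23 would be treated as immutable from 2026-04-24T00:30:00Z onwards. A consumer that caches archives could use such a check to decide whether a cached day ever needs to be fetched again.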

DRAFT — exporter implementation lives at PLN-003; deployment is gated on the LIVE- prod cutover (Phase 5.9). The schema below is what will land.

§6.2 · Per-row schema (NDJSON)

Each archive holds one JSON object per line and is compressed with gzip. Files are stored gzipped in S3 and served as-is via Cloudflare with Content-Encoding: gzip.

{"recorded_at":"2026-04-27T00:00:12Z","node_id":"esp32-greenhouse","temp_c":21.4,"humidity_pct":51.2,"moisture_pct":40.8}

Fields:

Field          Type            Notes
recorded_at    ISO-8601 UTC    Aurora server-side timestamp; canonical.
node_id        string          Stable across renames via stable_id map.
temp_c         number | null   BME280 reading. NULL means sensor absent.
humidity_pct   number | null   BME280 reading. NULL means sensor absent.
moisture_pct   number | null   Capacitive ADC. NULL means sensor absent.
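A downstream reader needs only the standard library to consume a day archive. The sketch below assumes the file has been saved locally still gzipped; the Row type and the read_archive helper are illustrative names, not part of any published tooling.

```python
import gzip
import json
from typing import Iterator, Optional, TypedDict


class Row(TypedDict):
    recorded_at: str                # ISO-8601 UTC timestamp
    node_id: str
    temp_c: Optional[float]         # None when the JSON value is null
    humidity_pct: Optional[float]
    moisture_pct: Optional[float]


def read_archive(path: str) -> Iterator[Row]:
    """Yield one row per non-empty line of a gzipped NDJSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

A typical use is to filter out nulls before aggregating, for example `[r["temp_c"] for r in read_archive(path) if r["temp_c"] is not None]`. If the HTTP client has already decoded the Content-Encoding: gzip response on download, the saved file is plain NDJSON and the built-in open() should replace gzip.open().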

raw::jsonb is intentionally excluded per ADR-010 § Privacy. If a future audit needs raw, that’s a separate authenticated tier — out of v1 scope.

DRAFT — air-quality columns (lux, co2_ppm, pressure_hpa, pm25) proposal: nest under a separate air object. Decision pending.

§6.3 · Gap treatment

The dataset is gap-honest. A node that has been silent past its threshold produces no rows in the archive for that period. The absence is the data point.

Programmatic detection: an analysis script that joins by recorded_at will see the gap as missing rows, not as zeros and not as forward-filled last-known-good values. Don’t forward-fill — the gap is the signal.
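One way to surface gaps explicitly, rather than filling them, is to sort one node's recorded_at values and report every pair of neighbours spaced further apart than a chosen threshold. This is a minimal sketch: the threshold is a caller-supplied assumption (the per-node silence threshold is not stated here), and parse_ts and find_gaps are hypothetical helper names.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def parse_ts(value: str) -> datetime:
    """Parse a recorded_at value such as 2026-04-27T00:00:12Z."""
    # Swap the trailing Z for an offset so fromisoformat works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_gaps(
    timestamps: Iterable[str], threshold: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs whose spacing exceeds threshold."""
    ordered = sorted(parse_ts(t) for t in timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]
```

Each returned pair brackets a period with no rows, so an analysis can report or plot the gap itself instead of interpolating across it.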

INC-001 (2026-04-23 four-day gap) is in the archive as a gap.

§6.4 · Reproducibility

Any analysis published alongside the thesis should be reproducible from the public bucket alone, with no Aurora access. Specifically:

DRAFT — example analysis notebook + a README in the bucket pending by 2026-08.
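Pending that notebook, a reproduction could begin by checking each downloaded archive against the listing in archive/index.json. The sketch below is illustrative only: the index schema is not specified in this chapter, so the entry shape (key, size, sha256) is a hypothetical guess based on the "keys + sizes + sha256" description, and it assumes the digest covers the gzipped bytes as stored.

```python
import hashlib
import os


def sha256_of(path: str) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, entry: dict) -> bool:
    """Compare a local file with one index entry (hypothetical shape)."""
    # Assumed entry shape: {"key": str, "size": int, "sha256": hex string}.
    return (
        os.path.getsize(path) == entry["size"]
        and sha256_of(path) == entry["sha256"].lower()
    )
```

If the download tool transparently decompressed the response, the local bytes would no longer match a digest of the gzipped object, so the file should be saved exactly as stored before verifying.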

§6.5 · License

The data is released into the public domain under CC0. Code (analysis, exporters) is licensed as the source repo declares; see the GitHub link in /about.

DRAFT — confirm CC0 with the supervisor before defence.


Status: structural draft v0.1, 2026-05-07. Citation: https://plantir.garden/thesis/2026/dataset is locked per ADR-011. Related ADRs: docs/adr/010-public-sensor-data-policy.md, docs/adr/011-thesis-url-schema.md.

RELATED DECISIONS
  • ADR-010 — see docs/adr/ in source repo
  • ADR-011 — see docs/adr/ in source repo