
Turning 25GB of Government Data into a 3D London Map: Our Process

Diya

The file exists. It is public. It is free to download. It contains energy performance data for millions of homes across the United Kingdom, assessed and recorded by certified surveyors, updated continuously, and published by the government for anyone to use.

Reviewed by:
Diya - Full Stack Developer - CesiumJS @ Yaksha Visual Technologies Private Limited


It is also 25 gigabytes. It has no coordinates. It cannot be opened in Excel. And by itself, it tells you almost nothing you can act on.

That gap, between data that technically exists and insight that actually works, is the problem we build for. This is the story of how we took the UK's EPC national register, joined it to a second open dataset of property coordinates (the ~2.1 GB OS OpenUPRN file), stripped it down to the 16 streets of Marylebone, grouped 3,233 individual flat certificates into 65 buildings, and delivered the result as a real-time interactive 3D map: a single HTML file plus one JSON file, with no backend, no database, and no framework. Just open data, made usable.

Use Case — PropTech & Geospatial Intelligence

From 25 gigabytes of raw data to a navigable 3D city

How we transformed the UK government's entire EPC dataset into a real-time interactive building map of Marylebone — no backend server, no framework, no approximations.

Cesium.js · 3D WebGL · Node.js Streaming · OS OpenUPRN · UK EPC Open Data · Energy Performance · PropTech · Zero-server Deploy
Live application: 107 Baker Street, W1U 6RP · Avg: E (44) · Worst: G · 15 flats · Cesium.js 3D WebGL
27 GB: raw government data (EPC register plus OS OpenUPRN) processed with no database
3,233: individual flats grouped and aggregated by building
301 KB: final JSON payload served to the browser

The challenge

Unusable by default. Invaluable if you know what to do.

The UK's Energy Performance Certificate register is one of the most complete public records of domestic energy efficiency in the world. It covers millions of properties. It is also, practically speaking, unusable out of the box — a 25 GB flat CSV file with no geographic coordinates, no building-level groupings, and no front end.

The brief was precise: target the Marylebone neighbourhood, and build something a property analyst, sustainability consultant, or planning officer could actually use — without standing up a server or paying for a database.

"The raw data exists. The question is whether you can make it speak. We needed every building in the street to tell its own energy story at a glance."

Beyond the sheer file size, there was a second obstacle: EPC records contain addresses and a UPRN (Unique Property Reference Number), but no coordinates. Mapping the data required joining it to a second large file, the OS OpenUPRN dataset (~2.1 GB), which maps every property reference in Great Britain to a latitude/longitude pair. Two enormous files. One coherent map.


The solution

A four-phase streaming pipeline — no RAM overload, no database.

We designed a Node.js pipeline that streams each source file exactly once, keeping peak memory under 1 GB for the entire extraction run on a standard laptop.

Processing pipeline — from 25 GB to interactive JSON
Phase 1: Stream & filter the 25 GB EPC file
Row-by-row streaming of the national CSV. Each record is tested against 16 Marylebone streets and two local authority codes (Westminster and Camden). ~482,000 certificates distilled to a few thousand targeted matches — nothing is loaded into memory wholesale.
~482,000 certificates → ~3,400 candidates
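A minimal sketch of the phase-1 filter, using Node's built-in `readline` over a stream so that only matching rows are ever held in memory. It assumes the register's column names (`LOCAL_AUTHORITY`, `ADDRESS`, `UPRN`, `POSTCODE`, `CURRENT_ENERGY_RATING`, `CURRENT_ENERGY_EFFICIENCY`, `INSPECTION_DATE`), assumes the ONS codes E09000033 and E09000007 for Westminster and Camden, and uses a hypothetical two-street subset in place of the real list of 16; all of these should be checked against the actual download. The CSV splitter is deliberately small and assumes no line breaks inside quoted fields.

```javascript
const readline = require("node:readline");

// Assumed ONS codes for the two boroughs; verify against the LOCAL_AUTHORITY column.
const TARGET_AUTHORITIES = new Set(["E09000033", "E09000007"]); // Westminster, Camden
// Hypothetical subset of the 16 target streets, upper-cased for matching.
const TARGET_STREETS = ["BAKER STREET", "CHILTERN STREET"];

// Small CSV splitter: handles quoted fields, commas inside quotes and doubled
// quotes (""), but assumes no line breaks inside a field.
function parseCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Reads any Readable line by line; only matching rows are kept, so memory grows
// with the number of matches rather than with the size of the file.
async function filterEpc(input) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let columns = null; // column name -> index, taken from the header row
  const matches = [];
  for await (const line of rl) {
    if (!line) continue;
    const fields = parseCsvLine(line);
    if (!columns) {
      columns = new Map(fields.map((name, i) => [name, i]));
      continue;
    }
    const get = (name) => fields[columns.get(name)] ?? "";
    if (!TARGET_AUTHORITIES.has(get("LOCAL_AUTHORITY"))) continue;
    const address = get("ADDRESS");
    const upper = address.toUpperCase();
    if (!TARGET_STREETS.some((street) => upper.includes(street))) continue;
    matches.push({
      uprn: get("UPRN"),
      address,
      postcode: get("POSTCODE"),
      rating: get("CURRENT_ENERGY_RATING"),
      sap: Number(get("CURRENT_ENERGY_EFFICIENCY")),
      inspectionDate: get("INSPECTION_DATE"),
    });
  }
  return matches;
}
```

In practice the input would be `fs.createReadStream(...)` on the downloaded certificates file; because any Node `Readable` works, the same function can be tested against a small in-memory sample.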
Phase 2: Resolve coordinates from OS OpenUPRN (2.1 GB)
Phase 1 reveals which UPRNs we actually need. The coordinate file is then streamed once, and only rows with a matching UPRN are retained. This two-pass approach avoids loading every coordinate pair in the 2.1 GB file into RAM; we store only the few hundred we need.
~2.1 GB streamed, hundreds of lat/lons kept
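The coordinate lookup can be sketched the same way: collect the phase-1 UPRNs into a `Set`, stream the OpenUPRN file once, and add a `Map` entry only on a hit, so memory tracks the matches rather than the file. The column names `UPRN`, `LATITUDE` and `LONGITUDE` are assumptions about the OpenUPRN CSV header, and `resolveCoordinates` and `attachCoordinates` are illustrative names, not the project's code.

```javascript
const readline = require("node:readline");

// Streams the OpenUPRN CSV once and keeps only UPRNs found in phase 1.
// Assumes the header includes UPRN, LATITUDE and LONGITUDE columns.
async function resolveCoordinates(input, neededUprns) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const coords = new Map(); // uprn -> { lat, lon }, matches only
  let col = null;
  for await (const line of rl) {
    if (!line) continue;
    // Assumes plain unquoted numeric fields, so a simple split is enough;
    // a leading byte-order mark is stripped so the header matches.
    const fields = line.replace(/^\uFEFF/, "").split(",");
    if (!col) {
      col = {
        uprn: fields.indexOf("UPRN"),
        lat: fields.indexOf("LATITUDE"),
        lon: fields.indexOf("LONGITUDE"),
      };
      continue;
    }
    const uprn = fields[col.uprn];
    if (neededUprns.has(uprn)) {
      coords.set(uprn, { lat: Number(fields[col.lat]), lon: Number(fields[col.lon]) });
    }
  }
  return coords;
}

// Joins phase-1 records to their coordinates; records with no match are dropped.
function attachCoordinates(records, coords) {
  return records
    .filter((record) => coords.has(record.uprn))
    .map((record) => ({ ...record, ...coords.get(record.uprn) }));
}
```

The needed set would come straight from phase 1, for example `new Set(matches.map((m) => m.uprn))`.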
Phase 3: Group flats by building with regex name extraction
EPC certificates are issued per flat, not per building. From addresses such as "Flat 1, Chalfont Court" and "Flat 15, Chalfont Court", the building name is extracted, normalised, and used to merge both flats into a single building entity. Regex patterns handle named blocks, numbered conversions, and street-addressed terraces uniformly.
3,233 individual flats → 65 distinct buildings
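The grouping step might look like the sketch below. The two prefix patterns are hypothetical examples covering "Flat 15, Chalfont Court"-style named blocks and floor-based conversions such as the hypothetical address "Second Floor Flat, 12 Chiltern Street"; a real address corpus would need more patterns and manual spot checks.

```javascript
// Hypothetical unit prefixes for named blocks: "Flat 15,", "Apartment 3A", "Unit 4-5".
const UNIT_PREFIX = /^(?:flat|apartment|apt|unit)\s+[\w-]+\s*,?\s*/i;
// Hypothetical floor-based prefixes for converted houses: "Second Floor Flat,", "Basement Flat".
const FLOOR_PREFIX =
  /^(?:basement|lower ground|ground|first|second|third|fourth|fifth|top)(?:\s+floor)?\s+flat\s*,?\s*/i;

// Strips the unit part of an address and normalises the rest into a building key.
function buildingKey(address) {
  return address
    .trim()
    .replace(UNIT_PREFIX, "")
    .replace(FLOOR_PREFIX, "")
    .toUpperCase()
    .replace(/[.,']/g, " ") // punctuation differences should not split a building
    .replace(/\s+/g, " ")
    .trim();
}

// Buckets flat-level records under their building key.
function groupByBuilding(records) {
  const buildings = new Map();
  for (const record of records) {
    const key = buildingKey(record.address);
    if (!buildings.has(key)) buildings.set(key, []);
    buildings.get(key).push(record);
  }
  return buildings;
}
```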
Phase 4: Aggregate per-building statistics
For each building: flat count, average EPC band (A–G), SAP efficiency score, worst individual rating, full A–G histogram, most recent inspection date, and representative flat details — heating type, floor area, estimated annual energy costs, CO₂ emissions.
Output: 301 KB JSON — the only file the browser needs
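Per-building aggregation is plain arithmetic. This sketch assumes the "average EPC band" comes from averaging the flats' SAP scores and mapping the result back to a band with the standard EPC thresholds (A 92+, B 81–91, C 69–80, D 55–68, E 39–54, F 21–38, G 1–20), which is consistent with the "Avg: E (44)" label on 107 Baker Street. Field names are illustrative, and ratings are assumed to be already cleaned to A–G.

```javascript
const BANDS = ["A", "B", "C", "D", "E", "F", "G"];

// Standard SAP-score thresholds used for EPC bands.
function sapToBand(score) {
  if (score >= 92) return "A";
  if (score >= 81) return "B";
  if (score >= 69) return "C";
  if (score >= 55) return "D";
  if (score >= 39) return "E";
  if (score >= 21) return "F";
  return "G";
}

// Summarises one building from its flat-level records.
function aggregateBuilding(flats) {
  const histogram = Object.fromEntries(BANDS.map((band) => [band, 0]));
  let sapTotal = 0;
  let worst = "A";
  let latestInspection = "";
  for (const flat of flats) {
    histogram[flat.rating] += 1;
    sapTotal += flat.sap;
    // Later letters are worse, so compare positions in BANDS.
    if (BANDS.indexOf(flat.rating) > BANDS.indexOf(worst)) worst = flat.rating;
    // ISO dates (YYYY-MM-DD) sort correctly as strings.
    if (flat.inspectionDate > latestInspection) latestInspection = flat.inspectionDate;
  }
  const avgSap = Math.round(sapTotal / flats.length);
  return {
    flatCount: flats.length,
    avgSap,
    avgBand: sapToBand(avgSap),
    worst,
    histogram,
    latestInspection,
  };
}
```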

The visualisation

A real 3D globe. Real terrain. Real buildings. Real data.

The output JSON powers a fully static front end: a single HTML file using Cesium.js 1.137 loaded from a CDN. Cesium renders the scene as a full WebGL globe with real terrain elevation data and OpenStreetMap-sourced building geometry extruded into three dimensions. Our data layer sits on top.

Each of the 65 buildings receives a colour-coded pin placed at its exact coordinates. Clicking any pin or extruded building slides in a panel showing the full breakdown — rating histogram, worst rating, floor area, heating system, estimated CO₂, and annual energy costs.
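On the Cesium side, pins and the click-to-open panel can be wired with `viewer.entities.add` and a `ScreenSpaceEventHandler`. In this sketch `Cesium` and `viewer` are passed in as parameters so it does not assume how the library is loaded, the colour palette is an assumed traffic-light scale rather than the official EPC chart colours, and `onSelect` is a hypothetical callback that would open the side panel. Picking the extruded OSM buildings returns tile features rather than entities and is not covered here.

```javascript
// Assumed traffic-light palette from dark green (A) to red (G).
const BAND_COLOURS = {
  A: "#008054", B: "#19b459", C: "#8dce46", D: "#ffd500",
  E: "#fcaa65", F: "#ef8023", G: "#e9153b",
};

// Adds one colour-coded point entity per building, clamped to the terrain.
function addBuildingPins(Cesium, viewer, buildings) {
  return buildings.map((b) =>
    viewer.entities.add({
      id: b.id,
      name: b.name,
      position: Cesium.Cartesian3.fromDegrees(b.lon, b.lat),
      point: {
        pixelSize: 14,
        color: Cesium.Color.fromCssColorString(BAND_COLOURS[b.avgBand] ?? "#888888"),
        outlineColor: Cesium.Color.WHITE,
        outlineWidth: 2,
        heightReference: Cesium.HeightReference.CLAMP_TO_GROUND,
        // Keep pins visible when extruded buildings stand between them and the camera.
        disableDepthTestDistance: Number.POSITIVE_INFINITY,
      },
    })
  );
}

// Calls onSelect(entityId) when the user left-clicks one of the pins.
function wireClickHandler(Cesium, viewer, onSelect) {
  const handler = new Cesium.ScreenSpaceEventHandler(viewer.scene.canvas);
  handler.setInputAction((click) => {
    const picked = viewer.scene.pick(click.position);
    // Entities come back as { id: Entity }; other picks are ignored.
    if (Cesium.defined(picked) && picked.id instanceof Cesium.Entity) {
      onSelect(picked.id.id);
    }
  }, Cesium.ScreenSpaceEventType.LEFT_CLICK);
  return handler;
}
```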

Live application — click any building for its full EPC breakdown
C · Gem House, 76 Chiltern Street · 16 flats · Avg C (75) · Worst C · Inspected Apr 2025
vs
E · 107 Baker Street, W1U 6RP · 15 flats · Avg E (44) · Worst G · Inspected Oct 2025 · retrofit candidate

The contrast above captures exactly why the tool matters. Two buildings 300 metres apart on the same street grid: one consistently C-rated, one averaging E with individual flats reaching G. A planning officer or retrofit consultant can spot that difference in under three seconds — no spreadsheet, no commissioned report.

EPC rating distribution, Marylebone sample (65 buildings)
A: best rated, few buildings
B: ~18%
C: ~42% (the largest group)
D: ~22%
E: ~9%
F: watch list
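A distribution like this one could be computed directly from the per-building summaries in the JSON. The sketch assumes each building is bucketed by its average band (`avgBand`, a hypothetical field name) and that percentages are rounded to whole numbers, so they may not sum to exactly 100.

```javascript
const BANDS = ["A", "B", "C", "D", "E", "F", "G"];

// Share of buildings, in whole percent, whose average band falls in each letter.
function bandDistribution(buildings) {
  const counts = Object.fromEntries(BANDS.map((band) => [band, 0]));
  for (const building of buildings) counts[building.avgBand] += 1;
  return Object.fromEntries(
    BANDS.map((band) => [band, Math.round((100 * counts[band]) / buildings.length)])
  );
}
```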

Technology stack

Four tools. No framework. No server.

🌐 Cesium.js 1.137
WebGL globe, real terrain, extruded OSM buildings, custom pin entities and click handlers
⚙️ Node.js Streams
Line-by-line CSV processing: 27 GB handled with under 1 GB of peak RAM and no temp files
📂 UK Government Open Data
EPC national register + OS OpenUPRN coordinate file, both free to download and openly licensed
📄 Zero-server Deploy
Static HTML + 301 KB JSON, served from any CDN with no backend and no database costs

What this unlocks

The same map. Four completely different user stories.

Planning departments can instantly identify streets with clusters of F- and G-rated buildings and prioritise them for retrofit grant schemes, work that previously meant commissioning a bespoke data analysis.
Property investors can compare a target building's average EPC band against its immediate neighbours before any due diligence spend, surfacing energy upgrade liability at a glance.
Sustainability consultants can pull estimated annual CO₂ and energy cost figures per building directly from the 3D model and incorporate them into client proposals without separate data requests.
Journalists and researchers get a shareable, linkable tool — no login, no app installation, no account. A URL is the only distribution mechanism needed.

Architecture insight

The pre-computation boundary — the design decision that makes this work.

All the heavy lifting (streaming, joining, deduplicating, aggregating) happens once, offline, in Node.js. The browser receives only the distilled result. This pattern makes the interactive experience instant and deployment trivially cheap: a static file host and a CDN are sufficient infrastructure for an application that summarises 27 GB of government data.
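A sketch of both sides of that boundary, assuming a hypothetical `buildings.json` output name: the Node side writes compact JSON (no indentation, coordinates rounded to six decimal places, roughly 0.1 m), and the static page needs nothing more than a `fetch`. The rounding is an illustrative size optimisation, not necessarily what produced the 301 KB file.

```javascript
const fs = require("node:fs");

// Offline side: serialise only what the browser needs, as compactly as practical.
function buildPayload(buildings) {
  const slim = buildings.map((b) => ({
    ...b,
    // Six decimal places of a degree is about 0.1 m, ample for a map pin.
    lat: Number(b.lat.toFixed(6)),
    lon: Number(b.lon.toFixed(6)),
  }));
  return JSON.stringify(slim); // no indentation keeps the file small
}

// Writes the payload and returns its size in bytes.
function writePayload(buildings, path) {
  const json = buildPayload(buildings);
  fs.writeFileSync(path, json);
  return Buffer.byteLength(json);
}

// Browser side (static page, no backend):
// const buildings = await fetch("buildings.json").then((r) => r.json());
```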

The same pattern scales without modification. Swap Marylebone for any UK postcode district. Swap EPC bands for planning permissions, flood risk scores, broadband speeds, or census data. The pipeline architecture remains identical — only the filter criteria and the aggregation logic change.
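The claim that only the filter and the aggregation change can be made concrete with a dataset-agnostic core that takes three callbacks. `runPipeline`, `keep`, `groupKey` and `summarise` are hypothetical names; for brevity the sketch takes an in-memory array, whereas the real pipeline would feed it rows from a stream.

```javascript
// Generic filter -> group -> summarise core; only the three callbacks change per dataset.
function runPipeline(records, { keep, groupKey, summarise }) {
  const groups = new Map();
  for (const record of records) {
    if (!keep(record)) continue;
    const key = groupKey(record);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(record);
  }
  // One output row per group: its key plus whatever the summariser returns.
  return [...groups].map(([key, members]) => ({ key, ...summarise(members) }));
}
```

Swapping EPC bands for, say, flood-risk scores would then mean passing a different `keep` (which postcodes), `groupKey` (building or street) and `summarise` (for example, a count of high-risk properties), with the core unchanged.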

Ready to build?

Your dataset. Your neighbourhood. Your insight.

We build custom geospatial data pipelines and Cesium.js 3D visualisations for property, infrastructure, and public sector clients. Tell us your data problem and the audience who needs to understand it.
