Key Projects & Open-Source Tools
VLM-Powered Engineering Drawing Data Extraction
Problem: Legacy engineering drawings held large amounts of unstructured data that had to be extracted manually, a slow process prone to human error. One client projected 7,000 hours of manual effort to process its drawing backlog.
Approach: I engineered a solution using open-weight Vision Language Models (llama.cpp, Python) to parse engineering drawings automatically and extract structured data from them. The system was trialled in the water sector, achieving 90%+ accuracy on complex drawing interpretation tasks.
Outcome: The solution projected savings of 7,000 hours for a single client, demonstrating the potential of VLMs for legacy document processing in the engineering sector.
Standards Ontology & MCP Server for Clause Retrieval
Problem: Construction and infrastructure professionals must navigate multiple overlapping standards (ISO 19650, ISO 55000, etc.), making compliance complex and time-consuming. Understanding semantic relationships between standards requires deep domain expertise.
Approach: I developed an Information Management standards ontology that uses local LLM embedding models to evaluate semantic similarity between ISO 19650 and other standards widely adopted in construction and adjacent industries. I am now building an MCP server to improve clause retrieval and domain accuracy.
Outcome: The ontology provides a structured view of how standards interrelate, while the MCP server will enable intelligent clause retrieval, reducing compliance overhead and improving accuracy for Information Management professionals.
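The similarity step can be illustrated with a minimal sketch, assuming each clause has already been converted to an embedding vector by a local model. The toy vectors, the clause labels, and the `rank_clauses` helper are hypothetical placeholders for illustration and do not reflect the ontology's actual data or code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_clauses(query_vec, clause_vecs):
    """Return (clause_id, score) pairs, most similar first (hypothetical helper)."""
    scored = [
        (clause_id, cosine_similarity(query_vec, vec))
        for clause_id, vec in clause_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embedding output (assumption).
toy_clauses = {
    "standard_a_clause_1": [1.0, 0.0, 0.0],
    "standard_b_clause_7": [0.9, 0.1, 0.0],
    "standard_b_clause_9": [0.0, 1.0, 0.0],
}
ranking = rank_clauses([1.0, 0.0, 0.0], toy_clauses)
```

In practice the vectors would come from an embedding model and have hundreds of dimensions; the ranking logic stays the same.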
Geospatial Data Automation for Digital Twins
Problem: Spatial data transformation workflows for digital twin products were cumbersome and manual, requiring significant effort to structure, validate, and prepare complex geographic data for downstream consumption.
Approach: I streamlined these workflows using Python, building automated pipelines that efficiently structure, validate, and prepare complex geographic data. The solution integrates with the internal digital twin product team's existing systems.
Outcome: The pipelines substantially reduced manual effort in spatial data preparation, freeing the digital twin team to focus on product development rather than data wrangling.
National Three Waters Asset Data Standard (3WADS) & Open-Source Toolkit
Problem: The New Zealand water sector was hampered by inconsistent data practices across different councils. This created significant friction for data sharing and increased costs ahead of major national reforms that would require data amalgamation.
Approach: I led a cross-sector working group to define the Three Waters Asset Data Standard (3WADS), establishing a common language for asset data. To make this standard easy to adopt, I developed and open-sourced a Python CLI toolkit that automates data validation, mapping, and the generation of schema connectors (XSD, SQL).
Outcome: The standard and toolkit reduced manual data setup by over 90%, providing a clear, low-cost pathway to data standardisation. This work has lifted capability across the local industry and is improving collaboration, reducing integration costs, and building a more resilient data ecosystem for the entire sector.
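The kind of validation such a toolkit automates can be sketched as follows. This is a simplified illustration only: the field names, the types in `REQUIRED_FIELDS`, and both functions are assumptions made for the example, not the actual 3WADS schema or the toolkit's API.

```python
# Hypothetical required fields and their expected Python types (assumption).
REQUIRED_FIELDS = {"asset_id": str, "asset_type": str, "install_year": int}


def validate_record(record, row_number):
    """Return a list of human-readable problems found in one asset record."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or value == "":
            errors.append(f"row {row_number}: missing '{field}'")
        elif not isinstance(value, expected_type):
            errors.append(
                f"row {row_number}: '{field}' should be {expected_type.__name__}"
            )
    return errors


def validate_records(records):
    """Validate every record, numbering rows from 1, and collect all problems."""
    errors = []
    for row_number, record in enumerate(records, start=1):
        errors.extend(validate_record(record, row_number))
    return errors


sample = [
    {"asset_id": "P-001", "asset_type": "pipe", "install_year": 1987},
    {"asset_id": "", "asset_type": "valve", "install_year": "unknown"},
]
report = validate_records(sample)
```

Collecting every problem in one pass, rather than stopping at the first failure, gives a data supplier a complete list to fix before resubmitting.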
Enterprise Water Asset Data Pipeline
Problem: A critical legacy asset management system had a fragile, 24+ hour data export process. This delay made timely reporting impossible and blocked the use of 3 million+ asset records for modern analytics, while other business processes still depended on the old data format.
Approach: I architected and built a production-grade ETL pipeline from the ground up. Using Python (with Polars for high-speed parallel processing) and AWS (S3, Redshift), the pipeline transforms millions of asset records and produces modern analytics-ready outputs for the data warehouse.
Outcome: The new pipeline cut a critical legacy data export from more than 24 hours to about 30 minutes, a reduction of over 95%. This unlocked near-real-time operational analytics for the first time and provided a stable, modern data platform for the future.
Nationally Renowned Digital Engineering Programme
Problem: A £609M national transport project needed a robust framework to manage the quality and consistency of digital information from dozens of suppliers. Without it, the project risked receiving poor quality data that would create significant issues during operations and maintenance.
Approach: As a key member of the Digital Engineering team, I co-authored the project's strategic information management framework based on ISO 19650. I was responsible for embedding these requirements into contracts, governing the Common Data Environment (CDE), and developing scripts to audit supplier data submissions for compliance.
Outcome: The framework established a clear "single source of truth" for project information, ensuring data integrity from the supply chain. This proactive assurance work significantly reduced data integration risks and will maximise the long-term value of the project's digital assets.
Published Water-Sector Asset Information Requirements
Problem: Capital projects were often delivered without clear requirements for the final asset data handover. This resulted in inconsistent, poor-quality data that required costly remediation before it could be used by operational and asset management teams.
Approach: I authored and published the organisation's official Asset Information Requirements (AIR) in accordance with the ISO 19650 standard. This document clearly defines the data formats, structures, and quality standards for all new assets delivered through the capital programme.
Outcome: The published AIR is now a contractual requirement for all new projects. It ensures high-quality, structured data is captured from project inception, significantly reducing future data cleaning efforts and providing reliable data for better decision-making across the asset lifecycle.