CANOE Transportation Sector — Data Processing Documentation

Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's transportation sector.

Overview
Pipeline Architecture
Upstream Data Fetching
Spreadsheet Database Layer
The Compiler Logic
Known Assumptions and Limitations

1. Overview

Purpose

The CANOE transportation sector differs fundamentally from the other end-use sectors. Rather than pure Python transformations directly acting on raw APIs, the transportation pipeline utilizes an intermediate, human-readable Spreadsheet Database (Excel). A Python Compiler script then sanitizes, temporal-aggregates, and translates that spreadsheet into a strict Temoa-compatible SQLite database.

What the Tool Produces

Excel Workbooks: Populated baseline data templates representing varying regional transport paradigms.
SQLite Databases: Extensively constrained, Temoa-ready databases mapping out fleets, efficiencies, capacities, and detailed Light-Duty Electric Vehicle (LD EV) charging curves.

Scope

The model operates across:

Regions: Canadian provinces.
Sectors: Transportation (Passenger Cars, Light Trucks, Heavy Freight, Transit, Rail, Marine, Aviation).
Commodities: Gasoline, Diesel, Jet Fuel, Electricity, Hydrogen, Biofuels.
Temporal Resolution: Vintages tracked precisely, and electric vehicle charging profiles tracked at an hourly resolution mapped to specific Demand Specific Distributions (DSDs) or Capacity Factors (CFTs).

2. Pipeline Architecture

The transportation pipeline executes in a two-stage process:

1. get_nrcan_data.py    → Fetches raw base statistics and injects them into an Excel Template
2. CANOE_TRN_v4.xlsx    → *Human Editable Layer* containing projected bounds, tech trees, and costs
3. compile_transport.py → The Compiler:
   a. instantiate_database → Opens SQLite schema structure
   b. compile_techs/comms  → Loads rigid string parameters
   c. compile_demand       → Loads generic sectoral energy bounds
   d. compile_excap        → Loads historical fleet stock (Aggregated Quinquennially)
   e. compile_efficiency/costs → Connects financial & thermodynamic bounds to fleet logic
   f. compile_dsd/cft      → Attaches external RAMP-mobility hourly logic to EV charging nodes
   g. cleanup              → Actively prunes "dead" or unused technologies from the database to save solver memory

3. Upstream Data Fetching

Executed via get_nrcan_data.py:

Targets the Natural Resources Canada (NRCan) Comprehensive Energy Use Database (CEUD) specific to the transportation domain (Tables 7, 14-21, 28-37).
Downloads provincial .xls files natively, strips formatting ("Shares", "GHG", missing rows), and reconstructs standard time-series arrays.
Template Injection: The script systematically opens a master Excel template (e.g., CANOE_TRN_ON_v4...xlsx), copies it for the distinct target province (e.g., QC, MB), and injects the cleaned raw NRCan arrays directly onto the Background Data sheet.

4. Spreadsheet Database Layer

The spreadsheet_database directory houses the .xlsx files acting as the true modeling nexus for transportation. Unlike purely algorithmic sectors, predicting the capital cost decay of hydrogen fuel-cell heavy-duty trucks necessitates human assumptions that are better governed in a spreadsheet.

Key Tabs:

Techs & Comms: Topology flags.
Demand: Generic bounds for vehicle kilometers traveled (VKT) or broad energy totals.
ExCap: Historical vehicle adoption vintages extending back to 2000.
Efficiency & Costs: Financials, fixed O&M, and operational efficiency curves for future technology options.
Lifetime: Operational survival bounds.
DemandDist / CapFact: Time-slicing logic switches.

5. The Compiler Logic

Executed via compile_transport.py:

Quinquennial Aggregation

Modern Temoa models rapidly bog down if forced to evaluate every historical adoption year continuously.
For ExistingCapacity, the compiler triggers quinquennial_mapping(), forcing historical vehicle stocks (2000–2020) into 5-year representative integer buckets (e.g., vintages 2011, 2012, 2013, 2014, and 2015 all collapse violently into a singular 2015 stock block).

EV Charging Profiles (DSDs and CFTs)

Electric vehicle grid loading cannot be ignored. The compiler reads external CSV arrays located in charging_profiles/ramp_mobility/results/ (e.g., ON-2016TTS...csv).
It converts the hourly timestamps strict to America/Toronto ET formatting, averages them into 24-hour sets (Day/Night maps), and injects them directly into DemandSpecificDistribution or CapacityFactorTech elements.

Automatic Cleanup & Garbage Collection

To maintain a lean SQL matrix, the compiler executes a strict cleanup() function after importing spreadsheet databases.
It explicitly locates and Deletes any technologies inside Efficiency, CostVariable, and CostFixed that lack a corresponding foundation inside ExistingCapacity or whose existing capacity falls below an arbitrary noise threshold (epsilon = 1e-4).

Data Quality Indexing (DQI)

Assigns empirical trust scores via dq_time(). If a value is sourced from data only 3 years old, it scores a 1 (excellent). If the data year is >15 years obsolete, it inherently degrades to a 5.

6. Known Assumptions and Limitations

Quinquennial Distortion: Aggregating historical fleet data into 5-year blocks artificially smooths historical fleet adoption. It assumes the fleet turnover dynamics can ignore granular year-over-year deviations.
RAMP-Mobility Fixation: Electric vehicle grid charging profiles rely strictly on static representations from RAMP-mobility models. It inherently models "dumb" charging patterns based on typical commuter behavior, rather than perfectly responsive price-arbitrage charging mechanisms.
Time Zone Lock: RAMP profiles currently hard-force tz_convert('America/Toronto'). Applying this strictly to models operating in BC or AB may cause sunset/charging peaks to artificially miss-align with local grid solar profiles.
Decoupled Future Projections: Because future bounds exist entirely within the Excel spreadsheets, CANOE updates do not automatically project transport dynamics. If the CER alters GDP assumptions, canoe-transportation does not magically respond; a human must manually readjust the Excel macro-scalars.

CANOE Transportation Sector — Data Sources Catalog

1. Data Source Summary

Data Type	Primary Source	Granularity	Update Frequency
Historical Baselines	NRCan CEUD	Province / Annual	Annual
EV Charging Maps	RAMP-mobility	Hourly / Seasonal	Static / Independent
Financial Trajectories	AEO, GREET, Internal	Technology-Level	Manual
Usage Patterns	NHTS / Regional Studies	Demographic / Daily	Manual

2. Natural Resources Canada (NRCan) CEUD

Usage: Provides overarching historical anchors for existing vehicle populations (ExCap), initial fuel consumptions, and physical transport modalities (Rail, Aviation, Marine).
Update Procedure: Governed by get_nrcan_data.py. Adjust the hardcoded target years (e.g., "2021") in the URL fetcher strings.

3. RAMP-mobility Profiles

Usage: Yields crucial 8760 simulation profiles dictating exactly when EV fleets interface with the power grid.
Update Procedure: External. If a new transportation adoption study is conducted (e.g., shifting toward intense workplace charging instead of at-home nighttime charging), a new external RAMP model must be run, outputting a new .csv into the charging_profiles/ directory, and compile_transport.py must have its ldv_profile_name target string manually updated.

4. Analytical Forecasts (AEO, GREET, NHTS)

Usage: Because the compiler accepts anything typed into the Excel spreadsheet, bounds concerning Future Battery Costs, Hydrogen Fuel-Cell Efficiencies, and Vehicle Lifetime caps are heavily influenced by the EIA AEO, Argonne National Lab's GREET model, and Household Travel Surveys.
Update Procedure: Manual. Requires a modeler to open the specific .xlsx database, review the literature, update the cells, cite the update in the "References" column, and execute compile_transport.py to bake those new assumptions into the .sqlite file.

5. Update Procedures (Checklist)

During regular annual or biannual CANOE updates:

NRCan Fetching: Run get_nrcan_data.py. Ensure the NRCan API URLs still correctly route to the .xls formats rather than .csv updates.
Review Spreadsheets: Open the generated CANOE_TRN_[REGION]...xlsx files. Validate that the newly injected Background Data correctly cascaded through the formulas linking to ExCap and Demand.
Literature Review: Manually scrutinize EV cost trajectories and ICCT/GREET emissions estimates. Update the spreadsheet manually if reality has outpaced the older projections.
Compile: Execute compile_transport.py.
Review Logs: The terminal will spit out specific cleanup() logs noting exactly which technologies were deleted. Ensure it hasn't accidentally wiped out a critical emerging technology due to a rounding threshold error.