CANOE Transportation Sector — Data Processing Documentation
Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's transportation sector.
Table of Contents
- Overview
- Pipeline Architecture
- Upstream Data Fetching
- Spreadsheet Database Layer
- The Compiler Logic
- Known Assumptions and Limitations
1. Overview
Purpose
The CANOE transportation sector differs fundamentally from the other end-use sectors. Rather than pure Python transformations directly acting on raw APIs, the transportation pipeline utilizes an intermediate, human-readable Spreadsheet Database (Excel). A Python Compiler script then sanitizes, temporal-aggregates, and translates that spreadsheet into a strict Temoa-compatible SQLite database.
What the Tool Produces
- Excel Workbooks: Populated baseline data templates representing varying regional transport paradigms.
- SQLite Databases: Extensively constrained, Temoa-ready databases mapping out fleets, efficiencies, capacities, and detailed Light-Duty Electric Vehicle (LD EV) charging curves.
Scope
The model operates across:
- Regions: Canadian provinces.
- Sectors: Transportation (Passenger Cars, Light Trucks, Heavy Freight, Transit, Rail, Marine, Aviation).
- Commodities: Gasoline, Diesel, Jet Fuel, Electricity, Hydrogen, Biofuels.
- Temporal Resolution: Vintages tracked precisely, and electric vehicle charging profiles tracked at an hourly resolution mapped to specific Demand Specific Distributions (DSDs) or Capacity Factors (CFTs).
2. Pipeline Architecture
The transportation pipeline executes in a two-stage process:
1. get_nrcan_data.py → Fetches raw base statistics and injects them into an Excel Template
2. CANOE_TRN_v4.xlsx → *Human Editable Layer* containing projected bounds, tech trees, and costs
3. compile_transport.py → The Compiler:
a. instantiate_database → Opens SQLite schema structure
b. compile_techs/comms → Loads rigid string parameters
c. compile_demand → Loads generic sectoral energy bounds
d. compile_excap → Loads historical fleet stock (Aggregated Quinquennially)
e. compile_efficiency/costs → Connects financial & thermodynamic bounds to fleet logic
f. compile_dsd/cft → Attaches external RAMP-mobility hourly logic to EV charging nodes
g. cleanup → Actively prunes "dead" or unused technologies from the database to save solver memory
3. Upstream Data Fetching
Executed via get_nrcan_data.py:
- Targets the Natural Resources Canada (NRCan) Comprehensive Energy Use Database (CEUD) specific to the transportation domain (Tables 7, 14-21, 28-37).
- Downloads provincial
.xlsfiles natively, strips formatting ("Shares", "GHG", missing rows), and reconstructs standard time-series arrays. - Template Injection: The script systematically opens a master Excel template (e.g.,
CANOE_TRN_ON_v4...xlsx), copies it for the distinct target province (e.g.,QC,MB), and injects the cleaned raw NRCan arrays directly onto theBackground Datasheet.
4. Spreadsheet Database Layer
The spreadsheet_database directory houses the .xlsx files acting as the true modeling nexus for transportation. Unlike purely algorithmic sectors, predicting the capital cost decay of hydrogen fuel-cell heavy-duty trucks necessitates human assumptions that are better governed in a spreadsheet.
Key Tabs:
- Techs & Comms: Topology flags.
- Demand: Generic bounds for vehicle kilometers traveled (VKT) or broad energy totals.
- ExCap: Historical vehicle adoption vintages extending back to 2000.
- Efficiency & Costs: Financials, fixed O&M, and operational efficiency curves for future technology options.
- Lifetime: Operational survival bounds.
- DemandDist / CapFact: Time-slicing logic switches.
5. The Compiler Logic
Executed via compile_transport.py:
Quinquennial Aggregation
- Modern Temoa models rapidly bog down if forced to evaluate every historical adoption year continuously.
- For
ExistingCapacity, the compiler triggersquinquennial_mapping(), forcing historical vehicle stocks (2000–2020) into 5-year representative integer buckets (e.g., vintages 2011, 2012, 2013, 2014, and 2015 all collapse violently into a singular2015stock block).
EV Charging Profiles (DSDs and CFTs)
- Electric vehicle grid loading cannot be ignored. The compiler reads external CSV arrays located in
charging_profiles/ramp_mobility/results/(e.g.,ON-2016TTS...csv). - It converts the hourly timestamps strict to
America/TorontoET formatting, averages them into 24-hour sets (Day/Night maps), and injects them directly intoDemandSpecificDistributionorCapacityFactorTechelements.
Automatic Cleanup & Garbage Collection
- To maintain a lean SQL matrix, the compiler executes a strict
cleanup()function after importing spreadsheet databases. - It explicitly locates and Deletes any technologies inside
Efficiency,CostVariable, andCostFixedthat lack a corresponding foundation insideExistingCapacityor whose existing capacity falls below an arbitrary noise threshold (epsilon = 1e-4).
Data Quality Indexing (DQI)
- Assigns empirical trust scores via
dq_time(). If a value is sourced from data only 3 years old, it scores a1(excellent). If the data year is >15 years obsolete, it inherently degrades to a5.
6. Known Assumptions and Limitations
- Quinquennial Distortion: Aggregating historical fleet data into 5-year blocks artificially smooths historical fleet adoption. It assumes the fleet turnover dynamics can ignore granular year-over-year deviations.
- RAMP-Mobility Fixation: Electric vehicle grid charging profiles rely strictly on static representations from RAMP-mobility models. It inherently models "dumb" charging patterns based on typical commuter behavior, rather than perfectly responsive price-arbitrage charging mechanisms.
- Time Zone Lock: RAMP profiles currently hard-force
tz_convert('America/Toronto'). Applying this strictly to models operating in BC or AB may cause sunset/charging peaks to artificially miss-align with local grid solar profiles. - Decoupled Future Projections: Because future bounds exist entirely within the Excel spreadsheets, CANOE updates do not automatically project transport dynamics. If the CER alters GDP assumptions,
canoe-transportationdoes not magically respond; a human must manually readjust the Excel macro-scalars.
CANOE Transportation Sector — Data Sources Catalog
1. Data Source Summary
| Data Type | Primary Source | Granularity | Update Frequency |
|---|---|---|---|
| Historical Baselines | NRCan CEUD | Province / Annual | Annual |
| EV Charging Maps | RAMP-mobility | Hourly / Seasonal | Static / Independent |
| Financial Trajectories | AEO, GREET, Internal | Technology-Level | Manual |
| Usage Patterns | NHTS / Regional Studies | Demographic / Daily | Manual |
2. Natural Resources Canada (NRCan) CEUD
- Usage: Provides overarching historical anchors for existing vehicle populations (ExCap), initial fuel consumptions, and physical transport modalities (Rail, Aviation, Marine).
- Update Procedure: Governed by
get_nrcan_data.py. Adjust the hardcoded target years (e.g., "2021") in the URL fetcher strings.
3. RAMP-mobility Profiles
- Usage: Yields crucial 8760 simulation profiles dictating exactly when EV fleets interface with the power grid.
- Update Procedure: External. If a new transportation adoption study is conducted (e.g., shifting toward intense workplace charging instead of at-home nighttime charging), a new external RAMP model must be run, outputting a new
.csvinto thecharging_profiles/directory, andcompile_transport.pymust have itsldv_profile_nametarget string manually updated.
4. Analytical Forecasts (AEO, GREET, NHTS)
- Usage: Because the compiler accepts anything typed into the Excel spreadsheet, bounds concerning Future Battery Costs, Hydrogen Fuel-Cell Efficiencies, and Vehicle Lifetime caps are heavily influenced by the EIA AEO, Argonne National Lab's GREET model, and Household Travel Surveys.
- Update Procedure: Manual. Requires a modeler to open the specific
.xlsxdatabase, review the literature, update the cells, cite the update in the "References" column, and executecompile_transport.pyto bake those new assumptions into the.sqlitefile.
5. Update Procedures (Checklist)
During regular annual or biannual CANOE updates:
- NRCan Fetching: Run
get_nrcan_data.py. Ensure the NRCan API URLs still correctly route to the.xlsformats rather than.csvupdates. - Review Spreadsheets: Open the generated
CANOE_TRN_[REGION]...xlsxfiles. Validate that the newly injectedBackground Datacorrectly cascaded through the formulas linking toExCapandDemand. - Literature Review: Manually scrutinize EV cost trajectories and ICCT/GREET emissions estimates. Update the spreadsheet manually if reality has outpaced the older projections.
- Compile: Execute
compile_transport.py. - Review Logs: The terminal will spit out specific
cleanup()logs noting exactly which technologies were deleted. Ensure it hasn't accidentally wiped out a critical emerging technology due to a rounding threshold error.