CANOE Industry Sector — Data Processing Documentation
Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's industrial sector.
Table of Contents
- Overview
- Pipeline Architecture
- Configuration and Setup
- External Data Fetching
- Demand and Capacity
- Technology Splits and Efficiencies
- Costs
- Known Assumptions and Limitations
1. Overview
Purpose
The CANOE industry sector aggregation tool automatically constructs a Temoa-compatible SQLite database representing the Canadian industrial sector. It pulls data primarily from Natural Resources Canada (NRCan), the Canada Energy Regulator (CER), and Statistics Canada to construct a top-down representation of energy demands and fuel mixes across major subsectors.
What the Tool Produces
- A Temoa-schema SQLite database containing technology attributes, costs, capacities, energy demands, and fuel mixes utilized by the industrial sector.
Scope
The model operates across:
- Regions: Canadian provinces.
- Subsectors: Construction, Pulp and paper, Smelting, Petroleum refining, Cement, Chemicals, Iron and Steel, Other manufacturing, Forestry, and Mining/Oil & Gas extraction.
- Commodities: Electricity, Natural Gas, Diesel, Heavy Fuel Oil, Petroleum Coke, Natural Gas Liquids, Coal, Coke, Wood, and Other.
- Time resolution: Annual bounds (no hourly Demand Specific Distributions are mapped for the industry sector in this tool).
- Planning horizon: Configurable multi-year stepping (default: starting in 2025).
2. Pipeline Architecture
Unlike building sectors, the industry processing logic utilizes a modular ETL-like architecture coordinated
through a central aggregator.py script:
1. aggregator.py → Main orchestrator
a. setup_runtime → Initialize database, tables, schema versions, and shared domain constants
b. build_techcom → Map physical commodities and generic technologies into Temoa sets
c. data_scraper → Load CEUD baseline (NRCan) and Macro Indicators (CER)
d. statcan → Load Atlantic (ATL) province disaggregation mapping
e. demands → Build Demand and ExistingCapacity tables, scaling future demands
f. techinput → Extract fuel split mixes (`LimitTechInputSplitAnnual`) from NRCan
g. efficiency → Seed theoretical conversion boundaries (Efficiency = 1.0)
h. costs → Seed generic placeholder capital costs
i. post_processing → Embed datasets, sources taxonomy, and IDs
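The ordering above can be pictured as a list of stages run one after another against shared state. The following is a minimal sketch under that assumption only: `run_pipeline`, the `context` dictionary, and the two placeholder stage bodies are hypothetical and do not reproduce the actual contents of aggregator.py.

```python
from typing import Callable

Stage = Callable[[dict], None]


def run_pipeline(stages: list[tuple[str, Stage]], context: dict) -> list[str]:
    """Run each stage in order against a shared context; return the names completed."""
    completed = []
    for name, stage in stages:
        stage(context)  # each stage reads and extends the shared state
        completed.append(name)
    return completed


# Placeholder stages (assumptions): the real modules do far more than this.
def setup_runtime(context: dict) -> None:
    context["tables"] = []


def demands(context: dict) -> None:
    context["tables"].append("Demand")


order = run_pipeline([("setup_runtime", setup_runtime), ("demands", demands)], {})
```

Keeping the stage order in one list makes the dependency between steps explicit: a stage such as demands can only run after setup_runtime has prepared the shared state.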
3. Configuration and Setup
General Settings
Defined in input/params.yaml:
- `schema_version`: Ensures strict alignment with the standard Temoa schema (version 3.1).
- `periods`: List of modeled future years (e.g., [2025, ...]).
- `NRCan_year`: The historical base year used as the calibration point (default: 2022).
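To illustrate how these three settings relate, the sketch below checks a dictionary shaped like the parsed contents of input/params.yaml. The helper name `check_params`, the specific checks, the numeric type of `schema_version`, and the later period values (2030, 2035) are assumptions made for illustration; the tool's own validation, if any, may differ.

```python
def check_params(params: dict) -> dict:
    """Sanity-check settings shaped like input/params.yaml (hypothetical helper)."""
    required = ("schema_version", "periods", "NRCan_year")
    missing = [key for key in required if key not in params]
    if missing:
        raise KeyError(f"missing settings: {missing}")

    periods = list(params["periods"])
    if periods != sorted(set(periods)):
        raise ValueError("periods must be strictly increasing")
    if periods and periods[0] <= params["NRCan_year"]:
        raise ValueError("periods must start after the historical base year")
    return params


# Stand-in for the parsed YAML file; 2030 and 2035 are made-up example periods.
example_params = {"schema_version": 3.1, "periods": [2025, 2030, 2035], "NRCan_year": 2022}
checked = check_params(example_params)
```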
Domain Mapping
The pipeline maps a fixed set of base strings to Temoa IDs and generates the resulting names programmatically. For instance,
subsectors are assigned the prefix I_ and generic demands the prefix D_ (e.g.,
D_CEMENT maps to "Cement manufacturing").
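A prefix-based naming scheme of this kind could look like the sketch below. The helper `make_id` and its tokenization rule (uppercase, non-alphanumeric runs replaced by underscores) are assumptions; only the I_ and D_ prefixes and the D_CEMENT example come from the description above.

```python
import re


def make_id(prefix: str, label: str) -> str:
    """Build an identifier such as D_CEMENT from a prefix and a free-text label."""
    # Collapse anything that is not a letter or digit into a single underscore.
    token = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_").upper()
    return f"{prefix}_{token}"


# Map the generated demand ID to its human-readable description.
demand_names = {make_id("D", "Cement"): "Cement manufacturing"}
subsector_id = make_id("I", "Iron and Steel")
```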
4. External Data Fetching
Data fetching utilizes robust local caching (data_cache/ or cache/) to limit
repeated web requests:
- NRCan Comprehensive Energy Use Database (CEUD): Actively scraped via the oee.nrcan.gc.ca API. Fetches Tables 2 through 12 individually for every province (AB, BC, MB, SK, ON, QC) and the Atlantic aggregate (ATL).
- CER Energy Future (CEF): Downloads macro-indicators defining the expected forward trajectories of GDP parameters.
- StatCan Table 25-10-0029: Downloads the zip archive for specific geographic disaggregation of primary and secondary energy supply/demand characteristics.
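The caching behaviour described in this section can be sketched as a download-once wrapper. This is an illustration under assumptions, not the tool's code: `fetch_cached`, `download`, and the choice of a SHA-256 hash of the URL as the cache file name are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def download(url: str) -> bytes:
    """Fetch the raw bytes behind a URL."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def fetch_cached(url: str, cache_dir: Path, fetch: Callable[[str], bytes] = download) -> bytes:
    """Return the cached payload for url, downloading it only on the first call."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: cache files are named after a hash of the URL.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if target.exists():
        return target.read_bytes()
    payload = fetch(url)
    target.write_bytes(payload)
    return payload
```

Passing the fetch function as a parameter keeps the cache logic testable without network access, and deleting the cache directory forces a fresh download on the next run.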
5. Demand and Capacity
Constructed primarily through demands.py:
- Base Year (2022) Alignment: Takes provincial industrial energy consumption directly from NRCan CEUD.
- Atlantic Disaggregation: Since NRCan often aggregates Atlantic provinces into 'ATL', the script cross-references the data with StatCan Table 25-10-0029. Total physical supply ratios are extracted by subsector to determine what fraction of "ATL" belongs to NB, NS, PEI, and NL.
- Future Demand Projection: Demand in future periods scales relative to the base year using Real Gross Domestic Product ($2012 Millions) growth factors extracted from the CER CEF 'Global Net-Zero' scenario.
- Capacity: Existing capacity aligns with the preceding historical year data values from NRCan.
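The disaggregation and projection steps above reduce to two proportional calculations, sketched below. The function names and all numbers are invented for illustration; they are not values from NRCan, StatCan, or the CER, and demands.py may organize the calculation differently.

```python
def split_atlantic(atl_total: float, weights: dict[str, float]) -> dict[str, float]:
    """Allocate an ATL aggregate to provinces in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {prov: atl_total * w / total_weight for prov, w in weights.items()}


def project_demand(base_demand: float, gdp: dict[int, float],
                   base_year: int, periods: list[int]) -> dict[int, float]:
    """Scale base-year demand 1:1 with real GDP growth relative to the base year."""
    return {p: base_demand * gdp[p] / gdp[base_year] for p in periods}


# Made-up inputs: 100 PJ for ATL, supply weights per province, and a GDP path.
provincial = split_atlantic(100.0, {"NB": 40.0, "NS": 30.0, "PEI": 10.0, "NL": 20.0})
nb_future = project_demand(provincial["NB"], {2022: 1000.0, 2025: 1100.0}, 2022, [2025])
```

With these inputs NB receives 40% of the aggregate, and a 10% rise in GDP between 2022 and 2025 raises its demand by the same 10%.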
6. Technology Splits and Efficiencies
The industrial model utilizes a top-down approach emphasizing generic fuel consumption pathways:
Fuel Input Splits (techinput.py)
Uses LimitTechInputSplitAnnual parameters to lock the allowed fuel mix to the historical base
parameters observed within NRCan's tables.
- Pulls ratios of Electricity, Natural Gas, Diesel, Wood, etc.
- Any values marked as missing or excluded (`X` or `n.a.`) evaluate to 'na'. The remainder up to 100% is then distributed equally among those unknown fuels so that the shares sum to 100% inside the optimizer.
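The closure rule for missing shares can be sketched as follows. This assumes cells arrive as percentage strings and that X and n.a. are the only missing-value markers; `parse_share` and `close_fuel_shares` are hypothetical names, and the input values are invented.

```python
from typing import Optional


def parse_share(cell: str) -> Optional[float]:
    """Return a fraction for a percentage cell, or None for 'X' / 'n.a.' markers."""
    text = cell.strip()
    if text in {"X", "n.a."}:
        return None
    return float(text) / 100.0


def close_fuel_shares(shares: dict[str, Optional[float]]) -> dict[str, float]:
    """Spread the unreported remainder equally over fuels with unknown shares."""
    known = {fuel: s for fuel, s in shares.items() if s is not None}
    unknown = [fuel for fuel, s in shares.items() if s is None]
    remainder = 1.0 - sum(known.values())
    if remainder < -1e-9:
        raise ValueError("known shares exceed 100%")
    closed = dict(known)
    for fuel in unknown:
        closed[fuel] = max(remainder, 0.0) / len(unknown)
    return closed


# Made-up table row: two reported shares and two missing markers.
raw = {"Electricity": "50", "Natural Gas": "30", "Diesel": "X", "Wood": "n.a."}
closed = close_fuel_shares({fuel: parse_share(cell) for fuel, cell in raw.items()})
```

Here the reported fuels account for 80%, so Diesel and Wood each receive half of the remaining 20%.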
Efficiencies (efficiency.py)
Because the top-down data already represent secondary/final energy use:
- Efficiencies are set to exactly 1.0. The tool assumes that the useful output corresponds one-to-one with the input fuel for each generic technology.
7. Costs
The pipeline configures generic boundary assumptions for Capital Investments (costs.py):
- Due to limited data on specific industrial equipment (kilns, blast furnaces, crackers), `CostInvest` defaults temporarily to 0.1 M$/PJ across arbitrary initial conditions.
8. Known Assumptions and Limitations
- GDP-to-Demand Coupling: Future energy demand scales at a 1:1 ratio with real GDP from the CER Global Net-Zero scenario. There is no non-linear scaling to reflect heavy industrial sector transitions (e.g., decoupling energy intensity per dollar of GDP via process efficiency upgrades).
- Static Fuel Splits: Relying rigidly on base-year (2022) input mix fractions artificially constrains the model's capacity to switch fuels heavily across future model periods unless specifically relaxed downstream.
- Efficiency Staticism: Lacking explicit process-level unit modeling, generic efficiencies held at 1.0 limit representations of thermal recuperation or next-generation electrification (which routinely demonstrate COP > 1.0 or high thermal retention).
- Proxy Costs: Placeholder `CostInvest` metrics mean that capacity expansions in industrial domains are not constrained by realistic capital burdens in this module.
- No Time-Slicing: The pipeline outputs strict annual constraints, deferring arbitrary time-partitioning. Peak load physics intrinsic to heavily industrialized grids are smoothed over the annual timeframe.
- StatCan Cross-Mapping Alignment: Applying StatCan gross fuel survey ratios to NRCan CEUD reporting categories depends on both government bodies classifying firm survey data under harmonized NAICS categories.
CANOE Industry Sector — Data Sources Catalog
1. Data Source Summary
| Data Type | Primary Source | Granularity | Update Frequency |
|---|---|---|---|
| Historical Demand | NRCan CEUD | Province / Annual | Annual |
| Demand Scaling | CER CEF Macro Indicators | Canada / Annual | Annual/Biannual |
| Atlantic Disaggregation | Statistics Canada | Provincial / Annual | Annual |
| Fuel Input Splits | NRCan CEUD | Province / Annual | Annual |
2. NRCan Comprehensive Energy Use Database (CEUD)
- Usage: Provides the anchor framework for historical base year energy demands, fuel utilization combinations, and sectorial breakdowns.
- Update Procedure: The base year is configured via `NRCan_year` in `input/params.yaml`. Ensure the URL query string inside `data_scraper.py` still returns the HTML tables whenever the upstream format changes.
3. Canada Energy Regulator (CER) Macro-Indicators
- Usage: Extrapolates future demand projections using the 'Global Net-Zero' Real GDP scenario paths.
- Update Procedure: If the CER produces a new Energy Futures release (e.g., shifting from EF2023 to EF2024), the global `CER_URL` variable inside `data_scraper.py` must point to the new direct CSV link. Column indexing must be double-checked against the resulting header schema.
4. Statistics Canada
- Usage: Pulls specific Atlantic region breakdowns matching Table 25-10-0029 to compensate for NRCan regional aggregation.
- Update Procedure: The script relies on the internal structure of the downloaded zip archive. Modifications are rarely needed unless StatCan changes its API schema or the table identifier.
5. Update Procedures (Checklist)
During regular annual or biannual CANOE updates:
- Clear Caches: Delete the contents of the `cache/` directory to force fresh data scrapes from NRCan, CER, and StatCan endpoints.
- Review Config: Modify `NRCan_year` and the subsequent projection `periods` array in `input/params.yaml`.
- CER URL Validation: Test the explicit CSV download URL for the Canada Energy Regulator and check whether a newer Energy Futures forecast has been released. Update `CER_URL` if it has.
- Execution: Run `aggregator.py`, then review the generated Temoa SQLite tables to confirm that no orphaned tables appear, that `data_id` tags reference valid entries, and that demand generation succeeds against the newest dataset.
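The cache-clearing step in this checklist can be scripted. The sketch below is one possible helper, not part of the tool: `clear_cache` is a hypothetical name, and the directory path passed to it is assumed to be the cache location used by the scraper.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir: Path) -> int:
    """Delete everything inside cache_dir, keep the directory, return entries removed."""
    if not cache_dir.is_dir():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove nested folders recursively
        else:
            entry.unlink()  # remove plain files and symlinks
        removed += 1
    return removed
```

A call such as `clear_cache(Path("cache"))` before running aggregator.py would then force every source to be fetched again.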