CANOE Industry Sector — Data Processing Documentation

Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's industrial sector.


Table of Contents

  1. Overview
  2. Pipeline Architecture
  3. Configuration and Setup
  4. External Data Fetching
  5. Demand and Capacity
  6. Technology Splits and Efficiencies
  7. Costs
  8. Known Assumptions and Limitations

1. Overview

Purpose

The CANOE industry sector aggregation tool automatically constructs a Temoa-compatible SQLite database representing the Canadian industrial sector. It pulls data primarily from Natural Resources Canada (NRCan), the Canada Energy Regulator (CER), and Statistics Canada to build a top-down representation of energy demands and fuel mixes across the major industrial sub-sectors.

What the Tool Produces

Scope

The model operates across:


2. Pipeline Architecture

Unlike the building sectors, the industry processing logic uses a modular, ETL-style architecture coordinated through a central aggregator.py script:

1. aggregator.py        → Main orchestrator
   a. setup_runtime     → Initialize database, tables, schema versions, and shared domain constants
   b. build_techcom     → Map physical commodities and generic technologies into Temoa sets
   c. data_scraper      → Load CEUD baseline (NRCan) and Macro Indicators (CER)
   d. statcan           → Load Atlantic (ATL) province disaggregation mapping
   e. demands           → Build Demand and ExistingCapacity tables, scaling future demands
   f. techinput         → Extract fuel split mixes (`LimitTechInputSplitAnnual`) from NRCan
   g. efficiency        → Seed theoretical conversion boundaries (Efficiency = 1.0)
   h. costs             → Seed generic placeholder capital costs
   i. post_processing   → Embed datasets, sources taxonomy, and IDs
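The orchestration above can be sketched as a simple step runner. This is an illustrative sketch, not the real aggregator.py: the function names, table columns, and database filename are assumptions, and the stand-in setup_runtime only creates a single table.

```python
import sqlite3

def run_pipeline(db_path, steps):
    """Open one SQLite connection and run each ETL step against it in order."""
    conn = sqlite3.connect(db_path)
    try:
        for step in steps:
            step(conn)
        conn.commit()
    finally:
        conn.close()

# Illustrative stand-in for one step; the real setup_runtime builds the full
# Temoa schema, version records, and shared domain constants.
def setup_runtime(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Demand "
        "(region TEXT, period INTEGER, demand_comm TEXT, demand REAL)"
    )

# In the real pipeline the list would continue with build_techcom, data_scraper,
# statcan, demands, techinput, efficiency, costs, and post_processing.
run_pipeline("canoe_industry.sqlite", [setup_runtime])
```

Running every step against one shared connection keeps the build atomic: nothing is committed until all steps succeed.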

3. Configuration and Setup

General Settings

Defined in input/params.yaml:

Domain Mapping

The pipeline maps base-name strings into Temoa IDs using a fixed set of prefixes. For instance, subsector technologies are assigned the prefix I_ and generic demand commodities the prefix D_ (e.g., D_CEMENT maps to "Cement manufacturing").
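A minimal sketch of that prefixing scheme, assuming a simple label dictionary; only the CEMENT entry comes from the text above, and the second subsector is a made-up example:

```python
# Human-readable labels keyed by base name (illustrative subset).
SUBSECTOR_LABELS = {
    "CEMENT": "Cement manufacturing",        # from the documentation
    "PULP_PAPER": "Pulp and paper",          # hypothetical example entry
}

def tech_id(subsector: str) -> str:
    """Temoa technology ID for an industrial subsector (I_ prefix)."""
    return f"I_{subsector}"

def demand_id(subsector: str) -> str:
    """Temoa demand commodity ID for an industrial subsector (D_ prefix)."""
    return f"D_{subsector}"

print(demand_id("CEMENT"))  # → D_CEMENT
```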


4. External Data Fetching

Data fetching uses local caching (data_cache/ or cache/) to avoid repeated web requests:

  1. NRCan Comprehensive Energy Use Database (CEUD): Actively scraped via the oee.nrcan.gc.ca API. Fetches Tables 2 through 12 individually for every province (AB, BC, MB, SK, ON, QC) and the Atlantic aggregate (ATL).
  2. CER Energy Future (CEF): Downloads Macro-indicators defining expected forward trajectories of GDP parameters.
  3. StatCan Table 25-10-0029: Downloads the zip archive for specific geographic disaggregation of primary and secondary energy supply/demand characteristics.
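The caching pattern the three fetchers share can be sketched as follows. This is an assumed implementation: the cache key scheme and function name are illustrative, not the pipeline's actual code.

```python
import hashlib
import urllib.request
from pathlib import Path

CACHE_DIR = Path("cache")

def fetch_cached(url: str) -> bytes:
    """Return the response body for `url`, hitting the network only on a cache miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    # Hash the URL so any endpoint maps to a safe, stable filename.
    key = hashlib.sha256(url.encode()).hexdigest()
    cached = CACHE_DIR / key
    if cached.exists():
        return cached.read_bytes()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    cached.write_bytes(body)
    return body
```

Because cache entries are keyed by URL, deleting the cache/ directory (as in the update checklist) is all that is needed to force fresh downloads.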

5. Demand and Capacity

Constructed primarily through demands.py:

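The forward-demand scaling that demands.py performs follows the 1:1 GDP coupling noted in the assumptions section: future demand = base-year demand × (projected GDP / base-year GDP). A minimal sketch, where the GDP index values and the 50 PJ base demand are invented for illustration:

```python
def scale_demand(base_demand_pj: float, gdp_index: dict) -> dict:
    """Scale a base-year demand along a GDP trajectory (base year index = 1.0)."""
    return {year: base_demand_pj * idx for year, idx in gdp_index.items()}

# Illustrative GDP index, not actual CER Macro Indicator values.
gdp_index = {2022: 1.00, 2030: 1.12, 2040: 1.25}

future = scale_demand(50.0, gdp_index)  # 50 PJ base-year demand (made up)
# future[2030] → 56.0 PJ
```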

6. Technology Splits and Efficiencies

The industrial model utilizes a top-down approach emphasizing generic fuel consumption pathways:

Fuel Input Splits (techinput.py)

Uses LimitTechInputSplitAnnual parameters to lock the allowed fuel mix to the historical base-year shares observed in NRCan's tables.
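Deriving those split fractions amounts to normalizing per-fuel consumption totals. A sketch, with invented numbers (the fuel codes and PJ values are illustrative, not CEUD data):

```python
def fuel_split_fractions(consumption_pj: dict) -> dict:
    """Normalize per-fuel consumption into input-split fractions summing to 1.0."""
    total = sum(consumption_pj.values())
    return {fuel: pj / total for fuel, pj in consumption_pj.items()}

# Hypothetical base-year (2022) consumption for one subsector, in PJ.
cement_2022 = {"NGA": 30.0, "ELC": 10.0, "COAL": 10.0}

splits = fuel_split_fractions(cement_2022)
# splits["NGA"] → 0.6; these fractions would populate LimitTechInputSplitAnnual
```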

Efficiencies (efficiency.py)

Because the top-down CEUD data natively represents secondary/final energy consumption rather than process-level unit performance, the pipeline seeds every generic conversion technology with a placeholder Efficiency of 1.0; fuel use is constrained instead by demand totals and the input-split fractions.
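Seeding those placeholder rows is a simple cross-product insert. A sketch using a simplified table layout (the real Temoa Efficiency table has more columns, and the region/tech lists are illustrative):

```python
import sqlite3

def seed_efficiencies(conn, regions, techs, vintage):
    """Insert a placeholder Efficiency = 1.0 row for every region/tech pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Efficiency "
        "(region TEXT, tech TEXT, vintage INTEGER, efficiency REAL)"
    )
    rows = [(r, t, vintage, 1.0) for r in regions for t in techs]
    conn.executemany("INSERT INTO Efficiency VALUES (?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
seed_efficiencies(conn, ["AB", "ON"], ["I_CEMENT", "I_PULP_PAPER"], 2022)
```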


7. Costs

The pipeline seeds generic placeholder capital-cost (CostInvest) assumptions via costs.py:


8. Known Assumptions and Limitations

  1. GDP-to-Demand Coupling: Future energy demand scales at a strict 1:1 ratio with real GDP from the CER Global Net-Zero scenario. The pipeline has no non-linear scaling to capture heavy-industry transitions (e.g., decoupling energy intensity per dollar of GDP through process efficiency upgrades).
  2. Static Fuel Splits: Fixing input-mix fractions at base-year (2022) values constrains the model's ability to switch fuels in future periods unless the constraint is relaxed downstream.
  3. Static Efficiencies: Without explicit process-level unit modeling, generic efficiencies held at 1.0 cannot represent heat recovery or next-generation electrification (e.g., heat pumps with COP > 1.0).
  4. Proxy Costs: Because CostInvest values are placeholders, capacity expansion in the industrial sector is not constrained by realistic capital costs within this module.
  5. No Time-Slicing: The pipeline emits annual constraints only, deferring sub-annual time-partitioning. Peak-load dynamics in heavily industrialized grids are smoothed over the annual timeframe.
  6. StatCan Cross-Mapping Alignment: Mapping StatCan fuel survey ratios back onto CEUD reporting categories assumes both agencies interpret firm survey data using identical NAICS classifications.

CANOE Industry Sector — Data Sources Catalog

1. Data Source Summary

  Data Type                 Primary Source              Granularity           Update Frequency
  Historical Demand         NRCan CEUD                  Province / Annual     Annual
  Demand Scaling            CER CEF Macro Indicators    Canada / Annual       Annual/Biannual
  Atlantic Disaggregation   Statistics Canada           Provincial / Annual   Annual
  Fuel Input Splits         NRCan CEUD                  Province / Annual     Annual

2. NRCan Comprehensive Energy Use Database (CEUD)

3. Canada Energy Regulator (CER) Macro-Indicators

4. Statistics Canada

5. Update Procedures (Checklist)

During regular annual or biannual CANOE updates:

  1. Clear Caches: Delete the contents of the cache/ directory to force fresh data scrapes from NRCan, CER, and StatCan endpoints.
  2. Review Config: Modify NRCan_year and subsequent projection periods arrays in input/params.yaml.
  3. CER URL Validation: Test the explicit CSV download URL for the Canada Energy Regulator and check whether a newer Energy Futures forecast has been released. Update CER_URL if it has.
  4. Execution: Run aggregator.py, then review the generated Temoa SQLite database: confirm that no orphaned tables appear, that data_id tags reference valid sources, and that demand generation succeeds against the newest dataset.
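The post-run review in step 4 can be partially automated with a small sanity check. The table list below is an assumed subset of the Temoa schema, and check_db is a hypothetical helper, not part of the pipeline:

```python
import sqlite3

# Assumed subset of tables the finished database should contain.
EXPECTED_TABLES = ["Demand", "Efficiency"]

def check_db(db_path):
    """Return (missing_tables, empty_tables) for a generated database."""
    conn = sqlite3.connect(db_path)
    try:
        names = {r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")}
        missing = [t for t in EXPECTED_TABLES if t not in names]
        empty = [t for t in EXPECTED_TABLES
                 if t in names
                 and conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] == 0]
        return missing, empty
    finally:
        conn.close()
```

A non-empty missing or empty list flags a build that needs manual inspection before the database is published.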