CANOE Industry Sector — Data Processing Documentation

Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's industrial sector.


Table of Contents

  1. Overview
  2. Pipeline Architecture
  3. Configuration and Setup
  4. External Data Fetching
  5. Demand and Capacity
  6. Technology Splits and Efficiencies
  7. Costs
  8. Known Assumptions and Limitations

1. Overview

Purpose

The CANOE industry sector aggregation tool automatically constructs a Temoa-compatible SQLite database representing the Canadian industrial sector. It pulls data primarily from Natural Resources Canada (NRCan), the Canada Energy Regulator (CER), and Statistics Canada to build a top-down representation of energy demands and fuel mixes across the major industrial sub-sectors.

What the Tool Produces

Scope

The model operates across:


2. Pipeline Architecture

Unlike the building sectors, the industry processing logic uses a modular, ETL-style architecture coordinated through a central aggregator.py script:

1. aggregator.py        → Main orchestrator
   a. setup_runtime     → Initialize database, tables, schema versions, and shared domain constants
   b. build_techcom     → Map physical commodities and generic technologies into Temoa sets
   c. data_scraper      → Load CEUD baseline (NRCan) and Macro Indicators (CER)
   d. statcan           → Load Atlantic (ATL) province disaggregation mapping
   e. demands           → Build Demand and ExistingCapacity tables, scaling future demands
   f. techinput         → Extract fuel split mixes (`LimitTechInputSplitAnnual`) from NRCan
   g. efficiency        → Seed theoretical conversion boundaries (Efficiency = 1.0)
   h. costs             → Seed generic placeholder capital costs
   i. post_processing   → Embed datasets, sources taxonomy, and IDs
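The orchestration above can be sketched as a simple step runner. This is an illustrative sketch, not the real aggregator.py: the function names, table columns, and database filename are assumptions, and the stand-in setup_runtime only creates a single table.

```python
import sqlite3

def run_pipeline(db_path, steps):
    """Open one SQLite connection and run each ETL step against it in order."""
    conn = sqlite3.connect(db_path)
    try:
        for step in steps:
            step(conn)
        conn.commit()
    finally:
        conn.close()

# Illustrative stand-in for one step; the real setup_runtime builds the full
# Temoa schema, version records, and shared domain constants.
def setup_runtime(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Demand "
        "(region TEXT, period INTEGER, demand_comm TEXT, demand REAL)"
    )

# In the real pipeline the list would continue with build_techcom, data_scraper,
# statcan, demands, techinput, efficiency, costs, and post_processing.
run_pipeline("canoe_industry.sqlite", [setup_runtime])
```

Running every step against one shared connection keeps the build atomic: nothing is committed until all steps succeed.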

3. Configuration and Setup

General Settings

Defined in input/params.yaml:

Domain Mapping

The pipeline maps base-name strings into Temoa IDs using a fixed set of prefixes. For instance, subsector technologies are assigned the prefix I_ and generic demand commodities the prefix D_ (e.g., D_CEMENT maps to "Cement manufacturing").
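A minimal sketch of that prefixing scheme, assuming a simple label dictionary; only the CEMENT entry comes from the text above, and the second subsector is a made-up example:

```python
# Human-readable labels keyed by base name (illustrative subset).
SUBSECTOR_LABELS = {
    "CEMENT": "Cement manufacturing",        # from the documentation
    "PULP_PAPER": "Pulp and paper",          # hypothetical example entry
}

def tech_id(subsector: str) -> str:
    """Temoa technology ID for an industrial subsector (I_ prefix)."""
    return f"I_{subsector}"

def demand_id(subsector: str) -> str:
    """Temoa demand commodity ID for an industrial subsector (D_ prefix)."""
    return f"D_{subsector}"

print(demand_id("CEMENT"))  # → D_CEMENT
```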


4. External Data Fetching

Data fetching uses local caching (data_cache/ or cache/) to avoid repeated web requests:

  1. NRCan Comprehensive Energy Use Database (CEUD): Actively scraped via the oee.nrcan.gc.ca API. Fetches Tables 2 through 12 individually for every province (AB, BC, MB, SK, ON, QC) and the Atlantic aggregate (ATL).
  2. CER Energy Future (CEF): Downloads Macro-indicators defining expected forward trajectories of GDP parameters.
  3. StatCan Table 25-10-0029: Downloads the zip archive for specific geographic disaggregation of primary and secondary energy supply/demand characteristics.
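The caching pattern the three fetchers share can be sketched as follows. This is an assumed implementation: the cache key scheme and function name are illustrative, not the pipeline's actual code.

```python
import hashlib
import urllib.request
from pathlib import Path

CACHE_DIR = Path("cache")

def fetch_cached(url: str) -> bytes:
    """Return the response body for `url`, hitting the network only on a cache miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    # Hash the URL so any endpoint maps to a safe, stable filename.
    key = hashlib.sha256(url.encode()).hexdigest()
    cached = CACHE_DIR / key
    if cached.exists():
        return cached.read_bytes()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    cached.write_bytes(body)
    return body
```

Because cache entries are keyed by URL, deleting the cache/ directory (as in the update checklist) is all that is needed to force fresh downloads.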

5. Demand and Capacity

Constructed primarily through demands.py:

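The forward-demand scaling that demands.py performs follows the 1:1 GDP coupling noted in the assumptions section: future demand = base-year demand × (projected GDP / base-year GDP). A minimal sketch, where the GDP index values and the 50 PJ base demand are invented for illustration:

```python
def scale_demand(base_demand_pj: float, gdp_index: dict) -> dict:
    """Scale a base-year demand along a GDP trajectory (base year index = 1.0)."""
    return {year: base_demand_pj * idx for year, idx in gdp_index.items()}

# Illustrative GDP index, not actual CER Macro Indicator values.
gdp_index = {2022: 1.00, 2030: 1.12, 2040: 1.25}

future = scale_demand(50.0, gdp_index)  # 50 PJ base-year demand (made up)
# future[2030] → 56.0 PJ
```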

6. Technology Splits and Efficiencies

The industrial model utilizes a top-down approach emphasizing generic fuel consumption pathways:

Fuel Input Splits (techinput.py)

Uses LimitTechInputSplitAnnual parameters to lock the allowed fuel mix to the historical base-year shares observed in NRCan's tables.
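Deriving those split fractions amounts to normalizing per-fuel consumption totals. A sketch, with invented numbers (the fuel codes and PJ values are illustrative, not CEUD data):

```python
def fuel_split_fractions(consumption_pj: dict) -> dict:
    """Normalize per-fuel consumption into input-split fractions summing to 1.0."""
    total = sum(consumption_pj.values())
    return {fuel: pj / total for fuel, pj in consumption_pj.items()}

# Hypothetical base-year (2022) consumption for one subsector, in PJ.
cement_2022 = {"NGA": 30.0, "ELC": 10.0, "COAL": 10.0}

splits = fuel_split_fractions(cement_2022)
# splits["NGA"] → 0.6; these fractions would populate LimitTechInputSplitAnnual
```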

Efficiencies (efficiency.py)

Because the top-down CEUD data natively represents secondary/final energy consumption rather than process-level unit performance, the pipeline seeds every generic conversion technology with a placeholder Efficiency of 1.0; fuel use is constrained instead by demand totals and the input-split fractions.
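Seeding those placeholder rows is a simple cross-product insert. A sketch using a simplified table layout (the real Temoa Efficiency table has more columns, and the region/tech lists are illustrative):

```python
import sqlite3

def seed_efficiencies(conn, regions, techs, vintage):
    """Insert a placeholder Efficiency = 1.0 row for every region/tech pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Efficiency "
        "(region TEXT, tech TEXT, vintage INTEGER, efficiency REAL)"
    )
    rows = [(r, t, vintage, 1.0) for r in regions for t in techs]
    conn.executemany("INSERT INTO Efficiency VALUES (?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
seed_efficiencies(conn, ["AB", "ON"], ["I_CEMENT", "I_PULP_PAPER"], 2022)
```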


7. Costs

The pipeline seeds generic placeholder capital-cost (CostInvest) assumptions via costs.py:


8. Known Assumptions and Limitations

  1. GDP-to-Demand Coupling: Future energy demand scales at a strict 1:1 ratio with real GDP from the CER Global Net-Zero scenario. The pipeline has no non-linear scaling to capture heavy-industry transitions (e.g., decoupling energy intensity per dollar of GDP through process efficiency upgrades).
  2. Static Fuel Splits: Fixing input-mix fractions at base-year (2022) values constrains the model's ability to switch fuels in future periods unless the constraint is relaxed downstream.
  3. Static Efficiencies: Without explicit process-level unit modeling, generic efficiencies held at 1.0 cannot represent heat recovery or next-generation electrification (e.g., heat pumps with COP > 1.0).
  4. Proxy Costs: Because CostInvest values are placeholders, capacity expansion in the industrial sector is not constrained by realistic capital costs within this module.
  5. No Time-Slicing: The pipeline emits annual constraints only, deferring sub-annual time-partitioning. Peak-load dynamics in heavily industrialized grids are smoothed over the annual timeframe.
  6. StatCan Cross-Mapping Alignment: Mapping StatCan fuel survey ratios back onto CEUD reporting categories assumes both agencies interpret firm survey data using identical NAICS classifications.

CANOE Industry Sector — Data Sources Catalog

1. Data Source Summary

  Data Type                 Primary Source              Granularity           Update Frequency
  Historical Demand         NRCan CEUD                  Province / Annual     Annual
  Demand Scaling            CER CEF Macro Indicators    Canada / Annual       Annual/Biannual
  Atlantic Disaggregation   Statistics Canada           Provincial / Annual   Annual
  Fuel Input Splits         NRCan CEUD                  Province / Annual     Annual

2. NRCan Comprehensive Energy Use Database (CEUD)

3. Canada Energy Regulator (CER) Macro-Indicators

4. Statistics Canada

5. Update Procedures (Checklist)

During regular annual or biannual CANOE updates:

  1. Clear Caches: Delete the contents of the cache/ directory to force fresh data scrapes from NRCan, CER, and StatCan endpoints.
  2. Review Config: Modify NRCan_year and subsequent projection periods arrays in input/params.yaml.
  3. CER URL Validation: Test the explicit CSV download URL for the Canada Energy Regulator and check whether a newer Energy Futures forecast has been released. Update CER_URL if it has.
  4. Execution: Run aggregator.py, then review the generated Temoa SQLite database: confirm that no orphaned tables appear, that data_id tags reference valid sources, and that demand generation succeeds against the newest dataset.
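The post-run review in step 4 can be partially automated with a small sanity check. The table list below is an assumed subset of the Temoa schema, and check_db is a hypothetical helper, not part of the pipeline:

```python
import sqlite3

# Assumed subset of tables the finished database should contain.
EXPECTED_TABLES = ["Demand", "Efficiency"]

def check_db(db_path):
    """Return (missing_tables, empty_tables) for a generated database."""
    conn = sqlite3.connect(db_path)
    try:
        names = {r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")}
        missing = [t for t in EXPECTED_TABLES if t not in names]
        empty = [t for t in EXPECTED_TABLES
                 if t in names
                 and conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] == 0]
        return missing, empty
    finally:
        conn.close()
```

A non-empty missing or empty list flags a build that needs manual inspection before the database is published.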