Skip to content

Industry Sector

This page documents the data lifecycle for the Industry sector in the CANOE database, covering the sources, ETL processing logic, and the resulting data structure.

1. Data Sources

The industry pipeline integrates three primary external datasets to build a comprehensive view of energy demand and technological shares across Canada.

Source Dataset Purpose
NRCan Comprehensive Energy Use Database (CEUD) Provides base-year energy demand by industry and fuel type.
CER Energy Futures (Macro Indicators 2023) Used for GDP-based growth scaling of future demand.
StatCan Table 25-10-0029-01 Used to allocate Atlantic aggregate data into specific provinces (PEI, NB, NS, NL).

[!NOTE] All external data is locally cached in the cache/ directory to ensure reproducible builds.

2. Processing & Assumptions

The ETL pipeline transforms raw sectoral data into a standardized SQLite format using the following logic:

Base Year & Temporal Scaling

  • Base Year: Industrial energy demand is anchored to the NRCan CEUD base year (e.g., 2022).
  • GDP Scaling: Future demand periods (specified in params.yaml) are projected by scaling base-year values with GDP growth trajectories from the CER Global Net-zero scenario.

Atlantic Province Allocation

Because NRCan often aggregates Atlantic provinces, a specific split is performed: 1. Load aggregate Atlantic demand from NRCan. 2. Apply provincial shares derived from Statistics Canada Table 25-10-0029-01. 3. Distribute shares across PEI, NB, NS, and NLLAB.

Tech Input Splits (Fuel Shares)

  • Fuel shares for industrial technologies are derived from NRCan share tables.
  • Handling Missing Data: Values marked as n.a. or X (confidential) are treated as the "remainder" to ensure total shares sum to 100%.
  • Normalization: If aggregate shares exceed 100% due to source rounding, they are trimmed to fit.

Efficiency

  • All industrial technologies currently assume an Efficiency = 1.0, effectively mapping input energy commodities directly to sectoral demand commodities (e.g., I_ng $\rightarrow$ I_d_cement).

3. Final Data Output

The pipeline generates a SQLite database (typically CAN_industry.sqlite) containing the following key tables:

  • Demand: Time-series demand rows per region and industry sub-sector.
  • Technology & Commodity: Definitions for industrial processes and fuel types.
  • LimitTechInputSplitAnnual: The calculated fuel mix per technology.
  • Efficiency: Theoretical energy conversion performance.
  • CostInvest: Placeholder investment costs for future capacity planning.
  • DataSet / DataSource: Comprehensive metadata including citations for provenance.