Industry Sector
This page documents the data lifecycle for the Industry sector in the CANOE database, covering the sources, ETL processing logic, and the resulting data structure.
For in-depth details about the industry sector: Detailed Industry
1. Data Sources
The industry pipeline integrates three primary external datasets to build a comprehensive view of energy demand and technological shares across Canada.
| Source | Dataset | Purpose |
|---|---|---|
| NRCan | Comprehensive Energy Use Database (CEUD) | Provides base-year energy demand by industry and fuel type. |
| CER | Energy Futures (Macro Indicators 2023) | Used for GDP-based growth scaling of future demand. |
| StatCan | Table 25-10-0029-01 | Used to allocate Atlantic aggregate data into specific provinces (PEI, NB, NS, NL). |
[!NOTE] All external data is cached locally in the `cache/` directory to ensure reproducible builds.
2. Processing & Assumptions
The ETL pipeline transforms raw sectoral data into a standardized SQLite format using the following logic:
Base Year & Temporal Scaling
- Base Year: Industrial energy demand is anchored to the NRCan CEUD base year (e.g., 2022).
- GDP Scaling: Future demand periods (specified in `params.yaml`) are projected by scaling base-year values with GDP growth trajectories from the CER Global Net-zero scenario.
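The GDP scaling step can be sketched as below. This is a minimal illustration that assumes demand scales in direct proportion to the GDP ratio between a future period and the base year; the function name `scale_demand_by_gdp`, the dictionary layout, and all numbers are hypothetical and not taken from the pipeline or from CER data.

```python
def scale_demand_by_gdp(base_demand, gdp_by_year, base_year, periods):
    """Project demand for each period as base_demand * GDP(period) / GDP(base_year).

    Assumption: proportional scaling. The actual pipeline may apply a
    different rule (for example, sector-specific GDP series).
    """
    base_gdp = gdp_by_year[base_year]
    if base_gdp <= 0:
        raise ValueError("base-year GDP must be positive")
    return {p: base_demand * gdp_by_year[p] / base_gdp for p in periods}


# Illustrative values only (not CER or NRCan data).
gdp = {2022: 100.0, 2030: 110.0, 2040: 125.0}
projected = scale_demand_by_gdp(50.0, gdp, base_year=2022, periods=[2030, 2040])
print(projected)
```

In this sketch the list of periods stands in for the future demand periods that the pipeline reads from `params.yaml`.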
Atlantic Province Allocation
Because NRCan often aggregates the Atlantic provinces, the pipeline splits the aggregate in three steps:
1. Load aggregate Atlantic demand from NRCan.
2. Derive provincial shares from Statistics Canada Table 25-10-0029-01.
3. Distribute the aggregate demand across PEI, NB, NS, and NLLAB according to those shares.
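The allocation steps above can be sketched as a weighted split. This is an illustrative example only: the function name `allocate_atlantic` and the weights are hypothetical, and the sketch assumes the StatCan-derived weights are normalized so that the provincial values sum back to the Atlantic aggregate.

```python
def allocate_atlantic(aggregate_demand, provincial_weights):
    """Split an Atlantic aggregate across provinces in proportion to the weights.

    Weights are normalized here, so they need not sum to 1 or 100.
    """
    total = sum(provincial_weights.values())
    if total <= 0:
        raise ValueError("provincial weights must sum to a positive value")
    return {
        region: aggregate_demand * weight / total
        for region, weight in provincial_weights.items()
    }


# Made-up weights for illustration (not StatCan values).
weights = {"PEI": 5.0, "NB": 35.0, "NS": 30.0, "NLLAB": 30.0}
split = allocate_atlantic(200.0, weights)
print(split)
```

Normalizing inside the function keeps the provincial totals consistent with the aggregate even if the source shares carry rounding error.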
Tech Input Splits (Fuel Shares)
- Fuel shares for industrial technologies are derived from NRCan share tables.
- Handling Missing Data: Values marked as `n.a.` or `X` (confidential) are treated as the "remainder" to ensure total shares sum to 100%.
- Normalization: If aggregate shares exceed 100% due to source rounding, they are trimmed to fit.
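One way to read the two rules above is sketched here. The details are assumptions, not confirmed pipeline behavior: the remainder is divided equally when several entries are missing, and "trimmed to fit" is implemented as a proportional rescale to 100%. The function name `resolve_shares` and the fuel labels are hypothetical.

```python
MISSING_MARKERS = {"n.a.", "X"}


def resolve_shares(raw_shares):
    """Return fuel shares in percent with missing entries filled from the remainder.

    Assumptions (not confirmed by the source):
    - several missing entries split the remainder equally;
    - known shares above 100% are rescaled proportionally.
    """
    known = {f: float(v) for f, v in raw_shares.items() if v not in MISSING_MARKERS}
    missing = [f for f, v in raw_shares.items() if v in MISSING_MARKERS]

    known_total = sum(known.values())
    if known_total > 100.0:
        # Rounding in the source pushed the total over 100%: scale back down.
        known = {f: v * 100.0 / known_total for f, v in known.items()}
        known_total = 100.0

    shares = dict(known)
    remainder = 100.0 - known_total
    for fuel in missing:
        shares[fuel] = remainder / len(missing)
    return shares


# Illustrative inputs only.
filled = resolve_shares({"natural_gas": 60.0, "electricity": 30.0, "coal": "X"})
trimmed = resolve_shares({"natural_gas": 60.0, "electricity": 40.2})
print(filled)
print(trimmed)
```

If the pipeline instead clips the largest share or assigns the whole remainder to a single fuel, only the two marked branches would change.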
Efficiency
- All industrial technologies currently assume `Efficiency = 1.0`, effectively mapping input energy commodities directly to sectoral demand commodities (e.g., `I_ng` → `I_d_cement`).
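A small sketch of what an efficiency of 1.0 implies: every unit of input energy is delivered as one unit of demand commodity. The record layout `EfficiencyRow` and the technology name `I_cement_tech` are hypothetical and do not reflect the actual table schema; only `I_ng` and `I_d_cement` come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EfficiencyRow:
    """Hypothetical record linking an input commodity to a demand commodity."""
    input_comm: str
    tech: str
    output_comm: str
    efficiency: float = 1.0  # the page states all industrial technologies use 1.0


def delivered_output(row, input_energy):
    """Output commodity produced from a given amount of input energy."""
    return input_energy * row.efficiency


row = EfficiencyRow("I_ng", "I_cement_tech", "I_d_cement")
print(delivered_output(row, 12.5))
```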
3. Final Data Output
The pipeline generates a SQLite database (typically CAN_industry.sqlite) containing the following key tables:
- Demand: Time-series demand rows per region and industry sub-sector.
- Technology & Commodity: Definitions for industrial processes and fuel types.
- LimitTechInputSplitAnnual: The calculated fuel mix per technology.
- Efficiency: Energy conversion efficiency per technology (currently 1.0 for all industrial technologies).
- CostInvest: Placeholder investment costs for future capacity planning.
- DataSet / DataSource: Comprehensive metadata including citations for provenance.
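To check which tables a generated database contains, a generic inspection with Python's standard `sqlite3` module is enough. The sketch below runs against an in-memory stand-in so that it is self-contained; the `Demand` columns used in the demo are invented for illustration and are not the real schema.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def row_counts(conn):
    """Return a {table_name: row_count} mapping for every table."""
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in list_tables(conn)
    }


# In-memory stand-in with a made-up schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demand (region TEXT, period INTEGER, value REAL)")
conn.execute("INSERT INTO Demand VALUES ('NB', 2030, 1.0)")
print(row_counts(conn))
```

Against a real build, the connection would instead be opened with `sqlite3.connect("CAN_industry.sqlite")`, assuming the default output filename.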