CANOE CEF Sector — Data Processing Documentation
Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's CEF (Canada's Energy Future) generalized sector.
Table of Contents
- Overview
- Pipeline Architecture
- Configuration and Setup
- External Data Handling
- Process and Commodity Generation
- Demands, Efficiencies, and Inputs
- Time-Slicing (DSDs)
- Known Assumptions and Limitations
1. Overview
Purpose
The CANOE CEF sector aggregation tool provides a top-down, generalized representation of Canadian energy demand, inherited directly from the Canada Energy Regulator's (CER) Energy Futures scenarios. Instead of building detailed, bottom-up operational models (such as residential buildings or industrial kilns), this module constrains the optimizer to reproduce the top-down macroscopic energy demands and fuel mixes published by the CER.
What the Tool Produces
- A Temoa-schema SQLite database containing consolidated technologies, commodities, generic energy demands, fuel-specific input bounds (`LimitTechInputSplitAnnual`), and electricity time-slicing profiles (`DemandSpecificDistribution`).
Scope
The model operates across:
- Regions: Configurable Canadian provinces mapped from CER generic regions.
- Sectors: End-use macro-sectors explicitly defined in mapping CSVs (e.g., Residential, Commercial, Industrial, Transportation).
- Commodities: Any fuel/energy vectors defined by the CER output structure (Electricity, Natural Gas, Diesel, Biomass, etc.).
2. Pipeline Architecture
The processing logic is contained in a single orchestrator rather than distinct domain scripts:
1. __main__.py → Execution trigger
2. setup.py → Instantiates SQLite database, parses `params.yaml`, and loads mapping CSVs
3. all_sectors.py
a. build_sectors → Parses the core CEF CSV, maps domains, drops insignificant flows, and generates topology
b. build_tester → Establishes generic time-period bounds matching the data years
c. build_dsd → (Optional) Loads electricity demand-specific distributions mapping summer/winter/day/night profiles
d. build_metadata → Finalizes dataset labels and reference sources
3. Configuration and Setup
General Settings
Defined in input_files/params.yaml:
- `sqlite_schema`: Points to the base SQL schema file that initializes the database.
- `scenario`: Determines which CER scenario to filter for (e.g., 'Global Net-zero').
- `conversion_factor` & `decimal_places`: Global unit adjustments for standardizing physical energy inputs into the target model units.
- `prop_thresh`: A cutoff that drops trivially small energy streams to simplify the problem for the solver.
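An illustrative `params.yaml` combining the keys above with `use_dsd` and `model_periods` (both referenced later in this document). The values shown are placeholders, not the repository defaults.

```yaml
# Illustrative params.yaml -- key names from this documentation; values are placeholders.
sqlite_schema: input_files/schema.sql
scenario: Global Net-zero
conversion_factor: 0.0036   # placeholder unit adjustment
decimal_places: 4
prop_thresh: 0.01           # drop fuel shares below 1% of a technology's total
use_dsd: true
model_periods: [2025, 2030, 2035, 2040, 2045, 2050]
```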
Domain Mapping
Configured through static CSVs within input_files/:
- `regions.csv`: Maps CER region strings to CANOE two-letter codes.
- `commodities.csv`: Translates output fuel vectors.
- `sectors.csv`: Translates end-use aggregate domains into functional demand technology branches.
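A small sketch of how such a mapping CSV might be loaded into a lookup dictionary. The column names and the `regions.csv` excerpt are assumptions for illustration, not the real file layout.

```python
# Sketch: build a CER-string -> CANOE-token lookup from a two-column mapping CSV.
# Column names and contents are illustrative assumptions.
import csv
import io

def load_mapping(csv_text, key_col, val_col):
    """Parse a mapping CSV into a dict keyed by the CER label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_col]: row[val_col] for row in reader}

# Hypothetical excerpt of regions.csv:
regions_csv = """cer_region,canoe_region
Alberta,AB
British Columbia,BC
Ontario,ON
"""
region_map = load_mapping(regions_csv, "cer_region", "canoe_region")
```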
4. External Data Handling
- CER End-Use Demand: Unlike sectors that actively scrape APIs, the CEF module expects a static flat-file snapshot downloaded into the repository (e.g., `end-use-demand-2023.csv`).
- The script uses pandas to filter this million-row file down to only the rows matching the configured `scenario`, the active mapping keys, and the relevant `model_periods`.
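The filtering step might look like the sketch below. The CEF column names (`Scenario`, `Region`, `Year`) are assumptions for illustration; the real flat file may differ.

```python
# Sketch of the pandas filtering step: keep only rows for the configured
# scenario, mapped regions, and model periods. Column names are assumed.
import pandas as pd

def filter_cef(df, scenario, region_map, model_periods):
    mask = (
        (df["Scenario"] == scenario)
        & df["Region"].isin(region_map.keys())   # only regions present in regions.csv
        & df["Year"].isin(model_periods)
    )
    return df.loc[mask]

demo = pd.DataFrame({
    "Scenario": ["Global Net-zero", "Current Measures", "Global Net-zero"],
    "Region":   ["Alberta", "Alberta", "Yukon"],
    "Year":     [2030, 2030, 2030],
})
kept = filter_cef(demo, "Global Net-zero", {"Alberta": "AB"}, [2030])
# only the first row matches all three criteria
```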
5. Process and Commodity Generation
Inside all_sectors.py -> build_sectors():
- Pivoting to Proportions: Calculates the total energy usage for each generated technology node per year, then computes the proportional share of every fuel making up that total.
- Filtering Noise: If a fuel commodity represents a share of a technology's total falling under `prop_thresh`, it is dropped entirely to prevent solver instability from tracking negligible flows.
- Database Injection: Generates `Technology`, `SectorLabel`, and two variants of `Commodity` rows (one strictly for Demands, and one denoting physical Fuels).
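The pivot-to-proportions and threshold-filtering steps can be sketched with pandas as below. Column names (`tech`, `year`, `fuel`, `value`) are illustrative assumptions.

```python
# Sketch: compute each fuel's share of a technology's annual total, then
# drop shares below prop_thresh. Column names are illustrative.
import pandas as pd

def fuel_shares(df, prop_thresh):
    # total energy per (technology, year), broadcast back to each fuel row
    totals = df.groupby(["tech", "year"])["value"].transform("sum")
    df = df.assign(proportion=df["value"] / totals)
    # filter solver noise: keep only shares at or above the threshold
    return df[df["proportion"] >= prop_thresh]

demo = pd.DataFrame({
    "tech":  ["RES_demand"] * 3,
    "year":  [2030] * 3,
    "fuel":  ["ELC", "NG", "KEROSENE"],
    "value": [49.0, 50.0, 1.0],
})
kept = fuel_shares(demo, prop_thresh=0.02)
# KEROSENE (a 1% share) falls under the 2% threshold and is dropped
```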
6. Demands, Efficiencies, and Inputs
- Efficiencies: The model assumes `Efficiency = 1.0` universally for these macro transfers; the required fuel input is equated exactly to the output demand, representing the top-down statistical reality.
- Demand Rows: Generates `Demand` rows defining total volumetric energy needs per province/sector/year, matching the extracted CER scenario exactly.
- Input Limits: Writes constraints exclusively into `LimitTechInputSplitAnnual`. Instead of allowing the solver freedom to optimize toward cheaper fuel trajectories, the `proportion` column enforces the exact percentage mix dictated by the CER CEF scenario pathway (using the `<=` or `le` constraint).
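The injection of these three row types might look like the SQLite sketch below. The table and column layouts only loosely follow the Temoa schema and should be treated as illustrative, not as the exact schema the pipeline targets.

```python
# Sketch of the database-injection step for one region/sector/year.
# Table and column layouts are illustrative approximations of the Temoa schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demand (region TEXT, period INT, commodity TEXT, demand REAL)")
conn.execute("CREATE TABLE Efficiency (region TEXT, input_comm TEXT, tech TEXT, "
             "vintage INT, output_comm TEXT, efficiency REAL)")
conn.execute("CREATE TABLE LimitTechInputSplitAnnual (region TEXT, period INT, "
             "input_comm TEXT, tech TEXT, proportion REAL, operator TEXT)")

# Total demand is copied from the CER scenario; efficiency is fixed at 1.0;
# the fuel mix is pinned with a '<=' proportion per input commodity.
conn.execute("INSERT INTO Demand VALUES ('AB', 2030, 'D_RES', 100.0)")
conn.execute("INSERT INTO Efficiency VALUES ('AB', 'NG', 'RES_demand', 2030, 'D_RES', 1.0)")
conn.execute("INSERT INTO LimitTechInputSplitAnnual "
             "VALUES ('AB', 2030, 'NG', 'RES_demand', 0.5, '<=')")
```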
7. Time-Slicing (DSDs)
If toggled by use_dsd, the tool executes build_dsd().
- Electricity Parsing: Unlike the generic annual bounds used for physical fuels (natural gas, diesel), electricity demand is poorly represented without time-of-day resolution.
- Mapping: The module ingests `dsd_electricity.csv` to partition electrical demand commodities into seasonal (Summer/Winter/Intermediate) and daily (Day/Night) fractions.
- Distribution: Injects rows into `DemandSpecificDistribution`, `TimeSeason`, `TimeSegmentFraction`, and `TimeOfDay` to scaffold the temporal slicing inside the SQLite output.
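One way to picture the seasonal/daily partition: each (season, time-of-day) slice gets a joint fraction, and the fractions across all slices must sum to 1 before they are written as `DemandSpecificDistribution` rows. The slice names and fraction values below are illustrative assumptions.

```python
# Sketch: joint seasonal x daily fractions for electricity demand slices.
# Slice names and numbers are illustrative, not taken from dsd_electricity.csv.
from itertools import product

seasons = {"summer": 0.30, "winter": 0.45, "intermediate": 0.25}
times   = {"day": 0.60, "night": 0.40}

dsd = {
    (s, t): seasons[s] * times[t]   # joint fraction for each (season, time) slice
    for s, t in product(seasons, times)
}
# A common sanity check before inserting DemandSpecificDistribution rows:
total = sum(dsd.values())           # must equal 1.0
```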
8. Known Assumptions and Limitations
- Rigid Prescriptive Modeling: Because `LimitTechInputSplitAnnual` is strictly enforced, a Temoa model built on this dataset cannot optimize fuel switching based on its own cost curves. It merely verifies and costs out the scenario already pre-calculated by the Canada Energy Regulator.
- Data Staleness: Relying on a manual, local flat file (`end-use-demand-2023.csv`) means the module does not automatically pick up new CER releases; a repository maintainer must download and overwrite the file.
- Static Efficiency (1.0): Bypasses the complex realities of technological turnover. It assumes the structural makeup modeled by the CER already resolves underlying efficiency evolution internally.
- No Disaggregation: The model inherently accepts massive macro-sectors (e.g., 'Industrial') and ignores complex sub-sector granularity necessary for studying specialized industrial clusters or housing envelopes.
- DSDs are Electricity Only: Time-slicing strictly profiles electricity. Gas ramping and peak heating loads are flattened into annual averaged totals.
CANOE CEF Sector — Data Sources Catalog
1. Data Source Summary
| Data Type | Primary Source | Granularity | Update Frequency |
|---|---|---|---|
| End-Use Trajectories | CER Canada's Energy Future | Prov / Macro-Sector | Periodic (1-2 Years) |
| Electricity Time Slices | Internal Reference CSVs | Hourly/Seasonal Proxy | Static |
| Mappings | Internal CSVs | Sector / Fuel Nodes | Static |
2. CER Canada's Energy Future (CEF)
- Usage: Provides the core data of the entire model: which technologies consume which fuels, in what exact fractions, tracing out to 2050.
- Update Procedure: Visit the Canada Energy Regulator's Open Data Portal. Download the newest "End-Use Demand" flat file from their latest Energy Futures publication. Rename it locally and ensure the read target inside `all_sectors.py` points at the new `.csv` file.
3. Internal DSD and Mapping Definitions
- Usage: `dsd_electricity.csv` constructs the proportional temporal slices describing when electricity is demanded across day/night cycles per region. `sectors.csv`, `regions.csv`, and `commodities.csv` translate CER schema strings into compact CANOE tokens.
- Update Procedure: Only needs auditing if the CER alters its macroscopic modeling labels (e.g., splitting "Transportation" into discrete segments, requiring updated dictionary keys).
4. Update Procedures (Checklist)
During regular annual or biannual CANOE updates:
- Flat File Download: Manually download the updated CER End-Use data file and place it in the `input_files/` directory.
- Script Update: Point the `pd.read_csv` call in `all_sectors.py` (around line 23, e.g., `pd.read_csv(config.input_files + 'end-use-demand-[YEAR].csv')`) at the newly downloaded file.
- Parameter Audit: Open `params.yaml`. Verify the `scenario` value exactly matches a valid scenario string in the newly downloaded CER data (e.g., confirm 'Global Net-zero' was not renamed to 'GNZ2050'). Ensure `model_periods` aligns with your targeted study bounds.
- Mapping Audit: If execution fails with `KeyError` on mappings, check whether the CER renamed a fuel or sector label, requiring edits to `sectors.csv` or `commodities.csv`.
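The mapping audit above can be automated with a small pre-flight check: confirm every CER label in the new flat file has an entry in the mapping CSVs before the build runs, so a rename surfaces up front rather than as a `KeyError` mid-pipeline. The column name and mapping contents below are illustrative assumptions.

```python
# Sketch of a pre-flight mapping audit. Column names are assumed.
import pandas as pd

def audit_labels(cef_df, column, mapping):
    """Return CER labels in `column` that have no CANOE translation."""
    return sorted(set(cef_df[column]) - set(mapping))

demo = pd.DataFrame({"Sector": ["Residential", "Freight Transportation"]})
sector_map = {"Residential": "RES", "Transportation": "TRN"}  # hypothetical sectors.csv
missing = audit_labels(demo, "Sector", sector_map)
# a non-empty result means sectors.csv needs new rows before the build will succeed
```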