CANOE CEF Sector — Data Processing Documentation
Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's CEF (Canada's Energy Future) generalized sector.
Table of Contents
- Overview
- Pipeline Architecture
- Configuration and Setup
- External Data Handling
- Process and Commodity Generation
- Demands, Efficiencies, and Inputs
- Time-Slicing (DSDs)
- Known Assumptions and Limitations
1. Overview
Purpose
The CANOE CEF sector aggregation tool provides a top-down, generalized representation of Canadian energy demand, inherited directly from the Canada Energy Regulator's (CER) Energy Futures scenarios. Instead of building detailed, bottom-up operational models (such as residential buildings or industrial kilns), this module constrains the optimizer to reproduce the top-down macroscopic energy demands and fuel mixes published by the CER.
What the Tool Produces
- A Temoa-schema SQLite database containing consolidated technologies, commodities, generic energy demands, fuel-specific input bounds (`LimitTechInputSplitAnnual`), and electricity time-slicing profiles (`DemandSpecificDistribution`).
Scope
The model operates across:
- Regions: Configurable Canadian provinces mapped from CER generic regions.
- Sectors: End-use macro-sectors explicitly defined in mapping CSVs (e.g., Residential, Commercial, Industrial, Transportation).
- Commodities: Any fuel/energy vectors defined by the CER output structure (Electricity, Natural Gas, Diesel, Biomass, etc.).
2. Pipeline Architecture
The processing logic is contained in a single orchestrator rather than distinct domain scripts:
1. __main__.py → Execution trigger
2. setup.py → Instantiates SQLite database, parses `params.yaml`, and loads mapping CSVs
3. all_sectors.py
a. build_sectors → Parses the core CEF CSV, maps domains, drops insignificant flows, and generates topology
b. build_tester → Establishes generic time-period bounds matching the data years
c. build_dsd → (Optional) Loads electricity demand-specific distributions mapping summer/winter/day/night profiles
d. build_metadata → Finalizes dataset labels and reference sources
3. Configuration and Setup
General Settings
Defined in input_files/params.yaml:
- `sqlite_schema`: Points to the base SQL schema file that initializes the database.
- `scenario`: Determines which CER scenario to filter for (e.g., 'Global Net-zero').
- `conversion_factor` & `decimal_places`: Global unit adjustments for standardizing physical energy inputs into the target model units.
- `prop_thresh`: A cutoff that drops trivially small energy streams to simplify the problem for the solver.
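An illustrative `params.yaml` combining the keys above with `use_dsd` and `model_periods` (both referenced later in this document). The values shown are placeholders, not the repository defaults.

```yaml
# Illustrative params.yaml -- key names from this documentation; values are placeholders.
sqlite_schema: input_files/schema.sql
scenario: Global Net-zero
conversion_factor: 0.0036   # placeholder unit adjustment
decimal_places: 4
prop_thresh: 0.01           # drop fuel shares below 1% of a technology's total
use_dsd: true
model_periods: [2025, 2030, 2035, 2040, 2045, 2050]
```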
Domain Mapping
Configured through static CSVs within input_files/:
- `regions.csv`: Maps CER region strings to CANOE two-letter codes.
- `commodities.csv`: Translates output fuel vectors.
- `sectors.csv`: Translates end-use aggregate domains into functional demand technology branches.
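A small sketch of how such a mapping CSV might be loaded into a lookup dictionary. The column names and the `regions.csv` excerpt are assumptions for illustration, not the real file layout.

```python
# Sketch: build a CER-string -> CANOE-token lookup from a two-column mapping CSV.
# Column names and contents are illustrative assumptions.
import csv
import io

def load_mapping(csv_text, key_col, val_col):
    """Parse a mapping CSV into a dict keyed by the CER label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_col]: row[val_col] for row in reader}

# Hypothetical excerpt of regions.csv:
regions_csv = """cer_region,canoe_region
Alberta,AB
British Columbia,BC
Ontario,ON
"""
region_map = load_mapping(regions_csv, "cer_region", "canoe_region")
```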
4. External Data Handling
- CER End-Use Demand: Unlike sectors that actively scrape APIs, the CEF module expects a static flat-file snapshot downloaded into the repository (e.g., `end-use-demand-2023.csv`).
- The script uses pandas to filter this million-row file down to only the rows matching the configured `scenario`, the active mapping keys, and the relevant `model_periods`.
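The filtering step might look like the sketch below. The CEF column names (`Scenario`, `Region`, `Year`) are assumptions for illustration; the real flat file may differ.

```python
# Sketch of the pandas filtering step: keep only rows for the configured
# scenario, mapped regions, and model periods. Column names are assumed.
import pandas as pd

def filter_cef(df, scenario, region_map, model_periods):
    mask = (
        (df["Scenario"] == scenario)
        & df["Region"].isin(region_map.keys())   # only regions present in regions.csv
        & df["Year"].isin(model_periods)
    )
    return df.loc[mask]

demo = pd.DataFrame({
    "Scenario": ["Global Net-zero", "Current Measures", "Global Net-zero"],
    "Region":   ["Alberta", "Alberta", "Yukon"],
    "Year":     [2030, 2030, 2030],
})
kept = filter_cef(demo, "Global Net-zero", {"Alberta": "AB"}, [2030])
# only the first row matches all three criteria
```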
5. Process and Commodity Generation
Inside all_sectors.py -> build_sectors():
- Pivoting to Proportions: Calculates the total energy usage for each generated technology node per year, then computes the proportional share of every fuel making up that total.
- Filtering Noise: If a fuel commodity represents a share of a technology's total falling under `prop_thresh`, it is dropped entirely to prevent solver instability from tracking negligible flows.
- Database Injection: Generates `Technology`, `SectorLabel`, and two variants of `Commodity` rows (one strictly for Demands, and one denoting physical Fuels).
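The pivot-to-proportions and threshold-filtering steps can be sketched with pandas as below. Column names (`tech`, `year`, `fuel`, `value`) are illustrative assumptions.

```python
# Sketch: compute each fuel's share of a technology's annual total, then
# drop shares below prop_thresh. Column names are illustrative.
import pandas as pd

def fuel_shares(df, prop_thresh):
    # total energy per (technology, year), broadcast back to each fuel row
    totals = df.groupby(["tech", "year"])["value"].transform("sum")
    df = df.assign(proportion=df["value"] / totals)
    # filter solver noise: keep only shares at or above the threshold
    return df[df["proportion"] >= prop_thresh]

demo = pd.DataFrame({
    "tech":  ["RES_demand"] * 3,
    "year":  [2030] * 3,
    "fuel":  ["ELC", "NG", "KEROSENE"],
    "value": [49.0, 50.0, 1.0],
})
kept = fuel_shares(demo, prop_thresh=0.02)
# KEROSENE (a 1% share) falls under the 2% threshold and is dropped
```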
6. Demands, Efficiencies, and Inputs
- Efficiencies: The model assumes `Efficiency = 1.0` universally for these macro transfers; the required fuel input is equated exactly to the output demand, representing the top-down statistical reality.
- Demand Rows: Generates `Demand` rows defining total volumetric energy needs per province/sector/year, matching the extracted CER scenario exactly.
- Input Limits: Writes constraints exclusively into `LimitTechInputSplitAnnual`. Instead of allowing the solver freedom to optimize toward cheaper fuel trajectories, the `proportion` column enforces the exact percentage mix dictated by the CER CEF scenario pathway (using the `<=` or `le` constraint).
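The injection of these three row types might look like the SQLite sketch below. The table and column layouts only loosely follow the Temoa schema and should be treated as illustrative, not as the exact schema the pipeline targets.

```python
# Sketch of the database-injection step for one region/sector/year.
# Table and column layouts are illustrative approximations of the Temoa schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demand (region TEXT, period INT, commodity TEXT, demand REAL)")
conn.execute("CREATE TABLE Efficiency (region TEXT, input_comm TEXT, tech TEXT, "
             "vintage INT, output_comm TEXT, efficiency REAL)")
conn.execute("CREATE TABLE LimitTechInputSplitAnnual (region TEXT, period INT, "
             "input_comm TEXT, tech TEXT, proportion REAL, operator TEXT)")

# Total demand is copied from the CER scenario; efficiency is fixed at 1.0;
# the fuel mix is pinned with a '<=' proportion per input commodity.
conn.execute("INSERT INTO Demand VALUES ('AB', 2030, 'D_RES', 100.0)")
conn.execute("INSERT INTO Efficiency VALUES ('AB', 'NG', 'RES_demand', 2030, 'D_RES', 1.0)")
conn.execute("INSERT INTO LimitTechInputSplitAnnual "
             "VALUES ('AB', 2030, 'NG', 'RES_demand', 0.5, '<=')")
```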
7. Time-Slicing (DSDs)
If toggled by use_dsd, the tool executes build_dsd().
- Electricity Parsing: Unlike the generic annual bounds used for physical fuels (natural gas, diesel), electricity demand is poorly represented without time-of-day resolution.
- Mapping: The module ingests `dsd_electricity.csv` to partition electrical demand commodities into seasonal (Summer/Winter/Intermediate) and daily (Day/Night) fractions.
- Distribution: Injects rows into `DemandSpecificDistribution`, `TimeSeason`, `TimeSegmentFraction`, and `TimeOfDay` to scaffold the temporal slicing inside the SQLite output.
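One way to picture the seasonal/daily partition: each (season, time-of-day) slice gets a joint fraction, and the fractions across all slices must sum to 1 before they are written as `DemandSpecificDistribution` rows. The slice names and fraction values below are illustrative assumptions.

```python
# Sketch: joint seasonal x daily fractions for electricity demand slices.
# Slice names and numbers are illustrative, not taken from dsd_electricity.csv.
from itertools import product

seasons = {"summer": 0.30, "winter": 0.45, "intermediate": 0.25}
times   = {"day": 0.60, "night": 0.40}

dsd = {
    (s, t): seasons[s] * times[t]   # joint fraction for each (season, time) slice
    for s, t in product(seasons, times)
}
# A common sanity check before inserting DemandSpecificDistribution rows:
total = sum(dsd.values())           # must equal 1.0
```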
8. Known Assumptions and Limitations
- Rigid Prescriptive Modeling: Because `LimitTechInputSplitAnnual` is strictly enforced, a Temoa model built on this dataset cannot optimize fuel switching based on its own cost curves. It merely verifies and costs out the scenario already pre-calculated by the Canada Energy Regulator.
- Data Staleness: Relying on a manual, local flat file (`end-use-demand-2023.csv`) means the module does not automatically pick up new CER releases; a repository maintainer must download and overwrite the file.
- Static Efficiency (1.0): Bypasses the complex realities of technological turnover. It assumes the structural makeup modeled by the CER already resolves underlying efficiency evolution internally.
- No Disaggregation: The model inherently accepts massive macro-sectors (e.g., 'Industrial') and ignores complex sub-sector granularity necessary for studying specialized industrial clusters or housing envelopes.
- DSDs are Electricity Only: Time-slicing strictly profiles electricity. Gas ramping and peak heating loads are flattened into annual averaged totals.
CANOE CEF Sector — Data Sources Catalog
1. Data Source Summary
| Data Type | Primary Source | Granularity | Update Frequency |
|---|---|---|---|
| End-Use Trajectories | CER Canada's Energy Future | Prov / Macro-Sector | Periodic (1-2 Years) |
| Electricity Time Slices | Internal Reference CSVs | Hourly/Seasonal Proxy | Static |
| Mappings | Internal CSVs | Sector / Fuel Nodes | Static |
2. CER Canada's Energy Future (CEF)
- Usage: Provides the core data of the entire model: which technologies consume which fuels, in what exact fractions, tracing out to 2050.
- Update Procedure: Visit the Canada Energy Regulator's Open Data Portal. Download the newest "End-Use Demand" flat file from their latest Energy Futures publication. Rename it locally and ensure the read target inside `all_sectors.py` points at the new `.csv` file.
3. Internal DSD and Mapping Definitions
- Usage: `dsd_electricity.csv` constructs the proportional temporal slices describing when electricity is demanded across day/night cycles per region. `sectors.csv`, `regions.csv`, and `commodities.csv` translate CER schema strings into compact CANOE tokens.
- Update Procedure: Only needs auditing if the CER alters its macroscopic modeling labels (e.g., splitting "Transportation" into discrete segments, requiring updated dictionary keys).
4. Update Procedures (Checklist)
During regular annual or biannual CANOE updates:
- Flat File Download: Manually download the updated CER End-Use data file and place it in the `input_files/` directory.
- Script Update: Point the `pd.read_csv` call in `all_sectors.py` (around line 23, e.g., `pd.read_csv(config.input_files + 'end-use-demand-[YEAR].csv')`) at the newly downloaded file.
- Parameter Audit: Open `params.yaml`. Verify the `scenario` value exactly matches a valid scenario string in the newly downloaded CER data (e.g., confirm 'Global Net-zero' was not renamed to 'GNZ2050'). Ensure `model_periods` aligns with your targeted study bounds.
- Mapping Audit: If execution fails with `KeyError` on mappings, check whether the CER renamed a fuel or sector label, requiring edits to `sectors.csv` or `commodities.csv`.
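The mapping audit above can be automated with a small pre-flight check: confirm every CER label in the new flat file has an entry in the mapping CSVs before the build runs, so a rename surfaces up front rather than as a `KeyError` mid-pipeline. The column name and mapping contents below are illustrative assumptions.

```python
# Sketch of a pre-flight mapping audit. Column names are assumed.
import pandas as pd

def audit_labels(cef_df, column, mapping):
    """Return CER labels in `column` that have no CANOE translation."""
    return sorted(set(cef_df[column]) - set(mapping))

demo = pd.DataFrame({"Sector": ["Residential", "Freight Transportation"]})
sector_map = {"Residential": "RES", "Transportation": "TRN"}  # hypothetical sectors.csv
missing = audit_labels(demo, "Sector", sector_map)
# a non-empty result means sectors.csv needs new rows before the build will succeed
```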