CANOE Commercial Sector — Data Processing Documentation
Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's commercial buildings sector.
Table of Contents
- Overview
- Pipeline Architecture
- Configuration and Setup
- Pre-Processing
- Existing Capacity and Demand
- New Capacity
- Hourly Demand Profiles (ComStock)
- Emissions
- Post-Processing
- Known Assumptions and Limitations
1. Overview
Purpose
The CANOE commercial sector aggregation tool automatically constructs a Temoa-compatible SQLite database representing the Canadian commercial buildings sector. It dynamically downloads and processes data from multiple sources—including Natural Resources Canada (NRCan), U.S. EIA Annual Energy Outlook (AEO), NREL ComStock, and Statistics Canada—to build a comprehensive set of end-use demands, existing technology stocks, and new technology options.
What the Tool Produces
- A Temoa-schema SQLite database containing all technology, cost, capacity, demand, emissions, and time-series data for the commercial sector.
- Optionally, an Excel workbook clone of the database for inspection and review.
- Diagnostic plots showing demand specific distributions (DSDs) and time-of-week variations.
Scope
The model covers:
- Regions: Canadian provinces.
- End-Uses: Space heating, space cooling, and an aggregated "other" demand (lighting, equipment, water heating, etc.).
- Technologies: Existing and new HVAC systems (furnaces, boilers, heat pumps, chillers, etc.) and a dummy technology to satisfy "other" demands.
- Time resolution: Hourly representation mapped to seasonal and time-of-day representative slices.
- Planning horizon: Configurable five-year periods (default: 2025–2050).
2. Pipeline Architecture
The aggregation runs in a fixed sequence, orchestrated by commercial_sector.py and all_subsectors.py:
1. setup.py → Load configuration, download macro/demographic data, build API bridges
2. instantiate_database → Create or wipe SQLite database using the Temoa schema
3. all_subsectors.py → Orchestrator for all subsector components
a. pre_process() → Write time periods, regions, temporal structures, and commodities
b. [Per Region Loop]
i. comstock_dsd.py → Download and map hourly demand profiles from NREL ComStock
ii. existing_capacity.py → Calculate annual demand, allocate existing stock, and write DSDs
iii. new_capacity.py → Write techno-economic parameters for new technology options
c. aggregate_emissions() → Compute direct emissions for energy combustion
d. post_process() → Clean up vintages, attach references, and validate data IDs
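The call order above can be sketched with stand-in stages. This is only an illustration of the sequencing: the `run_pipeline` helper, the `steps` dictionary, and the region codes are hypothetical and do not reflect the actual signatures in commercial_sector.py or all_subsectors.py.

```python
from typing import Callable, Dict, List


def run_pipeline(regions: List[str], steps: Dict[str, Callable[..., None]]) -> None:
    """Call the pipeline stages in the fixed order listed above."""
    steps["setup"]()
    steps["instantiate_database"]()
    steps["pre_process"]()
    for region in regions:
        # ComStock profiles are loaded first; the later per-region steps
        # write DSDs and capacity-factor limits derived from them.
        steps["comstock_dsd"](region)
        steps["existing_capacity"](region)
        steps["new_capacity"](region)
    steps["aggregate_emissions"]()
    steps["post_process"]()


# Stand-in stages that only record when they were called (hypothetical).
call_log: List[str] = []


def make_stage(name: str) -> Callable[..., None]:
    def stage(region: str = None) -> None:
        call_log.append(name if region is None else f"{name}:{region}")
    return stage


stage_names = [
    "setup", "instantiate_database", "pre_process", "comstock_dsd",
    "existing_capacity", "new_capacity", "aggregate_emissions", "post_process",
]
steps = {name: make_stage(name) for name in stage_names}
run_pipeline(["ON", "QC"], steps)
```

After the run, `call_log` lists the three global setup stages, then the three per-region stages for each region in turn, then emissions and post-processing.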
3. Configuration and Setup
3.1 Configuration File (params.yaml)
The primary configuration file dictates macro variables, API links, and processing rules:
| Parameter | Default | Description |
|---|---|---|
| period_step | 5 | Years between model periods |
| model_periods | [2025, 2030, ..., 2050] | Planning horizon periods |
| base_year | 2022 | Default year for pulling non-timeseries data (NRCan context) |
| weather_year | 2018 | Year matching the ComStock hourly profiles |
| other_electrification_factor | 0.62 | Target fraction of non-electric fuel shifted to electricity for "other" end-uses by 2050 |
| dsd_tolerance | 0.02 | Minimum threshold for hourly demand as fraction of mean |
| sec_tolerance | 0.05 | Minimum threshold for secondary energy consumption |
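For orientation, the defaults above can be written as a plain Python dictionary, with the period list derived from the step size. This is a sketch: the exact key layout of params.yaml is not shown in this document, and `build_model_periods` with its `first` and `last` arguments is a hypothetical helper.

```python
from typing import Dict, List

# Defaults copied from the table above; the dictionary layout is an assumption.
params: Dict[str, object] = {
    "period_step": 5,
    "base_year": 2022,
    "weather_year": 2018,
    "other_electrification_factor": 0.62,
    "dsd_tolerance": 0.02,
    "sec_tolerance": 0.05,
}


def build_model_periods(first: int, last: int, step: int) -> List[int]:
    """Return evenly spaced model periods from first to last, inclusive."""
    if step <= 0:
        raise ValueError("period_step must be positive")
    return list(range(first, last + 1, step))


params["model_periods"] = build_model_periods(2025, 2050, params["period_step"])
```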
3.2 Technology and Definitions CSVs
- existing_technologies.csv / new_technologies.csv: Define existing and new commercial tech, linking them to EIA AEO reference names and their primary fuels.
- fuel_commodities.csv / end_use_demands.csv: Map fuel inputs and demand outputs with their Temoa identifiers and physical units.
- regions.csv: Maps Canadian provinces to US Census Divisions for proxying AEO data.
- time.csv: Configures the translation of 8,760 hours into representative seasons and times-of-day.
3.3 Data Caching
All remote data (AEO spreadsheets, StatCan zips, ComStock profiles, macro indicators) are downloaded to a local data_cache/ directory to speed up subsequent aggregations. Setting force_download: true bypasses the cache and downloads everything again.
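A minimal sketch of this cache-or-download behaviour follows. The `cached_fetch` function and the injected `fetch` callable are hypothetical; the real pipeline's download helpers are not shown in this document. A fake downloader is used so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_fetch(name: str, fetch: Callable[[], bytes], cache_dir: Path,
                 force_download: bool = False) -> bytes:
    """Return the cached file contents, downloading only when needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists() and not force_download:
        return target.read_bytes()
    payload = fetch()  # the remote call happens only on this path
    target.write_bytes(payload)
    return payload


# Fake downloader that counts how often it is invoked.
calls: List[int] = []


def fake_download() -> bytes:
    calls.append(1)
    return b"payload"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "data_cache"
    cached_fetch("ktekx.xlsx", fake_download, cache)   # cache miss: downloads
    cached_fetch("ktekx.xlsx", fake_download, cache)   # cache hit: no download
    cached_fetch("ktekx.xlsx", fake_download, cache, force_download=True)

download_count = len(calls)
```

Only the first and the forced call reach the downloader, so `download_count` ends at 2.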
4. Pre-Processing
The pre_process() function builds the structural foundation of the model:
- Time: Writes all specific hours, times-of-day (H01-H24), and season mappings into TimeSegmentFraction, TimeSeason, etc.
- Regions: Inherits regions defined in regions.csv.
- Commodities: Defines physical fuel commodities (e.g., electricity, natural gas) and demand services (space heating, space cooling, other).
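The time step can be illustrated by labelling each hour of a year with a season and an H01-H24 time-of-day, then computing the fraction of the year in each slice. The month-to-season mapping below is a hypothetical stand-in for whatever time.csv actually configures, and `time_slice_fractions` is not a function from the pipeline.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Tuple

# Hypothetical mapping; the real season definitions live in time.csv.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def time_slice_fractions(year: int) -> Dict[Tuple[str, str], float]:
    """Fraction of the year's hours falling in each (season, time-of-day) slice."""
    start = datetime(year, 1, 1)
    n_hours = int((datetime(year + 1, 1, 1) - start).total_seconds() // 3600)
    counts: Counter = Counter()
    for h in range(n_hours):
        stamp = start + timedelta(hours=h)
        tod = f"H{stamp.hour + 1:02d}"  # hour 0 maps to H01, hour 23 to H24
        counts[(SEASON_BY_MONTH[stamp.month], tod)] += 1
    return {key: count / n_hours for key, count in counts.items()}


fractions = time_slice_fractions(2018)
```

With four seasons and 24 times-of-day this yields 96 slices whose fractions sum to 1.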
5. Existing Capacity and Demand
This step, implemented in existing_capacity.py, is the core analytical engine of the pipeline.
5.1 Defining the Existing Stock Equation
The model defines the existing commercial building stock mathematically:
DEM (Output Demand) = SEC (Input Fuel) × EFF (Efficiency)
CAP (Capacity) = DEM / (ACF × C2A)
Here ACF is the annual capacity factor and C2A is the capacity-to-activity conversion factor.
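A worked instance of the two equations, using made-up numbers. Reading ACF as an annual capacity factor between 0 and 1 and C2A as a capacity-to-activity factor is an assumption, as are the function names.

```python
def service_demand(sec_pj: float, efficiency: float) -> float:
    """DEM = SEC x EFF: service output delivered by a given fuel input."""
    return sec_pj * efficiency


def existing_capacity(demand_pj: float, acf: float, c2a: float = 1.0) -> float:
    """CAP = DEM / (ACF x C2A): capacity needed to deliver the demand."""
    if not 0.0 < acf <= 1.0:
        raise ValueError("annual capacity factor must be in (0, 1]")
    return demand_pj / (acf * c2a)


# Illustrative inputs only: 10 PJ of fuel, 80% efficiency, 40% utilisation.
dem = service_demand(sec_pj=10.0, efficiency=0.8)
cap = existing_capacity(dem, acf=0.4, c2a=1.0)
```

Here 10 PJ of fuel at 80% efficiency gives 8 PJ of service demand, and at a 40% capacity factor with C2A = 1 that requires roughly 20 PJ/y of capacity.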
5.2 Base Year Secondary Energy Consumption (SEC)
Base year (e.g., 2022) fuel consumption for space heating, space cooling, and other end-uses is downloaded from the NRCan Comprehensive Energy Use Database (CEUD).
- Atlantic Provinces Adjustment: Because NRCan aggregates the Atlantic provinces, the pipeline pulls auxiliary data from Statistics Canada (Table 25-10-0029) to calculate the province-specific fraction of energy use to accurately disaggregate the Atlantic region.
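The disaggregation amounts to a proportional split of the regional total. The sketch below assumes a simple share-based allocation; the `disaggregate` helper and all numbers are illustrative and are not values from CEUD or Table 25-10-0029.

```python
from typing import Dict


def disaggregate(regional_total: float, provincial_proxy: Dict[str, float]) -> Dict[str, float]:
    """Split a regional total in proportion to each province's proxy value."""
    proxy_sum = sum(provincial_proxy.values())
    if proxy_sum <= 0:
        raise ValueError("proxy values must sum to a positive number")
    return {prov: regional_total * value / proxy_sum
            for prov, value in provincial_proxy.items()}


# Made-up inputs: Atlantic total from NRCan, provincial energy use from StatCan.
atlantic_sec_pj = 100.0
statcan_proxy = {"NB": 30.0, "NS": 40.0, "NL": 20.0, "PE": 10.0}
provincial_sec = disaggregate(atlantic_sec_pj, statcan_proxy)
```

The provincial values always add back up to the Atlantic total, so the split changes the allocation but not the regional energy balance.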
5.3 Proxied Efficiencies and Market Shares
Since NRCan does not provide detailed installed-base efficiencies, the tool proxies them from the U.S. EIA Annual Energy Outlook (AEO) Commercial Demand Module (CDM):
- Modellers map Canadian provinces to corresponding US Census Divisions with similar climate/economic factors.
- Installed technological base shares (serv_share) and efficiencies are gathered from AEO's ktekx.xlsx database.
- Weighted average efficiencies are calculated per fuel, per end-use to derive final service DEMAND.
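The weighting step might look like the sketch below. It assumes a plain share-weighted arithmetic mean; this document does not state the exact weighting scheme, and if the shares describe service output rather than fuel input, a harmonic mean would be the physically consistent choice. The function name and all numbers are illustrative.

```python
from typing import List, Tuple


def weighted_efficiency(techs: List[Tuple[float, float]]) -> float:
    """Share-weighted mean efficiency from (serv_share, efficiency) pairs."""
    total_share = sum(share for share, _ in techs)
    if total_share <= 0:
        raise ValueError("shares must sum to a positive number")
    return sum(share * eff for share, eff in techs) / total_share


# Made-up natural gas space-heating stock: (share, efficiency) per technology.
gas_heating = [(0.6, 0.80), (0.4, 0.75)]
eff_gas_heating = weighted_efficiency(gas_heating)

# Service demand then follows from DEM = SEC x EFF (illustrative 10 PJ of fuel).
dem_gas_heating = 10.0 * eff_gas_heating
```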
5.4 "Other" End-Uses
All residual energy usage aside from space heating and space cooling is lumped into a generalized "other" end-use. The fuel shares of this "other" demand are allowed to evolve.
- Between the base year and 2050, the shares of non-electric fuels are linearly interpolated downward, and the displaced portion is shifted to electricity according to the user-defined other_electrification_factor (default 0.62, based on CER scenarios).
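One way to read this rule is sketched below: every non-electric fuel loses the same fraction of its base-year share, ramping linearly from zero in the base year to other_electrification_factor in 2050, and electricity absorbs what is displaced. The function, the fuel names, and the base shares are hypothetical, and efficiency differences between fuels are ignored.

```python
from typing import Dict


def other_fuel_shares(base_shares: Dict[str, float], year: int,
                      base_year: int = 2022, target_year: int = 2050,
                      factor: float = 0.62,
                      electricity: str = "electricity") -> Dict[str, float]:
    """Fuel shares of the "other" end-use in a given year."""
    progress = (year - base_year) / (target_year - base_year)
    progress = min(max(progress, 0.0), 1.0)  # no change before base or after target
    shifted_fraction = factor * progress
    shares: Dict[str, float] = {}
    moved = 0.0
    for fuel, share in base_shares.items():
        if fuel == electricity:
            continue
        shares[fuel] = share * (1.0 - shifted_fraction)
        moved += share * shifted_fraction
    shares[electricity] = base_shares.get(electricity, 0.0) + moved
    return shares


# Made-up base-year shares.
base = {"electricity": 0.5, "natural_gas": 0.4, "oil": 0.1}
shares_2022 = other_fuel_shares(base, 2022)
shares_2050 = other_fuel_shares(base, 2050)
```

With these inputs the electricity share rises from 0.50 to about 0.81 by 2050, while the shares still sum to 1 in every year.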
5.5 Scaling Future Demand
Base year service demands are scaled forward to each model period defined in params.yaml using GDP growth projections derived from the Canada Energy Regulator (CER) Energy Future reports.
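A minimal sketch of this scaling, assuming demand grows in direct proportion to GDP relative to the base year. The proportionality, the `scale_demand` helper, and the index values are assumptions, not CER data.

```python
from typing import Dict, List


def scale_demand(base_demand: float, gdp: Dict[int, float],
                 base_year: int, periods: List[int]) -> Dict[int, float]:
    """Scale base-year demand by the ratio of period GDP to base-year GDP."""
    return {p: base_demand * gdp[p] / gdp[base_year] for p in periods}


# Made-up GDP index (base year = 100) and a 50 PJ base-year demand.
gdp_index = {2022: 100.0, 2025: 105.0, 2030: 115.0}
future_demand = scale_demand(50.0, gdp_index, base_year=2022, periods=[2025, 2030])
```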
6. New Capacity
For prospective technology adoption, new_capacity.py ingests data entirely from the EIA AEO ktekx.xlsx technology menus.
- Efficiencies: Proxied exactly from AEO.
- CostInvest / CostFixed: Sourced from AEO's capcst and maintcst variables. Costs are converted from base US units (e.g., $/(kBtu/h)) to CANOE native metrics (M$/(PJ/y)) using pre-defined conversion factors, and adjusted for currency and inflation into CAD.
- Lifetimes: Taken directly from the AEO lifespan tables.
- Capacity Factors: Upper limits (LimitAnnualCapacityFactor) are constrained by the peak-to-mean ratios derived from the hourly ComStock demand distributions.
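The unit conversion and the capacity factor limit can be sketched as follows. The physical constants are standard, but the pipeline's own pre-defined conversion factors are not reproduced here. The exchange rate and inflation arguments are hypothetical parameters, and taking the limit as the inverse of the peak-to-mean ratio is an assumption about how the constraint is applied.

```python
from typing import Sequence

BTU_TO_J = 1055.056      # joules per Btu (International Table value, rounded)
HOURS_PER_YEAR = 8760
J_PER_PJ = 1e15


def usd_per_kbtu_h_to_musd_per_pj_y(cost_usd_per_kbtu_h: float) -> float:
    """Convert $/(kBtu/h) to M$/(PJ/y), treating capacity as continuous output."""
    pj_per_year = 1000 * BTU_TO_J * HOURS_PER_YEAR / J_PER_PJ  # PJ/y per kBtu/h
    return cost_usd_per_kbtu_h / pj_per_year / 1e6


def to_cad(cost_usd: float, cad_per_usd: float, inflation_factor: float) -> float:
    """Apply an exchange rate and an inflation adjustment (both supplied by the caller)."""
    return cost_usd * cad_per_usd * inflation_factor


def max_annual_capacity_factor(hourly_demand: Sequence[float]) -> float:
    """Mean-to-peak ratio, i.e. the inverse of the peak-to-mean ratio."""
    peak = max(hourly_demand)
    if peak <= 0:
        raise ValueError("profile must have a positive peak")
    return sum(hourly_demand) / len(hourly_demand) / peak


unit_factor = usd_per_kbtu_h_to_musd_per_pj_y(1.0)
acf_limit = max_annual_capacity_factor([1.0, 2.0, 3.0, 2.0])
```

Under these constants, 1 $/(kBtu/h) corresponds to roughly 0.108 M$/(PJ/y), and the toy four-hour profile gives a capacity factor limit of about 0.67.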
7. Hourly Demand Profiles (ComStock)
To capture the intra-annual temporal dynamics necessary for capacity expansion modeling, hourly demand shapes are synthesized via comstock_dsd.py:
- Data Source: Uses NREL ComStock representative commercial buildings dataset (e.g., Hospital, Large Office, Primary School).
- Mapping: Since ComStock data is primarily US-focused, Canadian provinces pull profiles from an equivalent US State (comstock_map.csv), adjusted through a process of explicit weather mapping (using Renewables Ninja temperature/humidity records).
- Demand Specific Distributions (DSD): All ComStock 8,760-hour outputs are summed by end-use and then normalized to sum to 1. This normalized fraction is assigned to DemandSpecificDistribution, dictating when the annual demand (from Section 5.5) physically occurs.
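The sum-then-normalize step for one end-use can be sketched with a toy four-hour profile in place of the full 8,760 hours. The building types and values are illustrative, the function name is hypothetical, and the weather mapping and dsd_tolerance handling are left out.

```python
from typing import Dict, List


def demand_specific_distribution(hourly_by_building: Dict[str, List[float]]) -> List[float]:
    """Sum hourly profiles across building types, then normalize to sum to 1."""
    profiles = list(hourly_by_building.values())
    n_hours = len(profiles[0])
    if any(len(p) != n_hours for p in profiles):
        raise ValueError("all profiles must cover the same hours")
    totals = [sum(p[h] for p in profiles) for h in range(n_hours)]
    grand_total = sum(totals)
    if grand_total <= 0:
        raise ValueError("total demand must be positive")
    return [t / grand_total for t in totals]


# Toy space-heating profiles (arbitrary energy units).
space_heating = {
    "Hospital": [4.0, 3.0, 2.0, 3.0],
    "Large Office": [1.0, 2.0, 1.0, 4.0],
}
dsd = demand_specific_distribution(space_heating)
```

Multiplying each entry of `dsd` by the annual demand gives the demand in that hour.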
8. Emissions
If include_emissions is set to true in params.yaml:
- US EPA GHG Emission Factors Hub data is integrated to apply CO2, CH4, and N2O factors.
- Factors are scaled into standard Global Warming Potential (GWP) CO2-equivalent metrics and tied to specific generator inputs via the EmissionActivity table.
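The CO2-equivalent scaling is a GWP-weighted sum of the individual gas factors. The sketch below uses the IPCC AR5 100-year GWP values as an assumption; this document does not say which GWP set the pipeline applies. The emission factors are made up and are not EPA values.

```python
from typing import Dict

# Assumed GWP set: IPCC AR5, 100-year horizon.
GWP_100 = {"co2": 1.0, "ch4": 28.0, "n2o": 265.0}


def co2e_factor(gas_factors: Dict[str, float], gwp: Dict[str, float] = GWP_100) -> float:
    """Combine per-gas emission factors into one CO2-equivalent factor."""
    return sum(gwp[gas] * value for gas, value in gas_factors.items())


# Illustrative factors per unit of fuel burned (same mass unit for every gas).
natural_gas = {"co2": 50.0, "ch4": 0.001, "n2o": 0.0001}
natural_gas_co2e = co2e_factor(natural_gas)
```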
9. Post-Processing
The final step handles bookkeeping:
- Vintages (historical installation years) used by existing technologies are appended to the TimePeriod table with the flag "e".
- The internal bibliography of citations is flushed to the DataSource table.
- A complete check for missing or unassigned data_id fields ensures traceability for all database entries prior to validation.
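A data_id check of this kind could be written against any SQLite database as sketched below. The `tables_missing_data_id` helper is hypothetical, and the toy Efficiency table is a simplified stand-in for the real Temoa schema.

```python
import sqlite3
from typing import Dict


def tables_missing_data_id(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count rows with a NULL or empty data_id in every table that has the column."""
    missing: Dict[str, int] = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        if "data_id" not in columns:
            continue
        count = conn.execute(
            f"SELECT COUNT(*) FROM \"{table}\" WHERE data_id IS NULL OR data_id = ''"
        ).fetchone()[0]
        if count:
            missing[table] = count
    return missing


# Toy in-memory database with one untraceable row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Efficiency (region TEXT, tech TEXT, efficiency REAL, data_id TEXT)")
conn.execute("CREATE TABLE Region (region TEXT)")
conn.executemany(
    "INSERT INTO Efficiency VALUES (?, ?, ?, ?)",
    [("ON", "furnace", 0.8, "COM001"), ("ON", "boiler", 0.75, None)],
)
missing = tables_missing_data_id(conn)
conn.close()
```

Tables without a data_id column are skipped, so only the Efficiency table is reported, with one offending row.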
10. Known Assumptions and Limitations
- US Proxies for Canadian Building Stock: The commercial pipeline relies heavily on the EIA AEO for specific HVAC efficiencies, technology costs, and installed base distributions. Furthermore, NREL ComStock is used for hourly usage profiles. While efforts are made to align similar climates (e.g., US Census Divisions matching Canadian Provinces), systemic differences between US and Canadian commercial building codes, occupancy norms, and construction materials are not explicitly captured.
- Atlantic Aggregation Proxying: Extracting individual province data from NRCan’s aggregated Atlantic region requires using a different Statistics Canada dataset with different survey scopes. This disaggregation is an approximation.
- Fixed Peak-to-Mean Demand Profiles: The hourly demand profile shapes generated from the base weather year (e.g., 2018) are assumed to remain constant across all future model periods.
- Simplification of "Other" Demands: Distinct end-uses such as water heating, lighting, and refrigeration are aggressively aggregated into a single "other" commodity, smoothing out individual usage profiles. The electrification transition for these is deterministic (linear interpolation) rather than endogenously optimized.
CANOE Commercial Sector — Data Sources Catalog
1. Data Source Summary
| Data Type | Primary Source | Granularity | Update Frequency |
|---|---|---|---|
| Existing Demand/Fuel | NRCan CEUD | Regional / Annual | Annual |
| Future Demand Scaling | CER Canada's Energy Future | National/Regional GDP | Annual/Biannual |
| Tech Shares & Costs | U.S. EIA AEO | US Census Div. | Annual |
| Hourly Demand Shapes | NREL ComStock | US State / Hourly | Periodic updates |
| Emissions Factors | US EPA GHG Hub | Fuel Type | Annual |
| Population Config | Statistics Canada | Provincial / Annual | Annual |
2. NRCan Comprehensive Energy Use Database (CEUD)
- Usage: Establishes the absolute base year of commercial sector secondary energy consumption for space heating, cooling, and other.
- Update Procedure: Base year should be advanced. Update base_year in params.yaml and verify the integrity of the downloaded Excel sheets (Table 1, 24, 32 row/column structures).
3. U.S. EIA Annual Energy Outlook (AEO) Commercial Demand Module
- Usage: Provides parameters for efficiency, technological breakdown (market shares), fixed costs, and investment costs for both existing and new technologies.
- Update Procedure: A new ktekx.xlsx file must be manually downloaded from the AEO NEMS archive and placed in input_files/. Check aeo_cdm_indexing.csv and row indices in Python code if EIA alters the exact matrix dimensions of their output sheets. Update aeo_data_year in config.
4. NREL ComStock
- Usage: High-resolution hourly energy consumption profiles mapping commercial building archetypes to end-uses.
- Update Procedure: params.yaml handles dynamic S3 URL paths to the ComStock data lake. Update data_year and the url path if NREL releases newer weather year evaluations.
5. Statistics Canada
- Usage: Historical and projected regional aggregate population metrics; provides distinct regional fuel allocations to disentangle NRCan's blanket "Atlantic" categorizations.
- Update Procedure: The pipeline pulls dynamically via the StatCan API (Table 17-10-0057 for projections and 25-10-0029 for energy flow). Update references in params.yaml if table indices change.
6. CER Canada's Energy Future
- Usage: Macroeconomic GDP forecasts to project space heating, cooling, and auxiliary demands into future model periods.
- Update Procedure: The URL in params.yaml points to the Open Government Portal. Ensure the gdp_url is updated when a new CEF iteration is released.
7. US EPA GHG Emission Factors Hub
- Usage: Emission factors (CO2, CH4, N2O) mapped directly to commercial fuel combustion.
- Update Procedure: Modify epa_url in params.yaml to point to the latest annual Excel release.
8. Renewables Ninja (Weather Mapping)
- Usage: Weather data points to map specific ComStock US state proxy files realistically onto Canadian provincial climates.
- Update Procedure: Handled via dynamic API request using cached tokens. Unless modifying the base weather_year, no direct data updates are required.
9. Update Procedures (Checklist)
As defined in annual_update_checklist.txt:
- Latest AEO NEMS data: Download and replace ktekx.xlsx in input_files. Update row/column indices in code and names in new_technologies.csv.
- Update datasets: Modify data years and reference strings in params.yaml.
- Clear Cache: Empty the data_cache/ directory to force the pipeline to pull freshly aligned data.
- Execute & Test: Run commercial_sector.py and resolve any CSV layout or API structural changes from upstream data providers. Ensure the output aligns with the new Temoa SQLite schema.