CANOE Residential Sector — Data Processing Documentation
Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's residential buildings sector.
Table of Contents
- Overview
- Pipeline Architecture
- Configuration and Setup
- Pre-Processing
- End-Use Subsectors (Existing Capacity and Demand)
- New Capacity
- Hourly Demand Profiles (ResStock)
- Emissions
- Post-Processing
- Known Assumptions and Limitations
1. Overview
Purpose
The CANOE residential sector aggregation tool automatically constructs a Temoa-compatible SQLite database representing the Canadian residential buildings sector. It dynamically downloads and processes data from multiple sources—including Natural Resources Canada (NRCan), U.S. EIA Annual Energy Outlook (AEO), NREL ResStock, and Statistics Canada—to construct a detailed bottom-up representation of end-use demands, technology stocks, and investment options.
What the Tool Produces
- A Temoa-schema SQLite database containing technology attributes, costs, capacities, energy demands, emissions, and hourly time-slice descriptions for the residential sector.
- Optionally, an Excel workbook replica of the database for easier manual inspection.
- Diagnostic plots displaying demand specific distributions (DSDs) across the modeled hours.
Scope
The model operates across:
- Regions: Canadian provinces.
- End-Uses: Space heating, space cooling, water heating, lighting, and appliances/other.
- Technologies: A broad range of existing and future household technologies (furnaces, heat pumps, air conditioners, water heaters, various lighting types, and major appliances).
- Time resolution: Hourly load profiles mapped to representative seasonal and diurnal periods.
- Planning horizon: Configurable multi-year stepping (default: 5-year periods from 2025–2050).
2. Pipeline Architecture
The processing logic follows a structured orchestration handled primarily by residential_sector.py and all_subsectors.py:
1. setup.py → Load configuration, fetch population projections and macroeconomic data, download AEO databases
2. instantiate_database → Wipe/create the target SQLite database using the standard Temoa schema
3. all_subsectors.py → Orchestrator for sector-wide operations
   a. pre_process() → Define core temporal scales, geographical regions, physical commodities, and fundamental technologies
   b. [Subsector Loop]
      i. space_heating.py → Compute existing demand, efficiencies, and stock capacities
      ii. space_cooling.py → Compute existing demand...
      iii. water_heating.py → Compute existing demand...
      iv. lighting.py → Compute existing demand...
      v. appliances.py → Compute existing demand...
   c. aggregate_dsd() → Construct normalized Demand Specific Distributions utilizing NREL ResStock
   d. aggregate_emissions() → Map US EPA emission factors to physical combustion
   e. post_process() → Cross-check data consistency, write vintage chronologies, output bibliography references
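The wipe/create step (step 2 above) could look roughly like the following. This is a minimal sketch, not the project's actual code: the function signature, the `schema_sql` argument, and the one-table `TOY_SCHEMA` are assumptions made for illustration, and the real Temoa schema is far larger.

```python
import sqlite3
from pathlib import Path


def instantiate_database(db_path, schema_sql):
    """Delete any existing database file, then create a fresh one from schema_sql.

    Hypothetical signature; the real function may take different arguments.
    """
    path = Path(db_path)
    if path.exists():
        path.unlink()  # wipe the previous build so no stale rows survive
    conn = sqlite3.connect(path)
    conn.executescript(schema_sql)  # run every CREATE statement in the schema
    conn.commit()
    return conn


# Placeholder schema for illustration only, not the Temoa schema.
TOY_SCHEMA = "CREATE TABLE Technology (tech TEXT PRIMARY KEY);"
```

Deleting the file rather than dropping tables one by one guarantees that a rebuild never inherits rows or tables from an earlier schema version.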
3. Configuration and Setup
3.1 Configuration File (params.yaml)
Control parameters define the macro structure of the final database:
| Parameter | Default | Description |
|---|---|---|
| `period_step` | 5 | Integer spacing between future projection years |
| `model_periods` | [2025, ..., 2050] | Planning sequence periods |
| `base_year` | 2022 | The historical calibration point for empirical data pulls (NRCan) |
| `weather_year` | 2018 | The temporal anchor matching ResStock hourly profiles |
| `furnace_fans` | Flag | Specifies if electricity consumption for furnace fans is explicit |
| `dsd_tolerance` | 0.02 | Minimum threshold for clipping extremely small hourly demand fractions |
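As an illustration of how the period parameters relate, the default planning periods can be generated from a first period, a last period, and the step. This is a sketch only: the helper name is hypothetical, and whether params.yaml lists the periods explicitly or derives them this way is not stated here.

```python
def build_model_periods(first_period: int, last_period: int, period_step: int) -> list[int]:
    """Return evenly spaced model periods, including both endpoints when they align."""
    if period_step <= 0:
        raise ValueError("period_step must be a positive integer")
    return list(range(first_period, last_period + 1, period_step))


# Defaults described above: 5-year steps from 2025 to 2050.
periods = build_model_periods(2025, 2050, 5)
# periods == [2025, 2030, 2035, 2040, 2045, 2050]
```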
3.2 Definition Maps (CSVs)
- existing_technologies.csv / new_technologies.csv: Map model technology tags to AEO proxy strings and NRCan categorical fields.
- fuel_commodities.csv / end_use_demands.csv: Define the physical fuel commodities and the end-use demand outputs (e.g., PJ space heating).
- regions.csv / time.csv: Align provinces to US Census divisions (for proxying) and define the season/hour groupings.
4. Pre-Processing
Pre-processing generates the data structures that Temoa requires:
- Time definitions: Explicit rows mapping 8,760 hours to aggregated fractional time slices (`SeasonLabel`, `TimeSeason`, `TimeOfDay`, `TimeSegmentFraction`).
- Technology Initialization: Cross-matches the provided definition maps to declare technology names. Equipment lifetimes are instantiated at this stage from the Weibull distributions in the AEO residential module.
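The time-slice fractions can be illustrated by counting how many of the 8,760 hours fall into each (season, time-of-day) group and dividing by 8,760. This is a minimal sketch under assumed groupings: the four meteorological seasons and the day/night split below are hypothetical stand-ins for whatever time.csv actually defines.

```python
from collections import Counter
from datetime import datetime, timedelta

HOURS_PER_YEAR = 8760


def season_of(month: int) -> str:
    """Hypothetical season grouping; the real grouping comes from time.csv."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def time_of_day(hour: int) -> str:
    """Hypothetical diurnal grouping: 06:00-17:59 is 'day', the rest is 'night'."""
    return "day" if 6 <= hour < 18 else "night"


def time_segment_fractions(year: int = 2018) -> dict[tuple[str, str], float]:
    """Fraction of the year's first 8,760 hours that falls in each time slice."""
    start = datetime(year, 1, 1)
    counts = Counter()
    for h in range(HOURS_PER_YEAR):
        ts = start + timedelta(hours=h)
        counts[(season_of(ts.month), time_of_day(ts.hour))] += 1
    return {segment: n / HOURS_PER_YEAR for segment, n in counts.items()}


fractions = time_segment_fractions(2018)
```

The fractions sum to 1 by construction. For a leap year this sketch would silently drop the final day, so a real implementation would need an explicit rule for it.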
5. End-Use Subsectors (Existing Capacity and Demand)
The five subsector modules (space_heating.py, space_cooling.py, water_heating.py, lighting.py, and appliances.py) apply slightly different methods depending on the data available:
5.1 Base Demand and Efficiency (NRCan)
- Secondary Energy Consumption (SEC) is loaded from the NRCan Comprehensive Energy Use Database (e.g., Table 8 for space heating).
- Efficiencies for existing vintages are mapped from NRCan tables such as Table 26 (Heating System Efficiencies), covering single-fuel equipment and, where explicitly reported, dual-fuel equipment.
- The primary demand constraint is calculated directly as: `DEMAND = SEC * EFFICIENCY`.
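A small worked example of the relation DEMAND = SEC * EFFICIENCY, summed over the technologies serving one end use. The technology names and numbers are illustrative placeholders, not NRCan values.

```python
def end_use_demand(sec_pj: float, efficiency: float) -> float:
    """Useful energy delivered (PJ) = secondary energy consumed (PJ) * efficiency."""
    return sec_pj * efficiency


def total_demand(rows: list[tuple[str, float, float]]) -> float:
    """Sum delivered energy over (technology, SEC in PJ, efficiency) rows."""
    return sum(end_use_demand(sec, eff) for _, sec, eff in rows)


# Illustrative placeholder values only.
space_heating = [
    ("gas_furnace", 100.0, 0.80),         # 100 PJ of gas at 80% efficiency
    ("electric_baseboard", 50.0, 1.00),   # 50 PJ of electricity at 100%
]
demand_pj = total_demand(space_heating)   # about 130 PJ of delivered heat
```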
5.2 Future Demand Scaling (Population)
Unlike the commercial sector, which scales with GDP, the residential sector scales base-year demand with population growth to project it into future model periods.
- The pipeline requests provincial population data directly from the Statistics Canada API (historical populations and "M1 medium-growth" projections).
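Population scaling can be sketched as multiplying base-year demand by the ratio of projected to base-year population. This assumes strictly proportional scaling; the function name and the population figures are hypothetical and are not Statistics Canada data.

```python
def scale_demand_by_population(
    base_demand_pj: float,
    population: dict[int, float],
    base_year: int,
    model_periods: list[int],
) -> dict[int, float]:
    """Project demand as base demand * population(period) / population(base year)."""
    base_pop = population[base_year]
    if base_pop <= 0:
        raise ValueError("base-year population must be positive")
    return {p: base_demand_pj * population[p] / base_pop for p in model_periods}


# Illustrative numbers only.
population = {2022: 1_000_000, 2025: 1_050_000, 2030: 1_100_000}
projected = scale_demand_by_population(80.0, population, 2022, [2025, 2030])
# 5% more people than in 2022 gives 5% more demand in 2025, and so on.
```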
5.3 Existing Capacity Mapping
Base-year equipment stock is taken from NRCan's database (e.g., Table 21 for Heating System Stock by building type). Rather than assigning the entire stock to the base year, CANOE distributes the unit counts linearly backward across the feasible historical vintages implied by the equipment lifetime.
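One way to picture the backward distribution is to split the base-year stock into equal shares across every vintage that could still be operating. This sketch rests on assumptions: "linearly" is read here as equal shares, and the function name, the vintage spacing, and the numbers are illustrative, so the actual weighting in CANOE may differ.

```python
def distribute_stock(
    total_units: float,
    base_year: int,
    lifetime_years: int,
    vintage_step: int,
) -> dict[int, float]:
    """Split total_units equally across vintages younger than lifetime_years."""
    if lifetime_years <= 0 or vintage_step <= 0:
        raise ValueError("lifetime_years and vintage_step must be positive")
    # A vintage is feasible if its age in the base year is below the lifetime.
    vintages = [base_year - age for age in range(0, lifetime_years, vintage_step)]
    share = total_units / len(vintages)
    return {vintage: share for vintage in sorted(vintages)}


# 1,000 units, 20-year lifetime, 5-year vintage spacing, base year 2022.
stock = distribute_stock(1000.0, 2022, 20, 5)
# stock == {2007: 250.0, 2012: 250.0, 2017: 250.0, 2022: 250.0}
```

Spreading the stock across vintages means existing equipment retires gradually over the planning horizon instead of all at once when a single base-year vintage reaches end of life.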
6. New Capacity
Future investment options rely on techno-economic data from the U.S. EIA Annual Energy Outlook (AEO) Residential Module (rsmess.xlsx).
- Canadian provinces are mapped to comparable US Census Divisions.
- Parameters derived:
  - CostInvest / CostFixed: AEO base values multiplied by the configured currency exchange and inflation adjustment factors, converting them to $CAD(2020) per unit of capacity.
  - Efficiency: Taken directly from the baseline performance of the corresponding AEO equipment class.
  - Limits: Annual capacity factors are bounded using the peak-to-mean ratio of the end-use hourly demand shape.
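The cost conversion and the capacity-factor bound described above can be sketched as follows. The function names, exchange rate, and inflation factor are hypothetical. Taking the bound as mean load divided by peak load (the inverse of the peak-to-mean ratio) is an interpretation of the text, not a confirmed formula.

```python
def convert_cost(aeo_cost_usd: float, usd_to_cad: float, inflation_factor: float) -> float:
    """Convert an AEO cost to the target currency and dollar year."""
    return aeo_cost_usd * usd_to_cad * inflation_factor


def max_capacity_factor(hourly_profile: list[float]) -> float:
    """Upper bound on annual capacity factor: mean load / peak load."""
    peak = max(hourly_profile)
    if peak <= 0:
        raise ValueError("profile must have a positive peak")
    mean = sum(hourly_profile) / len(hourly_profile)
    return mean / peak


# Illustrative inputs only: 1000 USD/unit, 1.30 CAD per USD, 10% inflation adjustment.
cost_cad = convert_cost(1000.0, 1.30, 1.10)
# A toy 4-hour shape with mean 2 and peak 3 gives a bound of 2/3.
cf_bound = max_capacity_factor([1.0, 2.0, 3.0, 2.0])
```

A technology sized to meet the peak hour cannot run at full output all year, so the mean-to-peak ratio caps its achievable annual utilization.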
7. Hourly Demand Profiles (ResStock)
System capacity adequacy depends on when the calculated annual end-use energy demand occurs throughout the year.
- Data Source: Uses NREL ResStock modeled hourly energy outputs, categorized by archetypal US building type (mobile homes, apartments, detached houses, etc.).
- Stock Alignment: The tool weights these US archetype profiles to match the housing mix (single- vs. multi-family ratios) of the target Canadian province, based on NRCan distribution surveys.
- Weather Alignment: The pipeline normalizes the profile and applies temperature adjustments using geo-proxied meteorological data from Renewables Ninja (temperature and humidity matched to the configured `weather_year`), transferring the US profile's weather response to the Canadian climate.
- Result: An array of Demand Specific Distribution (`dsd`) fractions that maps annual PJ targets onto seasonal/daily Temoa demands.
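The stock-alignment and normalization steps can be sketched as a share-weighted sum of archetype profiles followed by scaling so the fractions sum to 1. This is a simplified illustration: the archetype names, shares, and two-hour profiles are made up, and the weather adjustment and `dsd_tolerance` clipping are deliberately left out.

```python
def blend_profiles(
    profiles: dict[str, list[float]],
    shares: dict[str, float],
) -> list[float]:
    """Share-weighted sum of per-archetype hourly profiles of equal length."""
    lengths = {len(p) for p in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("all profiles must have the same number of hours")
    n_hours = lengths.pop()
    blended = [0.0] * n_hours
    for archetype, share in shares.items():
        for hour, value in enumerate(profiles[archetype]):
            blended[hour] += share * value
    return blended


def to_dsd(profile: list[float]) -> list[float]:
    """Normalize a profile so its values are fractions of the annual total."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must have a positive total")
    return [value / total for value in profile]


# Toy two-hour example with made-up archetypes and housing shares.
profiles = {"single_family": [1.0, 3.0], "multi_family": [2.0, 2.0]}
shares = {"single_family": 0.75, "multi_family": 0.25}
dsd = to_dsd(blend_profiles(profiles, shares))
# dsd is approximately [0.3125, 0.6875]
```

Multiplying an annual demand in PJ by each `dsd` fraction then gives the demand in the corresponding time slice.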
8. Emissions
If activated via include_emissions in params.yaml:
- Emission coefficients are loaded from the US EPA GHG Emission Factors Hub.
- Input fuel flows are tied to a dummy global-warming-potential commodity, linking combustion energy to the resulting emission constraints (e.g., `EmissionLimit`).
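The emission accounting amounts to combining per-gas emission factors into a single CO2-equivalent factor and multiplying by fuel use. In this sketch the per-gas factors are placeholders, not EPA values, and the GWP set shown (IPCC AR5 100-year values) is an assumption, since the GWP set used by CANOE is not stated here.

```python
# IPCC AR5 100-year global warming potentials, used here as an assumption.
GWP_AR5 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}


def co2e_factor(factors_kt_per_pj: dict[str, float], gwp: dict[str, float]) -> float:
    """Collapse per-gas factors (kt gas / PJ fuel) into kt CO2e / PJ fuel."""
    return sum(factor * gwp[gas] for gas, factor in factors_kt_per_pj.items())


def combustion_emissions(fuel_use_pj: float, factor_kt_co2e_per_pj: float) -> float:
    """Emissions (kt CO2e) from burning fuel_use_pj of a fuel."""
    return fuel_use_pj * factor_kt_co2e_per_pj


# Placeholder factors for a generic fuel; NOT taken from the EPA hub.
toy_factors = {"CO2": 50.0, "CH4": 0.001, "N2O": 0.0001}
factor = co2e_factor(toy_factors, GWP_AR5)       # about 50.0545 kt CO2e / PJ
emissions = combustion_emissions(10.0, factor)   # about 500.545 kt CO2e
```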
9. Post-Processing
- Checks data consistency by sweeping for orphaned technologies and missing `data_id` assignments, which provide traceability for every inserted SQLite row.
- Exports an internal tracker to compile `DataSource` mappings.
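An orphan sweep of this kind can be sketched with two SQL checks against an in-memory database. The two-table schema below is a toy stand-in rather than the Temoa schema, and the table layout, technology names, and helper names are all hypothetical.

```python
import sqlite3


def find_orphan_technologies(conn: sqlite3.Connection) -> list[str]:
    """Technologies declared in Technology but never referenced in Efficiency."""
    rows = conn.execute(
        "SELECT t.tech FROM Technology AS t "
        "LEFT JOIN Efficiency AS e ON e.tech = t.tech "
        "WHERE e.tech IS NULL ORDER BY t.tech"
    ).fetchall()
    return [row[0] for row in rows]


def count_missing_data_id(conn: sqlite3.Connection, table: str) -> int:
    """Rows in `table` with no data_id. Pass trusted table names only."""
    query = f'SELECT COUNT(*) FROM "{table}" WHERE data_id IS NULL'
    return conn.execute(query).fetchone()[0]


def build_toy_database() -> sqlite3.Connection:
    """Toy schema and rows for illustration; not the Temoa schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Technology (tech TEXT PRIMARY KEY, data_id TEXT);
        CREATE TABLE Efficiency (tech TEXT, efficiency REAL, data_id TEXT);
        INSERT INTO Technology VALUES ('R_SH_FURNACE', 'RES1'), ('R_SH_HEATPUMP', NULL);
        INSERT INTO Efficiency VALUES ('R_SH_FURNACE', 0.9, 'RES1');
        """
    )
    return conn


conn = build_toy_database()
orphans = find_orphan_technologies(conn)            # ['R_SH_HEATPUMP']
missing = count_missing_data_id(conn, "Technology")  # 1
```

The `LEFT JOIN ... IS NULL` form is used instead of `NOT IN` because `NOT IN` returns no rows at all if the subquery contains a NULL.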
10. Known Assumptions and Limitations
- Proxying Economics and Technologies: Like the commercial sector, the residential methodology relies heavily on the EIA AEO data, assigning proxy US Census divisions to Canadian provinces. Differences between US and Canadian equipment markets, Canadian regional subsidies, and cross-border technology adoption are smoothed over.
- Growth Trajectory Staticism: Projections scale linearly with population size and do not account for non-linear effects such as shrinking floor space, densification, or changing consumer welfare.
- ResStock Weather Generalization: Transferring US weather responses linearly using province-level weather data masks local microclimates (especially in provinces with highly varied topography such as BC, or large geographic footprints such as Ontario).
- Lighting Load Assumptions: Default annual capacity factors for lighting rely on static load estimates (e.g., fixed hourly usage values per bulb) and do not capture shifts in diurnal lighting behavior tied to work-from-home patterns or smart-home penetration.
CANOE Residential Sector — Data Sources Catalog
1. Data Source Summary
| Data Type | Primary Source | Granularity | Update Frequency |
|---|---|---|---|
| Existing Demand/Fuel | NRCan CEUD | Regional / Annual | Annual |
| Future Demand Scaling | Statistics Canada | Provincial / Annual | Annual |
| Tech Shares & Costs | U.S. EIA AEO | US Census Div. | Annual |
| Hourly Demand Shapes | NREL ResStock | Archetype / Hourly | Periodic upgrades |
| Emissions Factors | US EPA GHG Hub | Fuel Type | Annual |
| Weather Mapping | Renewables Ninja | Geo-Coordinates | None (static year) |
2. NRCan Comprehensive Energy Use Database (CEUD)
- Usage: Provides the anchor for historical base-year demands and unit counts across heating, cooling, lighting, and appliance end uses.
- Update Procedure: Base year values (`base_year` in `params.yaml`) should be checked. If table structures shift within the NRCan-provided Excel files, the bounds inside `utils.get_compr_db()` will require index modifications.
3. U.S. EIA Annual Energy Outlook (AEO) Residential Module
- Usage: Proxy source for techno-economic metrics, efficiency evolution, new capital options, and degradation/lifetime metrics.
- Update Procedure: A new `rsmess.xlsx` file must be manually pulled from the AEO NEMS archive repository into `input_files/`. It is critical to adjust `aeo_data_year`. Ensure the `rsmess.xlsx` sheets (`RSCLASS` and `RSMEQP`) have not changed their indexing structure when updated.
4. Statistics Canada
- Usage: Pulls historical population data and M1 medium-growth projections to scale energy consumption forward.
- Update Procedure: Retrieved dynamically using the StatCan API (typically Table 17-10-0057 for forward projections, 17-10-0009 for historical data). The pipeline caches all downloaded responses. During update cycles, empty the `data_cache/` directory to force a fresh download.
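The caching behavior explains why the cache must be emptied during updates: once a response is on disk, the remote source is never consulted again. A minimal sketch of that pattern follows. The helper names and the `fetch` callable are hypothetical, and the real pipeline's cache layout may differ.

```python
from pathlib import Path
from typing import Callable


def cached_fetch(name: str, fetch: Callable[[], bytes], cache_dir: Path) -> bytes:
    """Return cached bytes for `name` if present, else call fetch() and store the result."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists():
        return target.read_bytes()  # cache hit: the remote source is not contacted
    data = fetch()
    target.write_bytes(data)
    return data


def purge_cache(cache_dir: Path) -> int:
    """Delete every file directly inside cache_dir and return how many were removed."""
    if not cache_dir.exists():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed
```

Passing the download as a callable keeps the sketch independent of any particular HTTP client or API endpoint.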
5. NREL ResStock
- Usage: Provides generic hourly load shapes by housing archetype, linking temperature thresholds to thermal demands.
- Update Procedure: Uses dynamic S3 path calls. The model uses the AMY2018 weather basis configured in `params.yaml`. Updates require modifying the path strings in the core `resstock:` configuration if NREL reorganizes its OpenEI structure.
6. US EPA GHG Emission Factors Hub
- Usage: Provides default CO2-equivalent emission factors.
- Update Procedure: Change `epa_url` and `epa_year` in the configuration files to point to the annual spreadsheet published by the US EPA.
7. Renewables Ninja
- Usage: Uses a local API token to pull location-specific weather time series, which are used to transfer ResStock profiles.
- Update Procedure: Ensure the token in `rninja_api_token.txt` remains valid. The data generally does not need updating unless `weather_year` is changed.
8. Update Procedures (Checklist)
As defined in annual_update_checklist.txt:
- Latest AEO NEMS data: Download and replace `rsmess.xlsx` in `input_files`.
- Setup.py indexing: If AEO altered their table outputs, update the row/column indexing arrays inside `setup.py`.
- Review Config: Modify configuration data years and update relevant citation string variables in `params.yaml`.
- Purge Cache: Delete all objects within the local `data_cache/` folder, forcing the routine to retrieve fresh JSON/ZIP artifacts from external endpoints (StatCan, ResStock).
- Execution Test: Generate the database and validate the resulting tables against the Temoa SQLite schema. Ensure no orphan data components appear.