CANOE Agriculture Sector — Data Processing Documentation

Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's agriculture sector.


Table of Contents

  1. Overview
  2. Pipeline Architecture
  3. Configuration and Setup
  4. External Data Fetching
  5. Demand and Capacity
  6. Technology Splits and Efficiencies
  7. Costs
  8. Known Assumptions and Limitations

1. Overview

Purpose

The CANOE agriculture sector aggregation tool automatically constructs a Temoa-compatible SQLite database representing the Canadian agricultural sector. It pulls top-down data primarily from Natural Resources Canada (NRCan), the Canada Energy Regulator (CER), and Statistics Canada to represent the sector's unified energy demands and fuel mixes.

What the Tool Produces

Scope

The model operates across:


2. Pipeline Architecture

Like the industrial subsystem, the agriculture pipeline uses a modular, ETL-like architecture coordinated by aggregator.py:

1. aggregator.py        → Main orchestrator
   a. load_runtime_agri  → Initialize database, tables, schema versions, and shared constraints
   b. data_scraper      → Load CEUD baseline (NRCan 'CP', sector 'agr') and Macro Indicators (CER)
   c. statcan           → Load Atlantic (ATL) province disaggregation mapping
   d. demands           → Build Demand arrays (GDP-scaled + ATL mapped)
   e. techinput         → Extract fuel split mixes (`LimitTechInputSplitAnnual`) from NRCan
   f. efficiency        → Seed theoretical conversion boundaries (Efficiency = 1.0)
   g. costs             → [Skipped/Commented out in active pipeline]
   h. post_processing   → Embed datasets, sources taxonomy, and IDs
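
The stage order above can be sketched as a simple sequential runner. This is only an illustration under stated assumptions, not the actual aggregator.py: the `run_pipeline` function, the shared `context` dict, and the placeholder stage bodies are hypothetical. Only the stage names come from the list above.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical stage signature: each stage reads and updates a shared context dict.
Stage = Callable[[Dict[str, object]], None]


def run_pipeline(stages: List[Tuple[str, Stage]],
                 skip: FrozenSet[str] = frozenset()) -> List[str]:
    """Run stages in order against one shared context; return the names that ran."""
    context: Dict[str, object] = {}
    executed: List[str] = []
    for name, stage in stages:
        if name in skip:
            continue  # e.g. the costs stage, which is skipped in the active pipeline
        stage(context)
        executed.append(name)
    return executed


def _noop(context: Dict[str, object]) -> None:
    """Placeholder body standing in for a real stage (assumption)."""


# Stage names follow the list above; the bodies are placeholders.
STAGE_NAMES = ["load_runtime_agri", "data_scraper", "statcan", "demands",
               "techinput", "efficiency", "costs", "post_processing"]
STAGES = [(name, _noop) for name in STAGE_NAMES]

executed = run_pipeline(STAGES, skip=frozenset({"costs"}))
```

Keeping the skip list explicit, rather than commenting a stage out, makes it visible in one place which stages are inactive.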

3. Configuration and Setup

General Settings

Defined in input/params.yaml:

Domain Mapping

The pipeline maps base strings to Temoa IDs statically. The agriculture domain uses the prefix A_ for commodities tied to agricultural demands.
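
As a rough illustration of static prefixing, the sketch below builds an ID from a base string. Only the A_ prefix comes from the text above. The function name `to_commodity_id`, the normalization rule (lowercase, spaces to underscores), and the example input are assumptions; the real pipeline may format IDs differently.

```python
AGRI_PREFIX = "A_"  # prefix stated in the text above


def to_commodity_id(base_name: str, prefix: str = AGRI_PREFIX) -> str:
    """Build a prefixed commodity ID from a base string.

    The normalization used here (lowercase, underscores) is an assumption,
    not the pipeline's documented behavior.
    """
    token = base_name.strip().lower().replace(" ", "_")
    if not token:
        raise ValueError("base_name must not be empty")
    return f"{prefix}{token}"


example_id = to_commodity_id("Natural Gas")  # hypothetical base string
```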


4. External Data Fetching

Fetched data is cached locally (cache/) to limit repeated web requests:

  1. NRCan Comprehensive Energy Use Database (CEUD): Scraped via the oee.nrcan.gc.ca API. Only the 'agr' sector tables are fetched, for every province and for the Atlantic aggregate (ATL).
  2. CER Energy Future (CEF): Downloads the macro indicators, which give projected GDP trajectories, directly from the raw CSV endpoint.
  3. StatCan: Downloads archives used to disaggregate Atlantic (ATL) primary and secondary energy supply and demand by province.
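
The caching behaviour described above can be sketched as a fetch-on-miss wrapper around a pickle file. This is a minimal sketch, assuming pickle-based caching (the update checklist mentions .pkl files in cache/). The `cached_fetch` helper and its signature are hypothetical and not taken from the pipeline's code.

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def cached_fetch(cache_path: Path, fetch: Callable[[], T]) -> T:
    """Return the pickled result at cache_path, calling fetch() only on a cache miss."""
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    result = fetch()  # the expensive web request would happen here
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

With this pattern, deleting the cache file is enough to force a fresh download on the next run.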

5. Demand and Capacity

Constructed through demands.py:


6. Technology Splits and Efficiencies

Fuel Input Splits (techinput.py)

LimitTechInputSplitAnnual parameters lock the allowed fuel mix to the historical base-year shares observed in NRCan's tables.
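
A minimal sketch of how base-year energy use per fuel could be turned into input-split fractions, including the diesel-remainder assumption listed under Known Assumptions and Limitations. The function name, the fuel labels, and the numbers in the example are illustrative assumptions, not the contents of techinput.py or real NRCan values.

```python
from typing import Dict


def fuel_input_shares(energy_by_fuel: Dict[str, float], total: float,
                      remainder_fuel: str = "diesel") -> Dict[str, float]:
    """Convert base-year energy use per fuel into input-split fractions.

    Any part of `total` not covered by the reported fuels is assigned to
    `remainder_fuel`, mirroring the diesel-remainder assumption.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    shares = {fuel: value / total for fuel, value in energy_by_fuel.items()}
    remainder = 1.0 - sum(shares.values())
    if remainder < -1e-9:
        raise ValueError("reported fuels exceed the sector total")
    # Add the unassigned slice to the remainder fuel (never a negative amount).
    shares[remainder_fuel] = shares.get(remainder_fuel, 0.0) + max(remainder, 0.0)
    return shares


# Illustrative numbers only: 10 of 100 units are unassigned and go to diesel.
shares = fuel_input_shares(
    {"electricity": 20.0, "natural_gas": 30.0, "diesel": 40.0}, total=100.0
)
```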

Efficiencies (efficiency.py)


7. Costs


8. Known Assumptions and Limitations

  1. GDP-to-Demand Coupling: Future energy demand scales 1:1 with real GDP from the CER Global Net-Zero framework. No non-linear term captures structural shifts in agricultural energy intensity (e.g. a move from heavy row crops to automated vertical greenhouses).
  2. Diesel Remainder Assignment: Any "missing" or unassigned fraction of the base-year energy mix is assigned to diesel. This entrenches diesel dependence in the optimizer, even where the missing slice may in practice have been off-grid natural gas or local biomass.
  3. Static Fuel Splits: Relying rigidly on base-year (2022) input mix fractions artificially constrains the model's capacity to switch fuels automatically across future periods.
  4. Static Efficiencies: Without explicit unit-level modeling, efficiencies are held at a generic 1.0. This cannot represent large efficiency gains, such as electrifying diesel tractor fleets, where electric powertrains have much higher mechanical conversion efficiency.
  5. No Equipment Capital Limits: With no active CostInvest entries for the sector, agricultural electrification is limited only by grid constraints and electricity cost, not by the heavy upfront capital cost of procuring specialized electric farm equipment.
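
The 1:1 GDP scaling named in the first limitation amounts to demand(t) = demand(base) × GDP(t) / GDP(base). The sketch below shows that arithmetic only. The function name and the numbers are illustrative assumptions, not values from CER or the contents of demands.py.

```python
from typing import Dict


def scale_demand_by_gdp(base_demand: float, gdp_by_year: Dict[int, float],
                        base_year: int) -> Dict[int, float]:
    """Project demand proportionally to real GDP relative to the base year."""
    base_gdp = gdp_by_year[base_year]
    if base_gdp <= 0:
        raise ValueError("base-year GDP must be positive")
    return {year: base_demand * gdp / base_gdp
            for year, gdp in gdp_by_year.items()}


# Illustrative numbers only: a 20% rise in GDP yields a 20% rise in demand.
projected = scale_demand_by_gdp(100.0, {2022: 50.0, 2030: 60.0}, base_year=2022)
```

Because the ratio is linear, any change in energy intensity per unit of GDP has to be introduced outside this step.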

CANOE Agriculture Sector — Data Sources Catalog

1. Data Source Summary

Data Type | Primary Source | Granularity | Update Frequency
--- | --- | --- | ---
Historical Demand | NRCan CEUD | Province / Annual | Annual
Demand Scaling | CER CEF Macro Indicators | Canada / Annual | Annual/Biannual
Atlantic Disaggregation | Statistics Canada | Provincial / Annual | Annual
Fuel Input Splits | NRCan CEUD | Province / Annual | Annual

2. NRCan Comprehensive Energy Use Database (CEUD)

3. Canada Energy Regulator (CER) Macro-Indicators

4. Statistics Canada

5. Update Procedures (Checklist)

During regular annual or biannual CANOE updates:

  1. Clear Caches: Delete the contents of the cache/ directory (specifically dataframes.pkl and pop_df.pkl) to force fresh data scrapes.
  2. Review Config: Update NRCan_year and the projection period arrays in input/params.yaml.
  3. CER URL Validation: Test the CSV download URL for the Canada Energy Regulator and check whether a newer Energy Futures forecast has been released. If so, update CER_URL.
  4. Execution Test: Run aggregator.py and check the output logs. If NRCan has substantially changed its reporting taxonomy, large unassigned slices can result; confirm that the "Diesel Remainder" correction in techinput.py does not skew the sector's fuel mix.
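
The cache-clearing step in the checklist can be scripted. This is a small sketch, assuming the two file names given in step 1 live directly under cache/. The `clear_cache` helper is hypothetical and not part of the pipeline.

```python
from pathlib import Path
from typing import Iterable, List


def clear_cache(cache_dir: Path,
                names: Iterable[str] = ("dataframes.pkl", "pop_df.pkl")) -> List[str]:
    """Delete the named cache files if present; return the names actually removed."""
    removed: List[str] = []
    for name in names:
        target = cache_dir / name
        if target.is_file():
            target.unlink()
            removed.append(name)
    return removed
```

Calling `clear_cache(Path("cache"))` before running aggregator.py would then force fresh scrapes while leaving any other files in the directory untouched.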