CANOE Agriculture Sector — Data Processing Documentation
Comprehensive documentation of the data pipeline that converts upstream data sources into a Temoa-ready SQLite database for the Canadian Open Energy (CANOE) model's agriculture sector.
Table of Contents
- Overview
- Pipeline Architecture
- Configuration and Setup
- External Data Fetching
- Demand and Capacity
- Technology Splits and Efficiencies
- Costs
- Known Assumptions and Limitations
1. Overview
Purpose
The CANOE agriculture sector aggregation tool automatically constructs a Temoa-compatible SQLite database representing the Canadian agricultural sector. It pulls top-down data primarily from Natural Resources Canada (NRCan), the Canada Energy Regulator (CER), and Statistics Canada to represent the sector's unified energy demands and fuel mixes.
What the Tool Produces
- A Temoa-schema SQLite database containing technology attributes, demands, efficiencies, and fuel mixes utilized by the agriculture sector.
Scope
The model operates across:
- Regions: Canadian provinces.
- Sectors: Aggregated as a singular `AGRI` sector ("Agriculture, fishing, hunting and trapping").
- Commodities: Electricity (`elc`), Natural Gas (`ng`), Diesel (`dsl`), and Gasoline (`gsl`).
- Time resolution: Annual bounds.
- Planning horizon: Configurable multi-year stepping (default: starting in 2025).
2. Pipeline Architecture
Comparable to the industrial subsystem, the agriculture processing logic utilizes a modular ETL-like architecture coordinated through aggregator.py:
1. aggregator.py → Main orchestrator
a. load_runtime_agri → Initialize database, tables, schema versions, and shared constraints
b. data_scraper → Load CEUD baseline (NRCan 'CP', sector 'agr') and Macro Indicators (CER)
c. statcan → Load Atlantic (ATL) province disaggregation mapping
d. demands → Build Demand arrays (GDP-scaled + ATL mapped)
e. techinput → Extract fuel split mixes (`LimitTechInputSplitAnnual`) from NRCan
f. efficiency → Seed theoretical conversion boundaries (Efficiency = 1.0)
g. costs → [Skipped/Commented out in active pipeline]
h. post_processing → Embed datasets, sources taxonomy, and IDs
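The stage ordering above can be pictured as a list of named steps run in sequence, with the cost stage left disabled. This is only an illustrative sketch: the `Step` class, `run_pipeline`, and the placeholder step bodies are hypothetical and do not reproduce the real aggregator.py.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str                     # label matching a stage name listed above
    run: Callable[[Dict], None]   # hypothetical callable that mutates shared state
    enabled: bool = True          # False mirrors a commented-out stage


def run_pipeline(steps: List[Step], state: Dict) -> List[str]:
    """Run the enabled steps in order and return the names that executed."""
    executed = []
    for step in steps:
        if not step.enabled:
            continue  # e.g. the costs stage is skipped in the active pipeline
        step.run(state)
        executed.append(step.name)
    return executed


def _record(name: str) -> Callable[[Dict], None]:
    # Placeholder body: a real stage would read sources or write tables.
    def _run(state: Dict) -> None:
        state.setdefault("log", []).append(name)
    return _run


STEPS = [
    Step("load_runtime_agri", _record("load_runtime_agri")),
    Step("data_scraper", _record("data_scraper")),
    Step("statcan", _record("statcan")),
    Step("demands", _record("demands")),
    Step("techinput", _record("techinput")),
    Step("efficiency", _record("efficiency")),
    Step("costs", _record("costs"), enabled=False),
    Step("post_processing", _record("post_processing")),
]
```

Calling `run_pipeline(STEPS, {})` returns the seven enabled stage names in order, omitting `costs`. Keeping the disabled stage in the list, rather than deleting it, makes it easy to see what would need re-enabling.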
3. Configuration and Setup
General Settings
Defined in input/params.yaml:
- `schema_version`: Ensures strict alignment to standard Temoa schema (version 3.1).
- `periods`: List of modeled future years (e.g., `[2025, ...]`).
- `NRCan_year`: The base historical year calibration point (default: 2022).
Domain Mapping
The pipeline maps base strings into Temoa IDs statically. The agriculture domain uses the prefix A_ for tracking commodities natively linked to agricultural demands.
4. External Data Fetching
Data fetching utilizes robust local caching (cache/) to limit repeated web requests:
- NRCan Comprehensive Energy Use Database (CEUD): Actively scraped via the `oee.nrcan.gc.ca` API. Fetches exclusively the 'agr' sector tables for every province and the Atlantic aggregate (ATL).
- CER Energy Future (CEF): Downloads Macro-indicators defining expected forward trajectories of GDP parameters directly via the raw CSV endpoint.
- StatCan: Downloads archives for specific geographic disaggregation of Atlantic primary and secondary energy supply/demand characteristics.
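The caching behaviour described above might look roughly like the following. This is a minimal sketch, assuming pickle files stored under a cache directory; the function name `cached_fetch`, its parameters, and the file-naming rule are hypothetical and not taken from the tool's code.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def cached_fetch(name: str, fetch: Callable[[], Any], cache_dir: Path) -> Any:
    """Return the cached object for `name`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.pkl"  # assumed naming rule, one file per dataset
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    data = fetch()  # the only place a web request would occur
    with path.open("wb") as fh:
        pickle.dump(data, fh)
    return data
```

With this pattern, deleting a file from the cache directory is enough to force a fresh download on the next run.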
5. Demand and Capacity
Constructed through demands.py:
- Base Year (2022) Alignment: Takes provincial agricultural energy consumption directly from NRCan CEUD.
- Atlantic Disaggregation: NRCan aggregates Atlantic provinces into 'ATL'. The script cross-references StatCan data specific to "Agriculture, fishing, hunting and trapping" to divide the ATL aggregate proportionally among NB, NS, PEI, and NL.
- Future Demand Projection: Demand in future periods dynamically scales relative to the base year using Real Gross Domestic Product ($2012 Millions) growth factors extracted from the CER CEF 'Global Net-zero' scenario.
- Existing Capacity: Although earlier versions included it, existing capacity appears to be bypassed or set from generic baseline values in the current runtime, in contrast to the more rigid capacity blocks used for buildings.
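The Atlantic disaggregation and GDP scaling steps above reduce to two proportional calculations. The sketch below illustrates them under stated assumptions: the function names, the dictionary inputs, and the numbers in the usage note are hypothetical placeholders, not values or code from demands.py.

```python
from typing import Dict

ATLANTIC = ("NB", "NS", "PEI", "NL")


def split_atlantic(atl_total: float, statcan_use: Dict[str, float]) -> Dict[str, float]:
    """Divide the ATL aggregate in proportion to each province's StatCan value."""
    total = sum(statcan_use[p] for p in ATLANTIC)
    if total <= 0:
        raise ValueError("StatCan values must sum to a positive number")
    return {p: atl_total * statcan_use[p] / total for p in ATLANTIC}


def project_demand(base_demand: float, gdp: Dict[int, float], base_year: int) -> Dict[int, float]:
    """Scale base-year demand by the ratio of each year's real GDP to base-year GDP."""
    base_gdp = gdp[base_year]
    return {
        year: base_demand * value / base_gdp
        for year, value in gdp.items()
        if year != base_year
    }
```

For example, with a made-up base demand of 100 and made-up GDP values of 2000 in 2022 and 2200 in 2025, `project_demand(100.0, {2022: 2000.0, 2025: 2200.0}, 2022)` gives a 2025 demand of 110, a 10% increase that mirrors the GDP growth exactly.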
6. Technology Splits and Efficiencies
Fuel Input Splits (techinput.py)
Uses LimitTechInputSplitAnnual parameters to lock the allowed fuel mix to the historical base parameters observed within NRCan's tables.
- Pulls ratios of Electricity, Natural Gas, Diesel, and Gasoline.
- Missing Data Correction Policy:
  - If values are listed as `n.a.` or `X`, the script attempts to calculate the remainder.
  - If the total observed percentage exceeds 100% (1.0), the script reduces the smallest observed share so that the total closes to 1.0.
  - Diesel Swing Assumption: If the total of known inputs falls short of 100%, the remainder is assigned entirely to Diesel (`dsl`). This assumes heavy agricultural machinery (tractors/combines) accounts for the "invisible" bulk energy draw that granular survey thresholds do not fully capture.
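The correction policy above can be sketched as a small closure routine. This is an illustration under assumptions, not the logic of techinput.py: it assumes table cells arrive as percentage strings, treats `n.a.` and `X` as zero before back-filling the shortfall with diesel, and raises an error if the excess is larger than the smallest share. The function names are hypothetical.

```python
from typing import Dict, Optional

MISSING_MARKERS = {"n.a.", "X"}


def parse_share(raw: str) -> Optional[float]:
    """Convert a percentage cell to a fraction, or None for a suppressed value."""
    raw = raw.strip()
    if raw in MISSING_MARKERS:
        return None
    return float(raw) / 100.0


def close_fuel_split(raw_shares: Dict[str, str]) -> Dict[str, float]:
    """Return fuel fractions that sum to 1.0 using the policy described above."""
    parsed = {fuel: parse_share(cell) for fuel, cell in raw_shares.items()}
    known = {fuel: s for fuel, s in parsed.items() if s is not None}
    result = {fuel: known.get(fuel, 0.0) for fuel in parsed}
    total = sum(known.values())
    if total > 1.0:
        # Over-reported: shave the excess off the smallest observed share.
        smallest = min(known, key=known.get)
        excess = total - 1.0
        if excess > known[smallest]:
            raise ValueError("excess is larger than the smallest share")
        result[smallest] -= excess
    elif total < 1.0:
        # Under-reported: assign the whole shortfall to diesel.
        result["dsl"] = result.get("dsl", 0.0) + (1.0 - total)
    return result
```

Given cells of 20, `n.a.`, 50, and 10 for `elc`, `ng`, `dsl`, and `gsl`, the known total is 0.8, so the 0.2 shortfall moves diesel from 0.5 to 0.7 while natural gas stays at zero.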
Efficiencies (efficiency.py)
- Efficiencies are configured structurally as exactly 1.0. The tool assumes the output end-use matches the fuel heating value 1:1 since the tool acts as a top-down allocator, not a physical combustion unit simulator.
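One consequence of holding efficiency at 1.0 is that the fuel consumed equals the demand served, divided among fuels by the input split. The sketch below shows that arithmetic; the function name, the single uniform efficiency argument, and the example values are assumptions made for illustration.

```python
from typing import Dict


def fuel_inputs(demand: float, splits: Dict[str, float], efficiency: float = 1.0) -> Dict[str, float]:
    """Fuel required per commodity for a given demand, assuming one uniform efficiency."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    total_input = demand / efficiency  # equals demand when efficiency is 1.0
    return {fuel: total_input * share for fuel, share in splits.items()}
```

With a demand of 50 and splits of 0.2 electricity and 0.8 diesel, the inputs are 10 and 40, which sum back to the demand. Any efficiency below 1.0 would raise the total input above the demand, which is the effect the current configuration does not represent.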
7. Costs
- Capital/Investment Costs: In the current iteration of the agriculture module, explicit capital representation (`build_cost_invest_agri()`) is intentionally commented out/disabled within the `aggregator.py` orchestrator. Sector operations are constrained strictly by input costs (derived from the upstream fuel sector) and emission limits rather than heavy localized equipment capitalization bounds.
8. Known Assumptions and Limitations
- GDP-to-Demand Decoupling: Future energy demand intrinsically scales at a 1:1 ratio with the CER Global Net-Zero framework's real GDP. It lacks internal non-linear scaling recognizing structural shifts in agricultural intensity (e.g. shifts from heavy row-crops to automated vertical greenhouses).
- Diesel Remainder Assignment: Assuming all "missing" or unassigned fractions in the base-year energy mix default to Diesel strongly cements diesel dependencies in the optimizer, regardless of whether that missing slice practically included off-grid natural gas or localized biomass.
- Static Fuel Splits: Relying rigidly on base-year (2022) input mix fractions artificially constrains the model's capacity to switch fuels automatically across future periods.
- Efficiency Staticism: Lacking explicit unit modeling, generic efficiencies held at 1.0 limit representations of massive jumps in efficiency (such as electrifying diesel tractor fleets where electrical powertrains intrinsically possess much higher mechanical conversion efficiencies).
- No Equipment Capital Limits: Without active generic `CostInvest` markers holding the sector accountable, agricultural electrification capacity is strictly limited by grid limits and electricity cost, rather than the heavy upfront capital burden of procuring specialized electric farm equipment.
CANOE Agriculture Sector — Data Sources Catalog
1. Data Source Summary
| Data Type | Primary Source | Granularity | Update Frequency |
|---|---|---|---|
| Historical Demand | NRCan CEUD | Province / Annual | Annual |
| Demand Scaling | CER CEF Macro Indicators | Canada / Annual | Annual/Biannual |
| Atlantic Disaggregation | Statistics Canada | Provincial / Annual | Annual |
| Fuel Input Splits | NRCan CEUD | Province / Annual | Annual |
2. NRCan Comprehensive Energy Use Database (CEUD)
- Usage: Provides the anchor framework for historical base year energy demands and fuel utilization mixes within the Ag sector.
- Update Procedure: The base year is configured via `NRCan_year` in `input/params.yaml`.
3. Canada Energy Regulator (CER) Macro-Indicators
- Usage: Extrapolates future demand projections using the 'Global Net-Zero' Real GDP scenario paths.
- Update Procedure: In `data_scraper.py`, the `CER_URL` points explicitly to `macro-indicators-2023.csv`. When a new Energy Futures release drops, this flat-file URL must be manually updated to reflect the newest reference scenario.
4. Statistics Canada
- Usage: Pulls specific Atlantic region breakdowns matching the "Agriculture, fishing, hunting and trapping" NAICS equivalents to compensate for NRCan regional aggregation.
- Update Procedure: Relies on `statcan.py` to ingest the latest regional tables.
5. Update Procedures (Checklist)
During regular annual or biannual CANOE updates:
- Clear Caches: Delete the contents of the `cache/` directory (specifically `dataframes.pkl` and `pop_df.pkl`) to force fresh data scrapes.
- Review Config: Modify `NRCan_year` and the subsequent projection `periods` array in `input/params.yaml`.
- CER URL Validation: Test the explicit CSV download URL for the Canada Energy Regulator and verify that no newer Energy Futures forecast has been released. Update `CER_URL` if one has.
- Execution Test: Execute `aggregator.py`. Check the output logs to confirm that the "Diesel Remainder" correction logic inside `techinput.py` does not skew the entire sector, which can happen if NRCan changes its reporting taxonomy and leaves large missing slices.
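Two of the checklist items lend themselves to small helper routines. The sketch below is hypothetical and not part of the tool: `clear_cache` removes the two named pickle files, and `remainder_is_suspicious` flags a base-year split whose diesel back-fill exceeds a threshold. The 0.25 default is an arbitrary illustrative value.

```python
from pathlib import Path
from typing import Iterable, List


def clear_cache(
    cache_dir: Path,
    names: Iterable[str] = ("dataframes.pkl", "pop_df.pkl"),
) -> List[str]:
    """Delete the named cache files if present and return the names removed."""
    removed = []
    for name in names:
        path = cache_dir / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


def remainder_is_suspicious(known_total: float, threshold: float = 0.25) -> bool:
    """True when the share that would be back-filled with diesel exceeds `threshold`."""
    return (1.0 - known_total) > threshold
```

Running a check like `remainder_is_suspicious` per province after an update would surface cases where a change in NRCan reporting has left most of the mix unassigned.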