MBERs validation

From Global Energy Monitor

Marginal Build Emissions Rates (MBERs) validation

Climate TRACE recently released global MBER data that implements the default GHGP Guidelines build margin algorithm at an hourly level. (Methodology here. For concision, these are referred to here as "GHGP MBERs".) The coalition also ran a validation to check whether the model accurately predicts real-world emissions behavior. This page describes that validation.

Validation Methodology

The foundation of any MBER model lies in its ability to accurately forecast the types of power plants that will be constructed in the future in two different scenarios: one in which an intervention occurs, and one counterfactual scenario in which it does not occur. Thus, randomized controlled trials and natural experiments that examine counterfactuals can be useful when validating MBERs. However, a simpler starting point is to examine whether an MBER model, when given only data available prior to the forecast period, can successfully predict the power plant capacity that was actually built. Note that depending on whether the intervention in question did or did not occur, this test can be thought of as validating either the intervention scenario or the counterfactual scenario.

The historical capacity growth of power plants is a directly observable variable that can be obtained with few or no assumptions, allowing for a direct empirical test of one scenario of an MBER model. Here we have implemented this test.

For a cutoff date in the historical capacity data, consider the forecast of capacities in a region that an MBER model derives or assumes. The GHGP MBER assumes that the power plants to be constructed after the cutoff date are similar to the newest power plants that generate 20% of the load at the cutoff date. The capacity changes based on this assumption can be derived using the following approach:

  1. Compute the percentage of each fuel type of the newest power plants that produce 20% of the generation at the cutoff date
  2. Sum the total capacity changes (MW) of power plants actually built after the cutoff date, excluding fuel types experiencing a net capacity decline (e.g., coal); this exclusion is necessary since the GHGP MBER focuses solely on new builds instead of retirements
  3. Multiply the values from steps 1 and 2 to derive the capacity growth assumed by the GHGP MBER model
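The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Climate TRACE implementation: the plant records, field names (`fuel`, `online_year`, `gen_mwh`), and numbers are hypothetical. It also assumes that fuel shares are measured by generation and that the oldest plant in the newest-20% set is counted only partially, so that the set covers exactly 20% of generation.

```python
from collections import defaultdict

def build_margin_shares(plants, fraction=0.20):
    """Step 1: fuel shares among the newest plants supplying `fraction` of generation.

    Assumption: shares are generation-weighted, and the last plant needed to
    reach the target is prorated so the set covers exactly `fraction`.
    """
    target = fraction * sum(p["gen_mwh"] for p in plants)
    gen_by_fuel = defaultdict(float)
    covered = 0.0
    # Walk from the newest plant to the oldest until the target is covered.
    for p in sorted(plants, key=lambda p: p["online_year"], reverse=True):
        if covered >= target:
            break
        take = min(p["gen_mwh"], target - covered)
        gen_by_fuel[p["fuel"]] += take
        covered += take
    return {fuel: gen / covered for fuel, gen in gen_by_fuel.items()}

def total_new_build_mw(actual_change_mw):
    """Step 2: total MW built after the cutoff, skipping fuels with a net decline."""
    return sum(mw for mw in actual_change_mw.values() if mw > 0)

def assumed_growth_mw(plants, actual_change_mw, fraction=0.20):
    """Step 3: allocate the total new build across fuels using the step 1 shares."""
    shares = build_margin_shares(plants, fraction)
    total = total_new_build_mw(actual_change_mw)
    return {fuel: share * total for fuel, share in shares.items()}

# Hypothetical fleet at the cutoff date (annual generation in MWh).
plants = [
    {"fuel": "coal", "online_year": 1985, "gen_mwh": 500},
    {"fuel": "gas", "online_year": 2010, "gen_mwh": 300},
    {"fuel": "wind", "online_year": 2018, "gen_mwh": 120},
    {"fuel": "solar", "online_year": 2020, "gen_mwh": 80},
]
# Hypothetical observed net capacity change after the cutoff (MW).
actual_change_mw = {"coal": -400, "gas": 100, "solar": 600, "wind": 300}

print(assumed_growth_mw(plants, actual_change_mw))
```

In this toy fleet the newest 20% of generation is 40% solar and 60% wind, so the 1,000 MW of observed new build is allocated to those two fuels only. The 100 MW of gas that was actually built therefore shows up as a prediction error, which is the kind of gap the validation measures.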

Due to relatively low availability of other models prior to 2021 against which these MBERs could be compared, the cutoff date is chosen to be 2021. This means that the GHGP MBER is computed assuming only information available in the year 2021. The resulting capacity changes are then compared against the actual capacity growth of coal, gas, oil, solar, wind, and nuclear power plants in ten US balancing authorities (BPAT, CISO, ERCO, FPL, ISNE, MISO, NYIS, PJM, SOCO, and SWPP) between the years 2022-2024.

Two versions of the GHGP MBER model are analyzed, the annual model and the hourly model. To obtain annual numbers from the hourly model, it is aggregated in two ways. The first is by a straight average of the hourly numbers; this corresponds to the annualized MBER for a load intervention with a flat profile (e.g., a data center). The second is by a generation average of the hourly numbers, corresponding to an annualized MBER for a load intervention with a profile that is similar to the current generation.
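The difference between the two aggregations can be illustrated with a short sketch. The hourly values and generation figures below are made up for illustration (four "hours" instead of 8760), and the function names are hypothetical.

```python
def flat_average(hourly_mber):
    """Straight average: every hour counts equally (flat load profile)."""
    return sum(hourly_mber) / len(hourly_mber)

def generation_weighted_average(hourly_mber, hourly_generation):
    """Generation average: each hour is weighted by the generation in that hour."""
    if len(hourly_mber) != len(hourly_generation):
        raise ValueError("hourly series must have the same length")
    total_generation = sum(hourly_generation)
    weighted_sum = sum(m * g for m, g in zip(hourly_mber, hourly_generation))
    return weighted_sum / total_generation

# Hypothetical hourly MBERs and hourly generation (MWh).
hourly_mber = [0.2, 0.4, 0.6, 0.8]
hourly_generation = [100, 100, 300, 500]

print(round(flat_average(hourly_mber), 3))
print(round(generation_weighted_average(hourly_mber, hourly_generation), 3))
```

Here the high-MBER hours coincide with high generation, so the generation average (0.64) sits above the flat average (0.5). When hourly MBERs and generation are uncorrelated, the two aggregations converge.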

To provide a comparison benchmark for the GHGP MBER model, the capacity changes between the years 2022-2024 for the same balancing authorities are compared with those predicted by Cambium. Cambium is a widely used emissions dataset produced by the US National Renewable Energy Laboratory. Cambium relies on capacity expansion modeling via economic least-cost simulations, and thus can directly provide predictions of future capacity changes. The Cambium model used in this comparison was its 2021 version, ensuring that the model had no access to data from after 2021. Three Cambium models were included in the comparison, the “low-case”, “mid-case”, and “high-case”, corresponding to different assumptions about the costs of future renewables.

Note that while least-cost capacity expansion models are commonly used to predict future capacity changes, their typical use case is to make predictions over much longer time horizons than the few years used in this comparison. By contrast, this MBER model makes predictions only over the 8760 hours of a single year. Its intended use case is not to inform long-term decision making, but rather to provide an ex post assessment of impacts. Thus, this analysis should not be interpreted as an assessment of Cambium’s accuracy for that model’s intended purpose of making long-run predictions; here Cambium is simply used as a familiar benchmark for a sense of scale.

Results

The resulting capacity changes, both observed and derived or predicted, are plotted in the following figures. The blue bars represent the actual observed change in capacity by fuel type. The orange, green, and red bars represent the GHGP MBER predictions under different temporal treatments: a low-granularity model that uses annual MBERs only; an hourly model that assumes a perfectly flat load; and an hourly model with a load profile similar to current generation. The purple, brown, and pink bars represent the Cambium low, mid, and high cases respectively.

In most cases, the three GHGP MBERs lead to broadly similar predictions, as do the three Cambium cases. The much larger differences are between models: for example, all three GHGP MBERs typically predict significantly smaller annual swings in capacity than all three Cambium cases do.

Figures: capacity changes by fuel type for BPAT, CISO, ERCO, FPL, ISNE, MISO, NYIS, PJM, SOCO, and SWPP.

A table of percentage errors of installed capacity averaged over BAs is provided below. Negative numbers indicate an overestimation (underestimation) of installed (retired) capacity. Note that the GHGP MBERs measure only build, and are not designed to predict retirements. However, retirements are included in this table anyway to examine the consequence of this decision.

| % Error | GHGP | GHGP Hourly (Flat) | GHGP Hourly (Gen) | Cambium Low-Case | Cambium Mid-Case | Cambium High-Case |
|---|---|---|---|---|---|---|
| Coal | -104 | -106 | -105 | 58 | 9 | -7 |
| Gas | 123 | 126 | 122 | -54 | -31 | -16 |
| Nuclear | -100 | -100 | -100 | -845 | -845 | -845 |
| Oil | -103 | -100 | -100 | 648 | 683 | 644 |
| Solar | -56 | -58 | -56 | 116 | -24 | -77 |
| Wind | 39 | 38 | 39 | 204 | 117 | 38 |

Averaged over all fuel types and model subtypes, the GHGP MBER has an average percentage error of 87.5%, compared to 280.5% for the Cambium models. It is important not to interpret this as a comprehensive measure of the accuracy of either model. The Cambium model was designed for different purposes and has different strengths, weaknesses, and use cases. For example, these results are purely retrospective, and provide no information about either model's ability to predict change decades in advance.
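One way to read the averaged figure is as a mean of absolute percentage errors over every fuel type and model subtype in the table. That reading is an assumption, sketched below with the rounded table entries. It reproduces the 87.5% quoted for the GHGP columns. The same calculation on the Cambium columns as printed gives roughly 292% rather than 280.5%, so the quoted Cambium average may rest on unrounded values or a different averaging choice.

```python
# Percentage errors copied from the table above; columns are
# [GHGP, GHGP Hourly (Flat), GHGP Hourly (Gen), Cambium Low, Cambium Mid, Cambium High].
errors = {
    "Coal":    [-104, -106, -105,   58,    9,   -7],
    "Gas":     [ 123,  126,  122,  -54,  -31,  -16],
    "Nuclear": [-100, -100, -100, -845, -845, -845],
    "Oil":     [-103, -100, -100,  648,  683,  644],
    "Solar":   [ -56,  -58,  -56,  116,  -24,  -77],
    "Wind":    [  39,   38,   39,  204,  117,   38],
}

def mean_abs_error(rows, columns):
    """Mean of |error| over all fuels for the selected column indices."""
    values = [abs(row[c]) for row in rows.values() for c in columns]
    return sum(values) / len(values)

ghgp_avg = mean_abs_error(errors, columns=[0, 1, 2])
cambium_avg = mean_abs_error(errors, columns=[3, 4, 5])

print(f"GHGP:    {ghgp_avg:.1f}%")     # GHGP:    87.5%
print(f"Cambium: {cambium_avg:.1f}%")  # Cambium: 292.3%
```

Taking absolute values matters here: a signed average would let the large positive and negative errors cancel and understate how far each model is from the observed build.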

Further, the overall numbers also mask considerable variation in the specific results. For example, the GHGP MBER method performs relatively well in predicting capacity increases, but because it does not incorporate retirements, it cannot predict decreases in gas, coal, and oil capacity. If a region increasing its net load had any causal effect on decreasing power plant capacity, the model would be unable to detect this. GHGP MBERs also typically predicted lower solar and wind growth than Cambium did. In most cases this prediction more closely matched the actual growth, but not in all cases.

While these caveats are important, the overall results do appear to indicate that the GHGP MBER, when applied retrospectively to examine the causal effects of build in a single year, has significant predictive power in capturing real-world power plant build behavior. In fact, in this specific single-year application, early tests even indicate that it may outperform the closest thing Climate TRACE could find to a widely used industry standard model.

Discussion

The GHGP MBER is intended to be used on an annual basis to estimate the causal effect on emissions of structural change driven by a net reduction or increase in electricity demand. When added up over a period of many years and combined with an operating margin, it becomes a retrospective estimate of long-run marginal emissions based solely on actual observed grid behavior.

A strength of the model is that, unlike capacity expansion models such as Cambium, it requires essentially a single assumption. There is also no room for user subjectivity: the method is completely standardized for all power grid designs, regions, and conditions, so anyone implementing the algorithm produces the same result. However, it does rely on the one critical assumption that the newest power plants supplying 20% of generation are a reasonable proxy for the marginal next unit a grid would build in response to changes in net demand.

Given that grids are changing fast, Climate TRACE was initially uncertain how well this assumption would hold up against real-world behavior. In this early benchmark testing, which focused on the method's ability to predict the actual change that has occurred to date, Climate TRACE concluded that the model appeared to have significant predictive power. In fact, it even outperformed the closest thing Climate TRACE could find to a widely used existing industry benchmark for this application. This is in no way a statement that this model is a general replacement for capacity expansion models; it is only an examination of this particular use case.

Practitioners may wish to consider using the GHGP MBER signal in long-run marginal emissions rate applications where retrospective analysis is sufficient and standardization and observability are useful features.
