MBERs validation test
Marginal Build Emissions Rates (MBERs) validation
Climate TRACE recently released global MBER data that implements the default GHGP Guidelines build margin algorithm at an hourly level. The coalition also ran a validation to check whether the model accurately predicts real-world emissions behavior. This page describes that validation. For concision, MBERs that follow the GHGP Guidelines are referred to as GHGP MBERs.
Validation Methodology
The foundation of any MBER model lies in its ability to accurately forecast the types of power plants that will be constructed in two scenarios: one in which an intervention occurs, and a counterfactual in which it does not. Randomized controlled trials and natural experiments that examine counterfactuals can therefore be useful when validating MBERs. However, a simpler starting point is to examine whether an MBER model, given only data available before the forecast period, can predict the power plant capacity that was actually built. Note that, depending on whether the intervention in question did or did not occur, this test can be thought of as validating either the intervention scenario or the counterfactual scenario.
The historical capacity growth of power plants is a directly observable variable that can be obtained with few or no assumptions, allowing for a direct empirical test of one scenario of an MBER model. Here we have implemented this test.
For a cutoff date in the historical capacity data, consider the capacity forecast for a region that an MBER model derives or assumes. The GHGP MBER assumes that the power plants constructed after the cutoff date are similar to the newest power plants that generate 20% of the load at the cutoff date. The capacity changes implied by this assumption can be derived as follows:
- Compute the percentage of each fuel type of the newest power plants that produce 20% of the generation at the cutoff date
- Sum the total capacity changes (MW) of power plants actually built after the cutoff date, excluding fuel types experiencing a net capacity decline (e.g., coal); this exclusion is necessary since the GHGP MBER focuses solely on new builds instead of retirements
- Multiply each fuel-type percentage from the first step by the total from the second step to derive the capacity growth assumed by the GHGP MBER model
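The three steps above can be sketched in Python as follows. This is a minimal illustration, not the coalition's implementation: the plant records, field names (`fuel`, `online_year`, `generation_mwh`), and all numbers are hypothetical, and the sketch assumes that fuel-type percentages are measured by generation and that the plant straddling the 20% threshold is counted only partially.

```python
from collections import defaultdict


def newest_plant_fuel_shares(plants, generation_share=0.20):
    """Step 1: fuel mix of the newest plants that together supply the
    given share of total generation (assumption: shares by generation)."""
    total_generation = sum(p["generation_mwh"] for p in plants)
    target = generation_share * total_generation
    generation_by_fuel = defaultdict(float)
    covered = 0.0
    # Walk from newest to oldest until the target generation is covered.
    for plant in sorted(plants, key=lambda p: p["online_year"], reverse=True):
        if covered >= target:
            break
        # Assumption: count only the part of the last plant needed to hit 20%.
        counted = min(plant["generation_mwh"], target - covered)
        generation_by_fuel[plant["fuel"]] += counted
        covered += counted
    return {fuel: gen / covered for fuel, gen in generation_by_fuel.items()}


def total_new_build_mw(actual_change_mw):
    """Step 2: total observed capacity change, excluding fuel types
    with a net capacity decline."""
    return sum(delta for delta in actual_change_mw.values() if delta > 0)


def ghgp_assumed_growth(shares, total_mw):
    """Step 3: allocate the observed total across fuels by the step-1 shares."""
    return {fuel: share * total_mw for fuel, share in shares.items()}


# Hypothetical fleet at the cutoff date.
plants = [
    {"fuel": "coal", "online_year": 1985, "generation_mwh": 400},
    {"fuel": "gas", "online_year": 2012, "generation_mwh": 400},
    {"fuel": "wind", "online_year": 2019, "generation_mwh": 120},
    {"fuel": "solar", "online_year": 2020, "generation_mwh": 80},
]
# Hypothetical observed capacity changes (MW) after the cutoff date.
actual_change_mw = {"coal": -300, "gas": 100, "solar": 500, "wind": 400}

shares = newest_plant_fuel_shares(plants)
assumed = ghgp_assumed_growth(shares, total_new_build_mw(actual_change_mw))
print(shares)
print(assumed)
```

In this toy case the newest 20% of generation is entirely solar and wind, so the sketch assigns all post-cutoff growth to those two fuels, while the observed decline in coal is left out of the total.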
Due to relatively low availability of other models prior to 2021 against which these MBERs could be compared, the cutoff date is chosen to be 2021. This means that the GHGP MBER is computed assuming only information available in the year 2021. The resulting capacity changes are then compared against the actual capacity growth of coal, gas, oil, solar, wind, and nuclear power plants in ten US balancing authorities (BPAT, CISO, ERCO, FPL, ISNE, MISO, NYIS, PJM, SOCO, and SWPP) between the years 2022-2024.
Two versions of the GHGP MBER model are analyzed: the annual model and the hourly model. To obtain annual numbers from the hourly model, its output is aggregated in two ways. The first is a straight average of the hourly numbers; this corresponds to the annualized MBER for a load intervention with a flat profile (e.g., a data center). The second is a generation-weighted average of the hourly numbers, corresponding to an annualized MBER for a load intervention with a profile that is similar to current generation.
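The two aggregation methods can be illustrated with a short Python sketch. The function names and the four-hour series below are hypothetical stand-ins for a full 8760-hour year, and the units are arbitrary.

```python
def straight_average(hourly_mber):
    """Annualized MBER for a flat load profile: every hour counts equally."""
    return sum(hourly_mber) / len(hourly_mber)


def generation_weighted_average(hourly_mber, hourly_generation):
    """Annualized MBER for a load shaped like current generation:
    each hour is weighted by that hour's generation."""
    if len(hourly_mber) != len(hourly_generation):
        raise ValueError("series must have the same number of hours")
    total_generation = sum(hourly_generation)
    weighted_sum = sum(m * g for m, g in zip(hourly_mber, hourly_generation))
    return weighted_sum / total_generation


# Hypothetical four-hour example (a real series would have 8760 hours).
hourly_mber = [0.2, 0.4, 0.6, 0.8]      # illustrative emissions rates
hourly_generation = [100, 100, 300, 500]  # illustrative generation (MWh)

flat = straight_average(hourly_mber)
weighted = generation_weighted_average(hourly_mber, hourly_generation)
print(flat, weighted)
```

Here the generation-weighted value is higher than the straight average because the hours with the most generation also have the highest hourly rates; with a different hourly pattern the ordering could reverse.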
To provide a comparison benchmark for the GHGP MBER model, the capacity changes from 2022 through 2024 for the same balancing authorities are compared with those predicted by Cambium (https://www.nrel.gov/analysis/cambium.html). Cambium is a widely used emissions dataset produced by the US National Renewable Energy Laboratory. Cambium relies on capacity expansion modeling via economic least-cost simulations, and thus can directly provide predictions of future capacity changes. The Cambium model used in this comparison was its 2021 version (https://www.nrel.gov/docs/fy22osti/81611.pdf), ensuring that no information past 2021 was incorporated. Three Cambium models were included in the comparison, the “low-case”, “mid-case”, and “high-case”, corresponding to different assumptions about the costs of future renewables.
Note that while least-cost capacity expansion models are commonly used to predict future capacity changes, their typical use case is to make predictions over much longer time horizons than the few years used in this comparison. By contrast, this MBER model makes predictions only over the 8760 hours of a single year. Its intended use case is not to inform long-term decision making, but rather to provide an ex post assessment of impacts. Thus, this analysis should not be interpreted as an assessment of Cambium’s accuracy for that model’s intended purpose of making long-run predictions; here Cambium is simply used as a benchmark.
Results
The resulting capacity changes, both observed and derived or predicted, are plotted in the following figures. The blue bar represents the actual observed change in capacity by fuel type. The orange, green, and red bars represent the predictions of the GHGP MBERs under different temporal treatments: a low-granularity model that uses annual MBERs only; an hourly model that assumes a perfectly flat load; and an hourly model with a temporal profile similar to current generation. The purple, brown, and pink bars represent the Cambium low, mid, and high cases, respectively.
In most cases, the three GHGP MBERs lead to broadly similar predictions, as do the three Cambium cases. The much larger differences are between the two model families: Cambium typically projects large swings in capacity, while the GHGP MBERs predict smaller swings.
A table of percentage errors in installed capacity, averaged across the balancing authorities (BAs), is provided below. Negative numbers indicate an overestimation of installed capacity or an underestimation of retired capacity. Note that the GHGP Guidelines measure build and are not designed to predict retirements. However, retirements are included in this table anyway to examine the consequences of that design choice.
Averaged over all fuel types and model subtypes, the GHGP MBER obtains an average percentage error of 87.5%, compared to 280.5% for the Cambium models. It is important not to interpret this as a comprehensive measure of the accuracy of either model. The Cambium model was designed for different purposes and has different strengths, weaknesses, and use cases. The table below in no way ensures that the GHGP MBER is completely accurate, nor that it could be used for many of Cambium’s priority use cases.
Further, the overall numbers also mask considerable variation in the specific results. For example, the GHGP MBER method performs relatively well in predicting gas, solar, and wind capacity growth, but as it does not incorporate retirements, it cannot predict decreases in gas, coal, and oil capacities. It is typically less optimistic than Cambium in predicting solar and wind growth (e.g., ISNE, NYIS, PJM). Neither the GHGP MBER model nor Cambium tends to produce accurate predictions of nuclear capacity changes.
While these caveats are important, the overall results do appear to indicate that the GHGP MBER, when applied retrospectively to examine the results of a single year for accounting purposes, has significant power to predict real-world power plant behavior relative to leading available alternatives. In short, the overall accuracy of the GHGP MBERs for this specific use case is broadly comparable to that of one of the leading widely available capacity expansion models.
% Error | GHGP | GHGP Hourly (Flat) | GHGP Hourly (Gen) | Cambium Low-Case | Cambium Mid-Case | Cambium High-Case |
---|---|---|---|---|---|---|
Coal | -104 | -106 | -105 | 58 | 9 | -7 |
Gas | 123 | 126 | 122 | -54 | -31 | -16 |
Nuclear | -100 | -100 | -100 | -845 | -845 | -845 |
Oil | -103 | -100 | -100 | 648 | 683 | 644 |
Solar | -56 | -58 | -56 | 116 | -24 | -77 |
Wind | 39 | 38 | 39 | 204 | 117 | 38 |
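
As a rough check, the averages quoted above can be recomputed from the table. The sketch below assumes that the averaged percentage error is the unweighted mean of the absolute values of the table entries across all six fuel types and three model subtypes; the source does not state the exact formula, so this is an assumption. Under it, the rounded GHGP columns give 87.5%, matching the quoted figure. The rounded Cambium columns give roughly 292% rather than the quoted 280.5%, so the published Cambium figure was presumably computed from unrounded or differently aggregated values.

```python
# Percentage errors copied from the table above.
# Column order: GHGP, GHGP Hourly (Flat), GHGP Hourly (Gen),
#               Cambium Low-Case, Cambium Mid-Case, Cambium High-Case
percentage_errors = {
    "Coal":    [-104, -106, -105,   58,    9,   -7],
    "Gas":     [ 123,  126,  122,  -54,  -31,  -16],
    "Nuclear": [-100, -100, -100, -845, -845, -845],
    "Oil":     [-103, -100, -100,  648,  683,  644],
    "Solar":   [ -56,  -58,  -56,  116,  -24,  -77],
    "Wind":    [  39,   38,   39,  204,  117,   38],
}

GHGP_COLUMNS = slice(0, 3)
CAMBIUM_COLUMNS = slice(3, 6)


def mean_absolute_error(table, columns):
    """Unweighted mean of |error| over all fuels and the selected columns.
    Assumption: this is how the quoted averages were formed."""
    values = [abs(v) for row in table.values() for v in row[columns]]
    return sum(values) / len(values)


ghgp_average = mean_absolute_error(percentage_errors, GHGP_COLUMNS)
cambium_average = mean_absolute_error(percentage_errors, CAMBIUM_COLUMNS)
print(f"GHGP: {ghgp_average:.1f}%  Cambium: {cambium_average:.1f}%")
```

Taking absolute values before averaging matters here: without it, large positive and negative errors (for example, oil and nuclear in the Cambium columns) would partially cancel.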
Discussion
The GHGP MBER is intended to be used on an annual basis to estimate the causal effect on emissions of structural change driven by a (net) reduction or increase in electricity load. When added up over a period of many years and combined with an operating margin, it becomes an estimate of long-run marginal emissions.
Further research is needed in the rapidly growing field of empirical validation of long-run marginal emissions rates. However, a key advantage of this method for measuring long-run marginal emissions is that it is empirically observable each year. In this early benchmark test, which focuses on the method's ability to predict real-world change using only data available before the change occurred, the model appeared to have significant predictive power, in large part because it does not attempt to predict out-years.
Analysts may wish to consider the use of this signal in long-run marginal emissions rate applications where observability is important and there is the ability to update data annually. Note this is distinct from applications in which it is necessary to estimate the total lifetime long-run marginal emissions of an intervention before the beginning of the project. For such applications, other methodologies are recommended.