Scientific Reports volume 16, Article number: 11730 (2026)
Accurate probabilistic solar photovoltaic (PV) power forecasting is essential for the reliable integration of solar energy into modern power grids. This study evaluates four uncertainty quantification methods for short-term PV forecasting: Adaptive Conformal Inference (ACI), Deep Quantile Regression (DQR), Bayesian Long Short-Term Memory (BLSTM), and CatBoost quantile regression. ACI is applied as a post-processing technique that adaptively adjusts prediction intervals based on recent forecast errors. We propose a novel modification to ACI in which the miscoverage parameter is reset at the start of each day to prevent the accumulation of calibration errors during nighttime periods when PV output is zero. This reset addresses the interval inflation commonly observed in standard ACI under strong diurnal variability, leading to more stable and reliable prediction intervals. Using a five-year dataset from Wroclaw University of Science and Technology, the modified ACI achieves the highest coverage (90.96%) with a mean interval width of 12.8% of peak power. BLSTM performs comparably with 83.32% coverage and 13.74% width. CatBoost yields the sharpest intervals (11.2%) but lower coverage (81.07%), while DQR provides the lowest coverage (79.48%) and the highest Winkler score. Although tested on a single site, the data-driven, model-agnostic nature of ACI supports generalization, and its independence from weather forecasts ensures robustness.
The Tracking Clean Energy Progress report of the International Energy Agency (IEA) for the year 2023 noted that in 2022 solar PV generation grew by 270 TWh, more than any other renewable energy source including wind energy, reaching a high of about 1300 TWh1. This growth has in fact changed the tracking status of solar PV from “more effort needed” to “on track”. The report also mentions that in order to reach the Net Zero Scenario by 2050, yearly capacity additions should be about 3 times what was added in 2022, indicating further significant investments in this energy technology. Such accelerated adoption of solar PV is not without consequences: it leads to increased variability and uncertainty in power supply, challenges in grid stability, the need for advanced energy management strategies, voltage surges, producers’ participation in the electricity markets and problems related to the management of balancing reserves2,3. A solution to these challenges is to deploy a solar PV power forecasting system that provides information concerning the future power output of the concerned system, aiding decision making in the management of critical energy infrastructure and services4,5. Solar power forecasting methods can be classified based on numerous characteristics. Depending on the forecast horizon, they can be classified as ultra-short-term methods, covering forecasts from mere seconds to minutes6; short-term forecasts, ranging from minutes up to 48 h or 72 h ahead, which include both intra-day and day-ahead forecasts7; medium-term forecasts, from a few days to a week8; and long-term forecasts, from a week to as long as a few years ahead9.
The methods can also be classified based on the type of algorithm used for producing the forecasts: a statistical method such as an AutoRegressive Integrated Moving Average (ARIMA) model10, a machine learning model such as a Convolutional Neural Network (CNN)11, or a hybrid model combining the two. Based on whether the forecasting model produces a single point value of power as the forecast output or gives intervals of power with a certain confidence level, it can be classified as a deterministic or a probabilistic forecasting model12,13. Within the probabilistic class of forecasting models there is a differentiation between parametric and non-parametric models, where parametric models assume a specific form for the underlying probability distribution of the data. These models typically require the estimation of a certain number of parameters from the data, which define the structure and behavior of the model14. This approach is advantageous when the data adheres well to the assumed distribution, as it allows for more straightforward interpretation and computation15. In contrast, non-parametric models do not assume any specific form for the distribution. Instead, they often rely on the data itself to model relationships and trends16. These models are particularly useful when there is limited knowledge about the underlying distribution of the data or when the data does not fit well into any known parametric form, providing flexibility and adaptability17. The above-mentioned information is also presented in Fig. 1. Recent research works on the topic of probabilistic short-term solar power forecasting are as follows. A sophisticated Bayesian Optimization – Long Short-Term Memory (BO-LSTM) model integrated with time-frequency correlation mapping is employed to predict ultra-short-term solar power outputs in18.
This model is rigorously evaluated against a variety of benchmark models, including Adam-LSTM, Sgd-LSTM, Rmsprop-LSTM, and Adadelta-LSTM, as well as Multi-Layer Perceptron (MLP) variants such as Adam-MLP and Adagrad-MLP. Utilizing data collected over a year from a commercial solar PV station in North China, the method exploits the periodicities and variability intrinsic to solar power generation. The probabilistic forecasts are evaluated using the Prediction Interval Normalized Average Width (PINAW) error metric, where the BO-LSTM model demonstrates superior performance, reducing error margins compared to the benchmarks.
Classification of solar power forecasting methods.
The study19 employs a hybrid approach, integrating a Transformer-LUBE (Lower Upper Bound Estimation) model with advanced data imputation techniques, including XGBoost, Predictive Mean Matching (PMM), and bootstrapping, to enhance solar PV power forecasting. Using data from ten solar farms in Taiwan, enriched with Numerical Weather Predictions from the Taiwan Central Weather Bureau, the model quantifies uncertainty through Lower Upper Bound Estimation, enabling robust prediction intervals. When compared to other Artificial Intelligence (AI) models such as Artificial Neural Networks (ANN), LSTM networks, Gated Recurrent Unit (GRU) networks, and XGBoost, this hybrid model demonstrates superior accuracy and reliability.
Ensemble Conformalized Quantile Regression (EnCQR), a probabilistic forecasting method that combines Quantile Regression (QR) with Conformal Prediction (CP) to generate adaptive and valid prediction intervals (PIs), is introduced in20. EnCQR enhances the robustness and sharpness of PIs by integrating ensemble learning into QR models, which allows it to adaptively adjust PI widths based on local data variability. The method was tested on five real-world datasets sourced from publicly available repositories on energy production and consumption, including wind power generation, solar energy production, and electricity consumption. For the forecasting models, the authors employed LightGBM, Neural Networks (NN), and Gradient Boosted Decision Trees (GBDT) to implement quantile regression, and compared the results with standard CP, QR, and standalone ensemble methods. The evaluation shows that EnCQR consistently outperforms standard QR and CP approaches in terms of key metrics such as Prediction Interval Coverage Probability (PICP) and Prediction Interval Width (PIW), achieving more accurate and tighter PIs while maintaining valid coverage, and striking a superior balance between reliability (PICP) and efficiency (PIW) across all tested datasets, which highlights its effectiveness for nonstationary and heteroscedastic time series.
A forecasting model using a hybrid neural network that integrates sub-models for parameter estimation with a Meta-Learner to optimize these estimates is described in21. This model specifically employs bounded Kumaraswamy distributions to handle the intrinsic variability and limits of renewable energy outputs. Tested against both standard parametric models and advanced non-parametric techniques, such as Quantile Regression and Mixture Density Networks, the proposed approach uses data from both wind and solar sources across diverse geographical settings, including a year’s worth of data from commercial wind farms and solar plants.
The paper22 investigates PV power prediction for a 2.680 kWp photovoltaic system installed in a mountainous region, using experimentally measured data collected at 10-second intervals between February 23 and March 28, 2023. The dataset includes irradiance, temperature, voltage, current, and power, captured with sensors integrated into an outdoor experimental setup. Using this data, the authors develop and compare 14 different predictive models, including deep learning models (LSTM, Modular Neural Network, Radial Basis Function Neural Network), classical machine learning models (Support Vector Regression, Decision Tree, Random Forest, Ridge Regression, Kernel Ridge Regression), and multiple variants of linear and quantile regression. Each model is evaluated across five forecasting intervals (10 s, 1 min, 30 min, 1 h, and 1 day). The results show that the Radial Basis Function Neural Network (RBFNN) achieves the best performance for most horizons, particularly at the 10-second, 1-minute, 30-minute, and 1-day intervals, while the Kernel Ridge Regressor provides the lowest RMSE for the 1-hour horizon. Overall, the study demonstrates that nonlinear ANN-based models, especially RBFNN, offer superior predictive accuracy for PV systems operating under the highly variable outdoor conditions of mountainous regions.
In23, extensive regional solar PV output data, which includes numerical weather prediction (NWP) and historical output data, is leveraged to validate the proposed forecasting method. This method employs granule-based clustering (GC) to effectively segment and utilize this data, enhancing prediction intervals (PIs) for very short-term PV outputs. Direct optimization programming (DOP) further refines this approach by optimizing the overall performance cost function of the PIs. The results of this study have been compared with several benchmark forecasting models to demonstrate the effectiveness of the proposed methods. These benchmarks include traditional parametric models as well as various nonparametric approaches, the latter including the extreme learning machine (ELM), quantile regression, and machine learning-based linear programming approaches.
According to the previously mentioned classifications for solar power forecasting methods this article presents a probabilistic forecasting method for short term solar PV power forecasting that is non-parametric and is based on Adaptive Conformal Inference. While uncertainty quantification is achieved through ACI, the underlying forecasting model is a deep learning-based LSTM stacked model. The contributions of the paper are as follows:
Development of a novel ACI procedure suitable for short term probabilistic PV power forecasts.
ACI as a tool for quantifying uncertainty in deep learning models such as the LSTM stacked model.
Comparison of the ACI performance against other state-of-the-art uncertainty quantification approaches such as the Deep Quantile Regression, Bayesian LSTM and a gradient boosting based CatBoost ensemble.
While ACI has been proposed as a general post-hoc uncertainty quantification method, existing formulations implicitly assume a continuous and smoothly evolving time series. Solar PV data, however, exhibits a strong diurnal discontinuity with long nighttime zero-generation periods, causing the adaptive miscoverage parameter αₜ to drift upward and inflate interval widths in standard ACI implementations. To address this limitation, we introduce a novel daily miscoverage-reset mechanism that reinitializes αₜ at the start of each day. This modification prevents error accumulation across diurnal boundaries and enables ACI to recalibrate adaptively to each day’s irradiance conditions. As a result, the modified ACI remains sensitive to intraday variability without carrying over irrelevant calibration artifacts from the preceding night. Empirically, this enhancement allows ACI to achieve the highest coverage (90.92%) while maintaining competitive interval sharpness, demonstrating that the proposed adjustment is essential for deploying ACI reliably in solar PV forecasting environments. The rest of the manuscript is divided into the following sections. Section II presents the LSTM stacked model and the uncertainty quantification tools investigated, which include the ACI, the DQR, the Bayesian LSTM and the CatBoost ensemble. Section III presents the data description and data preprocessing methods used. Section IV presents the evaluation metrics used to assess the performance of the forecasting models, Section V presents the results which is then followed by the conclusions and references.
LSTM models are quite popular in the literature for solar PV power forecasting due to their ability to capture temporal dependencies in time-series data, which is important given the variable nature of solar energy production24. These models possess memory cells that maintain information across long sequences, allowing for better modeling of the dynamic changes in solar power output influenced by environmental factors like cloud cover and temperature25,26. The structure of an LSTM cell is widely available in existing literature27 and will not be repeated in this manuscript. The architecture of the forecasting model used is the stacked LSTM version which is shown in Fig. 2 for the ACI uncertainty quantification method.
Stacked LSTM architecture for ACI uncertainty quantification.
The architecture of the stacked model for the DQR uncertainty method is similar to that shown in Fig. 2. The difference arises from the fact that for ACI the dense layer has only 1 neuron, because ACI is a post-processing uncertainty quantification method, whereas for DQR the dense layer has 3 neurons, since it directly predicts the quantiles (5th, 50th and 95th). Apart from the differences in the dense layer, the rest of the architecture for both uncertainty quantification approaches is identical. It includes an input layer which prepares the input data in a 3-D format of total number of samples, number of time steps and input variables or features. The input layer is followed by two layers of LSTM cells. This is followed by the respective dense layers and the output layer, which simply presents the output based on the dense layer’s configuration.
To define the mathematical model clearly, the following variables are used. The historical dataset is denoted as:
and the goal is to forecast the subsequent \(T_1\) observations for:
The ACI procedure begins with a dataset comprising \(T_0\) initial observations \((x_1, y_1), \ldots, (x_{T_0}, y_{T_0})\) in \(\mathbb{R}^d \times \mathbb{R}\). The goal is to forecast and construct prediction intervals for the subsequent \(T_1\) observations28. For each forecast step \(t\) within the range \([T_0 + 1, T_0 + T_1]\), the previously known values \(y_{t-T_0}, \ldots, y_{t-1}\) are used to establish the forecast intervals. To form an interval \(C_\alpha\) at a specific miscoverage rate \(\alpha \in [0, 1]\), condition (1) must hold, defining \(C_\alpha\) as a valid interval.
To build the interval, the following steps must be followed. The data up to \(T_0\) is split into a random training set \(Tr_t\) and a calibration set \(Cal_t\). Then a predictive model \(\mu'\), such as the stacked LSTM model, is trained on \(Tr_t\) and its performance is evaluated on \(Cal_t\). The performance is measured by a metric called the conformity score \(S_{cal}\), defined by (4), where \(i\) indexes values in the calibration set, \(\mu'(x_i)\) is the predicted value and \(y_i\) is the actual value.
The \((1 - \alpha_t)\) quantile of the conformity scores, \(Q_{1-\alpha_t}(S_{Cal_t})\), is computed to determine the interval size centered around the prediction for each step28. Since this procedure is adaptive, the miscoverage rate \(\alpha_t\) is dynamically updated at each step using a learning rate \(\gamma\), which influences the interval size according to (5) and (6).
Flowchart representation of applying the modified ACI procedure.
If the actual value \(y_t\) falls outside the predicted interval \(C_{\alpha_t}\), then \(\alpha_{t+1}\) is adjusted to be less than or equal to \(\alpha_t\), increasing the interval size for subsequent forecasts, and vice versa. Given the non-continuous nature of solar PV power data, we propose a modification where the miscoverage rate \(\alpha_t\) is reset at the end of each day to a predetermined level. This adjustment, described by (7), addresses the impractical expansion of prediction intervals that occurs when \(\alpha_t\) exceeds 1. This scenario often occurs overnight, when solar PV panels are inactive and the correlation between the output power at the end of one day and the beginning of the next is weak. \(\mathcal{T}_{EOD}\) represents the end of each day.
The above implementation is further explained visually through a flowchart in Fig. 3. It should also be noted that the need for this modification is not dataset specific but arises from a fundamental characteristic of solar PV generation: the globally consistent diurnal cycle, during which nighttime output drops to zero and temporal continuity between days is inherently weak. Such structural discontinuities occur in PV installations worldwide, meaning that the proposed daily miscoverage-reset mechanism is broadly applicable to any solar forecasting context employing ACI.
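The adaptive update and the proposed daily reset described above can be sketched as follows. This is a minimal illustration assuming the standard update \(\alpha_{t+1} = \alpha_t + \gamma(\alpha - \mathrm{err}_t)\) and symmetric intervals around the point forecast; all function and variable names are illustrative and not taken from the authors' implementation.

```python
def quantile(scores, q):
    """Empirical quantile of the calibration conformity scores."""
    s = sorted(scores)
    idx = min(int(q * len(s)), len(s) - 1)
    return s[max(idx, 0)]

def modified_aci(y_true, y_pred, cal_scores, alpha=0.1, gamma=0.05,
                 steps_per_day=24):
    """ACI with a daily miscoverage reset (sketch of eqs. (5)-(7))."""
    alpha_t = alpha
    intervals, covered = [], []
    for t, (y, mu) in enumerate(zip(y_true, y_pred)):
        # Interval half-width from the (1 - alpha_t) conformity-score quantile
        q = quantile(cal_scores, min(max(1.0 - alpha_t, 0.0), 1.0))
        lo, hi = mu - q, mu + q
        err = 0.0 if lo <= y <= hi else 1.0     # err_t: 1 on a miss
        alpha_t = alpha_t + gamma * (alpha - err)  # adaptive update
        if (t + 1) % steps_per_day == 0:        # daily reset at end of day
            alpha_t = alpha
        intervals.append((lo, hi))
        covered.append(err == 0.0)
    return intervals, covered
```

With \(\gamma = 0\) the procedure reduces to ordinary split conformal prediction; the daily reset only changes the value of \(\alpha_t\) carried across the diurnal boundary.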
Unlike ACI, which is a post-processing uncertainty quantification method applied after the primary model generates point predictions, the DQR method integrates uncertainty quantification directly into the forecasting model29. In DQR, the model is trained to predict specific quantiles of the target distribution as part of the forecasting process. This direct approach allows DQR to simultaneously produce forecasts and quantify uncertainty by generating prediction intervals between quantiles such as the 5th, 50th, and 95th quantile30. This integration enables the model to capture the inherent variability and uncertainty of the data more naturally during the forecasting process, rather than relying on a separate post-hoc adjustment as with ACI.
The DQR model is described in (8), where \(\hat{y}_q\) is the predicted quantile at level \(q\), which, as mentioned above, takes the values corresponding to the 5th, 50th, and 95th quantiles; \(x\) is the input data and \(\theta_q\) represents the model parameters specific to \(q\). These are the weight and bias values determined during model training.
The DQR model is trained by minimizing a quantile loss function, also known as the pinball loss, which is designed to focus on the accuracy of the quantile predictions31. The pinball loss for a given quantile is calculated according to (9), where \(y\) is the actual observed value. This loss function is non-symmetric and penalizes underestimation and overestimation differently, depending on the quantile being predicted. For example, for the 5th quantile, overestimation is penalized more heavily, keeping the prediction conservative so that only 5% of the actual values fall below the predicted value. Conversely, for the 95th quantile, underestimation is penalized more, so that the model captures the higher end of the distribution and 95% of the actual values fall below the predicted value.
The total loss function for the model, when predicting multiple quantiles, is the sum of the pinball losses across all quantiles calculated according to (10).
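The pinball loss of (9) and its sum over quantiles in (10) can be written compactly; a sketch with illustrative names. The max form makes the asymmetry explicit: for \(q = 0.95\), an underestimate of one unit costs 0.95 while an overestimate of one unit costs only 0.05.

```python
def pinball_loss(y, y_hat, q):
    """Pinball loss for a single quantile level q in (0, 1), eq. (9)."""
    diff = y - y_hat
    return max(q * diff, (q - 1) * diff)

def total_quantile_loss(y, preds, quantiles=(0.05, 0.5, 0.95)):
    """Sum of pinball losses over all predicted quantiles, eq. (10)."""
    return sum(pinball_loss(y, y_hat, q)
               for y_hat, q in zip(preds, quantiles))
```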
In order to improve the coverage provided by the DQR algorithm, a continuous adaptive quantile adjustment procedure is used in this study. This approach dynamically adjusts the predicted quantiles based on their performance relative to the actual values in the previous time step. Specifically, after each forecast, the predicted lower and upper quantiles are scaled using adjustment factors, which either widen or tighten the prediction interval based on how well the current interval captures the actual value. The adjusted lower and upper quantiles are computed by multiplying the predicted quantiles by their respective adjustment factors α and β, as shown in Eqs. (11) and (12).
The Bayesian LSTM model inherently incorporates uncertainty estimation into the model during the training and inference stages. By placing probabilistic priors over the weights and biases of the network, Bayesian LSTM enables the direct estimation of predictive uncertainty alongside the primary forecasts32. The approach presented in this paper leverages Monte Carlo sampling with dropout during inference to approximate the posterior distributions of the model’s parameters32. As a result, it provides prediction intervals that naturally capture the variability and uncertainty present in the data, eliminating the need for external adjustment methods. This integrated approach ensures that the model learns to account for uncertainty throughout its operation, yielding probabilistic forecasts that are both robust and computationally efficient33.
The architecture of the model in this study is built on the one shown in Fig. 2, where Monte Carlo (MC) dropout layers are introduced during both the training and inference phases. These layers enable uncertainty quantification by approximating Bayesian inference through stochastic forward passes34. The model comprises two LSTM layers, followed by a dense output layer. Dropout layers are positioned after each LSTM layer, with a dropout rate \(p\) of 0.2, introducing stochasticity to the weight connections. The Bayesian aspect is achieved by enabling the dropout mechanism during inference, thus simulating multiple realizations of the model and generating a distribution of predictions. Mathematically, the output \(h_t\) of the LSTM layer at time step \(t\) is computed as in (13):
where \(x_t\) is the input at time \(t\), \(h_{t-1}\) is the hidden state from the previous time step, \(W_h\) and \(U_h\) are weight matrices, \(b_h\) is the bias vector and \(\sigma\) is the activation function. The inclusion of dropout applies a masking vector \(m\) sampled from a Bernoulli distribution defined by (14):
where \(p\) is the dropout rate. During forward passes, the modified output becomes as shown in (15):
where \(\odot\) represents element-wise multiplication. To estimate uncertainty, MC sampling is employed during inference by performing \(N\) stochastic forward passes with dropout activated. This generates a distribution of predictions from which the mean prediction and predictive intervals are derived. The predictive intervals are computed as the 5th and 95th percentiles of the distribution according to (16):
This approach ensures that the model captures both the central tendency and the variability of the target distribution, making it well-suited for probabilistic forecasting35.
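A sketch of the MC-dropout interval estimation in (16): \(N\) stochastic forward passes with dropout left active, followed by the 5th and 95th percentiles of the sampled predictions. Here `stochastic_model` stands in for the dropout-enabled stacked LSTM; any callable returning a randomized prediction illustrates the mechanism.

```python
def percentile(samples, p):
    """p-th percentile read off the sorted sample by index."""
    s = sorted(samples)
    idx = min(int(p / 100.0 * len(s)), len(s) - 1)
    return s[idx]

def mc_dropout_interval(stochastic_model, x, n_passes=100):
    """Mean prediction and [5th, 95th] percentile interval, eq. (16)."""
    preds = [stochastic_model(x) for _ in range(n_passes)]
    mean = sum(preds) / len(preds)
    return mean, percentile(preds, 5), percentile(preds, 95)
```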
CatBoost, or Categorical Boosting, is a gradient boosting algorithm designed to handle both numerical and categorical features efficiently. Unlike traditional gradient boosting algorithms, CatBoost incorporates ordered boosting and target-based statistics (TBS) to mitigate prediction shifts and reduce overfitting36. In this paper, separate CatBoost models are trained for the lower and upper quantiles, as well as for the median prediction. Each model was trained using the same input features but with distinct quantile loss functions. The algorithm minimizes the quantile loss function shown in (17)37.
where \(y_i\) is the actual value, \(\hat{y}_i\) is the predicted value, \(\alpha\) is the quantile level and the subscript \(+\) denotes the positive part of the expression. CatBoost builds oblivious decision trees, which split the data based on the same condition across all branches at a given depth37. The function at each iteration \(t\) is represented as shown in (18), where \(F_t(x)\) is the model prediction at iteration \(t\), \(\eta\) is the learning rate and \(h_t(x)\) is the weak learner that approximates the residuals.
The weak learner \(h_t(x)\) is chosen to minimize the residual error between the true target and the current prediction according to (19).
A key feature of CatBoost is the use of ordered boosting to prevent target leakage. In ordered boosting, the training examples are randomly permuted, and each example’s prediction is computed using a model trained on the preceding data points only. This approach ensures that no information from the current or future samples influences the prediction. It also replaces categorical features with their target-based statistics (TBS)37, as shown below:
where \(\mu\) is the prior mean of the target, \(\beta\) controls the regularization strength and \(1_{\{x_j = x_i\}}\) is an indicator function.
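The ordered target-based statistics of (20) can be sketched as follows, assuming the common smoothed form in which each example is encoded using only the examples that precede it in the permutation; this illustrates the idea and is not CatBoost's internal implementation.

```python
def ordered_tbs(categories, targets, beta=1.0):
    """Ordered TBS: encode each category from preceding rows only."""
    mu = sum(targets) / len(targets)   # prior mean of the target
    sums, counts, encoded = {}, {}, []
    for x, y in zip(categories, targets):
        s, c = sums.get(x, 0.0), counts.get(x, 0)
        # Smoothed running mean over previous occurrences of this category
        encoded.append((s + beta * mu) / (c + beta))
        sums[x] = s + y                # update AFTER encoding: no leakage
        counts[x] = c + 1
    return encoded
```

Because the running sums are updated only after each row is encoded, no row ever sees its own target, which is exactly the leakage-prevention property ordered boosting relies on.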
The dataset utilized in this study was gathered from rooftop solar PV panels mounted on top of the building belonging to the Department of Electrical Engineering and Electrotechnology Fundamentals, Wroclaw University of Science and Technology. The setup comprises three distinct types of solar panels, each with a peak power output of 5 kW: monocrystalline, polycrystalline, and CIGS (Copper Indium Gallium Selenide). While the focus of this study is on data from the monocrystalline panels, the machine learning model developed using historical weather data and power output can be generalized and applied to the other panel types as well. The key parameters measured include solar irradiation (W/m²), module surface temperature (°C), ambient air temperature (°C), wind speed (m/s), and power output (kW).
While the preparation of the input data for model training differs between the ACI and DQR procedures, the features used in this study, apart from time-related features, are the historical output power data and irradiation.
Solar PV panel output power visualization.
Figure 4 illustrates the trends in the power output data over a period of five years, from January 2014 to January 2019. The dataset is split into a training and validation set, with 80% of the data used for training the model and the remaining 20% reserved for validation. The validation set, completely unseen by the model during training, is important for assessing the model’s generalization ability and forecast accuracy.
This study has two types of input features. The first, as already described, is the historical power output data from the solar PV panels. The second is time-related features. They are important because they capture the natural cyclical patterns that affect solar power generation. Time-related features such as the hour of the day, day of the week, and month of the year are crucial because solar energy production is inherently tied to the Earth’s rotation and orbit38. For example, the hour of the day reflects the daily solar cycle, with power generation typically peaking around midday and dropping to zero during the night. The month of the year captures seasonal variations, as longer daylight hours and higher sun angles during summer months lead to increased solar output, while shorter days and lower angles during winter reduce it. Failing to account for this periodic nature can mislead machine learning models39.
The cyclical features are modelled as two components using the sine and cosine functions40. This method ensures that the values are mapped onto a continuous circular space. For a feature \(Y\), the cyclic encoding formulae are given by (21) and (22), where \(Y\) is the cyclical time feature, such as the hour of the day or the number of the month; max_value is the maximum possible value, which is 24 for hours and 12 for months; and \(Y_{\sin}\) and \(Y_{\cos}\) together represent the cyclical nature of \(Y\).
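Eqs. (21) and (22) amount to the following transformation (a minimal sketch; `max_value` is 24 for the hour feature and 12 for the month feature):

```python
import math

def cyclic_encode(y, max_value):
    """Map a cyclical feature onto the unit circle: (sin, cos) pair."""
    angle = 2.0 * math.pi * y / max_value
    return math.sin(angle), math.cos(angle)
```

Hour 0 and hour 24 map to the same point on the circle, so midnight-to-midnight adjacency is preserved rather than appearing as a jump from 23 to 0.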
The input features considered in this research vary in terms of scale, distribution, and measurement units. To prevent the LSTM models from becoming too sensitive to these differences, and to prevent issues such as large weight values during training, it is essential to apply normalization techniques. Normalizing the input data helps to bring all features to a consistent range, improving the stability and performance of the model. For this reason, min-max normalization, as expressed in Eq. (23), is applied to rescale the variables, where \(x_i\) represents each point value of the features considered, \(\min(x)\) represents the smallest value of the feature and \(\max(x)\) the largest value.
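Eq. (23) corresponds to the following rescaling (a minimal sketch):

```python
def min_max_normalize(values):
    """Rescale a feature to [0, 1]: (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```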
Rolling window quantiles involve calculating quantiles over a moving subset of data points (known as a window) within a time series. As the window slides over the data, quantiles are recalculated based on the values in that window, capturing local variability within the dataset. This approach provides information on how quantile values change dynamically over time and is a crucial step in implementing DQR. In this study the window size chosen is 5, covering the past 5 h of values. The algorithm to implement this is shown below:
Rolling Window Quantile Algorithm.
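A plain-Python sketch of the rolling-window quantile step described above, assuming quantiles are read off the sorted window by index; the exact interpolation rule of Algorithm 1 may differ.

```python
def rolling_quantiles(series, window=5, quantiles=(0.05, 0.5, 0.95)):
    """Quantiles of each trailing window as the window slides forward."""
    out = []
    for i in range(window, len(series) + 1):
        w = sorted(series[i - window:i])          # the last `window` values
        out.append(tuple(w[min(int(q * window), window - 1)]
                         for q in quantiles))
    return out
```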
Since ACI is a post-processing uncertainty quantification method, the data preprocessing steps are the same as those for a point-forecasting LSTM model. In this case it is necessary to reshape the input data into a three-dimensional format of number of time samples, number of timesteps and number of features. This format is also produced by means of a rolling window that slides over the entire input data, reshaping it. Algorithm 2 describes this process.
Rolling Window Algorithm.
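The reshaping step of Algorithm 2 can be sketched as follows, using nested lists in place of a NumPy array; names are illustrative.

```python
def make_windows(features, targets, timesteps):
    """Reshape into (samples, timesteps, features) windows plus targets."""
    X, y = [], []
    for i in range(timesteps, len(features)):
        X.append(features[i - timesteps:i])  # one (timesteps, n_features) slab
        y.append(targets[i])                 # value to predict at step i
    return X, y
```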
We evaluate the prediction intervals using four complementary metrics. Coverage assesses reliability by measuring how often true values fall within the predicted intervals. Mean Interval Width (MIW) reflects sharpness, with narrower intervals indicating greater precision. The Winkler Score combines both reliability and sharpness, penalizing intervals that are either too wide or fail to cover the true value. Finally, CRPS offers a holistic view of the probabilistic forecast quality by comparing the full predictive distribution to the observed outcomes. Coverage is calculated as the percentage of time steps where the actual values fall within the predicted lower and upper quantiles. The formula is shown in (24), where \(1\) is an indicator which equals 1 if the actual value lies between the predicted intervals and 0 otherwise, \(N\) is the total number of predictions, \(actual_i\) is the actual value at time step \(i\), and \(Q_{lower_i}\) and \(Q_{upper_i}\) represent the lower and upper interval predictions at time step \(i\).
MIW as described by (25) indicates the breadth of forecast prediction intervals. Narrow intervals suggest greater precision but risk missing the true values. Conversely, wider intervals are more likely to capture actual outcomes but reduce the forecast’s precision.
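A corresponding sketch of MIW as defined in (25), again assuming array-valued quantile bounds:

```python
import numpy as np

def mean_interval_width(q_lower, q_upper):
    """Eq. (25): average breadth of the prediction intervals."""
    return float(np.mean(np.asarray(q_upper) - np.asarray(q_lower)))

# Intervals of width 2 and 3 -> mean width 2.5
mean_interval_width([0.0, 1.0], [2.0, 4.0])
```

Used together with coverage, this exposes the reliability–sharpness trade-off: a model can trivially raise coverage by inflating MIW, so the two must be read jointly.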
The third evaluation metric is the Continuous Ranked Probability Score (CRPS), a scoring rule that measures the accuracy of probabilistic forecasts by evaluating the difference between the cumulative distribution function (CDF) of the forecast and the actual observed value. Unlike deterministic metrics such as Mean Absolute Error (MAE), CRPS generalizes these to probabilistic settings by comparing the entire predicted distribution with the actual observation. This metric not only assesses the closeness of the forecast to the actual value but also incorporates the sharpness of the predictive distribution. It is described by (26), where F(z) is the predicted cumulative distribution function of the forecast, y is the actual observation, and 1_{z ≥ y} is the Heaviside step function, which equals 1 if z ≥ y and 0 otherwise. A lower CRPS value indicates better probabilistic forecasts, as it signifies that the predicted distribution aligns more closely with the observed data41.
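Equation (26) is an integral over the predictive CDF. For forecasts represented by a finite set of samples (for instance, Monte Carlo dropout draws from the Bayesian LSTM), a common sample-based estimator is E|X − y| − ½·E|X − X′|, with X, X′ independent draws from the forecast distribution; a sketch of this estimator (not necessarily the exact implementation used in the study):

```python
import numpy as np

def crps_ensemble(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))                          # accuracy term
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))  # sharpness term
    return term1 - term2

# A point-mass forecast exactly on the observation scores 0 (perfect):
crps_ensemble([2.0], 2.0)
```

The second term rewards sharp (concentrated) predictive distributions, so CRPS cannot be gamed by simply widening the forecast spread.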
The final metric used in this study is the Winkler score, which evaluates the quality of prediction intervals by jointly accounting for their width and their ability to capture the true observation. Unlike metrics that consider only interval sharpness or only coverage, the Winkler score penalizes intervals that are too wide as well as those that fail to contain the actual value42. For a significance level α, the score is defined as in (27), where 'width' denotes the difference between the upper and lower bounds and y_actual is the observed value. A lower Winkler score indicates a more efficient interval, one that is both narrow and reliably covers the true outcome, making it a widely used measure for comparing probabilistic forecasting models.
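The piecewise definition in (27) can be sketched directly; here α = 0.1 is assumed, matching nominal 90% intervals:

```python
def winkler_score(y, lower, upper, alpha=0.1):
    """Eq. (27): interval width, plus a 2/alpha-scaled penalty when the
    observation falls outside the interval."""
    width = upper - lower
    if y < lower:
        return width + (2.0 / alpha) * (lower - y)   # miss below the interval
    if y > upper:
        return width + (2.0 / alpha) * (y - upper)   # miss above the interval
    return width                                     # covered: width only

winkler_score(5.0, 4.0, 6.0)        # covered -> 2.0
winkler_score(3.0, 4.0, 6.0)        # missed by 1 -> 2.0 + 20.0 = 22.0
```

The 2/α factor makes misses very costly at tight significance levels, which is why models with low coverage (such as DQR here) incur high Winkler scores despite similar widths.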
The results for this study are obtained over a validation set that is one year long and entirely unseen by the forecasting algorithms during training. Table 1 summarizes the comparative performance of the four probabilistic forecasting methods. In terms of coverage probability, ACI achieves the highest at 90.96%, significantly outperforming the other methods. Bayesian LSTM follows with 83.32%, while CatBoost and DQR trail with 81.07% and 79.48%, respectively. The superior coverage of ACI underscores its ability to dynamically adapt prediction intervals to varying forecast uncertainty, ensuring that actual values are consistently captured.
Bayesian LSTM, though not adaptive in the same way as ACI, benefits from Monte Carlo dropout, allowing it to generate probabilistic forecasts through multiple stochastic forward passes. This mechanism supports the estimation of uncertainty by capturing a spread of plausible outcomes. However, its performance in terms of coverage (83.32%) remains lower than ACI’s 90.96%, indicating that some true values fall outside the prediction intervals. DQR and CatBoost, which are based on fixed quantile regression, offer less flexibility in adjusting intervals dynamically, resulting in even lower coverage levels of 79.48% and 81.07%, respectively. These figures suggest that static quantile estimators may struggle to accommodate changing data uncertainty over time.
With respect to interval sharpness, as measured by the MIW, DQR produces the widest intervals at 0.700 kW, followed closely by Bayesian LSTM at 0.687 kW. ACI yields narrower intervals at 0.620 kW despite achieving the best coverage, indicating its ability to maintain efficiency while being reliable. CatBoost achieves the sharpest intervals at 0.560 kW, demonstrating its strength in concentrating predictions more tightly around expected values, albeit with a trade-off in coverage performance.
In terms of CRPS, which jointly assesses the calibration and sharpness of predictive distributions, DQR shows the weakest performance with a score of 0.775, indicating poorly calibrated intervals that are not only wide but also misaligned with the actual values. ACI, while excelling in coverage, has a higher CRPS of 0.539, reflecting its conservative nature in interval estimation. Bayesian LSTM fares better with a CRPS of 0.463, balancing uncertainty estimation with predictive sharpness. CatBoost attains the lowest CRPS of 0.360, suggesting that among the models, it provides the most accurate and sharp probabilistic forecasts in terms of distributional calibration.
Winkler scores provide an additional measure by penalizing both coverage failures and excessive interval width. CatBoost again outperforms other methods with the lowest Winkler score of 1.017, indicating the most efficient uncertainty quantification. ACI and Bayesian LSTM follow with Winkler scores of 1.705 and 1.750, respectively. Despite their higher coverage, these scores reveal a slight inefficiency in their interval construction, especially for ACI, which favors conservative bounds. DQR performs the worst with a Winkler score of 2.008, further reinforcing its inadequacy in balancing coverage and sharpness.
Overall, the comparative evaluation demonstrates clear performance differences between the four uncertainty quantification methods in terms of both reliability and interval sharpness. The modified ACI method consistently provides the most balanced behavior, achieving the highest empirical coverage while maintaining relatively narrow intervals, which confirms the effectiveness of the daily miscoverage reset in stabilizing interval widths under strong diurnal variability. Bayesian LSTM also delivers competitive coverage but produces wider intervals, reflecting the inherent variability introduced by Monte Carlo sampling. CatBoost achieves the sharpest intervals across most horizons, but at the cost of noticeably lower coverage, indicating a tendency to underestimate predictive uncertainty. DQR shows the weakest performance, with both low coverage and high Winkler scores, suggesting insufficient flexibility to capture distributional asymmetry in rapidly changing PV conditions. Taken together, these results highlight that ACI provides the most robust trade-off between reliability and efficiency, making it a promising and computationally practical solution for operational PV forecasting scenarios where consistent uncertainty calibration is essential.
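The adaptive behavior discussed above can be illustrated with a brief sketch of the standard ACI update of Gibbs and Candès combined with the daily reset proposed in this study. The values of gamma and alpha0 below are illustrative placeholders, not the tuned settings of the paper:

```python
def run_aci_with_daily_reset(errors, hours, gamma=0.01, alpha0=0.1):
    """Track the miscoverage level alpha_t over a sequence of coverage
    errors (err_t = 1 on a miss, 0 on a hit), resetting alpha to alpha0
    at the first hour of each day to prevent nighttime drift."""
    alpha = alpha0
    trace = []
    for err, hour in zip(errors, hours):
        if hour == 0:
            alpha = alpha0                    # daily miscoverage reset
        alpha = alpha + gamma * (alpha0 - err)  # standard ACI update
        trace.append(alpha)
    return trace

# Three consecutive covered hours: alpha creeps up (intervals tighten slightly)
run_aci_with_daily_reset(errors=[0, 0, 0], hours=[0, 1, 2])
```

A lower alpha widens the intervals, so a run of misses drives them wider while the reset discards any calibration drift accumulated over the zero-output nighttime hours.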
In addition to forecast accuracy, computational efficiency is an important consideration for operational deployment. Table 1 summarizes the total training and evaluation times of the four approaches. Bayesian LSTM is the most computationally demanding at 7 min 11 s, primarily due to the repeated stochastic forward passes required for Monte Carlo dropout sampling during inference. CatBoost is the fastest method, completing training in 1 min 15 s because of its optimized tree-based boosting architecture and lightweight quantile objective. ACI and DQR exhibit comparable runtimes of 4 min 26 s and 4 min 40 s, respectively, with ACI adding only negligible overhead beyond the base LSTM model. All experiments were conducted on a Lenovo laptop equipped with an AMD Ryzen 7 5700U CPU (16 logical cores, 1.8 GHz), 16 GB RAM, and integrated Radeon Graphics, running Windows 11 Home (64-bit).
Figure 5 presents four days, randomly selected from the validation set, illustrating how the Adaptive Conformal Inference (ACI) method constructs prediction intervals under varying solar conditions. These days were chosen because ACI achieved the highest coverage among all methods while maintaining moderate sharpness. The results show the trade-off between coverage and interval width across different solar generation patterns.
Forecast interval performance.
On February 24th, 2018, a day characterized by low solar generation, ACI produces relatively narrow intervals throughout most of the day, particularly during the early morning and evening hours when the generation is close to zero. However, during the peak generation period, the intervals widen significantly, with a noticeable overestimation of the peak value. Despite this, the actual values remain within the constructed intervals except during hours 8:00 and 10:00. For June 15th, 2018, a day with consistent solar availability, the intervals remain relatively well-calibrated around the actual values. The intervals are slightly wider around the midday peak, accommodating the increased variability in solar power generation. Importantly, ACI successfully covers the actual values throughout the entire day. On November 5th, 2018, which exhibits a more variable solar profile, ACI adapts by constructing intervals that capture most of the fluctuations in generation. While the coverage remains high, some intervals appear slightly wider than necessary, reflecting ACI’s trade-off between capturing variability and maintaining sharpness. Lastly, July 1st, 2018, represents a more complex scenario where solar generation fluctuates more erratically. The constructed intervals widen at various points in the day to accommodate uncertainty, but there are still instances of slight under-coverage which can be noticed at hours 11:00 and 14:00. This suggests that while ACI maintains high coverage on average, it may struggle to construct consistently sharp intervals when the underlying data exhibits irregular fluctuations.
This study assessed the performance of four uncertainty quantification methods, Adaptive Conformal Inference (ACI), Deep Quantile Regression (DQR), Bayesian LSTM, and CatBoost, for short-term probabilistic solar PV forecasting. Using a five-year dataset, we evaluated reliability, sharpness, and practical applicability, with particular emphasis on the proposed modification to ACI involving a daily miscoverage reset. The results reveal distinct strengths and limitations across the methods and highlight the importance of adaptive uncertainty calibration for PV systems characterized by strong diurnal variability.
The main outcomes can be summarized as follows:
ACI achieved the highest coverage (90.96%) while maintaining reasonably narrow intervals, demonstrating the effectiveness of the daily miscoverage-reset mechanism.
Bayesian LSTM showed balanced performance, providing moderate coverage and interval width, benefiting from Monte Carlo dropout–based uncertainty estimation.
CatBoost produced the sharpest intervals, but at the cost of lower coverage, indicating a tendency to underestimate predictive uncertainty.
DQR exhibited the weakest results, with low coverage and high Winkler scores, suggesting insufficient flexibility for rapidly changing PV conditions.
The modified ACI method provided the best trade-off between reliability and efficiency for operational short-term PV forecasting.
Overall, the modified ACI framework emerges as the most robust and adaptable approach for practical PV forecasting applications, offering consistently reliable uncertainty quantification under real-world diurnal variability conditions. Future research may extend the present study in several directions. First, the adaptive nature of ACI can be enhanced by exploring dynamic or data-driven strategies for tuning the learning rate γ and the baseline miscoverage level α₀, potentially improving interval sharpness without sacrificing coverage. Second, integrating ACI with ensemble learning or Bayesian frameworks may yield more robust uncertainty estimates, particularly under highly variable weather conditions. Third, evaluating the modified ACI procedure on diverse PV installations across different climatic regions would help further validate its generalizability. Finally, investigating hybrid architectures that combine conformal methods with advanced deep learning models such as transformer-based or state-space forecasting networks may offer additional gains in reliability and computational efficiency for next-generation probabilistic PV forecasting systems.
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
IEA. Tracking Clean Energy Progress (2023). https://www.iea.org/reports/tracking-clean-energy-progress-2023.
Yang, B. et al. Classification and summarization of solar irradiance and power forecasting methods: A thorough review. CSEE J. Power Energy Syst. https://doi.org/10.17775/CSEEJPES.2020.04930 (2021).
Visser, L., AlSkaif, T. & van Sark, W. Operational day-ahead solar power forecasting for aggregated PV systems with a varying spatial distribution. Renew. Energy. 183, 267–282 (2022).
Massidda, L., Bettio, F. & Marrocu, M. Probabilistic day-ahead prediction of PV generation. A comparative analysis of forecasting methodologies and of the factors influencing accuracy. Sol Energy. 271, 112422 (2024).
Abdelsattar, M. Mountain gazelle optimizer for standalone hybrid power system design incorporating a type of incentive-based strategies. Neural Comput. Appl. 36, 6839–6853 (2024).
Kim, B. & Suh, D. Solar PV generation prediction based on multisource data using ROI and surrounding area. IEEE Trans. Geosci. Remote Sens. 62, 1–11 (2024).
Ma, X., Du, H., Wang, K., Jia, R. & Wang, S. An efficient QR-BiMGM model for probabilistic PV power forecasting. Energy Rep. 8, 12534–12551 (2022).
Weyll, A. L. C. et al. Medium-term forecasting of global horizontal solar radiation in Brazil using machine learning-based methods. Energy 300, 131549 (2024).
Asiedu, S. T., Nyarko, F. K. A., Boahen, S., Effah, F. B. & Asaaga, B. A. Machine learning forecasting of solar PV production using single and hybrid models over different time horizons. Heliyon 10, e28898 (2024).
Perera, M., De Hoog, J., Bandara, K. & Halgamuge, S. Multi-resolution, multi-horizon distributed solar PV power forecasting with forecast combinations. Expert Syst. Appl. 205, 117690 (2022).
Mondal, R., Roy, S. K. & Giri, C. Solar power forecasting using domain knowledge. Energy 302, 131709 (2024).
Huang, H. H. & Huang, Y. H. Probabilistic forecasting of regional solar power incorporating weather pattern diversity. Energy Rep. 11, 1711–1722 (2024).
Hoang, K. T., Thilker, C. A., Knudsen, B. R. & Imsland, L. Probabilistic Forecasting-Based Stochastic Nonlinear Model Predictive Control for Power Systems With Intermittent Renewables and Energy Storage. IEEE Trans. Power Syst. 39, 5522–5534 (2024).
Mayer, M. J. & Yang, D. Probabilistic photovoltaic power forecasting using a calibrated ensemble of model chains. Renew. Sustain. Energy Rev. 168, 112821 (2022).
Yang, D. Reconciling solar forecasts: Probabilistic forecast reconciliation in a nonparametric framework. Sol Energy. 210, 49–58 (2020).
Li, Q., Xu, Y., Chew, B. S. H., Ding, H. & Zhao, G. An Integrated Missing-Data Tolerant Model for Probabilistic PV Power Generation Forecasting. IEEE Trans. Power Syst. 37, 4447–4459 (2022).
Ramakrishna, R., Scaglione, A., Vittal, V., Dall’Anese, E. & Bernstein, A. A Model for Joint Probabilistic Forecast of Solar Photovoltaic Power and Outdoor Temperature. IEEE Trans. Signal. Process. 67, 6368–6383 (2019).
Shi, J. et al. Bayesian Optimization – LSTM Modeling and Time Frequency Correlation Mapping Based Probabilistic Forecasting of Ultra-short-term Photovoltaic Power Outputs. IEEE Trans. Ind. Appl. 60, 2422–2430 (2023).
Phan, Q. T., Wu, Y. K. & Phan, Q. D. Enhancing One-Day-Ahead Probabilistic Solar Power Forecast With a Hybrid Transformer-LUBE Model and Missing Data Imputation. IEEE Trans. Ind. Appl. 60, 1396–1408 (2024).
Jensen, V., Bianchi, F. M. & Anfinsen, S. N. Ensemble Conformalized Quantile Regression for Probabilistic Time Series Forecasting. IEEE Trans. Neural Networks Learn. Syst. 35, 9014–9025 (2024).
Konstantinou, T. & Hatziargyriou, N. Day-Ahead Parametric Probabilistic Forecasting of Wind and Solar Power Generation Using Bounded Probability Distributions and Hybrid Neural Networks. IEEE Trans. Sustain. Energy. 14, 2109–2120 (2023).
Yadav, A. K., Khargotra, R., Lee, D., Kumar, R. & Singh, T. Novel applications of various neural network models for prediction of photovoltaic system power under outdoor condition of mountainous region. Sustain. Energy Grids Networks. 38, 101318 (2024).
Sun, Y. et al. Nonparametric Probabilistic Prediction of Regional PV Outputs Based on Granule-based Clustering and Direct Optimization Programming. J. Mod. Power Syst. Clean. Energy. 11, 1450–1461 (2023).
Ying, C. et al. Deep learning for renewable energy forecasting: A taxonomy, and systematic literature review. J. Clean. Prod. 384, 135414 (2023).
Hossain, M. S. & Mahmood, H. Short-Term Photovoltaic Power Forecasting Using an LSTM Neural Network and Synthetic Weather Forecast. IEEE Access. 8, 172524–172533 (2020).
Abdel-Nasser, M. & Mahmoud, K. Accurate photovoltaic power forecasting models using deep LSTM-RNN. Neural Comput. Appl. 31, 2727–2740 (2019).
Zaffran, M., Dieuleveut, A., Féron, O., Goude, Y. & Josse, J. Adaptive conformal predictions for time series. Preprint at http://arxiv.org/abs/2202.07282 (2022).
Gibbs, I. & Candès, E. Adaptive conformal inference under distribution shift. Preprint at http://arxiv.org/abs/2106.00170 (2021).
Alcántara, A., Galván, I. M. & Aler, R. Deep neural networks for the quantile estimation of regional renewable energy production. Appl. Intell. 53, 8318–8353 (2023).
Tuyen, N. D., Thanh, N. T., Huu, V. X. S. & Fujita, G. A combination of novel hybrid deep learning model and quantile regression for short-term deterministic and probabilistic PV maximum power forecasting. IET Renew. Power Gener. 17, 794–813 (2023).
van der Meer, D. W., Widén, J. & Munkhammar, J. Review on probabilistic forecasting of photovoltaic power production and electricity consumption. Renew. Sustain. Energy Rev. 81, 1484–1512 (2018).
Xu, L., Hu, M. & Fan, C. Probabilistic electrical load forecasting for buildings using Bayesian deep neural networks. J. Build. Eng. 46, 103853 (2022).
Panamtash, H., Zhou, Q., Hong, T., Qu, Z. & Davis, K. O. A copula-based Bayesian method for probabilistic solar power forecasting. Sol Energy. 196, 336–345 (2020).
Kummaraka, U. & Srisuradetchai, P. Time-Series Interval Forecasting with Dual-Output Monte Carlo Dropout: A Case Study on Durian Exports. Forecasting 6, 616–636 (2024).
Quilty, J. et al. Bayesian extreme learning machines for hydrological prediction uncertainty. J. Hydrol. 626, 130138 (2023).
Panigrahi, R., Patne, N. R., Vardhan, S. & Khedkar, M. B. V. Short-term load analysis and forecasting using stochastic approach considering pandemic effects. Electr. Eng. 106, 3097–3108 (2024).
Zhang, H., Jia, R., Du, H., Liang, Y. & Li, J. Short-term interval prediction of PV power based on quantile regression-stacking model and tree-structured parzen estimator optimization algorithm. Front. Energy Res. 11 (2023).
Khan, Z. A., Hussain, T. & Baik, S. W. Dual stream network with attention mechanism for photovoltaic power forecasting. Appl. Energy. 338, 120916 (2023).
Xu, H., Hu, F., Liang, X., Zhao, G. & Abugunmi, M. A framework for electricity load forecasting based on attention mechanism time series depthwise separable convolutional neural network. Energy 299, 131258 (2024).
Zhu, T., Guo, Y., Li, Z. & Wang, C. Solar Radiation Prediction Based on Convolution Neural Network and Long Short-Term Memory. Energies 14, 8498 (2021).
Thorey, J., Mallet, V. & Baudin, P. Online learning with the Continuous Ranked Probability Score for ensemble forecasting. Q. J. R Meteorol. Soc. 143, 521–529 (2017).
Li, G. et al. A new wind speed evaluation method based on pinball loss and Winkler score. Adv. Electr. Comput. Eng. 22, 11–18 (2022).
Vishnu Suresh discloses support for the research of this work from the National Science Centre (NCN), Poland, Grant Number 2022/06/X/ST8/00393.
Open access funding provided by Vellore Institute of Technology.
Faculty of Electrical Engineering, Wroclaw University of Science and Technology, Wroclaw, 50-370, Poland
Vishnu Suresh
School of Electrical Engineering, Vellore Institute of Technology, Chennai, 600127, India
B. Sri Revathi
School of Aeronautics and Astronautics, Zhejiang university, Hangzhou, 310000, Zhejiang, China
Josep M. Guerrero
Vishnu Suresh: Conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, software, visualization, writing – original draft and writing – review & editing. B. Sri Revathi: Software, supervision and validation. Josep M. Guerrero: Funding acquisition and supervision.
Correspondence to B. Sri Revathi.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Suresh, V., Revathi, B.S. & Guerrero, J.M. A non-parametric adaptive conformal inference based probabilistic hour-ahead solar PV power forecasting method. Sci Rep 16, 11730 (2026). https://doi.org/10.1038/s41598-026-40911-x