Scientific Reports volume 16, Article number: 12915 (2026)
With the rapid penetration of photovoltaic (PV) generation into modern power grids, accurate and robust ultra-short-term PV power forecasting is increasingly important for real-time dispatch and frequency regulation. However, PV power series are volatile, nonlinear, and uncertain at short time scales, challenging conventional methods. This paper proposes a hybrid ultra-short-term forecasting framework that integrates secondary decomposition with advanced learning models. First, key features are screened and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) decomposes PV power into intrinsic mode functions (IMFs). Sample entropy quantifies IMF complexity, and K-means clusters IMFs into high- and low-frequency components. High-frequency components are further decomposed by Black-winged Kite Algorithm (BKA)-Variational Mode Decomposition (VMD) to enhance stationarity and reduce manual parameter tuning. The resulting high-frequency sub-signals are predicted using Online Kernel Extreme Learning Machine (OKELM), while low-frequency components are modeled by a Convolutional Neural Network (CNN)-Echo State Network (ESN) to capture spatiotemporal patterns. Final ultra-short-term forecasts are obtained via additive reconstruction. Experiments on datasets from the Ningxia PV station (China) and the Desert Knowledge Australia (DKA) Solar Energy Centre achieve (R^2) values of 99.6987% and 99.0635% in comparative and validation experiments, respectively, demonstrating high accuracy across different geographic locations and seasons. Improved PV power forecasting reduces uncertainty, supports grid stability, enables more efficient dispatch and reserve scheduling, and lowers operating costs and curtailment.
Photovoltaic (PV) power generation systems are direct conversion mechanisms that transform sunlight into electricity without relying on any mechanical or mobile device. They harness solar energy, an inexhaustible resource, making them sustainable and environmentally friendly energy solutions. One of the key advantages of PV systems is their long service life, coupled with minimal maintenance requirements, ensuring consistent and reliable energy production over extended periods1. However, PV power generation is significantly influenced by meteorological factors. The complexity and variability of these factors result in strong intermittency and volatility in PV power output. These characteristics pose challenges in system stability, power generation forecasting and planning, and power management when PV is operated on a large-scale grid-connected basis. Accurate forecasting for power enables the power dispatch department to formulate scheduling plans, ensuring the safe and stable operation of the power system. Therefore, enhancing research on precise ultra-short-term PV power forecasting is crucial for maintaining grid stability and real-time power balancing. Improved ultra-short-term PV power forecasting reduces uncertainty, enhances grid stability, enables more efficient dispatch and reserve scheduling, and lowers operating costs and curtailment, delivering better economic benefits.
In existing research, models for PV power forecasting generally include physical, statistical and artificial intelligence methods2. Physical models use numerical weather predictions as inputs for physical equations, but though interpretable, often fail to capture the nonlinear nature of PV power3. In contrast, statistical models such as Autoregressive Integrated Moving Average (ARIMA)4 and improved Markov Chain approaches5 offer solid mathematical foundations and improved stability, yet struggle with long-term dependencies and complex nonlinearities, limiting their effectiveness for PV power generation.
Deep learning has emerged as a promising alternative for PV power forecasting because it can automatically learn nonlinear representations and temporal dependencies from data. Sun et al.6 used Convolutional Neural Networks (CNN) to correlate PV output with contemporaneous sky images, demonstrating the feasibility of image-based “now-casting” for solar systems. Zhang et al.7 used a Gated Recurrent Unit (GRU) to construct prediction intervals under different weather conditions, improving forecasting reliability under variability. Sun et al.8 proposed a Long Short-Term Memory (LSTM) model that explicitly exploits spatial and temporal correlations among neighboring PV sites, leading to higher prediction accuracy. Joo et al.9 focused on the Echo State Network (ESN) for its efficiency and fast training, showing that it could significantly outperform tuned LSTM models in accuracy. Li et al.10 applied a Temporal Convolutional Network (TCN) to day-ahead PV power forecasting and reported a 20%–30% reduction in Root Mean Square Error (RMSE) over baseline methods. Khalil et al.11 used a Transformer model, achieving a forecasting Mean Absolute Error (MAE) of 0.9377 and enabling proactive fault mitigation. Singh and Alam12 used an N-BEATS model pre-trained on temperature data and reported a 30–40% reduction in Mean Absolute Percentage Error (MAPE), indicating the effectiveness of leveraging pre-trained models. Suresh13 applied the Patch Time Series Transformer (PatchTST), which delivered superior accuracy over both classical persistence methods and other Transformer baselines. Liang et al.14 applied the Inverted Transformer (iTransformer) model to distributed PV power prediction, enabling the model to capture correlations among limited meteorological features and fully exploit the available data.
Moreover, Online Kernel Extreme Learning Machine (OKELM) provides an alternative lightweight predictor with online updating ability, and is often used in streaming forecasting settings15.
However, individual machine learning models have limitations, prompting research into hybrid models that combine data decomposition and machine learning to address data non-stationarity and boost accuracy16,17. Commonly used decomposition methods include Empirical Mode Decomposition (EMD)18, Ensemble EMD (EEMD)19, and Complete Ensemble EMD with Adaptive Noise (CEEMDAN)20, though mode mixing issues remain. Moreover, a single decomposition stage is often insufficient for highly non-stationary PV power series. After an initial decomposition, the high-frequency components may still exhibit strong volatility and residual mode mixing, which degrades downstream forecasting performance. To further enhance component stationarity and separability, recent studies have adopted secondary decomposition frameworks. For example, Zhang et al.21 employed EMD to denoise the signal and further decomposed the residual signal using Variational Mode Decomposition (VMD) to minimize mode aliasing and improve accuracy. In another study, Zhang et al.22 combined VMD with CEEMDAN in a secondary decomposition of the original signal to reduce signal volatility and simplify the feature mapping of the PV data. Furthermore, Liu et al.23 selected the two variables with the highest correlation and decomposed them using VMD, CEEMD, and Singular Spectrum Analysis (SSA) to extract more diverse and informative features. In this context, VMD constrains the bandwidth of each mode and can refine the sub-series obtained from the first-stage decomposition; however, its performance depends on key parameters that are typically tuned manually.
Therefore, metaheuristic algorithms such as Northern Goshawk Optimization (NGO) and Grey Wolf Optimizer (GWO) have been used for parameter optimization, and this study adopts the Black-Winged Kite Algorithm (BKA) to optimize key VMD parameters, alleviating over- and under-decomposition and further reducing mode mixing and endpoint effects24,25,26,27,28.
Despite these advances, challenges persist: most models use only a single decomposition method, leading to unresolved mode mixing and frequency overlap, and dual-decomposition models often lack parameter optimization. Moreover, many decomposition-based forecasting frameworks apply a single predictor uniformly to all decomposed components, rather than designing component-specific models to match heterogeneous characteristics, which limits the effective exploitation of decomposition. PV power data remain highly nonlinear, non-stationary, and coupled, with noise and sparsity making feature extraction difficult. Thus, single data processing methods struggle to achieve ideal forecasting results.
To improve the accuracy of ultra-short-term PV power forecasting and enhance the stability of the power grid, this paper proposes a novel hybrid forecasting model, termed CEEMDAN-BKA-VMD-OKELM-CNN-ESN, which integrates multistage signal decomposition and advanced deep learning techniques. The primary contributions of this study are as follows:
(1) In the VMD algorithm, the number of modes k and the penalty factor \(\alpha\) strongly influence the decomposition results. To avoid the large errors caused by setting these parameters manually, this study uses BKA to optimize the VMD hyperparameters. This reduces the complexity of modeling and improves the overall efficiency of the process.
(2) The secondary decomposition strategy combines two advanced decomposition algorithms, CEEMDAN and BKA-VMD, to process the PV power generation sequences. Based on the high-frequency subsequence reconstructed by CEEMDAN decomposition, the secondary decomposition is carried out by BKA-VMD. The method effectively reduces the non-stationarity of the data and comprehensively extracts the intrinsic features of the data.
(3) To overcome the limitations of a single model in capturing the historical data characteristics of PV power, this study introduces a dual-branch modeling structure. The high-frequency components are predicted using an OKELM, while the low-frequency components are processed using a CNN-ESN model, where the CNN extracts spatial features and the ESN captures temporal dynamics. This design maximizes the representation of different data characteristics.
(4) A hybrid ultra-short-term PV power forecasting method combining a secondary decomposition strategy and deep learning integration is proposed, which improves forecasting accuracy and robustness, and its effectiveness and generalization are validated across different geographic locations and seasons.
The structure of this paper is as follows: Section 2 describes the theory and structure of the CEEMDAN-BKA-VMD-OKELM-CNN-ESN forecasting method. Section 3 introduces the experimental dataset and model’s indicators. Section 4 presents the experimental process and the results of data analysis in detail and compares them with other models. Section 5 provides the conclusion.
EMD decomposes nonlinear, non-stationary signals into IMFs, but often suffers from mode aliasing. CEEMDAN addresses this by adding adaptive noise, reducing reconstruction error. The main steps are:
(1) White noise \(\omega^i\) (\(i = 1, 2, \ldots, I\)) is added to the original signal x, generating I noisy sequences. Each sequence is decomposed by EMD to obtain \(IMF_1^1, IMF_1^2, \ldots, IMF_1^I\). The first IMF is then calculated as their average:
$$\widetilde{IMF}_1 = \frac{1}{I}\sum_{i=1}^{I} IMF_1^i$$
Update the corresponding residual value to
$$r_1 = x - \widetilde{IMF}_1$$
(2) Starting from the residual signal \(r_1\), add white noise again and complete the EMD decomposition to find the new \(\widetilde{IMF}_k\), and update the residual. The process is iterated until all modes are fully extracted.
(3) After completing all the layers of decomposition, the original signal x is decomposed into multiple \(\widetilde{IMF}_k\) and a final residual signal, as follows:
$$x = \sum_{k=1}^{K} \widetilde{IMF}_k + r_K$$
where K is the number of levels of decomposition and \(r_K\) is the final residual.
(4)During the exploration phase, fishermen initially focus on independent searches, using group encirclement as a supplementary search method. As the search progresses, the environmental advantage gradually shifts to the fishermen, and fishermen rely primarily on group encirclement while using individual advantages as a supplement. In this model, we use (alpha) to represent the capture rate parameter.
Where EFs is the current number of evaluations, and MaxEFs is the maximum number of evaluations.
The BKA is a simple and effective meta-heuristic optimization algorithm, which is divided into migration and attack phases. The process is as follows:
(1) Initialization phase: A set of random solutions is created, and the position of each black-winged kite (BK) is represented as a matrix.
(2) Attacking behavior: A mathematical model for the attack behavior of BK is shown:
where \(y_t^{i,j}\) and \(y_{t+1}^{i,j}\) represent the position of the \(i^{th}\) BK in the \(j^{th}\) dimension at iteration steps t and \(t+1\), respectively. r is a random number between 0 and 1, p is a constant equal to 0.9, T is the total number of iterations, and t is the number of iterations completed so far.
(3) Migration behavior: A mathematical model for the migration behavior of BK is expressed in
where \(L_t^{i,j}\) represents the leading scorer of the black-winged kites in the \(j^{th}\) dimension up to iteration t. \(F_i\) represents the fitness value of the current position in the \(j^{th}\) dimension obtained by any BK in iteration t. \(F_{ri}\) represents the fitness value of the random position in the \(j^{th}\) dimension obtained from any BK in iteration t. C(0, 1) represents the Cauchy mutation.
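The attack and migration phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the update rules follow the commonly cited BKA formulation (attack step scaled by \(n = 0.05\,e^{-2(t/T)^2}\), migration step driven by a Cauchy draw and \(m = 2\sin(r + \pi/2)\)), and the greedy acceptance, the bound clipping, and the function name `bka_minimize` are assumptions made for illustration.

```python
import numpy as np

def bka_minimize(fitness, lower, upper, pop_size=10, max_iter=30, p=0.9, seed=0):
    """Illustrative Black-winged Kite Algorithm (assumed standard form), minimizing `fitness`."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Initialization: random positions inside the search bounds.
    pos = lower + rng.random((pop_size, dim)) * (upper - lower)
    fit = np.array([fitness(x) for x in pos])
    best = int(np.argmin(fit))
    best_pos, best_fit = pos[best].copy(), float(fit[best])

    for t in range(1, max_iter + 1):
        n = 0.05 * np.exp(-2.0 * (t / max_iter) ** 2)  # shrinking attack step
        for i in range(pop_size):
            # Attack phase: two alternative moves selected by comparing p with r.
            r = rng.random()
            if p < r:
                cand = pos[i] + n * (1.0 + np.sin(r)) * pos[i]
            else:
                cand = pos[i] + n * (2.0 * r - 1.0) * pos[i]
            cand = np.clip(cand, lower, upper)
            f = fitness(cand)
            if f < fit[i]:  # greedy acceptance (assumption)
                pos[i], fit[i] = cand, f

            # Migration phase: move relative to the leader with a Cauchy mutation.
            r = rng.random()
            m = 2.0 * np.sin(r + np.pi / 2.0)
            j = rng.integers(pop_size)        # random kite providing F_ri
            cauchy = rng.standard_cauchy(dim)
            if fit[i] < fit[j]:
                cand = pos[i] + cauchy * (pos[i] - best_pos)
            else:
                cand = pos[i] + cauchy * (best_pos - m * pos[i])
            cand = np.clip(cand, lower, upper)
            f = fitness(cand)
            if f < fit[i]:
                pos[i], fit[i] = cand, f

            if fit[i] < best_fit:
                best_pos, best_fit = pos[i].copy(), float(fit[i])
    return best_pos, best_fit
```

For VMD tuning, the two search dimensions would be k (rounded to an integer inside the fitness function) and \(\alpha\); the sketch itself is agnostic to what the fitness function evaluates.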
VMD is an adaptive, non-recursive method for decomposing signals into smooth, multi-scale components. It overcomes endpoint effects and mode aliasing seen in EMD, offering a stronger mathematical foundation. By solving a variational problem, VMD effectively handles complex, non-stationary signals.
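Firstly, the variational problem is constructed by assuming that the original signal f is decomposed into K components. Each component should be a modal component with finite bandwidth centered around a specific frequency. Additionally, the sum of the estimated bandwidths of the modes should be minimized. The constraint is that the sum of all modes must equal the original signal. Consequently, the VMD constrained variational model is as follows:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = f$$
where \(\{u_k\} = \{u_1, u_2, \ldots, u_K\}\) are the mode functions, and \(\{\omega_k\} = \{\omega_1, \omega_2, \ldots, \omega_K\}\) are the center frequencies of the modes.
To solve the constrained optimization problem, it is necessary to transform the constrained variational problem into an unconstrained one. By utilizing the quadratic penalty term and the Lagrange multiplier, the above equation is transformed into:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$
where \(\alpha\) is the penalty parameter, and \(\lambda\) is the Lagrange multiplier.
For all \(\omega \ge 0\), update the mode spectrum \(\hat{u}_k\):
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \ne k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2}$$
Update the center frequency \(\omega_k\):
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}$$
For all \(\omega \ge 0\), update the multiplier by dual ascent:
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \gamma \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right)$$
where \(\gamma\) denotes the noise tolerance limit.
Repeat the above steps until the iteration stopping criterion is satisfied:
$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon$$
where \(\varepsilon\) is the convergence tolerance.
The construction of this constrained variational model allows VMD to deal effectively with complex non-stationary signals.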
Typically, the selection of VMD parameters relies on practical experience, and the choices of the number of decomposition layers k and the penalty factor \(\alpha\) significantly affect the decomposition results29. The value of k directly determines the number of decomposed modal components. An inappropriate k leads to over- or under-decomposition: a large k produces false modes, while a small k fails to extract the hidden features of the time series effectively. The value of \(\alpha\) affects the bandwidth of the modal components. A small \(\alpha\) leads to mode mixing, hindering feature extraction, while a large \(\alpha\) causes loss of local information. Since these parameter selections rely heavily on subjective judgment, it is crucial to optimize the parameters of VMD. The mechanism of the proposed BKA-VMD is shown in Fig. 1.
Fig. 1. The mechanism of the proposed BKA-VMD.
The parameters of VMD are optimized using the BKA algorithm. For nonlinear and complex signals, multiscale permutation entropy (MPE) is used as the fitness function due to its superior stability and noise resistance. The expression is as follows:
where the value of \(H_p(M)\) indicates the extent of unpredictability and intricacy in the time series and m is the embedding dimension.
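To make the fitness measure concrete, the following is a minimal sketch of multiscale permutation entropy under common conventions: the series is coarse-grained by averaging non-overlapping windows at each scale, and the permutation entropy of each coarse-grained series is normalized by \(\ln(m!)\). The function names, the delay parameter, and the default scales are assumptions; how the per-scale values are aggregated into a single fitness score is not specified here and is left to the caller.

```python
import math
import numpy as np

def permutation_entropy(x, m=3, delay=1):
    """Normalized permutation entropy in [0, 1] with embedding dimension m."""
    x = np.asarray(x, dtype=float)
    n = x.size - (m - 1) * delay
    if n <= 0:
        raise ValueError("series too short for the chosen m and delay")
    # Each row of idx selects one delay vector of length m.
    idx = np.arange(m) * delay + np.arange(n)[:, None]
    # The ordinal pattern of a delay vector is the permutation that sorts it.
    patterns = np.argsort(x[idx], axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log(probs)) / math.log(math.factorial(m)))

def multiscale_permutation_entropy(x, m=3, delay=1, scales=(1, 2, 3)):
    """Permutation entropy of the coarse-grained series at each scale."""
    x = np.asarray(x, dtype=float)
    values = []
    for s in scales:
        n_windows = x.size // s
        # Coarse-graining: mean of consecutive non-overlapping windows of length s.
        coarse = x[: n_windows * s].reshape(n_windows, s).mean(axis=1)
        values.append(permutation_entropy(coarse, m, delay))
    return np.array(values)
```

A perfectly monotonic series yields a value of 0, while white noise yields values close to 1, which matches the interpretation of the measure as an indicator of unpredictability.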
Traditional neural network models are typically trained using predefined training samples. However, as new samples are continuously added, the forecasting error tends to increase with conventional models. To address the issue of model updates during the process of sample accumulation and enhance forecasting accuracy, the OKELM algorithm is employed for predicting high-frequency signals15. The steps involved in the OKELM modeling process are as follows:
(1)Calculate the initial one using (t_{N+1}’) samples ({(x_i, t_j)}^{N+l-1}) at the current moment,
Here, (W_N) is a ((l-1) times (l-1))-dimensional square matrix, and (q_N) is a constant.
(2)Using (X_{N+1}) as the input, the predicted value (t_{N+1}’) for the corresponding output is computed based on Equation (23).
(3) Once the true value of \(t_{N+1}\) is obtained, the model’s forecasting error \(e = |t_N - t_N'|\) for this sample is calculated. If Equation (23) is satisfied, the pair is updated according to Equation (24) to obtain \(D_{N+1}\); otherwise, \(D_{N+1} = D_N\).
(4)Using the updated (D_{N+1}), (lambda _{N+1}) is computed based on Equation (24). The old samples ((x_N, t_N)) are removed, and (W_{N+1}^{-1}) is calculated from (D_{N+1}) according to Equation (26). (varepsilon _{N+1}) is then computed from (D_{N+1}), and (lambda _{N+1}) is recalculated according to Equation (21).
(5)Let (N = N+1), return to step (2).
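The predict-then-update loop above can be sketched in simplified form. The version below is a kernel ELM with an RBF kernel that keeps a sliding window of recent samples and re-solves the regularized system \(\beta = (K + I/C)^{-1} T\) after each new sample. This is a deliberately plain substitute for the incremental matrix updates of OKELM, not the authors' algorithm; the class name, the window length, and the kernel form \(\exp(-\gamma\|a-b\|^2)\) are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2) (assumed kernel convention)."""
    d2 = (np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

class SlidingWindowKELM:
    """Kernel ELM refitted on the most recent `window` samples (illustrative)."""

    def __init__(self, window=50, C=2.0 ** 10, gamma=1.0):
        self.window, self.C, self.gamma = window, C, gamma
        self.X = self.T = self.beta = None

    def _refit(self):
        K = rbf_kernel(self.X, self.X, self.gamma)
        # Regularized least squares: (K + I/C) beta = T
        self.beta = np.linalg.solve(K + np.eye(len(self.X)) / self.C, self.T)

    def fit(self, X, T):
        self.X = np.asarray(X, dtype=float)[-self.window:]
        self.T = np.asarray(T, dtype=float)[-self.window:]
        self._refit()
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return rbf_kernel(X, self.X, self.gamma) @ self.beta

    def update(self, x_new, t_new):
        # Append the newest sample, drop the oldest if the window is full, refit.
        self.X = np.vstack([self.X, x_new])[-self.window:]
        self.T = np.append(self.T, t_new)[-self.window:]
        self._refit()
```

In use, `predict` would be called on the newest input, and `update` would be called once the true value arrives, mirroring steps (2) to (5). Refitting from scratch costs more than a recursive update but keeps the sketch short and easy to verify.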
Fig. 2. Structure of CNN.
CNN is a deep learning model that excels at processing large amounts of data with automatic hierarchical feature extraction30. By sharing convolutional kernels, the model effectively reduces the number of parameters, thereby mitigating the risk of overfitting and enhancing computational efficiency. CNNs are widely employed for classification and regression tasks and typically comprise an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer, as shown in Fig. 2.
ESN is a type of recurrent neural network (RNN) proposed by Jaeger. It features a large, fixed, and sparsely connected reservoir with dynamic memory capability, as shown in Fig. 3. Unlike traditional RNNs, ESNs avoid the complexity of backpropagation through time (BPTT) by training only the output layer weights, which significantly reduces the training time while retaining the network’s temporal modeling ability31.
Fig. 3. Structure of ESN.
Given an input sequence u(t), the reservoir state \(x(t) \in \mathbb{R}^{N_r}\) is updated according to:
$$x(t) = f\left(W_{in}\, u(t) + W\, x(t-1) + W_{fb}\, y(t-1)\right)$$
where \(W_{in}\) is the input weight matrix, W is the reservoir (internal) weight matrix, \(W_{fb}\) is the optional feedback weight matrix, and f is the reservoir activation function (typically tanh).
The output y(t) is then computed as:
$$y(t) = W_{out}\,[u(t), x(t)]$$
where \(W_{out}\) is the learned output weight matrix, and [u(t), x(t)] denotes the concatenation of input and reservoir state vectors.
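As an illustration of the reservoir update and readout described above, the following sketch implements a basic ESN without the optional feedback term. The reservoir is fixed after random initialization and rescaled to a chosen spectral radius, and only the readout is trained, here by ridge regression. The class name, the weight ranges, the sparsity level, the washout length, and the ridge coefficient are assumptions chosen for the example.

```python
import numpy as np

class EchoStateNetwork:
    """Minimal ESN: fixed random reservoir, ridge-regression readout (illustrative)."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.8,
                 sparsity=0.9, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        W[rng.random(W.shape) < sparsity] = 0.0  # sparse connectivity
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.ridge = ridge
        self.W_out = None

    def _states(self, U):
        # Run the reservoir from a zero state and collect [u(t), x(t)] per step.
        x = np.zeros(self.W.shape[0])
        rows = []
        for u in np.asarray(U, dtype=float):
            x = np.tanh(self.W_in @ u + self.W @ x)
            rows.append(np.concatenate([u, x]))
        return np.array(rows)

    def fit(self, U, Y, washout=20):
        # Discard the initial transient, then solve the ridge-regression readout.
        S = self._states(U)[washout:]
        Yw = np.asarray(Y, dtype=float)[washout:]
        A = S.T @ S + self.ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ Yw).T
        return self

    def predict(self, U):
        return self._states(U) @ self.W_out.T
```

Because the recurrent weights are never updated, training reduces to a single linear solve, which is the source of the short training time noted above.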
The CNN-ESN hybrid model combines the powerful feature extraction capabilities of CNN with the dynamic temporal modeling strength of ESN. In this structure, CNN is employed as a spatial encoder to capture localized patterns or short-term dependencies in the input data, while ESN is used to learn the temporal dependencies from the extracted features.
Given a time-series input \(X = [x_1, x_2, \ldots, x_T]\), a one-dimensional convolutional layer with multiple kernels is first applied to extract high-level local features from the input sequence. Let \(F_t \in \mathbb{R}^{C \times L}\) denote the feature maps at time t, where C is the number of channels (filters) and L is the length of the feature vector after convolution. These features are then fed into the reservoir of the ESN, which evolves over time.
This hybrid architecture leverages CNN’s ability to extract translation-invariant spatial representations and ESN’s efficiency in temporal sequence modeling, resulting in improved accuracy and generalization in time-series forecasting tasks32.
PV power generation exhibits strong nonlinearity and variability due to fluctuating environmental conditions, making it challenging for traditional single methods to achieve high-precision ultra-short-term forecasting. To address this issue, we propose a CEEMDAN-BKA-VMD-OKELM-CNN-ESN forecasting method for ultra-short-term PV power that systematically decomposes and reconstructs the PV power sequence, enhancing its predictability. The model consists of four key stages. First, we extract PV power generation data from a PV power plant located in Ningxia during the spring of 2017, providing a real-world dataset for analysis. Second, the PV power sequence is divided into training and test sets and subsequently decomposed using the CEEMDAN algorithm. Then, sample entropy and K-means clustering are used to divide the IMFs into high-frequency and low-frequency components. The high-frequency components are further decomposed using the BKA-VMD method, while the low-frequency components are retained as trend signals. Third, OKELM is applied to predict high-frequency signals, and CNN-ESN is employed to capture and predict the temporal trends from the low-frequency part. For high-frequency modes, the signal is highly non-stationary with rapid oscillations and possible distribution drift. Kernel-based ELM in OKELM provides strong nonlinear approximation with fast training, while its online updating mechanism can promptly adapt the model to newly arriving samples. This makes OKELM more suitable for tracking local, fast-varying patterns in high-frequency components than batch-trained models. We adopt OKELM to model the high-frequency components in PV power forecasting. Low-frequency modes mainly describe the underlying trend and long-range temporal structure. We therefore employ CNN-ESN for these components: the CNN extracts trend-related multi-scale patterns from sliding windows, while the ESN reservoir provides dynamic memory to integrate these features over longer horizons. 
As only the readout layer is trained, CNN-ESN can learn smooth trend evolution efficiently and stably33. Finally, the forecasting results from both branches are additively integrated to obtain the final PV power forecast, which is compared with other methods to validate the model’s effectiveness. By leveraging advanced decomposition and hybrid modeling techniques, the proposed approach improves the adaptability and prediction accuracy of ultra-short-term PV power forecasting. The structure of the CEEMDAN-BKA-VMD-OKELM-CNN-ESN PV power forecasting method proposed in this paper is shown in Fig. 4 and comprises the following four stages.
(1) Extract PV power generation data from a PV power plant located in Ningxia during the spring of 2017.
(2) After dividing the PV power sequence into training and test sets, apply CEEMDAN decomposition to obtain IMFs.
(3) Compute the sample entropy of each IMF and use K-means clustering to divide them into high-frequency and low-frequency components. The high-frequency IMFs are further decomposed using the BKA-VMD method and predicted using OKELM. The low-frequency IMFs are directly modeled using the CNN-ESN network.
(4) Integrate the forecasting results of high- and low-frequency components to generate the final ultra-short-term PV power forecast. The performance of the proposed method is evaluated by comparing it with other benchmark models, highlighting its improved adaptability and forecasting accuracy.
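Stage (3) above, grouping IMFs by complexity, can be sketched as follows. The sample entropy uses the common defaults \(m = 2\) and tolerance \(r = 0.2\) times the standard deviation, and the clustering is a plain one-dimensional K-means on the entropy values; both functions, their names, and these defaults are illustrative assumptions rather than the settings of the paper.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with Chebyshev distance and tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n = x.size

    def count_matches(length):
        # The same n - m template start points are used for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        return np.sum(dist <= r) - len(templates)  # exclude self-matches

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return float(-np.log(a / b))

def kmeans_1d(values, k=3, n_iter=100):
    """Plain K-means on scalar values; label 0 is the lowest-valued cluster."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))  # ordered initial centers
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            values[labels == c].mean() if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Given a list of IMFs, one would compute `sample_entropy` for each, cluster the resulting values with `kmeans_1d`, and sum the IMFs sharing a label to form each reconstructed component; the cluster with the highest entropy would correspond to the high-frequency component passed on to secondary decomposition.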
Fig. 4. The overall flow of the photovoltaic power forecasting.
To verify the effectiveness and generalization ability of the proposed CEEMDAN-BKA-VMD-OKELM-CNN-ESN method, two PV power datasets are investigated in this study. Both datasets are collected from grid-connected PV plants and are employed for feature correlation analysis, feature selection, and forecasting performance evaluation.
The first dataset is obtained from a PV power plant located in Ningxia, China, during the spring of 2017. The installed capacity of the plant is 150 kW. The sampling interval is 15 min, and the dataset features contain: component temperature, ambient temperature, air pressure, humidity, total radiation (horizontal), direct radiation, diffuse radiation, and PV power.
The second dataset is collected from a PV plant at the Desert Knowledge Australia Solar Centre (DKASC, https://dkasolarcentre.com.au/downloadlocation=alice-springs) from January 1, 2022, to December 31, 2022. The PV plant system is based on Trina Solar, and its array power rating is 23.4 kW. The sampling interval is 5 min. The dataset features contain: time index, PV active power, relative humidity, weather temperature, global horizontal irradiance, diffuse horizontal irradiance, wind direction, daily rainfall, wind speed, tilted-plane diffuse irradiance, and tilted-plane global irradiance.
Factors such as instantaneous failure of PV modules and manual recording bias can easily lead to missing records or deviations from the actual values. Using abnormal data directly will impair the convergence of the PV power forecasting model. Therefore, the raw data are first examined to identify erroneous records, and missing values are imputed using mean filling. Additionally, to avoid the adverse effects caused by differences in magnitude and by outlier samples, the data are normalized so that each feature is mapped to the range [0, 1]. The normalization and inverse normalization formulas are, respectively:
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$
$$x_i = x_i'\,(x_{\max} - x_{\min}) + x_{\min}$$
where \(x_i\) is the raw data, \(x_i'\) is the normalized data, \(x_{\max}\) is the maximum value of the variable, and \(x_{\min}\) is the minimum value of the variable.
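The preprocessing just described (mean filling followed by min-max scaling and its inverse) can be sketched as below. The function names are hypothetical, and computing the minimum and maximum on the training portion only is a suggested practice rather than something stated in the text.

```python
import numpy as np

def fill_missing_with_mean(x):
    """Replace NaN entries with the mean of the observed values."""
    x = np.asarray(x, dtype=float).copy()
    x[np.isnan(x)] = np.nanmean(x)
    return x

def minmax_fit(x):
    """Return (x_min, x_max); ideally computed on the training data only."""
    x = np.asarray(x, dtype=float)
    return float(x.min()), float(x.max())

def minmax_transform(x, x_min, x_max):
    # Assumes x_max > x_min; a constant feature would need special handling.
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)

def minmax_inverse(x_norm, x_min, x_max):
    return np.asarray(x_norm, dtype=float) * (x_max - x_min) + x_min
```

The inverse transform is what maps the model outputs back to physical power units before the error metrics are computed.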
The PV power features of this PV plant contain three types of solar radiation features: diffuse radiation, direct radiation, and total horizontal radiation. In PV power forecasting, high correlation between features causes multicollinearity, which leads to inaccurate estimation of the model regression coefficients and diminishes both the explanatory and predictive performance of the model. To address this issue, linear and nonlinear correlations between features are first analyzed using the Pearson, Spearman and Kendall correlation coefficients34, denoted r, \(\rho\) and \(\tau\) respectively and defined as follows:
$$r = \frac{\sum_{i=1}^{N}(u_i - \overline{u})(v_i - \overline{v})}{\sqrt{\sum_{i=1}^{N}(u_i - \overline{u})^2}\,\sqrt{\sum_{i=1}^{N}(v_i - \overline{v})^2}}$$
$$\rho = 1 - \frac{6\sum_{i=1}^{N} d_i^2}{N(N^2 - 1)}$$
$$\tau = \frac{C - D}{\frac{1}{2}N(N - 1)}$$
where \(u_i\) and \(v_i\) are the \(i^{th}\) values of the two variables; \(\overline{u}\) and \(\overline{v}\) are the means of the two variables; \(d_i\) is the rank difference of the \(i^{th}\) values of the two variables, i.e., the difference between their positions in numerical order; N is the number of data points; C is the number of sample pairs with consistent order; and D is the number of sample pairs with inconsistent order.
All three correlation coefficients lie in \([-1, 1]\); a positive value indicates positive correlation and a negative value indicates negative correlation. The larger the absolute value, the stronger the correlation. The results of correlation analysis between each feature factor and PV power are presented in Table 1.
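For reference, the three coefficients can be computed directly from their textbook definitions, as in the sketch below. The Spearman version uses the rank-difference formula and therefore assumes there are no tied values, and the Kendall version counts concordant and discordant pairs explicitly; the function names are illustrative.

```python
import numpy as np

def pearson(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    du, dv = u - u.mean(), v - v.mean()
    return float(np.sum(du * dv) / np.sqrt(np.sum(du ** 2) * np.sum(dv ** 2)))

def ranks(x):
    """Ordinal ranks 1..N (assumes no tied values)."""
    order = np.argsort(x, kind="stable")
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    return r

def spearman(u, v):
    d = ranks(u) - ranks(v)
    n = len(d)
    return float(1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1)))

def kendall(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    n = len(u)
    concordant = discordant = 0
    for i in range(n - 1):
        # Sign of the product tells whether each pair (i, j > i) is ordered consistently.
        s = np.sign(u[i + 1:] - u[i]) * np.sign(v[i + 1:] - v[i])
        concordant += int(np.sum(s > 0))
        discordant += int(np.sum(s < 0))
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A monotonic but nonlinear relation, such as \(v = u^3\), gives Spearman and Kendall values of 1 while the Pearson value stays below 1, which is why the rank-based coefficients complement the linear one.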
The specific hyperparameter settings employed in our proposed hybrid framework are detailed in Table 2. These parameters were carefully selected to optimize the performance of the four core modules. For the initial decomposition via CEEMDAN, a noise ratio (\(\epsilon\)) of 0.2 and an ensemble size of 500 were used to ensure stable mode extraction. In the secondary decomposition stage, BKA-VMD was configured with a population size of 10 and 30 iterations to efficiently search for the optimal k within [1, 10] and \(\alpha\) within [1000, 8000]. Regarding the prediction models, the CNN-ESN, tasked with low-frequency component modeling, utilized 32 filters and a spectral radius of 0.8 to capture temporal dependencies, while the OKELM for high-frequency components adopted an RBF kernel with a penalty coefficient C of \(2^{10}\) and a kernel parameter \(\mu\) of \(2^{-4}\) to enhance generalization capability.
The experiments were conducted on a Windows 11 system equipped with an AMD Ryzen 7 5800H processor, 16 GB of RAM, and an NVIDIA RTX 3060 GPU. The proposed model was implemented using the Keras framework in a Python 3.7 environment. Under this hardware configuration, the average runtime for a single prediction process was approximately 143 seconds.
To comprehensively and reliably evaluate the forecasting performance of the proposed forecasting model, mean square error (MSE), RMSE, MAE, MAPE and coefficient of determination (\(R^2\)) are used as evaluation indexes. The computational equations for these metrics are as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \overline{y})^2}$$
where \(y_i\) denotes the original PV power data, \(\overline{y}\) denotes the average value of the PV power data, \(\hat{y}_i\) denotes the predicted value of the model, and n is the number of samples.
MSE, RMSE, MAE, and MAPE are metrics where smaller values indicate lower forecasting errors, smaller deviations from true values, and thus higher forecasting accuracy. The \(R^2\) is a metric used to evaluate the goodness of fit of a model, where values closer to 1 indicate stronger explanatory power and higher forecasting accuracy. A comprehensive analysis of these metrics provides a thorough evaluation of the model’s predictive capability and performance.
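The five metrics can be computed with a few lines of NumPy, as in the sketch below. One detail is an assumption on our part: since PV power is zero at night and MAPE is undefined for zero targets, the sketch averages the percentage error over non-zero targets only. The function name is illustrative.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (in %) and R^2 for a pair of series."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y - y_hat
    mse = float(np.mean(err ** 2))
    nonzero = y != 0  # assumption: skip zero targets when computing MAPE
    mape = float(np.mean(np.abs(err[nonzero] / y[nonzero])) * 100.0)
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": mape,
        "R2": r2,
    }
```

Multiplying the returned `R2` by 100 gives the percentage form in which the coefficient of determination is reported in the abstract.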
Before decomposition, min-max normalization is applied to the photovoltaic power series. Since CEEMDAN adaptively determines the number of intrinsic mode components from the series itself, the original data is decomposed into a set of subsequences, with the results shown in Fig. 5.
Fig. 5. CEEMDAN results of photovoltaic power.
The first row of the figure shows the original sequence, and the second to thirteenth rows show the IMF subsequences obtained after CEEMDAN decomposition. As can be seen in Fig. 5, the complexity and randomness of the decomposed sequences are lower than those of the original sequence, and the degree of fluctuation decreases from one subsequence to the next, so the sequences gradually become more stable. To analyze the complexity of the decomposed sequences quantitatively, this paper uses sample entropy.
From the sample entropy values of the decomposed subsequences, it can be observed that the complexity of the subsequences decreases gradually after CEEMDAN decomposition. The degree of frequency fluctuation and randomness also diminishes progressively, and the complexities of some sequences are similar to one another.
K-means clustering was applied to the sample entropy values to group the decomposed subsequences, which were then reconstructed into a high-frequency sequence Co-IMF1 and low-frequency sequences Co-IMF2 and Co-IMF3, as shown in Fig. 6.
Fig. 6. K-means clustering results for multiple IMF components.
The reconstructed high-frequency sequence was subjected to secondary decomposition to extract finer features. We first applied standard VMD with fixed parameters and then the BKA-VMD method, in which the key parameters are adaptively optimized by the BKA algorithm. The two sets of decomposition results are compared in Fig. 7, where BKA-VMD shows advantages over standard VMD in several respects.
Regarding boundary effects, the components extracted by BKA-VMD fluctuate more smoothly at the edges, mitigating the end effects seen in VMD. BKA-VMD also achieves higher modal purity. Unlike standard VMD, which often suffers from under-decomposition or over-decomposition due to improper parameter selection, the proposed method yields a more concentrated frequency distribution and less spectral overlap. Finally, BKA-VMD separates high-frequency noise from low-frequency trends more cleanly, allowing a more accurate capture of intrinsic signal changes. These improvements contribute directly to the forecasting accuracy of the overall model.
Decomposition results. (a) VMD; (b) BKA-VMD.
The PV power predicted by each comparison model and by the proposed model is plotted against the actual values in Fig. 8. For each model, the run with the best training performance is recorded, and the results are listed in Table 3 and Fig. 9.
Forecasting results of model.
Radar chart comparison of evaluation metrics for different forecasting models.
The results reveal a clear performance ladder across the competing models, with the proposed CEEMDAN-BKA-VMD-OKELM-CNN-ESN ranking first on all metrics, indicating the most faithful reconstruction of PV power dynamics. Among the deep baselines, iTransformer is the strongest, yet the proposed model further tightens the fit, markedly lowering the typical error level and nearly halving the relative deviation. The much larger drop in MSE, an absolute reduction of 8.2896 (from 12.0104 to 3.7208), suggests that the model is particularly effective at suppressing occasional large misses, such as ramps or sharp fluctuations, rather than only improving average cases. In contrast, the LSTM and GRU models exhibit the lowest accuracy among the baselines, with RMSE values of 6.3152 and 6.3685 and MAPE values of 33.9420% and 34.7936%, respectively, highlighting their limited robustness to the nonstationary, multi-scale characteristics of the series. Convolution- and attention-based models progressively improve but still leave a notable gap to the proposed framework. Overall, the across-the-board reductions in RMSE, MAE, and MAPE, together with the highest (R^2), support the conclusion that the decomposition-plus-specialized-learning strategy delivers more stable and accurate forecasts than single-architecture baselines.
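The five metrics used throughout the comparison can be computed as in the sketch below, which follows their textbook definitions. Two choices are assumptions of this example, not details from the paper: samples with a zero actual value are skipped when computing MAPE, and R^2 is returned as a fraction rather than a percentage. The input values are illustrative.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, MAPE (in %) and R^2 (as a fraction)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined where the actual value is zero (e.g. night-time output),
    # so those samples are skipped; this filtering rule is an assumption.
    nonzero = [(a, e) for a, e in zip(actual, errors) if a != 0]
    if nonzero:
        mape = 100.0 * sum(abs(e / a) for a, e in nonzero) / len(nonzero)
    else:
        mape = float("nan")
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Illustrative values only.
metrics = forecast_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

Because MSE squares each error, a few large misses raise it far more than they raise MAE, which is why MSE is the most sensitive of these metrics to ramp events.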
The comprehensive comparison against a spectrum of competitive baselines reveals that, while these baseline models represent the forefront of time-series modeling, they share a fundamental limitation: they all attempt to model the raw, highly non-stationary PV generation series within a single, monolithic latent space.
In comparison with classical LSTM and GRU, the limitations of simple sequence modeling are evident. These models yield the highest errors, with RMSEs of 6.3152 and 6.3685, and MAPEs hovering around 34%. This poor performance stems from their struggle to handle the high-frequency volatility inherent in solar power. In contrast, our proposed method reduces the RMSE by approximately 69% compared to these baselines. This massive reduction confirms that relying solely on memory gates is insufficient for non-stationary data, whereas our decomposition-based approach effectively simplifies the input complexity.
Regarding the intermediate deep learning models such as TCN, standard Transformer, and N-BEATS, we observe a noticeable improvement over the LSTM and GRU baselines but a continued gap with our method. For instance, while N-BEATS achieves a respectable RMSE of 4.2976, it still lags significantly behind our model’s 1.9289. Similarly, the standard Transformer achieves an (R^2) of 98.12%, which, while high, remains below our 99.70%. These architectures outperform LSTM and GRU by using convolution or self-attention to capture longer dependencies, yet they still process the raw, noisy signal directly. Our results indicate that explicitly separating noise via BKA-VMD is a more effective feature engineering step than the internal feature extraction of these intermediate models.
Most critically, the comparison against the recent forecasting models PatchTST and iTransformer highlights the value of the hybrid strategy. PatchTST and iTransformer are widely used benchmarks in time-series forecasting owing to their patching and inverted attention mechanisms, and they achieve RMSEs of 3.9976 and 3.4656, respectively. However, the proposed framework still outperforms the best baseline (iTransformer), reducing the MSE from 12.0104 to 3.7208, a reduction of about 69% in mean squared error. This indicates that, on this dataset, even advanced end-to-end deep learning architectures do not match the precision of a divide-and-conquer system that assigns specialized learners (OKELM and CNN-ESN) to specific frequency components.
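The relative reductions quoted for the iTransformer comparison can be checked directly from the reported MSE and RMSE values. The short sketch below does that arithmetic; the helper name is illustrative.

```python
import math


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Values reported in the text for iTransformer and the proposed model.
mse_cut = relative_reduction(12.0104, 3.7208)
rmse_cut = relative_reduction(3.4656, 1.9289)

# RMSE is the square root of MSE, so the two reported pairs should agree.
rmse_from_mse_baseline = math.sqrt(12.0104)
rmse_from_mse_proposed = math.sqrt(3.7208)

# Squaring the remaining RMSE fraction gives the remaining MSE fraction,
# which is why the MSE reduction is larger than the RMSE reduction.
mse_cut_from_rmse = 100.0 * (1.0 - (1.0 - rmse_cut / 100.0) ** 2)
```

The MSE reduction comes out at roughly 69% and the RMSE reduction at roughly 44%; the two figures describe the same improvement on different scales.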
To verify the statistical significance of the above prediction results, we conducted a Diebold–Mariano (DM) test. In each test, the proposed model was designated as the first model and one of the comparison models as the second model. The results of the DM test are reported in Table 4.
For all three loss functions (MSE, MAE, and MAPE), the null hypothesis of equal predictive accuracy was rejected at the 5% significance level, and the DM test statistics were all negative. This indicates that the proposed model forecasts PV power significantly more accurately than the other models.
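A minimal sketch of the DM statistic is given below, assuming one-step-ahead forecasts so that the long-run variance reduces to the sample variance of the loss differential, and omitting any small-sample correction. The paper does not report these settings, and the forecast values are hypothetical. With the proposed model passed first, a negative statistic means its loss is lower.

```python
import math


def diebold_mariano(actual, pred_first, pred_second, loss=lambda e: e * e):
    """DM statistic and two-sided p-value; negative values favour the first model."""
    d = [loss(a - p1) - loss(a - p2)
         for a, p1, p2 in zip(actual, pred_first, pred_second)]
    n = len(d)
    mean_d = sum(d) / n
    # Horizon-1 assumption: no autocovariance terms in the variance estimate.
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical data: the first model has small errors, the second larger ones.
actual = [10.0, 12.0, 15.0, 14.0, 13.0, 16.0, 18.0, 17.0, 15.0, 14.0]
err_first = [0.1, -0.2, 0.1, 0.0, -0.1, 0.2, -0.1, 0.1, 0.0, -0.2]
err_second = [1.0, -0.8, 1.2, -1.1, 0.9, -1.3, 1.0, -0.7, 1.1, -0.9]
pred_first = [a + e for a, e in zip(actual, err_first)]
pred_second = [a + e for a, e in zip(actual, err_second)]

stat, p_value = diebold_mariano(actual, pred_first, pred_second)
```

Passing `loss=abs` gives the MAE-based variant of the test; a MAPE-based variant would need the actual values inside the loss and is not covered by this sketch.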
Forecasting performance of different model variants in the ablation study.
Radar chart comparison of evaluation metrics for different forecasting models in ablation study.
To assess the contribution of each component in the proposed CEEMDAN-BKA-VMD-OKELM-CNN-ESN framework, we conducted an ablation study. Its purpose is to isolate the effect of each module (CEEMDAN-based decomposition, BKA-driven parameter optimization, VMD-based secondary decomposition, and the dual learners OKELM and CNN-ESN) on the overall forecasting performance. This design enables us to verify the necessity and effectiveness of individual components in improving predictive accuracy and robustness. Specifically, we systematically remove or replace one module at a time from the full framework and evaluate the resulting variants on the same test set using MSE, RMSE, MAE, MAPE, and (R^2). The quantitative results are reported in Table 5, Fig. 10, and Fig. 11.
The limitations of standalone predictors are evident. The single OKELM and CNN-ESN models exhibit the highest errors, with RMSEs of 6.3902 and 6.2386, respectively. Their relatively low (R^2) values indicate that without signal preprocessing, these models struggle to learn the mapping between historical inputs and future outputs due to the superposition of noise and trends in the raw PV data.
The introduction of CEEMDAN marks the first tier of improvement. By decomposing the non-stationary series into IMFs, CEEMDAN reduces the complexity of the input features. Consequently, the CEEMDAN-OKELM model lowers the RMSE to 5.7010, and CEEMDAN-CNN-ESN further reduces it to 5.1090. While this represents a tangible gain, the improvement is capped because standard CEEMDAN may still leave residual high-frequency components that contain mode mixing, hindering precise prediction.
To address this, the secondary decomposition via VMD proves critical. As shown in Table 5, the CEEMDAN-VMD-OKELM model lowers the MSE to 22.6865 (from 32.5076 in the CEEMDAN-only version). This supports the view that further decomposing the volatile IMFs into band-limited sub-modes helps disentangle complex signals that a single decomposition stage cannot handle.
Furthermore, the optimization via BKA contributes substantially. By adaptively tuning the VMD parameters, the CEEMDAN-BKA-VMD-OKELM model reduces the MSE further to 14.9851. This improvement over the standard VMD variant indicates that BKA limits the loss of useful information caused by improper selection of the VMD parameters (k and α), so that the decomposed sub-modes are easier to forecast.
Finally, the complete framework achieves the best performance among all variants. By integrating the strengths of all modules, with OKELM handling high-frequency fluctuations and CNN-ESN handling low-frequency trends, the proposed model sharply reduces the error metrics: the MSE falls to 3.7208 and the MAPE to 5.9927%. This final gain indicates that the divide-and-conquer strategy, combined with adaptive optimization, outperforms every single-stage or unoptimized hybrid variant tested.
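The routing of sub-signals to different learners, followed by additive reconstruction, can be sketched as below. The two predictors are deliberately simple stand-ins (persistence and a moving average) and are not OKELM or CNN-ESN; the function names and sample components are hypothetical.

```python
def persistence(history):
    """Stand-in for the high-frequency learner: repeat the last observed value."""
    return history[-1]


def moving_average(history, window=3):
    """Stand-in for the low-frequency learner: mean of the most recent values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def hybrid_forecast(high_freq_parts, low_freq_parts, high_model, low_model):
    """Forecast each sub-signal with its assigned model and sum the results."""
    high = sum(high_model(part) for part in high_freq_parts)
    low = sum(low_model(part) for part in low_freq_parts)
    return high + low


# Hypothetical decomposed components (each list is one sub-signal's history).
high_parts = [[0.2, -0.1, 0.3], [-0.05, 0.05, -0.02]]
low_parts = [[10.0, 11.0, 12.0], [1.0, 1.0, 1.0]]
forecast = hybrid_forecast(high_parts, low_parts, persistence, moving_average)
```

Since the components sum to the original series, summing the per-component forecasts yields a forecast of the original series, and each ablation variant corresponds to changing which components or models enter this sum.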
Seasonal forecasting performance on the DKA dataset: (a) Spring, (b) Summer, (c) Autumn, and (d) Winter.
To further verify the generalization ability of the proposed model under different geographical locations and seasonal variations, we conducted a robustness experiment using the Australia dataset (year 2022). The test samples were grouped by season, namely Spring, Summer, Autumn, and Winter. The forecasting performance was evaluated using MSE, RMSE, MAE, MAPE, and (R^2).
As reported in Table 6 and Fig. 12, the proposed model performs consistently well across all seasons, with (R^2) remaining above 99.06%. Autumn yields the best overall accuracy, with an MSE of 0.1074, RMSE of 0.3278, MAE of 0.1708, MAPE of 0.0997%, and (R^2) of 99.5511%. Spring is also competitive, with an (R^2) of 99.4119%, while Summer and Winter maintain stable errors and high goodness-of-fit. These results indicate that the proposed model is robust to seasonal pattern shifts and generalizes well under varying environmental conditions.
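A per-season evaluation of this kind can be sketched as follows. The month-to-season mapping uses Southern Hemisphere meteorological seasons, which is an assumption here because the paper does not state how season boundaries were drawn; the samples are hypothetical.

```python
from datetime import datetime

# Southern Hemisphere meteorological seasons (assumed season boundaries).
SEASON_BY_MONTH = {
    12: "Summer", 1: "Summer", 2: "Summer",
    3: "Autumn", 4: "Autumn", 5: "Autumn",
    6: "Winter", 7: "Winter", 8: "Winter",
    9: "Spring", 10: "Spring", 11: "Spring",
}


def group_by_season(samples):
    """Group (timestamp, actual, predicted) samples by season name."""
    groups = {}
    for timestamp, actual, predicted in samples:
        season = SEASON_BY_MONTH[timestamp.month]
        groups.setdefault(season, []).append((actual, predicted))
    return groups


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return (sum((a - p) ** 2 for a, p in pairs) / len(pairs)) ** 0.5


# Hypothetical test samples from 2022.
samples = [
    (datetime(2022, 1, 15, 12, 0), 4.0, 4.2),
    (datetime(2022, 4, 15, 12, 0), 3.5, 3.4),
    (datetime(2022, 4, 16, 12, 0), 3.0, 3.2),
    (datetime(2022, 7, 15, 12, 0), 2.8, 2.5),
    (datetime(2022, 10, 15, 12, 0), 4.1, 4.0),
]
seasonal_rmse = {season: rmse(pairs) for season, pairs in group_by_season(samples).items()}
```

Reporting each metric per season, rather than pooled over the year, makes it visible whether accuracy degrades under a particular weather regime.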
Ultra-short-term PV power forecasting is crucial for improving PV dispatch strategies and ensuring the safe and stable operation of power-system equipment. This paper proposes a hybrid ultra-short-term PV power forecasting framework, CEEMDAN-BKA-VMD-OKELM-CNN-ESN. The original PV power series is decomposed sequentially by CEEMDAN and BKA-VMD into multi-frequency subsequences, which reduces nonstationarity and facilitates the extraction of informative patterns. The decomposed components are then categorized by frequency: the stable low-frequency trends are predicted by the CNN-ESN model to capture long-term dependencies, while the fluctuating high-frequency components are handled by OKELM to track rapid changes efficiently. This targeted approach matches each sub-signal to the predictor best suited to its characteristics.
The comparative experiments show that the proposed hybrid method substantially improves ultra-short-term PV power forecasting accuracy. On the test dataset, the proposed model records an MSE of 3.7208, an RMSE of 1.9289, and an (R^2) of 99.6987%, capturing the complex, nonlinear dynamics of PV generation with small deviation. Compared with iTransformer, the strongest baseline, the proposed method reduces the MSE and RMSE by approximately 69.0% and 44.3%, respectively, and it also leads the other mainstream algorithms on all reported metrics. This error reduction indicates that the multi-stage decomposition and ensemble strategy mitigates errors more effectively than single-structure deep learning models. Furthermore, the seasonal validation on the DKA dataset supports the model’s robustness to environmental variation. The model remains stable across all seasons, with the most precise results in Autumn (MSE of 0.1074 and RMSE of 0.3278), and its forecasts track the actual power output closely under varying meteorological conditions.
In summary, by consistently outperforming the mainstream forecasting models across all evaluated metrics, this study establishes a robust framework for ultra-short-term PV power prediction. The proposed hybrid approach copes effectively with the stochastic nature of solar irradiance, which supports its validity and effectiveness. Consequently, the method offers practical value for enhancing the operational stability of power systems with PV generation and provides a reference for the further optimization of power grid dispatch strategies.
However, this study is limited by the available samples and focuses on a single PV power time series. Future work will focus on: (1) integrating multi-modal data (e.g., sky images, satellite observations, and IoT sensor measurements) to enable nowcasting with high spatiotemporal resolution; (2) enhancing model interpretability via explainable AI and physics-informed/physics-guided learning, so that predictions are trustworthy and actionable for operational decision-making; (3) optimizing the model’s computational efficiency (e.g., exploring simplified or alternative signal decomposition methods to reduce preprocessing overhead) and exploring lightweight architectures to facilitate deployment on edge-computing platforms for real-time forecasting in practical PV plant operations; and (4) adopting advanced probabilistic forecasting frameworks to strengthen uncertainty quantification under extreme weather events, thereby improving grid operational resilience.
Measurement data from a PV power plant located in Ningxia, recorded during the spring of 2017, were used for feature correlation analysis and feature selection. The installed capacity of this PV power plant is 150 kW, and the data are sampled at 15-min intervals with 8 features. The datasets generated and/or analyzed during this study are publicly available on the Figshare platform at https://figshare.com/s/7a26e8c40a049cb305a0 and at https://dkasolarcentre.com.au/download?location=alice-springs. The repository is distributed under the permissive MIT open-source license and is permanently archived through Figshare’s preservation partnership with the Software Heritage Foundation, ensuring long-term accessibility. Users are free to download, reuse, and adapt the data, provided that appropriate credit is given to the original authors and all license terms are observed.
No funding.
College of Mathematics and System Science, Xinjiang University, Urumqi, 830017, China
Shuyi Xue
Shanghai Electric Vehicle Public Data Collecting, Monitoring and Research Center, Shanghai, 100081, China
Lei Li
Shuyi Xue wrote the manuscript. Shuyi Xue and Lei Li conducted the experiments. Shuyi Xue and Lei Li analysed the results. All authors reviewed the manuscript.
Correspondence to Shuyi Xue.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Xue, S., Li, L. Photovoltaic power forecasting based on secondary decomposition strategy and hybrid model. Sci Rep 16, 12915 (2026). https://doi.org/10.1038/s41598-026-42896-z
DOI: https://doi.org/10.1038/s41598-026-42896-z
Scientific Reports (Sci Rep)
ISSN 2045-2322 (online)
© 2026 Springer Nature Limited