Scientific Reports volume 16, Article number: 8197 (2026)
Accurate forecasting of photovoltaic performance is essential for improving solar energy management, optimizing operational schedules, and supporting investment decisions. This study proposes a structured data-driven forecasting framework that integrates standalone learners with a hybrid boosting–aggregation strategy to predict two critical photovoltaic performance indicators: the optimal peak operating time (NOPT) and the power conversion efficiency (PCE). The methodology involves systematic data preprocessing, feature normalization, model training using both single and hybrid learners, and performance validation under identical experimental conditions. Multiple data-driven algorithms were examined using comprehensive statistical metrics, including R², RMSE, and U95. Among all models, the hybrid XGBA framework demonstrated superior predictive performance, achieving R² values of 0.9954 for NOPT and 0.9970 for PCE, and consistently low errors across all evaluation criteria. Model robustness and generalization were further assessed through uncertainty-based evaluation metrics. Sensitivity analyses highlight key influential parameters such as Emin, Emax, and Ap, revealing their substantial contributions to model outputs. The proposed hybrid model provides a robust and highly accurate predictive tool that can reduce operational uncertainties, enhance energy yield, and support data-driven decision-making for photovoltaic plant operators and energy sector stakeholders.
The global transition to sustainable energy has highlighted photovoltaic (PV) technology as a pivotal solution for reducing greenhouse gas emissions and dependence on fossil fuels1. Over the past decades, PV research has focused on enhancing power conversion efficiency (PCE), reducing production costs, and incorporating environmentally friendly materials, such as thin-film polymers and perovskite tandems2. The integration of PV systems into diesel-based energy infrastructures, including microgrids, remote power stations, and hybrid vehicles, presents a hybrid solution that can improve fuel efficiency, reduce emissions, and extend engine lifespan3,4,5. Such integration specifically aligns with several Sustainable Development Goals (SDGs), notably SDG 7 (Affordable and Clean Energy) and SDG 13 (Climate Action), by encouraging the use of renewable energy and decreasing the use of diesel6,7,8.
One of the significant breakthroughs in PV technology is organic photovoltaics (OPVs), which have attracted renewed interest in recent years, largely owing to the successful implementation of non-fullerene acceptors (NFAs) that allow single-junction devices to achieve power conversion efficiencies (PCEs) above 18%9,10,11. Nevertheless, efficiency-limiting processes in NFA-based systems remain the main obstacle to a full understanding of these systems, which in turn hinders the rational, computer-aided design of new donor–acceptor materials. The quadrupole moment of acceptors has been shown to be the factor that most strongly influences interfacial energetics, and high internal quantum efficiencies (IQEs) are generally observed when ionization energy offsets above 0.5 eV drive exciton dissociation12. Research on material modification is exemplified by end-group engineering through fluorination and chlorination, which improves charge transport and suppresses recombination, thereby enhancing overall device performance13,14,15.
Machine learning (ML) methods are a significant driver of progress in renewable energy research, especially in forecasting and optimizing photovoltaic (PV) systems16,17,18,19. Keddouda et al.20 developed artificial neural network (ANN) and regression models using meteorological data and operating temperature as inputs, achieving high predictive accuracy with R² values reaching 0.998. Kumari and Toshniwal21 proposed an extreme gradient boosting–deep neural network hybrid (XGBF-DNN), which integrates extreme gradient boosting forests and deep neural networks through ridge regression, improving both robustness and accuracy across a wide variety of climatic conditions. The use of such ensemble strategies underscores the viability of hybrid ML frameworks for addressing the unpredictability of real-world PV system outputs.
Nonetheless, interpretability remains a major stumbling block, despite progress in predictive capabilities. Some XAI (Explainable AI) techniques, like SHAP22 and LIME23, can provide an account of feature importance and develop local explanations; however, they are still largely untapped in PV research. Chen et al.24 pointed out the difficulties related to terminology, cross-task evaluation, and the range of existing interpretability techniques; therefore, they suggested that more research should be carried out to enhance the transparency of the processes. Scott et al.25 examined the use of benchmark machine learning algorithms to forecast photovoltaic power generation for building-scale renewable energy systems. Several models, including random forest, neural networks, support vector machines, and linear regression, were compared using operational data from a university campus to evaluate forecasting accuracy across different dataset sizes and prediction horizons. The results showed that random forest achieved the lowest average error, although no single algorithm consistently outperformed the others under all conditions. The study highlighted the importance of dataset characteristics and model usability when selecting forecasting approaches for integration into building management systems. Bhutta et al.26 investigated the use of hybrid machine learning models to improve the prediction accuracy of solar power generation within smart grid systems. Hybrid deep learning architectures, including convolutional–recurrent, convolutional–LSTM, and convolutional–GRU networks, were applied to forecast key solar plant parameters such as power production, plane-of-array irradiance, and performance ratio. The results demonstrated that the hybrid convolutional–LSTM model achieved the highest predictive accuracy, yielding the lowest RMSE and MAE values across all evaluated variables. 
The findings indicated that hybrid machine learning approaches were effective in enhancing the efficiency and reliability of solar power generation forecasting in intelligent energy networks. Ridha et al.27 proposed a hybrid photovoltaic power prediction framework integrating singular spectrum analysis, an adaptive beluga whale optimization algorithm, and an improved extreme learning machine. Singular spectrum analysis was applied to preprocess long-term PV time-series data, while the adaptive beluga whale optimization method was used to enhance exploration–exploitation balance and optimize model hyperparameters. The improved extreme learning machine further refined output weight estimation to enhance prediction accuracy. Comparative evaluations using benchmark functions and real-world PV data demonstrated that the proposed hybrid model outperformed existing optimization and hybrid learning approaches across multiple statistical performance metrics.
Although ML and hybrid models have achieved high accuracy in photovoltaic forecasting, existing studies mainly focus on single performance indicators and accuracy-driven evaluation. The simultaneous prediction of optimal operating time and efficiency, along with uncertainty-aware validation and robustness assessment, remains largely unexplored. Moreover, despite advances in hybrid learning, model interpretability and sensitivity-based physical insight are insufficiently integrated into PV forecasting frameworks. These limitations highlight the need for a unified, transparent, and decision-oriented modeling approach that balances accuracy, reliability, and practical applicability. In addition, the proposed XGBA model addresses the methodological gap in existing hybrid PV forecasting approaches by enabling simultaneous multi-target prediction, improving robustness and uncertainty-aware performance, and integrating sensitivity-based interpretability for enhanced operational insight.
The primary aim of this research is to formulate a hybrid machine learning system capable of accurately predicting solar energy parameters, namely the number of optimal peak operating times (NOPT) and the power conversion efficiency (PCE). Accurate forecasting of these targets can lead to better management of solar energy resources, higher-quality service, and easier financial planning. Standard single models often fail to capture the complex relationships between environmental variables and energy outputs. To address this problem, the paper presents cooperative hybrid models that combine multiple learning paradigms, integrating the predictive power of tree-based algorithms with metaheuristic optimization strategies (e.g., simulated annealing or genetic algorithms). The most important component of the proposed method is the Bat Algorithm (BAT), which serves as a hyperparameter tuner. The BAT optimizer is a metaheuristic approach inspired by the echolocation behavior of bats. Its advantages include effective exploration and exploitation of the search space, fast convergence, and high adaptability to the problem. Moreover, its ability to balance global search with local refinement makes it well suited to tuning the parameters of hybrid machine learning models, improving performance on both targets, i.e., NOPT and PCE. In addition, this paper systematically applies several sensitivity analysis techniques to explore the effects the input variables exert on the model outputs. The FAST sensitivity methodology gives a comprehensive interpretation of variable importance, while the Accumulated Local Effects (ALE) method measures the effect of each input on the predicted outputs regardless of the underlying model. Post hoc statistical tests, such as Dunn’s test, are applied to assess the significance and independence of model predictions.
The joint use of multiple sensitivity tools enables the proposed models not only to be accurate but also interpretable, allowing the identification of critical factors affecting solar energy performance. Figure 1 shows the process of the study.
Process of the present study.
The RBF network, a member of the Artificial Neural Networks (ANNs) family, links input and output components without the use of explicit mathematical formulae. Instead, it infers the model’s structure and unknown parameters solely from the data28. The RBF network consists of three layers: input, hidden, and linear output. As input vectors pass through the hidden layer, they are transformed by radial basis functions. These transformations use an activation mechanism based on the Gaussian distribution and have a solid basis in the properties of the Gaussian function. According to the literature, the Gaussian basis function \(\mathcal{G}_j\) is defined by two essential parameters: width and center29. The function can be expressed as
\(\mathcal{G}_j(x) = \exp\!\left(-\dfrac{\lVert x - \gamma_j \rVert^2}{2\,\omega_j^2}\right),\)
where the width and center of the Gaussian basis function are denoted by \(\omega_j\) and \(\gamma_j\), respectively, and \(x\) is the input pattern. The output neuron is commonly represented by
\(y(x) = \sum_{j=1}^{n} U_j\,\mathcal{G}_j(x) + \mathfrak{B},\)
where \(U_j\) is the weight connecting the \(j\)th hidden neuron to the output neuron, \(\mathfrak{B}\) is the bias coefficient, and \(n\) is the number of hidden neurons. Figure 2 shows how the RBF model works using a flowchart.
Flowchart of the RBF.
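For concreteness, the Gaussian basis and linear output neuron described above can be sketched in a few lines of Python. This is a minimal illustration with arbitrary example centers, widths, and weights, not the authors’ implementation (in a trained network these parameters would be fitted from data):

```python
import numpy as np

def gaussian_basis(x, center, width):
    """Gaussian radial basis function with center gamma_j and width omega_j."""
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * width ** 2))

def rbf_output(x, centers, widths, weights, bias):
    """Linear output neuron: weighted sum of hidden activations plus bias."""
    return sum(w * gaussian_basis(x, c, s)
               for w, c, s in zip(weights, centers, widths)) + bias

# Toy example: two hidden neurons in one dimension (illustrative values only)
x = np.array([0.5])
centers = [np.array([0.0]), np.array([1.0])]
widths = [1.0, 1.0]
weights = [0.3, 0.7]
y = rbf_output(x, centers, widths, weights, bias=0.1)
```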
XGBoost, a supervised learning technique, was used to train the predictive models. The extended distributed gradient boosting package XGBoost was chosen because of its effectiveness in model training30. This approach employs an adaptive binary splitting algorithm to iteratively select the optimal split at each stage, thereby producing an ideal model. XGBoost’s tree-based structure makes it resistant to overfitting and outliers, which enhances the model selection procedure. Equation (3) defines the regularized objective of the XGBoost model during the \(s\)th training phase. The loss function \(\mathcal{L}\big(x^{(s)}_{p},\, x_{gt}\big)\) measures the difference between the predicted value \(x^{(s)}_{p}\) and the corresponding ground truth \(x_{gt}\).
Here \(\lVert \omega \rVert^{2}\) represents the squared \(\ell_2\) norm of all leaf scores. The regularizer \(\Omega(f_{k}) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^{2}\) represents the complexity of the \(k\)th tree, where \(T\) is the number of leaves. The parameters \(\gamma\) and \(\lambda\) control the granularity of the tree search. Moreover, Fig. 3 shows the structure of the XGB model.
Structure of the XGB model.
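To make the boosting mechanism concrete, the following self-contained sketch implements gradient boosting for squared loss with depth-1 trees (stumps). It illustrates the additive, residual-fitting principle behind XGBoost rather than reproducing the library itself; the toy data, learning rate, and round count are illustrative:

```python
import numpy as np

def fit_stump(X, residuals):
    """Find the best axis-aligned split (a depth-1 regression tree)."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            if left.all() or not left.any():
                continue
            lv, rv = residuals[left].mean(), residuals[~left].mean()
            err = np.sum((residuals - np.where(left, lv, rv)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, thr, lv, rv)
    return best[1:]

def boost(X, y, n_rounds=50, lr=0.1):
    """Each round fits a stump to the current residuals (the negative gradient
    of squared loss) and adds a damped copy to the ensemble prediction."""
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        j, thr, lv, rv = fit_stump(X, y - pred)
        pred = pred + lr * np.where(X[:, j] <= thr, lv, rv)
    return pred

# Toy step-function target: boosting drives training error toward zero
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.where(X[:, 0] > 4, 2.0, 0.0)
pred = boost(X, y)
```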
The random forest algorithm is an ensemble approach that builds multiple decision trees for regression and averages their predictions, often over hundreds or even thousands of trees. Each tree is derived from the Classification and Regression Tree (CART) methodology first presented by Breiman et al.31. Data complexity shapes the learning process each tree goes through. A decision tree is made up of decision and leaf nodes. According to Eq. (4), the input vector \(X = \{x_1, x_2, \dots, x_m\}\) maps to a scalar output \(Y\) using a training set of \(n\) observations \(R_n\).
During training, the algorithm optimized split functions by splitting the input data at each node until it reached a terminal leaf or satisfied stopping conditions, such as a minimum sample size or maximum depth. This process produced a prognostic function \(\widehat{H}(X, R_n)\) that can forecast results. Random Forest Regression32 develops an ensemble of tree-structured base learners \(h(X, \Theta_k)\), where each \(\Theta_k\) denotes a random vector identifying a bootstrap sample of the training data or a subset of features. To ensure an equal selection probability, bootstrap sampling draws \(n\) observations from \(R_n\) with replacement. The bagging procedure repeats this across several bootstrap sets, producing a separate prediction tree for each set. The result is a set of \(q\) trees \(\widehat{h}\big(X, S_n^{\Theta_1}\big), \dots, \widehat{h}\big(X, S_n^{\Theta_q}\big)\). In contrast to a single decision tree, the outputs of all trees are averaged to produce the final predicted value \(\widehat{Y}\), which improves accuracy and decreases variance33.
The output of the \(l\)th tree is denoted by \(\widehat{Y}_l\), where \(l\) takes values between 1 and \(q\).
By integrating bagging with ensembles of unpruned decision trees, Random Forest (RF) regression improves model robustness32,33. RF is computationally efficient because it does not require pruning, unlike other approaches. It is simple to configure, with only two parameters to adjust: the number of trees \(n_{tree}\) and the number of randomly chosen predictors per split \(m_{try}\)34. In general, adding more trees increases accuracy and stability, but beyond a certain point additional trees no longer reduce error. Typically, a standard value of \(n_{tree} = 500\) is used. Increasing \(m_{try}\) strengthens individual trees but also makes them more correlated with one another35. Approximately two-thirds of the original dataset is included in each of the \(n_{tree}\) bootstrap samples created during the RF process. To ensure diversity among trees, the optimal split is determined at each node using a random subset of \(m_{try}\) predictors. Out-of-bag (OOB) samples, i.e., data not included in the bootstrap sets, are used for validation to lower the risk of overfitting, while predictions are aggregated by averaging for regression tasks. Figure 4 illustrates the application of the RF regression framework for prediction.
Flowchart of the RF model.
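The bootstrap-and-average mechanics described above can be illustrated in a few lines of Python. This is a schematic of the sampling and aggregation steps only, with made-up tree predictions, not a full forest:

```python
import numpy as np

def bootstrap_indices(n, rng):
    """Draw n observations with replacement (equal selection probability)."""
    return rng.integers(0, n, size=n)

def bagged_prediction(tree_preds):
    """Final RF regression output: average over the q individual tree outputs."""
    return np.mean(np.asarray(tree_preds, dtype=float), axis=0)

rng = np.random.default_rng(7)
idx = bootstrap_indices(1000, rng)
# Roughly one-third of the data is left out-of-bag (OOB) by each bootstrap draw
oob_fraction = 1.0 - np.unique(idx).size / 1000.0
# Averaging two hypothetical trees' predictions for two query points
yhat = bagged_prediction([[2.0, 4.0], [4.0, 8.0]])
```

The OOB fraction converges to 1 − (1 − 1/n)ⁿ ≈ 0.368 as n grows, which is why roughly two-thirds of the data appears in each bootstrap sample.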
The echolocation method used by wild bats to find food served as the model for the BAT search algorithm. It was first presented by Yang36,37,38,39 and is used to solve several optimization issues. Every virtual bat in the original population updates its position using echolocation in a homologous fashion. Bats use a perceptual mechanism called echolocation, which produces echoes by releasing a sequence of loud ultrasonic waves. Bats can identify a particular prey by using the delays and different sound levels that these waves return. A few guidelines are being researched to expand the BAT algorithm’s structure and take advantage of bats’ echolocation traits40,41,42,43.
(a) Every bat uses echolocation to differentiate between obstacles and prey; (b) every bat flies randomly with loudness \(E_0\) and velocity \(k_i\) at position \(x_i\), at a fixed frequency \(f_{min}\) and varying wavelength \(\lambda\), to find prey; it controls the frequency of its emitted pulse and adjusts the pulse emission rate \(r\) in the range [0, 1], depending on the proximity of its target; (c) every bat varies its frequency, loudness, and pulse emission rate; (d) the loudness \(E_m^{iter}\) shifts from a large value \(E_0\) to a minimum constant value \(E_{min}\). Throughout the optimization process, each bat’s position \(x_i\) and velocity \(k_i\) should be specified and updated; the new solutions \(x_i^t\) and velocities \(k_i^t\) at time step \(t\) are obtained from the following equations44,45:
\(f_i = f_{min} + (f_{max} - f_{min})\,\phi\)
\(k_i^t = k_i^{t-1} + \big(x_i^{t-1} - x^{*}\big)\,f_i\)
\(x_i^t = x_i^{t-1} + k_i^t\)
Here \(\phi\) is a random vector drawn from a uniform distribution on [0, 1], and \(x^{*}\) is the current global best location found by comparing the positions of all \(n\) bats. One may use either \(f_i\) (or \(\lambda_i\)) to adjust the velocity change while fixing the other component, since the velocity increment is the product of \(\lambda_i\) and \(f_i\). For implementation, each bat is assigned a frequency drawn uniformly from \([f_{min}, f_{max}]\). After one of the current best solutions is selected for the local search, a random walk is used to produce a new solution for each bat locally.
The local random walk can be written as \(x_{new} = x_{old} + \epsilon\,\bar{E}^{t}\), where \(\bar{E}^{t}\) is the average loudness of all bats at the current time step and \(\epsilon\) is a random value in \([-1, 1]\). The loudness may be initialized to any convenient value; once a bat has located its prey, the loudness typically decreases while the rate of pulse emission rises. Considering that \(E_{min} = 0\) indicates that a bat has just discovered its prey and has momentarily stopped emitting sound, one obtains:
\(E_i^{t+1} = \beta\,E_i^{t}, \qquad r_i^{t+1} = r_i^{0}\big[1 - \exp(-\gamma t)\big]\)
Here \(\gamma\) is a positive constant and \(\beta\) is a constant in the interval [0, 1]. As time approaches infinity, the loudness tends to zero and \(r_i^{t}\) approaches \(r_i^{0}\).
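The position, velocity, loudness, and pulse-rate updates above can be assembled into a compact optimizer. The sketch below follows the standard BAT scheme on a toy sphere function; the bounds, population size, and schedule constants are illustrative choices, not the settings used in this study:

```python
import numpy as np

def bat_optimize(obj, dim, n_bats=20, iters=200, fmin=0.0, fmax=2.0,
                 alpha=0.9, gam=0.9, lb=-5.0, ub=5.0, seed=0):
    """Minimal Bat Algorithm: frequency-tuned velocity updates, a local random
    walk around the best bat, decaying loudness, and a rising pulse rate."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, (n_bats, dim))
    v = np.zeros((n_bats, dim))
    loud = np.ones(n_bats)               # loudness, starts at E_0 = 1
    r0 = np.full(n_bats, 0.5)            # asymptotic pulse rate r_i^0
    r = np.zeros(n_bats)                 # current pulse rate, grows over time
    fit = np.array([obj(xi) for xi in x])
    b = int(fit.argmin())
    best, best_val = x[b].copy(), fit[b]
    for t in range(1, iters + 1):
        for i in range(n_bats):
            f = fmin + (fmax - fmin) * rng.random()   # random frequency f_i
            v[i] += (x[i] - best) * f                 # velocity update
            cand = np.clip(x[i] + v[i], lb, ub)
            if rng.random() > r[i]:                   # local walk near the best
                step = 0.1 * loud.mean() * rng.normal(size=dim)
                cand = np.clip(best + step, lb, ub)
            fc = obj(cand)
            if fc < fit[i] and rng.random() < loud[i]:   # conditional acceptance
                x[i], fit[i] = cand, fc
                loud[i] *= alpha                         # loudness decays
                r[i] = r0[i] * (1.0 - np.exp(-gam * t))  # pulse rate rises
            if fit[i] < best_val:
                best, best_val = x[i].copy(), fit[i]
    return best, best_val

# Demo: minimize the 2-D sphere function f(z) = z1^2 + z2^2
best, best_val = bat_optimize(lambda z: float(np.sum(z * z)), dim=2)
```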
This article details the statistical metrics used to assess the accuracy of predicting peak times (NOPT) and power conversion efficiency (PCE) in solar energy systems. One of these metrics is the coefficient of determination (R²), which measures the agreement between actual and predicted values, with values close to one indicating a strong match. For instance, a solar module with an actual PCE of 18.5% and a predicted value of 18.3% would contribute to a high R², indicating close agreement between the two values. The root mean square error (RMSE) gives the average magnitude of the differences between actual and predicted values; predicting an NOPT of 49 instead of the actual 50, for example, has only a small impact on RMSE. The 95% confidence level uncertainty (U95) indicates prediction stability and helps ensure that long-term forecasts are reliable. Correspondingly, MRAE and MDAPE are normalized, robust percentage-error measures, while the prediction interval coverage probability (PICP) checks whether the actual NOPT or PCE values fall within the model’s predicted bounds. The mathematical formulations of the employed evaluation metrics are presented in Eqs. (11) to (16).
Here, \(t_i\) is the observed (actual) solar energy value at instance \(i\), \(p_i\) is the predicted solar energy value at instance \(i\), and \(\bar{t}\) and \(\bar{p}\) are the means of the observed and predicted values, respectively. \(n\) denotes the total number of observations, \([low_i, up_i]\) are the lower and upper prediction interval bounds for the \(i\)th prediction, and \(k_i\) is the indicator variable, equal to 1 if the observed value lies within the prediction interval and 0 otherwise.
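The metrics above translate directly into code. The sketch below gives plain-NumPy versions of RMSE, R², PICP, and one common formulation of U95 (1.96 times the combined standard deviation and RMSE of the errors); the exact expressions used in the paper are those of Eqs. (11)–(16), so the U95 form here is an assumption for illustration:

```python
import numpy as np

def rmse(t, p):
    """Root mean square error between observed t and predicted p."""
    t, p = np.asarray(t, float), np.asarray(p, float)
    return float(np.sqrt(np.mean((t - p) ** 2)))

def r2(t, p):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    t, p = np.asarray(t, float), np.asarray(p, float)
    return float(1.0 - np.sum((t - p) ** 2) / np.sum((t - t.mean()) ** 2))

def u95(t, p):
    """95% expanded uncertainty (one common form: 1.96*sqrt(SD^2 + RMSE^2))."""
    e = np.asarray(t, float) - np.asarray(p, float)
    return float(1.96 * np.sqrt(np.std(e) ** 2 + np.mean(e ** 2)))

def picp(t, low, up):
    """Prediction interval coverage probability: fraction of t inside bounds."""
    t, low, up = (np.asarray(a, float) for a in (t, low, up))
    return float(np.mean((t >= low) & (t <= up)))

# Small worked example with hypothetical observations and predictions
t, p = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
```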
The hybridization strategy adopted in this study is designed to enhance nonlinear pattern learning by combining the complementary strengths of different learning paradigms rather than relying on a single-model structure. Single machine learning models, such as kernel-based learners or tree-based algorithms, are effective in capturing specific types of relationships; however, they are inherently limited in representing the full complexity of photovoltaic system behavior, which is governed by highly nonlinear, nonstationary, and interacting environmental and operational variables. In the proposed hybrid framework, gradient boosting models act as strong base learners capable of capturing high-order nonlinear interactions and abrupt regime changes, while the adaptive aggregation mechanism integrates multiple weak and strong predictors to reduce bias and variance simultaneously.
This fusion enables the model to learn both global trends and localized nonlinear responses, which are common in PV systems due to fluctuating irradiance, temperature-dependent efficiency, and extreme energy generation. Hybridization improves learning performance by mitigating the weaknesses of individual models. While single models may overfit local patterns or underperform in extrapolation regions, the fusion strategy stabilizes predictions through ensemble averaging and adaptive weighting, thereby improving generalization and robustness. This is particularly important for small-to-moderate datasets, where individual learners may exhibit high variance. Furthermore, the hybrid framework enhances error correction, as other models in the ensemble can compensate for mispredictions from a single model. This mechanism explains the observed reductions in RMSE and uncertainty bounds, as well as the consistent performance across the training, validation, and test datasets. Compared to single-model baselines, the hybrid approach demonstrates superior capability in learning complex nonlinear relationships while maintaining interpretability and stability, making it especially suitable for simultaneous forecasting of NOPT and PCE. As a result, the hybridization strategy directly addresses the limitations of standalone models and provides a more reliable and scalable solution for real-world photovoltaic system forecasting.
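One simple way to realize the “adaptive weighting” idea is inverse-validation-error weighting of the base learners. The rule below is an illustrative choice, since the text does not specify the exact aggregation formula:

```python
import numpy as np

def adaptive_weights(val_errors):
    """Assign each base learner a weight proportional to 1/validation-error,
    so more accurate learners dominate the aggregate (illustrative rule)."""
    inv = 1.0 / np.asarray(val_errors, dtype=float)
    return inv / inv.sum()

def aggregate(preds, weights):
    """Weighted average of base-learner predictions (one row per learner)."""
    return np.asarray(weights, dtype=float) @ np.asarray(preds, dtype=float)

w = adaptive_weights([0.1, 0.3])            # the stronger learner gets 0.75
blended = aggregate([[2.0, 4.0], [4.0, 8.0]], w)
```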
All data preprocessing procedures, hybrid machine learning model implementations (including the XGBA framework), training scripts, and evaluation workflows used in this study were custom-developed and implemented in Python. To ensure transparency and reproducibility, the complete source code, including model configurations, parameter settings, and execution instructions, is available from the corresponding author upon reasonable request. Requests for access can be directed to: asifmmd1in@gmail.com. The code is provided for academic and research purposes without restriction.
The dataset in this research includes 305 records with seven input variables, namely Ap, Amin, Amax, Ep, Emin, Emax, and nyield; the targets for prediction are the number of peak times (NOPT) and the power conversion efficiency (PCE), expressed as percentages. The dataset was obtained from46. To ensure reproducibility and prevent data leakage, it was explicitly partitioned into three mutually exclusive subsets: 70% (214 samples) for model training, 15% (46 samples) for validation, and 15% (45 samples) for independent testing. This splitting strategy provides sufficient samples for learning model parameters while reserving adequate data for unbiased hyperparameter tuning and final performance assessment. The validation subset was used exclusively for model selection and hyperparameter optimization, whereas the test subset remained completely unseen during the training and tuning phases. This strict separation ensures that reported test results reflect true generalization performance rather than memorization effects. Moreover, data splitting was performed in a deterministic, reproducible manner, and the same partitions were consistently applied across all single and hybrid models to ensure fair, transparent comparisons. This structured training–validation–testing workflow minimizes the risk of optimistic bias and aligns with best practices in machine learning–based photovoltaic performance modeling.
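A deterministic 70/15/15 partition can be reproduced with a fixed random seed. The sketch below yields the same subset sizes reported above (214/46/45 of 305 records), though the seed and shuffling scheme here are assumptions, not the authors’ exact procedure:

```python
import numpy as np

def split_indices(n, seed=42, ratios=(0.70, 0.15, 0.15)):
    """Deterministic, mutually exclusive train/validation/test index split."""
    rng = np.random.default_rng(seed)          # fixed seed => reproducible
    idx = rng.permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(305)   # 214 / 46 / 45 samples
```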
According to Table 1, the variables characterize various operational and environmental conditions related to solar energy systems. Specifically:
Ap expresses the peak absorption wavelength measured under standard test conditions (nm).
Amin and Amax represent the minimum and maximum absorption wavelength during the measurement period, reflecting daily and seasonal variations in sunlight exposure.
Ep denotes the peak emission wavelength (in nm) produced under the measured irradiance conditions.
Emin and Emax indicate the minimum and maximum emission wavelength across different operating regions, capturing fluctuations due to environmental and system variations.
nyield is the absolute emission quantum yield, ranging from 0 to 100, calculated as the ratio of actual energy output to the available solar resource, reflecting system performance efficiency.
The target variables quantify predictive objectives:
NOPT, with a maximum value of 12.91%, indicates the number of optimal peak operating times for the PV system.
PCE, with a maximum of 4.36%, measures the efficiency with which solar irradiance is converted into electrical energy.
All measurements were collected using calibrated pyranometers for irradiance and standard energy meters for electrical output, ensuring accurate representation of environmental and operational conditions. Hence, this dataset not only signifies environmental variations but also captures system performance metrics, providing a solid foundation for building and validating predictive models. Before training and evaluating the predictive models, the raw dataset underwent a systematic preprocessing workflow to ensure data quality, consistency, and compatibility with machine learning algorithms. First, all input variables were normalized using min–max scaling to map values to 0–1, preventing features with larger numerical ranges from dominating the learning process. Second, missing values were handled using a two-step approach: records with minor missing entries (< 5% of the dataset) were imputed using linear interpolation based on neighboring temporal values, while records with substantial missing information were excluded to avoid introducing bias. This ensured that the final dataset retained meaningful variability without compromising integrity. Third, noise filtering was applied to smooth transient fluctuations in energy and irradiance measurements. A moving average filter with a window size of 3 was applied to the input features to reduce measurement noise while preserving significant trends relevant to model training.
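The three preprocessing steps (min–max scaling, linear interpolation of sparse missing entries, and window-3 moving-average smoothing) can be sketched as follows; the edge handling in the smoother is an implementation choice not specified in the text:

```python
import numpy as np

def min_max_scale(col):
    """Map a feature column onto [0, 1]."""
    col = np.asarray(col, dtype=float)
    return (col - col.min()) / (col.max() - col.min())

def interpolate_missing(col):
    """Linearly interpolate NaN entries from neighboring values."""
    col = np.array(col, dtype=float)            # copy so input is untouched
    nans = np.isnan(col)
    col[nans] = np.interp(np.flatnonzero(nans),
                          np.flatnonzero(~nans), col[~nans])
    return col

def moving_average(col, window=3):
    """Window-3 moving-average filter (edges see a shorter effective window)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(col, dtype=float), kernel, mode="same")

scaled = min_max_scale([0.0, 5.0, 10.0])          # -> [0.0, 0.5, 1.0]
filled = interpolate_missing([1.0, np.nan, 3.0])  # -> [1.0, 2.0, 3.0]
```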
Figure 5 shows a scatter matrix, which displays the distributions and pairwise relationships of the features in the dataset. On the diagonal, each panel represents the distribution of an individual variable, whereas off-diagonal plots show possible correlations and groupings between pairs of features. The NOPT values are distributed mainly between 0 and 4, so most samples fall within this range. Likewise, the PCE values are primarily concentrated between 0 and 4.4, consistent with their distribution in the dataset. By and large, the matrix delineates variable ranges and uncovers potential relationships (linear or nonlinear) and regions of concentration, giving a concise indication of feature behavior that is useful for exploratory data analysis.
Scatter matrix plot for the distribution and relationships within the dataset across different feature subsets.
The computational complexity and training time of the proposed models were systematically analyzed to assess their practical feasibility and scalability. The runtime results clearly demonstrate the computational trade-off introduced by BA–based optimization across all models and both target variables (NOPT and PCE). In all cases, incorporating BA increased execution time by approximately 3–5 times compared with the corresponding base models, attributable to the iterative population-based search mechanism and the repeated fitness evaluations inherent to metaheuristic optimization techniques. Among the evaluated models, RBF consistently exhibited the lowest computational cost, both in its base configuration and when coupled with BA. For NOPT prediction, the RBF model required 9.47 s in the base form and 48.29 s with BA optimization, while for PCE prediction, the runtime remained similarly low (10.36 s in the base form and 52.87 s with BA). This behavior reflects the simpler mathematical structure and lower training complexity of kernel-based models, making RBF computationally efficient even under optimization. The XGBoost-based models showed moderate computational demand, with base runtimes of 16–18 s, increasing to 62–66 s after BA optimization. The additional overhead primarily stems from repeated tree construction, gradient boosting iterations, and hyperparameter evaluations during the optimization process.
In contrast, Random Forest exhibited the highest computational cost, particularly in its optimized form, with runtimes reaching 80–84 s, due to the large ensemble size, bootstrap sampling, and repeated evaluation of tree-based structures across BA iterations. From a scalability perspective, the observed computational trends indicate that training time grows approximately linearly with dataset size for RBF and near-linearly to moderately superlinearly for tree-based ensemble models. While BA-based hybridization introduces additional overhead, this cost is incurred offline during model development and optimization, whereas online inference remains computationally lightweight, enabling real-time deployment in photovoltaic monitoring systems. Regarding scalability to larger PV datasets and different climate zones, the proposed hybrid framework is inherently extensible. Larger datasets are expected to improve generalization while increasing training time proportionally, particularly for ensemble models. However, the modular design of the hybrid approach allows parallelization of BA fitness evaluations and tree construction, making it suitable for high-performance or cloud-based computing environments. Moreover, the data-driven nature of the models enables adaptation to diverse climatic conditions, provided that representative environmental and operational data from different regions are included during training.
3D wall plot illustrating the convergence behavior of the optimization process across iterations or parameters.
The random search procedure was conducted using predefined hyperparameter ranges that were selected based on model-specific constraints, prior literature, and preliminary sensitivity trials to ensure both computational feasibility and sufficient exploration of the solution space. For kernel-based hybrid models (RBBA), the length scale was sampled from a continuous logarithmic range of [10⁻³, 10¹], while the lower and upper bounds of the length scale were drawn from [10⁻⁵, 10⁻²] and [10³, 10⁶], respectively, allowing the model to capture both smooth and highly nonlinear functional relationships. For hybrid models (RFBA and XGBA), the number of estimators was randomly sampled from the interval [20, 1000], enabling evaluation of ensemble sizes from small to large. The maximum tree depth was explored within the range [5, 1000] to assess the trade-off between model expressiveness and overfitting risk, while the minimum number of samples required to split a node was sampled from [2, 150] to regulate tree granularity and stability. For boosting-based hybrids (XGBA), the learning rate was sampled from the continuous range [0.01, 0.9] to balance convergence speed and generalization performance. In addition, the column sampling rate per tree (colsample_bytree) was varied within [0.5, 1.0] to enhance feature diversity and reduce correlation among trees, and the number of leaves was sampled from [10, 100] to control the complexity of individual boosting trees.
From a computational standpoint, the random search was executed for a fixed budget of 200 independent hyperparameter evaluations per model–target pair, ensuring consistent and fair optimization across all frameworks. Each candidate configuration was trained on the training subset and evaluated exclusively on the validation subset using RMSE and R² as the primary selection criteria. Table 2 summarizes the hyperparameters optimized for the hybrid models used to predict the solar energy targets, NOPT and PCE: the length scale, the length scale bounds (lower and upper), the number of estimators, the maximum tree depth, the minimum samples required to split a node, the learning rate, colsample_bytree for NOPT, and the number of leaves for PCE. These parameters control each model's flexibility, complexity, and learning dynamics, and were tuned with prediction accuracy as the ultimate goal. For example, the RBBA model has a length scale of 3.9516 for NOPT and 2.1531 for PCE, indicating the degree of smoothness of the underlying regression function. In the tree-based hybrid models, the numbers of estimators for RFBA and XGBA are 321 and 246 for NOPT, and 846 and 21 for PCE, respectively, making the differences in ensemble size and their effects on predictive performance clear. All experiments were conducted under identical computational settings to ensure fair model comparison and reproducibility. The implementations were executed on a workstation equipped with an Intel® Core™ i7 processor, 32 GB RAM, and a 64-bit Windows operating system. The models were implemented in Python (v3.9) with the Scikit-learn, XGBoost, and NumPy libraries, which are widely adopted in ML research.
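For illustration, the sampling scheme described above can be sketched with Python's standard library; the function names and dictionary keys below are hypothetical, while the ranges are those stated in the text.

```python
import math
import random

random.seed(42)

def sample_xgba_config():
    # One candidate drawn from the XGBA search ranges stated above
    return {
        "n_estimators": random.randint(20, 1000),
        "max_depth": random.randint(5, 1000),
        "min_samples_split": random.randint(2, 150),
        "learning_rate": random.uniform(0.01, 0.9),
        "colsample_bytree": random.uniform(0.5, 1.0),
        "num_leaves": random.randint(10, 100),
    }

def sample_rbba_config():
    # Length scale drawn log-uniformly over [1e-3, 1e1]
    exponent = random.uniform(math.log10(1e-3), math.log10(1e1))
    return {
        "length_scale": 10.0 ** exponent,
        "length_scale_lower": random.uniform(1e-5, 1e-2),
        "length_scale_upper": random.uniform(1e3, 1e6),
    }

# Fixed budget: 200 independent evaluations per model-target pair
candidates = [sample_xgba_config() for _ in range(200)]
```

Each candidate configuration would then be fitted on the training subset and scored on the validation subset by RMSE and R², as described above.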
To address concerns regarding potential overfitting due to the small dataset size (305 samples), a 5-fold cross-validation procedure was implemented on three representative single models: RBF, RF, and XGB. Table 3 shows the 5-fold cross-validation results for the single models. The 5-fold results demonstrate consistent performance across folds, indicating robust generalization ability.
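A minimal sketch of the 5-fold procedure is shown below, with a trivial mean predictor standing in for the RBF, RF, and XGB learners; `kfold_rmse` and `mean_model` are hypothetical names introduced for illustration.

```python
import numpy as np

def kfold_rmse(X, y, fit_predict, k=5, seed=0):
    """Shuffle indices, split into k folds, and return per-fold RMSE.
    fit_predict(X_train, y_train, X_test) -> predictions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X[train], y[train], X[test])
        scores.append(float(np.sqrt(np.mean((y[test] - pred) ** 2))))
    return scores

# Trivial stand-in learner (predicts the training mean); the study
# applied the same protocol to RBF, RF, and XGB.
mean_model = lambda X_tr, y_tr, X_te: np.full(len(X_te), y_tr.mean())
X = np.arange(305, dtype=float).reshape(-1, 1)  # 305 samples, as in the dataset
y = 0.01 * X[:, 0] + 1.0
fold_rmse = kfold_rmse(X, y, mean_model)
```

Consistent per-fold scores, as reported in Table 3, indicate that no single fold dominates the error and that the models generalize beyond any particular split.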
Table 4 summarizes the comprehensive performance of both single and hybrid models in forecasting the NOPT and PCE. The evaluation used a suite of statistical indicators, including R², RMSE, PICP, U95, MRAE, and MDAPE, to ensure a rigorous assessment of predictive accuracy and reliability. Among the single models, the XGB framework consistently outperformed RBF and RF, yielding the lowest error rates across RMSE, MRAE, and MDAPE, which highlights its superior ability to approximate the underlying solar energy dynamics. Nevertheless, the hybrid configurations markedly advanced the prediction quality beyond that of the standalone models. In particular, the XGBA model achieved exceptional results, with R² values of 0.9954 for NOPT and 0.9970 for PCE, thereby capturing nearly all variability observed in the actual system behavior. Furthermore, its minimal uncertainty values (U95 = 0.5346 for NOPT and 0.1526 for PCE) underscore the robustness and stability of its forecasts. These outcomes demonstrate that the XGBA model not only minimizes deviation from ground truth but also ensures reliable and consistent estimations, which are indispensable for effective scheduling, energy resource allocation, and risk reduction in solar energy management. Collectively, the results affirm the superiority of hybrid learning strategies, particularly XGBA, in providing both accuracy and resilience for practical decision-making in renewable energy systems.
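The statistical indicators used in Table 4 can be computed as below. The U95 definition here, 1.96·sqrt(SD(e)² + RMSE²) with SD(e) the standard deviation of the errors, is a common convention in this literature and is an assumption about the paper's exact formula.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """R2, RMSE, MRAE, MDAPE, and U95 for one model-target pair."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    e = y_true - y_pred
    rmse = float(np.sqrt(np.mean(e ** 2)))
    r2 = 1.0 - float(np.sum(e ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    mrae = float(np.mean(np.abs(e) / np.abs(y_true)))             # mean relative abs. error
    mdape = float(np.median(np.abs(e) / np.abs(y_true)) * 100.0)  # median abs. % error
    u95 = 1.96 * float(np.sqrt(np.std(e) ** 2 + rmse ** 2))       # assumed U95 convention
    return {"R2": r2, "RMSE": rmse, "MRAE": mrae, "MDAPE": mdape, "U95": u95}
```

Applying the same function to every model and target guarantees that the comparison in Table 4 is computed under identical definitions.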
Figure 7 shows a comparative graphical representation of each model's effectiveness across the evaluation metrics, highlighting the gap between single and hybrid models in NOPT and PCE forecasting. Taking R² as an example, the RF model shows the lowest correlation and hence the weakest predictive ability: its predictions deviate most from the actual peak operating times and PCE measured in real solar energy systems. For RMSE, all models yield lower errors for PCE than for NOPT, indicating that power conversion efficiency is predicted more accurately than the number of peak operating times. The same conclusion can be drawn from the U95 values, which reveal that the predictions are more stable for PCE. On the other hand, MRAE and MDAPE scores are higher for NOPT, signifying greater relative and percentage errors for peak time predictions. As for PICP, RBF, XGBA, and XGB achieve the highest coverage for PCE, while XGB and RBBA are the best performers for NOPT, indicating that these models offer the most reliable probabilistic forecasts in real-world solar energy applications.
Performance evaluation of the developed models using key statistical metrics. The best-performing models were selected based on their superior accuracy and reliability.
Figure 8 compares the scatter plots for NOPT and PCE predictions, which depict the degree to which the models’ predictions are accurate. The points representing the hybrid XGBA model are very close to the best-fit line. They are mostly located within the ± 15% deviation lines, showing that there is a good correlation between the predicted and actual values. In NOPT, this means that the model can accurately predict the optimal peak operating times. With PCE, the forecast is close to the actual power conversion efficiency of solar modules. From a financial perspective, such dependable projections enable solar farm managers and investors to schedule energy production more accurately, thereby making better use of resources and allowing for a higher level of confidence in revenue estimation. Correct predictions of peak times and efficiencies become the basis for making operational decisions that involve the organization of maintenance, energy trading, and capacity planning, all of which lead to a reduction in the economic risk and an increase in the overall profitability.
Scatter plot of predicted versus actual values on the test dataset, showing the performance of the selected models.
Table 5 provides an overview of the statistical comparisons of the best hybrid models for both NOPT and PCE targets in the testing phase. The measured NOPT values range from 0.1 (min) to 10.12 (max), with a mean of 4.3182, a median of 3.8950, and a standard deviation of 2.8322. The XGBA model matches these statistics most closely, indicating that it captures not only the most frequent but also the extreme variations in the number of optimal peak operating times. The measured PCE values range from 0.15 to 4.1, with a mean of 1.7836, a median of 1.7250, and a standard deviation of 0.9239. All models produce minimum predictions in close agreement with the measured minimum, while the XGBA model achieves the maximum value (4.0875), closest to the measured maximum, indicating good model performance under peak efficiency conditions. These results show that the hybrid models are useful for solar energy systems because they reproduce both normal performance and peak outputs; operators can therefore schedule energy delivery around predicted periods of high generation and efficiency, maximizing revenue and minimizing financial uncertainty.
Figures 9 and 10 show the prediction errors for each model for the NOPT and PCE targets. The XGBA model has the narrowest error distribution centered on zero, indicating that almost all of its predictions are very close to the actual values. Specifically, in Fig. 9, the errors of the XGBA model remain close to zero, mostly within the range of −5 to +5, while the other models exhibit errors over much wider ranges. This high accuracy enables the number of peak operating times and the power conversion efficiency to be predicted with remarkable precision. From an investor's point of view, such accurate predictions are extremely valuable: they enable solar power investors and managers to estimate likely energy output and efficiency with a high degree of confidence, allocate resources more effectively, plan maintenance activities, and forecast revenues more accurately. As a result, models such as XGBA can reduce financial risk, increase profit potential, and enhance decision-making in solar energy projects.
Histogram showing the distribution of prediction errors for the selected models.
Line plot of prediction errors for the selected models.
Table 6 presents the results of Dunn’s post hoc test for pairwise model comparisons alongside the Durbin–Watson (DW) statistics to assess the reliability and independence of model residuals. Dunn’s post hoc test is a non-parametric method used for multiple pairwise comparisons following a Kruskal–Wallis test and was selected because the performance metrics, such as RMSE and R², do not necessarily follow a normal distribution. This test evaluates whether the differences in model performance are statistically significant. The Durbin–Watson statistic measures autocorrelation in the residuals, with values ranging from 0 to 4; values close to 2 indicate no significant autocorrelation, values below 2 suggest positive autocorrelation, and values above 2 indicate negative autocorrelation. In this study, the XGBA model shows DW values of 1.9274 for both NOPT and PCE, which is very close to 2, confirming that the residuals are statistically independent. This independence implies that the model predictions are reliable and not biased by systematic correlation in the data. In contrast, several single or hybrid models exhibit DW values substantially below or above 2, indicating residual correlation and potentially less reliable predictions. Together, Dunn’s post hoc test and DW statistics provide a rigorous assessment of model validity: the former confirms that XGBA’s performance differences are statistically robust, while the latter demonstrates that the residuals are independent, supporting the model’s robustness and generalization capability.
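The DW statistic itself is straightforward to compute from a residual series, as in this short sketch:

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    ~2: no first-order autocorrelation; <2: positive; >2: negative."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
```

A constant-sign, slowly varying residual series drives DW toward 0 (positive autocorrelation), while a sign-alternating series drives it toward 4, matching the interpretation used in Table 6.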
Table 7 presents the confidence intervals (CIs) for RMSE and MDAPE across the training, validation, and testing phases for all single and hybrid models, for both target variables, NOPT and PCE. These intervals provide an explicit measure of prediction uncertainty and offer insight into the statistical stability and reliability of each modeling framework beyond pointwise performance metrics. During the training phase, all models exhibit relatively narrow confidence intervals, indicating stable learning and limited dispersion in prediction errors. For NOPT prediction, the hybrid models—particularly XGBA—show comparatively tighter RMSE and MDAPE intervals, suggesting more consistent error distributions than single-model counterparts. A similar trend is observed for PCE, where hybrid models demonstrate reduced uncertainty bounds, reflecting improved robustness during model fitting. During the validation phase, the confidence intervals slightly widen across all models, as expected, since predictions are evaluated on unseen data used for hyperparameter tuning.
Nevertheless, hybrid models maintain narrower CI ranges than single models for both RMSE and MDAPE. This behavior indicates enhanced generalization capability and reduced sensitivity to data variability, reinforcing the effectiveness of hybridization strategies in controlling prediction uncertainty. In the testing phase, confidence intervals widen further, reflecting realistic uncertainty under fully unseen data conditions. Despite this, the XGBA model consistently exhibits balanced, relatively compact CI ranges for both NOPT and PCE, demonstrating reliable performance and controlled error dispersion. The comparable CI widths across training, validation, and testing subsets indicate the absence of severe overfitting and confirm the statistical stability of the proposed hybrid framework.
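A percentile bootstrap is one standard way to obtain such phase-wise RMSE intervals; whether the paper used this exact resampling scheme is an assumption, and the function name is hypothetical.

```python
import numpy as np

def bootstrap_rmse_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for RMSE."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample pairs with replacement
        stats[b] = np.sqrt(np.mean((y_true[idx] - y_pred[idx]) ** 2))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Narrow intervals that remain comparable across the training, validation, and testing subsets, as reported for XGBA, indicate stable error dispersion rather than a lucky point estimate.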
Figure 11 shows the Taylor diagram comparing measured and predicted values. The RBF-based models outperform the other approaches for both NOPT and PCE, achieving the highest correlation coefficients and standard deviations closest to the measured data, which results in the lowest overall error. Tree-based and ensemble models (RF, XGB, and their variants) capture the general trends but show noticeable variance mismatch and reduced correlation, especially for PCE. Overall, the Taylor diagrams confirm the RBF model's superior robustness and generalization, particularly in representing the system's nonlinear behavior.
Taylor diagram for the difference between measured and predicted values.
Figures 12 and 13 present a combined sensitivity analysis assessing the impact of the input variables on the output variables NOPT and PCE, respectively. As per Fig. 12, the FAST sensitivity analysis identifies Emin as the variable with the most significant influence on NOPT, exhibiting an S1 value of 1.45, while Emax is the leading variable for PCE predictions with an S1 of 2.2. This indicates that the extreme values of generated electrical energy strongly govern both the optimal timing of peak operation and the PV system's efficiency. In physical terms, Emin reflects periods of minimal energy generation, which critically limit the identification of optimal peak times, whereas Emax corresponds to the highest achievable energy output, directly affecting power conversion efficiency. Furthermore, the accumulated local effects (ALE) study portrayed in Fig. 13 reveals the possible influence of each variable on the model outputs, along with lower and upper confidence intervals for NOPT and PCE predictions. These analyses highlight that, in addition to Emin and Emax, variables such as Ap also contribute significantly, reflecting the direct impact of solar irradiance on system performance. Physically, higher irradiance levels increase energy production and efficiency, while variations in the minimum and maximum energy values determine the system's operational window and efficiency ceiling.
The different ranking of feature importance between FAST and ALE arises from their distinct perspectives: FAST captures global variance contributions, while ALE highlights local and conditional effects. For example, Emin shows the greatest influence on NOPT in FAST because variations in minimal energy generation dominate overall prediction variance, whereas ALE indicates that Emax has stronger local effects on PCE, reflecting its direct impact on peak conversion efficiency under high irradiance conditions. These findings not only unveil the most sensitive parameters but also provide actionable insights for PV system operators: by understanding which energy extremes and irradiance levels most strongly affect system performance, resource allocation, system design, and maintenance schedules can be optimized to maximize energy yield. This connection between model sensitivity and real-world PV behavior enhances the interpretability and practical relevance of the predictive framework.
FAST Sensitivity analysis depicting the effect of input variables on the model output.
Sensitivity analysis for the impact of input variables on the model’s output based on the ALE method.
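For illustration, the first-order ALE computation can be sketched as follows, using a toy analytic model in place of the trained hybrid; the function names are hypothetical, and real ALE implementations add refinements this sketch omits.

```python
import numpy as np

def ale_1d(model, X, feature, n_bins=10):
    """First-order accumulated local effects for one feature:
    average prediction differences across quantile bin edges,
    accumulated and centered (minimal sketch of the ALE idea)."""
    x = X[:, feature]
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    effects = []
    for i in range(n_bins):
        if i == n_bins - 1:
            in_bin = (x >= edges[i]) & (x <= edges[i + 1])
        else:
            in_bin = (x >= edges[i]) & (x < edges[i + 1])
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature] = edges[i]      # clamp feature to lower bin edge
        X_hi[:, feature] = edges[i + 1]  # clamp feature to upper bin edge
        effects.append(float(np.mean(model(X_hi) - model(X_lo))))
    ale = np.cumsum(effects)
    return edges, ale - ale.mean()       # centered ALE curve

# Toy model standing in for the trained hybrid: f = x0^2 + 3*x1,
# so the ALE curve for feature 1 should be approximately linear.
model = lambda X: X[:, 0] ** 2 + 3.0 * X[:, 1]
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 2))
edges, ale1 = ale_1d(model, X, feature=1)
```

Because ALE differences are taken within bins of the observed data, the curve reflects the feature's conditional effect rather than a global variance share, which is exactly why its ranking can differ from FAST's.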
Despite the high predictive accuracy and robustness of the proposed hybrid model, several limitations exist. First, the study primarily relies on historical PV systems and meteorological data, which may limit model performance under entirely new climatic scenarios or rapidly changing environmental conditions. Second, while the hybrid framework demonstrates strong accuracy and interpretability, exploring more advanced deep learning architectures, such as Transformers or Graph Neural Networks, was beyond the current scope. Third, uncertainty quantification was performed using standard evaluation metrics, but probabilistic forecasting and real-time adaptive prediction were not fully addressed.
Future work includes:
Integration of advanced reinforcement and deep learning-based hybrid models (e.g., Transformers, GNNs) to capture complex temporal and spatial dependencies in PV systems.
Development of probabilistic and real-time adaptive forecasting approaches to improve reliability under dynamic environmental conditions.
Expansion of the framework to include larger and more diverse PV datasets, enhancing generalization and practical applicability.
Further exploration of explainable AI techniques to deepen physical insight and improve transparency for operational decision-making.
These directions aim to enhance both the predictive performance and practical deployment of hybrid PV forecasting models in real-world energy systems.
In addition, the current dataset and modeling framework do not explicitly account for environmental disturbances, such as dust accumulation, humidity, partial shading, or soiling, which are known to influence photovoltaic system performance in real-world deployments. The absence of such factors may limit the generalizability of the predictions to field conditions where these disturbances occur. Nonetheless, the selected input variables, including Ap, Amin, Amax, Ep, Emin, Emax, and nyield, indirectly reflect cumulative environmental effects on system performance. For example, variability in irradiance and energy output may partially capture the influence of transient shading or atmospheric conditions. To enhance applicability in operational settings, future studies should integrate additional environmental monitoring data, including humidity levels, dust deposition rates, soiling factors, and shading patterns. Incorporating these features into hybrid machine learning models can improve predictive robustness, reduce uncertainty under extreme or variable conditions, and increase the reliability of NOPT and PCE forecasts for real-world PV systems. This limitation does not diminish the current study's contribution, as the framework provides a robust baseline for forecasting PV system performance under nominal environmental conditions and can readily be extended to include more complex environmental variables in subsequent research.
Beyond numerical accuracy, the proposed forecasting framework can be directly integrated into the operational workflow of real PV plants as a decision-support tool. In a practical deployment scenario, the trained model can be embedded within a plant energy management system to provide day-ahead or intra-day predictions of NOPT and PCE based on real-time or forecasted environmental inputs. Specifically, NOPT predictions enable operators to identify time windows during which the PV system operates at maximum effectiveness, supporting informed scheduling of load management, grid interaction, and energy storage charging or discharging. Accurate PCE forecasting allows continuous assessment of system health and performance degradation, facilitating early detection of faults, soiling, or suboptimal operating conditions. When predicted PCE deviates from expected values under similar irradiance and energy conditions, maintenance actions can be prioritized proactively.
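The PCE-deviation maintenance trigger described above can be sketched as a simple relative-threshold rule; the 10% tolerance and the function name are illustrative assumptions, not values from the study.

```python
import numpy as np

def flag_maintenance(pce_pred, pce_expected, rel_tol=0.10):
    """Flag time windows where predicted PCE falls more than rel_tol
    below the value expected under similar irradiance/energy conditions."""
    pce_pred = np.asarray(pce_pred, dtype=float)
    pce_expected = np.asarray(pce_expected, dtype=float)
    return (pce_expected - pce_pred) / pce_expected > rel_tol

# Example: the middle window drops well below expectation and is flagged.
flags = flag_maintenance([1.8, 1.2, 1.75], [1.8, 1.8, 1.8])
```

In a deployed energy management system, the expected PCE would come from the trained model conditioned on current irradiance and energy inputs, and flagged windows would be queued for inspection.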
Furthermore, the sensitivity analysis results provide actionable physical insight for system optimization. The dominance of variables such as Emin and Emax indicates that energy extremes critically influence both operational timing and efficiency, suggesting that operational strategies should focus on mitigating low-energy periods and maximizing utilization during high-energy intervals. This information can guide inverter control strategies, energy storage dispatch, and plant design adjustments, such as panel orientation or capacity planning. From an economic perspective, integrating the proposed model into PV plant operation can reduce uncertainty in energy yield forecasting, improve scheduling efficiency, and support more reliable participation in energy markets. The framework is scalable and adaptable to different plant sizes and climatic regions, making it suitable for both utility-scale PV plants and distributed solar installations. As a result, the proposed approach bridges the gap between high-accuracy data-driven modeling and practical, real-world PV system management.
Table 8 compares the proposed XGBA model with recent hybrid PV forecasting studies. Xu et al.47 combined EEMD decomposition, XGBoost, LSTM, and Snake Optimization for PV power prediction, achieving reduced errors but focusing only on power series without addressing optimal operating points or efficiency. Renold et al.19 integrated TCN, LSTM, and GRU networks for short-term PV forecasting, improving accuracy and computational efficiency. Wang et al.48 applied a stacking strategy of gradient-boosted and deep networks for solar irradiance and generation, achieving R ≈ 0.99. Tanyıldızı and Ağır49 combined LSTM with SVM for very short-term PV forecasting, reporting R ≈ 0.9823 and RMSE ≈ 0.0300, demonstrating hybridization benefits over single models. The proposed XGBA model surpasses these approaches, achieving R² up to ~ 0.997 and very low RMSE for both NOPT and PCE. Unlike prior studies, it forecasts both optimal peak times and power conversion efficiency, offering broader applicability, robust generalization, and low prediction uncertainty.
This study introduced a framework to accurately predict solar energy parameters, including the number of optimal peak operating times (NOPT) and power conversion efficiency (PCE), using hybrid machine learning models optimized through the Bat Algorithm (BAT). Based on hyperparameter tuning, the performance of each model, including Radial Basis Function (RBF), eXtreme Gradient Boosting Regression (XGBR), and Random Forest Regression (RFR), was improved by exploring the parameter space and achieving optimal predictive results with fewer iterations of the algorithm. The dataset consisted of 305 records with seven features, including solar irradiance (Ap, Amin, Amax), electrical energy output (Ep, Emin, Emax), and normalized energy yield (nyield), which collectively represented the environmental and operational conditions that influence solar energy systems. Numerical evaluation of the hybrid models highlighted the superiority of the XGBA model. Specifically, XGBA reduced the RMSE of the single XGB model in predicting NOPT by 40.155% and decreased the U95 value for PCE by 135.58%, demonstrating that this model was more accurate, stable, and robust across both targets. These enhancements suggested that the hybrid system was capable of predicting both average and extreme weather conditions, supporting effective management and scheduling of solar energy. In addition, three sensitivity analysis procedures were used to determine the effects of input variables on the models' outputs. The FAST sensitivity analysis identified Emin as the most crucial variable for NOPT, whereas for PCE the highest first-order effect (S1) corresponded to Emax. These outcomes provided valuable insights regarding the drivers predominantly affecting solar energy performance and enabled informed decision-making.
In general terms, the union of hybrid modeling, BAT optimization, and rigorous sensitivity analysis provided a stable, interpretable, and high-performing system for predicting solar energy parameters and supporting strategic planning for solar energy projects.
Data will be provided upon reasonable requests, and codes can be accessed in the GitHub repository (https://github.com/AsifMd-10/Accurate-Forecasting-of-Photovoltaic-Optimal-Points-and-Efficiency).
PV: Photovoltaic
OPV: Organic photovoltaics
PCEs: Power conversion efficiencies
ANN: Artificial neural network
XAI: Explainable AI
LIME: Local interpretable model-agnostic explanations
PCE: Power conversion efficiency
ALE: Accumulated local effects
RBF: Radial basis function
CART: Classification and regression tree
R²: Coefficient of determination
U95: 95% confidence level uncertainty
RBBA: RBF + BAT
XGBA: XGBR + BAT
Emin: Lowest energy generation
nyield: Normalized energy yield
SDGs: Sustainable development goals
NFA: Non-fullerene acceptors
IQE: Internal quantum efficiencies
ML: Machine learning
SHAP: SHapley additive explanations
NOPT: Number of optimal peak operating times
BAT: Bat algorithm
FAST: Fourier amplitude sensitivity testing
XGBR: eXtreme gradient boosting regression
RFR: Random forest regression
RMSE: Root mean square error
PICP: Prediction interval coverage probability
RFBA: RFR + BAT
Ep: Efficient electrical energy produced
Emax: Highest energy generation
Ap: Direct solar irradiance
Zhang, W. Main Contributions, Applications and Future Prospect of PV, In MATEC Web of Conferences, vol. 386, p. 3012. (2023).
Castellano, N. N., Salvador, R. M. G., Rodriguez, F. P., Fernandez-Ros, M. & Parra, J. A. G. Renewable energy: the future of photovoltaic energy, In Living with Climate Change, Elsevier, 373–396. (2024).
El-Sheekh, M. M., El-Nagar, A. A., ElKelawy, M. & Bastawissi, H. A. E. Bioethanol from wheat straw hydrolysate solubility and stability in waste cooking oil biodiesel/diesel and gasoline fuel at different blends ratio. Biotechnol. Biofuels Bioprod. 16 (1), 15 (2023).
El-Din, H. A., Elkelawy, M. & Yu-Sheng, Z. HCCI engines combustion of CNG fuel with DME and H2 additives. SAE Tech. Paper (2010).
El Shenawy, E. A., Bastawissi, H. A. E. & Shams, M. M. Enhancement of the performance and emission attributes for the diesel engine using diesel-waste cooking oil biodiesel and graphene oxide nanofluid blends through response surface methodology. Mansoura Eng. J. 49 (5), 8 (2024).
Elkelawy, M., El Shenawy, E. A., Bastawissi, H. A. E., Mousa, I. A. & Ibrahim, M. M. A. R. Analyzing the influence of design and operating conditions on combustion and emissions in premixed turbulent flames: A comprehensive review. J. Eng. Res. 8 (1), 34 (2024).
Aboubakr, M. H., Elkelawy, M., Bastawissi, H. A. E. & El-Tohamy, A. R. A technical survey on using oxyhydrogen with biodiesel/diesel blend for homogeneous charge compression ignition engine. J. Eng. Res. 8 (1) (2024).
Elbanna, A. M., Cheng, X., Yang, C., Elkelawy, M. & Bastawissi, H. A. E. Investigative research of diesel/ethanol advanced combustion strategies: A comparison of premixed charge compression ignition (PCCI) and direct dual fuel stratification (DDFS). Fuel 345, 128143 (2023).
Cui, Y. et al. Over 16% efficiency organic photovoltaic cells enabled by a chlorinated acceptor with increased open-circuit voltages. Nat. Commun. 10 (1), 2515 (2019).
Yuan, J. et al. Single-junction organic solar cell with over 15% efficiency using fused-ring acceptor with electron-deficient core. Joule 3 (4), 1140–1151 (2019).
Liu, Q. et al. 18% efficiency organic solar cells. Sci. Bull. 65 (4), 272–275 (2020).
Karuthedath, S. et al. Intrinsic efficiency limits in low-bandgap non-fullerene acceptor organic solar cells. Nat. Mater. 20 (3), 378–384 (2021).
Chen, M. et al. Influences of non-fullerene acceptor fluorination on three-dimensional morphology and photovoltaic properties of organic solar cells. ACS Appl. Mater. Interfaces. 11 (29), 26194–26203 (2019).
Wang, X. et al. Tuning the intermolecular interaction of A2-A1-D-A1-A2 type non-fullerene acceptors by substituent engineering for organic solar cells with ultrahigh VOC of ~1.2 V. Sci. China Chem. 63 (11), 1666–1674 (2020).
Du, X. et al. Efficient polymer solar cells based on non-fullerene acceptors with potential device lifetime approaching 10 years. Joule 3 (1), 215–226 (2019).
Şahin, F., Işik, G., Şahin, G. & Kara, M. K. Estimation of PM10 levels using feed forward neural networks in Igdir, Turkey. Urban Clim. 34, 100721 (2020).
Sahin, G., Isik, G. & van Sark, W. G. Predictive modeling of PV solar power plant efficiency considering weather conditions: A comparative analysis of artificial neural networks and multiple linear regression. Energy Rep. 10, 2837–2849 (2023).
Hamad, S. A., Ghalib, M. A., Munshi, A., Alotaibi, M. & Ebied, M. A. Evaluating machine learning models comprehensively for predicting maximum power from photovoltaic systems. Sci. Rep. 15 (1), 10750 (2025).
Renold, A. P., Sinha, N. & Gao, X. Z. Hybrid machine learning approach for improved short-term PV power forecasting accuracy. Results Eng. 28, 107374. https://doi.org/10.1016/j.rineng.2025.107374 (2025).
Keddouda, A. et al. Solar photovoltaic power prediction using artificial neural network and multiple regression considering ambient and operating conditions. Energy Convers. Manag. 288, 117186 (2023).
Kumari, P. & Toshniwal, D. Extreme gradient boosting and deep neural network based ensemble learning approach to forecast hourly solar irradiance. J. Clean. Prod. 279, 123285 (2021).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017).
Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why should I trust you?’ Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144 (2016).
Chen, Z., Xiao, F., Guo, F. & Yan, J. Interpretable machine learning for Building energy management: a state-of-the-art review. Adv. Appl. Energy. 9, 100123 (2023).
Scott, C., Ahsan, M. & Albarbar, A. Machine learning for forecasting a photovoltaic (PV) generation system. Energy 278, 127807 (2023).
Bhutta, M. S. et al. Optimizing solar power efficiency in smart grids using hybrid machine learning models for accurate energy generation prediction. Sci. Rep. 14 (1), 17101 (2024).
Ridha, H. M. et al. A novel hybrid photovoltaic current prediction model utilizing singular spectrum analysis, adaptive Beluga Whale optimization, and improved extreme learning machine. Renew. Energy. 256, 123878 (2026).
Alavi, A. H., Gandomi, A. H., Gandomi, M. & Sadat Hosseini, S. S. Prediction of maximum dry density and optimum moisture content of stabilised soil using RBF neural networks. IES J. Part A Civ. Struct. Eng. 2 (2), 98–106. https://doi.org/10.1080/19373260802659226 (2009).
Heshmati, R. A. A., Alavi, A. H., Keramati, M. & Gandomi, A. H. A radial basis function neural network approach for compressive strength prediction of stabilized soil. In Road Pavement Material Characterization and Rehabilitation: Selected Papers from the 2009 GeoHunan International Conference, 147–153. https://doi.org/10.1061/41043(350)20 (2009).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30 (2017).
Breiman, L., Friedman, J., Olshen, R. & Stone, C. Classification and Regression Trees (CRC Press, Boca Raton, Florida, 1984).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M. & Chica-Rivas, M. Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 71, 804–818 (2015).
Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2 (3), 18–22 (2002).
Peters, J. et al. Random forests as a tool for ecohydrological distribution modelling. Ecol. Modell. 207, 304–318. https://doi.org/10.1016/j.ecolmodel.2007.05.011 (2007).
Yang, X. S. A new metaheuristic bat-inspired algorithm. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), 65–74 (Springer, 2010).
Yang, X. S. Bat algorithm for multi-objective optimisation. Int. J. Bio-Inspired Comput. 3 (5), 267–274 (2011).
Bora, T. C., Coelho, L. & Lebensztajn, L. Bat-inspired optimization approach for the brushless DC wheel motor problem. IEEE Trans. Magn. 48 (2), 947–950 (2012).
Yang, X. S. & Gandomi, A. H. Bat algorithm: a novel approach for global engineering optimization. Eng. Comput. 29 (5), 464–483 (2012).
Taha, A. M. & Tang, A. Y. C. Bat algorithm for rough set attribute reduction. J. Theor. Appl. Inf. Technol. 51 (1), 1–8 (2013).
Rakesh, V., Aruna, S. B. & Raju, T. D. Combined economic load and emission dispatch evaluation using BAT algorithm. Int. J. Eng. Res. Appl. 3 (3), 1224–1229 (2013).
Ramesh, B., Mohan, V. C. J. & Reddy, V. C. V. Application of Bat algorithm for combined economic load and emission dispatch. Int. J. Electr. Eng. Telecommun. 2 (1), 1–9 (2013).
Yang, X. S. & He, X. Bat algorithm: literature review and applications. Int. J. Bio-inspired Comput. 5 (3), 141–149 (2013).
Biswal, S., Barisal, A. K., Behera, A. & Prakash, T. Optimal power dispatch using BAT algorithm. In 2013 International Conference on Energy Efficient Technologies for Sustainability, 1018–1023 (2013).
Kheirollahi, R. & Namdari, F. Optimal coordination of overcurrent relays based on modified BAT optimization algorithm. Int. Electr. Eng. J. 5 (2), 1273–1279 (2014).
Ferreira, R. A. S. et al. Predicting the efficiency of luminescent solar concentrators for solar energy harvesting using machine learning. Sci. Rep. 14 (1), 4160 (2024).
Xu, Y., Ji, X. & Zhu, Z. A photovoltaic power forecasting method based on the LSTM-XGBoost-EEDA-SO model. Sci. Rep. 15 (1), 30177 (2025).
Wang, J., Zhang, Z., Xu, W., Li, Y. & Niu, G. Short-term photovoltaic power forecasting using a Bi-LSTM neural network optimized by hybrid algorithms. Sustainability 17 (12), 5277 (2025).
Gao, J., Cao, Q., Chen, Y. & Zhang, D. Cross-variable Linear Integrated ENhanced Transformer for photovoltaic power forecasting. arXiv preprint arXiv:2406.03808 (2024).
Department of Electronics and Communication Engineering, GLA University, Mathura, 281406, India
Anjan Kumar
Department of Electrical and Electronics Engineering, Vardhaman College of Engineering, Hyderabad, India
Md Asif
College of Engineering, Applied Science University, Al Eker, Kingdom of Bahrain
Malak Naji
Department of Electrical and Electronics Engineering, School of Engineering and Technology, JAIN (Deemed to be University), Bangalore, Karnataka, India
B. Spoorthi
Department of Electronics & Communication Engineering, Siksha ’O’ Anusandhan (Deemed to be University), Bhubaneswar, 751030, Odisha, India
Badri Narayan Sahu
Department of Electrical and Electronics Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
S. Radhika
College of Technical Engineering, the Islamic University, Najaf, Iraq
Marwea Al-hedrewy
College of Technical Engineering, the Islamic University of Al Diwaniyah, Al Diwaniyah, Iraq
Marwea Al-hedrewy
Department of General Science, Mamun University, Khiva, Uzbekistan
Egambergan Khudaynazarov
Faculty of Technology, Urgench State University, Urgench, Uzbekistan
Hayitov Abdulla Nurmatovich
A.K. conceived the research idea, designed the methodology, and supervised the overall study. M.A. (corresponding author) managed the project administration, coordinated data collection, and led the manuscript preparation. M.N. performed data preprocessing, feature engineering, and contributed to model development. S.B. implemented the machine learning algorithms, carried out the computational experiments, and validated the results. B.N.S. conducted the statistical analyses, model evaluation, and contributed to the interpretation of findings. S.R. prepared the figures, visualizations, and supported the development of the sensitivity analysis. M.A.-h. contributed to literature review, background formulation, and technical editing of the manuscript. E.K. supported the experimental design, reviewed the technical content, and contributed to refining the methodology. H.A.N. contributed to result interpretation, proofreading, and preparation of the final draft. All authors reviewed and approved the final manuscript.
Correspondence to Anjan Kumar.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Reprints and permissions
Kumar, A., Asif, M., Naji, M. et al. Accurate forecasting of photovoltaic optimal points and efficiency using advanced hybrid machine learning models. Sci Rep 16, 8197 (2026). https://doi.org/10.1038/s41598-026-39031-3
Scientific Reports (Sci Rep)
ISSN 2045-2322 (online)
© 2026 Springer Nature Limited