Scientific Reports volume 15, Article number: 3337 (2025)
Reliable prediction of photovoltaic power generation is key to the efficient management of energy systems in response to the inherent uncertainty of renewable energy sources. Despite advances in weather forecasting, photovoltaic power prediction accuracy remains a challenge. This study presents a novel approach that combines genetic algorithms and dynamic neural network structure refinement to optimize photovoltaic prediction. This methodology dynamically adjusts the neural network parameters during training, including the number of neurons, transfer functions, weights, and biases, to minimize the root mean square error. Evaluation was performed on twelve representative days using annual, monthly, and seasonal data, and a comparison was made with multiple linear regression and nonlinear autoregressive neural network models, demonstrating the approach’s effectiveness. Evaluation metrics such as mean square error, R-value, and mean percentage error reveal promising prediction accuracy. MATLAB is used for modeling, training, and testing, and a real 4.2 kW PV plant is used for validation. The results indicate significant improvements, with mean square errors as low as 20 W on cloudy days and 175 W on sunny days. The proposed methodology achieves consistent prediction-versus-target regressions, with R values ranging from 0.95824 to 0.99980, highlighting its efficiency in providing reliable predictions of PV power generation.
The global agenda for sustainable development, exemplified by initiatives such as the ‘Energy and the Green Deal’ strategy, underscores the need for secure, affordable, and environmentally friendly energy systems. At the core of this vision is integrating renewable energy sources into the grid, facilitating the transition to decarbonization, and improving energy efficiency1,2. To this end, power generation systems based on renewables are shown as a clear alternative for the energy transition towards decarbonization of cities3 and electrification of electrically isolated systems4. Solar photovoltaic (PV) stands out among these sources5, offering considerable potential for decentralized power generation and urban electrification6.
Despite the benefits of PV systems, accurately predicting their power generation remains a challenge. Reliable predictions are key to optimizing the management of energy systems, in particular, given the inherent variability of renewable sources7,8,9. This challenge has led to the search for advanced optimization and forecasting methodologies to improve prediction accuracy and support effective energy planning and management10.
Artificial neural networks (ANNs) have emerged as a powerful tool for addressing complex tasks in control11, pattern recognition12, and prediction13 owing to their self-learning and adaptive capabilities. Because they can capture nonlinear relationships within data, ANNs have been widely used in various fields, including power generation forecasting14. Previous studies have demonstrated the effectiveness of ANNs in prediction tasks, often outperforming traditional statistical models. Examples include ref. 15, a data-driven performance analysis of a residential building; ref. 16, which uses a multi-stage neural network approach to improve accuracy in daily insolation prediction, reducing the average error from 30 to 20%; and ref. 17, which explores the improvement of accuracy in hourly solar radiation predictions.
In addition, recent research suggests integrating ANNs with bio-inspired algorithms. For instance, ref. 18 analyzes how the weights of ANNs can be updated automatically by applying bio-inspired algorithms, mainly Particle Swarm Optimization (PSO), the grasshopper optimization algorithm, and Grey Wolf Optimization (GWO). This bio-inspired approach has been used to evolve the weights of ANNs and to find a particular ANN architecture in this field of work. A similar approach is followed in ref. 19, which employs different evolutionary techniques to improve the voltage profile generated by the electrical network. A further case that shows the advantages of these hybridizations is ref. 20, which presents a hybrid model that combines a neural network with the PSO algorithm to predict the biomass required in a biomass gasification plant. This bio-inspired approach uses PSO to improve the efficiency of the neural model, enabling better estimation of energy demand in AC microgrids. Similarly, ref. 21 presents a hybrid model that combines different neural architectures and Markov chain analysis to improve the accuracy of electric load prediction in smart cities.
Among bio-inspired algorithms, Genetic Algorithms (GAs) are particularly recognized for improving prediction accuracy in various contexts. GAs excel in multi-objective optimization and have been widely and successfully applied in multiple domains, including optimizing and controlling renewable energy systems. This effectiveness is evidenced by the comparative analysis in ref. 22 of PID controllers for power converters, where GAs and other techniques, such as PSO and the GWO algorithm, stand out for their ability to tune these controllers efficiently. Moreover, GAs have proven highly effective in the energy management of multi-microgrid systems23. A noteworthy example of their applicability is the structural configuration optimization of heat exchangers for commercial electric vehicles, where a multi-objective genetic algorithm has been applied to improve thermal and hydraulic performance24. In the mechanical engineering field, GAs are used to enhance the performance of pump units such as turbines in storage mode25. In addition, GAs are a valuable tool in remaining useful life prediction, as demonstrated by ref. 26, where a GA-based method was developed to select optimal sets of useful shapes with supervised learning. Also, in the context of power supply system optimization, an improved adaptive genetic algorithm has been proposed to design efficient and reliable systems27.
Furthermore, various methods have been employed in the PV power generation prediction field, each with distinct advantages and challenges. On the one hand, physical models have been widely explored, as evidenced in ref. 28, which evaluates various physical and thermal models for forecasting PV power production. However, research such as ref. 29 indicates that statistical approaches based on historical data analysis can outperform the predictive accuracy of physical models. Similarly, ref. 30 examines the performance of linear regression and ANN methods, finding that the latter can provide better results. In addition to ANNs, other machine-learning methods have been investigated for predicting solar power generation, as discussed in ref. 31. However, interpreting the resulting models can be more complex than interpreting physical models, affecting their accuracy, generalization capability, and computational efficiency. Despite their advantages, physical models and statistical and machine learning methods may require detailed calibration and may not fully capture the complexity of system behavior32. Some works develop methodologies to guide the selection of the ANNs that can best perform prediction in PV systems. However, these studies do not consider the hybridization and integration of such methods with intelligent optimization algorithms in this field33. Therefore, this study proposes a hybrid approach integrating machine learning techniques with intelligent algorithms.
The combination of machine learning and intelligent algorithms can go beyond these applications, bringing together two areas that are currently undergoing a research boost and which can improve the systems above. The combined use of these two fields can enhance the accuracy of predictions compared to simple model performance34, enhance the search capability, convergence speed, and accuracy of algorithms35, increase efficiency and system performance36, and improve the adaptability and scalability of the obtained results37. Notably, the combined use of GA and ANN is yielding promising solutions in studies on economic prediction, such as carbon trade forecasting38, on the automatic prediction modeling of data degradation in nuclear energy39, and on the management simulation of isolated microgrids40.
Nevertheless, while integrating machine learning and intelligent algorithms offers significant potential, there are still challenges in optimizing modeling techniques for complex nonlinear systems33. The study, therefore, focuses on addressing the challenges in accurately predicting solar PV power generation, a complicated task due to the inherent variability of renewable energy sources and the complexity of nonlinear power systems. This study aims to provide a new methodology for estimating the behavior of renewable generation systems, particularly PV systems. The innovation lies in developing a hyperparameter optimization model for feedforward artificial neural networks (FF-ANN) using GA techniques. FF-ANNs are chosen for their successful results in different areas of prediction, where they outperform other standard methods41, for their capability to work with complex nonlinear systems42, and, within the neural domain, for offering very high computational power at low cost43.
Emphasis is also placed on prediction time horizons in the PV forecasting domain. On the one hand, short-term forecasting44, which ranges from minutes to hours, plays a significant role in the operational management of power grids. This type of forecasting is fundamental for power dispatch scheduling and immediate response to generation fluctuations caused by sudden weather changes. Its ability to provide an agile and efficient response significantly improves grid stability and reliability. In contrast, medium-term forecasting45, spanning days to weeks, is mainly used in the production and maintenance planning of PV installations and in the strategic management of the purchase and sale of energy in the market. This time perspective enables resource optimization and informed business decisions. On the other hand, long-term forecasting46, extending from months to years, is important in designing and sizing new PV infrastructures and guiding long-term strategic planning in investment, economic, and environmental impact assessment.
Although previous studies use neural networks for solar energy prediction, this study introduces an innovative methodology that dynamically optimizes the neural network structure in the field under study, significantly improving prediction accuracy. In addition, an adaptive optimization approach allows the neural network to adjust dynamically to changes in weather and power generation conditions. Furthermore, the forecasts proposed in this study are necessary for energy management, mitigating the effects of the inherent variability of renewable sources on the power grid. Meeting the significant technical challenges involved, including the need for high accuracy and fast response, requires advanced forecasting and optimization methods such as those developed in this study. Improving the accuracy of these forecasts contributes directly to strengthening the stability and operational efficiency of the power grid, which are key aspects in the growing integration of renewable energies.
To evaluate the effectiveness of the proposed methodology, the study analyzes the performance in a wide range of weather conditions and training data scenarios, testing different times of the year with various meteorological characteristics. Additionally, a sensitivity analysis is performed to identify the most influential parameters on the accuracy of the predictions, allowing further optimization of the model performance and a better understanding of the factors affecting solar PV power generation. Finally, cross-validation of the model is carried out using multiple data sets and training techniques, ensuring the results’ robustness, replicability, and applicability to a wide range of scenarios. Moreover, performance will be compared using different training input data sets, starting with annual data and moving to seasonal and monthly data. These elements highlight the study’s innovation and significant contribution to the solar PV power generation prediction field.
The objective function of the proposed model is the Root Mean Square Error (RMSE), chosen for its outstanding performance in optimization algorithms47,48. The RMSE is a robust metric for evaluating the accuracy of a model's predictions, penalizing larger errors more severely. This feature is especially useful in contexts where large errors must be minimized to improve model efficiency. In addition, RMSE is widely used in the scientific literature and in practical applications, which facilitates the comparison and validation of results with other existing studies and models49.
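The way RMSE penalizes large errors can be illustrated with a short Python sketch (Python is used here for illustration only; the study itself relies on MATLAB). The power readings are hypothetical values, and the mean absolute error (MAE) is included purely as a point of comparison, not as a metric used in the study.

```python
import math


def rmse(targets, predictions):
    """Root mean square error between two equal-length sequences."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def mae(targets, predictions):
    """Mean absolute error, shown only for comparison with RMSE."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical PV power readings in W (illustrative values only).
target = [100.0, 100.0, 100.0, 100.0]
spread_out = [90.0, 110.0, 90.0, 110.0]       # four errors of 10 W each
concentrated = [100.0, 100.0, 100.0, 140.0]   # a single error of 40 W

print(mae(target, spread_out), rmse(target, spread_out))
print(mae(target, concentrated), rmse(target, concentrated))
```

Both predictions have the same MAE, but the RMSE of the prediction with one large error is twice that of the prediction with several small ones.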
Furthermore, this evaluation uses the base ANN forecast, the multiple linear regression (MLR), and nonlinear autoregressive neural network (NAR) models as benchmarks. These models are chosen for their recognition in the field of prediction, allowing the new methodology to be compared with established approaches. The statistical MLR model represents a classic and widely used method for linear prediction, supported by its proven effectiveness in predicting solar energy production, specifically in contexts where a limited data set is available50,51,52. The machine learning NAR model, an established strategy in this context, offers the additional capability of capturing nonlinear relationships in complex data and changing environments53,54,55. This approach thoroughly evaluates the proposed methodology’s consistency and effectiveness in different situations and contexts.
The organization of the described work consists of the following sections: Section “Methodology” explains the facility under study, the factors influencing the power generation prediction in PV plants, the data processing, the methodology followed for the combination of GA and ANN, and the evaluation methodology. Section “Results” shows the predictions obtained, the results concerning their evaluations, and the performance of the GA and ANN. Section “Discussion” assesses the results obtained. Finally, Section “Conclusions” outlines the conclusions of the research.
The methodology section provides a detailed overview of several study aspects; see Fig. 1. It begins by examining the factors that influence power generation in PV plants. This includes an analysis of critical variables in understanding the power generation process: solar hour, temperature, solar irradiance, PV energy, dew point, humidity, wind speed, pressure, precipitation rate, and accumulated precipitation. The following subsection focuses on data processing techniques employed in the study, including data normality, correlation, and normalization analysis. Following that, the section delves into the structure of the genetic algorithm utilized for optimization purposes and outlines the process of fitness value calculation. Moreover, the design of the network topology, specifically the FF-ANN, is presented, including the optimized number of neurons, weight matrices, biases, and transfer functions (transferFcn). The subsequent subsection discusses the model approach, which integrates the abovementioned tools, depicted through a graphical representation. Next, an explanation of the statistical MLR and machine learning NAR models employed is provided. Then, the evaluation approach is outlined, encompassing metrics such as RMSE, R-value, mean percentage error, and computational time, providing a comprehensive assessment of the model’s performance. Lastly, a comparison of the different forecasting methodologies existing in the state of the art is made.
Fig. 1. Methodology flowchart.
The training and target prediction data come from the measurements made at the facility in Fig. 2, located in Valencia, Spain. The system consists of a grid-connected PV rooftop household installation, which can feed surplus power into the grid and draw power from it when PV generation cannot meet domestic demand. The installation was commissioned in 2020, but the weather recording began in April 2021.
Fig. 2. PV installation.
The PV installation consists of 12 monocrystalline 350 Wp panels (Table 1), equivalent to 4.2 kWp, divided into two strings of six panels each, with each string connected to one MPPT input of the inverter (Table 2).
The electronic data precision equipment (weather station) is the Datasol MET. Table 3 shows the technical specifications of the sensors associated with the variables that significantly impact the prediction: the temperature sensor and the irradiance sensor.
A weather station that measured various parameters, including temperature, dew point, humidity, wind speed, pressure, precipitation rate, accumulated precipitation, and solar irradiance, was used for meteorological data collection. This weather station sends the information to the Wunderground portal, which operates as a database through the Personal Weather software. Data related to PV power generation was collected from the PV inverter. These data are transmitted and stored in the FusionSolar software database.
Energy generation in PV plants is predominantly influenced by meteorological factors, thereby introducing uncertainty to PV energy generation. It is widely acknowledged that the energy generation of PV plants is closely tied to local weather conditions56. While there is a certain degree of regularity between power generation and meteorological data, the overall relationship is rather complex. Thus, the strength of the relationships between the measured variables will be studied using correlation analysis, explained in the following subsection.
The data selection for the training of an ANN has a significant influence on its performance. This is why the data used has been measured from May 1st, 2021, to April 30th, 2022, i.e., 105,120 historical data of temperature, dew point, humidity, wind speed, wind direction, pressure, precipitation rate, accumulated precipitation, solar irradiance, and PV energy generation. Several analyses of the collected data have been conducted to make accurate predictions: filtering, normality test, correlation analysis, and data normalization.
To ensure the precise operation of the tools (GA and ANN), the measured data was filtered, discarding those when there was no PV energy generation. It is important to note that several studies have been carried out in PV power forecasting to improve model performance. Many of these studies have chosen not to use nighttime prediction data due to their insignificance in power generation. This approach has been observed in several studies, such as28,57,58. Following this same methodology, the present study has also decided to exclude nighttime data from the predictions. Filtering the night-time values in the input data is due to several reasons:
Improves model accuracy: Eliminating nighttime values allows for the exclusive focus on power generation patterns during the active hours of the PV system. This facilitates the identification of specific trends and patterns related to PV power generation.
Reduction of noise and redundancy: By removing nighttime values from the input data, the introduction of noise into the model is avoided, and information redundancy is reduced. At night, PV power generation is zero, which means that the data corresponding to this period does not provide relevant information to predict power generation during the day. Introducing this data could mislead the model and affect its ability to identify meaningful patterns during the active hours of the day.
Improvement of computational efficiency: By reducing the size of the data set, eliminating nighttime values improves the computational efficiency of the model. This enhances the model’s ability to predict daytime energy production accurately.
This allows the model to focus exclusively on generation patterns during the active hours of the PV system, which can enhance its ability to predict power production during the day accurately.
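The filtering step described above can be sketched in Python as follows. The record layout and the field names (`time`, `irradiance_w_m2`, `pv_power_w`) are hypothetical and chosen only for illustration; the study performs this step in MATLAB on the data exported from the weather station and the inverter.

```python
def drop_night_records(records, power_key="pv_power_w"):
    """Keep only the records in which the PV system is generating power."""
    return [record for record in records if record[power_key] > 0.0]


# Hypothetical measurements (illustrative values only).
records = [
    {"time": "03:00", "irradiance_w_m2": 0.0, "pv_power_w": 0.0},
    {"time": "12:00", "irradiance_w_m2": 820.0, "pv_power_w": 3100.0},
    {"time": "18:30", "irradiance_w_m2": 95.0, "pv_power_w": 240.0},
    {"time": "23:15", "irradiance_w_m2": 0.0, "pv_power_w": 0.0},
]

daytime = drop_night_records(records)
print([record["time"] for record in daytime])
```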
Before simulating and evaluating the FF-ANN, performing a normality analysis of the input data is necessary. The Anderson–Darling normality test was conducted to determine if the data followed a normal distribution, considering a significance level of 0.05. If the p-value is less than 0.05, it is considered that the data do not follow a normal distribution59. To carry out the test, Eqs. (1) and (2) from Table 4 are applied (ref. 60):
$$A^{2} = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{i}) + \ln\left(1 - F(x_{n+1-i})\right)\right] \tag{1}$$
where $n$ is the number of samples, $i$ is the $i$th order observation, and $F(x)$ is the cumulative distribution function.
Since the aim is to know if the data follow a normal distribution, the following equation is used:
$$A^{*2} = A^{2}\left(1 + \frac{0.75}{n} + \frac{2.25}{n^{2}}\right) \tag{2}$$
where $n$ is the number of samples.
The p-value is calculated depending on the result obtained in A, following the guidelines below:
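A minimal Python sketch of the test is given below. It assumes the standard Anderson–Darling statistic with the mean and standard deviation estimated from the sample, the usual small-sample adjustment, and the piecewise p-value approximation commonly attributed to D'Agostino and Stephens. The p-value thresholds and coefficients are an assumption of this sketch and may differ from the guidelines used in the study. The two samples are synthetic.

```python
import math
from statistics import NormalDist


def anderson_darling_normal(sample):
    """Return (A2, adjusted A2, approximate p-value) for a normality test."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted((x - mean) / std for x in sample)
    cdf = NormalDist().cdf
    eps = 1e-12  # keeps the logarithms finite for extreme observations
    total = 0.0
    for i in range(1, n + 1):
        f_low = min(max(cdf(z[i - 1]), eps), 1.0 - eps)
        f_high = min(max(cdf(z[n - i]), eps), 1.0 - eps)
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    a2 = -n - total / n
    a_adj = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    # Assumed piecewise approximation of the p-value (not taken from the source).
    if a_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a_adj + 0.0186 * a_adj ** 2)
    elif a_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a_adj - 1.38 * a_adj ** 2)
    elif a_adj > 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a_adj - 59.938 * a_adj ** 2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a_adj - 223.73 * a_adj ** 2)
    return a2, a_adj, p


# Synthetic samples: one strongly bimodal, one built from normal quantiles.
bimodal = [0.0] * 50 + [100.0] * 50
near_normal = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]

p_bimodal = anderson_darling_normal(bimodal)[2]
p_normal = anderson_darling_normal(near_normal)[2]
print(p_bimodal < 0.05, p_normal < 0.05)
```

With a 0.05 significance level, normality is rejected for the bimodal sample and not rejected for the sample built from normal quantiles.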
A correlation assessment measures the strength and direction of the association between two variables. The result obtained in the Anderson–Darling normality test affects the type of correlation to be performed. If the data follow a normal distribution, Pearson’s method should be applied; on the other hand, Spearman’s method should be applied if the data do not follow a normal distribution. In this case, Spearman’s method is used, in which the correlation coefficient is a measure that varies in the range from − 1.0 to + 1.0, and its interpretation is as follows61:
Scores close to + 1 indicate a strong and positive correlation between the variables analyzed.
Scores close to − 1 indicate a strong and negative correlation between the variables analyzed.
Scores close to 0 indicate the absence of a monotonic correlation between the variables. There may still be another type of relationship, but not a monotonic one.
Spearman’s correlation coefficient is evaluated from Eq. (3):
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_{i}^{2}}{n\left(n^{2}-1\right)} \tag{3}$$
where $n$ is the number of samples, and $d_{i}$ is the difference in ranks of the $i$th element.
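As an illustration, the rank-difference form of Spearman's coefficient, 1 − 6Σd²/(n(n² − 1)), can be computed with the Python sketch below. It assumes there are no tied values, for which this form is exact; the irradiance and power values are hypothetical.

```python
def ranks(values):
    """Rank the values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's coefficient from the squared rank differences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))


# Hypothetical readings: power rises monotonically (not linearly) with irradiance.
irradiance = [50.0, 200.0, 450.0, 800.0, 950.0]
pv_power = [x ** 1.5 for x in irradiance]

print(spearman_rho(irradiance, pv_power))
print(spearman_rho(irradiance, list(reversed(pv_power))))
```

Because the coefficient depends only on ranks, a monotonic but nonlinear relationship still yields a coefficient of 1.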
To further improve the prediction accuracy, preliminary filtering of the data samples and elimination of singular data are necessary to avoid prediction errors. This is followed by normalization of the data to the range [0, 1]. Hence, learning is accelerated, and the ANN prediction is improved. The formula used is given in Eq. (4):
$$x_{norm} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}} \tag{4}$$
where $x_{i}$ is the sampling data, $x_{min}$ is the lowest value observed within the data sequences, and $x_{max}$ is the highest value observed within the data sequences.
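A minimal Python sketch of min-max normalization to the range [0, 1], together with the inverse mapping needed to express predictions in physical units again, is shown below. The function names and the sample values are illustrative assumptions.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot normalize a constant sequence")
    return [(value - low) / (high - low) for value in values]


def denormalize(scaled, low, high):
    """Map values from [0, 1] back to the original range."""
    return [low + value * (high - low) for value in scaled]


# Hypothetical irradiance samples in W/m2 (illustrative values only).
irradiance = [100.0, 550.0, 1000.0]
scaled = min_max_normalize(irradiance)
print(scaled)
print(denormalize(scaled, min(irradiance), max(irradiance)))
```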
GAs are a search heuristic method inspired by the natural selection process that is viable for solving both constrained and unconstrained optimization problems. It mimics the mechanism of natural selection via biological evolution. GAs iteratively modify a population consisting of individual solutions. At each iteration, the algorithm selects certain individuals from the current population to serve as parents, and these parents are utilized to generate offspring for the subsequent generation. The population gradually “evolves” through successive generations toward an optimal solution. This algorithm emulates the natural selection process, whereby the most adaptive individuals are chosen for reproduction. By leveraging the genetic algorithm, it becomes possible to tackle mixed integer programming problems where certain components are subject to integer constraints62.
In this study, the population of the GA consists of the different parameterizations that the ANN may assume; that is, each individual in the population is defined by the following parameters: number of neurons, transfer functions, weights, and biases, representing the architecture of the ANN within the scope. At the start of the algorithm, the initial population comprises a random set of parameter settings for the ANN. The evolutionary algorithm proceeds to iterate through successive generations, gradually improving the quality of the solutions. In each generation, the quality of each individual in the population is evaluated using a fitness function, which, in the proposed case, is the RMSE. This is a widely accepted practice in the literature; the choice is based on the ability of the RMSE to provide a clear and objective measure of model performance, enabling comparison and evaluation in different scenarios63,64,65,66.
The fittest individuals are more likely to reproduce and produce offspring for the next generation. During this optimization process, the crossover and mutation genetic operators are applied when creating a new generation of individuals: the former combines the characteristics of two individuals in the population, and the latter introduces genetic variability through random genetic modifications. Together, they allow the search space to be explored effectively for optimal solutions. Furthermore, it is challenging to predefine specific ratios for each parameter, especially for the mutation and crossover operators. According to the state of the art67, and after testing various mutation and crossover operators, it was observed that the training results of the ANN were more favorable for the values defined in Table 4, with a much smaller mutation operator compared to the crossover operator. The population size was determined according to the number of variables to be optimized, which depends on the number of neurons involved. The maximum number of neurons was limited to 100 since the simulations showed that the best results were achieved with fewer neurons. As for the maximum number of generations, it was set to 100 times the population size. The stagnation criterion was set so that the algorithm stops if the average relative change in the value of the best fitness function is less than or equal to 1e-6. The algorithm continues to iterate through generations until a predefined stopping criterion is reached. This criterion can be a maximum number of generations, set to 100 times the number of individuals squared, thus ensuring a reliable search for the global minimum in the search space.
Alternatively, the algorithm can stop if a minimum improvement in the quality of the solution is reached, that is, a minimum improvement in the mean relative change in the value of the best fitness function equal to or less than 1e-6. This threshold is lowered to ensure convergence to the global minimum (Table 5).
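The generational loop described above can be sketched in Python as follows. This is a simplified illustration, not the MATLAB configuration used in the study: tournament selection, uniform crossover, Gaussian mutation, elitism, and all default parameter values are assumptions of this sketch, and the stagnation rule (no relative improvement above `tol` for `stall_limit` consecutive generations) is a simplified stand-in for the average-relative-change criterion.

```python
import random


def genetic_minimize(fitness, n_genes, pop_size=30, max_generations=200,
                     crossover_rate=0.8, mutation_rate=0.05,
                     stall_limit=50, tol=1e-6, seed=0):
    """Minimize `fitness` over real-valued chromosomes of length `n_genes`."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best, best_score, stall = None, float("inf"), 0

    for _ in range(max_generations):
        scores = [fitness(individual) for individual in population]
        champion = min(range(pop_size), key=scores.__getitem__)
        score = scores[champion]
        # Stagnation counter: reset only on a meaningful improvement.
        if score < best_score - tol * max(1.0, abs(score)):
            stall = 0
        else:
            stall += 1
        if score < best_score:
            best, best_score = population[champion][:], score
        if stall >= stall_limit:
            break

        def select():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] <= scores[b] else population[b]

        offspring = [best[:]]  # elitism keeps the best individual found so far
        while len(offspring) < pop_size:
            mother, father = select(), select()
            if rng.random() < crossover_rate:
                child = [m if rng.random() < 0.5 else f
                         for m, f in zip(mother, father)]
            else:
                child = mother[:]
            child = [gene + rng.gauss(0.0, 0.1)
                     if rng.random() < mutation_rate else gene
                     for gene in child]
            offspring.append(child)
        population = offspring

    return best, best_score


def sphere(chromosome):
    """Toy fitness function with its minimum (zero) at the origin."""
    return sum(gene ** 2 for gene in chromosome)


solution, solution_score = genetic_minimize(sphere, n_genes=3)
print(solution, solution_score)
```

In the study, the fitness function would be the RMSE of the FF-ANN parameterized by the chromosome rather than this toy function.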
FF-ANNs are ANNs that process information in one direction, from the input to the output layer. They consist of an input layer, one or more hidden layers, and an output layer. The input layer is the first layer of the network; it is formed by input neurons that receive the initial data to the system for processing in the following layers of the ANN. The hidden layers are responsible for processing the information and transforming it into a form the output layer can use. The output layer is the final layer of the ANN; it generates the ANN output/prediction based on the input data and the computations performed by the hidden layers68.
The hidden layer of an FF-ANN can comprise one or multiple neurons. The optimal number of neurons is task-dependent, and its prudent selection significantly impacts performance, the ability to learn intricate patterns and training time. Furthermore, Input Weights (IW), Layer Weights (LW), and biases play a significant role in the performance of FF-ANNs. IW corresponds to the weights connecting the input layer to the first hidden layer, while LW represents the weights connecting neurons within a layer to neurons in the subsequent layer. Biases denote values added to the weighted sum of inputs for each neuron in a layer before undergoing an activation function. These parameters have been optimized using a GA to enhance the ANN’s performance. Equations (5) and (6) show the output layer during the forward pass69.
where $j$ is the $j$th node in the hidden layer, $k$ is the $k$th node in the output layer, $y$ is the output, $b$ is the bias, $w$ is the connection (weight) strength between nodes, and $g$ is the activation function.
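The forward pass of a single-hidden-layer FF-ANN can be sketched in Python as below. The sketch assumes the usual form in which each hidden node computes y_j = g(Σ_i w_ij x_i + b_j) and each output node computes y_k = g(Σ_j w_jk y_j + b_k); the input symbol x and index i are notation introduced here. The helper names `tansig` and `purelin` mirror the MATLAB transfer functions of the same names, and the weights are arbitrary illustrative values.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid transfer function."""
    return math.tanh(x)


def purelin(x):
    """Linear transfer function."""
    return x


def forward(inputs, iw, b_hidden, lw, b_out, g_hidden=tansig, g_out=purelin):
    """Propagate one input vector through hidden and output layers.

    iw[j][i] is the weight from input i to hidden node j (input weights),
    lw[k][j] is the weight from hidden node j to output node k (layer weights).
    """
    hidden = [g_hidden(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(iw, b_hidden)]
    return [g_out(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(lw, b_out)]


# Two normalized inputs (e.g. irradiance and temperature), one hidden node.
linear_output = forward([1.0, 2.0], iw=[[1.0, 1.0]], b_hidden=[0.5],
                        lw=[[2.0]], b_out=[1.0], g_hidden=purelin)
zero_output = forward([0.0, 0.0], iw=[[1.0, 1.0], [2.0, 2.0]],
                      b_hidden=[0.0, 0.0], lw=[[1.0, 1.0]], b_out=[0.25])
print(linear_output, zero_output)
```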
Transfer functions are another crucial aspect to consider. The transferFcn property defines the activation function employed by the network’s neurons. It takes the weighted sum of inputs for a neuron and applies a nonlinear transformation to generate the neuron’s output. The transferFcn plays an essential role in the ANN by facilitating learning and enabling predictions based on input data. MATLAB provides built-in activation functions that can serve as transferFcn in an ANN. The available options are presented in Table 6. The choice of transferFcn relies on the specific task and the ANN’s structural configuration. It is possible to assign a transferFcn for each layer within the network, enabling different activation functions to be used in various network parts.
It is important to emphasize that an analysis of the influence of different input data configurations on the neural network has been carried out. This analysis has included the comparison of the results obtained by training the network using aggregated data sets at the annual, seasonal, and monthly levels; the reason for this analysis is given by the following:
The annual training approach may allow for capturing long-term trends and patterns in solar power generation throughout the year, providing an overview of system behavior over time.
Seasonal training may allow for analyzing seasonal variations in solar power generation, considering changes in weather and environmental conditions at different times of the year.
Monthly training may allow for individually examining short-term variations in solar power generation within each month, which enables capturing more specific and detailed patterns in system behavior.
By using these different training approaches, a more complete and detailed understanding of the performance of the proposed model on various time scales is sought. Moreover, to ensure consistent training and reliable evaluation of ANN performance, a data distribution of 70% for training, 15% for testing, and 15% for validation has been used to provide a balanced representation of the data sets and to assess the generalizability of the model comprehensively.
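A 70/15/15 partition can be sketched in Python as follows. Random assignment of samples to the three subsets is an assumption of this sketch, as are the function name and the fixed seed.

```python
import random


def split_indices(n_samples, train=0.70, test=0.15, seed=0):
    """Shuffle sample indices and split them into train/test/validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train)
    n_test = round(n_samples * test)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:n_train + n_test]
    validation_idx = indices[n_train + n_test:]  # remainder, about 15%
    return train_idx, test_idx, validation_idx


train_idx, test_idx, validation_idx = split_indices(1000)
print(len(train_idx), len(test_idx), len(validation_idx))
```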
This study employs a combined approach, through GA and ANN, to enhance the performance of an ANN for PV energy generation forecasting by optimizing its parametrization. Figure 3 shows the methodology employed in the proposed study.
Fig. 3. GA-FFANN model structure.
Firstly, the measurements made on the actual PV installation are normalized, filtered, and divided into data sets; an FF-ANN is then created, and the GA initializes the population, in which each individual comprises the number of neurons, transferFcn, LW, IW, and biases. The population defined by the GA is used to parameterize the FF-ANN, while the normalized data are used to train the ANN. The RMSE corresponding to the training phase is then calculated from the obtained values and serves as the fitness value.
Secondly, if the RMSE has not reached its minimum and the last generation has not been reached, the GA procedure continues by applying the selection operator, the crossover operation, and the mutation operator to create a new generation and update the GA population. This iterative procedure continues until the GA obtains a minimum value for the fitness function or reaches the last generation.
Conversely, once the last generation is reached or the minimum RMSE is obtained, the GA results are extracted to configure the optimal FF-ANN architecture, and the weather characteristics of the target day are fed to the neural network to predict the PV power generation. Finally, the evaluation metrics of the estimates versus the targets are calculated: RMSE, R-value, and mean percentage error.
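The optimization loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' MATLAB implementation: the names `H_MAX`, `TRANSFER`, and `run_ga` are hypothetical, the selection (truncation), crossover (one-point), and mutation (Gaussian) operators are assumed because the text does not specify them, the GA evolves the weights directly instead of also training the network, the loop stops only at the last generation, and the data are synthetic.

```python
import math
import random

# Candidate transfer functions (assumed set; names follow MATLAB conventions)
TRANSFER = {
    "tansig": math.tanh,
    "logsig": lambda n: 1.0 / (1.0 + math.exp(-n)),
    "elliotsig": lambda n: n / (1.0 + abs(n)),
}
H_MAX = 8  # assumed upper bound on the number of hidden neurons

def random_individual(rng):
    # Genes: neuron count, transfer function, and a flat weight/bias vector
    return {"n": rng.randint(1, H_MAX),
            "fn": rng.choice(sorted(TRANSFER)),
            "w": [rng.uniform(-1, 1) for _ in range(4 * H_MAX + 1)]}

def predict(ind, x):
    """Two-input, one-hidden-layer feed-forward network with a linear output."""
    f, w = TRANSFER[ind["fn"]], ind["w"]
    out = w[-1]  # output bias
    for j in range(ind["n"]):
        iw1, iw2, b, lw = w[4 * j:4 * j + 4]
        out += lw * f(iw1 * x[0] + iw2 * x[1] + b)
    return out

def rmse(ind, X, y):
    return math.sqrt(sum((predict(ind, x) - t) ** 2 for x, t in zip(X, y)) / len(y))

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a["w"]))  # one-point crossover on the weights
    parent = rng.choice((a, b))
    return {"n": parent["n"], "fn": parent["fn"], "w": a["w"][:cut] + b["w"][cut:]}

def mutate(ind, rng, rate=0.1):
    ind["w"] = [g + rng.gauss(0, 0.3) if rng.random() < rate else g for g in ind["w"]]
    if rng.random() < rate:
        ind["n"] = rng.randint(1, H_MAX)
    if rng.random() < rate:
        ind["fn"] = rng.choice(sorted(TRANSFER))
    return ind

def run_ga(X, y, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best RMSE at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda ind: rmse(ind, X, y))
        history.append(rmse(pop[0], X, y))
        elite = pop[:pop_size // 2]  # truncation selection keeps the best half
        children = [mutate(crossover(*rng.sample(elite, 2), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=lambda ind: rmse(ind, X, y)), history

# Synthetic normalized inputs (irradiance, temperature) and target power
data_rng = random.Random(1)
X = [(data_rng.random(), data_rng.random()) for _ in range(60)]
y = [0.9 * g - 0.05 * g * t for g, t in X]
best, history = run_ga(X, y)
print(best["n"], best["fn"], history[0], history[-1])
```

Because the best half of each generation is carried over unchanged, the best RMSE recorded in `history` can never increase from one generation to the next.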
Moreover, the measurements of twelve days have been employed for the PV generation prediction. These days have been chosen because each of the twelve days selected represents a different month of the year, which allows for analyzing how the model behaves concerning the seasonal variability of the input data. In addition, several criteria were considered when selecting the evaluation days. First, days with very stable PV generation conditions were sought to be included, which allowed for evaluation of how the model handled predictable and consistent situations (from May to September). Second, days with low power generation were included (January, February, and December), which allowed the model to be assessed for its ability to predict generation under low solar irradiance conditions. Finally, days with abrupt fluctuations in power generation were also selected (March, April, October, and November), which allowed evaluation of the model’s ability to adapt to rapid changes in weather conditions.
Regarding the data employed, after identifying solar irradiance and temperature as the most influential variables for the prediction of solar PV generation, the data set for the optimization comprised 6,402 records for solar irradiance, 6,402 records for temperature, and 6,402 records corresponding to solar PV generation measurements obtained using the power meter. Subsequently, 288 input data points for solar irradiance and 288 for temperature were used in the prediction stage. Moreover, according to the data partitioning scheme described in the current state of the art70, 70% of the available data was used for network training, while 15% was allocated for testing and the remaining 15% for validation.
MLR is a commonly used statistical method for analyzing the relationship between multiple predictor variables and a response variable. In the context of solar power forecasting, the MLR model estimates PV power production as a function of various predictors, such as solar radiation and ambient temperature. Thus, historical PV power generation data and relevant meteorological variables are collected to implement the MLR model. These data are used to train the MLR model, where the regression coefficients are adjusted to minimize the prediction error. Once trained, the model can forecast future PV power production as a function of weather conditions.
The methodological approach based on the MLR model employs a series of coefficients. These are adjusted using the ordinary least squares method, which finds the model coefficients that minimize the sum of the squares of the differences between the observed and predicted values. Equation (7) expresses the multiple linear regression model.
\[ y = \beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2} + \dots + \beta_{k} X_{k} \qquad (7) \]

where \(y\) is the response variable, \(X_{1}\), \(X_{2}\), …, \(X_{k}\) are the predictor variables, and \(\beta_{0}\), \(\beta_{1}\), …, \(\beta_{k}\) are the coefficients of the model71. In the study context, \(X_{1}\) and \(X_{2}\) will be used, the former representing solar radiation and the latter representing ambient temperature.
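As an illustration of the ordinary least squares fit for the two-predictor case, the following sketch solves the normal equations directly. The function name `fit_mlr` and the sample values are hypothetical, and the inputs are assumed to be normalized irradiance and temperature; the study's actual coefficients are not reproduced here.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [b0, b1, ..., bk], where b0 is the intercept.
    """
    rows = [[1.0, *x] for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical normalized samples: (irradiance, temperature) -> power
X = [(0.0, 0.2), (0.2, 0.3), (0.4, 0.35), (0.6, 0.5), (0.8, 0.6), (1.0, 0.55)]
y = [0.1 + 2.0 * g - 0.3 * t for g, t in X]
beta = fit_mlr(X, y)
print(beta)
```

Since the sample targets are generated from an exact linear relation, the fit recovers the generating coefficients up to floating-point error; with measured data the residuals would be nonzero.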
The methodological approach applied in predicting PV power generation using the NAR model is based on the ability of neural networks to model nonlinear relationships in time series. This strategy focuses on time series forecasting, using a recurrent dynamic network based on a linear autoregressive model with feedback connections. It is assumed that the present behavior of the variable of interest will explain its future behavior, which is reflected in the nonlinear function used to calculate the next value based on the previous steps of the output signal, as illustrated in Eq. (8).
\[ y(t) = f\big(y(t-1), y(t-2), \dots, y(t-d)\big) \qquad (8) \]

where \(y\) represents the PV data series over time \(t\), \(d\) is the input delay of the data series, and \(f\) denotes a transfer function54.
While training the NAR model, the historical time series of PV power generation is used as input, which allows the model to learn specific patterns and behaviors of the PV plant under study. The model’s ability to capture the dynamic and nonlinear relationships in the data is optimized by adjusting the neural network parameters. The typical architecture of a NAR model includes feedback connections that enable the use of predictor variables to predict future values of the response variable, which gives the model the ability to capture feedback effects and time dependencies in the time series of PV power generation.
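The autoregressive structure of Eq. (8) can be illustrated by showing how a series is turned into lagged training pairs and how predictions are fed back in closed loop. The helper names `make_lagged` and `forecast` are hypothetical, and the function passed as `f` below is a simple placeholder, not a trained neural network.

```python
def make_lagged(series, d):
    """Build NAR training pairs: inputs[i] = [y(t-1), ..., y(t-d)], targets[i] = y(t)."""
    inputs = [[series[t - k] for k in range(1, d + 1)]
              for t in range(d, len(series))]
    targets = list(series[d:])
    return inputs, targets

def forecast(f, history, d, steps):
    """Closed-loop forecast: each prediction is appended and reused as an input."""
    buf = list(history)
    for _ in range(steps):
        buf.append(f([buf[-k] for k in range(1, d + 1)]))
    return buf[len(history):]

series = [1, 2, 3, 4, 5]
inputs, targets = make_lagged(series, d=2)
print(inputs, targets)

# Placeholder for the learned function f: linear extrapolation from two lags
extrapolate = lambda lags: 2 * lags[0] - lags[1]
print(forecast(extrapolate, series, d=2, steps=3))
```

In a real NAR model, `f` would be the trained network, and the delay `d` would be chosen to cover the relevant history of the PV series.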
Several metrics have been used to evaluate the ANN optimization and prediction thoroughly: the RMSE, the R-value, the relative mean percentage error, and the computation time.
The RMSE measures the difference between the model's predictions and the target values. A low RMSE indicates better model performance72. Formula (9) shows its calculation procedure.

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(o_{predicted,i} - o_{target,i}\right)^{2}} \qquad (9) \]

where \(o_{predicted}\) is the output obtained from the ANN, \(o_{target}\) is the target value obtained from the experimental data, and \(N\) is the total number of samples used.
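The RMSE definition can be sketched as follows. The function name and the sample power values (in watts) are hypothetical and serve only to show the arithmetic.

```python
import math

def rmse(predicted, target):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predicted and measured PV power, in W
print(rmse([100.0, 200.0, 300.0], [110.0, 190.0, 300.0]))
```

The result carries the same unit as the inputs, which is why the RMSE values reported in this study are expressed in watts.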
To evaluate the model's performance, a linear regression analysis between forecast and target PV power has been conducted, calculating the R-value and plotting the results. An R-value close to 1 reflects the quantity and quality of the data available for training the ANN and the strength of the relationship between the selected input and output variables during the training process73.
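A minimal sketch of the R-value computation is given below, assuming that R denotes the Pearson correlation coefficient between predicted and target values, as is usual for regression plots; the function name `r_value` and the sample data are hypothetical.

```python
import math

def r_value(predicted, target):
    """Pearson correlation coefficient between predictions and targets."""
    n = len(predicted)
    mp = sum(predicted) / n
    mt = sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(predicted, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    return cov / (sp * st)

# A prediction that is an exact linear function of the target gives R close to 1
print(r_value([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0]))
```

Note that R measures linear association only: a prediction that is consistently scaled or offset still yields R near 1, which is why R is reported alongside the RMSE rather than instead of it.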
The mean percentage error quantifies the deviation between the predicted and actual values of the model, expressed as a percentage of the actual value. It is used to assess the precision of the model; a low mean percentage error indicates superior model performance74.
At night, when the PV power generation is zero and any of the implemented models predicts a nonzero value, the relative percentage error is set to its maximum, i.e., 100% or −100%, as appropriate, as shown by the maximum and minimum values in Fig. 5.
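The night-time rule can be made explicit with a short sketch. The function name is hypothetical, the sign convention (prediction minus target) follows the interpretation of negative and positive errors given for Fig. 5, and returning 0% when both values are zero is an assumption.

```python
def relative_percentage_error(predicted, target):
    """Signed percentage error relative to the target value.

    When the target is zero (night), a nonzero prediction is assigned the
    maximum error of +100% or -100%; a zero prediction is assumed to be 0%.
    """
    if target == 0:
        if predicted == 0:
            return 0.0
        return 100.0 if predicted > 0 else -100.0
    return 100.0 * (predicted - target) / target

print(relative_percentage_error(90.0, 100.0))   # prediction below measurement
print(relative_percentage_error(5.0, 0.0))      # night-time overprediction
```

Without this rule the error would be undefined at night because of the division by a zero target.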
The computation time is measured alongside the other metrics to determine whether the GA suits the proposed purpose. The lower the computation time for high-accuracy results, the better.
The code associated with the approach followed can be accessed via the Harvard Dataverse repository (link in the data availability statement).
This section shows the main findings of the research paper. The section is divided into four subsections: variable correlation analysis, weather conditions, comparison between forecasted versus target values, and model performance.
The variables to be analyzed in this study are solar hour, temperature, solar irradiance, dew point, humidity, wind speed, pressure, precipitation rate, precipitation accumulated, and PV power. The results of this analysis will help to determine the strength of the relationship between the different variables.
Before the correlation analysis, an Anderson–Darling normality test was performed to check the normality of the measured data. For this purpose, the p-value has been calculated for each group of measurements, obtaining results on the order of 0.0005. According to the test criteria, p-values below 0.05 indicate that the data do not follow a normal distribution. Hence, Spearman’s correlation method, which does not assume normality, has been used.
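Spearman's method replaces each observation by its rank and then correlates the ranks, which is why it does not require normally distributed data. The sketch below is a generic illustration with hypothetical function names and made-up sample values; it handles ties by average ranking, which may differ from the exact formula used in the study.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A monotonic but nonlinear relationship still gives a coefficient close to 1
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

A coefficient near +1 or −1 indicates a strong monotonic (direct or inverse) relationship, regardless of whether that relationship is linear.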
The results of the correlation analysis are shown in Table 7. The degree of strength of the relationships of the meteorological variables with the PV power measurement reflects the parameters that condition the PV power generation. Consequently, they serve to define the meteorological variables that enter the FF-ANN.
From the results of the Spearman correlation coefficient matrix, it can be deduced that the most influential variables in the generation of solar PV energy are, from highest to lowest contribution, solar irradiance, temperature, wind speed, humidity, dew point, solar hour, precipitation rate, precipitation accumulated and pressure, some of them having an inverse relationship. However, only two variables significantly impact PV power output: solar irradiance and temperature.
The prediction performance of twelve days, one from each month of an entire year, has been compared to evaluate the methodology’s effectiveness. Table 8 shows the weather summary for each simulated day.
Considering the variables that most affect PV power generation, it is concluded that the days between October and April are cloudy, while the rest are sunny days.
Figure 4 compares predicted versus target PV generation for the base ANN prediction, the three types of GA-FFANN training data: annual, seasonal, and monthly, versus the MLR and NAR models. This comparison is performed twelve times, once each month, to visualize different types of PV generation curves with varying weather conditions.
Comparison forecast versus Target PV power. (a) 01/09/2023, (b) 02/10/2023, (c) 03/06/2023, (d) 04/10/2023, (e) 05/09/2022, (f) 06/06/2022, (g) 07/11/2022, (h) 08/15/2022, (i) 09/05/2022, (j) 10/10/2022, (k) 11/14/2022, and (l) 12/10/2022.
In the following graphs, the best-performing methodologies are highlighted in bright colors, clearly and easily identifying the most effective approaches. Conversely, the methods that have shown inferior performance are represented with lighter colors and thicker lines, thus facilitating their visual differentiation and avoiding confusion between the curves. In addition, the scale used in the graph corresponds to the maximum scale of all simulated scenarios (2.5 kW), ensuring that the evaluations of the predictions are performed under the same conditions. This scale unification is necessary to provide a balanced and accurate comparison of each methodology’s performance in the scenarios analyzed.
Different days with different weather conditions have been chosen, one day each month, as shown in Fig. 4, to evaluate the adaptability of the ANN and its proper optimization through GA in different scenarios and with varying forecasting models. In this way, different PV power generation profiles can be studied.
Examining Fig. 4 individually, it is observed that the prediction models that best fit the actual data are the GA-FFANN annual train, GA-FFANN seasonal train, and NAR. These models show outstanding ability to predict PV power generation. However, evaluating all the results is essential, considering individual predictions and aggregated results.
Figure 5 shows the relative mean percentage error between the prediction and measurement curves for the twelve days simulated and for each training dataset of the base ANN forecast, GA-FFANN: annual, seasonal, and monthly, and for commonly used MLR and NAR models. In such figures, the best-performing methodologies have been highlighted in bright colors, whereas the poorest-performing methodologies have been depicted in less bright colors. It is important to note that negative error values indicate that the prediction is lower than the actual measurement. In contrast, positive values reflect that the prediction is higher than the observed measurement. The desired goal is for the error values to be as close to zero as possible, as this indicates a prediction that is very close to reality. This approximation is necessary to assess the accuracy and reliability of the analyzed methodologies.
Target versus Output relative percentage error. (a) 01/09/2023, (b) 02/10/2023, (c) 03/06/2023, (d) 04/10/2023, (e) 05/09/2022, (f) 06/06/2022, (g) 07/11/2022, (h) 08/15/2022, (i) 09/05/2022, (j) 10/10/2022, (k) 11/14/2022, and (l) 12/10/2022.
The results in Fig. 5 show that the base ANN and NAR models have more difficulty than the GA-FFANN annual train model in predicting days with abrupt fluctuations and low power generation. The percentage errors for the base ANN, NAR, GA-FFANN monthly train, and MLR models can exceed 50%, indicating lower accuracy in these scenarios. In contrast, the models trained with annual and seasonal data anticipate these challenging scenarios with higher accuracy.
Regarding estimating the most favorable training data type for PV power generation prediction in the proposed model versus the base ANN, MLR, and NAR models, Table 9 shows its performance by assessing the prediction RMSEs.
Furthermore, when analyzing the average RMSE of all the scenarios evaluated in Table 9, it is observed that the GA-FFANN model with annual data presents the lowest value (24.17 W), followed by the MLR model (53.50 W), the GA-FFANN model with seasonal data (58.58 W), the NAR model (68.83 W), and the GA-FFANN model with monthly data (71.50 W). The least favorable result corresponds to the base ANN (218.92 W). These results indicate that the GA-FFANN model with annual data is the most accurate overall.
The R values for the different models and scenarios are shown in Fig. 6 and detailed in Table 10, where the coefficient of determination (R) values are presented. The GA-FFANN model shows the best match between the real and simulated values. Ordered from lowest to highest R, the models are: base ANN, NAR, GA-FFANN monthly data, GA-FFANN seasonal data, MLR, and GA-FFANN annual data. This indicates that the GA-FFANN model, especially when trained on annual data, has the greatest ability to predict PV power generation accurately.
Comparison of scatter plots of each model for each forecasted day. (a) 01/09/2023, (b) 02/10/2023, (c) 03/06/2023, (d) 04/10/2023, (e) 05/09/2022, (f) 06/06/2022, (g) 07/11/2022, (h) 08/15/2022, (i) 09/05/2022, (j) 10/10/2022, (k) 11/14/2022, and (l) 12/10/2022.
The wide range of scenarios evaluated, covering various environmental conditions and influencing factors, enables a thorough evaluation of the predictive performance of each model in different contexts.
Moreover, to facilitate visual interpretation of Fig. 6, those better-performing methodologies have been highlighted in bright colors, whereas less prominent methodologies have been represented in less striking colors. It must be noted that the closer the points on the scatter plot are to a straight line with a slope equal to 1, the higher the prediction accuracy. This is the main objective, as an alignment close to this line indicates that the predictions are consistent with the real measurements, thus reflecting the accuracy and reliability of the methodology employed.
Finally, the training time of the base ANN is 4 min and 47 s, while its forecasting time is 1 s on average for all cases. The computing times of the GA training optimizations have been obtained for each training data set (annual, seasonal, and monthly) (Table 10), together with the forecasting times of the optimized ANN for each simulated day (Table 11).
This section compares various state-of-the-art PV energy prediction methodologies, evaluated in terms of the RMSE and R-ratio, between measured and predicted values. Table 12 summarizes the results of these benchmarks, highlighting the performance of different approaches in various test cases with different PV system capacities.
Specific cases indicate that RNN-LSTM and QT-MARF consistently achieve low RMSE and high R coefficients, outperforming other methods such as IAMFN, CNN-LSTM, and CNN-GRU. It is also observed that, although methods such as ELM and SVR offer competitive results, neural networks are the most effective for accurately predicting PV power generation.
This benchmark provides a comprehensive overview of the effectiveness of various PV prediction methodologies and highlights the importance of selecting the right approach based on system capacity and accuracy requirements. The findings indicate that advanced techniques based on recurrent neural networks offer significant advantages for accurate prediction of PV power generation.
This research focuses on developing a methodology to optimize PV power generation prediction by integrating ANN and GA. PV power generation forecasting is performed through an FF-ANN. On the one hand, the best training dataset for PV generation curve prediction from annual, seasonal, and monthly data is evaluated. On the other hand, the correct parameterization of ANNs is essential for achieving good learning capability and producing accurate results. However, this task is complex due to several factors, such as the large number of parameters to be adjusted, the interdependence of the parameters, and overfitting. Therefore, this work proposes using GA to optimally configure a predictive ANN tuning the number of neurons in the hidden layer, transferFcn, LW, IW, and biases.
Regarding the correlation analysis, it is deduced that solar irradiance is the most crucial factor affecting the output power of PV plants (cc = 0.99), followed by the ambient temperature (cc = 0.47); the remaining variables have not been considered to have a considerable impact on the PV power output. For this reason, a 2-input ANN has been modeled. Moreover, the results of optimizing the ANNs with GAs for each training methodology show a low optimal number of neurons, consistently equal to or less than 8. The optimization also finds that the most suitable transfer function in most cases is ‘elliotsig’, the Elliot symmetric sigmoid transfer function.
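For reference, the Elliot symmetric sigmoid maps any input to the interval (−1, 1) using only a division and an absolute value, which avoids evaluating exponentials. The sketch below reimplements the formula used by MATLAB's `elliotsig`, n / (1 + |n|), and compares it with the hyperbolic tangent; the sample inputs are arbitrary.

```python
import math

def elliotsig(n):
    """Elliot symmetric sigmoid: n / (1 + |n|), bounded in (-1, 1)."""
    return n / (1.0 + abs(n))

# Compare with tanh, which has the same range but saturates faster
for n in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(n, elliotsig(n), math.tanh(n))
```

Compared with tanh, the Elliot sigmoid approaches its limits more slowly, so large inputs remain distinguishable over a wider range.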
Concerning the forecasted days, since one day has been chosen for each month of the year, scenarios that present a wide variety of weather conditions are explored, for example, days with high solar irradiance (06/06/2022), cloudy days (02/10/2023), or rainy days (10/10/2022).
This work compares the measured PV energy generation with that predicted by the base ANN and by the GA-FFANN optimized in training, for one day in each month of the year, as well as by the MLR and NAR models. Observations show that in all months the predicted curve, like the measured one, rises steadily at the beginning and, after reaching a peak, gradually declines. In all curves, energy is generated between 7:00 h and 20:00 h. Regarding the GA-FFANN, for January, February, March, May, and June, the predicted curves are very close to the actual generation; however, for the rest of the months, the training with monthly data differs considerably from the simulated scenarios, resulting in worse optimization and prediction performance. The methodology for GA-FFANN optimization and energy prediction performs well when trained with annual and seasonal data. Furthermore, concerning the literature methods, it is observed that the MLR always predicts lower values than the rest of the simulated models.
The comparison between measurement and prediction can be extended through mean percentage errors at each instant. Optimization through training with annual data proved to have a lower relative daily mean percentage error than the other options. In contrast, using monthly data has resulted in worse performance in any of the predictions performed, as it can barely achieve 0% relative errors, as opposed to the annual and seasonal data optimizations. Concerning the NAR method, in all scenarios, there are very significant mean percentage errors in hours with no solar radiation; this is caused because the PV generation prediction should be zero while the model predicts low values. Concerning the MLR, this also occurs, although to a lesser extent, in addition to observing that in both methods of the prediction literature, the mean relative percentage errors are generally higher than the optimization performed with the proposed methodology. The base ANN model has one of the worst performances in the evaluation metrics presented. This model shows the importance of data filtering since it is observed that during nighttime hours, the model continues to erroneously predict PV generation, which distorts the performance of a PV forecasting ANN.
Additionally, the prediction capacity of the ANN optimized by GA is close to actual measurements, with minimum RMSEs of 13.4 W for the prediction with monthly data for March, 31.8 W for the forecast with seasonal data for February, and 15.6 W for the prediction with annual data for August. To evaluate which of the six methodologies has performed best, the average RMSEs obtained are 24 W, 59 W, 72 W, 53 W, 69 W, and 219 W for the annual, seasonal, and monthly GA-FFANN methodologies, MLR, NAR, and base ANN, respectively, the first being the most favorable. This may be due to the data provided during training, which allows the ANN greater adaptability than in the other two training cases, and to the remarkable adaptability of the GA-FFANN in this type of application.
On the other hand, the regression analysis of prediction versus target has shown varying results. The aim is to obtain R-values as close to 1 as possible. The lowest R-values correspond to October and March, when the days are cloudy and rainy and the weather conditions are more unstable and fluctuating. Therefore, optimization and prediction are more complex for cloudy days than for sunny days with all simulated models, although the proposed methodology still obtains favorable results.
Regarding the overall performance of each model, the GA-FFANN has been trained with annual, seasonal, and monthly data; although these models require a considerable amount of computational resources due to the GA optimization and face challenges in adapting to abrupt and unanticipated changes in weather conditions, they have demonstrated greater adaptability and accuracy than the other models. In contrast, the MLR model shows limited predictive capability, especially in capturing complex nonlinear relationships, and exhibits errors at low radiation by predicting low rather than zero values during hours without solar radiation. The NAR model also suffers from significant errors at low radiation and has difficulty adapting to changing conditions. Lastly, the ANN without optimization shows suboptimal performance and mispredicts power generation during nighttime hours. Despite the aforementioned limitations, the GA-FFANN model has proven superior in most evaluations, standing out for its more accurate predictions and improved reliability against input data variations.
Comparing the benchmark results with the GA-FFANN model, it is observed that the proposed model shows superior performance in terms of RMSE and R coefficient. For example, for 01/09/2023, the GA-FFANN achieves an RMSE of 20 W and an R of 0.99851, while the best benchmark method, QT-MARF in case 1 (1600 W), has an RMSE of 43 W and an R of 0.99599. The proposed model thus reduces the error by more than half (from 43 W to 20 W) and improves R. Moreover, for 03/06/2023, the GA-FFANN presents an RMSE of 33 W and an R of 0.99945, compared to the RNN-LSTM in case 3 (2000 W), which has an RMSE of 30 W and an R of 0.99750. Although the RNN-LSTM shows a slightly lower RMSE, the R coefficient of the proposed model is higher. Furthermore, on days such as 08/15/2022, the proposed model achieves an RMSE of 16 W and an R of 0.99976, compared to the best benchmark performance in case 4 (1500 W) with RNN-LSTM, which has an RMSE of 20 W and an R of 0.99715. This again shows that the proposed model has both a lower prediction error and a better R. Summing up, in the remaining cases, the comparison indicates that the proposed GA-FFANN model not only remains competitive against the best methods reported in the literature but, especially with annual data, mostly outperforms the RNN-LSTM and QT-MARF based methods in accuracy. For example, the proposed model obtains RMSEs from 16 to 38 W and R values from 0.99850 to 0.99976, while the best benchmark methods have RMSEs from 20 to 90 W and R values from 0.93140 to 0.99800. This performance highlights the GA-FFANN’s effectiveness for accurate PV power generation prediction, offering a reliable solution for renewable energy system applications.
Finally, the GA computation time varies according to several factors: population size (number of neurons, transfer functions, weights, and biases), the complexity of the transfer function, and computational capacity of the computer (in this case, an Intel® Core™ i5 processor has been used). The training computation time varies between 1 and 5 h. The forecasting computation time is much lower since it only takes a few seconds. These computational times are scalable; if a computer with a higher computational capacity were available, they would be decreased.
This study introduces a novel approach employing a GA-based ANN to enhance the accuracy of PV power plant forecasting. The algorithm dynamically modifies the ANN’s architecture during training to minimize the RMSE. Various parameters, including the number of neurons, transfer functions, weights, and biases, are incorporated into the optimization function. Twelve representative days were selected for analysis to assess the ANN’s efficacy, utilizing annual, seasonal, and monthly input training data. The proposed methodology’s performance has been evaluated using a data acquisition system implemented in an actual PV generation facility, considering different weather conditions (sunny, cloudy, and rainy). Moreover, a comparison of three training methodologies (annual, seasonal, and monthly) has been carried out, showing that the PV prediction performance of ANNs can be improved by using GA to optimize their parametrization.
Additionally, a correlation analysis was necessary to identify the meteorological variables with the most decisive influence on PV power generation. As a result, the most significant factors were found to be solar irradiance (cc = 0.99) and temperature (cc = 0.47). Furthermore, the forecasting RMSE calculation has shown that the best-performing training methodology uses annual data sets, most likely because the large amount of input data provided during training enables a better adaptation of the ANN to meteorological changes; this also indicates that the model captures the complex relationships present in annual temporal data better than those in seasonal or monthly data. Also, the prediction versus target regression analysis shows that optimization and prediction are more complex in cloudy scenarios (between October and March), with a minimum R of 0.958241 on 12/10/2022 and a maximum R of 0.99931 on 05/09/2022.
Upon further comparison of results, the GA-FFANN model reveals a clear superiority in the PV power generation prediction context over other well-known models used in the forecasting literature, such as ANN, the MLR, and NAR models. The RMSE analysis showed that the GA-FFANN annual training obtained significantly lower errors in the prediction of solar power generation. The R values also reflected these differences, where GA-FFANN achieved R coefficients closer to 1 than the base ANN, MLR, and NAR models. These findings support the superiority of GA-FFANN in accurately and efficiently predicting solar PV power generation compared to the traditional ANN, MLR, and NAR models.
Finally, the outstanding results demonstrate the favorable performance of GAs in optimizing ANNs to predict PV power generation. The ability of GAs to dynamically adapt the neural network architecture during training, thus minimizing the RMSE, highlights their effectiveness in this context. These promising results suggest that the proposed integration can be a powerful and effective tool for improving the prediction accuracy of PV power generation, which has significant implications for the efficiency and management of PV plants under varying conditions. Future works must explore a more detailed study of the system’s response to other potential weather disturbances and further validation.
The data that support this study have been deposited and can be accessed through the following link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:https://doi.org/10.7910/DVN/IIV7PI. The model approach followed in this study has been deposited and can be accessed through the following link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:https://doi.org/10.7910/DVN/RRGMCZ
Anderson test
Anderson–Darling normality test
ANN: Artificial neural network
CNN-GRU: Convolutional neural networks-gated recurrent unit
CNN-LSTM: Convolutional neural networks-long short-term memory
ELM: Extreme learning machine
FF-ANN: Feed forward-artificial neural networks
GA: Genetic algorithm
GWO: Grey wolf optimization
IAMFN: Attention-based memory fully-connected network
IW: Input weights
LW: Layer weights
MLR: Multiple linear regression
MMG: Multimicrogrid
NAR: Nonlinear autoregressive neural network
PSO: Particle swarm optimization
PV: Photovoltaic
QT-MARF: Quantile-transformed multi-attention residual framework
R: Coefficient of determination
RMSE: Root mean square error
RNN: Recurrent neural network
RNN-LSTM: Recurrent neural network-long short-term memory
SVR: Support vector regression
Number of samples
Cumulative distribution function
Spearman’s correlation
Difference in ranks of the ith element
Normalized sample
Sampling data
Lowest value observed within the data sequences
The highest value observed within the data sequences
Network nomenclature
Bias of network
Connection (weight) strength between nodes of a network
Network activation function
Predictor variables of multiple linear regression
Coefficients of multiple linear regression
Input delay of the data series in nonlinear autoregressive neural network
Transfer function of nonlinear autoregressive neural network
Time of the time series in autoregressive neural network
Output obtained from the artificial neural network
Target value obtained from the experimental data
Mean percentage error
Energy and the Green Deal. Accessed 1 Jun 2023; https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/european-green-deal/energy-and-green-deal_en.
Díaz-Bello, D., Vargas-Salgado, C., Águila-León, J. & Lara-Vargas, F. Methodology to estimate the impact of the DC to AC power ratio, azimuth, and slope on clipping losses of solar photovoltaic inverters: Application to a PV system located in valencia Spain. Sustainability (Switzerland) 15, 2797. https://doi.org/10.3390/su15032797 (2023).
Vargas-Salgado, C., Berna-Escriche, C., Escrivá-Castells, A. & Díaz-Bello, D. Optimization of all-renewable generation mix according to different demand response scenarios to cover all the electricity demand forecast by 2040: The case of the grand canary Island. Sustainability (Switzerland) 14, 1738. https://doi.org/10.3390/su14031738 (2022).
Elkadeem, M. R., Wang, S., Sharshir, S. W. & Atia, E. G. Feasibility analysis and techno-economic design of grid-isolated hybrid renewable energy system for electrification of agriculture and irrigation area: A case study in Dongola, Sudan. Energy Convers. Manag. 196, 1453–1478. https://doi.org/10.1016/j.enconman.2019.06.085 (2019).
Watson, S. et al. Advantages of operation flexibility and load sizing for PV-powered system design. Sol. Energy 162, 132–139. https://doi.org/10.1016/j.solener.2018.01.022 (2018).
Article ADS MATH Google Scholar
Gómez-Navarro, T., Brazzini, T., Alfonso-Solar, D. & Vargas-Salgado, C. Analysis of the potential for PV rooftop prosumer production: Technical, economic and environmental assessment for the city of Valencia (Spain). Renew. Energy 174, 372–381. https://doi.org/10.1016/j.renene.2021.04.049 (2021).
Article Google Scholar
Bhattacharya, S., Sadhu, P. K. & Sarkar, D. Performance evaluation of building integrated photovoltaic system arrays (SP, TT, QT, and TCT) to improve maximum power with low mismatch loss under partial shading. Microsyst. Technol. https://doi.org/10.1007/s00542-023-05564-0 (2023).
Article PubMed PubMed Central MATH Google Scholar
Sarkar, D. & Sadhu. P. K. Power Enhancement by hybrid BIPV arrays with fewer peaks and reduced mismatch losses under partial shading. In 2023 3rd International Conference on Intelligent Technologies, CONIT 2023. (Institute of Electrical and Electronics Engineers Inc., 2023).
Sarkar, D. & Sadhu. P. K. GMPP Improvement with fewer power peaks and lower mismatch losses using a new hybrid BIPV array configuration. In 5th International Conference on Energy, Power, and Environment: Towards Flexible Green Energy Technologies, ICEPE 2023. (Institute of Electrical and Electronics Engineers Inc., 2023).
Aguila-Leon, J., Vargas-Salgado, C., Chiñas-Palacios, C. & Díaz-Bello, D. Solar photovoltaic maximum power point tracking controller optimization using Grey Wolf Optimizer: A performance comparison between bio-inspired and traditional algorithms. Expert. Syst. Appl. 211, 118700. https://doi.org/10.1016/j.eswa.2022.118700 (2023).
Article Google Scholar
Aguila-Leon, J., Vargas-Salgado, C., Chiñas-Palacios, C. & Díaz-Bello, D. Energy management model for a standalone hybrid microgrid through a particle Swarm optimization and artificial neural networks approach. Energy Convers. Manag. 267, 115920. https://doi.org/10.1016/j.enconman.2022.115920 (2022).
Article Google Scholar
Alogdianakis, F., Dimitriou, L. & Charmpis, D. C. Data-driven recognition and modelling of deterioration patterns in the US National Bridge Inventory: A genetic algorithm-artificial neural network framework. Adv. Eng. Softw. 171, 103148. https://doi.org/10.1016/j.advengsoft.2022.103148 (2022).
Article MATH Google Scholar
Roldán-Blay, C. et al. Upgrade of an artificial neural network prediction method for electrical consumption forecasting using an hourly temperature curve model. Energy Build. 60, 38–46. https://doi.org/10.1016/j.enbuild.2012.12.009 (2013).
Article MATH Google Scholar
Zeynali, S., Rostami, N., Ahmadian, A. & Elkamel, A. Two-stage stochastic home energy management strategy considering electric vehicle and battery energy storage system: An ANN-based scenario generation methodology. Sustain. Energy Technol. Assess. 39, 100722. https://doi.org/10.1016/j.seta.2020.100722 (2020).
Article MATH Google Scholar
Saryazdi, S., Mohammad, E., Etemad, A., Shafaat, A. & Bahman, A. M. Data-driven performance analysis of a residential building applying artificial neural network (ANN) and multi-objective genetic algorithm (GA). Build. Environ. 225, 109633. https://doi.org/10.1016/j.buildenv.2022.109633 (2022).
Article Google Scholar
Kemmoku, Y., Orita, S., Nakagawa, S. & Sakakibara. T. Daily insolation forecasting using a multi-stage neural network (1999)
Sfetsos, A. & Coonick, A. H. Univariate and multivariate forecasting of hourly solar radiation with artificial intelligence techniques. Sol. Energy 68(2), 169–178. https://doi.org/10.1016/S0038-092X(99)00064-X (2000).
Article ADS MATH Google Scholar
Braik, M., Al-Zoubi, H. & Al-Hiary, H. Artificial neural networks training via bio-inspired optimisation algorithms: Modelling industrial winding process, case study. Soft comput. 25, 4545–4569. https://doi.org/10.1007/s00500-020-05464-9 (2021).
Article MATH Google Scholar
Chandrasekaran, K., Selvaraj, J., Xavier, F. J. & Kandasamy, P. Artificial neural network integrated with bio-inspired approach for optimal VAr management and voltage profile enhancement in grid system. Energy Sour. Part A Recov. Util. Environ. Effects 43, 2838–2859. https://doi.org/10.1080/15567036.2021.1919790 (2021).
Article Google Scholar
Chiñas-Palacios, C., Vargas-Salgado, C., Aguila-Leon, J. & Hurtado-Pérez, E. A cascade hybrid PSO feed-forward neural network model of a biomass gasification plant for covering the energy demand in an AC microgrid. Energy Convers. Manag. 232, 113896. https://doi.org/10.1016/j.enconman.2021.113896 (2021).
Article Google Scholar
Safari, A., Kharrati, H. & Rahimi, A. Multi-term electrical load forecasting of smart cities using a new hybrid highly accurate neural network-based predictive model. Smart Grids Sustain. Energy 9, 8. https://doi.org/10.1007/s40866-023-00188-9 (2024).
Article MATH Google Scholar
Aguila-Leon, J. et al. Particle swarm optimization, genetic Algorithm and grey Wolf optimizer algorithms performance comparative for a DC-DC boost converter PID controller. Adv. Sci. Technol. Eng. Syst. 6, 619–625. https://doi.org/10.25046/aj060167 (2021).
Article Google Scholar
Aguila-Leon, J., Chiñas-Palacios, C., Garcia, E. X. M. & Vargas-Salgado, C. A multimicrogrid energy management model implementing an evolutionary game-theoretic approach. Int. Trans. Electr. Energy Syst. 30, e12617. https://doi.org/10.1002/2050-7038.12617 (2020).
Article Google Scholar
Meng, L. et al. Multi-objective optimization of plate heat exchanger for commercial electric vehicle based on genetic algorithm. Case Stud. Therm. Eng. 41, 102629. https://doi.org/10.1016/j.csite.2022.102629 (2023).
Article MATH Google Scholar
Zhang, F. et al. Performance improvement of a pump as turbine in storage mode by optimization design based on genetic algorithm and fuzzy logic. J. Energy Storage 62, 106875. https://doi.org/10.1016/j.est.2023.106875 (2023).
Article MATH Google Scholar
Ahn, G., Jin, M. K., Hwang, S. B. & Hur, S. Shapelet selection based on a genetic algorithm for remaining useful life prediction with supervised learning. Heliyon https://doi.org/10.1016/j.heliyon.2022.e12111 (2022).
Article PubMed PubMed Central MATH Google Scholar
Chen, Q. & Hu, X. Design of intelligent control system for agricultural greenhouses based on adaptive improved genetic algorithm for multi-energy supply system. Energy Rep. 8, 12126–12138 (2022).
Article MathSciNet MATH Google Scholar
Dolara, A., Leva, S. & Manzolini, G. Comparison of different physical models for PV power output prediction. Sol. Energy 119, 83–99. https://doi.org/10.1016/j.solener.2015.06.017 (2015).
Article ADS MATH Google Scholar
Massidda, L., Bettio, F. & Marrocu, M. Probabilistic day-ahead prediction of PV generation. A comparative analysis of forecasting methodologies and of the factors influencing accuracy. Sol. Energy 271, 112422. https://doi.org/10.1016/j.solener.2024.112422 (2024).
Article MATH Google Scholar
Sahin, G., Isik, G. & van Sark, W. G. J. H. M. Predictive modeling of PV solar power plant efficiency considering weather conditions: A comparative analysis of artificial neural networks and multiple linear regression. Energy Rep. 10, 2837–2849. https://doi.org/10.1016/j.egyr.2023.09.097 (2023).
Article MATH Google Scholar
Scott, C., Ahsan, M. & Albarbar, A. Machine learning for forecasting a photovoltaic (PV) generation system. Energy 278, 127807. https://doi.org/10.1016/j.energy.2023.127807 (2023).
Article MATH Google Scholar
Wu, Y. K., Huang, C. L., Phan, Q. T. & Li, Y. Y. Completed review of various solar power forecasting techniques considering different viewpoints. Energies (Basel) 15, 3320 (2022).
Article MATH Google Scholar
Moreira, M. O. et al. Design of experiments using artificial neural network ensemble for photovoltaic generation forecasting. Renew. Sustain. Energy Rev. 135, 110450. https://doi.org/10.1016/j.rser.2020.110450 (2021).
Article MATH Google Scholar
Aslam Khan, M. N. et al. Prediction of thermal diffusivity of volcanic rocks using machine learning and genetic algorithm hybrid strategy. Int. J. Therm. Sci. 192, 108403. https://doi.org/10.1016/j.ijthermalsci.2023.108403 (2023).
Article MATH Google Scholar
Xiong, J., Liang, W., Liang, X. & Yao, J. Intelligent quantification of natural gas pipeline defects using improved sparrow search algorithm and deep extreme learning machine. Chem. Eng. Res. Des. 183, 567–579. https://doi.org/10.1016/j.cherd.2022.06.001 (2022).
Article CAS MATH Google Scholar
Ji, B. et al. Research on optimal intelligent routing algorithm for IoV with machine learning and smart contract. Digit. Commun. Netw. 9, 47–55. https://doi.org/10.1016/j.dcan.2022.06.012 (2023).
Article MATH Google Scholar
Wu, L. et al. Daily reference evapotranspiration prediction based on hybridized extreme learning machine model with bio-inspired optimization algorithms: Application in contrasting climates of China. J. Hydrol. (Amst) 577, 123960. https://doi.org/10.1016/j.jhydrol.2019.123960 (2019).
Article Google Scholar
Nadirgil, O. Carbon price prediction using multiple hybrid machine learning models optimized by genetic algorithm. J. Environ. Manage 342, 118061. https://doi.org/10.1016/j.jenvman.2023.118061 (2023).
Article PubMed Google Scholar
Zheng, S., Xiao, Y. & Liu, J. Automatic prediction modeling for Time-Series degradation data via Genetic algorithm with applications in nuclear energy. Ann. Nucl. Energy 186, 109781. https://doi.org/10.1016/j.anucene.2023.109781 (2023).
Article CAS MATH Google Scholar
Cheng, T., Zhu, X., Yang, F. & Wang, W. Machine learning enabled learning based optimization algorithm in digital twin simulator for management of smart islanded solar-based microgrids. Solar Energy 250, 241–247. https://doi.org/10.1016/j.solener.2022.12.040 (2023).
Article ADS Google Scholar
Muruganandam, S. et al. A deep learning based feed forward artificial neural network to predict the K-barriers for intrusion detection using a wireless sensor network. Meas. Sens. 25, 100613. https://doi.org/10.1016/j.measen.2022.100613 (2023).
Article MATH Google Scholar
Tian, S. et al. Using perceptron feed-forward artificial neural network (ANN) for predicting the thermal conductivity of graphene oxide-Al2O3/water-ethylene glycol hybrid nanofluid. Case Stud. Therm. Eng. 26, 101055. https://doi.org/10.1016/j.csite.2021.101055 (2021).
Article MATH Google Scholar
Hajduk, Z. Reconfigurable FPGA implementation of neural networks. Neurocomputing 308, 227–234. https://doi.org/10.1016/j.neucom.2018.04.077 (2018).
Article MATH Google Scholar
Davis, D. & Brear, M. J. Impact of short-term wind forecast accuracy on the performance of decarbonising wholesale electricity markets. Energy Econ. 130, 107304. https://doi.org/10.1016/j.eneco.2024.107304 (2024).
Article Google Scholar
Matrenin, P. et al. Adaptive ensemble models for medium-term forecasting of water inflow when planning electricity generation under climate change. Energy Rep. 8, 439–447. https://doi.org/10.1016/j.egyr.2021.11.112 (2022).
Article MATH Google Scholar
Kim, J. H., Lee, B. S. & Kim, C. H. A Study on the development of long-term hybrid electrical load forecasting model based on MLP and statistics using massive actual data considering field applications. Electric Power Syst. Res. 221, 109415. https://doi.org/10.1016/j.epsr.2023.109415 (2023).
Article MATH Google Scholar
Uyeh, D. D. et al. Grid search for lowest root mean squared error in predicting optimal sensor location in protected cultivation systems. Front. Plant Sci. 13, 920284. https://doi.org/10.3389/fpls.2022.920284 (2022).
Article PubMed PubMed Central MATH Google Scholar
Wang, J. et al. Photovoltaic cell parameter estimation based on improved equilibrium optimizer algorithm. Energy Convers. Manag. 236, 114051. https://doi.org/10.1016/j.enconman.2021.114051 (2021).
Article ADS MATH Google Scholar
Wu, X. et al. Intelligent optimization framework of near zero energy consumption building performance based on a hybrid machine learning algorithm. Renew. Sustain. Energy Rev. 167, 112703. https://doi.org/10.1016/j.rser.2022.112703 (2022).
Article MATH Google Scholar
AlShafeey, M. & Csáki, C. Evaluating neural network and linear regression photovoltaic power forecasting models based on different input methods. Energy Rep. 7, 7601–7614. https://doi.org/10.1016/j.egyr.2021.10.125 (2021).
Article MATH Google Scholar
Babatunde, A. A. & Abbasoglu, S. Predictive analysis of photovoltaic plants specific yield with the implementation of multiple linear regression tool. Environ. Prog. Sustain. Energy 38, 13098. https://doi.org/10.1002/ep.13098 (2019).
Article CAS MATH Google Scholar
De Giorgi, M. G., Congedo, P. M. & Malvoni, M. Photovoltaic power forecasting using statistical methods: Impact of weather data. IET Sci. Meas. Technol. 8, 90–97. https://doi.org/10.1049/iet-smt.2013.0135 (2014).
Article MATH Google Scholar
Khojasteh, D. N. et al. Long-term effects of outdoor air pollution on mortality and morbidity–prediction using nonlinear autoregressive and artificial neural networks models. Atmos. Pollut. Res. 12, 46–56. https://doi.org/10.1016/j.apr.2020.10.007 (2021).
Article CAS MATH Google Scholar
Nogay, H. S. Estimating the aggregated available capacity for vehicle to grid services using deep learning and nonlinear autoregressive neural network. Sustain. Energy Grids Netw. 29, 100590. https://doi.org/10.1016/j.segan.2021.100590 (2022).
Article Google Scholar
Sunayana, K. S. & Kumar, R. Forecasting of municipal solid waste generation using non-linear autoregressive (NAR) neural models. Waste Manag. 121, 206–214. https://doi.org/10.1016/j.wasman.2020.12.011 (2021).
Article CAS PubMed MATH Google Scholar
Sobri, S., Koohi-Kamali, S. & Rahim, N. A. Solar photovoltaic generation forecasting methods: A review. Energy Convers. Manag. 156, 459–497 (2018).
Article ADS Google Scholar
Jakoplić, A., Franković, D., Kirinčić, V. & Plavšić, T. Benefits of short-term photovoltaic power production forecasting to the power system. Optim. Eng. 22, 9–27. https://doi.org/10.1007/s11081-020-09583-y (2021).
Article Google Scholar
Behera, M. K. & Nayak, N. A comparative study on short-term PV power forecasting using decomposition based optimized extreme learning machine algorithm. Eng. Sci. Technol. Int. J. 23, 156–167. https://doi.org/10.1016/j.jestch.2019.03.006 (2020).
Article MATH Google Scholar
El estadístico de Anderson-Darling – Minitab. Accessed 13 Mar 2023; https://support.minitab.com/es-mx/minitab/21/help-and-how-to/statistics/basic-statistics/supporting-topics/normality/the-anderson-darling-statistic/.
Puig, P. & Stephens, M. A. Tests of fit for the laplace distribution, with applications. Technometrics 42, 417–424. https://doi.org/10.1080/00401706.2000.10485715 (2000).
Article MathSciNet MATH Google Scholar
Extenso Juliana
Kramer O Studies in Computational Intelligence 679 Genetic Algorithm Essentials
Abdel-Basset, M. et al. Parameters identification of photovoltaic models using Lambert W-function and Newton-Raphson method collaborated with AI-based optimization techniques: A comparative study. Expert Syst. Appl. 255, 124777. https://doi.org/10.1016/j.eswa.2024.124777 (2024).
Article MATH Google Scholar
Gao, X. et al. Special trans function based exact expressions for the double and triple diode models of solar cells: Superior fitness, accuracy and convergence. Energy Rep. 11, 5252–5270. https://doi.org/10.1016/j.egyr.2024.05.016 (2024).
Article MATH Google Scholar
Liu, Q. et al. Multi-strategy adaptive guidance differential evolution algorithm using fitness-distance balance and opposition-based learning for constrained global optimization of photovoltaic cells and modules. Appl. Energy 353, 122032. https://doi.org/10.1016/j.apenergy.2023.122032 (2024).
Article Google Scholar
Chen, X. et al. A two-stage method for model parameter identification based on the maximum power matching and improved flow direction algorithm. Energy Convers. Manag. 278, 116712. https://doi.org/10.1016/j.enconman.2023.116712 (2023).
Article MATH Google Scholar
Hassanat, A. et al. Choosing mutation and crossover ratios for genetic algorithms-a review with a new dynamic approach. Information (Switzerland) 10, 390. https://doi.org/10.3390/info10120390 (2019).
Article Google Scholar
Ketkar N, Moolayil J (2021) Deep learning with python: Learn Best Practices of Deep Learning Models with PyTorch. Apress Media LLC
Adedeji, B. P. & Kabir, G. A feedforward deep neural network for predicting the state-of-charge of lithium-ion battery in electric vehicles. Decis. Anal. J. https://doi.org/10.1016/j.dajour.2023.100255 (2023).
Article MATH Google Scholar
Genç, B. & Tunç, H. Optimal training and test sets design for machine learning. Turk. J. Electr. Eng. Comput. Sci. 27, 1534–1545. https://doi.org/10.3906/elk-1807-212 (2019).
Article MATH Google Scholar
Granados RM Modelos de regresión lineal múltiple
Ćalasan, M., Abdel Aleem, S. H. E. & Zobaa, A. F. On the root mean square error (RMSE) calculation for parameter estimation of photovoltaic models: A novel exact analytical solution based on Lambert W function. Energy Convers. Manag. https://doi.org/10.1016/j.enconman.2020.112716 (2020).
Article Google Scholar
Regresión lineal – MATLAB & Simulink – MathWorks España. Accessed 12 Mar 2024; https://es.mathworks.com/help/matlab/data_analysis/linear-regression.html
Baraldo M, Furlanut M (1995) ERROR: A USER’S NOTE
Mirza, A. F. et al. Quantile-transformed multi-attention residual framework (QT-MARF) for medium-term PV and wind power prediction. Renew. Energy https://doi.org/10.1016/j.renene.2023.119604 (2024).
Article MATH Google Scholar
Akhter, M. N. et al. An hour-ahead PV power forecasting method based on an RNN-LSTM model for three different PV plants. Energies (Basel) https://doi.org/10.3390/en15062243 (2022).
Article PubMed Central Google Scholar
Hossain, M. et al. Application of extreme learning machine for short term output power forecasting of three grid-connected PV systems. J. Clean Prod. 167, 395–405. https://doi.org/10.1016/j.jclepro.2017.08.081 (2017).
Article MATH Google Scholar
Download references
This research was funded by the project “Modelado, experimentación y desarrollo de sistemas de gestión óptima para microrredes híbridas renovables” (CIGE/2021/172; 01/01/22–31/12/23), a competitive research project (“Investigación competitiva proyectos”) of the Conselleria de Educación, Universidades y Empleo, Generalitat Valenciana. Additionally, one of the authors (D.D.-B.) was supported by the Ministry of Universities of Spain under grant FPU21/00677.
Instituto de Ingeniería Energética, Universitat Politècnica de València, Valencia, Spain
Dácil Díaz-Bello, Carlos Vargas-Salgado, Manuel Alcazar-Ortega & David Alfonso-Solar
Departamento de Ingeniería Eléctrica, Universitat Politècnica de València, Valencia, Spain
Carlos Vargas-Salgado & Manuel Alcazar-Ortega
D.D.-B.: Conceptualization, Methodology, Data curation, Writing – original draft, Visualization, Investigation, Validation. C.V.-S.: Conceptualization, Methodology, Visualization, Investigation, Supervision, Validation, Writing – review & editing. M.A.-O.: Conceptualization, Methodology, Data curation, Writing – original draft. D.A.-S.: Data curation, Writing – original draft, Supervision, Writing – review & editing. All authors reviewed the manuscript.
Correspondence to Carlos Vargas-Salgado.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The parameterization resulting from the GA-based ANN optimization for PV power generation prediction is shown in Tables 13 and 14. The optimal setting is specified for each training methodology (annual, seasonal, and monthly) by giving the number of neurons in the hidden layer, the transfer functions used, and the values of the input weights (IW), the layer weights (LW), and the biases.
The parameterization resulting from the base ANN for PV power generation prediction is shown in Tables 15 and 16.
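To make these quantities concrete, the following minimal Python sketch shows how a single-hidden-layer network would turn an input vector into a prediction once the number of hidden neurons, the transfer functions, IW, LW, and the biases are fixed. It assumes MATLAB-style transfer function names (tansig, purelin) and omits any input or output normalization. The two-input layout and all numerical values are hypothetical placeholders, not entries from Tables 13 to 16.

```python
import numpy as np

def tansig(x):
    # Hyperbolic tangent sigmoid (MATLAB-style name, assumed here)
    return np.tanh(x)

def purelin(x):
    # Linear (identity) transfer function
    return x

def forward(x, IW, b1, LW, b2, hidden_fn=tansig, output_fn=purelin):
    """Forward pass of a one-hidden-layer feed-forward network.

    IW: (n_hidden, n_inputs) input weights
    b1: (n_hidden,) hidden-layer biases
    LW: (n_outputs, n_hidden) layer weights
    b2: (n_outputs,) output-layer biases
    """
    hidden = hidden_fn(IW @ x + b1)
    return output_fn(LW @ hidden + b2)

# Hypothetical parameters: 2 inputs, 3 hidden neurons, 1 output
IW = np.array([[0.5, -0.2],
               [0.1, 0.3],
               [-0.4, 0.6]])
b1 = np.array([0.0, 0.1, -0.1])
LW = np.array([[1.0, -0.5, 0.25]])
b2 = np.array([0.05])

x = np.array([0.8, 0.4])   # hypothetical, already-scaled input vector
y = forward(x, IW, b1, LW, b2)
print(y)
```

Changing the number of hidden neurons only changes the row count of IW and b1 and the column count of LW, which is why a single parameter table can describe networks of different sizes.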
The parameterization resulting from MLR for PV power generation prediction is shown in Table 17.
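As an illustration of what an MLR parameterization consists of, the sketch below fits and applies a model of the generic form y = β0 + β1·x1 + … + βn·xn with ordinary least squares. The function names, the two predictor variables, and the toy data are hypothetical and are not taken from Table 17.

```python
import numpy as np

def mlr_fit(X, y):
    """Ordinary least-squares fit; returns [beta0, beta1, ..., betan]."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def mlr_predict(X, beta):
    """Evaluate beta0 + sum(beta_i * x_i) for one or more rows of predictors."""
    X = np.atleast_2d(X)
    return beta[0] + X @ beta[1:]

# Toy data generated from y = 2 + 3*x1 - x2 (hypothetical, noise-free)
X_demo = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0],
                   [2.0, 1.0]])
y_demo = 2.0 + 3.0 * X_demo[:, 0] - X_demo[:, 1]

beta_demo = mlr_fit(X_demo, y_demo)
print(beta_demo)
print(mlr_predict(np.array([1.0, 2.0]), beta_demo))
```

The fitted coefficient vector is the whole model, so reporting the coefficients in a table is enough to reproduce the MLR predictions.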
The parameterization resulting from NAR for PV power generation prediction is shown in Table 18.
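For orientation, a NAR model predicts the next value of a series from its own past values, y(t) = f(y(t−1), …, y(t−d)), where d is the input delay and f is the trained network. The sketch below only shows how the lagged inputs are assembled and passed to f. The helper names, the delay of 2, and the stand-in function f are hypothetical and do not reproduce the network in Table 18.

```python
import numpy as np

def make_lagged(series, d):
    """Build training pairs: each row of X is [y(t-1), ..., y(t-d)], target is y(t)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    X = np.column_stack([series[d - k: n - k] for k in range(1, d + 1)])
    y = series[d:]
    return X, y

def nar_predict_next(history, d, f):
    """One-step-ahead prediction from the last d observed values."""
    history = np.asarray(history, dtype=float)
    if len(history) < d:
        raise ValueError("history must contain at least d values")
    lags = history[-d:][::-1]   # ordered as [y(t-1), ..., y(t-d)]
    return f(lags)

# Hypothetical stand-in for a trained network: tanh hidden layer, linear output
W1 = np.array([[0.3, -0.1],
               [0.2, 0.4]])
c1 = np.array([0.0, 0.1])
W2 = np.array([0.7, -0.2])
c2 = 0.05

def f(lags):
    return float(W2 @ np.tanh(W1 @ lags + c1) + c2)

X_lag, y_lag = make_lagged([1, 2, 3, 4, 5], d=2)
next_value = nar_predict_next([0.2, 0.5, 0.4], d=2, f=f)
print(X_lag, y_lag, next_value)
```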
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Díaz-Bello, D., Vargas-Salgado, C., Alcazar-Ortega, M. et al. Optimizing photovoltaic power plant forecasting with dynamic neural network structure refinement. Sci Rep 15, 3337 (2025). https://doi.org/10.1038/s41598-024-80424-z
DOI: https://doi.org/10.1038/s41598-024-80424-z
Scientific Reports (Sci Rep)
ISSN 2045-2322 (online)
© 2026 Springer Nature Limited