Scientific Reports volume 16, Article number: 4705 (2026)
Days-ahead forecasting of photovoltaic (PV) power generation is crucial for pricing and balancing renewable power grids. Traditional physics-based models offer interpretability but limited accuracy, whereas attention-based data-driven models achieve high predictive accuracy at the cost of interpretability. Accurate prediction of PV power is challenging due to the stochastic, weather-dependent nature of solar radiation, which induces distribution shifts and non-stationary patterns in the time series data. This paper addresses some of these limitations by proposing a novel physics-guided architecture termed PhysEmbedFormer for forecasting PV power data in the context of meteorological data. It offers improved interpretability and forecasting robustness by accounting for external weather-dependent factors. In particular, the input PV time series is first decomposed into a physics-estimated component and a residual component. The components are jointly embedded into a high-dimensional vector space using a cross-modality module. The subsequent dual-stage Kolmogorov-Arnold Network (KAN) refinement module represents a learnable non-linear transformation that better matches the sample distributions to simpler downstream forecasting models such as the previously proposed iTransformer. Extensive experiments on multiple PV datasets show that PhysEmbedFormer consistently achieves lower MAE and RMSE, and higher \(R^2\), than competing architectures across prediction horizons up to 72 hours. At the same time, PhysEmbedFormer attains the second narrowest 95% confidence interval in its predictions, so it is also robust to sample distribution shifts caused by changing weather conditions.
As the generation and use of renewable energy continue to expand in global power systems, photovoltaic (PV) power forecasting has become a critical task in managing the operations of modern power grids1,2. In practical applications, PV forecasting is often categorized according to the prediction horizon. Short-term forecasting typically refers to predictions within one day, which is crucial for real-time grid balancing, frequency regulation, and adaptive control of smart grids. Medium-term forecasting spans one to several days, and in some cases even up to a few weeks; it is used to support market bidding, energy trading, unit commitment, and storage scheduling.
Considering the PV application requirements, the existing PV forecasting methods can be divided into three categories: physics-based models, data-driven models, and hybrid models that integrate both approaches. The physics-based models for PV power forecasting focus on modeling and predicting the meteorological data rather than extrapolating the historical PV power data3,4. If the weather conditions are sufficiently stable, such a strategy can achieve satisfactory results5. However, in many real-world scenarios, the weather conditions often vary. Moreover, accurate physical modeling of weather conditions demands large models with many parameters, which are difficult to create and numerically costly to evaluate6,7. Consequently, the PV forecasting methods reported in the literature now mainly focus on data-driven approaches. These methods have greater flexibility and enable adaptive learning of patterns directly from historical observations, but at the cost of reduced interpretability. Examples of lightweight data-driven models include FITS8 and SparseTSF9. Recurrent neural network (RNN)-based models are good at capturing short-term temporal dependencies, but they struggle to learn long-term dependencies and multivariate interactions10. Transformer-based models for time series data such as Pyraformer11, FEDformer12, and Crossformer13 mitigate these issues by leveraging the attention mechanism originally proposed in Transformers. The iTransformer14 treats each time series as a token, so the attention captures the inter-series correlations, while feed-forward layers with layer normalization are used for representation learning. This makes the iTransformer potentially effective also for PV power forecasting. Hybrid forecasting models combine different architectures, and often integrate signal decomposition methods15,16.
For example, the improved CEEMDAN-transformer extracts intrinsic modal features from meteorological data, which are usually highly correlated with the PV power data17. The PTFNet employs a solar positioning algorithm to compute new features, including the solar zenith and azimuth angles and the diffuse horizontal irradiance, in order to improve PV power forecasting using a transformer18. The WTGRU model decomposes the PV power data using a wavelet transform, and then predicts each component independently using gated recurrent units (GRUs)19. Nevertheless, the majority of existing hybrid models rely on classical signal decompositions over extended time windows, or even entire sequences. Moreover, since these methods often involve non-causal filtering or require access to future samples to determine the decomposition boundaries, they may introduce unintended information leakage unless they are carefully designed. The standard decompositions are also not explicitly linked to physically interpretable formulations, so they are prone to issues such as mode mixing and modal aliasing. The PV power forecasting models in the existing literature are summarized in Table 1.
An additional challenge in accurate PV power forecasting is the so-called concept drift. It refers to the phenomenon where the statistical distribution of data changes over time20. This issue is especially common in PV systems due to the stochastic and highly weather-dependent nature of solar radiation21,22. Seasonal transitions, abrupt cloud movements, and evolving atmospheric conditions continuously alter the relationship between the input quantities and the generated PV output power, resulting in dynamic and non-stationary data patterns23,24. The existing approaches for handling concept drift in PV power forecasting primarily focus on online ensemble learning25,26 and online adaptive learning27,28,29. These methods largely address the concept drift through training-time adaptations, dynamic updating, incremental learning, and re-weighting strategies in order to maintain the model relevance as the data distributions evolve.
In data processing, the concept drift is often referred to as distribution drift. It is usually mitigated by model-intrinsic mechanisms instead of adaptive learning strategies in order to enhance the model robustness in non-stationary environments. One such popular strategy is a normalization-forecasting-denormalization framework30. The RevIN31 assumes normalizations that are reversible, and applies these normalizations to each look-back window. The normalization is followed by multiplication by learnable scale factors and the addition of a bias term. The RevIN is easy to design, so it is now frequently integrated into modern forecasting architectures. Other similar normalization strategies include the simple instance normalization9,32 and the layer normalization13,14. In addition to these normalization strategies, feature extraction techniques, such as signal decomposition30,33 and feature extraction with fusion34,35, can be adopted to improve the forecasting model robustness under distribution shifts. In the case of PV power forecasting, these general strategies can further leverage domain-specific auxiliary variables, allowing the models to more effectively capture both temporal dynamics and the PV-relevant dependencies.
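The RevIN-style normalization-forecasting-denormalization loop can be sketched in a few lines. The NumPy class below is a simplified illustration, not the reference RevIN implementation: the learnable scale and bias are shown as plain arrays, and the statistics are computed per look-back window along the time axis.

```python
import numpy as np

class RevIN:
    """Sketch of reversible instance normalization: normalize each
    look-back window, forecast in the normalized space, then map the
    forecast back with the stored statistics."""
    def __init__(self, n_vars, eps=1e-5):
        self.gamma = np.ones(n_vars)    # learnable scale (trained in practice)
        self.beta = np.zeros(n_vars)    # learnable bias
        self.eps = eps

    def normalize(self, x):
        # x: (batch, L_p, n_vars); statistics per window, per variable
        self.mu = x.mean(axis=1, keepdims=True)
        self.sigma = x.std(axis=1, keepdims=True) + self.eps
        return (x - self.mu) / self.sigma * self.gamma + self.beta

    def denormalize(self, y):
        # y: forecast produced in the normalized space
        return (y - self.beta) / self.gamma * self.sigma + self.mu
```

Round-tripping a window through `normalize` and `denormalize` recovers the original values, which is the "reversible" property the framework relies on.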
In this paper, a physics-guided forecasting architecture referred to as PhysEmbedFormer is proposed to improve the forecasting performance of existing architectures for PV time series data under a weather-defined context while also offering better interpretability. The proposed architecture creates a data processing work-flow consisting of physically meaningful step-wise decomposition followed by the cross-modality embedding with fusion and the Kolmogorov-Arnold Networks (KAN)-based refinement module. In the last step, an iTransformer is employed for generating the actual predicted samples. The core idea is to first decompose the input PV power time series into a physically estimated component and a residual component. Then, the temporal and the spatial embeddings are extracted for these components and also from the meteorological data in order to obtain fine-grained modeling of the internal dynamics and interactions. The dual-stage KAN refinement module non-linearly adjusts the PV power signal components to account for the weather context. Finally, the refined signal components are fed together with other auxiliary inputs into an iTransformer to generate the forecasted values.
The main contribution of this paper is a hybrid architecture referred to as PhysEmbedFormer specifically intended for forecasting the PV power values in the context of weather data. Even though the iTransformer has robust multivariate modeling capabilities, it treats all inputs including context data equally. The iTransformer also fails to exploit the domain-specific dynamics of the predicted time series data. The proposed PhysEmbedFormer mitigates these limitations by incorporating a physics-guided refinement mechanism that prioritizes modeling of the target variable while exploiting both physically estimated signals and the residuals. The spatial and temporal embeddings are explicitly constructed to enhance multi-dimensional component representations. The targeted refinement offers more accurate predictions, and also better interpretability.
The specific innovations reported in this paper can be summarized as follows.
A physics-guided decomposition module is devised to decompose the target forecasting variable into a physically meaningful component and a residual component. This decomposition is performed at every time step, so it is strictly dependent only on historical measurements. This creates temporal causality, and avoids potential information leakage from future measurements, modal aliasing, and frequency mixing. It can be readily combined with moving averaging or convolution-based smoothing methods. The physics-based estimation of the physical component also strengthens the overall interpretability of the forecasting model.
A cross-modality embedding and fusion module is designed to capture the spatial-temporal characteristics and dependencies of the physics-based information-bearing signal component, the residual component, and the forecasting context defined by the extraneous variables. Since the signal modalities are affected by the extraneous variables, their distribution may shift at different rates over time. A structural modeling of signals as separate modalities reduces the extent to which the distribution changes in one component affect the other components, which contributes to more stable feature representations, and in turn, more accurate forecasting.
A KAN module is inserted into the data processing work-flow to non-linearly transform the target signal representation within the context of the extraneous variables. Unlike conventional MLPs that rely on fixed activation functions, the KAN learns its activation functions, parameterized as B-splines, from data. This enables flexible and robust adaptation to changes in the data statistics, while also providing a level of interpretability that MLP-based modules lack.
Extensive numerical experiments are reported to demonstrate that the proposed PhysEmbedFormer consistently outperforms the other existing architectures in predicting the PV power time series. It also yields more accurate predictions across different prediction horizons up to three days ahead, and across different datasets. In addition, the 95% confidence intervals of the PhysEmbedFormer predictions are the second narrowest among all architectures considered. This makes the PhysEmbedFormer predictions more stable and more robust, and thus better suited for predicting the PV data than the other existing architectures.
The main goal is to accurately forecast the PV power time series data for up to several days ahead. Since the PV power data are strongly influenced by weather conditions, the meteorological data need to be included as extraneous variables defining the prediction context. In particular, the historical PV power data up to the current time, \(P_{\text{ac}}\), must be considered together with the weather data such as the ambient air temperature, \(T_{\text{air}}\), the relative humidity, \(\text{RH}\), the global horizontal irradiance, \(\text{GHI}\), and the diffuse horizontal irradiance, \(\text{DHI}\). Other context variables include the rating, \(P_{\text{rated}}\), tilt, \(\beta\), azimuth, \(\alpha\), and temperature coefficient, \(\gamma\), of the PV solar panel, the ground reflectance, \(\rho_g\), the inverter efficiency, \(\eta\), and the maximum output AC power of the inverter, \(P_{\text{max}}\). The corresponding units and a short description of each of these parameters can be found in Table 2.
Thus, given the historical data from the last \(L_p\) measurements,
\[ \varvec{X} = \left[ \varvec{x}_{t-L_p+1}, \ldots, \varvec{x}_{t} \right], \quad \varvec{x}_{\tau} = \left( P_{\text{ac}}(\tau), T_{\text{air}}(\tau), \text{RH}(\tau), \text{GHI}(\tau), \text{DHI}(\tau) \right), \qquad (1) \]
and a vector of constant context parameters,
\[ \varvec{c} = \left( P_{\text{rated}}, \beta, \alpha, \gamma, \rho_g, \eta, P_{\text{max}} \right), \qquad (2) \]
the objective is to predict the next \(L_h\) PV power values,
\[ \hat{\varvec{y}} = \left( \hat{P}_{\text{ac}}(t+1), \ldots, \hat{P}_{\text{ac}}(t+L_h) \right), \qquad (3) \]
by learning a non-linear mapping, \(f\), in order to minimize the prediction error (loss),
\[ \min_{f}\; \mathcal{L}\!\left( \varvec{y}, \hat{\varvec{y}} \right), \quad \hat{\varvec{y}} = f(\varvec{X}, \varvec{c}). \qquad (4) \]
The proposed PhysEmbedFormer aims to improve the forecasting accuracy of PV power time series data while also emphasizing better interpretability. The block-diagram of PhysEmbedFormer architecture is shown in Fig. 1 including the evaluation metrics assumed. Specifically, the target PV power samples are first decomposed into the physics-estimated component and a residual component. The physics-estimated component models the PV power that should be observed under ideal conditions, given the values of the extraneous variables. The residual component represents the noise and other variations that are not modeled explicitly. The physics-guided decomposition module is followed by cross-modality embedding and fusion, and the KAN-based representation refinement. The final stage is the iTransformer that generates the predicted samples of the PV power.
A block-diagram and data processing work-flows of the proposed PhysEmbedFormer including the assumed evaluation metrics.
The embedding module generates effective joint spatial-temporal representations of the signal components. The cross-modality attention module is used to inject the weather-related context into these representations. The signal representations are then concatenated and passed through the first KAN to learn non-linear interactions among the signal components. The refined representations are combined with the original samples using the second KAN module to generate the adjusted PV power samples, which enhance the predictions of the subsequent iTransformer in the last step. The iTransformer also utilizes the other extraneous variables when predicting the PV power data. Such a forecasting architecture leverages both physical understanding and deep learning modeling while offering more robust and interpretable predictions.
Given the values of the weather-related parameters listed in Table 2, the physics-guided decomposition module splits the input PV power time series, \(P_{\text{ac}}\), into the sum of two components, i.e.,
\[ P_{\text{ac}}(t) = P_{\text{physics}}(t) + P_{\text{residual}}(t). \qquad (5) \]
The signal, \(P_{\text{physics}}(t)\), represents the physically explained approximation of the input samples, \(P_{\text{ac}}(t)\), whereas \(P_{\text{residual}}(t)\) contains the residual variations, which are not explained by the physical model. These two components facilitate more effective extraction of the useful signal features, and they also enhance the representation learning in the subsequent spatial-temporal embedding. More importantly, the decomposition is performed at every time step, \(t\), and thus, information leakage from future samples is avoided.
In the physics model, the effective irradiance, \(G_{\text{eff}}\), received by the tilted solar panel is modeled as a sum of three quantities, i.e.,36
\[ G_{\text{eff}} = G_{\text{beam}} \cdot \cos\theta + \text{DHI} \cdot k_{\text{diff}} + \text{GHI} \cdot \rho_g \cdot k_{\text{refl}}, \qquad (6) \]
where the horizontal beam irradiance and the tilt-geometry factors follow the standard isotropic-sky expressions,
\[ G_{\text{beam}} = \frac{\text{GHI} - \text{DHI}}{\cos Z}, \quad \cos\theta = \cos Z \cos\beta + \sin Z \sin\beta \cos(\alpha_s - \alpha), \quad k_{\text{diff}} = \frac{1 + \cos\beta}{2}, \quad k_{\text{refl}} = \frac{1 - \cos\beta}{2}. \qquad (7) \]
In equation (6), \(G_{\text{beam}} \cdot \cos\theta\) is the direct beam contribution, \(\text{DHI} \cdot k_{\text{diff}}\) models the isotropic diffusion, and \(\text{GHI} \cdot \rho_g \cdot k_{\text{refl}}\) is the ground-reflected component. The first term in equation (6) represents the horizontal beam irradiance projected onto the tilted panel. The incidence angle, \(\theta\), solar zenith, \(Z\), and solar azimuth, \(\alpha_s\), determine the sun position relative to the solar panel orientation. The second term in equation (6) scales the diffuse horizontal irradiance, \(\text{DHI}\), by the tilt-dependent view factor, \(k_{\text{diff}}\), assuming an isotropic scattering of skylight over the hemisphere. The third term in equation (6) accounts for the ground-reflected irradiance, which is computed by adjusting \(\text{GHI}\) by the ground reflectance coefficient, \(\rho_g\), and the tilt factor, \(k_{\text{refl}}\).
The solar cell temperature is estimated assuming the ambient air temperature, \(T_{\text{air}}\), and the nominal operating cell temperature (NOCT), i.e.,37
\[ T_{\text{cell}} = T_{\text{air}} + (\text{NOCT} - 20) \cdot \frac{G_{\text{eff}}}{800}. \qquad (8) \]
Equation (8) assumes a linear increase of the cell temperature from the ambient temperature, which is constrained by the NOCT condition with the standard irradiance of 800 \(\text{W/m}^2\) and the standard temperature of \(20\,^{\circ}\text{C}\). The NOCT defines how much the cell heats up under the standard conditions, and it can be used as a reference point for extrapolating the cell temperature.
Given the actual environmental conditions, the output power of the direct current (DC) converter in the PV module is calculated as,38
\[ P_{\text{DC}} = P_{\text{rated}} \cdot \frac{G_{\text{eff}}}{1000} \cdot \left( 1 + \gamma\,(T_{\text{cell}} - 25) \right). \qquad (9) \]
Model (9) assumes the initial rated power output, \(P_{\text{rated}}\), defined under the standard test conditions with \(1000\ \text{W/m}^2\) irradiance and the cell temperature of \(25\,^{\circ}\text{C}\). The estimated power is then linearly adjusted with respect to the effective irradiance and the temperature-induced efficiency loss, respectively.
Finally, the estimated DC power, \(P_{\text{DC}}\), is converted to the estimated alternating current (AC) power, \(P_{\text{physics}}\), by accounting for the inverter's efficiency and capacity constraints, i.e.,
\[ P_{\text{physics}} = \min\left( \eta_{\text{inv}}\,P_{\text{DC}},\; P_{\text{inv,max}} \right). \qquad (10) \]
In equation (10), the DC power is scaled by the inverter efficiency to reflect the energy losses incurred during the current conversion. Moreover, since inverters normally have a maximum output power limit, the overall AC output power is the minimum of the scaled power, \(\eta_{\text{inv}} P_{\text{DC}}\), and the rated inverter maximum power, \(P_{\text{inv,max}}\). This ensures that the estimated output power remains physically plausible under all operating conditions.
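Equations (6)-(10) can be collected into a single decomposition routine. The NumPy sketch below is illustrative rather than the exact implementation: the beam decomposition \(\text{GHI} = G_{\text{beam}} \cos Z + \text{DHI}\), the incidence-angle formula, the clipping of negative DC power, and the default NOCT of 45 °C are standard assumptions not stated explicitly in the text above.

```python
import numpy as np

def physics_decompose(p_ac, ghi, dhi, t_air, zenith, sol_azimuth,
                      beta, alpha, gamma, rho_g, eta_inv,
                      p_rated, p_inv_max, noct=45.0):
    """Split measured AC power into a physics-estimated component and a
    residual, following the chain of equations (6)-(10). Angles are in
    radians; noct=45 is an assumed typical value."""
    cos_z = np.cos(zenith)
    # Horizontal beam irradiance projected onto the tilted panel (term 1 of Eq. 6);
    # decomposing GHI into beam*cos(Z) + DHI is a standard assumption.
    g_beam = np.clip(ghi - dhi, 0.0, None) / np.maximum(cos_z, 1e-3)
    cos_theta = np.clip(cos_z * np.cos(beta)
                        + np.sin(zenith) * np.sin(beta)
                        * np.cos(sol_azimuth - alpha), 0.0, None)
    k_diff = (1.0 + np.cos(beta)) / 2.0     # isotropic sky view factor
    k_refl = (1.0 - np.cos(beta)) / 2.0     # ground view factor
    g_eff = g_beam * cos_theta + dhi * k_diff + ghi * rho_g * k_refl
    # NOCT-based cell temperature (Eq. 8): references 800 W/m^2 and 20 C.
    t_cell = t_air + (noct - 20.0) * g_eff / 800.0
    # DC power relative to STC, 1000 W/m^2 and 25 C (Eq. 9); clipped at zero.
    p_dc = np.clip(p_rated * (g_eff / 1000.0)
                   * (1.0 + gamma * (t_cell - 25.0)), 0.0, None)
    # Inverter efficiency and clipping at its rated maximum (Eq. 10).
    p_physics = np.minimum(eta_inv * p_dc, p_inv_max)
    return p_physics, p_ac - p_physics
```

By construction the two returned components sum back to the measured series, and the physics estimate vanishes at night when both irradiance inputs are zero.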
The embedding module extracts the temporal and spatial representations from the input data, \(\varvec{X} \in \mathbb{R}^{n_v \times B \times L_p}\), where \(B\) denotes the batch size, \(L_p\) is the number of past samples used in the prediction, and \(n_v\) is the number of input variables. In the scenario considered, \(n_v = 5\), assuming one of the components, \(P_{\text{physics}}\) or \(P_{\text{residual}}\), and the four meteorological variables, \(\text{RH}\), \(\text{GHI}\), \(\text{DHI}\), and \(T_{\text{air}}\), respectively.
The temporal embedding processes each time series independently, and models the time evolution over the \(L_p\) past samples in the current batch. In particular, the input data, \(\varvec{X}\), are first permuted and reshaped into the matrix, \(\varvec{X}_{\text{temporal}} \in \mathbb{R}^{B \cdot n_v \times L_p \times 1}\). The data matrix is then linearly projected with layer normalization and non-linear activation into a new data matrix, i.e.,
\[ \varvec{X}^{\prime}_{\text{temporal}} = \text{GELU}\!\left( \text{LayerNorm}\!\left( \varvec{X}_{\text{temporal}} \varvec{W} + \varvec{b} \right) \right), \qquad (11) \]
where GELU denotes the Gaussian error linear unit, which is used as the activation function. The data matrix, \(\varvec{X}^{\prime}_{\text{temporal}}\), is processed by a standard encoder comprising multi-head self-attention and feed-forward networks. The encoded temporal embedding is denoted as \(\varvec{E}_{\text{temporal}}\).
The spatial embedding is computed at every time step by modeling the interactions among the \(n_v\) variables. In particular, the input data, \(\varvec{X}\), are again first permuted and reshaped into the matrix, \(\varvec{X}_{\text{spatial}} \in \mathbb{R}^{B \cdot L_p \times n_v \times 1}\). The linear projection with non-linear activation can be described as
\[ \varvec{X}^{\prime}_{\text{spatial}} = \text{GELU}\!\left( \varvec{X}_{\text{spatial}} \varvec{W} + \varvec{b} \right). \qquad (12) \]
Using another transformer encoder, the corresponding output spatial embedding is denoted as \(\varvec{E}_{\text{spatial}}\). The complexity of these temporal and spatial encoders is dominated by the self-attention, which scales quadratically with the look-back window length, i.e., as \(L_p^2\).
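The permute-and-reshape steps behind the two embedding branches can be sketched as follows; the exact axis ordering is an assumption consistent with the stated shapes.

```python
import numpy as np

n_v, B, L_p = 5, 2, 24                       # variables, batch size, look-back length
X = np.random.randn(n_v, B, L_p)             # input as described: (n_v, B, L_p)

# Temporal branch: every individual series becomes a sequence of L_p tokens,
# so the attention cost is quadratic in L_p.
X_temporal = X.transpose(1, 0, 2).reshape(B * n_v, L_p, 1)

# Spatial branch: every time step becomes a sequence of n_v variable tokens.
X_spatial = X.transpose(1, 2, 0).reshape(B * L_p, n_v, 1)

assert X_temporal.shape == (B * n_v, L_p, 1)
assert X_spatial.shape == (B * L_p, n_v, 1)
```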
It should be noted that, unlike many existing architectures that jointly embed all variables, our design strategy is to define three separate embeddings. This allows each embedding module to be specialized for its input type, and in turn, the overall robustness and forecasting performance can be improved. In particular, there is one embedding module for the \(P_{\text{physics}}\) component, one for the \(P_{\text{residual}}\) component, and the third embedding module is used for the four meteorological variables, \(T_{\text{air}}\), \(\text{RH}\), \(\text{GHI}\), and \(\text{DHI}\).
The embedded signals are combined using the cross-modality fusion module. The key component of the fusion module is cross-attention across time and across different time series, which preserves the distinct characteristics of each signal. The outputs are merged with the original embeddings through residual connections, and the normalization layers are used for stability. The output linear transformations are used for refining the final signal representations. The numerical efficiency of the fusion module is improved by the parameter sharing, so that only three cross-attention modules are used for the six attention blocks, as indicated by the same colors in Fig. 1.
The cross-attention assumes the standard scaled dot-products. For example, the cross-attention of the spatial embedding, \(\varvec{E}_{\text{physics, spatial}}\), of the signal, \(P_{\text{physics}}\), and the spatial embedding, \(\varvec{E}_{\text{auxiliary, spatial}}\), of the four auxiliary meteorological variables is computed as,
\[ \text{CrossAttn}\!\left( \varvec{E}_{\text{physics, spatial}}, \varvec{E}_{\text{auxiliary, spatial}} \right) = \text{softmax}\!\left( \frac{\varvec{Q}\varvec{K}^{\top}}{\sqrt{d_k}} \right)\varvec{V}, \qquad (13) \]
where the learnable linear projections corresponding to the query, key, and value matrices of dimension, \(d_k\), respectively, are
\[ \varvec{Q} = \varvec{E}_{\text{physics, spatial}}\, \varvec{W}^{Q}, \quad \varvec{K} = \varvec{E}_{\text{auxiliary, spatial}}\, \varvec{W}^{K}, \quad \varvec{V} = \varvec{E}_{\text{auxiliary, spatial}}\, \varvec{W}^{V}. \qquad (14) \]
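A minimal single-head rendition of this scaled dot-product cross-attention, with hypothetical projection matrices `w_q`, `w_k`, and `w_v`, might look like:

```python
import numpy as np

def cross_attention(e_q, e_kv, w_q, w_k, w_v):
    """Scaled dot-product cross-attention (single head, no masking):
    queries come from one modality, keys and values from another."""
    q, k, v = e_q @ w_q, e_kv @ w_k, e_kv @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v

rng = np.random.default_rng(0)
d_model, d_k = 16, 8
e_physics = rng.standard_normal((4, d_model))   # e.g. spatial embedding of P_physics
e_aux = rng.standard_normal((6, d_model))       # embedding of the weather variables
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = cross_attention(e_physics, e_aux, w_q, w_k, w_v)   # shape (4, d_k)
```

The output has one row per query token, so the physics-component embedding is re-expressed as a weighted mixture of the weather-context values.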
The KAN-based refinement module aims to improve the signal quality for the subsequent predictions by a simple iTransformer. The block diagram of the refinement module is shown in Fig. 2. The inputs to the refinement module are the spatial-temporal embeddings of the physics-estimated and the residual components. These components are stacked and reshaped, and jointly processed by the first KAN. The resulting output is used as an interpretable correction, which is concatenated with a batch of the original PV power time series. In the second KAN, this correction makes the target time series more informative by removing the undesirable variations and other anomalies.
The block diagram of the two-stage KAN-based refinement module for the physics-estimated and the residual components and their space-time embeddings as the inputs. The first stage yields a correction that is combined with the original PV power time series using the second KAN.
More precisely, the inputs to the refinement module in Fig. 2 are the physics-estimated component, \(P_{\text{physics}}\), and its spatial-temporal embedding, \(P_{\text{physics}}^{(s\text{-}t)}\), and the residual component, \(P_{\text{residual}}\), and its spatial-temporal embedding, \(P_{\text{residual}}^{(s\text{-}t)}\), respectively. These inputs are linearly combined using learnable scalar weights, \(\alpha_{\text{physics}}\) and \(\alpha_{\text{residual}}\), and then permuted and stacked to form the input to the first KAN model, i.e.,
\[ \varvec{Z} = \text{stack}\!\left( \alpha_{\text{physics}}\, P_{\text{physics}} + (1 - \alpha_{\text{physics}})\, P_{\text{physics}}^{(s\text{-}t)},\;\; \alpha_{\text{residual}}\, P_{\text{residual}} + (1 - \alpha_{\text{residual}})\, P_{\text{residual}}^{(s\text{-}t)} \right). \qquad (15) \]
The i-th output of the first layer, \(\varvec{h}_i\), of the first KAN is computed as,39
where \(d_{\text{KAN}}\) is the layer dimension, \(\varvec{c}^{(1)}\) are the learnable weights, \(B_m\) is the m-th of the \(M\) B-spline basis functions, \(\varvec{S}_j^{(1)}\) are the spline normalization factors, and \(\odot\) denotes the Hadamard matrix product. The second term represents a linear transformation with learnable weights, \(\varvec{W}^{(1)}\), followed by the activation, \(\sigma(\cdot)\), which acts as a regularization. In the second layer, the outputs are,
where the quantities are defined similarly as in the first layer.
The second KAN has the same structure as the first KAN. The complexity of the layers comprising both KANs is determined by their dimension, \(d_{\text{KAN}}\).
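A simplified sketch of one KAN layer along these lines is shown below. The uniform knot grid over \([-1, 1]\), the SiLU residual activation, and the coefficient shapes are illustrative assumptions; the grid size of 5 and spline order of 3 match the settings reported in the experiments, giving \(M = 8\) basis functions.

```python
import numpy as np

def bspline_basis(x, grid_size=5, order=3, lo=-1.0, hi=1.0):
    """Values of all B-spline basis functions at x via the Cox-de Boor
    recursion. A uniform grid with `grid_size` intervals and order-`order`
    splines yields grid_size + order = 8 bases, as in the experiments."""
    h = (hi - lo) / grid_size
    # knots extended beyond [lo, hi] so the bases cover the whole domain
    knots = np.linspace(lo - order * h, hi + order * h,
                        grid_size + 2 * order + 1)
    x = x[..., None]
    b = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)  # degree 0
    for k in range(1, order + 1):
        left = (x - knots[:-(k + 1)]) / (knots[k:-1] - knots[:-(k + 1)])
        right = (knots[k + 1:] - x) / (knots[k + 1:] - knots[1:-k])
        b = left * b[..., :-1] + right * b[..., 1:]
    return b  # shape (..., grid_size + order)

def silu(x):
    return x / (1.0 + np.exp(-x))

def kan_layer(x, coef, w):
    """One KAN layer mapping (batch, d_in) -> (batch, d_out): a learnable
    spline term plus an activation residual path, echoing the two-term
    structure of the layer equation above."""
    basis = bspline_basis(x)                        # (batch, d_in, n_basis)
    spline = np.einsum('bim,oim->bo', basis, coef)  # spline part, c * B_m(x)
    return spline + silu(x) @ w.T                   # plus W * sigma(x)
```

Inside the domain, the cubic bases form a partition of unity, which keeps the learned spline activations well normalized.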
The iTransformer is a modified transformer intended for processing multivariate time series14. It was shown to be effective for modeling the temporal as well as the inter-variable dependencies. The block diagram of the iTransformer is shown in Fig. 3. In this paper, the iTransformer is used to forecast the PV power data from the past data up to the current time instant while accounting for the weather-defined context represented by the four meteorological variables (\(\text{RH}\), \(\text{GHI}\), \(\text{DHI}\), \(T_{\text{air}}\)). Each layer of the iTransformer comprises multivariate attention, residual connections, and layer normalization to extract the contextualized space-time features. The final predictions are obtained by linearly combining all the features. Recall also that, in the proposed PhysEmbedFormer, the original PV power measurements are first pre-processed by several modules, as discussed above and as indicated in Fig. 1. Such pre-processing is necessary for improving the forecasting performance of the iTransformer, since it otherwise treats all its inputs equally.
The block diagram of the iTransformer for forecasting time series.
The proposed PhysEmbedFormer architecture was evaluated and compared with other existing architectures using a public PV power dataset. In addition to the PV power data, the dataset also contains weather data. The dataset can be freely downloaded from the Desert Knowledge Australia Solar Center (DKASC) portal40. The data samples are provided at various time steps since 2008. The subset of data chosen for our numerical experiments includes the hourly values from March 2021 to February 2023 at the sites 10, 11, 17, and 19, respectively. Specifically, the time series data consist of five variables: \(P_{\text{ac}}\), \(T_{\text{air}}\), \(\text{RH}\), \(\text{GHI}\), and \(\text{DHI}\); the description and the units of these variables are provided in Table 2.
The measurement sites have the latitude of \(23.76^{\circ}\)S and the longitude of \(133.87^{\circ}\)E, and they are located in the Australia/Darwin timezone. The key data statistics for each of the four sites considered are summarized in Table 3. The value of the ground reflectance, \(\rho_g\), was not provided, so it is assumed to be 0.2 for all sites; such a value is typical for natural surfaces, including soil and grass, that may be expected for the sites in the chosen region. Note that the minimum values are slightly negative, since the PV systems may report small negative readings at night due to the inverter standby consumption, or due to sensor noise around the zero output.
The prediction accuracy of different architectures is evaluated and compared assuming the following four metrics, i.e.,
where \(\varvec{y}\) denotes the sequence of \(L_h\) true values to be predicted (i.e., the ground truth), and \(\hat{\varvec{y}}\) is the corresponding predicted sequence. Recall also that, in our numerical results, the sequence to be predicted comprises the samples of the PV power.
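For reference, the four metrics (MAE, RMSE, \(R^2\), and sMAPE) can be computed as follows; the epsilon guard and the factor of 1/2 in the sMAPE denominator are common conventions assumed here.

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    # coefficient of determination: 1 minus residual over total sum of squares
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def smape(y, y_hat, eps=1e-8):
    # symmetric MAPE in percent; eps guards the near-zero night-time PV output
    denom = (np.abs(y) + np.abs(y_hat)) / 2.0 + eps
    return 100.0 * np.mean(np.abs(y - y_hat) / denom)
```

The sensitivity of sMAPE to small denominators is worth noting here, since it resurfaces in the discussion of the site 10 results below.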
The proposed PhysEmbedFormer is compared with the following five baseline models: the standalone iTransformer14, iTransformer-LSTM41, SparseTSF9, SegRNN32, and CrossFormer13. The numerical experiments were conducted using a single NVIDIA GeForce RTX 4060 GPU. All models were trained with a batch size of 128 for 150 epochs. The forecasting experiments consider multiple window-length configurations, \((L_p, L_h) = (24,4), (48,12), (72,24), (96,48), (120,72)\), where \(L_p\) is the look-back window length, and \(L_h\) is the length of the prediction horizon. For example, given that the data values are recorded at hourly intervals, the setting (24, 4) indicates that the PV power values over the next 4 hours are predicted from the data for the last 24 hours.
The dataset was split into the training, validation, and testing subsets at a ratio of 8 : 1 : 1. The embedding and the hidden dimensions for the transformer-based modules, the RNN-based modules, and the SparseTSF are chosen from the set \(\{8, 16, 32, 64\}\). The models are trained using the Adam optimizer with an initial learning rate of \(10^{-4}\), which is then dynamically reduced by a factor of 0.5 whenever the validation loss does not improve for five consecutive epochs. Moreover, the training is stopped early if the validation loss does not improve by at least 0.0001 for five consecutive epochs. The number of attention heads is chosen from the set \(\{2, 4, 8\}\), and the number of layers for the transformer-based and the RNN-based modules is set to 2. The hidden dimension of the KAN modules is chosen within the interval [2, 64]. The number of B-spline basis functions in the KAN is \(M = 5 + 3 = 8\), where 5 is the grid size and 3 is the order of the B-splines. This choice of B-splines ensures adequate coverage of the input domain by overlapping splines while offering a manageable model complexity. The case of a B-spline order less than 3 is investigated in the ablation experiments. For all architectures considered, the package Optuna42 has been used for optimizing the architecture hyperparameters.
The forecasting performance of different architectures is compared in Tables 4, 5, 6 and 7, assuming the PV power and weather data at the sites 10, 11, 17, and 19, respectively. The best performance is always highlighted in bold. It can be observed that the proposed PhysEmbedFormer generally achieves lower MAE and RMSE, and higher \(R^2\) values, than the other architectures. Although the advantage of PhysEmbedFormer is less pronounced under sMAPE, the performance gap between PhysEmbedFormer and the best baseline architecture is nearly always within 1%. This suggests that the iTransformer and other similar methods can greatly benefit from pre-processing the time series to be forecasted, particularly when the forecasting needs to be done within a context defined by the extraneous variables. It is also highly beneficial to include any prior knowledge, when available, for example, in the form of a physical model describing the underlying physical mechanisms involved.
Furthermore, it can be observed from Tables 4, 5, 6 and 7 that the performance of all architectures except the SparseTSF deteriorates when the prediction horizon is increased, despite also increasing the look-back window length. It implies that the forecasting performance eventually saturates, even when more historical data are used. This can be expected, since the samples too far back in the history contain less and less information about the current samples. However, the SparseTSF architecture seems to be an exception to these rules. The performance of SparseTSF varies only slightly as the prediction horizon length is increased. This could be attributed to its focus on the periodic and longer-term patterns, so it is more effective in exploiting the past data in making predictions. For (96, 48), the SparseTSF even outperforms the PhysEmbedFormer in terms of MAE for the datasets at two sites. This could be explained by the tendency of SparseTSF to produce stable and conservative predictions, which usually reduces the large absolute prediction errors, and thus, it also reduces the MAE. On the other hand, the simplicity of SparseTSF limits its ability to capture the short-term sample fluctuations, especially when the predictions suddenly deviate from the true values. Subsequently, the SparseTSF exhibits relatively larger values of RMSE, which penalizes larger errors. It also attains smaller \(R^2\) values, which reflects its weaker ability to explain the variance of the predicted samples. In contrast, the proposed PhysEmbedFormer seems to be better suited for capturing both the global and the local patterns, so it can achieve more balanced performance in all the metrics considered. For the site 10 dataset, the sMAPE values of all models are generally larger than those at the other three sites. This may be primarily caused by the site 10 having the smallest mean PV power among the four sites, as sMAPE is highly sensitive to small values in its denominator.
The physics-guided decomposition of the input time series is one of the key components of the proposed PhysEmbedFormer, as shown in Fig. 1. It may be possible to analyze such a decomposition mathematically; however, in this paper, only a visual example is provided to gain basic insight into its usefulness. In particular, Fig. 4 illustrates the physics-guided decomposition of the PV power time series over the first 168 hours for the four sites considered. Since these sites are located relatively close to each other, they are affected by very similar meteorological conditions, so their PV power outputs exhibit similar diurnal patterns. However, the sites have different sizes and configurations of the PV arrays and inverters, so their peak power values and amplitude variations are noticeably different.
The physics-guided decomposition of the PV power time series into two components over 168 hours, for the four sites considered.
Recall also that the proposed decomposition module effectively separates the input time series, \(P_{\text{ac}}\), into two components: \(P_{\text{physics}}\), which captures the expected physical behavior under ideal conditions, and \(P_{\text{residual}}\), which reflects the deviations caused by various unmodeled disturbances. The main benefit of the component separation is that it allows more flexible and more interpretable modeling and learning of the underlying time series.
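As a minimal illustration of this separation, the sketch below uses a simple rated-power irradiance model with linear temperature derating as a stand-in physics estimate; the paper's actual physics model is not reproduced here, and the coefficients (`gamma`, STC constants) are conventional assumed values:

```python
import numpy as np

def physics_estimate(irradiance, temp_cell, p_rated,
                     g_stc=1000.0, gamma=-0.004, t_stc=25.0):
    """A simple stand-in PV physics model (NOT the paper's exact model):
    rated power scaled by irradiance, with a linear temperature derating
    of gamma per degree above the 25 C standard test condition."""
    return p_rated * (irradiance / g_stc) * (1.0 + gamma * (temp_cell - t_stc))

def decompose(p_ac, irradiance, temp_cell, p_rated):
    """Split measured power into the physics-estimated component and a
    residual capturing unmodeled disturbances: P_ac = P_physics + P_residual."""
    p_physics = physics_estimate(np.asarray(irradiance, float),
                                 np.asarray(temp_cell, float), p_rated)
    p_residual = np.asarray(p_ac, float) - p_physics
    return p_physics, p_residual
```

By construction, the two components always sum back to the measured series, so no information is lost by the decomposition; the model only reallocates structure between the two branches.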
The values of the combining weights, \(\alpha_{\text{physics}}\) and \(\alpha_{\text{residual}}\), in equation (15) for different datasets and window lengths, \((L_p, L_h)\), are plotted in Fig. 5. The weight values learned during training lie between 0.80 and 0.95, indicating that the components are more important than their space-time embeddings, although the latter cannot be neglected. Moreover, \(\alpha_{\text{physics}}\) is always larger than \(\alpha_{\text{residual}}\) for \((L_p, L_h) = (24, 4)\) at sites 10, 11, and 17. Similarly, \(\alpha_{\text{physics}}\) is larger than \(\alpha_{\text{residual}}\) for \((L_p, L_h) = (48, 12)\) and \((L_p, L_h) = (120, 72)\) at sites 11, 17, and 19. This confirms that the physics model for extracting the information-bearing component is meaningful.
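The mechanism behind such learnable combining weights can be sketched as follows. The exact form of equation (15) is not reproduced here; the convex mixing of each component with its space-time embedding through a sigmoid-constrained weight is our assumption about its general shape:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(component, embedding, weight_logit):
    """Convex combination of a component and its space-time embedding.
    The unconstrained logit is the trainable parameter; the sigmoid
    keeps alpha strictly inside (0, 1), so neither input is discarded."""
    alpha = sigmoid(weight_logit)
    return alpha * component + (1.0 - alpha) * embedding, alpha
```

Under this parameterization, learned alphas of 0.80 to 0.95 (as in Fig. 5) correspond to positive logits, i.e. the raw component dominates while the embedding still contributes 5 to 20 percent of the fused signal.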
The learned values of \(\alpha_{\text{physics}}\) and \(\alpha_{\text{residual}}\) at different sites.
The usefulness of the space-time component embeddings, \(P_{\text{physics}}^{(s-t)}\) and \(P_{\text{residual}}^{(s-t)}\), in the first KAN refinement can be assessed by computing the norms of the matrices \(\varvec{C}_1 = \left[\varvec{S}_1^{(1)} \varvec{c}_{1,1}^{(1)}, \ldots, \varvec{S}_1^{(1)} \varvec{c}_{1,M}^{(1)}\right]\) and \(\varvec{C}_2 = \left[\varvec{S}_2^{(1)} \varvec{c}_{2,1}^{(1)}, \ldots, \varvec{S}_2^{(1)} \varvec{c}_{2,M}^{(1)}\right]\) that are used in calculating the output of the first layer of the first KAN. The \(L_1\) and \(L_2\) norms of these matrices are shown in Fig. 6. The values of both norms allow assessing the relative importance and effectiveness of the embeddings \(P_{\text{physics}}^{(s-t)}\) and \(P_{\text{residual}}^{(s-t)}\). In particular, the \(L_1\) norm approximately measures the number of significant B-spline weights, which reflects how strongly a component contributes to the model output, while the \(L_2\) norm indicates how spread out the influences of the B-spline basis functions are. For nearly all datasets and window lengths \((L_p, L_h)\), the \(L_1\) and \(L_2\) norm values of \(\varvec{C}_1\), corresponding to \(P_{\text{physics}}^{(s-t)}\), are significantly larger than those of \(\varvec{C}_2\), corresponding to \(P_{\text{residual}}^{(s-t)}\). This suggests that the B-spline representation of the physical component plays the dominant role in the first KAN-based refinement.
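A short sketch of how such norms could be computed from the spline coefficient matrices follows. The entrywise \(L_1\) norm and Frobenius \(L_2\) norm used here are our assumption; the paper may use a different (e.g. induced) matrix norm:

```python
import numpy as np

def spline_block_norms(S, C):
    """Form the column-stacked matrix [S c_1, ..., S c_M] of transformed
    B-spline coefficient vectors and return its L1 and L2 norms, used to
    gauge how strongly the corresponding embedding drives the KAN layer."""
    block = S @ C                       # columns are S c_m, m = 1..M
    l1 = np.abs(block).sum()            # entrywise L1: count of significant weights
    l2 = np.sqrt((block ** 2).sum())    # Frobenius L2: spread of influence
    return l1, l2
```

Comparing the pair of norms returned for \(\varvec{C}_1\) against the pair for \(\varvec{C}_2\) then reproduces the kind of dominance comparison shown in Fig. 6.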
The \(L_1\) and \(L_2\) norms of the B-spline weights at the first KAN module.
The values of the \(L_1\) and \(L_2\) norms of the B-spline coefficients in the second KAN are shown in Fig. 7. Recall that this second refinement assumes the space-time embedding, \(P_{\text{ac}}^{(s-t)}\), of \(P_{\text{ac}}\). It can be observed that, for nearly all datasets and window lengths \((L_p, L_h)\), the \(L_1\) and \(L_2\) norms of \(\varvec{C}_1\), corresponding to \(P_{\text{ac}}^{(s-t)}\), are much smaller than those of \(\varvec{C}_2\), corresponding to the original time series, \(P_{\text{ac}}\). This suggests that the B-spline representation of the \(P_{\text{ac}}\) component plays the more dominant role in the second KAN-based refinement. However, for \((L_p, L_h) = (120, 72)\), in three out of the four site datasets, \(P_{\text{ac}}^{(s-t)}\) represents a larger proportion of the refined \(P_{\text{ac}}\). This illustrates that there are scenarios where the second space-time embedding and the refinement are important, especially for longer prediction horizons.
The \(L_1\) and \(L_2\) norms of the B-spline weights at the second KAN module.
Figure 8 summarizes the computational costs of all architectures considered in terms of the number of floating-point operations (FLOPs) and the number of trainable parameters. For each dataset and each configuration \((L_p, L_h)\), the FLOPs and parameter counts of the hyperparameter-optimized models are first obtained, and these values are then averaged across the four datasets. The proposed architecture has the largest overall complexity of 556M FLOPs, and uses 229K parameters on average. The FLOP count is high mainly because of the cross-modality fusion and the iTransformer modules. These modules perform multiple attention operations over multiple time series, which requires expensive matrix multiplications with cubic complexity. The two-stage KAN refinement introduces many spline coefficients, making it the main contributor to the parameter count. In contrast, SparseTSF remains highly efficient, since its architecture avoids both the attention mechanism and complex feature-extraction operations. Its structure is built from channel-independent MLPs, which also saves the substantial cost of channel mixing. Moreover, the iTransformer achieves the second-smallest FLOP count due to its inverted tokenization mechanism. Instead of treating time steps as tokens, as in the standard Transformer, the iTransformer treats each variable as a token. Since many multivariate datasets have far fewer variables than the lookback window length, such an inversion drastically reduces the attention complexity. The SegRNN has the second-smallest number of parameters. It divides each variable into fixed-length segments, and then applies a linear projection as the embedding step. It utilizes an RNN encoder and parallel multi-step forecasting with simple positional and temporal decoding steps. There is also no channel mixing in SegRNN, which further reduces its parameter count.
Overall, the proposed architecture trades computational efficiency for improved feature representations and increased forecasting accuracy. The other architectures are simpler, but this simplicity negatively affects their prediction accuracy.
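For intuition, per-layer cost accounting of the kind behind Fig. 8 can be sketched for plain dense layers. This covers only MLP-style blocks; attention and spline layers require their own formulas, and the factor of 2 counts one multiply and one add per weight, which is one common FLOP convention:

```python
def linear_layer_cost(n_in, n_out, batch=1):
    """Parameter and FLOP counts for one dense layer (weights + bias):
    a matmul of shape (batch, n_in) x (n_in, n_out) needs roughly
    2 * batch * n_in * n_out multiply-adds."""
    params = n_in * n_out + n_out
    flops = 2 * batch * n_in * n_out
    return params, flops

def mlp_cost(sizes, batch=1):
    """Accumulate cost over a stack of dense layers,
    e.g. sizes=[96, 128, 72] for lookback 96 and horizon 72."""
    total_params = total_flops = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        p, f = linear_layer_cost(n_in, n_out, batch)
        total_params += p
        total_flops += f
    return total_params, total_flops
```

Summing such per-module counts, then averaging across the four site datasets, mirrors the procedure described above for producing the aggregated FLOP and parameter figures.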
The computational complexity of the forecasting architectures considered including the number of FLOPs and the parameter counts.
In order to assess the contributions of the different modules of PhysEmbedFormer to the overall forecasting accuracy, the following five forecasting models have been evaluated:
M1: the complete PhysEmbedFormer as shown in Fig. 1;
M2: the two KANs are replaced with two MLPs having the same configuration of hidden layers;
M3: the cross-modality embedding module and the fusion module are removed;
M4: the number of B-splines in the two KANs is decreased from \(M = 8\) to 6, and the spline order is decreased from 3 to 1;
M5: the number of B-splines in the two KANs is decreased from \(M = 8\) to 6, and the spline order is decreased from 3 to 2.
The corresponding results are reported in Tables 8, 9, 10 and 11. In particular, the model M2 generally shows the most significant performance degradation, indicating that KANs are more effective learners than MLPs. The model M3 also results in a performance degradation, although not as severe as M2, indicating that the fusion module contributes additional useful information. The models M4 and M5 show relatively minor, but consistent, drops in performance. The decrease is larger for M4, which has the spline order reduced to one. This implies that the KAN configuration must be optimized to enable capturing the complex temporal dynamics. Note also that, for the site 11 data and the window lengths \((L_p, L_h) = (72, 24)\) and \((96, 48)\), the models M5 and M4 even slightly outperform the model M1. Hence, the spline configuration in the KANs could be fine-tuned to achieve a better performance. Overall, it can be concluded that the model M1 achieves the best and most robust performance, so all of its modules contribute positively to good forecasting performance and can be further fine-tuned.
Figure 9 presents the half-widths of the 95% confidence intervals (CI) corresponding to the measured standard errors of MAE and RMSE, respectively, for the six forecasting architectures and four site datasets considered, assuming the two configurations \((L_p, L_h) = (120, 72)\) and \((L_p, L_h) = (24, 4)\). The CI half-width is a measure of uncertainty in the error estimates; narrower intervals indicate better stability of the predictions as well as better statistical reliability. Across most datasets and prediction horizons, the proposed PhysEmbedFormer ranks second in terms of the CI half-width, while still achieving narrower CIs than all of the remaining baseline models. The relatively compact CIs of PhysEmbedFormer suggest that the proposed architecture can offer reliable prediction performance under varying temporal resolutions and data statistics, which are precisely the characteristics that are crucial in practical forecasting applications.
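The CI half-widths in Fig. 9 can be computed from the standard error of the per-sample errors. A minimal sketch follows, assuming the usual normal-approximation factor of 1.96 for a 95% interval (the paper's exact estimator may differ):

```python
import numpy as np

def ci_half_width(per_sample_errors, z=1.96):
    """Half-width of the approximate 95% confidence interval for a mean
    error metric: z times the standard error of the per-sample errors."""
    e = np.asarray(per_sample_errors, float)
    se = e.std(ddof=1) / np.sqrt(e.size)   # sample std over sqrt(n)
    return z * se
```

A model whose per-sample errors are tightly clustered yields a small half-width even if its mean error is nonzero, which is why the CI half-width measures prediction stability rather than accuracy.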
The half-widths of the 95% confidence intervals for MAE and RMSE for the six architectures and four datasets considered, assuming two lookback and prediction-horizon window settings.
This paper reported a novel architecture called PhysEmbedFormer for forecasting univariate time series that depend on extraneous variables. The proposed architecture was shown to be very effective for forecasting PV power time series, which are strongly dependent on the meteorological conditions. This requires modeling both the temporal dependencies and the inter-variable interactions. The PhysEmbedFormer first decomposes the input time series to separate the physics-informed component from the unexplained random fluctuations. This is followed by the cross-modal space-time embedding. The non-linear correction of the time series to be predicted by the KAN refinement module significantly improves the performance of a simple downstream iTransformer, which produces the actual predictions. Extensive numerical experiments were carried out to evaluate the accuracy of up to three-days-ahead forecasts. The proposed architecture was found to consistently outperform the other five deep learning architectures, and it may also be more interpretable; however, this comes at the cost of increased computational complexity and a larger number of model parameters.
Future work will need to address the limitations of PhysEmbedFormer, especially its relatively large computational cost. This could be problematic in time-critical applications, and when the model must be implemented on a resource-constrained platform. The complexity can be reduced by adopting less complex alternatives to the multiple attention mechanisms used in PhysEmbedFormer. Moreover, having a large number of learnable parameters makes the model training more difficult and requires more training data. For example, the splines used in the KAN modules could be constrained to reduce the number of model parameters. Potential research directions include adopting sparse and low-rank attention mechanisms, sharing spline parameters across the KAN modules, and devising other less complex sample-correction mechanisms. Another avenue to explore is adopting advanced training strategies for architectures consisting of multiple modules, as is the case for the proposed PhysEmbedFormer. In particular, incorporating online ensemble learning and online adaptive learning could make the forecasting more stable when processing highly non-stationary time series under previously unseen external conditions. Hence, reducing the model complexity and improving the training efficiency are the two main areas that are important for the practical deployment of all forecasting models.
No new data were generated in this study. The publicly available DKASC, Alice Springs datasets were used to produce the numerical results: https://dkasolarcentre.com.au/download?location=alice-springs.
Cavus, M. & Allahham, A. Spatio-temporal attention-based deep learning for smart grid demand prediction. Electronics 14(13), 2514. https://doi.org/10.3390/electronics14132514 (2025).
Cavus, M. & Bell, M. Enabling smart grid resilience with deep learning-based battery health prediction in EV fleets. Batteries 11(8), 283 (2025).
Gupta, M., Arya, A., Varshney, U., Mittal, J. & Tomar, A. A review of PV power forecasting using machine learning techniques. Prog. Eng. Sci. 2, 100058. https://doi.org/10.1016/j.pes.2025.100058 (2025).
Liu, C. et al. A review of multitemporal and multispatial scales photovoltaic forecasting methods. IEEE Access 10, 35073–35093. https://doi.org/10.1109/ACCESS.2022.3162206 (2022).
Massidda, L. & Marrocu, M. Use of multilinear adaptive regression splines and numerical weather prediction to forecast the power output of a PV plant in Borkum, Germany. Solar Energy 146, 141–149. https://doi.org/10.1016/j.solener.2017.02.007 (2017).
Zhang, J. et al. A suite of metrics for assessing the performance of solar power forecasting. Solar Energy 111, 157–175. https://doi.org/10.1016/j.solener.2014.10.016 (2015).
Soman, S., Zareipour, H., Malik, O. & Mandal, P. A review of wind power and wind speed forecasting methods with different time horizons. In Proceedings of NAPS, 1–8 (2010). https://doi.org/10.1109/NAPS.2010.5619586.
Xu, Z., Zeng, A. & Xu, Q. FITS: Modeling time series with 10k parameters. In Proceedings of ICLR (2024).
Lin, S., Lin, W., Wu, W., Chen, H. & Yang, J. SparseTSF: Modeling long-term time series forecasting with 1k parameters. In Proceedings of ICML (2024).
Abdellatif, A., Amine, T. & Mohammed, T. Bi-LSTM, GRU and 1D-CNN models for short-term photovoltaic panel efficiency forecasting case amorphous silicon grid-connected PV system. Results Eng. 21, 101886. https://doi.org/10.1016/j.rineng.2024.101886 (2024).
Liu, S. et al. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In Proceedings of ICLR (2022).
Zhou, T. et al. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of ICML (2022).
Zhang, Y. & Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In Proceedings of ICLR (2023).
Liu, Y. et al. iTransformer: Inverted transformers are effective for time series forecasting. In Proceedings of ICLR (2024).
Li, Y. et al. WNPS-LSTM-Informer: A hybrid stacking model for medium-term photovoltaic power forecasting with ranked feature selection. Renew. Energy 244, 122687. https://doi.org/10.1016/j.renene.2025.122687 (2025).
Zhai, C. et al. Photovoltaic power forecasting based on VMD-SSA-Transformer: Multidimensional analysis of dataset length, weather mutation and forecast accuracy. Energy 324, 135971. https://doi.org/10.1016/j.energy.2025.135971 (2025).
Tang, H., Kang, F., Li, X. & Sun, Y. Short-term photovoltaic power prediction model based on feature construction and improved transformer. Energy 320, 135213. https://doi.org/10.1016/j.energy.2025.135213 (2025).
Tao, K., Zhao, J., Tao, Y., Qi, Q. & Tian, Y. Operational day-ahead photovoltaic power forecasting based on transformer variant. Appl. Energy 373, 123825. https://doi.org/10.1016/j.apenergy.2024.123825 (2024).
Singh, P., Singh, N. & Singh, A. Wavelet transform based gated-recurrent unit deep learning approach for power output of solar photovoltaic system forecasting. SN Comput. Sci. 6(3), 243. https://doi.org/10.1007/s42979-025-03786-9 (2025).
Zhang, L., Zhu, J., Cheung, K. & Zhou, J. Online prediction of photovoltaic power considering concept drift. In 2023 IEEE Power & Energy Society General Meeting (PESGM), 1–5 (2023). https://doi.org/10.1109/PESGM52003.2023.10252625.
Hamad, S., Ghalib, M., Munshi, A., Alotaibi, M. & Ebied, M. Evaluating machine learning models comprehensively for predicting maximum power from photovoltaic systems. Sci. Rep. 15(1), 10750. https://doi.org/10.1038/s41598-025-91044-6 (2025).
Thaker, J. & Höller, R. Hybrid model for intra-day probabilistic PV power forecast. Renew. Energy 232, 121057. https://doi.org/10.1016/j.renene.2024.121057 (2024).
Wang, K. et al. Accurate photovoltaic power prediction via temperature correction with physics-informed neural networks. Energy. https://doi.org/10.1016/j.energy.2025.136546 (2025).
Tian, Z., Chen, Y. & Wang, G. Enhancing PV power forecasting accuracy through nonlinear weather correction based on multi-task learning. Appl. Energy 386, 125525. https://doi.org/10.1016/j.apenergy.2025.125525 (2025).
Azeem, A. et al. Boosting STLF in smart grid via adaptive ensemble for concept drift. In Proceedings of the International Conference on E-Mobility, Volume 2, ICEM 2024, vol. 1439 of Lecture Notes in Electrical Engineering (2025). https://doi.org/10.1007/978-981-96-8093-1_20.
Jagait, R., Fekri, M., Grolinger, K. & Mir, S. Load forecasting under concept drift: Online ensemble learning with recurrent neural network and ARIMA. IEEE Access 9, 98992–99008. https://doi.org/10.1109/ACCESS.2021.3095420 (2021).
Azeem, A., Ismail, I., Jameel, S., Romlie, F. & Danyaro, K. Concept drift scenarios in electrical load forecasting with different generation modalities. In 2022 International Conference on Future Trends in Smart Communities (ICFTSC), 18–23 (2022). https://doi.org/10.1109/ICFTSC57269.2022.10039888.
Luo, X. & Zhang, D. An adaptive deep learning framework for day-ahead forecasting of photovoltaic power generation. Sustain. Energy Technol. Assess. 52, 102326. https://doi.org/10.1016/j.seta.2022.102326 (2022).
Azeem, A., Ismail, I., Jameel, S. & Danyaro, K. Transfer-learning enabled adaptive framework for load forecasting under concept-drift challenges in smart-grids across different-generation-modalities. Energy Rep. 12, 3519–3532. https://doi.org/10.1016/j.egyr.2024.09.040 (2024).
Kim, J., Kim, H., Kim, H., Lee, D. & Yoon, S. A comprehensive survey of deep learning for time series forecasting: Architectural diversity and open challenges. Artif. Intell. Rev. 58, 216. https://doi.org/10.1007/s10462-025-11223-9 (2025).
Kim, T. et al. Reversible instance normalization for accurate time-series forecasting against distribution shift. In Proceedings of ICLR (2021). https://openreview.net/forum?id=cGDAkQo1C0p.
Lin, S. et al. SegRNN: Segment recurrent neural network for long-term time series forecasting (2023). arXiv:2308.11200 [cs.LG].
Yu, Y., Ma, R. & Ma, Z. Robformer: A robust decomposition transformer for long-term time series forecasting. Pattern Recognit. 153, 110552. https://doi.org/10.1016/j.patcog.2024.110552 (2024).
Ma, X., Li, X., Fang, L., Zhao, T. & Zhang, C. U-Mixer: An UNet-Mixer architecture with stationarity correction for time series forecasting. In Proceedings of AAAI'24/IAAI'24/EAAI'24 (2024). https://doi.org/10.1609/aaai.v38i13.29337.
Tian, G., Zhang, C., Shi, Y. & Li, X. MultiWaveNet: A long time series forecasting framework based on multi-scale analysis and multi-channel feature fusion. Expert Syst. Appl. 251, 124088. https://doi.org/10.1016/j.eswa.2024.124088 (2024).
Duffie, J. A. & Beckman, W. A. Available solar radiation. In Solar Engineering of Thermal Processes, Ch. 2, 43–137 (Wiley, 2013).
Aoun, N. Methodology for predicting the PV module temperature based on actual and estimated weather data. Energy Convers. Manag.: X 14, 100182. https://doi.org/10.1016/j.ecmx.2022.100182 (2022).
Durusoy, B., Ozden, T. & Akinoglu, B. Solar irradiation on the rear surface of bifacial solar modules: A modeling approach. Sci. Rep. 10(1), 13300. https://doi.org/10.1038/s41598-020-70235-3 (2020).
Liu, Z. et al. KAN: Kolmogorov-Arnold networks. In Proceedings of ICLR (2025).
DKASC, Alice Springs. The PV power full data set with weather data (accessed June 2025). https://dkasolarcentre.com.au/download?location=alice-springs.
Wu, G., Wang, Y., Zhou, Q. & Zhang, Z. Enhanced photovoltaic power forecasting: An iTransformer and LSTM-based model integrating temporal and covariate interactions (2024). arXiv:2412.02302 [cs.LG].
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of ACM SIGKDD, 2623–2631 (2019).
The research was funded by a research grant from Zhejiang University.
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310058, China
Yue Yu
Zhejiang University – University of Illinois Urbana-Champaign Institute, Haining, 314400, China
Pavel Loskot
AI Research Center, Midea Group, Shanghai, 201103, China
Yu Gao
Y.Y.: Writing – original draft, Conceptualization, Investigation, Formal analysis, Software, Methodology, Data curation. P.L.: Writing – review & editing, Conceptualization, Investigation, Project administration, Supervision. Y.G.: Resources, Project administration, Supervision. All authors reviewed the manuscript.
Correspondence to Pavel Loskot.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Yu, Y., Loskot, P. & Gao, Y. PhysEmbedFormer: a physics-guided interpretable architecture for days-ahead forecasting of PV power. Sci Rep 16, 4705 (2026). https://doi.org/10.1038/s41598-025-34874-8