Scientific Reports volume 16, Article number: 10336 (2026)
Accurate photovoltaic (PV) power forecasting is essential for grid operation but remains difficult due to nonlinear multi-scale dynamics and seasonal distribution shifts. This work presents MKAN-iTransformer, a cascaded framework that integrates two existing components—the Multi-Scale Kolmogorov–Arnold Network (MKAN) for scale-aware temporal representation learning and iTransformer for variable-wise attention and inter-variable dependency modeling—under a 15-minute single-step setting. Experiments on a real-world 30 MW PV plant dataset from the Chinese State Grid Renewable Energy Generation Forecasting Competition use chronological splits within each season. MKAN-iTransformer achieves the best overall performance in spring, autumn, and winter. In spring, it reaches MSE=2.892, RMSE=1.701, MAE=0.864, and \(R^{2}=0.947\), improving over LSTM by 23.5%/12.5%/20.5% (MSE/RMSE/MAE). In autumn, it attains MSE=2.884, RMSE=1.698, MAE=0.774, and \(R^{2}=0.962\), reducing errors vs. iTransformer by 16.5%/8.7%/12.4%. In winter, it achieves MSE=1.721, RMSE=1.312, MAE=0.443, and \(R^{2}=0.969\), yielding 81.6%/57.1%/71.9% error reductions vs. Transformer. Ablation further confirms the complementarity between MKAN and iTransformer and shows that direct KAN integration can be unstable under winter shifts (KAN-iTransformer: MSE=7.082, \(R^{2}=0.872\)).
Amid escalating global climate change, transforming energy structures and accelerating renewable energy adoption have become shared priorities worldwide1. The growing electricity demand drives increasing renewable power requirements2, motivated by the carbon neutrality and eco-friendliness of renewable energy sources (RESs) compared to fossil fuels3. Empirical studies report negative associations between carbon emissions and renewable energy consumption, indicating emissions decrease as per-capita renewable energy use increases4,5.
According to the International Energy Agency (IEA), renewables are expected to supply 42% of global electricity generation between 2023 and 2028, with solar and wind contributing 25%6. Photovoltaic (PV) generation, as representative green technology, has experienced rapid expansion. Global PV installed capacity continues growing with steadily rising power system share7. While large-scale PV integration brings substantial environmental benefits, it introduces new operational challenges8.
The primary challenge stems from PV output variability. PV generation is highly sensitive to meteorological conditions—irradiance, temperature, humidity, and wind speed—whose nonlinear and time-varying nature creates pronounced fluctuations and uncertainty9. This uncertainty complicates grid dispatch, increases storage and flexibility requirements, and affects electricity market operations10,11. Therefore, accurate PV power forecasting is essential for secure and efficient power system operation12 and supports downstream decision-making including operational management and demand response13.
These observations motivate a design that (i) captures multi-scale temporal dynamics of PV series and (ii) models cross-variable dependencies among meteorological inputs and historical power, while supporting transparent analysis. We develop MKAN-iTransformer, integrating Multi-Scale Kolmogorov-Arnold Networks (MKAN)14 with iTransformer15. MKAN provides multi-scale temporal representation learning with explicit functional structure, while iTransformer models inter-variable dependencies through variable-wise attention, together targeting robust and interpretable PV forecasting.
Contributions. Our main contributions are:
Cascaded forecasting framework. We develop MKAN-iTransformer, cascading multi-scale temporal representation learning with variable-wise attention for 15-minute single-step PV power prediction.
Season-wise chronological evaluation. Beyond overall test splits, we evaluate models within each season using chronological splits, making seasonal robustness and failure modes explicit.
Comprehensive KAN-enhanced baseline construction and evaluation. We systematically construct KAN/MKAN-augmented variants of recurrent and attention-based baseline architectures and establish a unified benchmarking framework with consistent preprocessing, training, and evaluation protocols, enabling fair comparison and demonstrating the broad applicability of interpretable neural components in PV forecasting.
Interpretability analysis. We provide multi-scale time-frequency decomposition, learned KAN function inspection, and attention visualization for transparent explanations.
PV power forecasting has progressed from physics-driven and classical statistical models to modern machine learning and deep learning pipelines, largely driven by the need to handle nonlinearity, non-stationarity, and regime shifts.
Physical and statistical models. Early forecasting relied on physical simulation using meteorological inputs and device characteristics, which can be physically meaningful but often requires high-quality inputs and detailed plant specifications, limiting scalability in practice9,12. Classical statistical models (e.g., ARMA/ARIMA and regression families) exploit temporal correlations and can perform well under relatively stable conditions; for instance, regression combined with numerical weather prediction has shown robust hourly forecasting16. However, abrupt ramps and distribution shifts common in PV generation challenge these assumptions and motivate more flexible nonlinear approaches.
Machine learning approaches. Conventional ML methods improved nonlinear mapping from weather variables to PV output, including linear regression and ensemble methods such as random forests and gradient boosting17,18. Support vector regression has also been adopted for high-dimensional nonlinear forecasting17,19. Despite progress, many ML pipelines rely on handcrafted features and can degrade under seasonal and weather-regime shifts, encouraging end-to-end deep architectures with better representation learning.
RNN-based models. LSTM and GRU variants have been widely used to capture temporal dependencies in PV forecasting20,21,22. Performance gains have been reported via parallel structures, feature selection, CNN integration, and attention augmentation20,23,24,25. Nevertheless, RNN-based components remain sequential and may become computational bottlenecks for long contexts, while their ability to explicitly model cross-variable interactions is often limited.
Hybrid CNN-RNN architectures. CNN-LSTM and related hybrids seek to combine local pattern extraction and temporal modeling26, with variants replacing standard CNN blocks by temporal convolutional networks to improve receptive fields and parallelism27. Attention mechanisms are frequently introduced for feature weighting and fusion; for example, dual-stream CNN-LSTM with self-attention has been reported to improve accuracy on PV datasets28. However, these hybrids may still struggle to represent multi-scale behaviors spanning intra-hour variability to seasonal cycles, and they often treat heterogeneous meteorological variables as homogeneous inputs without an explicit variable-wise dependency mechanism.
Transformer-based architectures. Transformers have enabled stronger long-range dependency modeling for time series, with forecasting-oriented variants targeting efficiency and inductive biases. Informer reduces attention complexity via ProbSparse attention with \(O(L \log L)\) behavior29; Autoformer and FEDformer incorporate decomposition and frequency-aware mechanisms to better capture trend/seasonality30,31. In PV-specific contexts, multi-scale and hybrid designs combine Transformers with CNNs/GRUs or decomposition modules32,33,34, and domain-enhanced Transformers inject domain knowledge or nonlinear dependency modeling to improve robustness35,36. The iTransformer introduces an inverted design that treats variables (rather than time steps) as tokens, enabling efficient variable-wise attention for cross-variable dependency modeling15, which is particularly relevant for PV forecasting where meteorological drivers and historical power jointly determine future output.
Multi-scale pattern recognition. PV generation exhibits multi-scale dynamics (diurnal cycles, intra-hour fluctuations, and weather-driven ramps), motivating multi-resolution modeling through decomposition, frequency-aware transformations, or multi-scale feature extraction. Interpretable deep learning pipelines have been proposed to disentangle multi-scale solar radiation variations while retaining predictive accuracy (e.g., reporting \(R^{2}=0.97\))37. Yet, many approaches increase architectural complexity and do not always provide transparent, component-wise explanations that remain stable across operating regimes.
Interpretability requirements in energy systems. For energy applications, interpretability supports operational decision-making and stakeholder trust, but many deep models remain black boxes; moreover, attention weights alone do not guarantee faithful explanations. This motivates exploring model families with more explicit functional forms.
Kolmogorov–Arnold Networks (KAN) for interpretable learning. KANs parameterize multivariate mappings via sums of learned univariate functions, often implemented with spline-based learnable functions, offering a potentially more inspectable representation than standard MLP layers38. Recent surveys summarize rapid development of KAN variants and applications (e.g., TKAN, Wav-KAN, DeepOKAN) and discuss their empirical strengths38,39. Theoretical extensions such as KKANs further improve robustness and approximation behavior40. For temporal data, KAN-based time series modeling has been explored, including general demonstrations and targeted work on bridging accuracy and interpretability in time series settings41,42. KAN integration with dynamical systems has also been studied via KAN-ODEs43. More recently, multi-scale KAN variants (MKAN) have been proposed to better capture mixed-frequency behaviors in temporal signals14. Despite these developments, systematic integration of KAN-style multi-scale representations with state-of-the-art variable-wise attention, and task-specific interpretability validation for PV forecasting, remains limited.
Many PV forecasting studies emphasize aggregate metrics, which can obscure failure modes under seasonal regime shifts. Seasonal variability changes irradiance, temperature, and daylight duration, making season-wise evaluation important for deployment9,12. However, evaluation protocols and baselines are often inconsistent across model families, hindering fair comparison and limiting insights into robustness under regime transitions.
The above literature motivates four gaps addressed in this work:
Architectural integration gap: limited evidence on combining multi-scale temporal representations with explicit variable-wise attention for PV forecasting14,15.
Interpretability integration gap: insufficient task-specific validation of interpretability when integrating KAN-style components with attention-based architectures38,39.
Evaluation methodology gap: limited systematic assessment under seasonal regime shifts9,12.
Benchmarking consistency gap: inconsistent protocols across model families impede fair comparison and understanding of when interpretable neural components help21.
Building on existing components14,15, our MKAN-iTransformer focuses on principled integration of MKAN-style multi-scale representation learning with variable-wise attention, accompanied by systematic seasonal evaluation and interpretability-oriented analyses to clarify both strengths and limitations under different regimes.
The photovoltaic (PV) power forecasting task aims to predict the near-future output power of a PV plant based on historical multivariate time series observations. Let the historical observation sequence be
where \(x_t \in \mathbb{R}^{d}\) denotes the d-dimensional feature vector at time step t.
In this study, we consider single-step forecasting with a 15-minute horizon (sampling interval = 15 minutes). Therefore, the forecasting horizon is \(h=1\), and the prediction target is the PV power at the next time step:
The input variables include total solar irradiance, direct normal irradiance, global horizontal irradiance, air temperature, atmospheric pressure, relative humidity, and historical PV power. The target variable is the PV plant output power at the next 15-minute step.
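The input–target alignment described above can be sketched as follows. This is an illustrative utility, not the authors' exact implementation; the lookback window length is an assumption, since the input window size is configured per experiment.

```python
import numpy as np

def make_single_step_samples(features, power, lookback):
    """Build (input window, next-step power) pairs for 15-minute
    single-step forecasting (h = 1): the target at time t is power[t + 1].

    features: array of shape (T, d) with meteorological variables
              plus historical PV power as the last column.
    power:    array of shape (T,) with the PV output series.
    """
    X, y = [], []
    for t in range(lookback - 1, len(power) - 1):
        X.append(features[t - lookback + 1 : t + 1])  # past `lookback` steps
        y.append(power[t + 1])                        # next 15-minute step
    return np.stack(X), np.array(y)
```

With a lookback of 4 steps (one hour of 15-minute samples, an illustrative choice), a series of length T yields T − lookback samples, each pairing one input window with the power value one step ahead.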
The Multi-Scale Kolmogorov-Arnold Network (MKAN) module is designed to efficiently capture complex, multi-scale temporal dependencies in multivariate time series forecasting. The overall structure is illustrated in Fig. 1 and consists of the following key components.
Overall architecture of the Multi-Scale Kolmogorov-Arnold Network (MKAN) module. The left part shows the hierarchical residual structure with stacked TimeKAN blocks, each extracting features at different scales through multi-scale patching (MSP) modules. The right part details the patching, encoding, KAN-based transformation, decoding, and unpatching process within each MSP block. Cumulative addition and subtraction operations are used to aggregate both local and global temporal features.
Multi-scale patching: Given an input sequence \(X \in \mathbb{R}^{T \times d}\), we divide it into S sets of patches at different temporal scales, where the s-th scale consists of \(N_s\) patches of length \(l_s\):
Patch encoder: Each patch is mapped to a latent embedding via a learnable encoder:
where \(\operatorname{Enc}_s\) denotes the patch encoder for scale s.
KAN-based Transformation: Each scale has a dedicated Kolmogorov-Arnold Network (KAN) block to transform the encoded patch embedding:
where \(\operatorname{KAN}_s\) is the KAN subnetwork for the s-th scale.
Patch decoder: The transformed embeddings are decoded back to the temporal domain:
Feature aggregation: The reconstructed patches are reassembled to form multi-scale feature maps, which are then aggregated (e.g., by summation or concatenation) to obtain the final sequence representation:
where \(\operatorname{Agg}\) denotes the aggregation operation across scales.
Forecasting head: The aggregated features are passed to a forecasting head to generate the final prediction:
The overall output of the MKAN module can be summarized as a weighted sum of KAN transformations across all scales:
where \(\phi_{s,n}(\cdot)\) represents the output of the KAN subnetwork for the n-th patch at scale s, \(\alpha_{s,n}\) are learnable weights, and b is a bias term.
A major advantage of the MKAN module is its interpretability. Each KAN block is inherently symbolic and can be visualized or analyzed, allowing for direct inspection of the learned temporal features at each scale. In summary, the MKAN module integrates multi-scale patching with expressive KAN transformations, providing a transparent and effective solution for multivariate time series forecasting.
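To make the patch–transform–aggregate pipeline concrete, the following minimal numpy sketch illustrates multi-scale patching with summation-based aggregation. The placeholder `phi` stands in for the learned per-scale encoder, KAN block, and decoder; the divisibility assumption (T divisible by each patch length) is ours for simplicity, as the paper does not specify padding behavior.

```python
import numpy as np

def multi_scale_patch(x, patch_lengths):
    """Split a (T, d) sequence into patches at several temporal scales.
    Returns, for each scale s, an array of shape (N_s, l_s, d).
    Assumes T is divisible by each patch length (pad otherwise)."""
    T, d = x.shape
    scales = []
    for l in patch_lengths:
        n = T // l
        scales.append(x[: n * l].reshape(n, l, d))
    return scales

def mkan_forward(x, patch_lengths, phi):
    """Sketch of the MKAN pipeline: patch -> per-scale transform ->
    unpatch -> aggregate by summation. `phi(patches, s)` stands in for
    Enc_s, KAN_s, and the decoder; any shape-preserving map works here."""
    T, d = x.shape
    out = np.zeros_like(x)
    for s, patches in enumerate(multi_scale_patch(x, patch_lengths)):
        z = phi(patches, s)                                 # KAN-based transformation
        out[: z.shape[0] * z.shape[1]] += z.reshape(-1, d)  # unpatch + sum
    return out
```

With `phi` as the identity, each scale reconstructs the input, so the aggregate is the input scaled by the number of scales; in the actual model each `phi` is a learned, scale-specific transformation.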
The iTransformer module is designed to efficiently model multivariate time series forecasting by leveraging an inverted Transformer architecture. The overall structure is illustrated in Fig. 2 and consists of the following key components.
Overall architecture of the iTransformer module. The framework consists of independent variable-wise embedding, temporal layer normalization, multivariate self-attention, feed-forward transformation, and aggregation for final forecasting. The left and right parts of the figure detail the embedding and feed-forward processes, respectively.
Variable-wise embedding: Given a multivariate input sequence \(X \in \mathbb{R}^{T \times N}\), where T is the sequence length and N is the number of variables, each variable’s time series \(X_{:,n}\) is independently embedded into a latent representation:
where \(\operatorname{Embedding}\) is a learnable mapping from \(\mathbb{R}^{T}\) to \(\mathbb{R}^{d}\).
Temporal layer normalization: Each variable embedding is normalized along the temporal dimension to reduce scale and distribution discrepancies:
where \(\mu_n\) and \(\sigma_n\) are the mean and standard deviation of the n-th variable embedding.
Multivariate self-attention: All variable embeddings are jointly processed by a self-attention mechanism to capture inter-variable dependencies:
where Q, K, V are linear projections of the variable embeddings. The detailed structure of the multivariate self-attention mechanism is shown in Fig. 3.
Detailed structure of the multivariate self-attention mechanism in the iTransformer module. The input is first projected to Q, K, and V, then split into multiple heads for independent attention computation. The results are merged and projected to form the final output.
Feed-forward network: Each variable embedding is independently transformed by a shared feed-forward network to extract nonlinear features:
where \(\operatorname{FFN}\) denotes a two-layer MLP with activation and dropout.
Stacked blocks and aggregation: The above operations are stacked for L layers, and the final output embeddings are aggregated for forecasting:
where \(\operatorname{TrmBlock}\) denotes one iTransformer block, and \(\operatorname{Projection}\) maps the final embedding to the prediction space.
The overall output of the iTransformer module can be summarized as:
where \(\operatorname{Head}\) is typically a linear layer for regression or forecasting.
A major advantage of the iTransformer module is its variable-centric design. By treating each variable’s time series as an independent token, the model can explicitly capture inter-variable correlations and global temporal patterns, while maintaining efficient parallel computation and interpretability of learned representations. In summary, the iTransformer module integrates variable-wise embedding, normalization, and attention-based transformation, providing a simple yet powerful backbone for multivariate time series forecasting.
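The inverted, variable-wise attention described above can be sketched in a minimal single-head form (omitting layer normalization, the feed-forward network, and multi-head splitting); all weight matrices here are illustrative placeholders rather than the model's trained parameters.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def variable_wise_attention(X, W_embed, W_q, W_k, W_v):
    """Inverted-attention sketch: each of the N variables' length-T series
    becomes one token, so the attention weights form an N x N matrix of
    inter-variable dependencies (single head, no FFN or LayerNorm).

    X: (T, N) input; W_embed: (T, d); W_q, W_k, W_v: (d, d).
    Returns the attended (N, d) embeddings and the (N, N) attention map.
    """
    H = X.T @ W_embed                            # (N, d): one token per variable
    Q, K, V = H @ W_q, H @ W_k, H @ W_v
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (N, N) variable-wise attention
    return A @ V, A
</```

The returned N × N map is exactly the object visualized in the attention-interpretability analysis: entry (i, j) weights how much variable j informs the updated representation of variable i.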
The hybrid architecture adopts a cascaded design, where the Multi-Scale Kolmogorov-Arnold Network (MKAN) module first extracts multi-scale temporal features from the input sequence, and the resulting representations are subsequently processed by the iTransformer module to model inter-variable dependencies. The overall structure is illustrated in Fig. 4.
Overall architecture of the cascaded MKAN-iTransformer framework. The pipeline consists of sequential MKAN and iTransformer modules, followed by a forecasting head. The left part details the multi-scale patching and KAN transformation, while the right part illustrates variable-wise attention and prediction.
Multi-scale feature extraction (MKAN): Given an input sequence \(X \in \mathbb{R}^{T \times N}\), the MKAN module extracts multi-scale temporal features:
where \(Z_{\text{MKAN}}\) encodes rich temporal dependencies across different resolutions.
Inter-variable modeling (iTransformer): The multi-scale features \(Z_{\text{MKAN}}\) are fed into the iTransformer module, which captures global dependencies among variables via self-attention mechanisms:
where \(Z_{\text{iTrm}}\) denotes the variable-attentive feature representation.
Forecasting head: The final representation is passed to a forecasting head to generate the prediction:
This cascaded hybrid design enables the model to:
Efficiently extract multi-scale temporal patterns using the MKAN module, which models complex dynamics at various time resolutions.
Explicitly capture inter-variable relationships through the iTransformer, which leverages attention to integrate information across variables.
Produce robust and interpretable representations for accurate multivariate time series forecasting.
In summary, the cascaded MKAN-iTransformer architecture unifies multi-scale temporal feature extraction and variable-wise attention modeling, forming a transparent and powerful backbone for multivariate time series forecasting.
In this study, real-world operational data from a 30 MW photovoltaic (PV) power plant are utilized for experimental evaluation. The dataset contains records from 2019 and 2020, with a sampling interval of 15 minutes. The input features include total solar irradiance, direct normal irradiance, global horizontal irradiance, air temperature, atmospheric pressure, and relative humidity. The target variable is the output power of the PV power plant. Details are shown in Table 1.
The quality of the dataset has a decisive impact on forecasting accuracy, so missing-value handling and dataset partitioning require particular care. During preprocessing, missing values are first imputed with linear interpolation to preserve the overall trend and consistency of the data. For outliers in each column, reasonable value ranges are defined based on the physical meaning of the variable; values outside these ranges are clipped to the valid interval, improving data reliability and prediction accuracy.
Because PV output is almost negligible at night, the dataset contains sparse and uninformative nighttime data points, which are detrimental to forecasting performance. All nighttime samples were therefore excluded: only data collected between 6:00 AM and 8:00 PM were retained for subsequent experiments. For the 15-minute single-step setting, we align inputs and targets by shifting the PV power series by one step: the target at time t is the PV power at \(t+1\). This alignment is performed after nighttime filtering, and no future information is included in the model inputs.
A total of 70,177 sampling points were collected from two years of photovoltaic data. The data were divided into four seasons according to the following scheme: spring (March to May), summer (June to August), autumn (September to November), and winter (December to February). The number of sampling points for each season was 17,666, 17,378, 17,467, and 17,666, respectively.
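The preprocessing steps above (linear interpolation, physical-range clipping, and daytime filtering) can be sketched with pandas. The column name and DatetimeIndex layout are assumptions about the raw data file, and the valid ranges are supplied by the caller per variable, as the paper defines them from physical meaning.

```python
import pandas as pd

def preprocess(df, valid_ranges):
    """Preprocessing sketch: linear interpolation of gaps, physical-range
    clipping, and daytime filtering (06:00-20:00).

    df:           DataFrame with a 15-minute DatetimeIndex (assumed layout).
    valid_ranges: dict mapping column name -> (low, high) physical bounds.
    """
    df = df.interpolate(method="linear")        # fill missing values
    for col, (lo, hi) in valid_ranges.items():
        df[col] = df[col].clip(lo, hi)          # clip implausible values
    return df.between_time("06:00", "20:00")    # keep daytime samples only
```

For a 30 MW plant, a plausible bound for the power column would be (0, 30); irradiance and meteorological columns would get their own physically motivated ranges.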
To explore the relationships between meteorological and operational features and photovoltaic (PV) output power, this study employs the Pearson Correlation Coefficient for all numerical variables. The Pearson correlation coefficient measures the degree of linear correlation between two variables, with possible values in the interval \([-1,1]\), where a value closer to 1 or \(-1\) indicates a stronger correlation. A positive value indicates a positive correlation, while a negative value indicates a negative correlation. Note that the correlation analysis is conducted for interpretability and exploratory understanding, rather than for feature selection. In particular, we retain all physically meaningful variables to support the subsequent variable-wise attention visualization of the iTransformer and to avoid excluding variables that may contribute through nonlinear interactions.
The calculation formula for the Pearson correlation coefficient is as follows:
where \(x_i\) and \(y_i\) denote the i-th observations of the two variables, \(\bar{x}\) and \(\bar{y}\) are their respective means, and n is the total number of samples.
The correlation among features is visualized in the form of a heatmap, as shown in Fig. 5. Furthermore, the Pearson correlation coefficients between the main meteorological features and PV output power are listed in Table 2.
Heatmap of Pearson correlation coefficients among main features.
As shown in Fig. 5 and Table 2, PV output power (MW) has the strongest correlation with total solar irradiance (W/m\(^{2}\)), with a coefficient as high as 0.95. It also shows strong positive correlations with direct normal irradiance (W/m\(^{2}\)) and global horizontal irradiance (W/m\(^{2}\)), with coefficients of 0.89 and 0.64, respectively. This indicates that irradiance is the dominant factor affecting PV output power.
Air temperature (\(^{\circ}\)C) has a correlation coefficient of 0.26 with output power, indicating a weak positive correlation. Relative humidity (%) shows a negative correlation with output power, with a coefficient of \(-0.35\). Atmospheric pressure (hPa) exhibits a very low correlation with PV output power, suggesting a limited linear association. Overall, irradiance-related features are the primary factors influencing PV output power, while temperature, humidity, and pressure provide complementary meteorological information.
Final input features. In the forecasting experiments, the model inputs include total solar irradiance, direct normal irradiance, global horizontal irradiance, air temperature, atmospheric pressure, relative humidity, and historical PV power, while the prediction target is the PV plant output power at the next 15-minute step.
This subsection describes the compared models and the unified hyperparameter tuning protocol used to ensure fair and reproducible evaluation.
Compared models. We evaluate multiple forecasting backbones and their KAN/MKAN-augmented variants for 15-minute single-step PV power forecasting. KAN and MKAN are adopted from prior work; we implement their integrations with different backbones to form the compared variants. Specifically, we consider LSTM/GRU/BiLSTM/Transformer/xLSTM/iTransformer and their corresponding KAN- and MKAN-augmented versions (i.e., KAN-LSTM and MKAN-LSTM; KAN-GRU and MKAN-GRU; KAN-BiLSTM and MKAN-BiLSTM; KAN-Transformer and MKAN-Transformer; KAN-xLSTM and MKAN-xLSTM; KAN-iTransformer and MKAN-iTransformer). All models are trained and evaluated under the same input–output setting.
Chronological split. To avoid look-ahead bias in time-series forecasting, we split the data in chronological order into a training set (80%), a validation set (10%), and a test set (10%). Specifically, the earliest 80% of samples are used for training, the subsequent 10% for validation, and the latest 10% for testing. The same temporal rule is applied within each seasonal subset.
Grid search protocol. Hyperparameters are tuned on the validation set using a grid search with the following candidate values: learning rate in \(\{1\times 10^{-2}, 5\times 10^{-3}, 1\times 10^{-3}, 5\times 10^{-4}\}\), hidden dimension in \(\{32, 64, 128\}\), number of skip connections in \(\{1, 2, 3\}\), number of attention heads in \(\{2, 4, 8\}\), and convolution kernel size in \(\{3, 5, 7\}\). This yields \(4\times 3\times 3\times 3\times 3 = 324\) configurations.
For model components where a hyperparameter is not applicable (e.g., attention heads for purely recurrent architectures), we keep that component at its default setting while tuning the remaining applicable parameters. The same tuning criterion (minimum validation loss) and training budget are applied to all models.
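The 324-configuration grid can be enumerated with a Cartesian product; the dictionary keys below are illustrative names, not identifiers from the authors' code.

```python
from itertools import product

LEARNING_RATES = [1e-2, 5e-3, 1e-3, 5e-4]
HIDDEN_DIMS    = [32, 64, 128]
SKIP_CONNS     = [1, 2, 3]
NUM_HEADS      = [2, 4, 8]
KERNEL_SIZES   = [3, 5, 7]

def grid_configs():
    """Yield all 4*3*3*3*3 = 324 candidate hyperparameter settings."""
    for lr, h, s, a, k in product(LEARNING_RATES, HIDDEN_DIMS,
                                  SKIP_CONNS, NUM_HEADS, KERNEL_SIZES):
        yield {"lr": lr, "hidden": h, "skips": s, "heads": a, "kernel": k}
```

For a purely recurrent backbone, the `heads` entry would simply be ignored (kept at its default), as described above.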
Training and selection. Each configuration is trained for up to 100 epochs with early stopping based on the validation loss (patience = 10), and the checkpoint with the best validation loss is selected. The best hyperparameter setting is chosen according to the validation loss. Using the selected hyperparameters, we retrain the model on the union of the training and validation sets and report the final performance on the held-out test set. All experiments are conducted with a fixed random seed (seed = 42) to reduce randomness.
All models are implemented in PyTorch and trained using the same pipeline to ensure a fair comparison. The input features are standardized using statistics computed on the training split only, and the same transformation is applied to the validation and test splits. The PV power target is kept in its original scale (i.e., no target normalization is applied).
We optimize all models using the Adam optimizer and minimize the mean squared error (MSE) on the training set. The initial learning rate and other hyperparameters are selected via the validation-based grid search described above. We use mini-batch training with a batch size of 64. To improve training stability, gradient clipping is applied with a maximum norm of 1.0. Early stopping is performed based on the validation loss with a patience of 10 epochs, and the checkpoint with the lowest validation loss is selected.
After hyperparameter selection, each model is retrained on the combined training and validation sets using the selected configuration, and the final performance is reported on the held-out test set using MSE, RMSE, MAE, and \(R^{2}\).
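The early-stopping rule (patience = 10, keep the checkpoint with the lowest validation loss) can be sketched as a small helper; this is an illustrative utility, not the authors' exact implementation.

```python
class EarlyStopper:
    """Track validation loss across epochs; signal a stop after `patience`
    consecutive epochs without improvement, remembering the best epoch
    (whose checkpoint would be restored)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop, `step` is called once per epoch; when it returns True, training halts and the model weights saved at `best_epoch` are used for evaluation.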
To comprehensively evaluate the prediction performance of the proposed MKAN-iTransformer and baseline models, four commonly used regression metrics are adopted: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (\(R^{2}\)). The definitions are as follows:
Mean Squared Error (MSE):
Root Mean Squared Error (RMSE):
Mean Absolute Error (MAE):
Coefficient of determination (\(R^{2}\)):
where \(\bar{y}\) is the mean of the true values.
A lower value of MSE, RMSE, and MAE indicates better model performance, while a higher \(R^{2}\) value (closer to 1) implies a better fit between predictions and actual values.
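All four metrics can be computed directly from their definitions, e.g.:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R^2 as defined above; R^2 is one minus the
    ratio of residual to total sum of squares about the mean."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse,
            "RMSE": np.sqrt(mse),
            "MAE": np.mean(np.abs(err)),
            "R2": 1.0 - ss_res / ss_tot}
```

A perfect forecast gives MSE = RMSE = MAE = 0 and \(R^{2}=1\); predicting the mean of the true values gives \(R^{2}=0\).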
In this section, we present and analyze the experimental results of the proposed MKAN-iTransformer model and various baseline methods on photovoltaic power forecasting tasks. The experiments are conducted under different seasonal conditions, and the performance of all models is evaluated using the metrics introduced previously.
To systematically evaluate the impact of seasonal variations on model performance, we adopted the conventional monthly division method to classify the dataset into four seasons: spring (March–May), summer (June–August), autumn (September–November), and winter (December–February). This classification enables a more detailed analysis of the predictive capabilities of MKAN-iTransformer and baseline models under different seasonal conditions.
Typical daily PV power curves for each month in 2019 and 2020.
As shown in Fig. 6, the typical daily power curves for each month in 2019 and 2020 exhibit significant seasonal variations. The power output is higher in spring and summer due to abundant sunlight, while it is relatively lower in autumn and winter. These seasonal differences provide a solid foundation for the subsequent model performance analysis based on seasonal classification.
To validate the effectiveness of MKAN-iTransformer, we conducted a detailed analysis of model prediction results and error distributions across different seasonal conditions. This section presents a comparative evaluation of MKAN-iTransformer and baseline models, highlighting both the accuracy and robustness of the proposed approach.
Spring: single-day prediction curves and prediction error distribution.
Summer: single-day prediction curves and prediction error distribution.
Autumn: single-day prediction curves and prediction error distribution.
Winter: single-day prediction curves and prediction error distribution.
Figures 7, 8, 9 and 10 present typical-day forecasting results for spring, summer, autumn, and winter, respectively. For each season, the upper subfigure compares the predicted PV output power with the ground-truth measurements (black curve) over the daytime period (6:00–20:00 at 15-minute intervals), while the lower subfigure summarizes the corresponding prediction error distribution of each model. This season-wise “curve fitting + error distribution” layout allows an intuitive assessment of both temporal tracking ability (shape, peak timing, and ramping behavior) and statistical error characteristics (bias, dispersion, and tail behavior).
In the spring case (Fig. 7), most models can capture the overall diurnal pattern, but noticeable deviations appear around rapid ramping segments and local peaks. The proposed MKAN-iTransformer shows a closer alignment with the ground truth during the main rising stage and peak region, and its error histogram is more concentrated around zero, suggesting reduced dispersion and fewer large-magnitude errors.
For summer (Fig. 8), PV output typically exhibits a smoother and higher plateau under stronger irradiance conditions, making the dominant daily trend easier to learn. Accordingly, multiple models achieve relatively good tracking performance. Nevertheless, differences remain in reproducing sharp changes (e.g., abrupt drops and recoveries), where MKAN-iTransformer tends to maintain smaller deviations. The error distribution in summer is comparatively narrower for several models, indicating that the forecasting task is less challenging than in transitional or winter conditions.
In autumn (Fig. 9), the ground-truth curve shows more frequent fluctuations and irregular ramps, likely due to increased variability in meteorological conditions. Some baseline models display either lagged responses or oversmoothing, leading to larger deviations during abrupt changes. MKAN-iTransformer provides more stable tracking across multiple fluctuation segments, and its error distribution shows reduced spread relative to many baselines, implying improved generalization to more volatile patterns.
Winter results (Fig. 10) are the most challenging, as reflected by larger mismatches in several baselines and a visibly broader error spread in the histogram. The seasonal difficulty may be attributed to lower sun angles, shorter effective generation windows, and more frequent rapid variations (e.g., due to clouds and atmospheric conditions), which amplify both bias and variance in predictions. In contrast, MKAN-iTransformer remains closely aligned with the ground truth for most time intervals, and the error distribution remains comparatively concentrated, indicating stronger robustness under adverse seasonal conditions.
Overall, across all four seasons, MKAN-iTransformer consistently achieves closer curve fitting and more compact error distributions, demonstrating improved accuracy and robustness. These observations are consistent with the quantitative seasonal metrics reported in Table 3, where MKAN-iTransformer achieves competitive or best performance on MSE/RMSE/MAE and high (R^2) in multiple seasons.
Using MKAN-iTransformer as the main benchmark, we compare it with representative baselines (LSTM, GRU, BiLSTM, Transformer, xLSTM, and iTransformer) as well as KAN/MKAN-augmented variants on seasonal datasets. The quantitative results in Table 3 indicate that MKAN-iTransformer achieves the most consistent and competitive performance across seasons. In particular, it attains the best overall results in spring, autumn, and winter (covering MSE, RMSE, MAE, and (R^2)), while in summer it delivers the lowest MSE/RMSE and remains highly competitive in (R^2), although the best MAE is achieved by KAN-GRU.
In spring, MKAN-iTransformer achieves the best performance across all four metrics, with MSE = 2.892, RMSE = 1.701, MAE = 0.864, and (R^2) = 0.947. Compared with LSTM, it reduces MSE/RMSE/MAE by 23.5%, 12.5%, and 20.5%, respectively, and improves (R^2) by 1.7%. Relative to GRU, MKAN-iTransformer reduces MSE by 13.7%, RMSE by 7.1%, and MAE by 13.4%, while increasing (R^2) by 0.9%. Against Transformer, the reductions are 7.4% (MSE), 3.7% (RMSE), and 11.5% (MAE), with a 0.4% gain in (R^2). These improvements demonstrate that MKAN-iTransformer better captures springtime ramping and peak behaviors, yielding both lower average error and improved goodness-of-fit.
Summer exhibits different characteristics: MKAN-iTransformer achieves the lowest MSE (3.962) and RMSE (1.991) among all compared models, while the best MAE is obtained by KAN-GRU (0.951), and the highest (R^2) is achieved by xLSTM (0.924). Compared with LSTM, MKAN-iTransformer decreases MSE and RMSE by 3.3% and 1.6%, and slightly increases (R^2) (0.921 to 0.923). Compared with iTransformer, it yields a clear reduction in MSE (9.5%) and RMSE (4.8%) and improves (R^2) from 0.915 to 0.923. Although its MAE is not the best in summer, the advantage in MSE/RMSE suggests MKAN-iTransformer is particularly effective at suppressing larger deviations (which are weighted more heavily by MSE), while some models (e.g., KAN-GRU) achieve smaller absolute errors on average.
In autumn, MKAN-iTransformer again provides the best results across all metrics (MSE = 2.884, RMSE = 1.698, MAE = 0.774, (R^2) = 0.962). Compared with LSTM, it reduces MSE/RMSE/MAE by 24.9%, 13.4%, and 27.4%, respectively, and improves (R^2) by 1.4%. Relative to GRU, it reduces MSE by 9.4%, RMSE by 4.8%, and MAE by 11.6%, with (R^2) increasing from 0.958 to 0.962. Compared with iTransformer, MKAN-iTransformer reduces MSE by 16.5%, RMSE by 8.7%, and MAE by 12.4%, while improving (R^2) from 0.954 to 0.962. These results indicate stronger adaptability to autumn’s higher variability and more frequent fluctuations.
Winter is the most challenging season for many baselines, yet MKAN-iTransformer achieves the strongest overall performance with MSE = 1.721, RMSE = 1.312, MAE = 0.443, and (R^2) = 0.969. Compared with LSTM, it reduces MSE/RMSE/MAE by 71.4%, 46.5%, and 66.6%, respectively, and improves (R^2) from 0.891 to 0.969 (an 8.8% relative increase). Against Transformer, the reductions are 81.6% (MSE), 57.1% (RMSE), and 71.9% (MAE), with (R^2) increasing from 0.831 to 0.969. Compared with iTransformer, MKAN-iTransformer remains slightly better in error-based metrics (e.g., MSE from 1.730 to 1.721 and RMSE from 1.315 to 1.312) while maintaining the same (R^2). Overall, these results demonstrate that MKAN-iTransformer offers strong robustness under winter conditions, substantially reducing both average errors and large-error events relative to most baselines.
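The relative improvements quoted in this subsection follow the usual (baseline − model)/baseline convention; as a quick numeric check, using the winter MSE pair reported above (iTransformer 1.730 vs. MKAN-iTransformer 1.721):

```python
def pct_reduction(baseline: float, model: float) -> float:
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - model) / baseline

# Winter MSE values reported in the text.
print(round(pct_reduction(1.730, 1.721), 2))  # 0.52
```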
This work focuses on evaluating the effectiveness of combining an iTransformer backbone with KAN-based modules. Note that KAN and MKAN are borrowed from prior work and are not proposed in this paper; our goal is to investigate whether integrating these modules with iTransformer yields complementary gains and improved robustness across seasonal distributions.
Model variants. We compare four variants: (1) iTransformer, the backbone baseline; (2) KAN-iTransformer, which integrates a KAN-based (ekan) module into iTransformer; (3) MKAN-iTransformer, which combines MKAN with iTransformer (our main combination model); and (4) MKAN, the standalone MKAN model without iTransformer, included to distinguish the effect of MKAN alone from the fusion setting. All variants are trained and evaluated under the same experimental protocol.
Metrics. We report MSE, RMSE, and MAE (lower is better) as well as (R^2) (higher is better). To examine distribution shifts, results are presented for spring, summer, autumn, and winter.
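These four metrics follow their standard definitions; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def point_forecast_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R^2 for point forecasts."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, mae, r2

mse, rmse, mae, r2 = point_forecast_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```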
Results and discussion. As shown in Table 4, MKAN-iTransformer delivers the most consistent improvements across seasons. In Spring, it achieves the best results on all metrics, indicating clear complementarity between MKAN and iTransformer. In Autumn, MKAN-iTransformer again obtains the best overall performance, slightly outperforming KAN-iTransformer, suggesting that the multi-scale design provides additional benefit beyond directly integrating KAN.
In Summer, MKAN-iTransformer yields the lowest MSE/RMSE and the highest (R^2), while iTransformer attains the lowest MAE. This indicates a trade-off between reducing larger errors (more reflected by squared-error metrics) and minimizing average absolute deviation; nevertheless, the improved RMSE and (R^2) suggest a better overall fit for MKAN-iTransformer.
In Winter, iTransformer and MKAN-iTransformer are nearly identical, implying that the iTransformer backbone already captures the dominant winter dynamics and that MKAN integration does not introduce degradation. By contrast, KAN-iTransformer shows a pronounced performance drop in winter (MSE=7.082, (R^2)=0.872), indicating that this integration may be more sensitive to seasonal distribution shifts. Overall, these results support that MKAN-iTransformer is a robust and effective combination, whereas the gains from KAN-iTransformer are less stable across seasons.
To explain the seasonal performance differences observed in the previous sections, we conduct an interpretability analysis of MKAN from three complementary perspectives. First, we decompose the PV power signal into hierarchical temporal components to isolate fast ramps, intermediate variations, and slow diurnal trends, and validate the separation in both time and frequency domains. Second, we inspect the learned KAN edge functions to understand how MKAN adapts its nonlinear transformations across seasons. Third, we visualize the inverted attention mechanism over features to quantify seasonal changes in feature importance, attention dispersion, and cross-feature interaction pathways.
Together, these analyses form a consistent evidence chain from signal dynamics (multi-scale decomposition), to nonlinear representation (KAN activations), and finally to decision routing (feature-wise attention), clarifying why the model behaves differently under distinct seasonal atmospheric regimes.
To capture PV dynamics from fast cloud-induced ramps to slow diurnal trends, the MKAN module decomposes the 15-min PV power series into three additive temporal components using hierarchical moving-average (MA) operators and residual (difference) bands. This formulation yields a physically consistent separation of high-, mid-, and low-frequency behaviors while preserving approximate additivity.
Let \(P(t)\) denote the normalized PV power at 15-min resolution and let \(\mathrm{MA}_m(\cdot)\) be an m-step moving average (centered window for analysis/visualization), so that \(m = 3\) and \(m = 12\) correspond to the 45-min and 180-min scales. We define:

\(P_{\text{low}}(t) = \mathrm{MA}_{12}(P(t)), \quad P_{\text{mid}}(t) = \mathrm{MA}_{3}(P(t)) - \mathrm{MA}_{12}(P(t)), \quad P_{\text{high}}(t) = P(t) - \mathrm{MA}_{3}(P(t)).\)

Thus,

\(P(t) = P_{\text{low}}(t) + P_{\text{mid}}(t) + P_{\text{high}}(t) + \epsilon(t),\)

where \(\epsilon(t)\) mainly captures boundary effects and minor mismatch.
Figure 11 illustrates the multi-scale decomposition of PV power on a representative summer day. The decomposition separates the observed signal into three time-scale components, which helps interpret variability sources and motivates using scale-aware features in forecasting.
High-frequency (45 min and below): rapid ramps and short-term fluctuations dominated by transient clouds and local turbulence, critical for short-horizon forecasting.
Medium-frequency (90–180 min): intra-day variability related to evolving weather regimes and smooth changes in solar geometry.
Low-frequency (180 min trend): slowly varying baseline reflecting the dominant diurnal envelope and seasonal irradiance level.
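A minimal sketch of this hierarchical MA-residual decomposition, assuming the 45-min and 180-min scales correspond to 3-step and 12-step centered moving averages at 15-min resolution (our reading of the stated band edges; helper names are illustrative):

```python
import numpy as np

def centered_ma(x: np.ndarray, m: int) -> np.ndarray:
    """Centered m-step moving average; edge padding handles boundary effects."""
    pad = m // 2
    xp = np.pad(x, (pad, m - 1 - pad), mode="edge")
    return np.convolve(xp, np.ones(m) / m, mode="valid")

def decompose(p: np.ndarray):
    """Split a 15-min PV series into low/mid/high temporal bands."""
    low = centered_ma(p, 12)         # 180-min slow trend
    mid = centered_ma(p, 3) - low    # 45-180 min intra-day band
    high = p - centered_ma(p, 3)     # <= 45-min fast residual
    return low, mid, high

# Synthetic daytime curve: diurnal envelope plus fast fluctuations.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 57)        # 6:00-20:00 at 15-min steps
p = np.sin(np.pi * t) + 0.05 * rng.standard_normal(57)
low, mid, high = decompose(p)
# Additivity holds exactly with this construction: p = low + mid + high.
assert np.allclose(low + mid + high, p)
```

Because the mid and high bands are defined as residuals of the same moving averages, the three components sum back to the original signal apart from window boundary effects.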
Multi-scale temporal decomposition of PV power on a representative summer day. From top to bottom, the panels show the original signal and its high-, medium-, and low-frequency components. The low-frequency term captures the smooth diurnal envelope, the medium-frequency term reflects intra-day regime changes, and the high-frequency term highlights fast fluctuations.
To validate that the proposed multi-scale decomposition indeed separates variability across time scales, we conduct a frequency-domain check using Welch’s power spectral density (PSD). Figure 12 reports the PSD characteristics of the decomposed components: the high-frequency residual \(P_{\text{high}}\), the medium-frequency component \(P_{\text{mid}}\), and the low-frequency trend \(P_{\text{low}}\) for a representative summer day.
We partition the frequency axis into three bands (in cycles/hour) to summarize spectral energy:
Low-frequency band: (f < 0.1) (dominant diurnal/slow envelope and baseline variations).
Mid-frequency band: \(0.1 \le f \le 0.5\) (intra-day variability and regime transitions).
High-frequency band: (f > 0.5) (fast ramps and short-term fluctuations).
For each component, the band energy percentages are computed by integrating its PSD over the corresponding frequency band and normalizing by the component’s total spectral energy:

\(E_b(x) = 100\% \times \dfrac{\int_{f \in b} S_x(f)\,\mathrm{d}f}{\int_{0}^{f_s/2} S_x(f)\,\mathrm{d}f},\)

where \(S_x(f)\) denotes the Welch PSD estimate of component \(x\) and \(b \in \{\text{low}, \text{mid}, \text{high}\}\) is one of the three bands above.
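The band-energy computation can be sketched with SciPy’s Welch estimator; the sampling frequency of 4 samples/hour reflects the 15-min resolution, and the synthetic signal below is illustrative:

```python
import numpy as np
from scipy.signal import welch

def band_energy_pct(x: np.ndarray, fs: float = 4.0) -> dict:
    """Percent of Welch-PSD energy per band (frequencies in cycles/hour)."""
    f, psd = welch(x, fs=fs, nperseg=min(256, len(x)))
    bands = {"low": f < 0.1, "mid": (f >= 0.1) & (f <= 0.5), "high": f > 0.5}
    return {name: 100.0 * psd[mask].sum() / psd.sum()
            for name, mask in bands.items()}

# A slow diurnal-like sinusoid should place most energy in the low band.
t = np.arange(0.0, 14.0, 0.25)             # hours, 15-min sampling (fs = 4/h)
slow = np.sin(2 * np.pi * t / 14.0)        # ~0.07 cycles/hour
shares = band_energy_pct(slow)
```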
As shown in Fig. 12, \(P_{\text{high}}\) allocates a larger portion of energy to higher frequencies, while \(P_{\text{low}}\) concentrates energy in the low-frequency region consistent with the diurnal envelope. The medium-scale component \(P_{\text{mid}}\) mainly captures intermediate-band energy, supporting the intended multi-scale separation.
Frequency-domain validation (summer). The left column shows the Welch PSD for \(P_{\text{high}}\), \(P_{\text{mid}}\), and \(P_{\text{low}}\). The top-right panel compares PSD curves across scales, and the bottom-right panel summarizes the energy distribution over the predefined low/mid/high frequency bands.
Overall, this hierarchical MA residual decomposition provides interpretable temporal bands and supports MKAN’s multi-branch design, reducing interference between fast ramps and slow trends. The seasonal consistency of this separation is further confirmed in Table 5.
KAN replaces fixed activation functions (e.g., ReLU, GELU) with learnable univariate edge functions, making nonlinear transformations explicit and interpretable. We analyze the learned activation patterns and relate their shapes to PV forecasting behavior across seasonal regimes.
For an input feature \(x_i\) and output node \(y_j\), KAN learns an edge function \(\phi_{i,j}(\cdot)\) using cubic B-splines:

\(\phi_{i,j}(x) = \sum_{k=1}^{K} c_{i,j,k}\, B_k(x),\)

where \(B_k(x)\) are spline basis functions, \(c_{i,j,k}\) are learnable coefficients, and \(K\) denotes the number of spline control points. A KAN layer aggregates edge functions as:

\(y_j = \sum_{i} \phi_{i,j}(x_i).\)
This formulation allows each connection to learn a data-driven nonlinear mapping tailored to a specific input-output relation.
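A single edge function of this form can be sketched with the Cox–de Boor recursion; the knot grid and coefficients below are illustrative stand-ins, not trained values:

```python
import numpy as np

def bspline_basis(x: np.ndarray, knots: np.ndarray, k: int = 3) -> np.ndarray:
    """Cox-de Boor recursion: all order-k B-spline basis values at points x."""
    x = np.asarray(x, float)
    # Degree 0: indicator function of each knot interval.
    B = np.array([(knots[i] <= x) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float)
    for d in range(1, k + 1):
        nb = len(knots) - d - 1
        Bn = np.zeros((nb, x.size))
        for i in range(nb):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                Bn[i] += (x - knots[i]) / left * B[i]
            if right > 0:
                Bn[i] += (knots[i + d + 1] - x) / right * B[i + 1]
        B = Bn
    return B  # shape: (len(knots) - k - 1, len(x))

# One KAN edge phi(x) = sum_k c_k B_k(x) on a uniform knot grid.
knots = np.linspace(-1.5, 1.5, 12)
c = np.random.default_rng(0).normal(size=len(knots) - 4)  # cubic: 8 bases
x = np.linspace(-1.0, 1.0, 5)
phi = c @ bspline_basis(x, knots)    # learned nonlinear edge response at x
```

During training the coefficients \(c_{i,j,k}\) are the learnable parameters, which is what lets each edge adopt a data-driven nonlinear shape.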
Figure 13 illustrates representative learned KAN activations and their differences from standard fixed activations. In PV forecasting, asymmetric nonlinear responses are useful: suppressing low-power noise (e.g., dawn/dusk or heavy haze) while preserving sensitivity during normal operating conditions.
Comprehensive analysis of KAN activation functions. The figure compares fixed activations with representative learned KAN activations and highlights how learnable nonlinearities adapt to different data regimes.
Figure 14 shows season-specific learned activations, indicating that KAN adapts its nonlinearity to seasonal PV dynamics.
Seasonal adaptation of learned KAN activation functions. Each panel shows a representative learned activation from seasonal data (colored) compared with a fixed baseline (gray). Shaded regions indicate deviation, highlighting season-specific nonlinear adaptation.
We quantify seasonal differences between the learned activations using three shape metrics computed over a fixed input range.
Table 6 indicates stronger nonlinear adaptation in more challenging regimes, supporting KAN interpretability: learned activation shapes reflect seasonal PV generation characteristics.
MKAN adopts an iTransformer-style inverted attention mechanism operating over the feature dimension, enabling dynamic feature-to-feature interaction modeling. We visualize seasonal feature importance, attention distributions, and cross-feature attention pathways to interpret how meteorological variables contribute under different atmospheric conditions.
Figure 15 presents normalized feature importance by season. Table 7 reports the corresponding scores (normalized to the maximum within each season), revealing clear seasonal reweighting between irradiance-driven and atmosphere-driven predictors.
Seasonal comparison of feature importance scores. Bars show normalized importance of each meteorological feature within a season.
Across seasons, DNI dominates in spring and winter, while GHI becomes most important in summer, reflecting stronger scattering/cloud effects. Autumn shifts toward atmospheric pressure and historical power, suggesting increased reliance on synoptic conditions and temporal persistence during transitional weather.
Winter vs. summer shift: Compared with summer, winter assigns substantially higher importance to RH (+0.5742) and DNI (+0.4962), and also increases reliance on historical power (+0.3096). In contrast, GHI becomes less dominant in winter (–0.1831), consistent with reduced diffuse-driven regimes and stronger sensitivity to beam irradiance availability.
Figure 16 shows attention weight distributions across features and seasons. Wider distributions indicate more frequent reallocation of attention, typically associated with more volatile atmospheric conditions.
Seasonal attention weight distributions across features. F1: Total solar irradiance, F2: Direct normal irradiance, F3: Global horizontal irradiance, F4: Air temperature, F5: Atmospheric pressure, F6: Relative humidity, F7: Power.
We summarize attention dispersion using the entropy of the mean attention weights:

\(H = -\sum_{i} \bar{w}_i \log \bar{w}_i,\)

where \(\bar{w}_i\) is the mean attention weight of feature \(i\). Table 8 reports the attention entropy and the seasonal prediction performance (RMSE in MW). Higher entropy indicates more distributed attention (i.e., no single dominant feature), reflecting more frequent reallocation of attention across variables under volatile atmospheric conditions.
Figure 17 visualizes seasonal cross-feature attention matrices. To highlight dominant interaction pathways, Table 9 lists the top-3 attention pairs (query \(\rightarrow\) key) per season.
Seasonal cross-feature attention matrices. Rows are query features, columns are key features. F1: Total solar irradiance, F2: Direct normal irradiance, F3: Global horizontal irradiance, F4: Air temperature, F5: Atmospheric pressure, F6: Relative humidity, F7: Power.
These pathways are physically plausible: summer emphasizes humidity–irradiance coupling (cloud formation and scattering), while winter concentrates multiple queries onto DNI, indicating that beam irradiance penetration becomes a key bottleneck signal under haze/fog conditions.
We further quantify the structure of the attention matrices using two summary statistics: diagonal dominance \(D\), the share of total attention mass concentrated on the diagonal (each feature attending to itself), and interaction diversity \(I\), which measures how evenly the remaining attention is spread across cross-feature pairs.
Table 10 confirms a seasonal shift between distributed attention (higher I, lower D) and focused attention (higher D, lower I), consistent with changes in atmospheric conditions and feature reliability.
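Under one natural reading of these two statistics (both definitions below are our assumptions: diagonal dominance as the diagonal share of attention mass, interaction diversity as the entropy of the off-diagonal weights), a sketch:

```python
import numpy as np

def diagonal_dominance(A) -> float:
    """Share of total attention mass on the diagonal (assumed definition)."""
    A = np.asarray(A, float)
    return float(np.trace(A) / A.sum())

def interaction_diversity(A) -> float:
    """Entropy of normalized off-diagonal attention (assumed definition)."""
    A = np.asarray(A, float).copy()
    np.fill_diagonal(A, 0.0)
    w = A.flatten()
    w = w[w > 0]
    w = w / w.sum()
    return float(-(w * np.log(w)).sum())

# A focused matrix (high D, low I) vs. a distributed one (low D, high I).
focused = np.full((7, 7), 0.001)
np.fill_diagonal(focused, 0.12)
focused[1, 0] = 0.05                 # one dominant cross-feature pathway
distributed = np.full((7, 7), 1.0 / 49)
```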
This study has several limitations that should be acknowledged when interpreting the results.
Single-site evaluation. All experiments are conducted on data from a single PV plant. While the seasonal split provides a meaningful within-site distribution-shift test, the cross-site generalization of MKAN-iTransformer (e.g., different climates, terrains, PV technologies, and sensor configurations) is not verified here.
Daytime-only forecasting protocol. Nighttime samples are excluded (06:00–20:00) because PV generation is near-zero and the series becomes sparse and less informative for learning daytime dynamics. This choice improves training stability and focuses the evaluation on operationally relevant generation periods, but it limits applicability to round-the-clock settings. In particular, behavior during dawn/dusk transitions and full-day forecasting is not evaluated.
Dataset size and coverage. The dataset covers two years and yields a moderate number of samples after filtering and seasonal partitioning. Although sufficient for 15-minute single-step forecasting, larger multi-year and multi-site datasets may expose additional failure modes, especially rare extreme-weather ramps.
Lack of uncertainty quantification. This work reports point forecasting metrics (MSE/RMSE/MAE and (R^2)) only. For grid operation and risk-aware scheduling, probabilistic forecasts (e.g., prediction intervals or quantiles) and calibration analyses are often required. Uncertainty quantification is not addressed in this paper.
These limitations motivate future work on cross-site evaluation, round-the-clock and multi-horizon forecasting protocols, and probabilistic forecasting with calibrated uncertainty estimates.
This paper studies robust and interpretable PV power forecasting under seasonal regime shifts and proposes MKAN-iTransformer, a cascaded hybrid framework that combines MKAN-based multi-scale temporal representation learning with iTransformer-style variable-wise attention for cross-variable dependency modeling. The model is evaluated under a unified protocol for 15-minute single-step forecasting with consistent preprocessing, hyperparameter tuning, and chronological splits within each seasonal subset.
Seasonal accuracy and robustness. Season-wise benchmarking (Table 3) shows that MKAN-iTransformer achieves consistent and competitive performance across all four seasons. It delivers the best overall results in spring, autumn, and winter across MSE/RMSE/MAE and (R^2), and remains highly competitive in summer with the lowest squared-error metrics. The typical-day prediction curves and error histograms further support these findings by showing closer tracking during ramps and peaks and more concentrated error distributions, indicating fewer large-deviation events under seasonal variability.
Component contribution validated by ablation. The ablation study (Table 4) isolates the effects of MKAN and iTransformer and confirms that their combination is beneficial. Comparing iTransformer, MKAN, and MKAN-iTransformer demonstrates that neither multi-scale temporal modeling nor variable-wise dependency modeling alone fully explains the observed improvements; rather, the gains arise from their complementarity. In addition, the inclusion of KAN-iTransformer reveals that not all KAN-style integrations are equally stable: KAN-iTransformer exhibits a pronounced degradation in winter, suggesting sensitivity to seasonal distribution shifts, whereas MKAN-iTransformer remains robust.
Interpretability evidence. Beyond performance, we provide a coherent interpretability analysis from three perspectives: (i) a multi-scale temporal decomposition aligned with MKAN branches and validated in the frequency domain, clarifying how fast ramps, intermediate variations, and slow diurnal trends are separated; (ii) inspection and quantification of learned KAN univariate edge/activation functions, showing season-dependent nonlinear adaptations; and (iii) feature-wise attention visualization, demonstrating seasonal reweighting of meteorological drivers and physically plausible cross-feature interaction pathways.
Implications. Overall, MKAN-iTransformer offers an effective balance among accuracy, seasonal robustness, and model transparency for short-horizon PV forecasting. The results indicate that coupling scale-aware temporal feature extraction with explicit inter-variable modeling is a practical strategy to mitigate seasonal degradation commonly observed in baseline architectures.
Future directions. Future work will extend the evaluation to multi-site datasets and round-the-clock settings, generalize the framework to multi-horizon forecasting, and incorporate uncertainty quantification to produce calibrated prediction intervals suitable for risk-aware operational decision-making.
The dataset used in this study is sourced from the State Grid Corporation of China New Energy Power Generation Forecasting Competition. Comprehensive experiments were conducted on this authoritative dataset to verify the performance of the proposed model under different seasonal conditions. The dataset is publicly available and can be accessed at the following link: https://www.nature.com/articles/s41597-022-01696-6#citeas.
Intergovernmental Panel on Climate Change. Climate Change: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. In Masson-Delmotte et al. (eds), Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA (2021). https://doi.org/10.1017/9781009157896
Ahmad, E. et al. The influence of grid connectivity, electricity pricing, policy-driven power incentives, and carbon emissions on renewable energy adoption: exploring key factors. Renew. Energy 232, 121108 (2024).
Ullah, S. & Lin, B. Green energy dynamics: analyzing the environmental impacts of renewable, hydro, and nuclear energy consumption in Pakistan. Renew. Energy 232, 121025 (2024).
Perone, G. The relationship between renewable energy production and CO2 emissions in 27 OECD countries: a panel cointegration and Granger non-causality approach. J. Clean. Prod. 434, 139655 (2024).
Zhao, C., Wang, J., Dong, K. & Wang, K. Is renewable energy technology innovation an excellent strategy for reducing climate risk? The case of China. Renew. Energy 223, 120042 (2024).
Dechezleprêtre, A. et al. A comprehensive overview of the renewable energy industrial ecosystem. In Documents de Travail de l’OCDE sur la Science, la Technologie et l’industrie (2024).
Renewable energy statistics 2023, International Renewable Energy Agency, Abu Dhabi (2023). https://www.irena.org/-/media/Files/IRENA/Agency/Publication/2023/Jul/IRENA_Renewable_energy_statistics_2023.pdf
Agoua, X. G., Girard, R. & Kariniotakis, G. Short-term spatio-temporal forecasting of photovoltaic power production. IEEE Trans. Sustain. Energy 9, 538–546. https://doi.org/10.1109/TSTE.2017.2747765 (2018).
Antonanzas, J. et al. Review of photovoltaic power forecasting. Sol. Energy 136, 78–111. https://doi.org/10.1016/j.solener.2016.06.069 (2016).
Phan, Q.-T., Wu, Y.-K. & Phan, Q.-D. TSMixer: an innovative model for advanced bias correction of NWP solar irradiance and one-day-ahead power forecasting. In 2025 IEEE/IAS 61st Industrial and Commercial Power Systems Technical Conference (I&CPS) 1–6 (2025). https://doi.org/10.1109/ICPS64254.2025.11030368.
Phan, Q.-T., Wu, Y.-K., Phan, Q.-D. & Tan, W.-S. A novel dual-focused temporal-spatial model for day-ahead pv power forecasting. In 2025 IEEE Industry Applications Society Annual Meeting (IAS) 1–5 (2025). https://doi.org/10.1109/IAS62731.2025.11061657.
Diagne, M., David, M., Lauret, P., Boland, J. & Schmutz, N. Review of solar irradiance forecasting methods and a proposition for small-scale insular grids. Renew. Sustain. Energy Rev. 27, 65–76. https://doi.org/10.1016/j.rser.2013.06.042 (2013).
Zheng, X., Bai, F., Zhuang, Z., Chen, Z. & Jin, T. A new demand response management strategy considering renewable energy prediction and filtering technology. Renew. Energy 211, 656–668 (2023).
Huang, S., Zhao, Z., Li, C. & Bai, L. TimeKAN: KAN-based frequency decomposition learning architecture for long-term time series forecasting. arXiv preprint arXiv:2502.06910 (2025).
Liu, Y. et al. iTransformer: inverted transformers are effective for time series forecasting. arXiv preprint arXiv:2310.06625 (2023).
Zamo, M., Mestre, O., Arbogast, P. & Pannekoucke, O. A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production, part i: Deterministic forecast of hourly production. Sol. Energy 105, 792–803. https://doi.org/10.1016/j.solener.2013.12.006 (2014).
Markovics, D. & Mayer, M. Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renew. Sustain. Energy Rev. 161, 112364. https://doi.org/10.1016/j.rser.2022.112364 (2022).
Abdellatif, A. et al. Forecasting photovoltaic power generation with a stacking ensemble model. Sustainability 14, 11083. https://doi.org/10.3390/su141811083 (2022).
Wang, F. et al. A satellite image data based ultra-short-term solar pv power forecasting method considering cloud information from neighboring plant. Energy 238, 121946. https://doi.org/10.1016/j.energy.2021.121946 (2022).
Zhou, H. et al. Short-term photovoltaic power forecasting based on long short term memory neural network and attention mechanism. IEEE Access 7, 78063–78074. https://doi.org/10.1109/ACCESS.2019.2923006 (2019).
Mellit, A., Pavan, A. M. & Lughi, V. Deep learning neural networks for short-term photovoltaic power forecasting. Renew. Energy 172, 276–288. https://doi.org/10.1016/j.renene.2021.02.166 (2021).
Wang, Y., Liao, W. & Chang, Y. Gated recurrent unit network-based short-term photovoltaic forecasting. Energies 11, 2163. https://doi.org/10.3390/en11082163 (2018).
Hu, Z., Gao, Y., Ji, S., Mae, M. & Imaizumi, T. Improved multistep ahead photovoltaic power prediction model based on lstm and self-attention with weather forecast data. Appl. Energy 359, 122709. https://doi.org/10.1016/j.apenergy.2024.122709 (2024).
Li, Q. et al. A multi-step ahead photovoltaic power forecasting model based on timegan, soft dtw-based k-medoids clustering, and a cnn-gru hybrid neural network. Energy Rep. 8, 10346–10362. https://doi.org/10.1016/j.egyr.2022.08.180 (2022).
Fu, H., Zhang, J. & Xie, S. A novel improved variational mode decomposition-temporal convolutional network-gated recurrent unit with multi-head attention mechanism for enhanced photovoltaic power forecasting. Electronics 13, 1837. https://doi.org/10.3390/electronics13101837 (2024).
Agga, A., Abbou, A., Labbadi, M., Houm, Y. E. & Ou Ali, I. H. Cnn-lstm: an efficient hybrid deep learning architecture for predicting short-term photovoltaic power production. Electric Power Syst. Res. 208, 107908. https://doi.org/10.1016/j.epsr.2022.107908 (2022).
Limouni, T., Yaagoubi, R., Bouziane, K., Guissi, K. & Baali, E. H. Accurate one step and multistep forecasting of very short-term pv power using lstm-tcn model. Renew. Energy 205, 1010–1024. https://doi.org/10.1016/j.renene.2023.01.118 (2023).
Alharkan, H., Habib, S. & Islam, M. Solar power prediction using dual stream cnn-lstm architecture. Sensors 23, 945. https://doi.org/10.3390/s23020945 (2023).
Zhou, H. et al. Informer: beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35 11106–11115 (2021). https://doi.org/10.1609/aaai.v35i12.17325.
Wu, H., Xu, J., Wang, J. & Long, M. Autoformer: decomposition transformers with auto-correlation for long-term series forecasting. In Advances in Neural Information Processing Systems, vol. 34 (eds. Ranzato, M. et al.) 22419–22430 (Curran Associates, Inc., 2021).
Zhou, T. et al. FEDformer: frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the 39th International Conference on Machine Learning , vol. 162 of Proceedings of Machine Learning Research (eds. Chaudhuri, K. et al.) 27268–27286 (PMLR, 2022).
Moon, J. A multi-step-ahead photovoltaic power forecasting approach using one-dimensional convolutional neural networks and transformer. Electronics 13, 2007. https://doi.org/10.3390/electronics13112007 (2024).
Xu, S., Zhang, R., Ma, H., Ekanayake, C. & Cui, Y. On vision transformer for ultra-short-term forecasting of photovoltaic generation using sky images. Sol. Energy 267, 112203. https://doi.org/10.1016/j.solener.2023.112203 (2024).
Zhai, C. et al. Photovoltaic power forecasting based on vmd-ssa-transformer: multidimensional analysis of dataset length, weather mutation and forecast accuracy. Energy 324, 135971. https://doi.org/10.1016/j.energy.2025.135971 (2025).
Deng, R. et al. A high-precision photovoltaic power forecasting model leveraging low-fidelity data through decoupled informer with multi-moment guidance. Renew. Energy 250, 123391. https://doi.org/10.1016/j.renene.2025.123391 (2025).
Hu, K. et al. Short-term photovoltaic power generation prediction based on copula function and cnn-cosattentiontransformer. Sustainability 16, 5940. https://doi.org/10.3390/su16145940 (2024).
Li, Y. et al. Interpretable deep learning framework for hourly solar radiation forecasting based on decomposing multi-scale variations. Appl. Energy 377, 124409 (2025).
Somvanshi, S., Javed, S. A., Islam, M. M., Pandit, D. & Das, S. A survey on Kolmogorov–Arnold network. ACM Comput. Surv. 58, 1–35. https://doi.org/10.1145/3743128 (2025).
Dutta, A. et al. The first two months of Kolmogorov–Arnold networks (KANs): a survey of the state-of-the-art. Arch. Comput. Methods Eng. https://doi.org/10.1007/s11831-025-10328-2 (2025).
Toscano, J. D., Wang, L.-L. & Karniadakis, G. E. KKANs: Kurkova–Kolmogorov–Arnold networks and their learning dynamics. Neural Netw. 191, 107831 (2025).
Vaca-Rubio, C. J., Blanco, L., Pereira, R. & Caus, M. Kolmogorov–Arnold networks (KANs) for time series analysis. In 2024 IEEE Globecom Workshops (GC Wkshps) 1–6 (IEEE, 2024). https://doi.org/10.1109/gcwkshp64532.2024.11100692.
Xu, K., Chen, L. & Wang, S. Kolmogorov–Arnold networks for time series: bridging predictive power and interpretability (2024). arXiv:2406.02496.
Koenig, B. C., Kim, S. & Deng, S. KAN-ODEs: Kolmogorov–Arnold network ordinary differential equations for learning dynamical systems and hidden physics. Comput. Methods Appl. Mech. Eng. 432, 117397. https://doi.org/10.1016/j.cma.2024.117397 (2024).
This research was funded by the National Natural Science Foundation of China, grant number 51967004.
College of Electrical Engineering, Guizhou University, Guiyang, China
Linjie Liu, Min Liu, Zhuangchou Han & HaiQiang Zhao
North Alabama International College of Engineering and Technology, Guizhou University, Guiyang, China
Min Liu
Guizhou Provincial Key Laboratory of Power System Intelligent Technologies, Guiyang, China
Min Liu
L.L. (first author) conceived the research idea, developed the MKAN-iTransformer model, implemented all experiments, and wrote the initial draft of the manuscript. M.L. (corresponding author) supervised the entire research process, provided key guidance on model design and result analysis, and substantially revised the manuscript. Z.H. and H.Z. contributed to data preprocessing, experimental support, and manuscript review. All authors have read and approved the final version of the manuscript.
Correspondence to Min Liu.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Liu, L., Liu, M., Han, Z. et al. Interpretable ultra-short-term photovoltaic power forecasting with multi-scale temporal modeling and variable-wise attention. Sci Rep 16, 10336 (2026). https://doi.org/10.1038/s41598-026-39797-6
Scientific Reports (Sci Rep)
ISSN 2045-2322 (online)
© 2026 Springer Nature Limited