Scientific Reports volume 15, Article number: 25800 (2025)
Power quality problems such as dynamic load variations, harmonic distortion, and voltage sags become significant when renewable energy sources, such as solar photovoltaic (PV) systems, are integrated into modern distribution networks. These issues are often mitigated by Unified Power Quality Conditioners (UPQCs), which compensate both series (voltage) and shunt (current) disturbances. However, traditional control schemes, such as PQ theory-based PI controllers, often fail to sustain good performance under rapidly changing solar and grid conditions. The proposed work combines Deep Reinforcement Learning (DRL) with a PI controller for DC-link voltage regulation and voltage sag correction in a solar PV-integrated UPQC system. The DRL model adapts automatically to voltage and current changes in real time by learning the most effective compensation strategies, guided by a reward function that prioritises low total harmonic distortion (THD), quick voltage recovery, and low power losses. The solar PV system is modelled under variable temperature and irradiance conditions and connected to the grid through the UPQC to supply both linear and non-linear loads. Compared with the conventional PQ theory-based technique, the proposed DRL-based control framework improves DC-link voltage regulation, voltage sag mitigation, and clean power delivery. For a grid-integrated PV-UPQC system, the proposed DRL-PI controller greatly reduces power quality problems, attaining a voltage THD of 1.01% and a current THD of 1.63%, compared with 3.13% and 10.64% under PQ-PI control. The DC-link settling time is reduced from 0.95 s (PQ-PI) to 0.25 s. These results demonstrate dynamic and harmonic performance superior to the traditional PQ-PI controller.
Grid-connected photovoltaic (PV) systems are increasingly popular in contemporary power networks because of their affordability and sustainability. However, their intermittent nature, driven by abrupt changes in temperature and irradiance, frequently results in unstable voltage and poor power quality (PQ)1. Among the most significant issues directly affecting the efficiency and dependability of grid-integrated PV systems are DC-link voltage variations and voltage sag during load transients. The Unified Power Quality Conditioner (UPQC) has emerged as a comprehensive solution to these PQ problems2. By combining series and shunt voltage source converters (VSCs), the UPQC can concurrently mitigate voltage sags/swells and current harmonics, improving the voltage profile and preserving DC-link stability. Because of their offline tuning and fixed control parameters, traditional control techniques such as PI, fuzzy logic, and ANFIS have demonstrated low adaptability in highly dynamic grid situations3. Grid-connected PV systems are extremely sensitive to environmental changes, especially variations in temperature and solar irradiation. These fluctuations commonly cause voltage sags and DC-link voltage instability, endangering the continuity and quality of the power supply4. In such situations it is essential to guarantee transient stability, that is, the system's capacity to remain in synchronism and recover swiftly from disruptions such as load switching or grid faults. The UPQC offers a dual compensation mechanism through its series and shunt VSCs, making it highly effective in improving system stability and power quality. However, its performance depends strongly on the effectiveness of the control strategy.
Historically, DC-link voltage regulation and harmonic compensation have been handled by proportional-integral (PI) controllers. In the proposed system, the PI gains are tuned using step-response analysis and transient performance metrics such as settling time, rise time, and percentage overshoot5. The constants are adjusted by iterative simulation to reduce steady-state error and guarantee a quick dynamic response under changing load and grid conditions. To overcome the drawbacks of static PI tuning, the proposed system incorporates a Deep Reinforcement Learning (DRL)-based control technique alongside the PI controller6. The DRL agent continually modifies the control action in real time, improving transient response and resilience, while the PI loop guarantees reliable steady-state performance.
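As a concrete sketch of this step-response tuning, the example below simulates a discrete PI loop around a first-order stand-in for the DC-link dynamics and measures the 2% settling time. The plant time constant, gains, and time step are illustrative assumptions, not the system's actual parameters.

```python
# Sketch: iterative PI tuning on a simplified first-order DC-link model.
# The plant (dv/dt = (-v + u) / tau), the gains, and the 2% settling band
# are illustrative assumptions used only to show the tuning procedure.

def simulate_pi(kp, ki, v_ref=800.0, tau=0.05, dt=1e-3, t_end=2.0):
    """Run a PI loop around the toy plant; return the voltage trace."""
    v, integ, trace = 0.0, 0.0, []
    for _ in range(int(t_end / dt)):
        err = v_ref - v
        integ += err * dt
        u = kp * err + ki * integ          # PI control action
        v += (-v + u) * dt / tau           # first-order plant update
        trace.append(v)
    return trace

def settling_time(trace, v_ref=800.0, band=0.02, dt=1e-3):
    """Time after which the response stays within +/- 2% of v_ref."""
    lo, hi = v_ref * (1 - band), v_ref * (1 + band)
    for i in range(len(trace) - 1, -1, -1):
        if not (lo <= trace[i] <= hi):
            return (i + 1) * dt
    return 0.0

trace = simulate_pi(kp=2.0, ki=10.0)
print(f"settling time ~ {settling_time(trace):.3f} s")
```

Repeating this simulation over a grid of (kp, ki) pairs and keeping the gains with the smallest settling time and overshoot is one simple way to realise the iterative tuning described above.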
Recent developments in Deep Reinforcement Learning (DRL) provide real-time learning and adaptive control capabilities of great benefit to complex power electronic systems. Combining DRL with traditional PI control creates a hybrid controller that makes both DC-link voltage regulation and voltage sag compensation in PV-UPQC systems robust and flexible. This research presents a DRL-PI-based control framework and demonstrates its superiority over conventional techniques through performance evaluation in terms of THD, settling time, and dynamic responsiveness under various operating conditions.
Voltage sag compensation and DC-link voltage control have been the subject of numerous studies employing a variety of intelligent strategies7. Despite being easy to implement, conventional PI controllers are difficult to tune and perform poorly in dynamic environments8. Although sophisticated methods such as Sliding Mode Controllers (SMC)9 and Fuzzy Logic Controllers (FLC) have demonstrated better dynamic responses, their tuning difficulties and design complexity limit real-time deployment in large-scale systems. Model Predictive Control (MPC) has also been proposed for optimal current control in UPQC systems10, although it requires precise models and substantial processing power. Recent developments in artificial intelligence and machine learning have stimulated the use of intelligent control techniques for power electronic systems. Deep Neural Networks (DNNs) have been used to help regulate voltage and approximate nonlinear system behaviours11.
However, their lack of real-time decision-making and dependence on large labelled datasets limit their ability to adjust to unseen disturbances. Reinforcement Learning (RL), a model-free learning paradigm, has become popular because it can learn optimal control policies through interaction with the environment12. RL is used in power systems for frequency regulation, inverter control, and voltage control13. Deep Reinforcement Learning (DRL), which combines neural networks with reinforcement learning, has been shown to handle difficult control tasks and high-dimensional inputs successfully14. DRL techniques, including Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG), have been used in a number of studies for intelligent fault detection13, adaptive control in inverters, and microgrid energy management15. However, the direct application of RL or DRL in UPQC systems has not been thoroughly investigated, mainly because of reward-shaping complexity, safety concerns, and slower convergence in real-time settings. To close this gap, hybrid control techniques that combine reinforcement learning and supervised learning (such as DNNs) are becoming more popular. These models exploit RL's adaptive policy learning and DNNs' fast approximation capabilities. Grid-forming converters16, energy storage systems17, and smart inverter voltage control18 have all used such hybrid techniques. Some recent research on UPQC for PV-integrated grids has proposed neural network-based adaptive control for DC-link voltage stabilisation19, but effectiveness under sag conditions and transient grid faults remains limited. Additionally, current DRL-based techniques prioritise energy scheduling over DC-link voltage regulation and real-time sag correction20. Table 1 summarises recent literature on grid-integrated PV-UPQC systems.
The gap in the literature that motivates this study is the design of a deep reinforcement learning-based controller specifically for UPQC systems, aimed at dynamically regulating the DC-link voltage and providing quick, efficient voltage sag compensation under variable solar irradiance and grid disturbance scenarios.
In solar PV-integrated distribution systems, the intermittent nature of solar irradiation causes regular variations in the power supplied to the grid, making it difficult to maintain the intended DC-link voltage. Additionally, grid disruptions such as fault-induced voltage sags or abrupt load variations can severely degrade power quality, affecting sensitive equipment and overall system performance. Traditional control techniques, such as PI and Fuzzy Logic Controllers (FLC), are typically tuned for specific operating conditions, and their effectiveness declines in the presence of nonlinearities and uncertainties. To overcome these constraints, this research presents a hybrid learning-based control strategy that blends the adaptive learning qualities of reinforcement learning with the state-estimation and generalisation powers of deep neural networks. In addition to adapting to time-varying input patterns from PV generation and load, the proposed Deep Reinforcement Learning (DRL) controller optimises control actions to reduce voltage deviation and enhance transient response.
Combining solar photovoltaic (SPV) systems with a Unified Power Quality Conditioner (UPQC) yields a hybrid solution that injects renewable power while improving power quality and guaranteeing dependable grid operation. In addition to adding renewable energy to the grid, the grid-integrated SPV-UPQC system depicted in Fig. 1 compensates for reactive power, sag/swell, harmonics, and distortions in voltage and current. One of the most important challenges in such a system is the correct control of the DC-link voltage, which guarantees efficient compensation by the series and shunt Voltage Source Converters (VSCs). Traditional proportional-integral (PI) controllers frequently struggle under dynamic and unpredictable grid conditions. This is addressed by introducing a controller that uses Deep Reinforcement Learning (DRL) to intelligently modulate and stabilise the DC-link voltage.
The coordination of modulation indices for the shunt and series Voltage Source Converters (VSCs) plays a critical role in achieving decoupled voltage and current compensation. Specifically, the series VSC mitigates supply voltage disturbances such as sags and swells, while the shunt VSC injects compensating currents to address load current harmonics and reactive power. This coordinated control ensures a stable DC-link voltage and maintains power quality under dynamic grid and load conditions; maximum power point tracking (MPPT) is also integrated into the system. The SPV system is connected via a boost converter, whose output is regulated by a PI-based boost controller under the direction of a P&O MPPT controller to maximise the solar array's power production. The boost converter output feeds the DC-link capacitor, which acts as a shared energy buffer for the shunt and series active filters. At the Point of Common Coupling (PCC), the shunt VSC addresses current-related power quality problems such as reactive power compensation, harmonics, and imbalance. Meanwhile, the series VSC compensates for disturbances such as sags and swells and guarantees voltage regulation. DRL controllers adjust the DC-link voltage and improve the UPQC system's flexibility in the face of changing load and grid conditions. Table 2 lists the system specifications.
Mathematical design for the grid-connected SPV-UPQC.
The PV system under consideration is rated for 50 kW. To achieve the required power and voltage levels for grid integration, the PV subsystem is configured as a combination of series- and parallel-connected modules: 33 parallel strings, each of five PV modules connected in series. With each module rated at a maximum power point (MPP) voltage of 54.7 V and an MPP current of 5.58 A, this yields a string voltage of 273.5 V and a maximum output of around 305.2 W per module (about 1.53 kW per string). Under standard test conditions (STC), the array achieves a total peak output of about 50 kW. In addition to providing enough energy to sustain the Unified Power Quality Conditioner (UPQC) during grid disruptions, this arrangement guarantees compatibility with the DC-link voltage level (800 V). Clearly describing the electrical characteristics and series-parallel configuration makes the PV subsystem design transparent and repeatable for future studies and deployment. Its main components are a DC-DC boost converter and a PV generator. Power generated by the PV system is fed into the grid via the UPQC's DC-link, as depicted in Fig. 2. The design and rating of the PV modules is a crucial component of this configuration: the panels are connected in parallel to generate the requisite current (Ipv) and in series to obtain the appropriate voltage (Vpv). The PV system's output voltage fluctuates with external factors such as temperature and solar radiation. A DC-DC boost converter, controlled by the maximum power point tracking (MPPT) algorithm, is used to maximise the power extracted from the PV panels. Together with a grid-tied inverter, this converter forms the primary and secondary stages of the PV system within the PV-UPQC framework, enabling the injection of clean energy into the grid. The current produced by the PV module (Ipv) is specified as
$$I_{pv} = N_p\left[I_{sc} - I_{rs}\left(\exp\!\left(\frac{q\,V_{pv}}{N_s\,\eta\,k_b\,T}\right) - 1\right)\right]$$
where the numbers of parallel and series cells are denoted by Np and Ns, respectively, q is the electron charge, and T is the cell temperature. Isc, the short-circuit current, is provided as
$$I_{sc} = \frac{G}{G_p}\left[I_{scT} + k_i\,(T - T_r)\right]$$
In this case, G and Gp are the actual and STC solar irradiance levels, Tr is the reference temperature, ki is the short-circuit current temperature coefficient, and IscT is the short-circuit current under standard test conditions (STC). The reverse saturation current is modelled as
$$I_{rs} = I_{rsT}\left(\frac{T}{T_r}\right)^{3}\exp\!\left[\frac{q\,E_g}{\eta\,k_b}\left(\frac{1}{T_r} - \frac{1}{T}\right)\right]$$
where Eg is the bandgap energy (typically 1.1 eV), η is the diode ideality factor, kb is the Boltzmann constant, and IrsT is the reverse saturation current at STC. These formulas form the foundation for understanding the electrical behaviour of the PV modules in the PV-UPQC system, emphasising how their configuration and environmental factors affect performance.
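The single-diode relations above can be sketched as follows; the cell count, ideality factor, temperature coefficient, and saturation current are illustrative assumptions that only approximate the SPR-305 module quoted in the text.

```python
import math

# Sketch of the single-diode PV model described above. Module parameters
# roughly follow the SunPower SPR-305 (96 cells, Isc ~ 5.96 A); the
# ideality factor eta, coefficient ki, and Irs_T are illustrative.
Q = 1.602e-19       # electron charge (C)
KB = 1.381e-23      # Boltzmann constant (J/K)
EG_EV = 1.1         # bandgap energy in eV (typical for silicon)

def pv_current(v_pv, G, T, Ns=96, Np=1, Isc_T=5.96, Irs_T=1e-9,
               G_p=1000.0, T_r=298.15, ki=0.003, eta=1.3):
    """Module current from the single-diode equations (illustrative)."""
    # short-circuit current scaled by irradiance and temperature
    isc = (G / G_p) * (Isc_T + ki * (T - T_r))
    # reverse saturation current at cell temperature T
    irs = Irs_T * (T / T_r) ** 3 * math.exp(
        (Q * EG_EV / (eta * KB)) * (1.0 / T_r - 1.0 / T))
    # diode equation for the whole module (Ns cells in series)
    return Np * (isc - irs * (math.exp(Q * v_pv / (Ns * eta * KB * T)) - 1.0))

for v in (0.0, 54.7):
    print(f"I(V = {v} V, 1 kW/m^2, 25 C) = {pv_current(v, 1000.0, 298.15):.2f} A")
```

Sweeping G from 200 to 1000 W/m² with this function reproduces the qualitative I-V family plotted in Fig. 3: current scales almost linearly with irradiance while voltage changes comparatively little.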
PV system design configuration.
I-V and P-V characteristics for PV system.
Figure 3 shows the performance characteristics of the photovoltaic array (SunPower SPR-305E-WHT-D, configured as 33 parallel strings of 5 series modules) under various temperature and irradiance conditions. The I-V and P-V characteristics are plotted for irradiance levels ranging from 0.5 kW/m² to 1 kW/m². As expected, current and output power rise with irradiance, with the array reaching its rated output near 1 kW/m². A partial shading scenario was included in the simulation to assess the proposed system's resilience under real-world conditions. In this case, the power-voltage (P-V) characteristic showed several peaks because distinct portions of the PV array were exposed to different irradiance levels. Under such circumstances, the traditional Perturb and Observe (P&O) approach, which works with local perturbations, was found to converge towards a local maximum rather than the global maximum. This behaviour results from the algorithm's inability to distinguish between global and local peaks when shading is present. Although P&O works effectively under uniform irradiance, partial shading reduces its tracking precision, resulting in sub-optimal power extraction. These results point to the need for more sophisticated MPPT algorithms with global optimisation capabilities, which is identified as a direction for future system improvement.
Graphical representation of P&O algorithm.
The Perturb and Observe (P&O) algorithm used for Maximum Power Point Tracking (MPPT) in solar PV systems is illustrated in Fig. 4. The algorithm perturbs the PV array's operating voltage and tracks how the output power changes as a result. If the power increases after the perturbation, the algorithm proceeds in the same direction; if not, it reverses the direction of the perturbation. This iterative procedure drives the operating point towards the Maximum Power Point (MPP) on the P-V curve. As shown by the graphical flow, which illustrates the decision logic based on the signs of ΔP and ΔV, the method dynamically modifies the duty cycle of the DC-DC converter to extract maximum power.
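The decision logic described above can be sketched as follows, using a toy unimodal P-V curve as a stand-in for a real characteristic (with the peak placed at the 273.5 V string voltage quoted earlier).

```python
# Minimal sketch of the Perturb & Observe MPPT loop described above,
# run against a toy unimodal P-V curve. The parabola is an illustrative
# stand-in for a real PV characteristic; under partial shading (multiple
# peaks) this same loop can lock onto a local maximum, as noted in the text.

def pv_power(v):
    """Toy P-V curve with a single maximum at v = 273.5 V."""
    return max(0.0, 50e3 - 2.0 * (v - 273.5) ** 2)

def perturb_and_observe(v0=200.0, dv=1.0, steps=200):
    v, p_prev, step = v0, pv_power(v0), dv
    for _ in range(steps):
        v += step
        p = pv_power(v)
        if p < p_prev:      # power dropped: reverse the perturbation
            step = -step
        p_prev = p          # otherwise keep perturbing the same way
    return v

v_mpp = perturb_and_observe()
print(f"P&O settled near {v_mpp:.1f} V")
```

In steady state the operating point oscillates around the MPP with an amplitude set by the perturbation size dv, which is the well-known trade-off between tracking speed and ripple in P&O.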
Figure 5 shows the performance of the solar photovoltaic system operating under constant conditions (25 °C and 1000 W/m² irradiance). The PV voltage stabilises at 273.5 V and the current holds steady at 184 A, producing about 50.3 kW, in line with the system's nominal rating. In accordance with the expected boost operation, the boost converter's duty cycle settles at 0.65, giving a DC output voltage of about 800 V. The flat waveforms confirm that the system maintains a stable operating point free from dynamic changes, validating the model under optimal, fixed input conditions.
Output performance of PV system at constant irradiance and temperature condition.
Deep Reinforcement Learning (DRL) is an advanced machine-learning paradigm that combines the function-approximation power of deep learning (DL) with the decision-making skills of reinforcement learning (RL). An agent in DRL learns to make sequential decisions by interacting with an environment in which it receives observations (states), takes actions, and obtains feedback in the form of rewards. The agent's objective is to gradually learn the policy that maximises the expected cumulative reward. Because traditional RL techniques are ineffective in contexts with high-dimensional state or action spaces, DRL uses deep neural networks to approximate complex functions such as value functions, policies, or both. DRL algorithms such as Deep Q-Networks (DQN), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimisation (PPO), and Soft Actor-Critic (SAC) have been applied successfully in smart grid control, autonomous systems, robotics, and other fields. DRL is therefore a viable option for real-time control applications in power systems, such as energy management, power quality improvement, and voltage regulation: it can adjust to uncertainties in load demand, solar generation, and grid disturbances by continuously learning from the dynamic behaviour of systems such as grid-integrated PV-UPQC setups. Although DRL has been widely used in robotics and autonomous systems, its promise in power electronics is becoming more widely acknowledged, especially for complex, nonlinear, and time-varying grid-integrated systems.
DRL provides a number of benefits over traditional control methods in the context of PV-integrated Unified Power Quality Conditioner (UPQC) systems. Conventional controllers, such as PI or rule-based logic, frequently have fixed tuning parameters and limited flexibility in unpredictable or quickly changing situations, including variable solar irradiance, nonlinear loads, and voltage sags or swells. DRL-based control frameworks, on the other hand, can dynamically learn and improve control policies that maximise system performance across a variety of goals, including real-time power balance, harmonic mitigation, and voltage regulation.
In particular, DRL can be used to modify the voltage source converters' (VSCs') DC-link reference voltage or modulation indices in response to real-time changes in load or grid conditions, enhancing the UPQC system's resilience and responsiveness. Furthermore, because it generalises learned policies, DRL can function well even in situations not specifically experienced during training. Recent research has shown that DRL improves transient performance, reduces total harmonic distortion (THD), and increases resistance to disturbances in grid-connected converters, active power filters, and renewable energy integration. The proposed study exploits DRL's adaptive control capabilities within this application space to improve PV-UPQC system performance in dynamic and unpredictable grid situations.
Series Controller structure based on DRL.
As illustrated in Fig. 6, the DRL model uses the source voltage (VSabc) and load voltage (VLabc) as inputs to regulate the series compensator in the UPQC system. This configuration is intended to maximise the efficacy of the series compensator in reducing voltage-related problems in a power distribution system, such as sags, swells, and harmonics. The DRL system continuously observes the state space (S), which represents the current condition of the electrical environment. Here the three-phase source and load voltages make up the state space, expressed as
$$S_t = \left[\, v_{Sa},\; v_{Sb},\; v_{Sc},\; v_{La},\; v_{Lb},\; v_{Lc} \,\right]$$
Based on this observed state, the DRL agent chooses the optimal action from its action space (A); here, that involves modifying the compensator's injected voltage for each phase. These modifications are expressed as
$$A_t = \left[\, \Delta V_{inj,a},\; \Delta V_{inj,b},\; \Delta V_{inj,c} \,\right]$$
The reference injected voltages for the series converter are output by the DRL policy:
$$V_{se,i}^{*} = \pi_{\theta}(S_t), \qquad i \in \{a, b, c\}$$
A PWM generator receives these references and modulates the VSI (Voltage Source Inverter) for each phase. The reward penalises total harmonic distortion (THD) and promotes voltage regulation:
$$r_t = -\,\alpha\,\mathrm{THD}(v_L) \;-\; \beta \sum_{i \in \{a,b,c\}} \left| v_{L,i} - v_{L,i}^{*} \right|$$
where α and β are weighting factors.
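A reward of the kind described above, penalising THD and voltage deviation, can be sketched as follows. The DFT-based THD estimator, the weights alpha/beta, the 239.6 V phase-voltage reference (415 V line voltage divided by √3), and the use of a single RMS-error term are illustrative assumptions.

```python
import math

# Sketch of a THD-plus-regulation reward for the series compensator.
# thd() uses a naive DFT over one cycle of samples (fundamental = bin 1).

def thd(samples):
    """Total harmonic distortion of one sampled cycle."""
    n = len(samples)
    mags = []
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    fund = mags[0]                                   # fundamental magnitude
    harm = math.sqrt(sum(m * m for m in mags[1:]))   # all higher harmonics
    return harm / fund

def reward(v_load_cycle, v_rms_meas, v_rms_ref=239.6, alpha=10.0, beta=0.05):
    """Higher (less negative) is better: low THD and small RMS error."""
    return -(alpha * thd(v_load_cycle) + beta * abs(v_rms_meas - v_rms_ref))

n = 64
clean = [math.sin(2 * math.pi * i / n) for i in range(n)]
distorted = [s + 0.2 * math.sin(2 * math.pi * 5 * i / n)
             for i, s in enumerate(clean)]
print(reward(clean, 239.6), reward(distorted, 239.6))
```

A well-compensated cycle scores near zero, while the waveform carrying a 20% fifth harmonic is penalised, which is exactly the gradient the agent needs to favour clean, regulated load voltage.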
The value function $Q_{\varphi}(S_t, A_t)$ is used to evaluate the action and improve the policy. The policy parameters are updated by the policy gradient
$$\theta \leftarrow \theta + \alpha_{\theta}\, \nabla_{\theta}\, Q_{\varphi}\big(S_t, \pi_{\theta}(S_t)\big)$$
For the actor-critic algorithms (DDPG/SAC), the critic is trained by minimising $\big(Q_{\varphi}(S_t, A_t) - y_t\big)^2$ with the temporal-difference target
$$y_t = r_t + \gamma\, Q_{\varphi'}\big(S_{t+1}, \pi_{\theta'}(S_{t+1})\big)$$
where $\varphi'$ and $\theta'$ denote the target networks.
The output voltage injection is $V_{inj,i} = V_{se,i}^{*}$.
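As a toy illustration of the actor-critic machinery behind DDPG, the sketch below reduces the problem to a one-step (contextual-bandit) task with a linear-feature critic and a linear actor. The reward, features, and learning rates are illustrative; real DDPG additionally uses replay buffers and target networks.

```python
import random

# One-step toy of the deterministic-policy-gradient idea in DDPG:
# a linear-feature critic Q(s, a) is fit to observed rewards, then the
# actor a = w * s ascends dQ/da. With r = -(a - 0.5 s)^2, the optimal
# actor gain is w = 0.5. Everything here is illustrative.
random.seed(1)

def features(s, a):
    return [1.0, s, a, s * a, a * a, s * s]

c = [0.0] * 6                       # critic weights

# --- critic fitting on random exploratory actions ---
for _ in range(5000):
    s, a = random.uniform(-1, 1), random.uniform(-1, 1)
    r = -(a - 0.5 * s) ** 2
    phi = features(s, a)
    q = sum(ci * fi for ci, fi in zip(c, phi))
    for i in range(6):
        c[i] += 0.05 * (r - q) * phi[i]   # SGD on the squared TD error

# --- actor improvement: follow the critic's action gradient ---
w = 0.0
for _ in range(2000):
    s = random.uniform(-1, 1)
    a = w * s
    dq_da = c[2] + c[3] * s + 2 * c[4] * a   # dQ/da from the learned critic
    w += 0.05 * dq_da * s                    # deterministic policy gradient

print(f"learned actor gain w = {w:.3f} (optimal 0.5)")
```

The two phases mirror the critic and actor updates written above: the critic regresses Q towards the observed return, and the actor climbs the critic's gradient with respect to its own action.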
DRL-PI based shunt converter design.
The DC-link voltage has historically been controlled by PI controllers. PI controllers, however, are susceptible to parameter changes and system non-linearities. To overcome this constraint, we propose a controller based on Deep Reinforcement Learning (DRL), which provides resilient and adaptable control by continuously interacting with the environment to learn the best course of action. Let the shunt converter be interfaced with the grid via an inductive filter Lsh and a small resistance Rsh. The shunt converter injects a current ish into the system to maintain a sinusoidal source current and regulate the DC-link voltage Vdc towards its reference V*dc, as depicted in Fig. 7. The current dynamics are given by
$$L_{sh}\,\frac{di_{sh}}{dt} = v_{sh} - v_{pcc} - R_{sh}\, i_{sh}$$
where vsh is the shunt converter output voltage and vpcc is the voltage at the point of common coupling.
The voltage across the shunt inductor is
$$v_{L_{sh}} = L_{sh}\,\frac{di_{sh}}{dt}$$
The power exchanged at the DC link is
$$P_{dc} = V_{dc}\, i_{dc} = \frac{1}{2}\, C_{dc}\, \frac{d\left(V_{dc}^{2}\right)}{dt}$$
where Cdc is the DC-link capacitance.
A stable DC-link voltage is critical for the proper functioning of both converters. To determine the best control strategy for the shunt converter, the DRL controller employs a policy-based algorithm (such as DDPG or PPO), using a self-trained neural policy in place of the conventional PI control loop.
These reference voltages are passed to the PWM generator, which drives the shunt converter.
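A minimal sketch of the carrier-based PWM stage that turns such reference voltages into gate signals is shown below; the 5 kHz triangular carrier, 0.8 modulation depth, and 50 Hz fundamental are illustrative assumptions.

```python
import math

# Sketch of carrier-based sinusoidal PWM for a three-phase VSC: each phase
# reference is compared against a common triangular carrier, and the upper
# switch of a leg is on whenever the reference exceeds the carrier.

def triangle(t, f_c=5000.0):
    """Unit triangular carrier in [-1, 1]."""
    x = (t * f_c) % 1.0
    return 4.0 * x - 1.0 if x < 0.5 else 3.0 - 4.0 * x

def pwm_gates(t, m=0.8, f=50.0):
    """Upper-switch gate signals (True = on) for the three phases."""
    carrier = triangle(t)
    gates = []
    for k in range(3):
        ref = m * math.sin(2 * math.pi * f * t - 2 * math.pi * k / 3)
        gates.append(ref > carrier)
    return gates

# The average duty of phase A over one 50 Hz cycle should be ~0.5,
# since the instantaneous duty tracks (1 + m*sin)/2.
dt, n_samp = 1e-6, 20000            # one 20 ms cycle at 1 us resolution
on_time = sum(pwm_gates(i * dt)[0] for i in range(n_samp)) * dt
print(f"phase-A average duty over one cycle: {on_time / 0.02:.3f}")
```

The local average of each gate signal follows the sinusoidal reference, which is how the VSC synthesises the commanded compensating voltage or current from a fixed DC link.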
Control structure of Deep Reinforcement learning.
Figure 8 illustrates the control structure of the proposed Deep Reinforcement Learning (DRL) framework used for real-time control in a PV-integrated UPQC system. The agent at the center of the architecture is implemented using a deep neural network, which learns the optimal control policy by interacting with the environment comprising the power system, load demand, voltage levels, and real-time grid parameters.
The input to the agent includes current state variables such as PV voltage, current, and historical variations (ΔV, ΔI). These form the state space. The agent uses this state information to predict the best possible action, i.e., the duty cycle or modulation index, which forms the action space. The internal neural network of the agent contains multiple layers:
Hidden Layers: Three hidden layers with 64, 128, and 64 neurons are used.
Activation Function: Each hidden layer uses the ReLU (Rectified Linear Unit) activation function, which provides efficient gradient propagation and sparsity.
Output Layer: For the actor network, the output layer uses a tanh activation function to generate a normalized continuous action (e.g., duty cycle in the range [-1, 1]). For the critic network, a linear activation is used to output the Q-value (expected future reward). The weights of the neural network are initialized using the He initialization method, which is suitable for ReLU activations.
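The layer sizes, activations, and He initialisation described above can be sketched as a plain-NumPy forward pass; the 6-dimensional state layout and the random seed are illustrative assumptions.

```python
import numpy as np

# Sketch of the actor network described above: 64-128-64 hidden layers with
# ReLU, He-initialised weights, and a tanh output squashing the action into
# [-1, 1]. The 6-dimensional state is an assumed layout for illustration.
rng = np.random.default_rng(0)

def he_layer(n_in, n_out):
    """He (Kaiming) initialisation: weights ~ N(0, sqrt(2 / fan_in))."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

STATE_DIM, ACTION_DIM = 6, 1
sizes = [STATE_DIM, 64, 128, 64, ACTION_DIM]
layers = [he_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

def actor(state):
    x = np.asarray(state, dtype=float)
    for w, b in layers[:-1]:
        x = np.maximum(x @ w + b, 0.0)      # ReLU hidden layers
    w, b = layers[-1]
    return np.tanh(x @ w + b)               # continuous action in [-1, 1]

action = actor([0.8, 0.79, 0.81, 0.02, -0.01, 0.0])
print("action:", action)
```

The critic would reuse the same hidden stack but take the state-action pair as input and end in a linear unit, so its output (the Q-value) is unbounded.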
During training, the agent receives a reward based on the system's response, such as a reduction in THD, quick settling of the DC-link voltage, or sag compensation. This feedback is used by the learning algorithm, typically DDPG, to update the weights using backpropagation and the Adam optimizer. This continuous interaction and learning process enables the DRL agent to improve performance over time, ensuring robust control during highly dynamic grid conditions and transient disturbances.
Learning process for DRL-PI.
An input layer receives the state, hidden layers extract features and decide on the best control policy, and an output layer creates voltage references or control pulses for the voltage source inverters (VSIs). These actions, which make up the action space, are applied to the series or shunt converters through the PWM controllers. The system provides the agent with feedback in the form of a reward signal determined by the DC-link voltage deviation or the level of voltage sag mitigation. This reward drives updates to the learning algorithm, allowing the agent to gradually improve its control policy. Even under changing load or generation conditions, the closed-loop interaction between the agent and the power system guarantees adaptive and intelligent compensation. Figure 9 shows a hybrid control approach that combines a traditional proportional-integral (PI) controller with Deep Reinforcement Learning (DRL) to regulate the DC-link voltage and compensate for voltage sag in a PV-integrated Unified Power Quality Conditioner (UPQC) system. The DRL agent continuously monitors the environment, including grid voltage, load conditions, and PV output. Based on this state data, the DRL generates control actions to maximise system performance, while the PI controller maintains precise and steady voltage regulation. By combining the flexibility of DRL with the dependability of PI, this coordinated control technique achieves better dynamic response, lower total harmonic distortion, and increased voltage stability under a range of load and grid situations.
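A minimal sketch of this PI-plus-DRL coordination is given below, with a clipped stand-in policy in place of the trained actor; all gains, the error normalisation, and the correction limit are illustrative assumptions.

```python
# Sketch of the hybrid scheme described above: the PI loop supplies the
# baseline control effort and a (stand-in) learned policy adds a bounded
# correction. The saturating "policy" and all gains are illustrative.

class HybridDrlPi:
    def __init__(self, kp=2.0, ki=10.0, drl_limit=0.2, dt=1e-3):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integ = 0.0
        self.drl_limit = drl_limit          # cap on the learned correction

    def drl_correction(self, err, d_err):
        # Stand-in for the trained actor: a clipped linear policy.
        raw = 0.5 * err + 0.1 * d_err
        return max(-self.drl_limit, min(self.drl_limit, raw))

    def step(self, v_ref, v_meas, v_prev):
        err = (v_ref - v_meas) / v_ref      # normalised voltage error
        d_err = (v_prev - v_meas) / v_ref   # normalised error change
        self.integ += err * self.dt
        u_pi = self.kp * err + self.ki * self.integ
        return u_pi + self.drl_correction(err, d_err)

ctrl = HybridDrlPi()
u = ctrl.step(v_ref=800.0, v_meas=760.0, v_prev=770.0)
print(f"control effort: {u:.4f}")
```

Bounding the learned term means the controller degrades gracefully to plain PI behaviour if the policy misbehaves, which is one common way to address the safety concerns of deploying RL on live power hardware.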
Design ratings of series and shunt converter in UPQC:
$$V_{SE} = \frac{b\,V_r}{\sqrt{3}}, \qquad I_{SE} = \frac{P_{PV,MPP} - P_{min\,load}}{\sqrt{3}\,V_r}, \qquad P_{series} = 3\,V_{SE}\,I_{SE}$$
where ISE is the per-phase current rating of the series VSC, PPVMPP is the PV power at MPP (50 kW), Pmin load is the minimum load demand (minimum load considered 20 kVA at 0.8 p.f.), Vr is the rated line voltage of the system (415 V), Pseries is the active power rating of the series VSC, VSE is the per-phase rms voltage injected by the series VSC, and b is the maximum per-phase voltage sag (b = 0.5 p.u.).
The rating of the shunt VSC depends upon the PV-generated power at MPP and the reactive and harmonic power to be compensated at peak load. In addition, it must handle the power injected by the series VSC during voltage sag conditions, as the series VSC power flows through the shunt VSC.
$$S_{shunt} = \sqrt{P_{PV,MPP}^{\,2} + \left(Q_{load} + H_{harmonics}\right)^{2}}$$
where Qload is the maximum reactive power requirement of the load (for the maximum load of 35 kVA at 0.8 p.f.) and Hharmonics is the maximum harmonic power corresponding to the maximum load.
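As a back-of-envelope check, the sizing figures quoted above can be combined as follows; the specific sizing relations used here are common UPQC design assumptions, not formulas taken verbatim from this paper.

```python
import math

# Back-of-envelope check of the converter sizing figures quoted above.
# Assumed sizing relations for a UPQC: the series VSC injects up to b p.u.
# of phase voltage at the worst-case line current, and the shunt VSC carries
# the PV active power plus the load reactive power. Numbers follow the text
# (50 kW PV, 415 V line, b = 0.5, 20 kVA min / 35 kVA max load at 0.8 p.f.).
P_PV_MPP = 50e3
V_R = 415.0                        # rated line voltage (V)
B_SAG = 0.5                        # max per-phase sag (p.u.)
S_MIN, S_MAX, PF = 20e3, 35e3, 0.8

# Series VSC rating
v_se = B_SAG * V_R / math.sqrt(3)                        # injected rms V/phase
i_se = (P_PV_MPP - S_MIN * PF) / (math.sqrt(3) * V_R)    # worst-case line A
p_series = 3 * v_se * i_se

# Shunt VSC rating (harmonic power omitted in this rough check)
q_load = S_MAX * math.sqrt(1 - PF ** 2)                  # max load reactive power
s_shunt = math.hypot(P_PV_MPP, q_load)

print(f"V_SE ~ {v_se:.1f} V/phase, I_SE ~ {i_se:.1f} A, "
      f"P_series ~ {p_series / 1e3:.1f} kW")
print(f"Q_load ~ {q_load / 1e3:.1f} kvar, S_shunt ~ {s_shunt / 1e3:.1f} kVA")
```

These rough figures (about 120 V per phase of series injection and a shunt rating in the mid-50 kVA range) indicate the relative sizing of the two converters for the 50 kW system described.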
A complete simulation of a grid-connected PV-integrated Unified Power Quality Conditioner (UPQC) system was conducted in MATLAB/Simulink to assess the effectiveness of the proposed Deep Reinforcement Learning (DRL)-based PI control technique. The goal was to maintain DC-link voltage stability and compensate for voltage sags in the presence of both linear and non-linear loads. Under a variety of power quality disturbance scenarios, such as abrupt load fluctuations, voltage sags, and harmonic disturbances, the DRL-based controller's performance is contrasted with that of a traditional PQ theory-based proportional-integral (PI) controller. The simulation used a 50 kW solar PV system linked to a 415 V, 50 Hz grid under standard test conditions. Key performance metrics were analysed, including Total Harmonic Distortion (THD), grid voltage waveform quality, DC-link voltage stability, and voltage sag correction capability. The results demonstrate the efficacy of the proposed control method and are presented both graphically and numerically.
PV-UPQC behaviour under voltage sag and swell conditions using conventional PQ theory-PI.
PV-UPQC behaviour under voltage sag and swell conditions using proposed DRL-PI.
Figure 10 depicts the PV-UPQC system's dynamic behaviour under voltage swell and sag disturbances using a traditional PQ theory-based PI controller. The grid voltage swells between 0.2 and 0.4 s and sags between 0.5 and 0.7 s. Despite these disruptions, the load voltage is successfully controlled, demonstrating the UPQC's ability to preserve power quality. The injected voltage profiles confirm the series converter's proper compensation against both voltage sag and swell. Small oscillations and brief variations in the injected voltage suggest areas where more sophisticated control techniques can improve performance. Figure 11 illustrates how effectively the PV-integrated UPQC system performs under the same voltage swell (0.2 to 0.4 s) and sag (0.5 to 0.7 s) disturbances when using the proposed Deep Reinforcement Learning-tuned PI (DRL-PI) controller. Despite these notable disruptions, the load voltage is efficiently controlled, preserving a sinusoidal waveform and almost constant amplitude across all three phases. This demonstrates the DRL-PI controller's improved dynamic response and increased disturbance rejection.
%THD analysis for Voltage (a) without controller (b) with PQ theory–PI (c) with DRL-PI.
Figure 12 shows the Total Harmonic Distortion (THD) analysis of voltage in three scenarios: (a) no controller, (b) the PQ-PI controller, and (c) the DRL-PI controller. Without control, the system exhibits a high THD of 14.52%, indicating poor power quality. With the PQ-PI controller, THD is reduced to 3.13%, demonstrating enhanced performance. The DRL-PI controller achieves the best result, only 1.01% THD, exhibiting superior harmonic suppression and improved power quality through intelligent control.
Behaviour of currents in nonlinear load condition based on PQ theory PI.
Behaviour of currents in nonlinear load condition based on DRL- PI.
Figures 13 and 14 show the current behaviour under nonlinear load conditions with compensation carried out by the two control schemes. The PQ-PI controller successfully reduces distortion, producing improved but slightly imperfect sinusoidal source currents. With the shunt VSC controlled by DRL-PI, the source currents after compensation are even cleaner and better balanced. The more precisely tuned compensating currents in the DRL-PI case yield better harmonic suppression and improved dynamic performance. Overall, the DRL-PI controller outperforms the traditional PQ-PI controller in preserving power quality under nonlinear load conditions.
%THD analysis for current (a) without controller (b) with PQ theory–PI (c) with DRL-PI.
Figure 15 shows the Total Harmonic Distortion (THD) analysis of the source current under three control scenarios. In the uncontrolled case, the current spectrum shows a high THD of 23.87%, reflecting the considerable harmonic distortion introduced by nonlinear loads. With the traditional PQ theory-based PI controller, the THD is lowered to 10.64%, indicating moderate harmonic correction. The proposed DRL-PI hybrid controller reduces the THD to 1.63%, demonstrating excellent harmonic suppression and an almost sinusoidal current waveform. Compared with the conventional control technique, these results show how effectively the DRL-PI strategy improves power quality by drastically lowering the harmonic content.
DC-link voltage performance for grid connected PV-UPQC.
Figure 16 depicts the DC-link voltage regulation performance of the PV-UPQC system in three control scenarios: no controller, PQ theory + PI, and the proposed DRL-PI controller. Without control, the DC voltage fluctuates severely between 720 V and 860 V, indicating instability. With the PQ theory + PI controller, the voltage stabilises at about 800 V with less oscillation, though discernible variations remain. The proposed DRL-PI controller, in contrast, keeps the voltage very close to the 800 V reference with little variation and faster settling, exhibiting better voltage regulation and dynamic response.
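As a rough illustration of the PI regulation loop described above, the sketch below steps a simplified DC-link model from 720 V to the 800 V reference. The normalised first-order plant and the gains `kp` and `ki` are illustrative assumptions, not the plant model or tuned values used in the paper.

```python
# Minimal discrete-time sketch of PI regulation of a DC-link voltage toward
# an 800 V reference. Plant and gains are illustrative assumptions only.
def simulate_dc_link(v_ref=800.0, v0=720.0, kp=20.0, ki=100.0,
                     dt=1e-3, steps=1000):
    v, integ, trace = v0, 0.0, []
    for _ in range(steps):
        err = v_ref - v
        integ += err * dt           # accumulated integral of the voltage error
        u = kp * err + ki * integ   # PI control effort (units of V/s here)
        v += u * dt                 # forward-Euler step of dv/dt = u
        trace.append(v)
    return trace

trace = simulate_dc_link()          # settles near 800 V within the 1 s window
```

With these gains the closed loop has both poles at -10 rad/s, so the error decays with a 0.1 s time constant; a DRL agent tuning `kp` and `ki` online is one way to shorten the settling time, as the paper's results suggest.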
Performance of active and reactive power in PV-UPQC.
Figure 17 compares the active and reactive power responses of the grid-integrated PV-UPQC system under the two control strategies: PQ theory with PI control (PQ-PI) and Deep Reinforcement Learning with PI control (DRL-PI). The PQ-PI active power curve shows significant overshoot and oscillation before settling at about 10 kW, suggesting weaker damping and slower dynamic performance. The active power under DRL-PI (green), by contrast, settles rapidly to the target value with less overshoot and a quicker transient response, exhibiting superior control efficiency. PQ-PI also shows slight variations in reactive power, whereas DRL-PI consistently maintains near-zero reactive power, indicating superior reactive power compensation. Overall, the DRL-PI controller provides more reliable and effective power regulation than the traditional PQ-PI method.
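The PQ theory underlying the baseline controller computes instantaneous real and reactive power from Clarke-transformed voltages and currents. A minimal sketch using the standard power-invariant transform follows; it is a textbook formulation, not the authors' implementation, and the balanced unit-amplitude test signals are assumptions for demonstration.

```python
import numpy as np

def clarke(a, b, c):
    """Power-invariant Clarke transform (abc -> alpha-beta)."""
    alpha = np.sqrt(2.0 / 3.0) * (a - 0.5 * b - 0.5 * c)
    beta = np.sqrt(2.0 / 3.0) * (np.sqrt(3.0) / 2.0) * (b - c)
    return alpha, beta

def instantaneous_pq(va, vb, vc, ia, ib, ic):
    """p-q theory: instantaneous real power p and reactive power q."""
    v_al, v_be = clarke(va, vb, vc)
    i_al, i_be = clarke(ia, ib, ic)
    p = v_al * i_al + v_be * i_be
    q = v_be * i_al - v_al * i_be
    return p, q

# Balanced, in-phase unit voltages and currents: p is constant, q is zero.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
va = np.cos(theta)
vb = np.cos(theta - 2.0 * np.pi / 3.0)
vc = np.cos(theta + 2.0 * np.pi / 3.0)
p, q = instantaneous_pq(va, vb, vc, va, vb, vc)
```

For this balanced in-phase case `p` is a constant 1.5 (the power-invariant scaling of three unit phases) and `q` is identically zero, matching the near-zero reactive power the figure shows for good compensation.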
(a) Power factor comparison, (b) in phase relation between voltage and current.
Figure 18a compares the power factors obtained with the PQ-PI and DRL-PI controllers. The PQ-PI controller takes more than two seconds to reach a near-unity power factor, whereas the DRL-PI controller does so in less than 0.3 s. This illustrates how quickly the DRL-PI controller aligns voltage and current, ensuring efficient energy transfer and low reactive power. Figure 18b shows the voltage and current waveforms under unity power factor conditions: both waveforms are in phase and fully sinusoidal, confirming reactive-power-free operation. Together, these plots demonstrate that the DRL-PI controller quickly and precisely reaches and maintains unity power factor, enhancing system stability and energy efficiency.
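The power factor plotted in Fig. 18a is the ratio of average power to the product of the RMS voltage and current. A small illustrative sketch with assumed waveforms (a 30° lagging current versus an in-phase current; amplitudes are arbitrary, not values from the paper):

```python
import numpy as np

def power_factor(v, i):
    """True power factor: average power over the product of RMS values."""
    p_avg = np.mean(v * i)
    return p_avg / (np.sqrt(np.mean(v ** 2)) * np.sqrt(np.mean(i ** 2)))

fs, f0 = 10000, 50
t = np.arange(0.0, 0.2, 1.0 / fs)                 # exactly ten 50 Hz cycles
v = 325.0 * np.sin(2 * np.pi * f0 * t)
i_lag = 10.0 * np.sin(2 * np.pi * f0 * t - np.pi / 6)  # 30 deg lag
i_uni = 10.0 * np.sin(2 * np.pi * f0 * t)              # in phase

pf_lag = power_factor(v, i_lag)   # cos(30 deg), about 0.866
pf_uni = power_factor(v, i_uni)   # unity power factor
```

For purely sinusoidal waveforms this reduces to the displacement factor cos(φ), which is why the in-phase waveforms of Fig. 18b correspond to unity power factor.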
Power balance in situations where PV electricity generation exceeds load.
Figure 19 shows how the shunt controller achieves the power balance between the PV array, the load, and the grid. The negative grid power indicates that the PV array is feeding the load and exporting its surplus to the grid, thereby supporting the grid and reducing its burden. Table 3 lists the %Total Harmonic Distortion (THD) values for the voltages and currents, and Fig. 20 presents the graphical comparison of the voltage and current THD for the conventional PQ theory-PI and DRL-PI controllers.
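The sign convention behind the negative grid power can be written as a one-line balance. Converter losses are neglected, and the 12 kW / 10 kW figures below are illustrative assumptions rather than measured values from the paper.

```python
def grid_power(p_pv, p_load):
    """Ideal shunt-converter power balance (losses neglected).

    The grid supplies whatever the PV array cannot; a negative result
    means surplus PV power is being exported to the grid.
    """
    return p_load - p_pv

surplus_case = grid_power(p_pv=12e3, p_load=10e3)   # negative: exporting 2 kW
deficit_case = grid_power(p_pv=6e3, p_load=10e3)    # positive: grid supplies 4 kW
```

This matches the figure: whenever PV generation exceeds the load demand, the grid power trace goes negative.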
Graphical representation of %THD analysis.
This work introduced a Deep Reinforcement Learning (DRL)-based PI controller for DC-link voltage regulation and voltage sag compensation in a solar PV-integrated Unified Power Quality Conditioner (UPQC) system. The Total Harmonic Distortion (THD) analysis shows that the proposed DRL-PI method substantially improves THD: the voltage THD fell from 14.52% (uncontrolled) and 3.13% (PQ-PI) to just 1.01%, while the current THD fell from 23.87% (uncontrolled) and 10.64% (PQ-PI) to 1.63%. This demonstrates that the DRL controller can effectively improve power quality and reduce harmonic pollution. Without control, the DC voltage fluctuates unpredictably between 720 V and 860 V. The PQ theory + PI controller stabilises the voltage closer to the 800 V reference, though visible fluctuations remain. The proposed DRL-PI controller, by contrast, maintains the DC-link voltage precisely around the 800 V reference with little fluctuation and a faster dynamic response, exhibiting greater flexibility and resilience under dynamic grid and load conditions. In summary, the DRL-PI controller not only guarantees better harmonic suppression and voltage regulation but also improves system stability, response time, and overall power quality, making it a highly intelligent and efficient choice for modern PV-integrated UPQC systems.
The data used to support the findings of this study are included in the article.
Grid side three-phase voltages
Load side three-phase voltages
Load side current components
DC link capacitor voltage
Reference voltage for DC link regulation
Solar photovoltaic array
Maximum power point tracking
Proportional-integral
Deep reinforcement learning
Pulse width modulation controller generating gate signals
Voltage source inverter for series compensation
Voltage source inverter for shunt compensation
Reactive power theory used for reference generation
Energy storage element between series and shunt converters
Switching pulses to converters (from PWM unit)
Output power from PV array at MPP
Real power supplied to grid
Voltage disturbance events (sag/swell)
Total harmonic distortion
Phase-locked loop for grid synchronization
Department of EEE, Vignan’s Foundation for Science Technology and Research, Guntur, India
Mangalapuri Sravani & Polamraju V. S. Sobhan
Sravani Mangalapuri developed the methodology, wrote the original draft, and performed the simulation and data analysis. V. S. Sobhan Polamraju reviewed and edited the manuscript.
Correspondence to Mangalapuri Sravani.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Sravani, M., Sobhan, P.V.S. Deep reinforcement learning-based controller for DC-link voltage regulation and voltage sag compensation in a solar PV-integrated UPQC system. Sci Rep 15, 25800 (2025). https://doi.org/10.1038/s41598-025-08729-1
Scientific Reports (Sci Rep)
ISSN 2045-2322 (online)