Satellite-based analysis uncovers uneven solar PV distribution across Japan and its consumption of forest and agricultural lands – Nature

Scientific Reports volume 15, Article number: 26671 (2025)
The recent development of solar photovoltaics (PV) has generated considerable interest in energy management, appropriate environmental impact assessment, and the seamless integration of PV technology into society. A critical first step in exploring these research opportunities is the creation of a comprehensive PV database that describes the locations and extents of existing PV installations. Automated solar PV detection in satellite remote sensing imagery, based on a machine learning approach, is particularly suitable for studying the characteristics of national-scale solar PV distribution and its impact on the environment. In our study, we first proposed an XGBoost-based solar PV detection method with post-processing procedures supported by a dedicated solar PV spectral index. This approach was applied to Sentinel-2 images acquired in 2022 to create a national solar PV database for Japan. The resulting solar PV map showed a high degree of accuracy, with an overall accuracy of 0.984. Our dataset revealed solar PVs covering a total area of 571 km² in Japan. The comparison of PV extents with the land cover map showed that the megawatt-scale solar PV facilities were predominantly located in forested areas, suggesting potential changes to existing forest ecosystems and the local environment at these facility locations. Conversely, smaller (sub-megawatt) PV systems showed a similar preference for both farmland and forest. PV expansion also contributed to forest fragmentation at forest edge areas. To further investigate these findings, we performed clustering analyses to identify high-concentration PV areas and analyzed the distribution of solar PVs alongside socio-economic and environmental factors using an explainable AI approach based on Shapley values. Through the study, we showed how the established PV dataset can be used to uncover spatial patterns and driving factors of PV deployment. 
Our results indicate that site selection is influenced by a multitude of variables—such as local environmental conditions, power demand, and installation costs—highlighting the need for well-informed strategies when deploying solar PV. Overall, this study demonstrates the efficacy of integrating machine learning models, spectral indices, and post-processing techniques with satellite remote sensing data to accurately map and analyze solar PV installations. Regular updates of these maps from freely available satellite datasets provide valuable insights for policymakers and stakeholders, enabling data-driven decisions regarding the placement, monitoring, and management of PV systems, and supporting a timely transition to a renewable-powered society.
As the world seeks to accelerate the transition toward renewable energy sources, solar photovoltaic (PV) power production has become a significant alternative: by 2027, its power capacity is predicted to surpass that of coal, natural gas, hydropower, wind, and bioenergy1. Many countries have implemented a feed-in-tariff (FIT) mechanism, which subsidises the installation of renewables by guaranteeing a fixed price for the electricity generated. Since around 2010, this has led to a dramatic increase in PV installation2, and an estimated 240 GW of new PV capacity was installed in 2022 alone2. The sudden surge of installed solar PV capacity has contributed to replacing fossil fuel-based power generation, but new challenges have also emerged. One is the unstable energy generation from renewable resources such as wind3 and solar4, which makes it challenging to keep the balance between energy demand and the supply from unstable PVs5,6,7. In addition, solar PV is a very land-intensive energy resource, as it may need a vast amount of land to hold enough panels to capture the energy from the sun. Utility-scale solar energy (USSE) facilities were found to be a driver of land change8. By 2050, solar PV is also predicted to occupy 0.5–5% of land9. To quantitatively evaluate these PV-related issues, a solar PV inventory, that is, a database containing the locations and extents of PVs, is necessary.
One option for building a solar PV database is manual data curation. Internet-based data search was used to establish a power-generation plant database in Japan10, but it proved very challenging because of the complex procedure of data unification10. Another option is to manually delineate PV extents on maps and satellite images11. Such manually curated PV information should be the most trustworthy source, but its coverage is limited by the labour-intensive procedure of finding and locating PVs distributed over a large area. OpenStreetMap (OSM) was also used for crowd-sourcing information about PV installations in the UK, where 260,000 PVs were found12. However, since OSM is a voluntarily managed dataset, the completeness of the information is open to question: one study found a complex and unequal pattern in the OSM building dataset13. Therefore, automating the PV database generation process may improve both the comprehensiveness and the accuracy of the data. The combination of remote-sensing imagery and machine-learning models has already been shown to facilitate the establishment of PV databases. Machine-learning models used for PV detection can generally be divided into two groups: the convolutional neural network (CNN) family and other models.
A national-scale PV map of the U.S. was produced by applying a CNN model to high-resolution satellite imagery14. CNN models have also been successfully applied to make national-scale solar PV maps of Vietnam for 2019 to 2022 from Sentinel-2 imagery15. The performance of CNN models in detecting PVs was demonstrated for PVs in Brazil, where the combination of an EfficientNet-b7 encoder and a UNet decoder was found to be the best-performing model architecture16. A global solar PV map for 2016 to 2018 was generated from Sentinel-2 and SPOT imagery by combining CNNs and recurrent neural networks (RNNs)17. A solar PV category has been included in the national land-cover map of Japan since version 21.03, which was released in March 2021. The land-cover classification algorithm focuses on the temporal dimension, featuring the time-series changes in satellite images such as Sentinel-2 and ALOS-2/PALSAR-2: a CNN model is applied to a stack of spectral features taken in different seasons, and it achieved a high accuracy in the solar PV category18. These CNN-based approaches, which focus on the spatial characteristics of the PVs, usually require many labelled images with a fixed shape, and Graphics Processing Units (GPUs) are needed to run the models17, which is also a challenge for time-efficient and cost-effective solar PV mapping. In addition, the reliance on commercial and long time-series data may hamper frequent updates over a large target area where PVs have been rapidly installed in recent years. The shape of PVs must also be recognizable in the remote sensing images if CNN models are used, which is not necessarily the case with medium-resolution images such as 10 m resolution Sentinel-2 images.
Non-CNN machine learning models, which usually focus on the spectral features of solar PVs, do not require explicit spatial information. For instance, the Random Forest algorithm19, which was developed in 2001, was employed to make a national-scale solar PV map of China using the Google Earth Engine (GEE) cloud computing environment20,21. These decision-tree-based models offer the advantage of an interpretable internal structure and reduce the burden of collecting training data, as they can be trained on point-based samples, unlike CNN-based segmentation models. In addition, if a solar PV has distinct spectral features, it can be detected regardless of its spatial scale. However, their applicability to PV detection has not been explored as fully as that of CNNs. It was reported that the Random Forest model’s pixel-wise classification results often contain significant salt-and-pepper noise20, necessitating an effective approach to reduce noise while preserving important features likely corresponding to actual solar farms. Furthermore, spectral indices characterizing PVs are not commonly used in PV detection studies, in contrast to those for traditional land-cover types such as water, built-up areas, and vegetation.
Once the PV database is established, it is possible to quantitatively evaluate the issues surrounding solar PV installations. Land-cover changes are the most apparent impacts following the introduction of solar farms. A satellite-derived land-cover change analysis revealed that barren lands were popular PV installation sites in Vietnam from 2019 to 2022, but an increasing trend in the use of forest areas was also observed15. Croplands and barren lands were chosen for PV installations in China, according to a satellite-based PV map covering the entire nation20. The same dataset was further used to assess the flood and sediment-related hazard risk to solar PVs by comparing the locations of the assets with existing hazard maps22. A manually collected PV database was used to assess current and future solar PV site selection in Japan and Korea11. It was also found that a comparable amount of natural and semi-natural habitats was lost due to the recent installation of medium solar facilities11. Hence, by combining a remote-sensing-derived PV map with an existing land-cover map, the impacts of PVs on the local environment can be evaluated comprehensively. Understanding solar PV distribution patterns, that is, how PVs are distributed in a region given its specific environmental and social conditions, is also a growing research topic11,23,24. Because remote-sensing-based PV maps are more comprehensive than manually curated PV databases, they allow the nation-wide PV distribution pattern to be better understood.
In this study, we first aim to develop a national-scale solar PV mapping scheme for Japan based on a machine learning model and satellite imagery. We use a non-CNN machine learning model supported by a dedicated solar PV spectral index, freely available medium-resolution satellite imagery, and post-processing operations to accurately detect PVs with low noise. We also discuss how comprehensive a PV map can be when using medium-resolution satellite data by comparing the output PV map with government FIT-registered PV statistics. The impact of solar PV on the environment was evaluated from two aspects: the land-cover changes due to PV installations and the forest fragmentation effects. These results were further studied together with socio-economic and environmental variables to explain how solar PVs are distributed in the country, using an explainable AI approach with Shapley-based methods.
Following hyperparameter tuning with the Optuna library, the optimal parameters for the XGBoost model, that is, those yielding the highest AUC value of the ROC curve, were determined as “max_depth” = 8 and “eta” = 0.01717. The AUC value on the test dataset, calculated from the PV probability predicted by the XGBoost model with the fitted parameters, was 0.94972. To identify the most influential input variable in PV detection, the feature importance of the input variables, measured by “gain”, was calculated. The PVSI emerged as the most important of the nine variables considered (see details in Supplementary Fig. S1). The raw output generated by the trained XGBoost model underwent post-processing, including morphological filtering, image binarization through an unsupervised segmentation method, and the removal of very small polygons in Densely Inhabited District (DID) regions. Figure 1 illustrates the input Sentinel-2 imagery, the probabilistic map of the solar PV category, and the post-processed PV polygons. For comparison with the traditional machine learning output, the discrete categorical prediction result from the same trained XGBoost model is shown in Fig. 1b. The discrete prediction output in Fig. 1b is contaminated with salt-and-pepper-like noise, whereas the post-processed output in Fig. 1d, derived from the probabilistic predictions in Fig. 1c, retains the shapes of the PV installations with significantly reduced noise levels. Each output PV polygon carries the median value of the “PV probability” inside its perimeter, which could be used to indicate its confidence level as an actual solar farm. The temporal scalability of the established approach was qualitatively evaluated by applying the trained model to a set of Sentinel-2 images from 2019 to 2024, where consistent results were observed over time (see details in Supplementary Fig. S2).
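The binarization and small-object removal steps described above can be illustrated with a minimal, dependency-free sketch. The text does not name the unsupervised segmentation method, so Otsu's threshold is used here purely as an assumed stand-in; the function names, the 4-connectivity rule, and the pixel-count cutoff are likewise hypothetical. The morphological filtering step is omitted, and small components are removed everywhere rather than only inside DID regions.

```python
from typing import List, Sequence


def otsu_threshold(values: Sequence[float], bins: int = 256) -> float:
    """Return a threshold in (0, 1] that maximizes between-class variance.

    ASSUMPTION: Otsu's method stands in for the unnamed unsupervised
    segmentation method; inputs are PV probabilities in [0, 1].
    """
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0  # pixel count and bin-index sum of the low class
    for i, h in enumerate(hist):
        w0 += h
        if w0 == 0:
            continue  # no pixels in the low class yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the low class
        sum0 += i * h
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_bin = between, i
    # The upper edge of the best bin separates the two classes.
    return (best_bin + 1) / bins


def binarize(prob_map: List[List[float]], threshold: float) -> List[List[int]]:
    """Mark pixels whose PV probability reaches the threshold as 1."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def remove_small_components(mask: List[List[int]], min_pixels: int) -> List[List[int]]:
    """Drop 4-connected groups of PV pixels smaller than min_pixels.

    ASSUMPTION: the study removes small polygons only in DID regions;
    this sketch applies the filter to the whole mask for simplicity.
    """
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one connected component.
            stack, component = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) < min_pixels:
                for y, x in component:
                    out[y][x] = 0
    return out
```

Thresholding the probability map rather than taking the model's hard class labels is what lets an isolated high-probability pixel be treated as noise in a later step, while compact high-probability clusters survive as candidate PV polygons.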
(a) Input Sentinel-2 imagery (contains modified Copernicus Sentinel data processed by Google Earth Engine). (b) Categorical solar PV classification map from a trained XGBoost model as an example of traditional approach, (c) Probability map of the solar PV class calculated by the trained XGBoost model. (d) Estimated solar PV extent from the post-processed probabilistic output overlayed on the Sentinel-2 image. These figures were generated by QGIS (https://qgis.org/) 3.38.0-Grenoble.
Examples of PV detection results are shown in Fig. 2. The post-processed solar PV dataset contains polygons with a total area of 571 km². This area significantly exceeds the values previously reported for the machine-learning-based global solar PV dataset in 2018 (total area of 215 km²)17 and the manually delineated PV dataset in 2020 (total area of 352 km²)11, while it is smaller than the area of the solar PV shapes extracted from the 2022 “High-Resolution Land Use and Land Cover Map (Japan) Version 23.12”, which utilizes optical and high-resolution SAR datasets and time-series deep-learning techniques (total area of 646 km²)25.
Examples of the PV detection cases by the proposed methodology. Manually delineated abstract extents of the PV sites are overlaid on each figure. (a1–d1) Google Earth satellite imagery (Google Earth 2024). (a2–d2) True-color (R/G/B = B4/B3/B2) Sentinel-2 imagery (contains modified Copernicus Sentinel data processed by Google Earth Engine). (a3,b3) Successful PV detection cases. (c3,d3) Unsuccessful PV detection cases. These figures were generated by QGIS (https://qgis.org/) 3.38.0-Grenoble.
Four accuracy metrics were calculated from the actual categories of the 1500 verification points and the classes predicted in the post-processed solar PV map; they are summarized in Table 1. A comparison with three other PV datasets covering Japan, namely the 2020 manually delineated dataset11, the 2018 machine-learning-based PV map17, and the 2022 extraction from the LULC map of Japan25, showed that 97.1%, 97.7%, and 82.3% of their PV polygons, measured by area, were covered by our PV map, respectively. The governmental statistics classify utility-scale PVs into five categories according to capacity: (1) below 50 kW, (2) 50–500 kW, (3) 500–1000 kW, (4) 1000–2000 kW, and (5) over 2000 kW. Based on information from 50 sample PV sites, there is an almost linear relationship between capacity and PV area (0.86 MW/ha; see details in Supplementary Fig. S3). This relationship was used to convert the PV areas detected in remote sensing imagery into power generation capacities comparable with the aggregated governmental statistics. The remote-sensing-based PV capacity estimates were compared with the governmental statistics for the 47 prefectures, with and without the smallest PV capacity category (capacity < 50 kW), and the results are summarized in Fig. 3. The total PV capacity based on our PV map was 49.09 GW, while the governmental statistics as of September 2022 gave 54.10 GW and 36.16 GW with and without the below-50 kW category, respectively.
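The area-to-capacity conversion above is a single multiplication by the reported density of 0.86 MW/ha. The following sketch shows it together with a per-prefecture aggregation; the function names and the `(prefecture, area)` input layout are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple

MW_PER_HA = 0.86  # capacity density reported in the text (50 sample sites)
M2_PER_HA = 10_000.0


def area_to_capacity_mw(area_m2: float, mw_per_ha: float = MW_PER_HA) -> float:
    """Convert a detected PV footprint (square metres) to capacity (MW)."""
    return area_m2 / M2_PER_HA * mw_per_ha


def capacity_by_prefecture(
    polygons: Iterable[Tuple[str, float]]
) -> Dict[str, float]:
    """Sum estimated capacity (MW) per prefecture.

    `polygons` yields (prefecture_name, area_m2) pairs; this input layout
    is an assumption, not the structure of the published dataset.
    """
    totals: Dict[str, float] = {}
    for prefecture, area_m2 in polygons:
        totals[prefecture] = totals.get(prefecture, 0.0) + area_to_capacity_mw(area_m2)
    return totals
```

As a consistency check, applying `area_to_capacity_mw` to the total detected area of 571 km² (571e6 m²) gives roughly 49.1 GW, in line with the 49.09 GW total reported above; the small difference presumably comes from rounding of the area and density figures.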
This scatter plot contrasts photovoltaic (PV) capacity estimates derived from a remote-sensing-based map with governmental statistics for each prefecture. Red cross-shaped points represent data excluding the smallest utility-scale PV capacities (under 50 kW) in the governmental statistics, while blue star-shaped points include these capacities.
The land cover used for solar PV installation was assessed by calculating the mode of the land-cover values within each of our PV polygons, separately for each category (megawatt-scale or small-scale). The categorization was based on the capacity of each PV site estimated from the area-capacity relationship of the 50 sample sites (0.86 MW/ha). If the PV area is smaller than the area equivalent to 1 MW of capacity, the site is categorized as a “small-scale” PV; if the area is larger than that threshold, the facility falls in the “Megawatt-PV” category. The result is summarized in Fig. 4. Forest is the predominant land-cover class used for megawatt-scale PV installation in Japan. “Other lands”, which are defined as vacant spaces such as reclaimed lands or airfields, were also favoured for accommodating the many panels of megawatt-scale solar farms. Interestingly, golf courses were also found to be used substantially for megawatt-PVs. While forest remained the most popular PV installation site across the 12 land-cover categories, small-scale PVs were also installed on agricultural lands (paddy fields and other types of agricultural fields combined). Buildings were much more favoured for small-scale PVs than for megawatt-scale PVs.
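The two steps in this paragraph, splitting sites by the 1 MW equivalent area and taking the mode of the land-cover classes inside each polygon, can be sketched as follows. The class labels and function names are hypothetical, and a site exactly at the threshold is assigned to the small-scale category because the text does not specify that case.

```python
from collections import Counter
from typing import Sequence

MW_PER_HA = 0.86                 # density reported in the text
THRESHOLD_HA = 1.0 / MW_PER_HA   # area equivalent to 1 MW, about 1.16 ha


def scale_category(area_ha: float) -> str:
    """Classify a PV site by its footprint area.

    ASSUMPTION: a site exactly at the threshold counts as small-scale.
    """
    return "megawatt" if area_ha > THRESHOLD_HA else "small-scale"


def dominant_land_cover(pixel_classes: Sequence[str]) -> str:
    """Return the most frequent (mode) land-cover class within one polygon.

    `pixel_classes` holds the reference land-cover label of every pixel
    that falls inside the PV polygon; the labels here are illustrative.
    """
    return Counter(pixel_classes).most_common(1)[0][0]
```

For example, a 3 ha site whose pixels are mostly labelled forest in the reference map would be counted as a megawatt-scale facility on forest land.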
The land-cover categories used for solar PV installations are calculated from the 2009 land-cover map and the detected PV polygons.
The changes in the forest fragmentation index (FFI) highlighted in Fig. 5 show that forest fragmentation due to solar PV installations is concentrated in linear patterns rather than uniformly distributed across the forested zones. The proportion of forest cover in each mesh is also shown in Fig. 5 as a reference for the forest distribution in the region. Comparing ΔFFI with the forest cover proportion indicates that the change in FFI is concentrated along the forest edges.
The forest fragmentation status change with PV installations calculated at a 10 km mesh scale by taking the difference of FFI value with solar PVs, and the reference FFI condition derived from the 2009 land-cover map. (a) The national ΔFFI map. (b) Expanded view of ΔFFI around Kanto-area. (c) Forest cover ratio. These figures were generated by QGIS (https://qgis.org/) 3.38.0-Grenoble.
The global Moran’s I of the proportion of solar PV area in each 10 km mesh was 0.386, with a p value of 0.001, suggesting a significant positive spatial autocorrelation. A clustering analysis was carried out to find statistically significant PV concentration locations, which were then used as the target variable of an XGBoost model taking land cover, FFI in the reference condition (2009), and other socio-economic and environmental factors as inputs. The results of the solar PV distribution clustering, the output of the XGBoost model predicting the PV clusters from the input variables, and the SHAP results are summarized in Fig. 6. The clustering output highlighted the areas of high solar PV penetration as solar PV concentration hotspots. The fitted XGBoost model achieved an AUC value of 0.938 on the test dataset in predicting the solar PV hotspot probabilities. Visual assessment of the probabilistic output in Fig. 6c showed agreement with the distribution of the solar PV hotspots. The SHAP outputs, summarized in Fig. 6d, showed that snow depth is the most important variable for predicting the solar PV hotspots, while socio-economic variables such as population density and distance to the DIDs could also be responsible for the observed solar PV distributions. The SHAP analysis also revealed how each variable contributed to the model prediction, as shown in Fig. 6e. Larger values of maximum snow depth, slope, and distance from DIDs resulted in negative SHAP values, while smaller values were associated with higher SHAP values. In contrast, the proportion of golf courses, the FFI value in the reference land-cover year, and the proportion of agricultural land showed the opposite contribution. Population density showed a less clear pattern in the summary plot, but the individual scatter plot showed that the SHAP value increased up to 500 people/sq. km and decreased beyond that (see details in Supplementary Fig. S4).
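For readers unfamiliar with the statistic, global Moran's I on a regular mesh can be computed as in the sketch below. The spatial weights are an assumption: the text does not state them, so binary rook contiguity (the four edge-sharing neighbours) is used here, and the significance test behind the reported p value is not reproduced.

```python
from typing import List


def morans_i(grid: List[List[float]]) -> float:
    """Global Moran's I for a rectangular grid of values.

    ASSUMPTION: binary rook-contiguity weights (w_ij = 1 for the four
    edge-sharing neighbours, 0 otherwise).

    I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    num = 0.0  # weighted cross-products of deviations
    s0 = 0     # sum of all weights (number of directed neighbour pairs)
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    num += (grid[r][c] - mean) * (grid[nr][nc] - mean)
                    s0 += 1
    return (n / s0) * (num / denom)
```

Values near +1 indicate that similar PV proportions cluster in neighbouring meshes, values near -1 indicate a checkerboard-like alternation, and values near 0 indicate no spatial structure.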
The result of the solar PV distribution pattern analysis with a clustering method and a machine-learning model. (a) The solar PV occupancy per mesh. (b) The solar PV hotspot distribution quantified as a result of local-Moran’s I based clustering analysis. (c) The calculated solar PV hotspot probability by a trained XGBoost model. (d) The mean absolute SHAP values of the candidate explanatory variables. (e) The SHAP summary plot showing the contribution of each factor to the model prediction. These figures were generated by QGIS (https://qgis.org/) 3.38.0-Grenoble.
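The SHAP values reported in panels (d) and (e) are based on Shapley values, which attribute a prediction to each input by averaging its marginal contribution over all feature subsets. The sketch below computes exact Shapley values by brute force for a toy scoring function. The feature names and scores are invented for illustration and are not taken from the study; practical SHAP tooling relies on faster algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    features: Sequence[str],
    value_fn: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every feature subset.

    value_fn(S) is the model output when only the features in S are
    "present". The cost grows as 2^n, so this only suits tiny examples.
    """
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of any subset of size k that excludes f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_hotspot_score(present: FrozenSet[str]) -> float:
    """HYPOTHETICAL hotspot score; the numbers are made up for illustration."""
    score = 0.0
    if "low_snow" in present:
        score += 0.4
    if "near_did" in present:
        score += 0.2
    if "low_snow" in present and "near_did" in present:
        score += 0.1  # interaction term, shared equally between the two
    return score
```

In this toy case the interaction bonus is split evenly between the two interacting features, an unused feature receives zero, and the attributions sum to the difference between the full score and the empty-set score, which is the additivity property that makes SHAP summary plots interpretable.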
In this study, we conducted national-scale solar PV mapping from Sentinel-2 imagery acquired in 2022, aiming to achieve high accuracy with an XGBoost model combined with dedicated spectral indices and post-processing, and to investigate the land-cover consumption of PVs from the resulting PV map. The proposed methodology, which integrates the XGBoost model and post-processing operations, detected solar PVs with high accuracy and a low noise level in the output, even though, unlike CNN models, the model does not use spatial information. Comparisons with governmental statistics revealed that Sentinel-2-based PV detection could enhance the comprehensiveness of existing PV datasets, particularly for PVs with a capacity larger than 50 kW. The land-cover types used for megawatt-PV sites and small-PV facilities differed: the former are mostly installed in forested areas, while the latter are also found on farmland. Forest fragmentation was found to be concentrated around forest edges, indicating possible impacts of solar PV installations on the forest environment and ecosystems. Land cover, the forest fragmentation index, and other socio-economic and environmental variables were used to predict the solar PV concentration hotspots. The fitted model performed well in predicting the PV hotspots, and the SHAP-based analysis revealed the importance of both environmental and human-related factors for the existing solar PV patterns. The proposed solar PV detection method would enable frequent updates of maps of fast-growing solar PVs over a large area from a freely available satellite data source, supporting up-to-date characterization and monitoring of solar PV distributions. Additionally, the consumption and fragmentation of forested areas by PV sites emphasize the importance of regulating PV site construction in ecologically significant areas. 
An appropriate policy-driven approach that guides future solar PVs toward sites close to demand, such as the rooftops of factories, garbage dump sites, or reclaimed lands, rather than cutting down trees and replacing agricultural lands with PVs, may enable a balanced solar PV expansion in the long run.
Our modified version of the photovoltaic spectral index was the most important feature among the 9 input bands and spectral indices. This finding underscores the effectiveness of a well-designed combination of spectral information for enhancing the presence of the target feature in satellite imagery. The probability-based PV delineation approach allows us to select the high-confidence pixels belonging to PVs, in contrast to traditional binary PV/non-PV classification strategies20. As shown in Fig. 1c, the cluster of PV panels is highlighted with a high PV probability. In addition, the salt-and-pepper-like noise pixels that are sometimes found in existing PV mapping studies20 were largely suppressed by the post-processing in the vectorized solar PV shapes in Fig. 1d compared with the categorical classification results in Fig. 1b, even though the same trained model was used. The low false positive level of our PV detection result was further supported by the high precision value shown in Table 1. Importantly, our map covers most of the features in the manually delineated and the automatically produced global PV datasets11,17. The level of agreement of our result with the PV map extracted from the local land-cover map in 2022 (JAXA LULC map)25 was lower than for the other two datasets, which could be due to that map’s use of commercial high-resolution SAR data, enabling the detection of many small PVs below the detectable limit of our current approach. The temporal scalability of the proposed approach also opens the way to repeated and consistent solar PV map updates at a nation-wide scale.
As shown in Fig. 2a1–a3 and b1–b3, the successful PV detection results represented the actual solar farm perimeters well. The proposed method could not detect PVs in some cases, as shown in Fig. 2c1–c3 and d1–d3. In addition to crystalline silicon (c-Si) PVs, amorphous silicon (a-Si), cadmium telluride (CdTe), and copper-indium gallium diselenide (CIGS) PVs have also been developed and are on the market26. According to the official website information (https://www.kaneka.co.jp/topics/news/n20131010/), the large-scale solar farm shown in Fig. 2c1–c3 is made from thin-film Si PVs. Since the material features and physical structure of those thin-film PVs differ from those of c-Si panels, which are the most common PV type on the market27, their spectral responses could also be different. Hence, the XGBoost model, trained mostly on c-Si PV spectral responses, could fail to recognize these PVs. In the case of the PVs in Fig. 2d1–d3, no relevant web sources show whether they are c-Si panels or others, but their color differs from that of the PVs in Fig. 2a1–a3 and b1–b3. Therefore, they might also be made from non-c-Si materials, which could have contributed to their lower detectability by the trained model. Another possible reason for the failed PV detection could be the arrangement of the PV panels. In Fig. 2d2, the color of the panels is blurred in the Sentinel-2 imagery, which might be caused by the inclusion of background soil information due to the relatively large gaps among the panels. These limitations in recognizing PVs made from diverse materials and in diverse arrangements underscore the importance of accounting for the spectral variations associated with different PV technologies and panel arrangements during model training to further improve PV detection performance.
The established PV dataset generally underestimated the total PV capacity per prefecture when the smallest PV category (capacity < 50 kW) was included in the statistics. In contrast, the dataset exceeded the aggregated PV capacity statistics when facilities smaller than 50 kW were excluded, as illustrated in Fig. 3. The smallest category, PVs with a capacity below 50 kW (≈ 600 m²), may contain PVs falling below the size detectable in Sentinel-2 imagery. Given that these smallest PVs could cover just a few pixels in the 10 m resolution bands (B4, B3, B2, and B8) or the 20 m resolution bands (B11 and B12) used as input data for the model, Sentinel-2 imagery is better suited to mapping medium to large-scale solar PVs. Finer resolution input data would be necessary for accurately delineating PVs smaller than 50 kW capacity.
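The "few pixels" argument can be made concrete with a short calculation that inverts the 0.86 MW/ha relationship to get a footprint and divides it by the pixel area. The function names are hypothetical, and the calculation ignores panel spacing, site shape, and mixed pixels.

```python
MW_PER_HA = 0.86       # density reported in the text
M2_PER_HA = 10_000.0


def footprint_m2(capacity_kw: float) -> float:
    """Approximate site footprint implied by the area-capacity relationship."""
    return capacity_kw / 1000.0 / MW_PER_HA * M2_PER_HA


def pixels_covered(area_m2: float, resolution_m: float) -> float:
    """Number of square pixels of the given side length that equal the area."""
    return area_m2 / (resolution_m ** 2)
```

A 50 kW site corresponds to about 581 m², consistent with the ≈ 600 m² quoted above. That is roughly 5.8 pixels at 10 m resolution and only about 1.5 pixels at 20 m, which illustrates why sites in the smallest category sit near or below the detection limit.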
We observed that the preferred land-cover types for PV installation differ between megawatt-scale facilities and small-scale PVs, as shown in Fig. 4. The predominant approach to creating solar PV installation space might involve the clearance of trees, which could have significant negative effects on the local ecosystems and environment. The FFI changes shown in Fig. 5 suggest that solar-PV-induced forest fragmentation possibly occurs around the forest edges. Species richness is reportedly higher around forest edges than in the forest interior in temperate climates28; because solar PVs are controlled artificial environments, they may have negative impacts on the animals, plants, birds, and insects there. Small-scale solar farms were frequently sited in agricultural fields, especially paddy fields. Small-scale PVs do not require as much land as megawatt-scale PVs, which could have allowed more flexibility in site selection. In Japan, farmland abandonment has become an issue, which makes abandoned or nearly abandoned agricultural land attractive for new PV facility construction29. Farmlands have already been shown to be a suitable land cover for PVs30. Thus, selling those agricultural fields to solar PV operating companies could benefit landowners and support efficient electricity production from PVs. Buildings were also commonly chosen for small-scale PV installations. Placing PVs on the rooftops of factories or warehouses increases a company’s share of renewable energy, and those panels neither consume forest nor sit in high-disaster-risk areas, which might make rooftops a suitable choice for further increasing solar capacity in the country. Golf courses also emerged as sites for solar PV installation. High demand from 1980 to 1990 left Japan with numerous golf courses31. 
Some Japanese golf course companies faced economic problems due to falling demand31. Therefore, companies might have sold their lands to the PV operators who needed such open space for large-scale solar-farm installations.
The discussion of solar PV site selection strategies can be further elaborated by the XGBoost model trained to predict the solar PV hotspots. These hotspots consistently represented the high solar PV penetration areas near the largest cities in Japan, namely Tokyo, Nagoya, Osaka, and Fukuoka. The high AUC values of the calculated solar PV hotspot probabilities from the trained XGBoost model indicated that the model was well trained on the relationships between the preferred solar PV installation locations, environmental, and socio-economic variables. It is intuitive to infer the negative impact of snow on the solar PV site selection indicated by the SHAP outputs, as the snow both blocks sunlight during winter and the heavy load of snow could destroy the PV structures. The higher slopes are not a favorable condition for PV installations as they may increase the construction cost and the risk of landslides. The distance to the DIDs could be related to the cost of transmitting the electricity to the demand sites, as the longer the distance, the higher the loss of electricity and the lower the availability of the power grid. The clear positive contribution of the Golf Course, and Other Land categories can be attributed to the availability of PV installation sites. The magnitude of FFI at the reference year had a positive effect on the PV hotspo probability, suggesting that the already highly fragmented forest edge areas due to human activities, might have been further developed into PVs due to the high accessibility from the nearby settlements. An interesting phenomenon was observed for population density. Low SHAP values were observed at both the very high and low population densities, while the maximum positive SHAP value occurred around 500 people/sq km. 
This may indicate that PVs cannot be installed in sparsely populated areas due to the lack of maintenance workers, while installation in densely populated areas is not feasible due to the high land price. Based on these observations, different approaches may be needed to facilitate solar PV expansion while preserving the local environment. For example, in rural areas, it may be possible to combine large-scale solar PV with batteries, which allows better utilization of the produced electricity by storing the excess electricity and selling it during the night, thus contributing to the efficient use of the available land and the power grid without simply installing more PVs. In and around urban areas, the introduction of rooftop solar systems and the local use of the energy produced can be mandated or supported by the government to facilitate the use of vacant land. In addition, rooftop solar could be a promising option as a growing number of electric vehicles will need to be charged by solar PVs.
The proposed methodology has demonstrated high performance in detecting solar PVs in Japan. However, there is still room to improve the accuracy and the comprehensiveness of the dataset. One approach could be to incorporate texture information32, since the current approach only considers the spectral information of the Sentinel-2 data. The 10–20 m resolution of Sentinel-2 imagery limits the detection of PVs smaller than 50 kW capacity. Overcoming this limitation would require satellite data with finer spatial resolution, but this may increase the cost and reduce the opportunities for updating the PV dataset; hence, the spatial resolution and the cost of image acquisition need to be selected carefully. The patterns observed in the remote-sensing-based PV map can be explained with a quantitative modelling method to better understand the background factors of the PV distribution. Furthermore, hazard risks to existing solar PVs are a crucial consideration over their long lifetime (> 20 years). Hence, an up-to-date remote-sensing-based PV map provides a more comprehensive view of the potential hazard risks to existing PV facilities than an analysis based on a manually produced PV database22. The remote-sensing-derived PV map can also be used as an input, together with the power grid map, for seeking optimized locations of battery systems33, which may become a necessary tool in the near future to mitigate the instability of grid-connected large-scale solar farms. Coupling the large-scale solar PV distribution data derived from remote sensing imagery with the PV parameters34 would enable better prediction of solar power generation. This holistic approach would contribute to a more informed understanding of the environmental impact, risk mitigation, and future planning in the context of solar PV installations.
Our study introduced a novel methodology for mapping solar photovoltaic (PV) installations from 10 m resolution Sentinel-2 imagery using the probabilistic output of an XGBoost model, spectral indices, and post-processing. Despite the inferred limitation in detecting small PVs due to the limited spatial resolution of Sentinel-2 imagery, the PV map we produced showed satisfactory performance in PV identification. This highlights the potential for frequent and periodic updates of PV maps using freely available satellite imagery to ensure a comprehensive and accurate representation of solar PV installations. The land cover analysis showed that different PV siting strategies were used, as the megawatt-scale PVs were mostly introduced in cleared forests, while small-scale PVs were also built in farmlands and the urban environment. Possible forest fragmentation was inferred by comparing the baseline forest map with the post-PV-expansion situation, indicating an impact on the local environment and ecosystems after the PV boom. The land cover map, forest fragmentation index, and socio-economic factors were used with the solar PV concentration cluster maps to discuss the potential relationships among these variables using an XGBoost model and the SHAP approach. The fitted model and its characteristics showed the potential interactions and effects that determine the current solar PV distribution pattern, suggesting a better and more informed way to increase PV penetration while balancing environmental conditions. There is still room to improve the comprehensiveness of the PV map by utilizing higher resolution satellite products, and the use of the satellite-derived PV map will be explored in hazard risk assessment and regional-scale PV battery system planning. 
As solar energy is an indispensable part of achieving a green society, our findings would contribute to the further utilization of available land where the potential is high, but the infrastructure is scarce, while minimizing the negative externality resulting from the expansion of solar farms into ecologically important zones in the future.
The methodological flowchart of the study is shown in Fig. 7. The method is composed of seven steps: (1) image preprocessing; (2) training of the machine-learning model for solar PV detection; (3) generation of the solar PV database in Japan with the trained model and post-processing to reduce false positives; (4) verification of the PV detection accuracy and validation of the comprehensiveness of the generated PV map using governmental statistics; (5) quantification of land-cover change characteristics; (6) forest fragmentation assessment; and (7) solar PV distribution pattern analysis combining clustering methods, a machine-learning model, and a SHAP approach.
The overall workflow of the solar PV mapping in Sentinel-2 using the XGBoost model for quantifying land-cover changes, and the quantitative analysis of the spatial distribution pattern with a logistic regression model.
The study area covers the entire national territory of Japan. Japan was chosen as the target area because (1) it is the third largest solar PV operator in the world as of 2022, with an estimated 84.9 GW of installed PV2, (2) statistics on FIT-registered solar PVs are maintained and open to the public, and are hence comparable to remote-sensing-based data, and (3) local land-cover maps from before the increase of solar PVs are available, with unique land-cover categories not found in global-scale land-cover maps (e.g., golf courses and railroads). Most of the country lies in temperate climate regions, while the southern islands and the northern parts of the country are subtropical and subarctic, respectively. The summer season (July and August) is characterized by hot and humid weather, while heavy snow is expected in the northern areas and in prefectures along the Japan Sea coast in winter (December to February). The population is mostly concentrated in the flat lowlands. Given the common practice of locating solar PVs near human settlements for convenient maintenance17, this study also followed the strategy of limiting the target areas to the surroundings of populated areas17. We took the population dataset from GHS-POP R2023A, which gives the number of people per cell and is available on the Google Earth Engine (GEE) platform35. It was resampled to 1000 m resolution for efficient vector conversion on GEE. After the reprojection, pixels with a value larger than 1 person/cell were taken as “populated” regions and converted into polygons. A 7 km buffer was applied around those polygons, and the buffered area was used as the target region of the analysis. The buffered region was found to contain 99.9% of the 9250 manually delineated PV polygons in Japan as of 202011; hence, it was assumed to cover most of the existing PVs in the country, and we only retained detected solar PV polygons within this buffered area.
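The target-region rule described above can be illustrated on a raster grid. The following is a minimal sketch, not the GEE vector workflow used in the study: it assumes a small in-memory population array, and the function name `buffered_populated_mask` and its brute-force distance search are hypothetical choices made for clarity.

```python
import numpy as np

def buffered_populated_mask(pop, cell_size_m=1000.0, buffer_m=7000.0, min_pop=1.0):
    """Boolean mask of cells lying within buffer_m of any populated cell.

    pop: 2-D array of people per cell (hypothetical toy input).
    A cell counts as populated when its value is larger than min_pop.
    """
    populated = pop > min_pop
    rows, cols = np.nonzero(populated)
    if rows.size == 0:
        return np.zeros_like(populated)
    rr, cc = np.indices(pop.shape)
    # Squared distance (in cells) from every cell to every populated cell.
    d2 = (rr[..., None] - rows) ** 2 + (cc[..., None] - cols) ** 2
    nearest_m = np.sqrt(d2.min(axis=-1)) * cell_size_m
    return nearest_m <= buffer_m
```

The brute-force search is only practical for small grids; at national scale a distance transform or a vector buffer, as used on GEE, would be the more realistic choice.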
In this study, we employed Sentinel-2 Level-2A products, representing atmospherically corrected surface reflectance, accessible on the Google Earth Engine server. Snow affects the image quality during winter and early spring (November–April). The rainy season in June and the summer period were also excluded, since the high humidity during those periods could degrade the atmospheric correction. Hence, the study period was set from September 1, 2022, to October 31, 2022, focusing on the autumn season. To address cloud interference, each image from the “COPERNICUS/S2_SR_HARMONIZED” product was paired with the corresponding cloud probability product “COPERNICUS/S2_CLOUD_PROBABILITY”. The cloud-probability product gives the likelihood of clouds at pixel level as calculated from a LightGBM model and 10 Sentinel-2 bands36. Pixels with a cloud probability larger than 50% were masked. To ensure dataset robustness, the median of the time-series data was taken, as the median is resilient against extreme values20. Processing was applied to Sentinel-2 images covering Japan, organized into 0.25° × 0.25° square regions.
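The cloud masking and median compositing can be sketched with plain arrays. This is an illustrative sketch only: it assumes the image stack has already been exported as NumPy arrays, and the function name `median_composite` is hypothetical rather than part of the GEE API.

```python
import numpy as np

def median_composite(reflectance, cloud_prob, max_cloud_prob=50.0):
    """Per-pixel median over time after masking cloudy observations.

    reflectance: array of shape (time, rows, cols) for one band.
    cloud_prob:  array of the same shape, cloud probability in percent.
    Observations with cloud probability above max_cloud_prob are ignored.
    """
    masked = np.where(cloud_prob > max_cloud_prob, np.nan,
                      np.asarray(reflectance, dtype=float))
    # nanmedian skips masked observations; a pixel with no clear
    # observation at all stays NaN.
    return np.nanmedian(masked, axis=0)
```

Because the median ignores a single bright outlier, one undetected cloud in an otherwise clear series has little influence on the composite.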
From the 13 Sentinel-2 bands, ‘B4’, ‘B3’, ‘B2’, ‘B8’, ‘B11’, and ‘B12’ were chosen and renamed to ‘Red’, ‘Green’, ‘Blue’, ‘NIR’, ‘SWIR1’, and ‘SWIR2’, respectively. Additionally, four spectral indices were computed and either incorporated into the input data or used in the post-processing stages. The Normalized Difference Vegetation Index (NDVI, Eq. (1)) and the Modified Normalized Difference Water Index (MNDWI, Eq. (2)) were chosen to characterize the presence of vegetation and water surfaces, respectively37,38. In addition to those two commonly used indices, the study introduced two novel variables aimed at refining solar PV detection and mitigating the misclassification of bluish synthetic materials as PV panels. The first index is the Photovoltaic Spectral Index (PVSI, Eq. (3)), which is a modified and simplified version of a previously proposed formula39. This index is tailored to capture the distinct characteristics of the crystalline silicon solar PV reflectance curve40, particularly the higher reflectance in the SWIR1 region (1610 nm) compared to both NIR (842 nm) and SWIR2 (2190 nm). The ratio between Blue and Red was introduced to suppress the PVSI values from reddish land covers such as exposed bare soils. Another simple spectral index, the Blue Spectral Index (BSI, Eq. (4)), was also defined to remove bluish synthetic materials from the solar PV detection outputs during the post-processing.
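For reference, the two commonly used indices can be written as normalized differences of the renamed bands. The sketch below uses the standard definitions, NDVI = (NIR − Red)/(NIR + Red) and MNDWI = (Green − SWIR1)/(Green + SWIR1); the helper names are hypothetical, and PVSI and BSI are deliberately left out because their study-specific formulas are not restated here.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b), returning NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    """Normalized Difference Vegetation Index (standard definition)."""
    return normalized_difference(nir, red)

def mndwi(green, swir1):
    """Modified Normalized Difference Water Index (standard definition)."""
    return normalized_difference(green, swir1)
```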
To ensure a diverse representation of environments and adequate sampling from the solar PV class, the following training data collection strategy was taken. Initially, 600 locations were randomly distributed within a manually delineated solar PV shape dataset as of 202011. Subsequently, 300 points were randomly selected within the 7 km buffered region around populated areas, and an additional 100 points were manually chosen from predominantly urban areas to enrich the model’s understanding of spectral variations in non-PV classes. These 1000 points, as shown in Fig. 8a, served as centre locations for 1000 m × 1000 m squares. Within each square area, any PVs not represented in the solar farm dataset11 were manually added to prevent the collection of “non-PV” category points from PV sites in the subsequent random sampling process. The modified solar PV shape dataset, including the PVs outside the 1000 m × 1000 m squares, was utilized for collecting the PV category sample points. The difference between the square areas and the PV shapes was used to collect the non-PV category samples. In those PV and non-PV areas, 20,000 and 40,000 points were randomly placed, respectively. An example of the allocated points is shown in Fig. 8b. At each location, ‘Red’, ‘Green’, ‘Blue’, ‘NIR’, ‘SWIR1’, ‘SWIR2’, ‘NDVI’, ‘MNDWI’, and ‘PVSI’ were extracted as the input data to the machine-learning model. The data extraction failed at 88 locations due to missing values in the Sentinel-2 imagery. As a result, data from 19,969 points and 39,943 points were used for the PV and non-PV classes, respectively. An 80–20 split was applied, allocating 80% of the total samples for parameter tuning and training, while the remaining 20% were reserved for evaluating the model’s performance based on the ROC-AUC, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The model output was subject to post-processing as described in the following sections. 
Therefore, independent verification points were also taken for the accuracy assessment of the post-processed PV dataset. 750 points were randomly generated within the study area, and 750 additional points were manually added from PV sites. In total, 1500 points were prepared for the post-processed PV dataset accuracy metrics calculation (see the details in Supplementary Fig. S5). The resulting validation dataset contains 752 points for the PV class, and 748 points for the non-PV (background) class.
(a) The target area of the study, and the distribution of the 1000 m square areas for training data sampling. (b) An example of the distributed sampling points inside the square shaped polygon, and the reference PV shapes. The background satellite map is based on the Google Satellite Map (Google Earth 2024). These figures were generated by QGIS (https://qgis.org/) 3.38.0-Grenoble.
Various machine-learning techniques have been developed, and decision-tree-based algorithms are among the most popular options. They also allow the contribution of input variables to the prediction to be interpreted through feature importance values, which is useful for discussing the effectiveness of PVSI in highlighting the presence of PVs in satellite imagery. In this study, we opted for XGBoost41, a gradient-tree boosting technique. It has gained prominence for its state-of-the-art performance in various machine learning applications41. XGBoost is reported to have better generalization capabilities than other machine learning models for detecting clouds and for land cover classification42,43, and its output is more consistent with the reference land-cover map than that of the LightGBM and Random Forest models43. The probability of the solar PV class was calculated by the “predict_proba” method of the XGBoost classifier implemented in the Python library. We first set the “n_estimators” parameter to 1000, ten times the default setting (100), based on our previous experience of the computational cost and model performance. Subsequently, we conducted hyperparameter tuning with the Optuna library44. The “max_depth” and “eta (learning rate)” parameters were tuned over 100 iterations. Default settings were retained for the other parameters. The performance indicator was the average ROC-AUC score derived from a fivefold cross-validation on the training data. The set of parameters that yielded the highest ROC-AUC score during the tuning process was selected for the solar PV mapping phase. To assess the performance of the parameter-tuned model, the held-out test data was used, with the ROC-AUC score serving as the evaluation metric.
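Since the ROC-AUC is the selection criterion throughout the tuning and testing stages, it may help to recall what the score measures: the probability that a randomly chosen PV sample receives a higher predicted probability than a randomly chosen non-PV sample. The sketch below computes it directly from that definition; the function name `roc_auc` is hypothetical, and the pairwise comparison is meant for small examples rather than the full sample set.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the metric does not depend on any particular probability threshold.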
Prior to applying the trained model for solar PV probability calculation, pixels unlikely to contain PVs were masked out by using three spectral indices (NDVI, MNDWI, and BSI). For NDVI and MNDWI, the criteria for not applying the PV detection model were NDVI > 0.5 and MNDWI > 0 based on the literature survey since those locations are likely to have dense vegetation and water surfaces, respectively38,45. Additionally, pixels with BSI > 0.2, associated with bluish synthetic materials, were eliminated, with the threshold determined through manual adjustment (see the supplementary Fig. S6 for the details). The trained XGBoost model was applied on the input imagery to calculate the probabilistic values of the solar PV category.
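The three exclusion rules can be expressed as a single boolean mask. This is a minimal sketch using the thresholds stated above; the function name `pv_candidate_mask` is hypothetical, and the BSI values are assumed to be precomputed with the study's Eq. (4).

```python
import numpy as np

def pv_candidate_mask(ndvi, mndwi, bsi, ndvi_max=0.5, mndwi_max=0.0, bsi_max=0.2):
    """True where the PV detection model should still be applied.

    A pixel is excluded when it looks like dense vegetation (NDVI > 0.5),
    water (MNDWI > 0), or bluish synthetic material (BSI > 0.2).
    """
    ndvi, mndwi, bsi = (np.asarray(v, dtype=float) for v in (ndvi, mndwi, bsi))
    excluded = (ndvi > ndvi_max) | (mndwi > mndwi_max) | (bsi > bsi_max)
    return ~excluded
```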
The probabilistic map was subject to post-processing to produce a low-noise solar PV map. First, a “white top-hat” morphological operation with a disk-shaped footprint of 1-pixel radius was applied to the probabilistic output to extract the small elements that could be the source of “salt-and-pepper” noise. This image was subtracted from the original solar PV probability map to reduce the noise level. To find meaningfully connected pixels, which are likely to be actual solar farms, we applied the “Efficient Graph-Based Image Segmentation”46 method to the noise-reduced probability map. Based on visual interpretation, pixels with a probability larger than 0.6 generally correspond to solar PV clusters, so the segmentation was only applied to those pixels. The parameters for the segmentation algorithm were set to min_size = 1, scale = 100, and sigma = 0.5. Input images were first divided into 348 by 348 pixel square patches, and the results were recombined after applying the segmentation algorithm. If all the pixel values in a patch were zero, the algorithm was not applied. The segmentation outputs were converted into polygons. The median of the solar PV probabilities within each polygon was calculated and added to the polygon as a new property indicating the reliability of the result. The holes within the polygons were filled to retain only the exterior shape of the solar PV sites. Polygons outside the 7 km buffer region around inhabited areas were removed from the analysis to reduce the likelihood of false detection. In addition, small polygons (area < 500 m2) within a densely inhabited district (DID) were subsequently removed. The Statistics Bureau of Japan defines a DID as an area of a city, town, or village that is composed of a group of contiguous basic unit blocks, each of which has a population density of about 4000 inhabitants or more per square kilometer, and whose total population exceeds 5000. 
In this process, large-scale polygons (area > 500 m2) were retained since they are likely large-scale solar PVs on top of factories or storage. The DID extent as of 2015 was downloaded from the Digital National Land Information database47. If a polygon is smaller than 500 m2 and intersects with the DID extent, it is considered a misclassified object and removed from the analysis.
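The noise-reduction step that opens the post-processing can be sketched as follows. This is an illustrative NumPy implementation, not the library routine used in the study: the helper names are hypothetical, the 1-pixel-radius disk is taken to be the 5-pixel cross (centre plus four neighbours), and the border handling is an assumption.

```python
import numpy as np

def _cross_stack(img, pad_value):
    """Stack each pixel with its four direct neighbours (1-pixel disk)."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="constant",
               constant_values=pad_value)
    return np.stack([p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1],
                     p[1:-1, :-2], p[1:-1, 2:]])

def erode(img):
    # Padding with +inf keeps the image border from lowering the minimum.
    return _cross_stack(img, np.inf).min(axis=0)

def dilate(img):
    # Padding with -inf keeps the image border from raising the maximum.
    return _cross_stack(img, -np.inf).max(axis=0)

def remove_small_elements(prob):
    """Subtract the white top-hat (image minus its opening) from the image."""
    prob = np.asarray(prob, dtype=float)
    tophat = prob - dilate(erode(prob))   # small bright elements only
    return prob - tophat

def pv_cluster_mask(prob, threshold=0.6):
    """Pixels passed on to segmentation: noise-reduced probability > 0.6."""
    return remove_small_elements(prob) > threshold
```

Subtracting the top-hat from the image is equivalent to keeping the morphological opening, so isolated high-probability pixels vanish while compact blocks keep their core.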
The performance of the post-processed solar PV database in 2022 was evaluated based on the 1500 verification points. Five common accuracy metrics (overall accuracy, precision, recall, F1-score, and Cohen’s Kappa coefficient) were used to assess the performance of the post-processed PV map. Given that PVs typically have a lifespan of 25 years48, our PV map as of 2022 should ideally encompass PVs from older datasets. The percentage of the existing PV datasets included in our results may provide a measure of the completeness of PV detection at the facility level. Considering methodological variations, input-data richness, and dataset coverage, we took the PV shapes from three data sources: the manually labelled PV database as of 202011, machine-learning-based global-scale PV maps made from SPOT and Sentinel-2 imagery from 2016 to 201817, and the LULC map produced from various satellite resources covering Japan in 202225. The input datasets for the LULC map in Japan are Sentinel-2 annual time-series images, PALSAR-2 polarimetric decomposition results, PALSAR-2 high-resolution (3 m) texture data, the road network from Open Street Map (OSM), agricultural field shapes, AW3D DSM, and the slope derived from the DSM. For each dataset, PV polygons within Japan were identified, and the ratio of the area of the solar PV polygons intersecting our PV map to the total PV area was calculated as the metric of agreement.
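The five accuracy metrics follow directly from the binary confusion matrix of the verification points. The sketch below uses the standard definitions; the function name `accuracy_metrics` is hypothetical, and the counts in the usage line are made-up numbers, not the study's results.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives, with PV as the positive class.
    """
    n = tp + fp + fn + tn
    overall_accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)
    return {"overall_accuracy": overall_accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}

# Hypothetical counts for illustration only.
metrics = accuracy_metrics(tp=40, fp=10, fn=20, tn=30)
```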
In addition to the verification of the output PV dataset, we also validated our map by comparing the PV capacity estimated from the dataset with governmental statistics. The Japanese government maintains records of PV facilities participating in the FIT program. Recognizing the nearly linear relationship between PV capacity and facility area49, we established a conversion formula by collecting capacity information from relevant web sources and by manually delineating the perimeter of 50 solar PV sites to obtain their areas. These data were used to construct a linear model (\(y = ax\)), where \(y\) is the capacity [MW] of a solar farm and \(x\) is the area [ha] of the PVs. The coefficient of the model is used to convert the area of a PV polygon to its estimated power capacity. The estimated capacity from the remote-sensing dataset was aggregated at the prefecture level (47 prefectures) and compared with the governmental utility-scale PV capacity installation statistics as of September 2022, available on the information disclosure website (https://www.fit-portal.go.jp/PublicInfoSummary).
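For a linear model without an intercept, the least-squares coefficient has the closed form a = Σxy / Σx². The sketch below assumes ordinary least squares was the fitting method, which the text does not state explicitly; the function names and the sample values are hypothetical.

```python
import numpy as np

def fit_through_origin(area_ha, capacity_mw):
    """Least-squares slope a of capacity = a * area (no intercept)."""
    x = np.asarray(area_ha, dtype=float)
    y = np.asarray(capacity_mw, dtype=float)
    return float(np.sum(x * y) / np.sum(x * x))

def estimate_capacity(area_ha, a):
    """Convert polygon areas [ha] to estimated capacities [MW]."""
    return a * np.asarray(area_ha, dtype=float)

# Hypothetical reference sites (area in ha, capacity in MW).
a = fit_through_origin([1.0, 2.0, 4.0], [0.5, 1.0, 2.0])
```

Forcing the line through the origin reflects the physical expectation that a site with zero panel area has zero capacity.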
Land-cover changes are one of the most apparent environmental impacts of the construction of solar farms, which usually require a large swath of land for placing many panels. Given the surge in PV development in Japan following the introduction of the FIT policy in 2012, we used 100 m resolution land cover data from 2009 as a reference status, representing conditions before PV installation. The land-cover map was taken from the digital national land information website managed by the Japanese Ministry of Land, Infrastructure, Transport, and Tourism47. The original land-cover dataset was prepared in vector format, so it was converted into raster format for efficient processing. The mode of the land-cover class pixels intersecting each PV polygon was calculated as the representative surface condition used for PV installation.
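Taking the mode of the intersecting pixels amounts to a majority vote per polygon. A minimal sketch, assuming the polygon has already been rasterized to a boolean mask on the land-cover grid; the function name `dominant_class` and the tie-breaking rule (smallest class code wins) are assumptions.

```python
import numpy as np

def dominant_class(landcover, polygon_mask):
    """Most frequent land-cover code among the pixels under a polygon.

    Returns None when the polygon covers no pixel; ties resolve to the
    smallest class code.
    """
    values = np.asarray(landcover)[np.asarray(polygon_mask, dtype=bool)]
    if values.size == 0:
        return None
    classes, counts = np.unique(values, return_counts=True)
    return int(classes[np.argmax(counts)])
```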
In addition to land cover changes, forest fragmentation levels were quantified using a forest fragmentation index (FFI) to assess the potential ecological impact of PV panel deployment. The FFI was introduced to quantify the level of forest fragmentation on a global scale50. It is calculated by averaging three indices, edge density (ED), patch density (PD), and mean patch area (MPA) after normalizing each variable50.
where \(i\) indicates the i-th land-cover category, \(A\) is the total area, \(N\) is the total number of patches, \(e_{i,k}\) is the total edge length between class \(i\) and class \(k\), and \(a_{i,j}\) is the area of the j-th patch of class \(i\)51. Normalization is done using the minimum and maximum values of each variable, after dealing with outliers based on the first and third quartiles and the interquartile distance50. FFI ranges from 0 to 1, with higher values indicating a more fragmented forest.
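The normalization and averaging can be sketched as below, under three labelled assumptions that are not confirmed by the text: outliers are clipped to the Tukey fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR), the normalized MPA is inverted (1 − MPA) before averaging so that larger patches lower the index, and the three components are weighted equally. The function names are hypothetical.

```python
import numpy as np

def robust_minmax(values, k=1.5):
    """Min-max scaling to [0, 1] after clipping outliers.

    Assumption: outliers are clipped to [Q1 - k*IQR, Q3 + k*IQR], k = 1.5.
    """
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    clipped = np.clip(v, q1 - k * iqr, q3 + k * iqr)
    lo, hi = clipped.min(), clipped.max()
    if hi == lo:
        return np.zeros_like(clipped)
    return (clipped - lo) / (hi - lo)

def forest_fragmentation_index(edge_density, patch_density, mean_patch_area):
    """Average of normalized ED, PD and inverted MPA, one value per grid.

    Assumption: MPA enters as (1 - normalized MPA), since a larger mean
    patch area indicates less fragmentation.
    """
    ed = robust_minmax(edge_density)
    pd_ = robust_minmax(patch_density)
    mpa = robust_minmax(mean_patch_area)
    return (ed + pd_ + (1.0 - mpa)) / 3.0
```

With this construction, a grid with the highest edge and patch densities and the smallest mean patch area receives an index of 1.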
The calculation of FFI requires a standardized spatial unit for calculating the indices. We chose the “second-level grid” defined by the Japanese government, which consists of predefined cells of about 10 km in the longitude and latitude directions. The FFI was calculated within each grid. First, we extracted the forest category from the rasterized land cover map in 2009 to obtain the baseline forest condition before the expansion of solar PV in Japan. The generated solar PV map was superimposed on the baseline forest map, and the extent of forest overlapping with the PV areas was removed to represent the forest condition after the solar PV installations. ED, PD, and MPA were calculated for the forest class using the PyLandStats library51. Following the calculation steps described in Ma et al.50, FFI values were calculated for both the baseline year and the post-solar PV installation period. The difference in FFI (∆FFI) was calculated to highlight locations affected by PV-induced forest fragmentation.
The relationship between solar PVs, land cover, and forest fragmentation was quantitatively evaluated using a clustering method, a machine learning model, and a model interpretation method to deepen the discussion on how solar PVs might have spread in Japan. This analysis consists of three steps: (1) clustering to find the zones where solar PV installations are concentrated (PV hotspots), (2) fitting an XGBoost model to predict the solar PV hotspots given the land-cover data, forest fragmentation condition, and socio-economic factors, and (3) interpreting the model using SHapley Additive exPlanations (SHAP).
The proportion of the area covered by solar PVs was calculated for each second-level grid. These grids can be grouped into four cluster categories based on the Local Indicators of Spatial Association (LISA). The presence of general clusters can be evaluated using the Global Moran’s I (GMI)52. It is defined by Eq. (9), where higher values indicate the presence of clusters.
where \(n\) indicates the number of meshes, \(w_{i,j}\) denotes the spatial weight matrix, \(x_{i}\) is the target variable (i.e., the proportion of solar PV area at each mesh), and \(\overline{x}\) is the average of the target variable. A p value can be computed by permuting the input values to test whether the presence of clusters is statistically significant. To find the local clusters, the local Moran’s I (LMI) can be computed53. LMI is defined by Eq. (10), and the p values can also be computed locally to find the statistically significant clusters.
We used p = 0.05 to find statistically significant solar PV concentrations, or clusters. In addition, the clusters can be divided into four categories: High–High, Low–Low, High–Low, and Low–High. High–High (Hotspot) clusters have higher values than the global average, and the nearby features also tend to have higher values. These metrics were calculated using the PySAL Python library54. We used these clusters as the objective variable for the machine-learning-based analysis.
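The clustering statistics can be sketched in a few lines using the textbook forms of Moran's I: the global statistic I = (n / Σw) · (zᵀWz) / (zᵀz) and Anselin's local statistic I_i = z_i · Σ_j w_ij z_j / m₂, with z = x − x̄ and m₂ = zᵀz / n. This is an illustration rather than the PySAL implementation, whose weight standardization and scaling constants may differ; the function names, the one-sided pseudo p value, and the toy weights are assumptions.

```python
import numpy as np

def global_morans_i(x, w):
    """Global Moran's I for values x and spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

def local_morans_i(x, w):
    """Local Moran's I per unit, plus deviations z and their spatial lag."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    lag = w @ z
    m2 = (z @ z) / x.size
    return z * lag / m2, z, lag

def quadrant_labels(x, w):
    """High-High / High-Low / Low-High / Low-Low label for each unit."""
    _, z, lag = local_morans_i(x, w)
    return np.where(z > 0,
                    np.where(lag > 0, "High-High", "High-Low"),
                    np.where(lag > 0, "Low-High", "Low-Low"))

def permutation_p_value(x, w, n_perm=999, seed=0):
    """One-sided pseudo p value: share of permutations with I >= observed."""
    rng = np.random.default_rng(seed)
    observed = global_morans_i(x, w)
    sims = np.array([global_morans_i(rng.permutation(x), w)
                     for _ in range(n_perm)])
    return (np.sum(sims >= observed) + 1) / (n_perm + 1)
```

In the analysis itself, only units whose local p value falls below 0.05 would be retained as clusters; the quadrant label then tells whether a significant unit is a hotspot (High-High) or another type.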
The XGBoost model used to identify solar PVs was adapted to classify each patch as a PV hotspot or background, given a list of input variables. In addition to the land cover metrics and FFI as environmental indicators, additional factors were introduced to explore the distribution pattern of solar PVs found by the clustering technique. First, we acquired different data sources (see Supplementary Table S1). Since our goal is to explore the relationships between the PV pattern and the input variables, highly correlated variables were removed using the variance inflation factor (VIF): variables were removed until all remaining variables had a VIF below 10. The resulting list of variables is presented in Table 2. 80% of the dataset was used for training, and the remaining 20% was used for testing. An XGBoost model, with parameters of use_label_encoder = False, eval_metric = ‘logloss’, was fitted to the training dataset, and the performance of the fit was evaluated using the ROC-AUC value of the test dataset.
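The VIF-based screening can be sketched as an iterative loop. The VIF of a variable is 1 / (1 − R²), where R² comes from regressing that variable on all the others. The sketch assumes that the variable with the largest VIF is dropped at each step, which the text does not specify; the function names are hypothetical and constant columns are not handled.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    out = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_high_vif(X, names, threshold=10.0):
    """Drop the highest-VIF column until every VIF is below threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names
```

Recomputing the VIFs after each removal matters, because dropping one collinear variable usually lowers the VIFs of the variables it was correlated with.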
The trained model was further investigated using the SHapley Additive exPlanations (SHAP) method. It is based on a concept from game theory and explores the contribution of each feature to the model prediction55. The importance of each feature, measured by the mean of the absolute SHAP values, and the positive/negative contribution of each feature to the prediction were investigated using the TreeExplainer module of the shap Python library56.
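The game-theoretic idea behind SHAP can be illustrated by computing exact Shapley values for a toy function. This sketch is not the TreeExplainer algorithm: it enumerates every feature subset, which is feasible only for a handful of features, and it represents an "absent" feature by a fixed baseline value, which is an assumption. The function name and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one sample x.

    predict:  function taking a list of feature values.
    baseline: values used for features that are 'absent' from a coalition.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy model with one additive term and one interaction term.
phi = shapley_values(lambda z: 2 * z[0] + z[1] * z[2], [1, 1, 1], [0, 0, 0])
```

The values sum to the difference between the prediction for the sample and the prediction for the baseline, and the interaction term is shared equally between the two features involved.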
The solar PV dataset produced in this study is available at https://doi.org/10.5281/zenodo.10674681.
IEA. Renewables 2022. https://www.iea.org/reports/renewables-2022 (2022).
IEA-PVPS. Snapshot of Global PV Markets 2023 (2023).
Ge, Y. et al. A novel hybrid model based on multiple influencing factors and temporal convolutional network coupling ReOSELM for wind power prediction. Energy Convers. Manag. 313, 118632 (2024).
Gutiérrez, L., Patiño, J. & Duque-Grisales, E. A comparison of the performance of supervised learning algorithms for solar power prediction. Energies (Basel) 14, 4424 (2021).
Hou, Q. et al. Probabilistic duck curve in high PV penetration power system: Concept, modeling, and empirical analysis in China. Appl. Energy 242, 205–215 (2019).
Komiyama, R. & Fujii, Y. Optimal integration assessment of solar PV in Japan’s electric power grid. Renew. Energy 139, 1012–1028 (2019).
Wong, L. A., Ramachandaramurthy, V. K., Walker, S. L. & Ekanayake, J. B. Optimal placement and sizing of battery energy storage system considering the duck curve phenomenon. IEEE Access 8, 197236–197248 (2020).
Hernandez, R. R., Hoffacker, M. K., Murphy-Mariscal, M. L., Wu, G. C. & Allen, M. F. Solar energy development impacts on land cover change and protected areas. Proc. Natl. Acad. Sci. 112, 13579–13584 (2015).
van de Ven, D.-J. et al. The potential land requirements and related land use change emissions of solar energy. Sci. Rep. 11, 2907 (2021).
Asanobu, K. Electrical Japan: Visualization for situational awareness on public data archives. J. Jpn. Soc. Digit. Arch. 3, 295–299 (2019).
Kim, J. Y., Koide, D., Ishihama, F., Kadoya, T. & Nishihiro, J. Current site planning of medium to large solar power systems accelerates the loss of the remaining semi-natural and agricultural habitats. Sci. Total Environ. 779, 146475 (2021).
Stowell, D. et al. A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK. Sci. Data 7, 394 (2020).
Herfort, B., Lautenbach, S., Porto de Albuquerque, J., Anderson, J. & Zipf, A. A spatio-temporal analysis investigating completeness and inequalities of global urban building data in OpenStreetMap. Nat. Commun. 14, 3985 (2023).
Yu, J., Wang, Z., Majumdar, A. & Rajagopal, R. DeepSolar: A machine learning framework to efficiently construct a solar deployment database in the United States. Joule 2, 2605–2617 (2018).
Shimada, S. & Takeuchi, W. Revealing a shift in solar photovoltaic planning sites in Vietnam from 2019 to 2022. Remote Sens. (Basel) 15, 2756 (2023).
da Costa, M. V. C. V. et al. Remote sensing for monitoring photovoltaic solar plants in Brazil using deep semantic segmentation. Energies (Basel) 14, 2960 (2021).
Kruitwagen, L. et al. A global inventory of photovoltaic solar energy generating units. Nature 598, 604–610 (2021).
Sota, H. et al. Generation of high-resolution land use and land cover maps in JAPAN version 21.11. J. Remote Sens. Soc. Jpn. 42, 199–216 (2022).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Zhang, X., Xu, M., Wang, S., Huang, Y. & Xie, Z. Mapping photovoltaic power plants in China using Landsat, random forest, and Google Earth Engine. Earth Syst. Sci. Data 14, 3743–3755 (2022).
Feng, Q. et al. A 10-m national-scale map of ground-mounted photovoltaic power stations in China of 2020. Sci. Data 11, 198 (2024).
Hao, K., Ialnazov, D. & Yamashiki, Y. GIS analysis of solar PV locations and disaster risk areas in Japan. Front. Sustain. 2, 815986 (2021).
Tao, L., Hayashi, K., Shiraki, H., Huang, X. & Dem, P. Exploration of determinants underlying regional disparity in rooftop photovoltaic adoption: A case study in Nagoya, Japan. Appl. Energy 367, 123469 (2024).
Sun, Y., Zhu, D., Li, Y., Wang, R. & Ma, R. Spatial modelling the location choice of large-scale solar photovoltaic power plants: Application of interpretable machine learning techniques and the national inventory. Energy Convers. Manag. 289, 117198 (2023).
JAXA Institute of Space and Astronautical Science. High-Resolution Land-Use and Land-Cover Map of Japan [2022]. https://www.eorc.jaxa.jp/ALOS/en/dataset/lulc/lulc_v2312_e.htm (2023).
Zeman, M. Thin-film silicon PV technology. J. Electr. Eng. 61, 271–276 (2010).
Fraunhofer Institute for Solar Energy Systems. Photovoltaics Report. https://web.archive.org/web/20231228050828/https://www.ise.fraunhofer.de/content/dam/ise/de/documents/publications/studies/Photovoltaics-Report.pdf (2023).
Willmer, J. N. G., Püttker, T. & Prevedello, J. A. Global impacts of edge effects on species richness. Biol. Conserv. 272, 109654 (2022).
Su, G., Okahashi, H. & Chen, L. Spatial pattern of farmland abandonment in Japan: Identification and determinants. Sustainability 10, 3676 (2018).
Adeh, E. H., Good, S. P., Calaf, M. & Higgins, C. W. Solar PV power potential is greatest over croplands. Sci. Rep. 9, 11442 (2019).
Osamu, S. Environmental and economic scenario analysis of the redundant golf courses in Japan. In Proceedings of Annual Meeting of Environmental Systems Research, 73–80 (Hino-city, Japan, 2009).
Zhang, X., Zeraatpisheh, M., Rahman, M. M., Wang, S. & Xu, M. Texture is important in improving the accuracy of mapping photovoltaic power plants: A case study of Ningxia Autonomous Region, China. Remote Sens. (Basel) 13, 3909 (2021).
Nazir, M. S. et al. Optimized economic operation of energy storage integration using improved gravitational search algorithm and dual stage optimization. J. Energy Storage 50, 104591 (2022).
Liu, Q. et al. Multi-strategy adaptive guidance differential evolution algorithm using fitness-distance balance and opposition-based learning for constrained global optimization of photovoltaic cells and modules. Appl. Energy 353, 122032 (2024).
Schiavina, M., Freire, S., Carioli, A. & MacManus, K. GHS-POP R2023A—GHS Population Grid Multitemporal (1975–2030). http://data.europa.eu/89h/2ff68a52-5b5b-4a22-8f40-c41da8332cfe (European Commission, Joint Research Centre (JRC), 2023).
Anze, Z. Improving Cloud Detection with Machine Learning. http://web.archive.org/web/20221130122809/https://medium.com/sentinel-hub/improving-cloud-detection-with-machine-learning-c09dc5d7cf13 (2017).
Rouse, J. W., Haas, R. H., Schell, J. A. & Deering, D. W. Monitoring vegetation systems in the Great Plains with ERTS. NASA Spec. Publ. 351, 309 (1974).
Xu, H. Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. Int. J. Remote Sens. 27, 3025–3033 (2006).
Shimada, S. & Takeuchi, W. A machine-learning based scheme for solar PV detection using medium-resolution satellite images in Vietnam. In IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium 255–258. https://doi.org/10.1109/IGARSS46834.2022.9884162 (IEEE, 2022).
Ji, C. et al. Solar photovoltaic module detection using laboratory and airborne imaging spectroscopy data. Remote Sens. Environ. 266, 112692 (2021).
Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. https://doi.org/10.1145/2939672.2939785 (2016).
Singh, R., Biswas, M. & Pal, M. Cloud detection using sentinel 2 imageries: A comparison of XGBoost, RF, SVM, and CNN algorithms. Geocarto Int. https://doi.org/10.1080/10106049.2022.2146211 (2022).
Park, J., Lee, Y. & Lee, J. Assessment of machine learning algorithms for land cover classification using remotely sensed data. Sens. Mater. 33, 3885 (2021).
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework (2019).
Hashim, H., Abd Latif, Z. & Adnan, N. A. Urban vegetation classification with NDVI threshold value method with very high resolution (VHR) Pleiades imagery. Int. Arch. Photogram. Remote. Sens. Spatial Inf. Sci. XLII-4/W16, 237–240 (2019).
Felzenszwalb, P. F. & Huttenlocher, D. P. Efficient graph-based image segmentation. Int. J. Comput. Vis. 59, 167–181 (2004).
Ministry of Land, Infrastructure, Transport and Tourism. Digital National Land Information. https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-A16-v2_3.html.
Sharma, V. & Chandel, S. S. Performance and degradation analysis for long term reliability of solar photovoltaic systems: A review. Renew. Sustain. Energy Rev. 27, 753–767 (2013).
Sakamura, K., Kaneko, T., Nakai, N. & Numata, M. Study on the site character of the ground standing photovoltaic power generating system. J. City Plan. Inst. Jpn. 49, 633–638 (2014).
Ma, J., Li, J., Wu, W. & Liu, J. Global forest fragmentation change from 2000 to 2020. Nat. Commun. 14, 3752 (2023).
Bosch, M. PyLandStats: An open-source Pythonic library to compute landscape metrics. PLoS ONE 14, e0225734 (2019).
Moran, P. A. P. Notes on continuous stochastic phenomena. Biometrika 37, 17–23 (1950).
Anselin, L. Local indicators of spatial association—LISA. Geogr. Anal. 27, 93–115 (1995).
Rey, S. J. & Anselin, L. PySAL: A Python library of spatial analytical methods. In Handbook of Applied Spatial Analysis (ed. Anu, A.) 175–193 (Springer, 2010). https://doi.org/10.1007/978-3-642-03647-7_11.
Lundberg, S. & Lee, S.-I. A Unified Approach to Interpreting Model Predictions (2017).
Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
This research was partially funded by JST SPRING, Grant Number JPMJSP2108. Three AI-assisted tools (Grammarly, ChatGPT, and Google Translate) were used to improve the grammatical quality of the manuscript.
Institute of Industrial Science, The University of Tokyo, Meguro-ku, 153-8505, Japan
Shoki Shimada & Wataru Takeuchi
S.S. designed and performed the analysis, wrote and edited the original manuscript, and wrote the revised manuscript. W.T. reviewed and supervised the research framework and reviewed both the original and the revised manuscripts.
Correspondence to Shoki Shimada.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Shimada, S., Takeuchi, W. Satellite-based analysis uncovers uneven solar PV distribution across Japan and its consumption of forest and agricultural lands. Sci Rep 15, 26671 (2025). https://doi.org/10.1038/s41598-025-11222-4
Scientific Reports (Sci Rep)
ISSN 2045-2322 (online)
© 2025 Springer Nature Limited