From Corpus to Innovation: Advancing Organic Solar Cell Design with Large Language Models

npj Computational Materials, Article number: (2025)
We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.
Advances in machine learning have transformed materials discovery, yet challenges remain owing to the scarcity of informatics-ready data and the complexity of numerical descriptors. Scientific knowledge is scattered across publications, which makes comprehensive data extraction difficult. This study presents a large language model (LLM)-driven framework that accelerates organic solar cell (OSC) materials discovery by extracting structured data from the literature and predicting device performance from natural language embeddings. Trained on a curated dataset of 422 OSC devices, the fine-tuned LLM achieved strong predictive accuracy for power conversion efficiency (PCE, R²: 0.87), short-circuit current density (JSC, R²: 0.82) and open-circuit voltage (VOC, R²: 0.89), and moderate accuracy for fill factor (FF, R²: 0.59). The models were then used to explore a space of 1.4 million combinations of materials, experimental variables and device architectures. The analysis yields data-driven design guidelines, identifying optimal donor-acceptor combinations and processing conditions that consistently give higher device performance.
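The screening step summarized above (enumerating combinations of materials, processing conditions and device architectures, describing each one in natural language, and ranking them with a trained predictor) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the option lists in `DESIGN_SPACE`, the text template in `describe_device`, and the `predict_pce` callable are hypothetical stand-ins, not the authors' dataset, prompt format or fine-tuned model. The `r2_score` helper shows the conventional definition of the coefficient of determination quoted for each performance metric.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical option lists: placeholders only, NOT the materials or
# processing variables actually screened in the study.
DESIGN_SPACE: Dict[str, List[str]] = {
    "donor": ["D1", "D2"],
    "acceptor": ["A1", "A2"],
    "solvent": ["chloroform", "chlorobenzene"],
    "architecture": ["conventional", "inverted"],
}


def describe_device(design: Dict[str, str]) -> str:
    """Serialize one design into a plain-text description (assumed template)."""
    return (
        f"Donor: {design['donor']}; acceptor: {design['acceptor']}; "
        f"solvent: {design['solvent']}; architecture: {design['architecture']}."
    )


def enumerate_designs(space: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return every combination (Cartesian product) of the option lists."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def rank_designs(
    designs: Iterable[Dict[str, str]],
    predict_pce: Callable[[str], float],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Score each text description with a predictor and keep the best top_k."""
    scored: List[Tuple[float, str]] = []
    for design in designs:
        text = describe_device(design)
        scored.append((predict_pce(text), text))
    # Highest predicted value first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot
```

For example, `rank_designs(enumerate_designs(DESIGN_SPACE), my_model)` would return the three highest-scoring descriptions for whatever scoring function `my_model` wraps. Because the design space is a Cartesian product, its size grows multiplicatively with every added variable, which is how a modest number of options per variable can reach the order of a million combinations.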
The training dataset used in this study is publicly available in the OPVPerfPredictor GitHub repository. The same repository also provides the fine-tuned model weights and a script to predict device performance for new cases.
This work was supported by the Office of Naval Research through grants N00014-19-1-2103 and N00014-20-1-2175.
School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Dr NW, Atlanta, GA 30332, USA
Harikrishna Sahu, Akhlak Mahmood, Labeeba B. Shafique & Rampi Ramprasad
H.S. was the primary architect of the dataset, models, and screening workflow, and wrote the manuscript. A.M. contributed to the development of the data extraction pipeline. L.S. assisted with data collection. R.R. conceived the project, provided overall guidance, and supervised the work.
Correspondence to Rampi Ramprasad.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Sahu, H., Mahmood, A., Shafique, L.B. et al. From Corpus to Innovation: Advancing Organic Solar Cell Design with Large Language Models. npj Comput Mater (2025). https://doi.org/10.1038/s41524-025-01896-9
DOI: https://doi.org/10.1038/s41524-025-01896-9
npj Computational Materials (npj Comput Mater)
ISSN 2057-3960 (online)
© 2026 Springer Nature Limited