Abstract

With the exponential growth of computing power and the accumulation of embryo image data in recent years, artificial intelligence (AI) is starting to be used for embryo selection in IVF. Amongst different AI technologies, machine learning (ML) has the potential to reduce operator-related subjectivity in embryo selection while saving labor time on this task. However, as modern deep learning (DL) techniques, a subcategory of ML, are increasingly used, their black-box nature attracts growing concern owing to well-recognized issues regarding lack of interpretability. Currently, there is a lack of randomized controlled trials to confirm the effectiveness of such black-box models. Recently, emerging evidence has shown black-box models underperforming more interpretable traditional ML models in embryo selection. Meanwhile, glass-box AI, such as interpretable ML, is increasingly promoted across a wide range of fields, supported by its ethical advantages and technical feasibility. In this review, we propose a novel classification system for traditional and AI-driven systems from an embryology standpoint, defining different morphology-based selection approaches with an emphasis on subjectivity, explainability, and interpretability.

Graphical Abstract

A proposed classification system for artificial intelligence embryo selection models with different subjectivity, interpretability, and explainability.

Introduction

Optimum embryo selection is critical in helping patients undergoing IVF treatment to achieve pregnancy in the fewest transfers possible (Gardner et al., 2015). However, a well-recognized challenge in morphology-based embryo selection is inter- and intra-operator subjectivity (Sundvall et al., 2013; Storr et al., 2017; Bormann et al., 2020). The clinical application of time-lapse videography (TLV) in IVF enables identification of novel biomarkers for embryo selection, with potentially enhanced performance when coupled with artificial intelligence (AI) (Liu et al., 2020a; Riegler et al., 2021). Machine learning (ML) is a subgroup of AI that offers the potential to minimize operator-associated subjectivity and improve embryo selection. In the past few years, there has been a boom in the clinical application of deep learning (DL) to embryo selection (Curchoe and Bormann, 2019). DL is a subgroup of ML and is predominantly based on artificial neural networks with multiple hidden layers. These hidden layers, however, make its reasoning process uninterpretable, giving rise to an alternative term for DL: the black-box (Tu, 1996). This lack of transparency has attracted growing ethical and societal concerns from IVF professionals (Afnan et al., 2021).

The pursuit of AI-driven applications in modern society is not unlike the gold rushes of the past, and is similarly fraught with risks. With the effectiveness of most AI-driven decision support systems (DSSs) in embryo selection yet to be validated via randomized controlled trials (RCTs), the sacrifice of interpretability cannot be justified (Afnan et al., 2021). A recent methodological comparison of 12 algorithms developed for blastocyst ploidy prediction revealed that logistic regression, an interpretable ML method, outperformed the other 11 ML methods, including its black-box counterparts (Bamford et al., 2023). This is a timely reminder for IVF professionals to step back from the black-box hype and consider alternative AI-driven DSSs for embryo selection, so that proper guardrails can be established under a stronger regulatory framework. To facilitate appropriate differentiation of AI decision tools, this mini-review proposes a novel classification system for traditional and AI-driven DSSs for morphology-based embryo selection, by considering each category’s embryo annotation and ranking steps. AI-driven DSSs are further classified into black-box, glass-box, and matte-box subgroups with a focus on their overall interpretability, explainability, and subjectivity (Fig. 1).

Figure 1.

A proposed classification system for artificial intelligence embryo selection models with different subjectivity, interpretability, and explainability.

Traditional embryo selection

The two steps in traditional embryo selection

Traditional embryo selection has historically involved two distinct sequential steps: annotation of the raw image(s) of each embryo, followed by ranking amongst a cohort of embryos. Accurate annotation enables extraction of biologically meaningful variables, either qualitative or quantitative (Liu et al., 2020a). Annotation can be derived both from static images, via microscopic observations at defined times during embryo culture (Alpha Scientists in Reproductive Medicine and ESHRE Special Interest Group of Embryology, 2011), and from morphokinetic data (i.e. TLV images captured over time in culture, which can be annotated manually). However, static assessment does not allow annotation of quantitative TLV parameters such as the time of the first cleavage to a two-cell embryo, the duration of cell cycles, and the start of blastulation (Ciray et al., 2014). A ranking model is then built by incorporating all contributing variables through reasoning, supported by either in-house studies or evidence in the literature (Alpha Scientists in Reproductive Medicine and ESHRE Special Interest Group of Embryology, 2011; Meseguer et al., 2011). In traditional embryo selection, subjectivity is expected in both the annotation step (the assessing embryologist’s identification of developmental events and/or grading of specific morphological features) and the ranking step (the prognostic weighting the assessing embryologist assigns to the features they can see), despite the approach’s high interpretability (Fig. 1: ‘Traditional’ column).
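
The two steps can be illustrated with a minimal Python sketch, assuming TLV annotation records expressed as hours post-insemination. The event names t2 to t5 and the derived intervals cc2 (t3 - t2), s2 (t4 - t3) and cc3 (t5 - t3) follow the nomenclature proposed by Ciray et al. (2014); the `Annotation` class, the `rank_cohort` helper and the toy scoring rule are hypothetical illustrations, not a validated selection model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Annotation:
    """Hypothetical step-1 record: annotated TLV time points (hours post-insemination)."""
    embryo_id: str
    t2: float  # time of division to two cells
    t3: float  # time of division to three cells
    t4: float  # time of division to four cells
    t5: float  # time of division to five cells


def derived_intervals(a: Annotation) -> Dict[str, float]:
    """Turn annotated time points into derived interval variables."""
    return {
        "cc2": a.t3 - a.t2,  # second cell cycle (duration of the two-cell stage)
        "s2": a.t4 - a.t3,   # synchrony of the second cell cycle (three-cell stage)
        "cc3": a.t5 - a.t3,  # third cell cycle
    }


def rank_cohort(cohort: List[Annotation],
                score: Callable[[Dict[str, float]], float]) -> List[str]:
    """Step 2: order embryo IDs by a scoring rule applied to the derived variables."""
    scored = [(score(derived_intervals(a)), a.embryo_id) for a in cohort]
    # Highest score first; ties broken by embryo ID so the order is reproducible.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [embryo_id for _, embryo_id in scored]


# Hypothetical example: two embryos and a toy rule favouring a shorter three-cell stage.
cohort = [
    Annotation("E2", t2=27.0, t3=38.0, t4=44.0, t5=52.0),
    Annotation("E1", t2=26.0, t3=37.0, t4=38.0, t5=50.0),
]
print(rank_cohort(cohort, score=lambda d: -d["s2"]))  # ['E1', 'E2']
```

Keeping annotation and ranking as separate functions mirrors the two-step structure: the derived variables can be inspected before any ranking rule is applied, and the rule can be replaced without touching the annotations.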

Traditional embryo selection using static images

Morphology-based embryo selection with standard incubation has historically been the predominant method of traditional embryo selection. In cleavage-stage embryos, cell number, level of fragmentation and the presence of multinucleation can be assessed (Alpha Scientists in Reproductive Medicine and ESHRE Special Interest Group of Embryology, 2011), all of which are well recognized as highly subjective, given the limited information captured by a static observation and operator subjectivity. The Gardner blastocyst grading system is a popular traditional approach that combines extended culture with static morphological grading of expansion stage, inner cell mass (ICM), and trophectoderm (TE) (Gardner and Schoolcraft, 1999). Moderate to high levels of inter- and intra-operator consistency have been observed when using this system (Sundvall et al., 2013; Storr et al., 2017).
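
Agreement between two embryologists grading the same embryos is often summarized with Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. The source does not state which statistic the cited studies used, so the Python sketch below is a generic illustration with hypothetical ICM grades.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning categorical grades to the same embryos."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same, non-empty set of embryos")
    n = len(rater_a)
    # Observed proportion of embryos given the same grade by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal grade frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single, identical grade throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ICM grades from two embryologists for the same six blastocysts.
embryologist_1 = ["A", "A", "B", "B", "C", "C"]
embryologist_2 = ["A", "B", "B", "B", "C", "A"]
print(round(cohens_kappa(embryologist_1, embryologist_2), 3))  # 0.5
```

Here the raters agree on four of six embryos (0.67), but one third agreement would be expected by chance from their grade frequencies, so kappa is a more modest 0.5.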

Although the accuracy of pregnancy prediction is lower with traditional morphology-based embryo selection than with AI-driven DSSs (VerMilyea et al., 2020; Chavez-Badiola et al., 2020a; Loewke et al., 2022), it is a globally well-established protocol that requires minimal investment and training for new embryologists. Development of advanced models (i.e. the AI-driven DSSs discussed in the sections below) may require specialist statistical input and/or capital investment if commercial third parties are engaged. Therefore, many clinics may opt to persist with morphology-only methods of embryo selection.

Traditional embryo selection using TLV images

The clinical introduction of TLV potentially decreases inter-operator subjectivity in the annotation step in comparison with static observation (Scott et al., 2021), while also enabling identification of novel viability biomarkers (Liu et al., 2020b; Bickendorf et al., 2023). Compared with static morphology, the subjectivity of the ranking step in traditional embryo selection using TLV can be further reduced by using an algorithm with a defined set of rules to determine the most viable embryo. One such approach incorporates manually annotated embryonic features into simple manual decision trees (Meseguer et al., 2011) (Fig. 1: ‘Traditional’ column). However, the subjective nature of manual annotation and the limited number of viability biomarkers available in this category form a bottleneck for further improvement in embryo selection. AI-driven DSSs have the potential to address these challenges by reducing subjectivity and helping to define more biomarkers. Furthermore, AI-driven DSSs may detect subtle or novel variables that are otherwise undetectable by the human eye.

AI-driven DSSs

In modern AI methodology, an important distinction is that DL or black-box methods are a subcategory of ML, which in turn falls within the broader umbrella of AI (Fig. 1). In the field of embryo selection, the majority of black-box methods use neural networks such as convolutional neural networks (Dimitriadis et al., 2022). Black-box DSSs, as defined in this article, receive raw imaging data as input, often in video form, and directly predict pregnancy without defining any embryonic features (cleavage timings, abnormal cleavage, etc.) (Fig. 1: ‘Black-box DSS’ column). Beyond black-box DSSs (a one-step process), black-box methods can also be included in two-step DSSs, either by automatically annotating known embryonic features (such as morphokinetics or morphology, rather than directly predicting pregnancy) in the first step, and/or by ranking embryos using the annotated features in the second step. Glass-box and matte-box DSSs may therefore also involve black-box methods (for automatic annotation in either category, and for ranking in matte-box DSSs) while still enabling sense checking to some extent.

To elaborate on two-step AI-driven DSSs, a matte-box DSS with automatic annotation combines a black-box first step with a ranking second step that also uses black-box methods (matte-box with automatic annotation, Fig. 1). Imaging data can also be manually annotated for embryonic features first, followed by a ranking step using black-box methods to predict pregnancy potential (matte-box DSS with manual annotation). Black-box ranking methods contrast with more interpretable ML methods, which have clear constraints and allow an understanding of how a model reached its output. Glass-box DSSs have an embryo ranking step conducted by interpretable ML methods and can receive either manually annotated (glass-box DSS with manual annotation) or automatically annotated embryonic features (glass-box DSS with automatic annotation).
Interpretable ML methods (those that do not involve black-box methods in the ranking step) are mostly statistically oriented, such as logistic regression, decision trees, random forests, and Bayesian networks.
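
The classification above can be restated as a small decision function that asks two questions of a published model: how are embryonic features annotated, and what kind of method ranks the embryos? This Python sketch is our own illustrative encoding of Fig. 1; the enum and function names are hypothetical, and combinations the classification does not define raise an error rather than being guessed.

```python
from enum import Enum


class AnnotationStep(Enum):
    NONE = "none"            # raw images go straight to prediction (one-step process)
    MANUAL = "manual"        # embryologist annotates embryonic features
    AUTOMATIC = "automatic"  # features annotated automatically (e.g. by black-box methods)


class RankingStep(Enum):
    EMBRYOLOGIST = "embryologist"          # human reasoning or manually defined rules
    INTERPRETABLE_ML = "interpretable ML"  # e.g. logistic regression, decision trees
    BLACK_BOX = "black-box"                # e.g. deep neural networks


def classify_dss(annotation: AnnotationStep, ranking: RankingStep) -> str:
    """Map an (annotation, ranking) pair onto the categories of Fig. 1."""
    if annotation is AnnotationStep.NONE:
        if ranking is RankingStep.BLACK_BOX:
            return "black-box DSS"
        raise ValueError("a one-step process is only defined for black-box methods")
    if ranking is RankingStep.EMBRYOLOGIST:
        if annotation is AnnotationStep.MANUAL:
            return "traditional"
        raise ValueError("automatic annotation with human ranking is not a defined category")
    box = "glass-box" if ranking is RankingStep.INTERPRETABLE_ML else "matte-box"
    return f"{box} DSS with {annotation.value} annotation"


print(classify_dss(AnnotationStep.AUTOMATIC, RankingStep.INTERPRETABLE_ML))
# glass-box DSS with automatic annotation
```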

Black-box DSS and its associated issues

With rising hopes for, and the accelerating popularity of, AI applications in embryo selection (Curchoe and Bormann, 2019), black-box DSSs have dominated recent literature in embryo selection (Dimitriadis et al., 2022). Many recently applied black-box DSSs merge the two-step embryo selection process and hence remove its inherent interpretability and explainability. Interpretability can be defined as ‘the ability to explain or present in understandable terms to a human’ (Doshi-Velez and Kim, 2017) or ‘the degree to which a human can understand the cause of a decision’ (Miller, 2019). In other words, the more interpretable a model is, the easier it is to identify cause-and-effect relationships between its inputs and outputs (Linardatos et al., 2020). For example, an ML model built on linear regression is interpretable because the trained model is an equation in which positive or negative coefficients multiply the values of the relevant covariates. Although often used interchangeably with interpretability, explainability has been defined as ‘the collection of features of the interpretable domain, that have contributed for a given example to produce a decision’ (Montavon et al., 2018). For example, a black-box model is considered explainable if the importance of the variables used as input can be quantified.
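
The explainability criterion above, quantifying the importance of each input, can be met even for an opaque model. One common model-agnostic way to do this (not a method named in the source) is permutation importance: shuffle one input column at a time and measure how much performance drops. The Python sketch below treats the model as an arbitrary callable; the toy model and data are hypothetical. By contrast, a linear model needs no such procedure, because its coefficients already state each covariate’s effect.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(predict: Callable[[List[float]], int],
                           rows: Sequence[List[float]],
                           labels: Sequence[int],
                           n_repeats: int = 20,
                           seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each input column is randomly shuffled.

    `predict` is treated as a black box: only its outputs are used.
    """
    rng = random.Random(seed)

    def accuracy(data: Sequence[List[float]]) -> float:
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[col] for r in rows]
            rng.shuffle(column)  # break the link between this input and the labels
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical opaque model that, unknown to the user, relies only on feature 0.
def opaque_model(row: List[float]) -> int:
    return int(row[0] > 0.5)


rows = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.3, 0.3], [0.7, 0.7]]
labels = [0, 1, 0, 1, 0, 1]
print(permutation_importance(opaque_model, rows, labels))  # feature 1 scores 0.0
```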

One influential black-box DSS was reported by Tran et al. (2019), showing near-perfect prediction (AUC > 0.9) of fetal heart detection using raw TLV data as input. However, its unbalanced dataset (80% of the included embryos were not transferred) and the metrics used to measure performance were subsequently criticized by different groups (Kan-Tor et al., 2020; Chavez-Badiola et al., 2020b). The authors did, rightly, point out in their ‘Discussion’ section that the poor understanding of the algorithm’s decision-making logic required further investigation (Tran et al., 2019). Other approaches have been reported to predict embryo viability using static blastocyst images captured by light microscopy or different TLV devices (VerMilyea et al., 2020; Diakiw et al., 2022). However, if a long-term bias were to arise among embryos selected by a black-box DSS, it would be difficult to detect in the short term under such a setup, before it is perpetuated and further amplified over time, especially when a surrogate short-term end point, such as ploidy or fetal heart detection (rather than live birth or healthy birth), has been used for training (Afnan et al., 2021). For example, reports have shown sex-linked morphokinetic and morphometric differences in human zygotes/embryos (Bronet et al., 2015; Orevich et al., 2022; Kilbee et al., 2023). In populations where individuals are selected by algorithms involving such features, this could lead to downstream effects. While a large, high-quality dataset is recognized as an important determinant of performance and robustness in black-box DSSs (Bormann et al., 2020; Curchoe et al., 2020; Hickman et al., 2020), more interpretable or explainable algorithms have the intrinsic advantage of minimizing such risks through early detection and correction of any foreseeable bias, as the reasoning behind the algorithm’s decisions can be pinpointed (Rudin, 2019).
Therefore, considering the important role that IVF plays in human reproduction, we emphasize that embryo selection should be highly interpretable and explainable.

From a subjectivity standpoint, black-box AI embryo selection has an advantage over traditional methods (Fig. 1: ‘Black-box DSS’ column). Fordham et al. (2022) recently reported lower reproducibility of blastocyst ranking by embryologists than by a purpose-built deep neural network model. Furthermore, Loewke et al. (2022) reported a 5–12% improvement in clinical pregnancy rate with a convolutional neural network model compared with manual blastocyst ranking. Interestingly, the authors reported that one of the 11 participating clinics, which used a low-quality microscope, did not achieve this improvement in clinical pregnancy rate, highlighting the importance of considering system-level factors in addition to embryonic data (Loewke et al., 2022). Nevertheless, it is often challenging to label all known confounders during the training of a black-box algorithm, let alone unknown confounders. This further highlights the critical role of RCTs in the effective evaluation of any new intervention before routine clinical application.
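
Reproducibility of rankings, as compared by Fordham et al. (2022), can be quantified in several ways, and the source does not specify which statistic was used. As one generic option, Kendall’s tau measures how consistently two rankers order every pair of embryos in a cohort (+1 for identical order, -1 for a fully reversed order). A minimal Python sketch for tie-free rankings, with hypothetical embryo IDs:

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(ranking_a: Sequence[str], ranking_b: Sequence[str]) -> float:
    """Kendall's tau (tau-a) between two tie-free rankings of one cohort, best first."""
    if len(ranking_a) < 2 or len(set(ranking_a)) != len(ranking_a):
        raise ValueError("need at least two distinct embryos")
    if set(ranking_a) != set(ranking_b) or len(ranking_a) != len(ranking_b):
        raise ValueError("both rankings must contain the same embryos")
    pos_a = {e: i for i, e in enumerate(ranking_a)}
    pos_b = {e: i for i, e in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        # Concordant if both rankers place x and y in the same relative order.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical rankings of the same four blastocysts by two embryologists:
# one swapped pair out of six gives (5 - 1) / 6.
print(kendall_tau(["E1", "E2", "E3", "E4"], ["E2", "E1", "E3", "E4"]))
```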

Another important issue is transferability (Meseguer and Valera, 2021): an algorithm trained under specific clinical settings may not perform well when applied in a different clinical environment (Petersen et al., 2016; Barrie et al., 2017; Liu et al., 2019; Johansen et al., 2023). An external algorithm trained elsewhere should only be introduced clinically following thorough in-house validation, as different factors can affect embryo morphokinetic profiles and endometrial receptivity. Potential contributors to the transferability issue include culture conditions (Zaninovic et al., 2013), culture media (van Duijn et al., 2022), patient population (Freour et al., 2013), and controlled ovarian stimulation regimes (Munoz et al., 2013). Commercial algorithms may benefit from access to larger datasets, allowing a better balance between transferability and performance, though large high-quality datasets are difficult to obtain (Hickman et al., 2020). There are, however, alternative paths to address the transferability issue by using more transferable qualitative TLV measures, such as abnormal cleavage (Liu et al., 2020a) or spontaneous blastocyst collapse (Bickendorf et al., 2023). Annotation of abnormal cleavage patterns is still mostly performed manually by embryologists, while newer reports, such as on spontaneous blastocyst collapse (Cimadomo et al., 2022), are starting to use AI-powered tools for automated annotation. Automated annotation deserves further attention, given its important role both in improving the consistency and accuracy of time-lapse image annotation and in maintaining the interpretability of embryo selection. This could be achieved by preserving the two-step process, as in the matte-box and glass-box DSSs promoted in this article.

Matte-box DSSs (manual or automatic annotation followed by a black-box ranking step)

A key element of a matte-box DSS in embryo selection is the clear separation of the annotation and ranking steps: annotation is completed manually or automatically, followed by a black-box ranking step (Fig. 1: ‘Matte-box DSS’ columns). This clear separation makes matte-box DSSs preferable to black-box DSSs owing to: increased interpretability through the use of defined embryonic features; the ability of users to sense check the annotations; and the possibility of applying confidence limits. Using manual annotation, Bori et al. (2020) incorporated a dozen morphokinetic parameters, both conventional and novel, into an artificial neural network to predict clinical pregnancy. However, reports of matte-box DSSs combining automated annotation and black-box ranking are currently limited. Certain software is claimed to annotate a range of TLV measures, but embryologist supervision/correction is often required, so the dividing line between manual and automated annotation is blurred in such cases.
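
One practical benefit of keeping annotations as an explicit intermediate output is that they can be sense checked before ranking. As a minimal Python sketch, assuming annotations are stored as event names mapped to hours post-insemination, the hypothetical check below flags cleavage events annotated out of chronological order, since the first time an embryo is seen with n cells cannot precede the first time it is seen with fewer cells. A real quality-control step would include many more rules.

```python
from typing import Dict, List

# Cleavage events in the order they must occur (t_n = first time n cells are seen).
CLEAVAGE_ORDER = ["t2", "t3", "t4", "t5", "t6", "t7", "t8"]


def sense_check(timings: Dict[str, float]) -> List[str]:
    """Return warnings for annotated cleavage times that are out of order."""
    present = [event for event in CLEAVAGE_ORDER if event in timings]
    warnings = []
    for earlier, later in zip(present, present[1:]):
        if timings[later] < timings[earlier]:
            warnings.append(
                f"{later} ({timings[later]} h) precedes {earlier} ({timings[earlier]} h)"
            )
    return warnings


# Hypothetical annotation with a likely data-entry error in t4.
print(sense_check({"t2": 26.1, "t3": 37.0, "t4": 28.5, "t5": 49.2}))
# ['t4 (28.5 h) precedes t3 (37.0 h)']
```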

Considering automated annotation on its own, automated segmentation of areas of interest in an embryo image has been one of the most studied areas, with the potential to reduce operator-related subjectivity. Examples include automated segmentation of the ICM (Kheradmand et al., 2017; Rad et al., 2017), TE (Rad et al., 2020), zona pellucida thickness (Yee et al., 2013; Rad et al., 2018) or a combination of the three in a blastocyst (Filho et al., 2012; Farias et al., 2023). The use of TLV further boosted advances in automated detection of pronuclei (Fukunaga et al., 2020) and subsequent embryo cleavage stages (Dirvanauskas et al., 2019; Raudonis et al., 2019). An important milestone study by Feyeux et al. (2020) automated the annotation of morphokinetic parameters ranging from early cleavage stages to the expanded blastocyst stage. This progress, however, has focused purely on annotation without involving any ranking step. More recently, Cimadomo et al. (2022) investigated blastocyst collapse in great detail by annotating the start and end times of collapse as well as the degree of blastocoel shrinkage, underlining blastocyst collapse as a potential viability marker. However, automated detection of abnormal cleavage events has progressed relatively slowly. Reverse cleavage, for example, requires close tracking of both karyokinesis and cytokinesis (Liu et al., 2014). Furthermore, research in this field is expected to require heavy involvement of embryologists to label data during algorithm training. The ground truth used in automatic annotation studies (i.e. embryologists’ consensus) is far from objective compared with fetal heart detection or live birth, the end points most commonly used. While automatic annotation can potentially reduce subjectivity, human-associated confounders could persist and thus have downstream effects on embryo ranking.

Glass-box DSSs (manual or automatic annotation followed by interpretable ML in the ranking step)

In the glass-box DSS category, images (static or TLV) can be annotated manually or automatically, followed by a ranking step via interpretable ML methods such as Bayesian networks, ML decision trees or multivariate logistic regression (Fig. 1: ‘Glass-box DSS’ columns). For the embryo ranking step, ML methods outside DL are more interpretable, as they clearly show the variables involved and the weighting of each. Bayesian networks, a type of probabilistic graphical model, allow coherent inference by computing probabilities for variables in the network given the known values of others, and have been used in embryo selection (Morales et al., 2008; Hernandez-Gonzalez et al., 2018). With a large multi-center dataset, Petersen et al. (2016) used an ML decision tree to rank embryos into five groups according to their implantation potential. Decision trees are considered interpretable and explainable because their tree-like structure and explicit decision rules show how the model reached its output.

Multivariate logistic regression likewise provides coefficients that indicate the relative weighting of each variable for a given outcome. As mentioned previously, the Gardner blastocyst grading system is considered a traditional embryo selection method. However, varying blastocyst observation timings between laboratories proved challenging to control when linking static morphology features to treatment outcomes, given the dynamic nature of embryo development (Liu et al., 2022). Using multivariate logistic regression to calculate a weighting for each contributing variable, the same study presented a numerical scoring system that could potentially minimize operator-related subjectivity at the ranking step (Liu et al., 2022). Four variables were entered into the logistic regression, and the resulting coefficients were used as weightings for each variable. Owing to these calculated weightings, the model is interpretable and can result in less disagreement between embryologists. More recently, following manual morphokinetic annotation, Bamford et al. (2023) found that a logistic regression model performed best amongst 12 different models for blastocyst ploidy prediction. The other 11 models included black-box methods, reinforcing that further performance comparisons amongst matte-box, glass-box, and black-box DSSs are warranted.
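
How fitted logistic-regression coefficients become a transparent score can be shown in a few lines. In this Python sketch the four variable names, their numeric coding and the coefficient values are hypothetical placeholders, not those of Liu et al. (2022); in practice they would be estimated from outcome data. Each variable’s contribution to the log-odds can be read off directly, which is what opens the ranking step to sense checking.

```python
import math
from typing import Dict

# Hypothetical fitted model: intercept and one coefficient (log-odds per unit) per variable.
INTERCEPT = -1.2
COEFFICIENTS = {
    "expansion_grade": 0.40,         # e.g. Gardner expansion stage 1-6
    "icm_grade": 0.55,               # e.g. A/B/C coded as 3/2/1
    "te_grade": 0.70,                # e.g. A/B/C coded as 3/2/1
    "hours_to_blastulation": -0.05,  # later blastulation lowers the score
}


def contributions(features: Dict[str, float]) -> Dict[str, float]:
    """Per-variable contribution to the log-odds: coefficient times value."""
    return {name: coef * features[name] for name, coef in COEFFICIENTS.items()}


def score(features: Dict[str, float]) -> float:
    """Predicted probability of the outcome via the logistic function."""
    log_odds = INTERCEPT + sum(contributions(features).values())
    return 1.0 / (1.0 + math.exp(-log_odds))


embryo = {"expansion_grade": 4, "icm_grade": 3, "te_grade": 2, "hours_to_blastulation": 100}
print(contributions(embryo))  # shows which variables pushed the score up or down
print(round(score(embryo), 3))
```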

By incorporating automatically annotated features with minimized operator-related subjectivity, glass-box DSSs have the potential to achieve optimal interpretability and robustness. Khosravi et al. (2019) used deep neural networks to automate the classification of TLV blastocyst images, benchmarked against five experienced embryologists, and achieved near-perfect performance (AUC > 0.98). In this study, blastocyst images were labeled by embryologist majority voting (blastocyst quality determined by agreement of at least three of the five embryologists) (Khosravi et al., 2019). This was followed by an ML decision tree to predict live birth.
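
The consensus labelling rule described above (a grade is accepted when at least three of five embryologists agree) can be written as a short Python function. The function name and the choice to return None, so that images without consensus can be excluded or reviewed, are assumptions for illustration rather than details reported by Khosravi et al. (2019).

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(votes: Sequence[str], min_agree: int = 3) -> Optional[str]:
    """Return the grade given by at least `min_agree` embryologists, else None."""
    if not votes:
        return None
    grade, count = Counter(votes).most_common(1)[0]
    return grade if count >= min_agree else None


# Hypothetical votes from five embryologists for three blastocyst images.
image_votes = {
    "img_01": ["good", "good", "poor", "good", "fair"],
    "img_02": ["good", "fair", "poor", "poor", "fair"],
    "img_03": ["poor", "poor", "poor", "poor", "fair"],
}
labels = {img: consensus_label(v) for img, v in image_votes.items()}
print(labels)  # {'img_01': 'good', 'img_02': None, 'img_03': 'poor'}
```

With five raters and a threshold of three, at most one grade can reach the threshold, so the rule never has to break ties.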

A glass-box DSS was recently reported based on a day 3 embryo dataset, involving only two automatically annotated morphokinetic measures, namely the durations of the two- and three-cell stages (Valera et al., 2023). These measures were then input into an ML decision-tree-based ranking model yielding five categories. The same process could be replicated with further parameters once they can be automatically annotated. Future glass-box DSSs involving an expanded panel of automatically annotated embryonic parameters and interpretable ML models are warranted, given the balance they strike between interpretability, explainability, and subjectivity in comparison with other approaches.
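
A decision tree of this kind reduces to nested threshold rules that an embryologist can follow by hand. The Python sketch below only illustrates that structure: the cut-offs, the order of the splits and the category labels A to E are hypothetical and are not the model published by Valera et al. (2023); a real tree would learn its splits from outcome data.

```python
def tree_category(cc2: float, s2: float) -> str:
    """Assign one of five hypothetical categories (A best ... E worst).

    cc2: duration of the two-cell stage in hours (t3 - t2)
    s2:  duration of the three-cell stage in hours (t4 - t3)
    All cut-offs below are illustrative placeholders.
    """
    if s2 > 5.0:        # prolonged three-cell stage
        return "E"
    if cc2 < 8.0:       # unusually short two-cell stage
        return "D"
    if cc2 > 14.0:      # unusually long two-cell stage
        return "C"
    if s2 <= 1.0:       # near-synchronous second cleavage
        return "A"
    return "B"


for cc2, s2 in [(11.0, 0.5), (11.0, 3.0), (16.0, 0.5), (6.0, 0.5), (11.0, 7.0)]:
    print(cc2, s2, tree_category(cc2, s2))
```

Every category can be traced to at most four explicit comparisons, which is the sense in which such trees are interpretable.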

Alternative paths to enhanced interpretability and explainability

Several groups have recently attempted to understand how black-box DSSs arrive at their outputs from embryo imaging data (Enatsu et al., 2022), which is a potential way to improve explainability (Curchoe et al., 2020). Studies have used gradient-weighted class activation mapping (Grad-CAM) (Enatsu et al., 2022; Loewke et al., 2022) or attribution algorithms, including integrated gradients and occlusion maps (Loewke et al., 2022). Both techniques enable visualization of the areas of the embryo that contribute most to the predicted outcome. Although these studies do not achieve complete interpretability because of the models’ black-box nature, they provide important insights for future studies, especially when exploring novel biomarkers of embryo viability. Nevertheless, the absence of a clear border between the annotation and ranking steps hampers their development toward further interpretability.
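
Occlusion maps, one of the attribution approaches mentioned above, can be sketched without any deep-learning library: slide a masking patch over the image, re-score it, and record how much the model’s output drops. The “model” in this Python sketch is a hypothetical stand-in function on a tiny grayscale grid; a real application would call a trained network on a full embryo image.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(image: Image, score: Callable[[Image], float],
                  patch: int = 2, fill: float = 0.0) -> Image:
    """Heat map of the score drop when each patch x patch region is masked with `fill`.

    Larger values mark regions the model relies on more for its prediction.
    """
    height, width = len(image), len(image[0])
    baseline = score(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the original is untouched
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - score(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


# Hypothetical stand-in for a trained network: it only "looks at" the top-left 2x2 region.
def toy_model(img: Image) -> float:
    return sum(img[r][c] for r in range(2) for c in range(2)) / 4


uniform_image = [[1.0] * 4 for _ in range(4)]
for row in occlusion_map(uniform_image, toy_model):
    print(row)
```

The resulting map highlights only the top-left region, revealing where the toy model “looks” without explaining why, which mirrors the partial explainability described above.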

Combining clinical parameters with embryo morphological data to train an algorithm is another route toward improved explainability. Such AI-driven DSSs combine a black-box DSS (one-step process) that analyzes the embryos with a more interpretable ML model that considers the patient’s clinical values, to output the final embryo ranking. Enatsu et al. (2022) reported a hybrid AI-driven DSS combining static blastocyst image data with clinical parameters including female age, pregnancy history, and hormonal levels. The hybrid model gave a 4.5% rise in AUC for prediction of clinical pregnancy compared with image data alone. Similarly, using a black-box method and TLV data, Duval et al. (2023) demonstrated superior predictive performance with a hybrid model, in which clinical pregnancy prediction was improved by further incorporating clinical parameters such as oocyte age, total gonadotrophin dose, number of embryos generated, and endometrial thickness. Although adding meaningful clinical parameters to these algorithms raises their overall interpretability and explainability, the embryonic elements of these models remain uninterpretable. Hybrid models highlight that the embryo alone may not be the best predictor of treatment outcome and that the confounding impact of clinical factors should also be considered.
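
A hybrid model of this kind can be sketched as an interpretable logistic layer that takes a black-box image score as one input alongside clinical variables. Everything in this Python sketch is hypothetical: the image score is assumed to come from some pretrained network, and the variable names and coefficients are placeholders rather than those of Enatsu et al. (2022) or Duval et al. (2023).

```python
import math

# Hypothetical coefficients (log-odds per unit); not taken from any published model.
INTERCEPT = 1.0
W_IMAGE = 2.0         # weight on the black-box image score, assumed to lie in [0, 1]
W_AGE = -0.08         # per year of female age
W_ENDOMETRIUM = 0.10  # per mm of endometrial thickness


def hybrid_probability(image_score: float, female_age: float,
                       endometrial_thickness_mm: float) -> float:
    """Logistic layer combining an opaque image score with readable clinical terms."""
    log_odds = (INTERCEPT
                + W_IMAGE * image_score
                + W_AGE * female_age
                + W_ENDOMETRIUM * endometrial_thickness_mm)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Same embryo image score, two hypothetical patients.
print(round(hybrid_probability(0.8, 32, 9.0), 3))
print(round(hybrid_probability(0.8, 41, 9.0), 3))
```

Because the clinical terms are shared by all embryos of one patient, in this additive form they shift every embryo’s predicted probability in the same direction without reordering that patient’s cohort; the within-cohort ranking still rests on the opaque image score, consistent with the point above that the embryonic elements remain uninterpretable.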

Current challenges and future perspectives

In this article, we reviewed different embryo selection approaches from an embryology perspective. We included examples of published studies (Supplementary Table S1) to help define each category and to differentiate between studies. The distribution of available studies across categories was, however, skewed, with matte-box approaches the least represented; this is not apparent in our table because we did not perform a systematic review. Based on our findings, we call for more research in the following areas: development of more glass-box DSSs to increase the interpretability of AI-driven DSSs in general; more evidence from performance comparisons between different methodologies (including approaches such as logistic regression); comprehensive performance metric analyses when measuring algorithm performance, as proposed by Riegler et al. (2021); automation of TLV parameter annotation (preferably qualitative TLV measures, for better transferability between laboratories) coupled with advanced ranking methods; RCTs to provide robust validation of any new algorithm; and creation of a publicly available open repository of a diverse, high-quality, large embryo image dataset for algorithm development, validation and benchmarking, to improve standardization and robustness.
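
Reporting several complementary metrics, rather than a single AUC, is one of the recommendations above. The Python sketch below computes sensitivity, specificity, precision, F1 and a rank-based AUC from predicted probabilities; the 0.5 threshold and the example data are hypothetical, and it does not reproduce the specific metric set recommended by Riegler et al. (2021).

```python
from typing import Dict, Sequence


def _ratio(numerator: float, denominator: float) -> float:
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Threshold-based metrics plus a rank-based AUC for a binary outcome."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    # AUC: probability that a random positive outscores a random negative (ties = 0.5).
    pos = [p for y, p in zip(y_true, y_prob) if y]
    neg = [p for y, p in zip(y_true, y_prob) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "auc": _ratio(wins, len(pos) * len(neg)),
    }


# Hypothetical outcomes (1 = pregnancy) and model probabilities for six transfers.
print(binary_metrics([1, 1, 0, 0, 0, 0], [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]))
# {'sensitivity': 0.5, 'specificity': 0.75, 'precision': 0.5, 'f1': 0.5, 'auc': 0.875}
```

In this toy example the AUC looks strong (0.875) while sensitivity at the chosen threshold is only 0.5, the kind of trade-off a single headline metric would hide.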

Conclusion

Traditional embryo selection involves two steps, namely embryo annotation (static image or morphokinetics) and ranking amongst a cohort of embryos, and is therefore highly interpretable, although its intrinsic subjectivity hampers standardization. In black-box DSSs, the border between the two steps is absent. In glass-box and matte-box DSSs, the annotation step can either be completed manually by a human or executed automatically by black-box methods, and the ranking step uses more interpretable ML and black-box methods, respectively. The performance advantage of black-box DSSs has recently been challenged, and glass-box DSSs are more interpretable than black-box DSSs owing to the use of interpretable ML methods in the ranking step. Successful implementation of glass-box DSSs will require close interdisciplinary collaboration, in which embryologists play a vital role not only in identifying more biologically meaningful embryonic features but also in assisting with training the automatic annotation of these features. Increasing awareness of black-box issues amongst IVF professionals, patients, software and device developers, and regulatory bodies would foster a shift to more interpretable and explainable glass-box DSSs.

Data availability

All data generated in this study are included in the full text and Supplementary Table S1.

Acknowledgements

The authors thank the embryology team at Fertility North for their kind support on this work.

Authors’ roles

T.L. and Y.L. conceived this study and drafted the initial manuscript. T.L., J.N., V.C., and Y.L. contributed to the final manuscript and approved the final version for submission.

Funding

There is no external funding to report.

Conflict of interest

All authors report no conflict of interest.

References

Afnan MAM, Liu Y, Conitzer V, Rudin C, Mishra A, Savulescu J, Afnan M. Interpretable, not black-box, artificial intelligence should be used for embryo selection. Hum Reprod Open 2021;2021:hoab040.

Alpha Scientists in Reproductive Medicine and ESHRE Special Interest Group of Embryology. The Istanbul consensus workshop on embryo assessment: proceedings of an expert meeting. Hum Reprod 2011;26:1270–1283.

Bamford T, Easter C, Montgomery S, Smith R, Dhillon-Smith RK, Barrie A, Campbell A, Coomarasamy A. A comparison of 12 machine learning models developed to predict ploidy, using a morphokinetic meta-dataset of 8147 embryos. Hum Reprod 2023;38:569–581.

Barrie A, Homburg R, McDowell G, Brown J, Kingsland C, Troup S. Examining the efficacy of six published time-lapse imaging embryo selection algorithms to predict implantation to demonstrate the need for the development of specific, in-house morphokinetic selection algorithms. Fertil Steril 2017;107:613–621.

Bickendorf K, Qi F, Peirce K, Natalwala J, Chapple V, Liu Y. Spontaneous collapse as a prognostic marker for human blastocysts: a systematic review and meta-analysis. Hum Reprod 2023;38:1891–1900.

Bori L, Paya E, Alegre L, Viloria TA, Remohi JA, Naranjo V, Meseguer M. Novel and conventional embryo parameters as input data for artificial neural networks: an artificial intelligence model applied for prediction of the implantation potential. Fertil Steril 2020;114:1232–1241.

Bormann CL, Thirumalaraju P, Kanakasabapathy MK, Kandula H, Souter I, Dimitriadis I, Gupta R, Pooniwala R, Shafiee H. Consistency and objectivity of automated embryo assessments using deep neural networks. Fertil Steril 2020;113:781–787.e1.

Bronet F, Nogales MC, Martinez E, Ariza M, Rubio C, Garcia-Velasco JA, Meseguer M. Is there a relationship between time-lapse parameters and embryo sex? Fertil Steril 2015;103:396–401.e2.

Chavez-Badiola A, Flores-Saiffe-Farias A, Mendizabal-Ruiz G, Drakeley AJ, Cohen J. Embryo Ranking Intelligent Classification Algorithm (ERICA): artificial intelligence clinical assistant predicting embryo ploidy and implantation. Reprod Biomed Online 2020a;41:585–593.

Chavez-Badiola A, Mendizabal-Ruiz G, Flores-Saiffe Farias A, Garcia-Sanchez R, Drakeley AJ. Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum Reprod 2020b;35:482.

Cimadomo D, Marconetto A, Trio S, Chiappetta V, Innocenti F, Albricci L, Erlich I, Ben-Meir A, Har-Vardi I, Kantor B et al. Human blastocyst spontaneous collapse is associated with worse morphological quality and higher degeneration and aneuploidy rates: a comprehensive analysis standardized through artificial intelligence. Hum Reprod 2022;37:2291–2306.

Ciray HN, Campbell A, Agerholm IE, Aguilar J, Chamayou S, Esbert M, Sayed S; Time-Lapse User Group. Proposed guidelines on the nomenclature and annotation of dynamic human embryo monitoring by a time-lapse user group. Hum Reprod 2014;29:2650–2660.

Curchoe CL, Bormann CL. Artificial intelligence and machine learning for human reproduction and embryology presented at ASRM and ESHRE 2018. J Assist Reprod Genet 2019;36:591–600.

Curchoe CL, Malmsten J, Bormann C, Shafiee H, Flores-Saiffe Farias A, Mendizabal G, Chavez-Badiola A, Sigaras A, Alshubbar H, Chambost J et al. Predictive modeling in reproductive medicine: Where will the future of artificial intelligence research take us? Fertil Steril 2020;114:934–940.

Diakiw SM, Hall JMM, VerMilyea MD, Amin J, Aizpurua J, Giardini L, Briones YG, Lim AYX, Dakka MA, Nguyen TV et al. Development of an artificial intelligence model for predicting the likelihood of human embryo euploidy based on blastocyst images from multiple imaging systems during IVF. Hum Reprod 2022;37:1746–1759.

Dimitriadis I, Zaninovic N, Badiola AC, Bormann CL. Artificial intelligence in the embryology laboratory: a review. Reprod Biomed Online 2022;44:435–448.

Dirvanauskas D, Maskeliūnas R, Raudonis V, Damaševičius R, Scherer R. Hemigen: human embryo image generator based on generative adversarial networks. Sensors 2019;19:3578.

Doshi-Velez F, Kim B. Towards a rigorous science of interpretable machine learning. arXiv:1702.08608, 2017, preprint: not peer reviewed.

Duval A, Nogueira D, Dissler N, Maskani Filali M, Delestro Matos F, Chansel-Debordeaux L, Ferrer-Buitrago M, Ferrer E, Antequera V, Ruiz-Jorro M et al. A hybrid artificial intelligence model leverages multi-centric clinical data to improve fetal heart rate pregnancy prediction across time-lapse systems. Hum Reprod 2023;38:596–608.

Enatsu N, Miyatsuka I, An LM, Inubushi M, Enatsu K, Otsuki J, Iwasaki T, Kokeguchi S, Shiotani M. A novel system based on artificial intelligence for predicting blastocyst viability and visualizing the explanation. Reprod Med Biol 2022;21:e12443.

Farias AF-S, Chavez-Badiola A, Mendizabal-Ruiz G, Valencia-Murillo R, Drakeley A, Cohen J, Cardenas-Esparza E. Automated identification of blastocyst regions at different development stages. Sci Rep 2023;13:15.

Feyeux M, Reignier A, Mocaer M, Lammers J, Meistermann D, Barrière P, Paul-Gilloteaux P, David L, Fréour T. Development of automated annotation software for human embryo morphokinetics. Hum Reprod 2020;35:557–564.

Filho ES, Noble JA, Poli M, Griffiths T, Emerson G, Wells D. A method for semi-automatic grading of human blastocyst microscope images. Hum Reprod 2012;27:2641–2648.

Fordham DE, Rosentraub D, Polsky AL, Aviram T, Wolf Y, Perl O, Devir A, Rosentraub S, Silver DH, Gold Zamir Y et al. Embryologist agreement when assessing blastocyst implantation probability: is data-driven prediction the solution to embryo assessment subjectivity? Hum Reprod 2022;37:2275–2290.

Freour T, Dessolle L, Lammers J, Lattes S, Barriere P. Comparison of embryo morphokinetics after in vitro fertilization-intracytoplasmic sperm injection in smoking and nonsmoking women. Fertil Steril 2013;99:1944–1950.

Fukunaga N, Sanami S, Kitasaka H, Tsuzuki Y, Watanabe H, Kida Y, Takeda S, Asada Y. Development of an automated two pronuclei detection system on time-lapse embryo images using deep learning techniques. Reprod Med Biol 2020;19:286–294.

Gardner DK, Meseguer M, Rubio C, Treff NR. Diagnosis of human preimplantation embryo viability. Hum Reprod Update 2015;21:727–747.

Gardner DK, Schoolcraft WB. Culture and transfer of human blastocysts. Curr Opin Obstet Gynecol 1999;11:307–311.

Hernandez-Gonzalez J, Inza I, Crisol-Ortiz L, Guembe MA, Inarra MJ, Lozano JA. Fitting the data from embryo implantation prediction: learning from label proportions. Stat Methods Med Res 2018;27:1056–1066.

Hickman CFL, Alshubbar H, Chambost J, Jacques C, Pena C-A, Drakeley A, Freour T. Data sharing: using blockchain and decentralized data technologies to unlock the potential of artificial intelligence: what can assisted reproduction learn from other areas of medicine? Fertil Steril 2020;114:927–933.

Johansen M, Kato K, Ueno S, Palm S, Kernbach M, Basak B, Keles I, Gabrielsen A, Iversen L, Berntsen J. O-242 Comparing the performance of an artificial intelligence model for predicting embryo implantation between clinics with patient cohorts of different maternal age distributions. Hum Reprod 2023;38:dead093.296.

Kan-Tor Y, Ben-Meir A, Buxboim A. Can deep learning automatically predict fetal heart pregnancy with almost perfect accuracy? Hum Reprod 2020;35:1473.

Kheradmand S, Singh A, Saeedi P, Au J, Havelock J. Inner cell mass segmentation in human HMC embryo images using fully convolutional network. In: 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers, 2017, 1752–1756.

Khosravi P, Kazemi E, Zhan Q, Malmsten JE, Toschi M, Zisimopoulos P, Sigaras A, Lavery S, Cooper LAD, Hickman C et al. Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization. NPJ Digit Med 2019;2:21.

Kilbee M, Del Canto Cano A, Clark G, Coudereau C, Fernandez Marcos E, Babariya D, Wells D. O-004 Male and female blastocysts display differences in development such that embryos assigned an identical morphological grade may have differing viability dependent on their sex. Hum Reprod 2023;38:dead093.004.

Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy 2020;23:18.

Liu Y, Chapple V, Feenan K, Roberts P, Matson P. Time-lapse deselection model for human day 3 in vitro fertilization embryos: the combination of qualitative and quantitative measures of embryo growth. Fertil Steril 2016;105:656–662.e1.

Liu Y, Chapple V, Roberts P, Matson P. Prevalence, consequence, and significance of reverse cleavage by human embryos viewed with the use of the embryoscope time-lapse video system. Fertil Steril 2014;102:1295–1300.e2.

Liu Y, Feenan K, Chapple V, Matson P. Assessing efficacy of day 3 embryo time-lapse algorithms retrospectively: impacts of dataset type and confounding factors. Hum Fertil (Camb) 2019;22:182–190.

Liu Y, Ong K, Korman I, Turner R, Shaker D, Zander-Fox D, Rombauts L. The effect of day 5 blastocyst assessment timing on live birth prediction and development of a prediction algorithm. Reprod Biomed Online 2022;44:609–616.

Liu Y, Qi F, Matson P, Morbeck DE, Mol BW, Zhao S, Afnan M. Between-laboratory reproducibility of time-lapse embryo selection using qualitative and quantitative parameters: a systematic review and meta-analysis. J Assist Reprod Genet 2020a;37:1295–1302.

Liu Y, Sakkas D, Afnan M, Matson P. Time-lapse videography for embryo selection/de-selection: a bright future or fading star? Hum Fertil (Camb) 2020b;23:76–82.

Loewke K, Cho JH, Brumar CD, Maeder-York P, Barash O, Malmsten JE, Zaninovic N, Sakkas D, Miller KA, Levy M et al. Characterization of an artificial intelligence model for ranking static images of blastocyst stage embryos. Fertil Steril 2022;117:528–535.

Meseguer M, Herrero J, Tejera A, Hilligsoe KM, Ramsing NB, Remohi J. The use of morphokinetics as a predictor of embryo implantation. Hum Reprod 2011;26:2658–2671.

Meseguer M, Valera MA. The journey toward personalized embryo selection algorithms. Fertil Steril 2021;115:898–899.

Miller T. Explanation in artificial intelligence: insights from the social sciences. Artif Intell 2019;267:1–38.

Montavon G, Samek W, Müller K-R. Methods for interpreting and understanding deep neural networks. Digit Signal Process 2018;73:1–15.

Morales DA, Bengoetxea E, Larranaga P, Garcia M, Franco Y, Fresnada M, Merino M. Bayesian classification for the selection of in vitro human embryos using morphological and clinical data. Comput Methods Programs Biomed 2008;90:104–116.

Munoz M, Cruz M, Humaidan P, Garrido N, Perez-Cano I, Meseguer M. The type of GnRH analogue used during controlled ovarian stimulation influences early embryo developmental kinetics: a time-lapse study. Eur J Obstet Gynecol Reprod Biol 2013;168:167–172.

Orevich LS, Watson K, Ong K, Korman I, Turner R, Shaker D, Liu Y. Morphometric and morphokinetic differences in the sperm- and oocyte-originated pronuclei of male and female human zygotes: a time-lapse study. J Assist Reprod Genet 2022;39:97–106.

Petersen BM, Boel M, Montag M, Gardner DK. Development of a generally applicable morphokinetic algorithm capable of predicting the implantation potential of embryos transferred on Day 3. Hum Reprod 2016;31:2231–2244.

Rad RM, Saeedi P, Au J, Havelock J. Coarse-to-fine texture analysis for inner cell mass identification in human blastocyst microscopic images. In: 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), Montreal, Canada. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers, 2017, 1–5.

Rad RM, Saeedi P, Au J, Havelock J. Human blastocyst's zona pellucida segmentation via boosting ensemble of complementary learning. Informatics Med Unlocked 2018;13:112–121.

Rad RM, Saeedi P, Au J, Havelock J. Trophectoderm segmentation in human embryo images via inceptioned U-Net. Med Image Anal 2020;62:101612.

Raudonis V, Paulauskaite-Taraseviciene A, Sutiene K, Jonaitis D. Towards the automation of early-stage human embryo development detection. Biomed Eng Online 2019;18:120.

Riegler MA, Stensen MH, Witczak O, Andersen JM, Hicks SA, Hammer HL, Delbarre E, Halvorsen P, Yazidi A, Holst N et al. Artificial intelligence in the fertility clinic: status, pitfalls and possibilities. Hum Reprod 2021;36:2429–2442.

Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 2019;1:206–215.

Scott N, Barrie A, Smith R, Best L, Davis N, Duffy S, Krokos S, Lodge Y, Montgomery S, O’Boyle S. P-792 The introduction of an embryo morphokinetics annotation quality assurance scheme across ten fertility clinics including 59 participants. Hum Reprod 2021;36:deab130.791.

Storr A, Venetis CA, Cooke S, Kilani S, Ledger W. Inter-observer and intra-observer agreement between embryologists during selection of a single day 5 embryo for transfer: a multicenter study. Hum Reprod 2017;32:307–314.

Sundvall L, Ingerslev HJ, Breth Knudsen U, Kirkegaard K. Inter- and intra-observer variability of time-lapse annotations. Hum Reprod 2013;28:3215–3221.

Tran D, Cooke S, Illingworth PJ, Gardner DK. Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum Reprod 2019;34:1011–1018.

Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225–1231.

Valera MA, Aparicio-Ruiz B, Perez-Albala S, Romany L, Remohi J, Meseguer M. Clinical validation of an automatic classification algorithm applied on cleavage stage embryos: analysis for blastulation, euploidy, implantation, and live-birth potential. Hum Reprod 2023;38:1060–1075.

van Duijn L, Rousian M, Kramer CS, van Marion ES, Willemsen SP, Speksnijder JP, Laven JSE, Steegers-Theunissen RPM, Baart EB. The impact of culture medium on morphokinetics of cleavage stage embryos: an observational study. Reprod Sci 2022;29:2179–2189.

VerMilyea M, Hall J, Diakiw S, Johnston A, Nguyen T, Perugini D, Miller A, Picou A, Murphy A, Perugini M. Development of an artificial intelligence-based assessment model for prediction of embryo viability using static images captured by optical light microscopy during IVF. Hum Reprod 2020;35:770–784.

Yee D, Saeedi P, Havelock J. An automatic model-based approach for measuring the zona pellucida thickness in day five human blastocysts. In: Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV), Las Vegas, USA. Las Vegas, USA: CSREA Press, 2013, 1.

Zaninovic N, Goldschlag J, Yin H, Ye Z, Clarke R, Rosenwaks Z. Impact of oxygen concentration on embryo development, embryo morphology and morphokinetics. Fertil Steril 2013;100:S240.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data