Bardia Khosravi, Lainey G Bukowiec, John P Mickley, Jacob F Oeding, Pouria Rouzrokh, Bradley J Erickson, Rafael J Sierra, Michael J Taunton, Emmanouil Grigoriou, Cody C Wyles, Characterizing hip joint morphology using a multitask deep learning model, Journal of Hip Preservation Surgery, Volume 12, Issue 1, January 2025, Pages 27–32, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/jhps/hnae041
Abstract
Deep learning (DL) is revolutionizing medical imaging analysis by enabling the classification of various pathoanatomical conditions at scale. However, few accurate and efficient machine learning (ML) algorithms have been developed for the diagnostic workup of morphological hip pathologies, including developmental dysplasia of the hip and femoroacetabular impingement. The current study reports the performance of a novel ML model, built on YOLOv5 and ConvNeXt-Tiny architectures, in predicting the morphological features of these conditions, including cam deformity, ischial spine sign, dysplastic appearance, and other abnormalities. The model achieved 78.0% accuracy for detecting cam deformity, 87.2% for ischial spine sign, 76.6% for dysplasia, and 71.6% for all abnormalities combined. It achieved an area under the receiver operating characteristic curve (AUROC) of 0.89 for ischial spine sign, 0.80 for cam deformity, 0.80 for dysplasia, and 0.81 for all abnormalities combined. Inter-rater agreement among surgeons, assessed using Gwet’s AC1, was substantial for dysplasia (0.83) and all abnormalities (0.88), and moderate for ischial spine sign (0.75) and cam deformity (0.61).
Introduction
Abnormal hip morphology is an important risk factor for hip pain, functional decline, and osteoarthritis (OA) development [1]. Thorough evaluation of the radiographic properties of the hip joint is essential for diagnosis, treatment planning, and lifelong prognostication. Computer-assisted diagnosis has recently advanced through improvements in image processing, computational power, computer vision, and machine learning techniques [2]. It has immense potential to improve the recognition of morphological hip derangements, particularly when deep learning (DL) is employed. Advances in ML have gained traction as these methods continue to excel in medical imaging recognition tasks [3–6]. However, few of these technologies have been applied to computer vision analysis of radiographs displaying femoroacetabular impingement (FAI) and developmental dysplasia of the hip (DDH) [7–11].
Radiography, along with clinical examination, has traditionally been used to diagnose morphological pathologies including FAI and DDH. Anteroposterior (AP) pelvic radiographs serve as the initial imaging workup in individuals >6 months [12]. Development and ossification of the femur and acetabulum can be assessed in addition to pertinent radiographic features including center-edge angle, acetabular index, head–neck offset ratio, relationship to Hilgenreiner’s, Perkin’s and Shenton’s lines, and ischial spine sign.
FAI can occur secondary to three distinct phenotypes: cam morphology, pincer morphology, and mixed cam and pincer morphology. Cam deformity is represented by asphericity of the femoral head. Pincer morphology, involving over-coverage of the anterosuperior acetabulum, may be identified by observing a center-edge angle >40°, acetabular index <0°, or the presence of crossover or posterior wall signs. A positive ischial spine sign, representing the protrusion of the ischial spine medial to the iliopectineal line on standard AP radiograph, also indicates acetabular retroversion and is seen with pincer morphology. In both cam and pincer morphology, abnormal morphology of the hip joint causes pain and functional impairment typically in adolescents and young adults.
DDH, a disorder caused by abnormal development secondary to capsular laxity and mechanical instability, can cause radiographic findings ranging from dysplasia to hip dislocation. DDH is typically characterized by anterior or lateral center-edge angles <20°, Tonnis angles >10°, and acetabular index <0°.
Although diagnosing and characterizing hip morphological abnormalities increasingly rely on 3D imaging modalities, plain radiographs remain the gold standard for screening and initial workup. Expert classification of these abnormalities has shown poor inter- and intra-rater reliability in prior work [13]. Thus, the purpose of this study was to evaluate the ability of DL models to classify basic hip morphological abnormalities compared to human expert evaluation. The authors hypothesize that performance on such a task would be moderate given the difficulty in establishing a consensus human-driven ground truth in a dataset, which would be important to establish as DL models become more prevalent in hip preservation investigations and clinical care.
Materials and methods
Dataset description
The study received institutional review board approval. A total of 500 patients <50 years old at the time of their total hip arthroplasty (THA) were retrospectively reviewed. Cases were selected from an extensive institutional registry including all joint replacement operations since 1969 [14]. Patient radiographs from 2001 to the present were included. The dataset was partitioned at the patient level into a 4:1 ratio, resulting in 400 cases allocated to the training and tuning set and 100 cases to the final test set.
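As a minimal illustration of this patient-level split, the sketch below uses scikit-learn's GroupShuffleSplit to keep all radiographs from a given patient on the same side of the 4:1 partition. The file name and column names are hypothetical; this is not the authors' code.

```python
# Illustrative sketch (not the authors' code): a patient-level 4:1 split so that
# no patient contributes images to both the training/tuning set and the test set.
# "registry_cases.csv" and its column names are hypothetical.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

cases = pd.read_csv("registry_cases.csv")  # one row per radiograph, with a patient_id column

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(cases, groups=cases["patient_id"]))

train_tune = cases.iloc[train_idx]  # ~400 patients for training and 10-fold cross-validation
test = cases.iloc[test_idx]         # ~100 patients held out for the final evaluation

# Sanity check: no patient appears in both partitions.
assert set(train_tune["patient_id"]).isdisjoint(set(test["patient_id"]))
```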
An orthopedic surgery resident and a medical student with 2 years of orthopedic surgery research experience (J.M., J.O.) annotated the training and testing datasets. They underwent three 1-h training sessions with two fellowship-trained, board-certified hip surgeons (C.W., E.G.). Annotations with discrepancies were adjudicated by the supervising surgeons. The annotations identified four specific characteristics, with laterality recorded for each: (i) ischial spine sign; (ii) femoral head cam deformity; (iii) hip dysplasia; and (iv) any abnormalities, encompassing the previous three categories along with other potential morphological irregularities. Annotations were made at the discretion of each evaluator, without instruction to attend to a specific angle or metric, so that the model would mimic the classification process of a human evaluator. The testing dataset was annotated by an orthopedic surgery resident and a fellowship-trained, board-certified hip surgeon (L.B., E.G.).
Joint localization
A previously validated object detection model employing YOLOv5 architecture was utilized to localize regions of interest (ROIs) within AP pelvic radiographs [15]. It was designed to identify laterality and presence of orthopedic hardware [16].
The model was trained on a curated dataset of labeled AP radiographs. Each image in the dataset was annotated with bounding boxes around the joint area, considering critical anatomical landmarks, including the greater and lesser trochanters, the pubic symphysis, and the acetabular sourcil to ensure precise localization. Upon training, the model outputs four coordinates that define a bounding box encompassing the joint and specifies laterality and presence of hardware. Boundaries of the box align with the bottom of the lesser trochanter inferiorly, the top of the greater trochanter superiorly, the pubic symphysis medially, and the lateral border of the greater trochanter laterally.
Joint ROIs that contained hardware were removed. In AP pelvic radiographs in which hardware was present in one joint, that joint was excluded from analysis and the contralateral joint, if free of hardware, was included in the analysis. This approach ensured a focus on native joint morphology.
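The sketch below illustrates how a YOLOv5 detector of this kind could be used to crop hip ROIs and exclude joints containing hardware. It relies on the public ultralytics/yolov5 torch.hub interface; the weight file name and class labels are assumptions, as the published localization model is not distributed in this form.

```python
# Illustrative sketch: crop hip-joint ROIs with a YOLOv5 detector and discard joints
# containing hardware. The weight file and class names are hypothetical.
import torch
from PIL import Image

model = torch.hub.load("ultralytics/yolov5", "custom", path="hip_roi_yolov5.pt")

def extract_native_joints(image_path, conf_threshold=0.5):
    # Detection results as a DataFrame: xmin, ymin, xmax, ymax, confidence, class, name.
    detections = model(image_path).pandas().xyxy[0]
    detections = detections[detections.confidence >= conf_threshold]
    image = Image.open(image_path)
    rois = []
    for _, det in detections.iterrows():
        # Assumed class naming convention, e.g. "right_hip", "left_hip_hardware".
        if "hardware" in det["name"]:
            continue  # skip joints with implants so only native morphology is analyzed
        box = tuple(int(v) for v in (det.xmin, det.ymin, det.xmax, det.ymax))
        rois.append((det["name"], image.crop(box)))
    return rois
```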
Morphology characterization
A convolution-based DL model, ConvNeXt-Tiny, was used to analyze hip morphology. It was designed to predict ischial spine sign, dysplasia, cam deformity, other abnormalities, and patient sex. Incorporating the auxiliary prediction of patient sex stabilized the model’s training—a strategy previously shown to enhance learning efficiency in complex models [17]. Due to the limited number of training samples, 10-fold cross-validation was employed. A total of 400 training-tuning cases were employed in this phase. An ensemble technique was used for the final assessment to combine predictions from all 10 models to average their outputs, mitigate variance, and boost overall prediction accuracy.
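A minimal sketch of such a multitask setup is shown below: a torchvision ConvNeXt-Tiny backbone whose classification head is replaced with five binary outputs (the four morphology labels plus sex), together with a simple averaging ensemble over the cross-validation models. This is an assumed reconstruction, not the authors' released code.

```python
# Illustrative sketch: ConvNeXt-Tiny backbone with a five-output multitask head.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny, ConvNeXt_Tiny_Weights

class HipMorphologyNet(nn.Module):
    LABELS = ["ischial_spine_sign", "dysplasia", "cam_deformity", "any_abnormality", "sex"]

    def __init__(self):
        super().__init__()
        self.backbone = convnext_tiny(weights=ConvNeXt_Tiny_Weights.IMAGENET1K_V1)
        in_features = self.backbone.classifier[2].in_features  # 768 for ConvNeXt-Tiny
        self.backbone.classifier[2] = nn.Linear(in_features, len(self.LABELS))

    def forward(self, x):
        return self.backbone(x)  # raw logits, one per label

# Ensemble inference: average the sigmoid outputs of the 10 cross-validation models.
@torch.no_grad()
def ensemble_predict(models, images):
    probs = torch.stack([torch.sigmoid(m(images)) for m in models])
    return probs.mean(dim=0)
```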
Image augmentation techniques, including horizontal flipping, rotation (−45 to +45 degrees), scaling (0.9–1.1×), and translation (up to 15 pixels in any direction), were employed during model training to artificially increase the diversity of the dataset and make the model more robust to variations in new, unseen images. The model processed images at a resolution of 384 × 384 pixels, with a batch size of 16 and a learning rate of 0.00001. The Lion optimizer was used for faster convergence [18]. An exponential moving average (EMA) of the weights was maintained with a decay factor of 0.9999. This method, coupled with a weight decay of 0.00004, helped prevent overfitting and resulted in a more generalizable model.
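The configuration described above could be assembled roughly as follows. The Lion import assumes the third-party lion-pytorch package (the authors do not state which implementation they used), and the EMA helper is a generic sketch rather than their exact code.

```python
# Illustrative sketch of the reported training setup: 384-px inputs, batch size 16,
# learning rate 1e-5, Lion optimizer, weight decay 4e-5, and EMA of weights (decay 0.9999).
import copy
import torch
from torchvision import transforms
from lion_pytorch import Lion  # assumed implementation of the Lion optimizer

train_transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.RandomHorizontalFlip(p=0.5),
    # translate is a fraction of image size, so 15 px on a 384-px image ≈ 0.039
    transforms.RandomAffine(degrees=45, translate=(15 / 384, 15 / 384), scale=(0.9, 1.1)),
    transforms.ToTensor(),
])

model = HipMorphologyNet()  # multitask model sketched above
optimizer = Lion(model.parameters(), lr=1e-5, weight_decay=4e-5)

class EmaWeights:
    """Minimal exponential moving average of model parameters (decay 0.9999)."""
    def __init__(self, model, decay=0.9999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * current, after each optimizer step
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1 - self.decay)

ema = EmaWeights(model)
```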
Binary cross-entropy with label smoothing (set to 0.1) was chosen as a loss function to allow each characteristic to be predicted as a binary outcome (present or absent), while label smoothing helped prevent the model from becoming overly confident in its predictions. The model was trained to a maximum of 500 epochs, and the best-performing model was selected based on the lowest validation loss value. All models were developed using the PyTorch library (v2.0.0) and trained on four A100 NVIDIA (Santa Clara, CA, USA) graphical processing units (GPUs).
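Because PyTorch's BCEWithLogitsLoss does not take a label_smoothing argument, one common way to reproduce this loss is to smooth the binary targets directly, as in the sketch below (an illustration, not the authors' code).

```python
# Illustrative sketch: binary cross-entropy with label smoothing of 0.1, applied
# independently to each of the five binary outputs.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def smoothed_bce(logits, targets, smoothing=0.1):
    # Map hard {0, 1} targets to {0.05, 0.95} so the model is never pushed to full confidence.
    soft_targets = targets * (1.0 - smoothing) + 0.5 * smoothing
    return bce(logits, soft_targets)

# Example: a batch of 16 images with 5 binary labels each.
logits = torch.randn(16, 5)
targets = torch.randint(0, 2, (16, 5)).float()
loss = smoothed_bce(logits, targets)
```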
Evaluation
To assess the DL model’s performance, the area under the receiver operating characteristic curve (AUROC), F1-score, accuracy, sensitivity, specificity, and positive and negative predictive values were reported, with classification thresholds selected using Youden’s index. A fellowship-trained hip preservation surgeon (E.G.) established the final ground truth for the test set.
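A possible implementation of this per-label evaluation, using scikit-learn and a Youden's-index threshold, is sketched below.

```python
# Illustrative sketch: threshold each label's predicted probability at the point maximizing
# Youden's J (sensitivity + specificity - 1), then report the metrics listed above.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score, confusion_matrix, f1_score

def evaluate_label(y_true, y_prob):
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    best = np.argmax(tpr - fpr)                        # Youden's index
    y_pred = (y_prob >= thresholds[best]).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "AUROC": roc_auc_score(y_true, y_prob),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "F1": f1_score(y_true, y_pred),
    }
```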
Inter-rater reliability was calculated among all the annotators (E.G., J.M., J.O., L.B.). This multitiered approach determined the consistency of annotations across individuals with different levels of clinical experience. Inter-reader agreement was calculated using Gwet’s AC1; values >0.6 are indicative of high agreement, reflecting a reliable consensus among the annotators regarding the presence of hip joint characteristics and the patient’s sex. Statistical significance was determined with a P-value threshold of <.05.
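For reference, the sketch below computes Gwet's AC1 for the simple case of two raters and a binary label; the study's four-annotator agreement would typically be computed with a dedicated package such as irrCAC.

```python
# Illustrative sketch: Gwet's AC1 for two raters and a binary label.
import numpy as np

def gwet_ac1(rater_a, rater_b):
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)
    p_observed = np.mean(a == b)                  # raw agreement
    pi = (a.mean() + b.mean()) / 2                # average prevalence of the positive class
    p_chance = 2 * pi * (1 - pi)                  # chance agreement under Gwet's model
    return (p_observed - p_chance) / (1 - p_chance)

# Example: two annotators labeling 10 hips for a binary sign.
print(gwet_ac1([1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
               [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]))
```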
Results
Images from 500 patients (mean age 47 years, 49% female) were analyzed. The ensemble pipeline processed each radiograph, including pre-processing, in 832 ms on average over the test set.
The trained model achieved a mean average precision (mAP) of 99.4%, reflecting its reliability and accuracy in automatically detecting and localizing hip joints on radiographs. This high performance enables the application of this model for automatic data preprocessing, facilitating detailed and accurate assessment of joint conditions.
Table 1 presents key performance metrics of the model. The model achieved 87.2% accuracy for detecting ischial spine sign, 78.0% accuracy for cam deformity, 76.6% for dysplasia, and 71.6% for all abnormalities. The model demonstrated high specificity for ischial spine sign (96.0%) and all abnormalities (85.3%). Sensitivity was high for cam deformity (80.0%) and dysplasia (75.0%). The positive predictive value (PPV) was high for all abnormalities (93.5%) and ischial spine sign (87.5%).
Table 1. Performance of the DL model in detecting hip joint abnormalities on the test set.

| Abnormality | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | F1-score | AUROC |
|---|---|---|---|---|---|---|---|
| Ischial spine sign | 87.2 | 66.7 | 96.0 | 87.5 | 87.2 | 0.757 | 0.89 |
| Cam deformity | 78.0 | 80.0 | 73.9 | 86.4 | 64.2 | 0.831 | 0.80 |
| Dysplasia | 76.6 | 75.0 | 77.2 | 56.6 | 88.6 | 0.645 | 0.80 |
| All abnormalities | 71.6 | 67.3 | 85.3 | 93.5 | 45.3 | 0.783 | 0.81 |
The receiver operating characteristic (ROC) curves for each abnormality, shown in Fig. 1, illustrate the model’s strong performance. The model achieved an AUROC of 0.89 for ischial spine sign, 0.80 for cam deformity, 0.80 for dysplasia, and 0.81 for all abnormalities, demonstrating its ability to distinguish between the presence and absence of these characteristics. As sex was an auxiliary objective that was easy for the model to predict, it achieved an accuracy of 1.00 and an AUROC of 1.00 in predicting patients’ sex.

Figure 1. ROC curves for the DL model in detecting various hip joint abnormalities.
Inter-rater agreement, assessed using Gwet’s AC1, was substantial for dysplasia (0.83) and all abnormalities (0.88), and moderate for ischial spine sign (0.75) and cam deformity (0.61).
Grad-CAM visualizations (Fig. 2) highlight the ROIs that the model focuses on when making predictions. These visualizations provide valuable insights into the model’s decision-making process and demonstrate its ability to identify relevant anatomical landmarks for each abnormality.

Figure 2. Grad-CAM visualizations highlighting the ROIs that the DL model focuses on when detecting ischial spine sign, dysplasia, cam deformity, and all abnormalities combined. The visualizations provide insights into the model’s decision-making process and its ability to identify relevant anatomical landmarks for each abnormality. The model accurately localizes the ischial spine (red circle) for detecting the ischial spine sign, the lateral acetabular edge (orange circle) for dysplasia, the femoral head–neck junction (yellow circle) for cam deformity, and all relevant regions for detecting any abnormalities.
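As an illustration of how such maps can be generated, the sketch below computes a Grad-CAM heatmap from the last convolutional stage of the ConvNeXt-Tiny backbone (reusing the HipMorphologyNet sketch above). The choice of target layer is an assumption; the published figure was produced with the authors' own pipeline.

```python
# Illustrative sketch: Grad-CAM for one label of a single image tensor.
import torch
import torch.nn.functional as F

def grad_cam(model, image, label_index):
    """Return a heatmap (H x W, values in [0, 1]) for one output of HipMorphologyNet."""
    activations, gradients = {}, {}
    target_layer = model.backbone.features[-1]    # final ConvNeXt stage (assumed target)

    fwd = target_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

    model.eval()
    logits = model(image.unsqueeze(0))            # image: (3, 384, 384)
    model.zero_grad()
    logits[0, label_index].backward()             # gradient of the chosen label's logit
    fwd.remove()
    bwd.remove()

    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)          # pooled gradients
    cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    cam = cam.squeeze().detach()
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```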
Discussion
The current study introduces a novel approach utilizing DL techniques for the automated detection of morphological hip pathologies, focusing on DDH and FAI. The results demonstrate the efficacy of a ConvNeXt-Tiny model trained on a dataset of AP pelvic radiographs, achieving good, but nowhere near perfect, accuracy in predicting various hip joint characteristics, including ischial spine sign, cam deformity, dysplasia, and other abnormalities. Expert inter-rater reliability was also good, but not perfect. These results show the promise and current limitations of using plain radiographs for simple morphological hip classifications, and the downstream impact this has on developing reliable DL models.
The current study’s model contributes to the growing body of evidence supporting the application of DL in musculoskeletal radiographic analysis. Previous studies have successfully utilized DL methods for diagnosing hip osteoarthritis, osteoporosis, sacroiliitis, avascular necrosis, and other hip pathologies based on pelvic radiographs [19–23]. Notably, the study extends this paradigm to the diagnosis of FAI and DDH, demonstrating the feasibility of using DL algorithms to identify morphological abnormalities indicative of over- or under-coverage of the femoral head by the acetabulum and of bony abnormalities within the femur and pelvis.
FAI and DDH present a significant challenge in clinical practice, often leading to secondary hip osteoarthritis and debilitating symptoms in young individuals. While advanced imaging modalities like computed tomography (CT) and magnetic resonance imaging (MRI) offer detailed assessment of hip joint morphology, conventional radiography remains the initial imaging modality due to its accessibility and ease of evaluation. The accurate interpretation of radiographic parameters can be challenging, particularly in borderline cases, leading to misdiagnosis and delayed treatment. DL algorithms can aid in the precise measurement and interpretation of radiographic parameters, improving diagnostic accuracy and reducing inter-observer variability. Furthermore, integrating artificial intelligence (AI) into radiology information systems and picture archiving and communication systems can streamline the diagnostic workflow by enabling automatic identification and classification of findings, generating recommendations, and facilitating surveillance and research studies.
Compared to previous studies, the current study demonstrates promising advancements in the automated detection of morphological hip pathologies using DL techniques. Hoy et al. achieved a convolutional neural network (CNN) accuracy of 74% for identifying cam-type FAI morphology with high sensitivity (82%) but comparably low specificity (67%) [7]. Atalar et al. reported an accuracy of 87% using a pretrained VGG-16 model for FAI diagnosis, with impressive performance metrics including 83% sensitivity, 90% specificity, 86% precision, 0.84 F1 score, and 0.92 AUC [8]. Xu et al.’s AI-aided diagnostic system used a mask-region-based CNN object detection model followed by a high-resolution network for landmark detection/extraction and a ResNet50 model for classification, showing accuracies ranging from 86% to 95% in providing appropriate Tonnis and International Hip Dysplasia Institute (IHDI) classifications. This system also showed intraclass consistency of acetabular index and center-edge angle among surgeons of 0.79–0.98, while the AI showed perfect agreement. Additionally, the algorithm took significantly less time than a group of surgeons to draw conclusions [10]. Fraiwan et al. employed a deep transfer learning technique using a DarkNet53 model that achieved very high accuracy at 96% and perfect sensitivity, but with a comparably lower specificity of 94% [9]. Liu et al. employed a pyramid nonlocal U-Net model to measure the Tonnis angle with 90% accuracy [11]. Al-Bashir et al. used Canny edge detection to identify pertinent radiographic features, followed by breadth-first search to identify the center of the femoral head, and then applied the Hough transform to find the center-edge angle and Tonnis angle, with accuracies ranging from 46% to 78% and 84% to 85%, respectively [24]. In comparison, the current study examined a cohort of 500 patients, demonstrating robust performance in detecting various hip joint abnormalities from AP pelvic radiographs (Table 2).
Table 2. Performance of various models in detecting morphological features on radiographs.

| Morphologic feature | Study | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | AUC | AUROC | F1-score |
|---|---|---|---|---|---|---|---|---|
| Cam deformity [7] | Hoy et al. | 74 | 82 | 70 | – | 0.74 | – | – |
| | Current study | 78 | 80 | 74 | 86 | – | 0.80 | 0.83 |
| Ischial spine sign | Current study | 87 | 67 | 96 | 88 | – | 0.89 | 0.76 |
| FAI diagnosis [8] | Atalar et al. | 87 | 83 | 90 | 86 | 0.92 | – | 0.84 |
| Dysplasia/DDH [9] | Fraiwan et al. | 96 | 100 | 94 | 91 | – | – | 0.95 |
| | Current study | 77 | 75 | 77 | 57 | – | 0.80 | 0.65 |
| Tonnis angle [11, 24] | Liu et al. | 90 | – | – | – | – | – | – |
| | Al-Bashir et al. | 84–85 | – | – | – | – | – | – |
| Center-edge angle [24] | Al-Bashir et al. | 46–78 | – | – | – | – | – | – |
| Shenton’s line [10] | Xu et al. | 92–95 | 92–96 | 88–91 | – | – | – | – |
| Lateral edge of acetabulum [10, 11] | Xu et al. | 89–90 | 87–89 | 93 | – | – | – | – |
| | Liu et al. | 90 | 90 | – | – | – | – | – |
| Sourcil [10] | Xu et al. | 86–87 | 84–85 | 88–90 | – | – | – | – |
| Any abnormality | Current study | 72 | 67 | 85 | 94 | – | 0.81 | 0.78 |
The current DL model achieved an accuracy of 78.0% for detecting cam deformity, with sensitivity and specificity of 80.0% and 73.9%, respectively. Its performance in identifying the ischial spine sign showed commendable accuracy and specificity of 87.2% and 96.0%, respectively, but comparably lower sensitivity of 66.7%. Detection of dysplasia exhibited good accuracy of 76.6%, sensitivity of 75.0%, and specificity of 77.2%. Detection accuracy for all abnormalities was 71.6%, with a lower sensitivity of 67.3% and higher specificity of 85.3%. These results highlight the model’s capability to distinguish between the presence and absence of various hip joint characteristics.
Moreover, the performance metrics further demonstrate the model’s efficacy, with AUROCs of 0.89 for ischial spine sign, 0.80 for cam deformity, 0.80 for dysplasia, and 0.81 for all abnormalities combined. These AUROC values reflect the model’s moderate ability to identify hip pathologies from radiographic images.
Inter-reader agreement was calculated using Gwet’s AC1, a metric resistant to the prevalence effect and to the assumption of independence between observers, limitations commonly associated with the Kappa statistic [25]. Gwet’s AC1 is suitable for the current study, in which a balanced assessment of agreement among annotators is crucial regardless of class distribution or the independence of observations. Assessment of inter-rater agreement using Gwet’s AC1 revealed substantial agreement for dysplasia (0.83) and all abnormalities (0.88), and moderate agreement for ischial spine sign (0.75) and cam deformity (0.61). These results indicate the limitations in achieving consensus among human evaluators, which creates challenges in establishing ground truth data in any hip preservation study based on radiographs alone (especially if angles are not utilized in classification) and, more importantly for the current study, precludes a clean ground truth against which the DL model can be compared.
Grad-CAM visualizations provide valuable insights into the ROIs that the model focuses on when making predictions, highlighting its ability to identify relevant anatomical landmarks for each abnormality. When evaluating for cam deformity, ischial spine sign, and dysplasia, the model attends to ROIs along the lateral and medial aspects of the femoral head, the area medial to the iliopectineal line, and the anterolateral aspect of the acetabulum, respectively.
It is important to highlight the simplicity of the current study’s approach. The model examined solely the presence of two signs of FAI (ischial spine sign, cam deformity) or dysplasia rather than conducting a comprehensive workup involving complex landmark detection and feature extraction to measure continuous metrics such as Tonnis angle, center-edge angles, and extrusion index. Clinicians often utilize 3D imaging modalities such as MRI and CT scans as the standard of care for morphological hip classification due to their ability to provide detailed assessments. However, the current paper deliberately examined the most basic form of hip pain workup—radiographs without accounting for angles—to evaluate whether DL could effectively classify simple measures with performance comparable to human interpretation. As such, this study serves as a proof-of-concept before incorporating more comprehensive metrics in subsequent works.
The study’s limitations include the retrospective design, which restricted the evaluation to AP radiographs without angle measurements, potentially overlooking cases where additional imaging modalities may provide supplementary diagnostic information. In future research endeavors, it would be advantageous to expand the analysis to continuous variables as well as to incorporate 3D imaging modalities such as MRI and CT scans. These additions will provide a more comprehensive understanding of hip morphology and further validate the applicability of DL techniques in musculoskeletal radiographic interpretation.
The current study highlights the potential and inherent limitations of DL methods in augmenting diagnostic capabilities for hip pathologies, particularly FAI and DDH, through automated analysis of morphology on pelvic radiographs. While further validation and refinement are necessary, the findings suggest a promising avenue for leveraging AI-driven technologies to enhance musculoskeletal radiographic interpretation and improve patient care. However, they also introduce a strong note of caution: training models on metrics for which even experts have poor consensus will ultimately limit model capability.
Acknowledgements
The authors would like to thank all the support staff at the Orthopedic Surgery Artificial Intelligence Laboratory.
Conflict of interest
None declared.
Funding
None declared.
Data availability
The data underlying this article are available upon reasonable request from the corresponding author.
References
Author notes
contributed equally to this work.