Abstract

Background

Brain metastasis invasion pattern (BMIP) is an emerging biomarker associated with recurrence-free and overall survival in patients, and differential response to therapy in preclinical models. Currently, BMIP can only be determined from the histopathological examination of surgical specimens, precluding its use as a biomarker prior to therapy initiation. The aim of this study was to investigate the potential of machine learning (ML) approaches to develop a noninvasive magnetic resonance imaging (MRI)-based biomarker for BMIP determination.

Methods

From an initial cohort of 329 patients, a subset of 132 patients met the inclusion criteria for this retrospective study. We evaluated the ability of an expert neuroradiologist to reliably predict BMIP. Thereafter, the dataset was randomly divided into training/validation (80% of cases) and test (20% of cases) subsets. The ground truth for BMIP was the histopathologic evaluation of resected specimens. Following MRI sequence co-registration, hand-crafted radiomic features were extracted and used to train traditional ML classifiers, and convolution-based deep learning (CDL) models were trained and evaluated in parallel. The different ML approaches were evaluated individually and in combination using ensembling techniques to determine the model with the best performance for BMIP prediction.

Results

Expert evaluation of brain MRI scans could not reliably predict BMIP, with an accuracy of 44%–59% depending on the semantic feature used. Among the different ML and CDL models evaluated, the best-performing model achieved an accuracy of 85% and an F1 score of 90%.

Conclusions

ML approaches can effectively predict BMIP, representing a noninvasive MRI-based approach to guide the management of patients with brain metastases.

Key Points
  • Brain metastasis invasion pattern cannot be predicted using expert evaluation.

  • Machine learning can predict invasion patterns of brain metastases with high accuracy.

  • This noninvasive strategy to determine invasion may be used for prognostic and predictive patient stratification.

Importance of the Study

Brain metastasis invasion pattern (BMIP) is associated with prognosis in patients and response to emerging but yet-to-be-approved targeted therapies in preclinical models. Currently, BMIP can only be determined in surgically resected brain metastases. For BMIP to eventually be used effectively in the clinic, there is an unmet need for a noninvasive method to predict BMIP prior to therapy initiation. In this proof-of-principle study, we demonstrate that BMIP can be predicted using machine learning on brain MRI scans, something that cannot be currently accomplished by expert evaluation. This represents a novel application that builds upon the extensive and increasing literature using machine intelligence for biomarker development. With further development, such a tool could enable the noninvasive stratification of patients to personalized therapeutics based on BMIP.

Brain metastases (BrM) are sequelae of advanced cancer associated with poor prognosis and diminished quality of life.1 Treatment options for patients with brain metastases include neurosurgical resection, radiotherapy, and systemic therapies such as targeted therapies, immunotherapies, and chemotherapies.1 Surgically resected BrM can be classified into brain metastasis invasion pattern (BMIP) subtypes based on their histopathological growth pattern: minimally invasive (MI) lesions, which account for approximately 34%–50% of BrM, and highly invasive (HI) BrM, which are identified in approximately 50%–64% of lesions.2–4 MI BrM remain as localized masses within the brain with no evidence of peritumoral invasion, while HI BrM demonstrate marked invasion of clusters of cells or single cells into the brain parenchyma, as identified on hematoxylin and eosin-stained slides derived from surgical resection or autopsy specimens. HI BrM have been found to be associated with shortened local recurrence-free, leptomeningeal metastasis-free, and overall survival.2,4

To identify BMIP, a surgically resected specimen with adequate brain-tumor interface is required for neuropathological analysis. Only a small subset of BrM are surgically resected, and of that subset, a large percentage of surgically resected BrM do not have adequate brain-tumor interface to determine BMIP.1,2 With BMIP emerging as a predictive biomarker of response to novel therapeutics, there is an unmet need for biomarkers that identify BMIP noninvasively, which would in turn increase the translational potential of BMIP to be used in clinical trials for patient stratification and therapeutic regimen selection.

There are multiple studies demonstrating the potential of computerized image analysis for quantitative feature extraction and the use of different traditional machine-learning (TML) approaches for the prediction or classification of various pathologic, molecular, or clinical endpoints.5–7 There is variation in the nomenclature used, but these studies are broadly referred to as texture analysis, radiomics, machine learning (ML), or deep learning (DL) studies. Using hand-crafted or learned features (eg, deep features) and ML, there is potential for noninvasive prediction of various clinical outcomes of interest based on radiologic images obtained as part of the current standard of care. These approaches have been used extensively in BrM,8 but the potential of ML for the prediction of BMIP has not yet been investigated.

Given the potential importance of noninvasive approaches for BMIP prediction prior to treatment as an image-based biomarker for BrM management, we developed and evaluated different traditional ML and DL models for predicting BMIP based on features extracted from brain magnetic resonance imaging (MRI) scans.

Methods

Patient Population

Research Ethics Board (REB) approval and a waiver of informed consent were secured for this retrospective study conducted at a single institution (McGill University Health Center [MUHC]; Study number: MP-37-2021-7645). The MUHC REB working procedures completely satisfy the requirements for REB Attestation (REBA) as stipulated by Health Canada. Patient selection was conducted utilizing a preexisting database encompassing surgically resected BrM from 2007–2021, incorporating electronic medical records from the Montreal Neurological Institute-Hospital (MNIH).2 All patients with surgically resected BrM were operated on as a standard of care procedure with the following indications: patients with large tumors with mass effect and/or associated symptoms, or patients with suspected brain metastases without a diagnosed primary cancer and who may benefit from surgery to attain a diagnosis and undergo tumor removal.9

Inclusion criteria encompassed the following: (1) adult patients aged 18 years or older, (2) preoperative brain MRI scan having at least T2-weighted (T2W) and post-contrast T1-weighted (T1WC+) sequences and performed within 30 days of the date of the patient’s surgery, (3) surgically resected and pathologically proven brain metastasis, (4) absence of prior local treatment (eg, radiosurgery) of the target BrM, and (5) adequate brain-tumor interface to allow for a determination of BMIP, as outlined in the methodology described below. Exclusion criteria were: (1) absence of the requisite preoperative brain MRI scan, (2) severe image degradation or artifact distorting or obscuring the target brain metastasis (cases with a mild degree of motion or other artifacts were not excluded), (3) insufficient specimen for pathological determination of BMIP, and (4) extra-axial extension of the target BrM.

Out of a total of 329 potential candidates, 139–166 patients (depending on the parameter/semantic feature of interest) were included in the neuroradiologist expert prediction of BMIP, and 132 were eligible for ML-based prediction of BMIP. Please refer to Supplementary Methods for additional details.

Determination of BMIP by Histopathology

BMIP was determined as previously described.2 Briefly, hematoxylin and eosin-stained slide specimens were first assessed by a single reviewer for evidence of a metastasis-brain interface that permitted scoring of invasion pattern. Specimens included in the cohort were evaluated for degree of invasion by 2 independent observers blinded to patient outcomes (MD and MCG), with the reviewers’ scores averaged to reach a composite score.

A score of 0 was assigned to specimens featuring a well-defined pseudocapsule surrounding the lesion or exhibiting an immune infiltrate layer rich in lymphocytes, creating a distinct separation between cancer cells and the adjacent brain. Specimens with a sharp, direct delineation between metastatic lesions and the adjacent brain received a score of 1. A score of 2 was given to specimens displaying a clear border between metastases and brain parenchyma, with small pockets of cells protruding close to the margin but maintaining clear intervening brain parenchyma. A score of 3 was assigned to specimens showcasing extensive single-cell invasion or clusters of cancer cells within the adjacent brain parenchyma. In cases where specimens exhibited different invasion patterns at discrete locations along the metastasis-brain margin, the highest invasion score was applied. The scores from both observers were averaged, with average scores ranging from 0 to 2 classified as MI, while scores of 2.5 to 3 were categorized as HI lesions.
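As a sketch, the two-reviewer composite scoring and MI/HI cutoffs described above can be expressed as a short function (the function name is a hypothetical helper; the thresholds are those stated in the text):

```python
def classify_bmip(score_a: float, score_b: float) -> str:
    """Average two reviewers' invasion scores (each 0-3) and map the
    composite to an invasion-pattern label: averages of 0-2 are
    minimally invasive (MI), 2.5-3 are highly invasive (HI)."""
    for s in (score_a, score_b):
        if not 0 <= s <= 3:
            raise ValueError("invasion scores must lie in [0, 3]")
    composite = (score_a + score_b) / 2
    return "HI" if composite >= 2.5 else "MI"
```

Note that a 2/3 split between reviewers yields a composite of 2.5 and is therefore classified as HI under this rule.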

Given that BMIP classification relies on scoring by 2 independent observers (MD and MCG), the inter-observer agreement was assessed to ensure the reliability and consistency of the scoring system. This analysis was performed by calculating the inter-observer agreement percentage and Cohen’s Kappa coefficient between the 2 reviewers to quantify the level of agreement beyond chance.

Assessment of Conventional Imaging Features Chosen as Potential Proxies for BMIP

A board-certified, fellowship-trained neuroradiologist (SL) was tasked with assessing the following imaging features on preoperative MRI as potential proxies for BMIP: fuzziness at the tumor-brain interface (defined as a binary yes/no), tumor size (largest dimension, in cm), peritumoral edema (scored on a scale of 0–3), mass effect (scored on a scale of 0–3), multifocality of brain metastases (defined as more than 1 distinct metastatic lesion identified on the preoperative scan), and leptomeningeal involvement of the surgical specimen (defined as a binary yes/no). These parameters were then correlated with the gold standard BMIP defined by histopathological evaluation. Accuracy, precision, recall/sensitivity, specificity, and F1-score were utilized to compare the imaging features with the ground truth histopathology BMIP determinations (Table 1).

Table 1.

Association Between Imaging Features by Expert Assessment and the Ground Truth of Brain Metastasis Invasion Pattern (BMIP) Determined by Histopathological Assessment

Criteria             Fuzzy border   Tumor size   Edema   Mass effect   Leptomeningeal involvement   Multifocality
TP (n)                     74           42          66        45                41                        58
TN (n)                      4           23          12        40                39                        29
FP (n)                     45           31          40        14                13                        29
FN (n)                     19           51          20        45                46                        50
Scans excluded (n)         23           18          25        21                26                         0
Accuracy                 54.9         44.2        56.5      59.0              57.6                      52.4
Precision                62.2         57.5        62.3      76.3              75.9                      66.7
Recall/sensitivity       79.6         45.1        76.7      50.0              47.1                      53.7
Specificity               8.2         42.6        23.1      74.1              75.0                      50.0
F1-score                 69.8         50.6        68.8      60.4              58.2                      59.5

A neuroradiologist retrospectively evaluated T2W and contrast-enhanced T1W images of preoperative scans in patients with surgically resected BrM and correlated the findings with BMIP. The following thresholds were used for evaluation: fuzzy border present, tumor size greater than the median, edema score greater than or equal to 2, mass effect score less than or equal to 1, multifocality (more than 1 metastatic lesion identified on the preoperative scan), and the presence of leptomeningeal involvement in the surgically resected specimen. TP, true positive; TN, true negative; FP, false positive; FN, false negative; n = number of samples. The positive class represents the HI samples, and the negative class the MI samples.
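The summary metrics in Table 1 follow directly from the TP/TN/FP/FN counts; as a minimal sketch (hypothetical helper name), the fuzzy-border column can be reproduced as:

```python
def metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics (as percentages) from counts."""
    accuracy = 100 * (tp + tn) / (tp + tn + fp + fn)
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)          # recall = sensitivity
    specificity = 100 * tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

# Fuzzy-border column of Table 1: TP=74, TN=4, FP=45, FN=19
acc, prec, rec, spec, f1 = metrics(74, 4, 45, 19)
# acc ≈ 54.9, prec ≈ 62.2, rec ≈ 79.6, spec ≈ 8.2, f1 ≈ 69.8
```

The very low specificity (8.2%) alongside high sensitivity shows the fuzzy-border feature labels nearly everything HI, which is why its raw accuracy is close to chance.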


Brain MRI Scan Acquisition

To construct DL and traditional ML models, features were extracted from T2W and T1WC + images from preoperative brain MRIs obtained as part of the standard of care, acquired before and after intravenous administration of gadopentetate dimeglumine (Magnevist®) at a dose of 0.01 mmol/kg body weight. The images were obtained using 2 MRI machines: a 3.0 T Philips MRI machine and a 1.5 T GE Signa scanner. All images were stored in the DICOM image format. Figure 1 demonstrates an example of a subset of slices from T2W and T1WC + sequences and associated manually contoured masks that were subsequently used to generate additional computationally derived masks.

Figure 1.

Examples of 2 MRI images, each in a row, with their overlaid manually segmented masks of (1) the metastatic tumor of interest, determined by the outer margins of the enhancing lesion on contrast-enhanced T1W images (shown in column A) and (2) edema (including primary tumor), determined by the outer margins of abnormal hyperintense signal on T2W images (visualized in column B). Computationally generated masks isolating the area of edema alone on T2W images were also generated (column C).

MRI Sequence Selection and Manual Lesion Segmentation

We classified the MRI (T2W and T1WC+) sequences into 4 distinct groups by primary tumor type (lung, breast, melanoma, other) and divided the total patient population into training and testing subsets. Due to the predominance of lung and breast (LB) primaries in BrM, with a limited number of melanoma or other metastases, melanoma and other (MO) samples were exclusively included in the training set, while LB samples were divided, randomly allocating 80% to the training subset and 20% to the testing subset. Therefore, the final classification performance was only for LB BrM. In developing the convolution-based DL (CDL) models, 20% of the split training set was allocated for validation. However, the entire training set was utilized for developing the classic ML models. For patients with multiple resected specimens, we adopted a patient-wise split strategy to prevent any information leakage between our training and testing subsets (Supplementary Table S1).
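The patient-wise split described above (all melanoma/other patients to training; lung/breast patients split 80/20 at the patient level) can be sketched as follows; the dictionary structure and function name are hypothetical, for illustration only:

```python
import random

def patient_wise_split(patients, seed=0, test_frac=0.20):
    """Split patients into train/test sets such that (1) all
    melanoma/other ('MO') patients go to training, and (2) lung/breast
    ('LB') patients are split 80/20 at the *patient* level, preventing
    information leakage when one patient contributes multiple specimens.

    `patients` maps patient_id -> primary tumor group ('LB' or 'MO')."""
    lb = sorted(pid for pid, grp in patients.items() if grp == "LB")
    mo = sorted(pid for pid, grp in patients.items() if grp == "MO")
    rng = random.Random(seed)
    rng.shuffle(lb)
    n_test = round(len(lb) * test_frac)
    test = set(lb[:n_test])
    train = set(lb[n_test:]) | set(mo)
    return train, test
```

Because the split is by patient rather than by image or specimen, no slice from a test patient can appear in training, which is the leakage the text guards against.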

In this study, different ML-based models were constructed using features extracted from T2W and T1WC + images from patient preoperative brain MRI scans. For each case, a volume of interest (VOI) was manually drawn around the (1) tumor, represented by an enhancing lesion seen on T1WC+, consisting of the outer margin of the contiguous homogeneously or heterogeneously enhancing tumor component, including, if present, immediately contiguous leptomeningeal and pachymeningeal components; and (2) edema, determined based on the maximum area of high T2 signal surrounding a tumor. At the time of manual segmentation, the edema VOI also included the BrM (Figure 1), which was subsequently separated computationally. Segmentation was first performed by a medical student (BR) trained to perform this task. Thereafter, all contours were reviewed (and modified, if necessary) by a single board-certified neuroradiologist (SL, MCL, or RF). Each neuroradiologist reviewed approximately one-third of the contours. Adjustments made to the contours included boundary refinements, ensuring the inclusion of leptomeningeal invasion, excluding vessels, and correcting contours to account for imaging artifacts. Additionally, consensus meetings among the neuroradiologists were held to resolve challenging cases, which helped standardize the contouring process and reduce variability. Manual contours were generated using the open-source medical image visualization software 3D Slicer, version 5.0.3. Following initial manual contouring of the BrM (on T1WC+) and of edema + tumor (on T2W images), additional contours of edema only were generated computationally from the T2W images. Leveraging 3D Slicer, a blend of interactive tools, such as intensity-based thresholding, region-growing algorithms, and manual adjustments, was used to ensure accurate and precise outlining of the tumor boundaries. Manual adjustments were executed to ensure alignment of the segmented regions with the tumor edges.

Registration, Data Processing, and Additional VOI Generation

To perform a comprehensive evaluation, we analyzed and tested models extracting features from the tumor on T1WC + (T), combined tumor and associated edema on T2W (T + E), or edema without the actual tumor on T2W images (E). In this specific use case, we were also interested in capturing features from edema elicited by the tumor. We reasoned that important predictive information from the peritumoral invasion of the brain may be captured by analysis of signal changes within the brain parenchyma immediately surrounding the tumor in the area of tumor-associated edema. To accomplish this, we leveraged the manual segmentations described previously to generate additional masks. This was done both for (1) efficiency and (2) obtaining the optimal mask. The rationale for this protocol is the following. On MRI, the gold standard for delineation of the actual brain metastasis is its contour as demonstrated on the T1WC + sequence. However, the optimal delineation of edema is done on T2W images. As such, a computational approach based on the aforementioned manually generated masks is not only more efficient (or less manually laborious) but also would be considered most accurate if the images are co-registered.

Using T1WC + volumes as a reference and the Advanced Normalization Tools (ANTs)10,11 Python package, we registered the T2W volumes and their associated masks to match the dimensions of the T1W data, defining a dataset called T1WR. We also subtracted the T1WC + volumes from the corresponding T2W volumes to define the T1WC + E and T2WE datasets. Together, these approaches allowed us to apply isolated edema masks (without tumor) to high-resolution T2 volumes (E), retaining only the masked area while eliminating the remaining regions of the image.
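Once the volumes and masks are co-registered, deriving the edema-only mask reduces to removing the tumor mask from the combined tumor + edema mask and zeroing everything outside it. A minimal NumPy sketch on already-registered arrays (the function names are hypothetical helpers, not the study's code):

```python
import numpy as np

def edema_only_mask(tumor_edema_mask: np.ndarray,
                    tumor_mask: np.ndarray) -> np.ndarray:
    """Subtract the registered T1WC+ tumor mask from the T2W
    tumor+edema mask, leaving a boolean mask of peritumoral edema (E)."""
    assert tumor_edema_mask.shape == tumor_mask.shape
    return tumor_edema_mask.astype(bool) & ~tumor_mask.astype(bool)

def apply_mask(volume: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Retain only the masked area, zeroing the remaining voxels."""
    return np.where(mask, volume, 0)
```

This is only valid after co-registration: without matching voxel grids, the voxel-wise boolean subtraction would mix anatomically unrelated locations, which is the rationale the text gives for registering first.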

Radiomic Feature Extraction

Feature sets extracted from T, T + E, and E images were used for prediction independently as well as in combination. We employed the PyRadiomics package (v3.0.1, Python 3.8)7,12 to extract a comprehensive set of 107 radiomic features from each sample in the datasets.13 In order to avoid overfitting, we limited the number of features to approximately 10% of the sample size. Subsequently, we refined this feature set to include only the top 10 representative features through a 2-step process: (1) eliminating approximately 50% of features, which was experimentally and analytically achieved by removing features with a variance lower than 0.03, and (2) applying the chi-square statistical feature selection technique14 to select, from the remaining features, those with the highest degree of association with the target (BMIP). Note that features with low variance are typically removed because they exhibit minimal variation across samples, thereby offering little discriminatory power between classes. Additionally, feature selection was conducted exclusively on the training set and performed after the standardization of the features. We specifically performed image-level standardization to adjust the images to a common intensity scale, thereby reducing variations caused by differences in calibration and sensitivity between the 2 utilized MRI machines. Moreover, we applied feature-level normalization to ensure a consistent feature scale among all the extracted features. Due to the differences in MRI machine configurations, MRI scan dimensionalities, and various feature scales, these 2 steps were required for stable model training and evaluation processes. We examined the T1WC+, T2W, and registered T2-weighted (T2WR) images individually, employing identical image processing and feature extraction procedures. This methodology guarantees a uniform approach to image processing and feature extraction across all sequences, thereby maintaining the integrity of our final comparisons and ensuring an unbiased assessment. For additional details on processing steps, refer to the Supplementary Materials section.
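The two-step filter (variance threshold, then chi-square ranking) can be sketched in plain NumPy; the function name is hypothetical, and min-max scaling to [0, 1] before the chi-square step is an assumption here, since the chi-square statistic requires non-negative inputs (the study's exact scaling may differ):

```python
import numpy as np

def select_top_features(X, y, var_thresh=0.03, k=10):
    """Two-step filter: (1) drop features with variance below
    `var_thresh`; (2) rank the remainder by a chi-square association
    with the binary target and keep the top k. Returns the sorted
    column indices of the selected features."""
    X = np.asarray(X, dtype=float)
    keep = np.where(X.var(axis=0) > var_thresh)[0]           # step 1
    Xk = X[:, keep]
    rng = Xk.max(axis=0) - Xk.min(axis=0)
    Xs = (Xk - Xk.min(axis=0)) / np.where(rng == 0, 1, rng)  # min-max scale
    y = np.asarray(y)
    chi2 = np.zeros(Xs.shape[1])
    for cls in np.unique(y):                                  # step 2
        obs = Xs[y == cls].sum(axis=0)
        exp = Xs.sum(axis=0) * (y == cls).mean()
        chi2 += (obs - exp) ** 2 / np.where(exp == 0, 1, exp)
    top = keep[np.argsort(chi2)[::-1][:min(k, len(keep))]]
    return np.sort(top)
```

Restricting this selection to the training set, as the text stipulates, prevents the test labels from influencing which features survive.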

Model Training

We employed a Voting Ensemble approach, separately encompassing the traditional ML (TML) and CDL models. During this process, we constructed 3 distinct TML models: support vector classifier,15 Random Forest (RF),16 and Multi-Layer Perceptron (MLP).17 In this phase, we incorporated the grid search method from the Scikit-Learn Python package with 3-fold cross-validation to select the best-performing model. Experimentally, we observed that further adjusting the class weights in the support vector classifier model, when necessary, improves performance across both classes. Additionally, 3 variations of the EfficientNet18 model were employed for CDL. Finally, we determined the majority vote at 2 levels to derive our final results (Figure 2): (1) aggregating the predictions of the images belonging to the same volume, and (2) aggregating all 3 models’ predictions for each volume. A visual representation of the process is provided in Figure 2 for prediction using the peritumoral edema. For additional details on model development, please refer to the Supplementary Materials section.
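The two-level majority vote can be sketched in a few lines of pure Python (hypothetical helper names and input structure; the tie-breaking rule is an assumption for determinism, not specified in the text):

```python
from collections import Counter

def majority(labels):
    """Most common label; ties broken by sorted label order (assumed
    rule for determinism)."""
    counts = Counter(labels)
    top = max(counts.values())
    return sorted(l for l, c in counts.items() if c == top)[0]

def two_level_vote(per_model_slice_preds):
    """Two-level aggregation: (1) per model, slice-level predictions for
    one volume are reduced by majority vote to a volume-level label;
    (2) the models' volume-level labels are reduced by a second
    majority vote.

    `per_model_slice_preds`: {model_name: [slice labels for one volume]}."""
    volume_votes = [majority(preds) for preds in per_model_slice_preds.values()]
    return majority(volume_votes)
```

With 3 models, the second-level vote always has an odd number of ballots for a binary label, so genuine ties only arise at the slice level.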

Figure 2.

The process of model development and evaluation involved multiple traditional machine learning (TML) and deep learning (DL) approaches. The predicted labels from each set of TML and DL models were independently aggregated, and the final prediction, for both TML and DL models, was generated by applying the majority vote during the aggregation process. Although MLP is commonly categorized as a deep model, we treated it as a TML model in this context, given its use as a classifier trained on the extracted hand-crafted radiomic features. SVC, support vector classifier; RF, random forest; MLP, multi-layer perceptron; B0, EfficientNet-B0; B1, EfficientNet-B1; B2, EfficientNet-B2.

Results

Conventional MR Imaging Features Do Not Reliably Predict BMIP

A neuroradiologist blinded to the BMIP status of patients was asked to assess the following pre-determined imaging features on preoperative MRI: fuzziness at the tumor-brain interface, tumor size, peritumoral edema, mass effect, multifocality, and leptomeningeal involvement of the surgical specimen. Imaging assessment was performed in 139–166 patients, depending on the parameter of interest, after excluding cases with motion artifacts, absence of contrast-enhanced sequences, or severely hemorrhagic lesions. Using BMIP determined by histopathology as the ground truth, the presence of a fuzzy tumor-brain interface, as assessed by the neuroradiologist, was able to predict BMIP in 54.9% of cases (Table 1). When attempting to predict HI BMIP, sensitivity was 79.6%, specificity was 8.2%, and F1-score was 69.8%. Despite the fact that the development of leptomeningeal metastases has been previously associated with HI BMIP, the presence of leptomeningeal involvement in the target lesion on the preoperative MRI performed poorly as a predictor of BMIP (F1-score = 0.582; Table 1).

The method of calling BMIP was validated by calculating the inter-observer agreement between the 2 observers to ensure the reliability and consistency of the scoring system. This analysis revealed an inter-observer agreement of 83.7% and a Cohen’s Kappa coefficient of 0.66 (95% CI: 0.55–0.77), indicating substantial agreement between the 2 reviewers (Supplementary Table S2).

Using a logistic regression model, we investigated the utility of semantic imaging features, including mass effect, leptomeningeal disease, and multifocal disease, for predicting BMIP. We conducted a hyperparameter grid search with 3- and 5-fold cross-validation techniques, and our best model obtained an accuracy of 59.0% and an F1 score of 71.8% (Supplementary Figure S1). The results indicate that the model performs poorly in classifying the MI and HI classes. Accordingly, the semantic imaging features lack sufficient discriminative information for the model to distinguish between these classes effectively.
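A minimal sketch of this kind of grid-searched logistic regression, using scikit-learn on synthetic stand-in data (the feature encoding, cohort, and hyperparameter grid here are illustrative assumptions, not the study's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for binary-encoded semantic features (e.g. mass
# effect, leptomeningeal disease, multifocality) -- illustration only.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 3)).astype(float)
noise = rng.random(60) < 0.2                            # 20% label noise
y = np.where(noise, 1 - X[:, 0], X[:, 0]).astype(int)   # 1 = HI, 0 = MI

# Hyperparameter grid search with 3- and 5-fold cross-validation,
# scored by F1 as described in the text; keep the better of the two.
best = None
for folds in (3, 5):
    gs = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.01, 0.1, 1.0, 10.0]},
                      cv=folds, scoring="f1")
    gs.fit(X, y)
    if best is None or gs.best_score_ > best.best_score_:
        best = gs
```

Cross-validated F1, rather than plain accuracy, is the appropriate selection metric here because the HI/MI classes are imbalanced.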

Noninvasive Prediction of BMIP From MRI Images Using ML

We next investigated the use of radiomics and ML to determine whether BMIP can be predicted noninvasively based on computerized analysis of MRI images. Given that LB cancer BrM constituted the largest proportion of patients in our cohort, performance in the test set was only evaluated on patients with lung or breast cancer metastasis. We included a total of 112 BrM in the training set (55 lung, 24 breast, 13 melanoma, 20 “other”) and 20 BrM in the test set (15 lung and 5 breast), representing 20% of the specimens from these primary tumor types (Supplementary Table S1).

To evaluate the top-performing models, which were selected based on the highest F1-score obtained on the validation set during training, we utilized accuracy, precision, recall, and F1-score metrics to compute the model performance on the independent test set. We also report the performance of the models after aggregation, aiming to establish a consensus among our best-performing models. Multiple models were evaluated based on tumor features on T1WC + images, tumor and edema on T2W images, and edema only on T2W images. In addition, both TML and CDL models were evaluated.

As shown in Table 2, TML models using edema on T2W images (E) exhibited superior performance compared to CDL models, demonstrating an approximate 18% improvement for the ensembled TML models, with an overall accuracy of 85%, precision of 93%, recall of 87%, and F1-score of 90%. Among the models, the RF model attains the highest F1-score; however, its confusion matrix (Figure 3) indicates only 60% accuracy for the MI class, implying challenges in achieving satisfactory results for both positive (HI) and negative (MI) classes. The MLP model itself demonstrates high performance in both classes, significantly contributing to the enhanced results of our ensembled model (MLEnsemble in Table 2). Nevertheless, MLEnsemble not only achieved a 90% F1 score comparable with the RF model but also demonstrated proficiency in accurately predicting both MI and HI classes, achieving the greatest robustness and accuracy using this dataset.

Table 2.

Performance Evaluation of Various Models Based on Accuracy, Precision, Recall, and F1-Score on the Independent Test Set

Learning approach    Data                            Model        Accuracy   Precision   Recall   F1-score
Traditional          T—Tumor on T1WC+ images         SVC            35.0       100.0      13.3      23.5
machine learning                                     RF             75.0        91.6      73.3      81.4
(TML)                                                MLP            70.0        90.9      66.6      76.9
                                                     MLEnsemble     70.0        90.9      66.6      76.9
                     T+E—Tumor and edema on          SVC            65.0       100.0      53.3      69.5
                     T2W images                      RF             65.0        90.0      60.0      72.0
                                                     MLP            50.0        72.7      53.3      61.5
                                                     MLEnsemble     60.0       100.0      46.6      63.6
                     E—Edema only on T2W images      SVC            75.0        91.6      73.3      81.4
                                                     RF             85.0        87.0      93.3      90.3
                                                     MLP            80.0        92.3      80.0      85.7
                                                     MLEnsemble     85.0        92.8      86.6      89.6
Convolution-based    T—Tumor on T1WC+ images         Eff-B0         57.8        56.6      57.8      55.4
deep learning                                        Eff-B1         54.6        53.5      54.6      53.4
(CDL)                                                Eff-B2         70.3        70.2      70.3      70.2
                                                     DLEnsemble     75.0        73.4      75.0      74.0
                     T+E—Tumor and edema on          Eff-B0         51.5        51.8      51.5      51.6
                     T2W images                      Eff-B1         54.6        46.9      54.6      45.2
                                                     Eff-B2         60.9        61.6      60.9      61.1
                                                     DLEnsemble     65.0        62.5      65.0      63.6
                     E—Edema only on T2W images      Eff-B0         60.94       61.14     60.94     61.03
                                                     Eff-B1         54.69       60.60     54.69     53.36
                                                     Eff-B2         71.88       75.47     71.88     71.88
                                                     DLEnsemble     69.99       86.36     69.99     71.87

The table demonstrates the performance of both traditional machine learning (TML) and deep learning (DL) models used for BMIP prediction based on features from the (1) tumor on T1WC + images, (2) tumor + edema on T2W images, and (3) edema only on T2W images. Model performance is shown for multiple individual and ensemble models using TML or DL.

Table 2.

Performance Evaluation of Various Models Based on Accuracy, Precision, Recall, and F1-Score on the Independent Test Set


Figure 3.

The confusion matrices of the traditional machine-learning (TML) models, including SVC, RF, and MLP, along with their corresponding ensemble aggregation, based on computerized analysis and machine-learning prediction using edema on T2W images.
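Each reported metric follows directly from a confusion matrix such as those in Figure 3. As a worked check, the cell counts below are hypothetical (the actual per-cell counts appear in Figure 3 and are not reproduced here), but they are chosen so that, for a 20-case test set, they reproduce the TML ensemble row of Table 2 up to rounding:

```python
import numpy as np

# Hypothetical 2x2 confusion matrix for binary BMIP prediction
# (rows = true class, columns = predicted class; counts are illustrative,
# chosen to match the Table 2 TML ensemble metrics up to rounding).
cm = np.array([[4, 1],
               [2, 13]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

With these counts, accuracy is 17/20 = 85%, and the F1 score works out to 26/29, i.e. roughly 89.7%, consistent with the reported ensemble performance.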

Among the other models, the next best-performing model based on accuracy was the CDL model using predictions from analysis of the enhancing tumor component on T1WC+ images (Table 2). Similar to the TML models, the ensemble model had the best performance, with an accuracy of 75%, precision of 73%, recall of 75%, and F1 score of 74% (Table 2). The performance metrics of the other models are provided in Table 2. The hand-crafted radiomic features used for BMIP prediction are described in detail in Supplemental Table S3.
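The TML ensemble aggregation of SVC, RF, and MLP base learners could be sketched as follows. This is a minimal illustration with scikit-learn: the paper does not specify the voting scheme or hyperparameters, so soft voting and default settings are assumptions, and the synthetic features stand in for the hand-crafted radiomic feature matrix:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the radiomic feature matrix (one row per lesion,
# 132 cases mirroring the study cohort size; features are not real radiomics).
X, y = make_classification(n_samples=132, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base learners; probability=True lets the SVC contribute to soft voting.
svc = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
rf = RandomForestClassifier(random_state=0)
mlp = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))

# Soft-voting ensemble averages the predicted class probabilities.
ensemble = VotingClassifier(
    [("svc", svc), ("rf", rf), ("mlp", mlp)], voting="soft"
)
ensemble.fit(X_train, y_train)
pred = ensemble.predict(X_test)
print(f"accuracy={accuracy_score(y_test, pred):.2f}, "
      f"f1={f1_score(y_test, pred):.2f}")
```

Soft voting is one common choice for combining heterogeneous classifiers; hard (majority) voting is an equally plausible alternative for the aggregation described here.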

We also conducted experiments training our TML models exclusively on LB edema radiomics features to assess the impact of primary tumor type on model performance. Note that this decreases the training set size by 43%. To ensure a fair and comparable evaluation, we used the same data-splitting and parameter-tuning approach on the training and testing sets as in our main TML experiments, excluding only the MO samples from the training set. These analyses demonstrated only minimal performance degradation after excluding the MO samples, achieving an accuracy of 80.0% and an F1 score of 86.7% (Supplementary Figure S2), compared to an F1 score of 89% for our top-performing TML model. Hence, including MO samples in the training process does not harm performance and may in fact enhance the robustness of the developed models, underscoring the importance of diverse sample inclusion.
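The subgroup experiment above amounts to refitting on a filtered training set while holding the test set fixed. A minimal sketch, assuming a per-lesion primary-tumor label and synthetic data in place of the study's radiomic features (the composition of the LB and MO subgroups is as defined in the paper, not encoded here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)

# Synthetic stand-in: radiomic features plus a primary-tumor subgroup code
# per lesion ("LB" vs "MO", following the paper's labels; data are fake).
X_train = rng.normal(size=(105, 30))
y_train = rng.integers(0, 2, size=105)
primary = rng.choice(["LB", "MO"], size=105, p=[0.57, 0.43])
X_test = rng.normal(size=(27, 30))
y_test = rng.integers(0, 2, size=27)

def fit_and_score(X, y):
    """Fit one classifier and score it on the fixed test set."""
    clf = RandomForestClassifier(random_state=0).fit(X, y)
    pred = clf.predict(X_test)
    return accuracy_score(y_test, pred), f1_score(y_test, pred)

# Full training set vs. LB-only training set; the test set never changes,
# so any metric difference is attributable to the training data alone.
full = fit_and_score(X_train, y_train)
lb_only = fit_and_score(X_train[primary == "LB"], y_train[primary == "LB"])
print(f"full: acc={full[0]:.2f}  LB-only: acc={lb_only[0]:.2f}")
```

Keeping the test set fixed across both fits is what makes the two scores directly comparable, as the text describes.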

Discussion

In this study, we investigated the potential of radiomics and ML for predicting BMIP in BrM using preoperative brain MRI scans. Surgically resected BrM can be classified into minimally (MI) and highly (HI) invasive BMIP subtypes based on their histopathological growth pattern and relation to the adjacent brain parenchyma.2 HI BrM have been demonstrated to be associated with shortened local recurrence-free, leptomeningeal metastasis-free, and overall survival.2,4 In preclinical models, the HI BMIP pattern has also been suggested to serve as a predictive biomarker for emerging, but not yet approved, therapies targeting pSTAT3-expressing astrocytes in the BrM microenvironment.19 However, any future clinical application of this biomarker requires a priori knowledge of the BMIP before therapy, which is not currently possible because BMIP can only be determined by pathological evaluation of resected tumor specimens. Our study demonstrates the potential of radiomics and ML to predict BMIP noninvasively, prior to resection, with high accuracy using an ensemble TML model. Such a model has the potential to serve as a noninvasive image-based biomarker for determining prognosis and response to therapy in patients with BrM. While other studies have attempted to correlate imaging-based features with invasion in BrM,20,21 the results described herein are the first to noninvasively predict BMIP using radiomics and ML performed on brain MRI scans.

The prediction of BMIP in this study builds upon a growing body of literature demonstrating the use of ML for developing image-based biomarkers that can enhance or augment expert evaluation, providing lesion characterization beyond what can be achieved with conventional, largely qualitative image analysis performed by the naked human eye.8 The best-performing model in our study, the ensemble TML model, achieved an accuracy of 85%, whereas none of the conventional imaging features assessed by the expert neuroradiologist on the same dataset were found to reliably predict BMIP. Similarly, the TML model achieved a 90% F1 score, compared to the 70% F1 score achieved by evaluating conventional imaging features.

Importantly, the ML model with the highest accuracy was the model based on the peritumoral edema. This is congruent with the current biological understanding of BMIP, where one can hypothesize that the features most representative and predictive of BMIP are the stromal reaction and edema in the invaded brain parenchyma.

In our sample, the TML models were superior in predicting BMIP compared to the CDL models. Given that it is well established that deep neural networks typically require much larger sample sizes for training compared to traditional ML approaches, the most likely explanation for this finding in our cohort is the limited sample size. As such, while this study lays out the framework for image-based prediction of BMIP, refinement of these models with larger sample sizes has the potential to improve predictive performance and more effectively use DL architectures for model development. Despite the small sample size, we took multiple steps to ensure the reliability of the performance metrics reported, which include (1) a robust and consistent preprocessing pipeline, (2) random assignment and use of an independent test set for performance evaluation, and (3) ensuring that when more than one metastasis was resected and used for model development, data from the same patient was not used both in the training and test sets to avoid data leakage and violation of the independence assumption.
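Point (3) above, grouping all lesions from one patient on the same side of the split, can be made concrete with a group-aware splitter. A minimal sketch with scikit-learn's `GroupShuffleSplit` (the patient IDs and lesion counts are synthetic; the paper does not state which splitting utility was used):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# One row per resected metastasis; some patients contribute several lesions.
patient_ids = np.repeat(np.arange(40), rng.integers(1, 4, size=40))
n_lesions = len(patient_ids)
X = rng.normal(size=(n_lesions, 30))
y = rng.integers(0, 2, size=n_lesions)

# GroupShuffleSplit keeps every lesion from a given patient on one side of
# the split, so no patient appears in both the training and test sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# No patient ID is shared across the two index sets: no data leakage.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```

A plain row-wise random split would allow two lesions from the same patient to land in both sets, inflating test metrics through correlated samples; the group-aware split avoids exactly that failure mode.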

With further development and validation on additional datasets, a tool such as the one established herein could have important clinical applications. BMIP has been demonstrated to be an important prognostic tool for patients with surgically resected BrM.2,4 Knowledge of BMIP prior to surgical resection, derived from a preoperative MRI, may result in modifications to surgical planning and extent of resection, which may lead to improved patient outcomes. Furthermore, adjuvant stereotactic radiosurgery (SRS) after neurosurgical resection of BrM is the current standard of care,22 and radiation oncologists may choose to irradiate with larger margins if a resected specimen has an HI BMIP. However, there are ongoing studies to assess whether radiation therapy prior to surgical resection is superior. Knowing the BMIP from the preoperative MRI may therefore help with SRS planning in this context. Additionally, work is ongoing to use BMIP as a predictive biomarker for anti-cancer treatment. In the context of liver metastases, this proof-of-concept has been extended to patients, with replacement growth patterns being associated with poor responses to anti-angiogenic therapy.23

Our study has several limitations. The most important are the small sample size and the absence of an external validation set. These are largely the result of the study evaluating a unique endpoint that is not routinely evaluated or reported in clinical practice at most centers. As the reporting of BMIP becomes more widely adopted, future studies expanding on our observations will be easier to perform. Our observations will require additional independent evaluation and validation using larger and more diverse datasets, and eventually, prospective studies demonstrating efficacy as a biomarker. While the model described herein was trained on BrM of various primary tumor types, the performance was only tested on LB cancer BrM in the independent test set, a decision made because the small sample sizes of the less common primary types would preclude a reliable or meaningful evaluation. Ideally, such a study would be performed separately for each primary cancer type, if sample size allowed. Since LB cancer BrM represent approximately 75% of BrM, the results may be applicable to a majority of patients with BrM. Furthermore, this study was performed using data exclusively from patients who underwent surgical resection and had sufficient tissue for histopathological analysis, potentially limiting the generalizability of the findings to the broader population of patients with BrM. For these findings to be generalizable to all BrM, it is imperative that future studies include patients with non-resectable lesions and subsequent autopsy to determine BMIP. Finally, the use of 2D convolutional models for a task involving spatially structured data such as MRI might overlook crucial spatial relationships captured in 3D structures. Future studies may aim to incorporate additional analyses to explore spatial relationships of captured features.

The ground truth used for BMIP in this study was histopathological assessment. It is important to note that this methodology, while the gold standard for BMIP assessment, is likely imperfect. BMIP is only discernible on specimens deemed to have a sufficient brain-tumor interface for histopathological evaluation. While this methodology has demonstrated its clinical relevance given the association of BMIP with clinical outcomes,2 approximately one-third of surgically resected BrM have insufficient tissue for BMIP determination. Furthermore, of the specimens that are amenable to BMIP determination, the totality of the brain-tumor interface can seldom be examined, given the palliative intent of neurosurgical resection of brain metastases, which does not require negative microscopic margins circumferentially around the resected metastatic lesion.9 This implies the possibility that some patients with MI lesions may have undetected components of the tumor with prominent invasion. While this can be seen as a limitation of this paradigm, it may also serve as a strength, in that a noninvasive model such as the one established herein may be able to stratify patients with indeterminate BMIP as determined by histopathology, to predict their clinical course or response to treatment.

In conclusion, this study demonstrates the feasibility of ML for the development of a noninvasive image-based biomarker for predicting BMIP. This is an important proof-of-concept demonstrating that imaging features, particularly in the peritumoral brain, may be used to identify invasive metastatic cancer cells in the brain. Furthermore, these findings highlight the fact that BMIP may be more widely studied as a predictive biomarker in preclinical and clinical contexts, given encouraging results suggesting that it can be determined accurately in a noninvasive manner and in the absence of a surgical specimen.

Funding

This work was funded by Spark Grants on the Application of Disruptive Technologies in Cancer Prevention and Early Detection of the Canadian Cancer Society and the Canadian Institutes of Health Research—Institute of Cancer Research and Brain Canada Foundation (CCS grant #707078/CIHR grant #707078). This project has been made possible with the financial support of Health Canada, through the Canada Brain Research Fund, an innovative partnership between the Government of Canada (through Health Canada) and Brain Canada, and the Canadian Cancer Society. While at McGill, R.F. was also a clinical research scholar (chercheur-boursier clinicien) supported by the Fonds de recherche en santé du Québec (FRQS) and had an operating grant jointly funded by the FRQS and the Fondation de l’Association des radiologistes du Québec (FARQ).

Acknowledgments

The authors thank all of the patients who donated their brain metastasis tissues to this research. We thank Dr. Farhad Maleki for their insightful comments on the manuscript.

Conflicts of interest statement

R.F. has had a research collaboration/grant and has acted as consultant and/or speaker for Nuance Communications/Microsoft Inc., Canon Medical Systems Inc., and GE Healthcare. R.F. has also served on the clinical advisory board of Automated Imaging Diagnostics/Neuropacs Inc. R.F. is also a co-investigator on a National Institutes of Health STTR grant subaward and a co-principal investigator on a National Science Foundation grant. The authors declare no other conflicts of interest. All authors have reviewed and approved the final version of the article.

Authorship statement

K.N. participated in the design of the study, in particular the ML component, and was involved in the development and execution of the ML part of the study. B.R. played a key role in cohort discovery and initial image processing and performed manual tumor segmentations that were subsequently used for additional computationally derived contouring as well as feature extraction and lesion analysis for ML algorithm training and evaluation. A.N. compiled clinical data and performed statistical analyses. N.M. performed image segmentation and supported study execution. S.G. contributed to cohort discovery and clinical lesion assessment and supported study execution. K.P. provided clinical expertise and supported study execution. R.Z. contributed to the initial grant preparation and study execution. C.R. contributed to the study design and supported the study execution. A.B-F. and J.K.W. contributed to specific parts of the study design and execution planning, particularly the approach for image registration and computational derivation of the edema maps. M-C.G. performed histopathological interpretation of patient specimens to determine BMIP. M-C.L. provided clinical expertise and oversaw part of the tumor segmentation. S.L. provided clinical expertise, oversaw part of the tumor segmentation, and performed the expert evaluation for prediction of BMIP. P.M.S. and K.P. provided study supervision, guiding clinical and experimental rationale. M.D. conceived the study concept and was involved in study design, grant preparation, cohort discovery, and determination of ground truth BMIP on pathology slides. R.F. was involved in every aspect of this study, including its initial inception and design; he was the principal investigator on a grant funding this study and oversaw its overall execution. All authors were involved in manuscript drafting and/or review.

Data availability

The source numerical data from this study will be made available upon reasonable request. The actual patient images cannot be shared publicly due to patient privacy restrictions.

References

1. Achrol AS, Rennert RC, Anders C, et al. Brain metastases. Nat Rev Dis Primers. 2019;5(1):5.

2. Dankner M, Caron M, Al-Saadi T, et al. Invasive growth associated with Cold-Inducible RNA-Binding Protein expression drives recurrence of surgically resected brain metastases. Neuro-Oncology. 2021;23(9):1470-1480.

3. Berghoff AS, Rajky O, Winkler F, et al. Invasion patterns in brain metastases of solid cancers. Neuro-Oncology. 2013;15(12):1664-1672.

4. Siam L, Bleckmann A, Chaung HN, et al. The metastatic infiltration at the metastasis/brain parenchyma-interface is very heterogeneous and has a significant impact on survival in a prospective study. Oncotarget. 2015;6(30):29254-29267.

5. Haneberg AG, Pierre K, Winter-Reinhold E, et al. Introduction to radiomics and artificial intelligence: a primer for radiologists. Semin Roentgenol. 2023;58(2):152-157.

6. Forghani R. Precision digital oncology: emerging role of radiomics-based biomarkers and artificial intelligence for advanced imaging and characterization of brain tumors. Radiol Imaging Cancer. 2020;2(4):e190047.

7. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology. 2016;278(2):563-577.

8. Nowakowski A, Lahijanian Z, Panet-Raymond V, et al. Radiomics as an emerging tool in the management of brain metastases. Neurooncol Adv. 2022;4(1):vdac141.

9. Vogelbaum MA, Brown PD, Messersmith H, et al. Treatment for brain metastases: ASCO-SNO-ASTRO guideline. J Clin Oncol. 2022;40(5):492-516.

10. Tustison NJ, Cook PA, Klein A, et al. Large-scale evaluation of ANTs and FreeSurfer cortical thickness measurements. Neuroimage. 2014;99:166-179.

11. Avants BB, Tustison NJ, Stauffer M, et al. The Insight ToolKit image registration framework. Front Neuroinform. 2014;8:44.

12. Van Griethuysen JJ, Fedorov A, Parmar C, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77(21):e104-e107.

13. Karu K, Jain AK, Bolle RM. Is there any texture in the image? Pattern Recognit. 1996;29(9):1437-1446.

14. Ferri FJ, Pudil P, Hatef M, Kittler J. Comparative study of techniques for large-scale feature selection. In: Gelsema ES, Kanal LS, eds. Machine Intelligence and Pattern Recognition. Vol 16. Amsterdam, Netherlands: Elsevier; 1994:403-413.

15. Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif. 1999;10(3):61-74.

16. Breiman L. Random forests. Mach Learn. 2001;45:5-32.

17. Hinton GE. Connectionist learning procedures. In: Kodratoff Y, Michalski RS, eds. Machine Learning. Amsterdam, Netherlands: Elsevier; 1990:555-610.

18. Tan M, Le Q. EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning. PMLR; 2019:6105-6114.

19. Dankner M, Maritan SM, Priego N, et al. Invasive growth of brain metastases is linked to CHI3L1 release from pSTAT3-positive astrocytes. Neuro Oncol. 2024;26(6):1052-1066.

20. Fiss I, Hussein A, Barrantes-Freer A, et al. Cerebral metastases: do size, peritumoral edema, or multiplicity predict infiltration into brain parenchyma? Acta Neurochir (Wien). 2019;161(5):1037-1045.

21. Blazquez R, Proescholdt MA, Klauser M, et al. Breakouts—a radiological sign of poor prognosis in patients with brain metastases. Front Oncol. 2022;12:849880.

22. Brown PD, Ballman KV, Cerhan JH, et al. Postoperative stereotactic radiosurgery compared with whole brain radiotherapy for resected metastatic brain disease (NCCTG N107C/CEC.3): a multicentre, randomised, controlled, phase 3 trial. Lancet Oncol. 2017;18(8):1049-1060.

23. Frentzas S, Simoneau E, Bridgeman VL, et al. Vessel co-option mediates resistance to anti-angiogenic therapy in liver metastases. Nat Med. 2016;22(11):1294-1302.

Author notes

Keyhan Najafian and Benjamin Rehany contributed equally.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].