According to the Shared Syntactic Integration Resource Hypothesis (SSIRH; Patel, 2003), musical and linguistic syntactic processing rely on shared resources for integrating incoming elements (e.g., chords, words) into unfolding sequences.
Across both domains, syntactic processing involves predicting and integrating incoming elements into higher-order structures: language and music rely on complex sequences organized according to syntactic principles that are implicitly understood by enculturated listeners.
This relationship between linguistic and musical syntax is supported by research demonstrating that violations of linguistic and musical syntax elicit highly similar neurophysiological responses in healthy adults (Patel et al., 1998) and by studies showing interference between linguistic and musical syntactic processing in healthy adults (Fedorenko et al., 2009; Koelsch et al., 2005; Kunert et al., 2015; Slevc et al., 2009). According to the SSIRH, cases of "amusia without aphasia" or "aphasia without amusia" result from damage to domain-specific representation networks (see Boebinger et al., 2021; Norman-Haignere et al., 2015, for fMRI evidence consistent with the existence of such networks in the superior temporal lobes), while the results of neuroimaging studies reflect resources for syntactic integration that are shared across the two domains. However, others have found that musical syntactic processing also interferes with semantic processing of language (Perruchet & Poulin-Charronnat, 2013) and have suggested that interference effects could instead be accounted for by a framework of shared mechanisms for cognitive control (Slevc & Okada, 2015) or by general attentional mechanisms rather than by mechanisms specific to syntactic processing (Fedorenko & Varley, 2016).