Friday, April 21, 2017

Among the Philosophers

 Last week I was in Seattle for the APA/ASL meeting. The APA (American Philosophical Association) is kind of new to me; I had never been to one of their meetings before.

The program for the session I was in is in this blog post: Association for Symbolic Logic at the Pacific APA | Richard Zach. The picture was taken by Richard Zach, the organizer of the session.

Mine was a very short talk about the Lambek Calculus  and categorical models for it, using Dialectica categories. This is joint work with Harley Eades III, updating the old paper I talked about in Lambek Calculus and Dialectica Categories.

I had one very good question, about why I do not believe in the Lambek calculus as a way of modelling language. The answer is long and complicated, so it will be left for a later blog post. I have many good friends who love the Lambek calculus exactly for this application, so I have to be careful how I describe my issues. Anyways, I do love the Lambek calculus as a logical system, and the slides for the short talk are on SlideShare.

Friday, March 3, 2017

Legal Start-ups

I have not written for far too long. Lots of work, lots of deadlines and especially lots of reviewing. But working with other people always helps.

Sabrina Praduroux works with Luigi di Caro at the University of Torino, Italy, as part of the MIREL (MIning and REasoning with legal text) project. Last year she had a visiting fellowship at Stanford, and we did some work together mapping the legal start-ups in the Valley. A report was written for the MIREL workshop and it is posted at my webpage.

Reasoning with legal texts is something I would like to do more of. Happily, Luigi is visiting again this Summer. I don't know yet if Sabrina and Livio will be visiting too. When they next come I have to ask what the logo of MIREL is all about. I find it a bit disturbing, I must say.

Thursday, December 1, 2016

ECD Graphs? oh, no...

There's a common belief that everything that has been implemented once in a given programming language X (say C++) can be more easily re-implemented later in language Y (e.g. Java).

Changing the programming language of your system is mostly a matter of fitting in with a company, with friends, or with fashions. ECD was originally written in Prolog, ported to C++, and then to Java. But it can also be conceptualized using graphs. (There are many hits for "entailment graphs" on the web; I should check some of them.)

An ECD Graph comprises three sub-graphs: a semantic graph for the text, a semantic graph for the hypothesis, and a semantic graph of term matches linking hypothesis and text terms. Other classes are constructed on top of an ECD Graph to perform actions like making initial term matches, updating the specificity relations on those matches, and figuring out the entailment relation between text and hypothesis. These operations will update the contents of the ECD Graph.

There is an EcdGraph class, initialized with lists of semantic analyses for the text and hypothesis sentences, which get added to the text and hypothesis subgraphs of the EcdGraph.

On incorporating semantic representations, multiple text sentences are added to the single text graph. Likewise for hypotheses. 
At present it is assumed that the bulk of the lexical lookup will have been done in constructing the semantic representations. However, a database of lexical knowledge can also be accessed at inference time, and concept details for additional word senses can be introduced by extra semantics modules, if desired.

On constructing the initial graph a number of additional data structures are populated, to assist subsequent processing. This includes equivalence classes of coreferent text and hypothesis terms, text and hypothesis terms subject to relative clause restrictions, and determiner properties.
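As a rough sketch (in Python rather than the actual Java implementation, with illustrative names and plain edge lists standing in for the real classes), the structure described above might look like:

```python
from dataclasses import dataclass, field

@dataclass
class EcdGraph:
    """Three sub-graphs, each held as a list of labeled edges, plus the
    auxiliary structures populated when the initial graph is constructed."""
    text_edges: list = field(default_factory=list)   # semantic graph for the text
    hyp_edges: list = field(default_factory=list)    # semantic graph for the hypothesis
    match_edges: list = field(default_factory=list)  # term matches linking the two
    coref_classes: dict = field(default_factory=dict)          # coreferent term equivalence classes
    relative_restrictions: dict = field(default_factory=dict)  # relative clause restrictions
    determiners: dict = field(default_factory=dict)            # determiner properties per term

    def add_text_sentence(self, edges):
        """Multiple text sentences all land in the single text sub-graph."""
        self.text_edges.extend(edges)

    def add_hypothesis_sentence(self, edges):
        """Likewise for hypothesis sentences."""
        self.hyp_edges.extend(edges)
```

The match edges start out empty; the operations described below populate and update them.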

Initial Term Matching
The first operation on an ECD graph is to identify potential matches between hypothesis and text skolem terms. A variety of match types are possible, and these are searched for in a fixed order:
  1. H- and T-terms have the same stem (specificity: equals)
  2. H- and T-terms have the same surface form (occasionally the parser's stemming is messed up) (specificity: equals)
  3. H- and T-terms have at least one sense that belongs to the same cluster of senses (specificity: equals)
  4. H- and T-terms have at least one sense embedded under a concept that is in a sub/superclass of the other (specificity: depends on concept relation)
  5. H-term has a sense that matches the sense on a lexically derived T-term (specificity: equals)
  6. H-term has a sense that has a cluster-sense shared with  the sense on a lexically derived T-term (specificity: equals)
  7. H-term has a sense that has a concept related to a concept for  the sense on a lexically derived T-term (specificity: depends on concept relation)
  8. H-term has a lexically derived sense that is a sense or cluster match with a surface T-term.

hmm, it would be good to have examples here,  wouldn't it?

While multiple matches for a hypothesis term are allowed at any one level, if a hypothesis term is matched at an earlier level it is not considered for matching at later levels; e.g. a stem match for a hypothesis term precludes searching for cluster-sense matches for that term.
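The ordered search with level precedence can be sketched as follows (a Python toy, not the real implementation; the two predicates stand in for the eight match types above, and the term representation is made up for illustration):

```python
def initial_term_matches(h_terms, t_terms, levels):
    """levels: ordered (name, predicate, specificity) triples, one per match type.
    A hypothesis term matched at an earlier level is excluded from later levels,
    but may collect several matches within its own level."""
    matches = []
    unmatched = list(h_terms)
    for name, pred, spec in levels:
        still_unmatched = []
        for h in unmatched:
            found = [(h, t, name, spec) for t in t_terms if pred(h, t)]
            if found:
                matches.extend(found)   # matched here: drops out of later levels
            else:
                still_unmatched.append(h)
        unmatched = still_unmatched
    return matches

# toy terms with a stem and a surface form, and the first two levels only
levels = [
    ("stem",    lambda h, t: h["stem"] == t["stem"],       "equals"),
    ("surface", lambda h, t: h["surface"] == t["surface"], "equals"),
]
```

For example, a hypothesis term with stem "run" matches a text term with the same stem at level 1, while a term whose stem was mangled by the parser can still be caught by the surface-form check at level 2.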

All initial term matches found lead to a new match edge being added to the ECD graph, with the specificity direction of the match (equals, sub-class, super-class, disjoint) being included on the edge.

Once the initial matches are found, they are closed under co-reference chains. That is, if H1 is matched with T1, and H1 is coreferent with H2 and T1 coreferent with T2, then new matches should be added for [H1,T2], [H2, T1], [H2, T2].
(don't you love commuting squares and don't you miss latex in blogger? I know I do!)
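The closure itself is a small piece of code. A sketch (assuming matches are plain (H, T) pairs and the coreference classes are given as sets; these names are illustrative):

```python
def close_under_coreference(matches, h_coref, t_coref):
    """matches: set of (H, T) pairs. h_coref / t_coref map a term to its
    coreference equivalence class (a set that contains the term itself)."""
    closed = set(matches)
    for h, t in matches:
        for h2 in h_coref.get(h, {h}):
            for t2 in t_coref.get(t, {t}):
                closed.add((h2, t2))  # the commuting square: [H1,T2], [H2,T1], [H2,T2]
    return closed
```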

Updating Match Specificity

Having made initial term matches between lexical items, it remains to be seen how the specificity of these matches is affected by other modifying terms on both the text and hypothesis side. For example, while "dog" is a sub-class of "mammal", it is not a sub-class of "white mammal" (not all dogs are white). Specificity updates are recorded on the match edges, along with the justifications for those updates.  

Specificity updating considers each match in turn. 

Consider a match [H, T]. If terms H and T both have no further nodes dependent on them in their respective role graphs, then the initial term match remains unchanged. If H has additional dependents but T does not, then the hypothesis side of the match becomes more specific: if the H term was already more specific than the T term (subclass), the match remains (even more) specific; if the original match was one of equality, it now becomes one of being more specific; and if the original match had H as more general than T (superclass), making this match more specific leads to the specificity of the match becoming undetermined (none). Likewise, but in the opposite direction, if T has further dependents but H does not.
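These one-sided cases amount to a small update table. A sketch (value names follow the specificity labels used on the match edges; the function names are mine):

```python
# Specificity on a match edge, from the hypothesis term's point of view:
# "subclass" = H more specific, "superclass" = H more general, "none" = undetermined.

def h_gains_dependent(spec):
    """H has a dependent that T lacks: the hypothesis side becomes more specific."""
    return {"subclass": "subclass",    # already more specific: stays (even more) so
            "equals": "subclass",      # equality becomes H-more-specific
            "superclass": "none",      # pulls in the opposite direction: undetermined
            "none": "none"}[spec]

def t_gains_dependent(spec):
    """Mirror image: T has a dependent that H lacks."""
    return {"superclass": "superclass",
            "equals": "superclass",
            "subclass": "none",
            "none": "none"}[spec]
```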

What happens when both H and T have dependents, Hd and Td? 

First we need to see if all matches for Hd and/or Td are completely updated. If not, defer judgement until they all are. For all hypothesis dependents Hd, suppose that Hd has a (fully updated) match with Tm. 

First find out whether Tm is a direct or indirect dependent of T, or in fact not a dependent at all. If it is not, then H has an unmatched dependent, and the [H,T] match is made more specific. Otherwise the [H,T] match is updated with the specificity of the [Hd, Tm] match. Once all the Hd's have been considered, are there any Td's that have not been mopped up by one of the Hd matches?
If so, then the T term becomes more specific, so the [H,T] match becomes more general.

Sometimes there will be more than one possible match for an Hd term. In this case, we have to make copies of the [H,T] match, and separately explore the match updates according to the various Hd matches.
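The copying step is essentially a cartesian product over the candidate matches of each dependent. A sketch (names are illustrative, not the actual API):

```python
from itertools import product

def branch_on_dependent_matches(hd_candidates):
    """hd_candidates maps each hypothesis dependent Hd to its list of candidate
    [Hd, Tm] matches. When any Hd has several candidates, the parent [H, T]
    match is copied once per combination, and each copy's specificity is
    updated separately."""
    hds = list(hd_candidates)
    return [dict(zip(hds, combo))
            for combo in product(*(hd_candidates[hd] for hd in hds))]
```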

Scoring Role Paths
When looking for Tm's matched with Hd's, the Hd is generally a direct dependent of T. But Tm can be anything that is linked to T, either via a directed path (i.e. a direct or indirect modifier of T), or via an undirected path (i.e. there is some kind of possibly distant relation between T and Tm). These undirected paths will not necessarily capture a coherent connection between T and Tm, still less one that parallels the connection between H and Hd. Nonetheless these paths are allowed to pass. The hypothesis and text path pairs for each Hd match are recorded on the match edge when its specificity is updated. The eventual idea is to record these role path pairs and try to learn which kinds of pairing are reasonable, and which are not.

To this end, there is the stub of a path score included in the specificity update. This assigns a cost to each path pairing, and defines a threshold cost beyond which pairs should be discounted. At present the cost is a combination of (a) the difference in length of hypothesis and text paths, and (b) whether an undirected text path jumps from one sense of a word into another.
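The stub cost might look like this (the weights and default threshold here are placeholders of my own, not values from the system):

```python
def path_pair_cost(h_path, t_path, sense_jumps, threshold=3):
    """Stub path score: (a) the difference in length between the hypothesis
    and text role paths, plus (b) a penalty per sense jump on an undirected
    text path. Returns the cost and whether the pair survives the threshold."""
    cost = abs(len(h_path) - len(t_path)) + 2 * sense_jumps
    return cost, cost < threshold
```

Training on labeled data (see the further-work list below) would replace these hand-set weights with learned ones.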
Relative Clauses and Determiners
Once a match is completely updated, some more work still remains to be done. Part of this is dealing with relative-clause-like constructions. Relative clauses like "the dog that barked" are odd in that the noun is restricted by the head of the relative clause, but the noun is at the same time an argument/modifier of the relative clause. Up to this point, only the latter relation has been considered in updating specificity.

Suppose that on the hypothesis side some term has a relative clause restriction. If there is a matching relative clause restriction on the other side, all is well. Likewise if whatever matches the H-restriction is a direct or indirect modifier of the T-term, in which case the [H,T] match is updated with the specificity of the restriction match. If the H-restriction is not accounted for in this way, the [H,T] match is made more specific.

After all the H-restrictions have been dealt with, we look to see if there are any remaining unmatched T-restrictions. If the T-restriction matches a hypothesis term that has H as a descendant, then update [H,T] with the specificity of this match. Otherwise there is an unmatched T-restriction, and so [H,T] is made more general.
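A sketch of the H-restriction rule, together with the composition of specificities it relies on (the exact composition used by ECD is not spelled out above, so this table is an assumption: "equals" acts as the identity and conflicting directions collapse to "none"):

```python
def compose_specificity(a, b):
    """Combine a match specificity with that of a restriction match."""
    if a == b:
        return a
    if a == "equals":
        return b
    if b == "equals":
        return a
    return "none"  # subclass vs superclass: conflicting directions

def update_for_h_restriction(spec, restriction_spec):
    """restriction_spec is the specificity of the match accounting for the
    H-restriction, or None when the restriction is unaccounted for."""
    if restriction_spec is None:
        # unmatched H-restriction: the hypothesis side becomes more specific
        return {"equals": "subclass", "subclass": "subclass"}.get(spec, "none")
    return compose_specificity(spec, restriction_spec)
```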


Once the effects of relative clauses have been taken into account, we need to deal with the effect of determiners. 
Determiners can flip the specificity of a match. For example, while "a dog" is more general than "a black dog", "every dog" is more specific than "every black dog". Determiners can have different monotonicity properties on their restrictions and bodies:
  • + +   some, at least N:   some old men are keen gardeners |= some men are gardeners
  • - +   all, every:   all men are keen gardeners |= all old men are keen gardeners |= all old men are gardeners
  • - -   no, at most N:   no men are gardeners |= no old men are keen gardeners
  • ? ?   exactly N:   exactly 3 old men are keen gardeners ?? exactly 3 old men are gardeners ?? exactly 3 men are gardeners
  • ? +   most, many:   most men are keen gardeners |= most men are gardeners
  • ? -   few:   few men are gardeners |= few men are keen gardeners
It turns out that all determiners that are downward monotone in their second argument ("no", "few") can be treated as a negated version of a determiner that is upward monotone in the second argument (no = not some, few = not many). The effects of this are captured by including a negation in the semantic graphs containing these determiners. Monotonicity entailments in the first argument are dealt with through a determiner table in Java that captures such facts as "all <subclass> |= some <superclass>", e.g. "all dirty water |= some water". This table is used to update the specificity on matches for terms with the corresponding determiners in the property graph.
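The monotonicity signatures and the resulting specificity flip can be sketched like this (a Python toy standing in for the Java determiner table; the handling of '?' restrictors is my own assumption — with no licensed direction, everything except equality collapses to "none"):

```python
# (restrictor, body) monotonicity per determiner: '+' up, '-' down, '?' neither
MONOTONICITY = {
    "some": ("+", "+"), "all": ("-", "+"), "every": ("-", "+"),
    "no": ("-", "-"), "exactly": ("?", "?"),
    "most": ("?", "+"), "many": ("?", "+"), "few": ("?", "-"),
}

def flip_for_restrictor(det, spec):
    """In a downward-monotone restrictor, sub- and superclass swap:
    "a dog" is more general than "a black dog", but "every dog" is
    more specific than "every black dog"."""
    restrictor_mono = MONOTONICITY[det][0]
    if restrictor_mono == "-":
        return {"subclass": "superclass", "superclass": "subclass"}.get(spec, spec)
    if restrictor_mono == "?":
        return spec if spec in ("equals", "none") else "none"
    return spec
```

Note that "no" and "few" themselves never reach this table as such: as described above, they are decomposed into a negation plus "some" / "many" in the semantic graphs.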

An agenda of matches to be updated is constructed, prioritizing matches of terms that have the fewest dependents. As matches become fully updated, they are pulled off the agenda. When an Hd has more than one match, new copies of the [H,T] match are added to the graph and to the agenda. The agenda is repeatedly processed until it either becomes empty or stops changing. Note that the determiner entailments table is incomplete. Note also that copying matches when an Hd has more than one match can sometimes lead to multiple [H,T] matches with the same specificity; when this happens, the higher-scoring matches are removed.
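The agenda loop itself is simple; a sketch (with an abstract try_update callback standing in for the full specificity-update machinery):

```python
def process_agenda(matches, waits_on, try_update):
    """matches: match ids. waits_on maps a match to the dependent matches that
    must be fully updated first. try_update(m) returns True once m is done.
    Matches with the fewest dependents go first; the loop runs until the
    agenda empties or stops changing."""
    agenda = sorted(matches, key=lambda m: len(waits_on.get(m, ())))
    done = set()
    changed = True
    while agenda and changed:
        changed = False
        remaining = []
        for m in agenda:
            if waits_on.get(m, set()) <= done and try_update(m):
                done.add(m)
                changed = True
            else:
                remaining.append(m)   # defer: dependents not fully updated yet
        agenda = remaining
    return done
```

The "stops changing" condition guards against cyclic dependencies leaving matches stuck on the agenda forever.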

Determining Entailment Relations
Once all the term matches have been fully updated according to the structure of the text and hypothesis graphs, the entailment relation between the two graphs can be determined. It is at this point that the context structure of the hypothesis and text graphs becomes important.

First up, we look for any contradictions. How does a contradiction occur? A contradiction requires one term in a positive context being matched with a term in a negative context. Specifically, the two terms must be the context heads of their respective contexts. If the hypothesis term is positive and as or more specific than the negative text term (a (young) man :: no man), then a contradiction occurs. If it is more general (a man :: no young man), then no determination can be made. Likewise, if the hypothesis term is in a negative context and the positive text term is equally or more specific, then a contradiction occurs.

If there are no contradictions, then a search is made for entailments. This looks for the highest level, non-vacuous hypothesis term(s). 
If it occurs in a positive context, then it needs to match a text term in a positive context that is as or more specific. If it is in a negative context, it must match a negative text term that is as or more general.

If an entailment is not found, the relation is neutral. In the case of neutrality, there are some weaker guesses at contradiction and entailment that consider the specificity of the initial term matches rather than the finally updated specificity.
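The decision logic above can be sketched as two small predicates (specificity is read from the hypothesis side, as before: "subclass" = H more specific; the weaker fallback guesses are omitted):

```python
def contradicts(h_context, t_context, spec):
    """Both terms must be the context heads of their respective contexts."""
    if h_context == "positive" and t_context == "negative":
        # "a (young) man" :: "no man": H as or more specific contradicts
        return spec in ("equals", "subclass")
    if h_context == "negative" and t_context == "positive":
        # the positive text term must be as or more specific
        return spec in ("equals", "superclass")
    return False

def entails(h_context, spec):
    """For the highest-level, non-vacuous hypothesis term, matched with a
    text term in a same-polarity context."""
    if h_context == "positive":
        return spec in ("equals", "superclass")  # text as or more specific
    return spec in ("equals", "subclass")        # negative: text as or more general
```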

Ideas for Further Work
A number of further things should be explored:
  • Using Brown clustering (we currently use a ready-made clustering) as another form of initial term match in ECD
  • Using word sense disambiguation to cut down on the number of ridiculous entailments derived from incorrect word sense choices
  • Explore running ECD off lightly modified (how much? for what?) syntactic dependencies (i.e. with context structure and lexical information included, but without going through all of the transfer semantics)
  • Adding ontological  data to the database server (SUMO, OpenCyc, ...)
  • Learning path scores for the path scorer from training on labeled data.

 The picture is from Liz Copock's project page
Channels of Meaning: How information can be expressed without being said

Tuesday, November 29, 2016

Entailment and Contradiction Detection by any other name

More Dick Crouch on semantics and names, joyously mangled by me.

The Entailment and Contradiction Detector (ECD) -- see the old patent -- implements a version of what has come to be known as "Natural Logic".

This performs inference by determining specificity relations between words, phrases, and sentences. For example, "dog" is more specific than "mammal"; "_ saw a dog" is consequently more specific than "_ saw a mammal"; and so "John saw a dog" is more specific than "John saw a mammal". 

In the normal course of events, more specific versions of sentences imply more general versions. So "John saw a dog" entails that "John saw a mammal", but not vice versa. However, the construction of sentences can affect specificity relations over and above the contribution made by individual words. For example, "dog" is a more specific term than "mammal", but "John likes mammals" is a more specific claim than "John likes dogs" (the first carves out a more restricted set of possibilities: liking mammals in general includes liking dogs). Similarly, "No person saw a dog" is more specific and entails "No man saw a dog", and is contradicted by "A man saw a dog."

Most approaches to using natural logic for textual inference either rely on very close alignments of semantic representations, or on heuristics operating on bags of words and word order. We pursue an intermediate path where we primarily match words at the individual level, but update those matches according to the syntactic/semantic structures in which those words occur.

In the previous post we saw that we can construct  graph objects as semantic representations.  Semantic graphs comprise four sub-graphs:

Role Graph: This represents the basic predicate-argument structure of the sentence. Nodes in the graph correspond to skolem and context terms in the (non-graphical) semantic representation, and the arcs between them are semantic role relations.
Within role graphs, two kinds of node are distinguished: skolem nodes and context nodes. Skolem nodes are terms that are for the most part directly introduced by lexical items (nouns, verbs, adjectives, adverbs). Context nodes can sometimes additionally be introduced by words and/or constructions. For example, the verb "believe" introduces both a skolem for the act of believing, and a hypothetical context in which the thing believed is situated. All sentences have a top-level / true context, and other contexts are introduced by arcs linking them to skolem or context nodes in pre-existing contexts. All contexts should have a context_head arc that links the context to the main/head skolem defining the context. This relation is important for the proper treatment of negation.

Property Graph: Some words (e.g. determiners) as well as morphological features (e.g. singular/plural, past/present tense) do not lead to the introduction of new skolem nodes. Instead they mark properties on existing skolems. The property relations are held in the property graph. Values of the properties are represented as value nodes and are typically atomic (string valued).

Lexical Graph: The lexical graph is used to record other lexical information associated with the skolems, typically derived from the semantic lexicon. Skolems in the role graph will have lexical edges linking them to zero or more sense nodes. The contents of the sense nodes include the sense identifier, and a list of concepts that the sense is attached to. (In the PARC incarnation of ECD the semantic lexicon was called the Unified Lexicon and it was homegrown. Other instantiations are possible.)

Link Graph: The link graph indicates identities between nodes, as induced by such things as (a) anaphora resolution, (b) copula, appositive and other constructions (Obama is the president; Obama, the president), and (c) lexical coreference links identifying lexical skolems with those introduced explicitly in the sentence (e.g. part of the lexical entry for "pilot" says that "pilots fly aircraft" -- the pilot mentioned in this sort of "naive semantics" is lexically coreferent with the word that introduced it).
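Putting the four sub-graphs together, a semantic graph might be sketched as follows (a Python toy with plain (source, label, target) edge lists; field and node names are illustrative, not the real API):

```python
from dataclasses import dataclass, field

@dataclass
class SemanticGraph:
    """The four sub-graphs, each held as a list of (source, label, target) triples."""
    role: list = field(default_factory=list)     # skolem/context nodes, semantic role arcs
    prop: list = field(default_factory=list)     # determiner, tense, number, ... properties
    lexical: list = field(default_factory=list)  # skolem -> sense nodes (sense id, concepts)
    link: list = field(default_factory=list)     # identities: anaphora, copula, lexical coref

    def context_head(self, context):
        """The main/head skolem defining a context (needed for negation)."""
        for src, label, tgt in self.role:
            if src == context and label == "context_head":
                return tgt
        return None
```

For "John believes pigs fly", the role graph would contain a top-level context headed by the believe skolem, plus a hypothetical context (introduced by "believe") headed by the fly skolem.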