94 Cards in this Set

  • Front
  • Back

What are grammar formalisms?

The rules of a language, a precise way to detail and describe the structure of sentences.

What are specific grammars?

Implementations of a particular grammar formalism

Types of verbs?

1) Intransitive Verbs: Takes only a subject (sleep)


2) Transitive Verbs: Takes one direct object (eat)


3) Ditransitive Verbs: Takes both a direct and an indirect object (give)

Why can't FSA describe grammar?

FSAs cannot generate the hierarchical structure that most grammars require.

Difference between strong and weak generative capacity?

Weak generative capacity: only concerns generating the strings of a language.




Strong generative capacity: also concerns generating the right structures for the strings of a language.

Use of CFG's?

CFGs can capture recursion. However, they cannot capture bounded recursion (e.g. embedding only one or two relative clauses). They are equivalent to pushdown automata (PDAs).

DG?

Dependency Grammar: describes the structure of sentences as a directed acyclic graph (words = nodes, dependencies = edges).

Constituent?

A constituent is a phrase that acts as a single unit.

How to find constituents?

1) Substitution Test


2) Movement Test


3) Answer Test -- can the phrase be used on its own as an answer to a question?

Arguments vs Adjuncts

Arguments: Mandatory


Adjuncts: Optional

Chomsky Normal Form

Can only have 2 kinds of right-hand sides:


1) 2 Nonterminals


2) 1 Terminal

CKY Algorithm

O(n³|G|) dynamic programming algorithm for parsing a sentence with a CFG in Chomsky Normal Form (finding its parse trees)
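
Below is a minimal CKY recognizer sketch for a toy grammar in Chomsky Normal Form; the grammar, sentence, and rule tables are illustrative assumptions, not part of the course material.

    # Minimal CKY recognizer for a CFG in Chomsky Normal Form (toy grammar; illustrative only).
    from collections import defaultdict

    binary_rules = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
    lexical_rules = {"the": {"Det"}, "chef": {"N"}, "meal": {"N"}, "cooks": {"V"}}

    def cky_recognize(words):
        n = len(words)
        chart = defaultdict(set)                        # chart[(i, j)] = nonterminals spanning words[i:j]
        for i, w in enumerate(words):
            chart[(i, i + 1)] = set(lexical_rules.get(w, set()))
        for span in range(2, n + 1):                    # span width
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):               # split point
                    for B in chart[(i, k)]:
                        for C in chart[(k, j)]:
                            chart[(i, j)] |= binary_rules.get((B, C), set())
        return "S" in chart[(0, n)]

    print(cky_recognize("the chef cooks the meal".split()))   # True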

Number of Parse Trees in CKY

For a rule such as VP --> V NP over a span, #(VP) = #(V) x #(NP)


Then add up all the possible combinations (rules and split points)

Probabilistic Context-Free Grammar

A CFG with a probability associated with each production rule. The probabilities of all rules with the same left-hand side must sum to 1.
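
As a quick illustration of that constraint, here is a small sketch with made-up rule probabilities that checks each left-hand side sums to 1:

    from collections import defaultdict

    # Toy PCFG: (lhs, rhs) -> probability (numbers are illustrative).
    rules = {
        ("S",  ("NP", "VP")): 1.0,
        ("VP", ("V", "NP")):  0.7,
        ("VP", ("V",)):       0.3,
        ("NP", ("Det", "N")): 0.6,
        ("NP", ("N",)):       0.4,
    }

    totals = defaultdict(float)
    for (lhs, rhs), p in rules.items():
        totals[lhs] += p
    assert all(abs(t - 1.0) < 1e-9 for t in totals.values())   # every LHS sums to 1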

PCFG Modeling

It makes independence assumptions --> the flatter the trees, the fewer independence assumptions are needed (the flatter the better).



Bias/Variance Tradeoff

A probability model has low bias if it makes few independence assumptions, but this leads to higher variance: its many parameters are poorly estimated from limited data.

Parser Evaluation: Evaluation of phrase-structure trees

Precision: What percentage of retrieved documents are relevant to the query?


P = TP/(TP + FP)




Recall: What percentage of the relevant documents were retrieved?


R = TP/(TP + FN)




F-Measure: harmonic mean of precision and recall, F = 2PR/(P + R)
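
A small sketch of these formulas with made-up counts (for phrase-structure trees, TP = constituents present in both the predicted and the gold tree):

    # Precision, recall, and F1 from toy counts (illustrative numbers).
    TP, FP, FN = 80, 20, 10
    precision = TP / (TP + FP)                              # 0.8
    recall    = TP / (TP + FN)                              # ~0.89
    f1 = 2 * precision * recall / (precision + recall)
    print(precision, recall, f1)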

Penn Treebank

The Penn Treebank was the first large, publicly available syntactically annotated corpus

Ways to improve performance of a PCFG

1) Change the internal grammar




2) Change the probability model

Parent Transformation

PCFGs assume the expansion of any nonterminal is independent of its parent. We can change the grammar by adding the name of the parent node to each nonterminal.
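
A minimal sketch of the parent transformation, assuming trees are represented as nested (label, children...) tuples with words as plain strings:

    # Parent annotation: append the parent's label to every nonterminal.
    def annotate_parent(tree, parent="ROOT"):
        if isinstance(tree, str):                       # leaf: a word, left unchanged
            return tree
        label, *children = tree
        return (f"{label}^{parent}", *[annotate_parent(c, label) for c in children])

    t = ("S", ("NP", "they"), ("VP", ("V", "eat"), ("NP", "sushi")))
    print(annotate_parent(t))
    # ('S^ROOT', ('NP^S', 'they'), ('VP^S', ('V^VP', 'eat'), ('NP^VP', 'sushi')))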

Lexicalization with PCFG

PCFGs can't distinguish between "eat sushi with chopsticks" and "eat sushi with tuna"; therefore we need to take head words into account.

Dependency Grammar

Word-word dependencies are a component of many (most/all) grammar formalisms.




Dependency grammar assumes that syntactic structure consists only of dependencies




Purely descriptive

Kinds of dependencies

Head-argument: eat sushi


Head-modifier: fresh sushi


Head-specifier: the sushi


Coordination: Sushi and Sashimi

Dependency

There is a syntactic relation between a head H and a dependent D in a construction C if:




The head H determines the syntactic category of the construction C


The head H determines the semantic category of the construction C; D gives semantic specification


The H is obligatory.


The head selects D and determines whether D is obligatory or not.

Prague Dependency Treebank

3 levels of annotation:


Morphological


Surface-syntactic


Semantic

Parsing Algorithms for DG

1) Transition-based parsers


2) Graph-based parsers

Transition-based parsing

Transition-based (shift-reduce) parsing processes the sentence S = w0 w1 ... wn from left to right. It constructs a single tree.




Utilizes a stack and a buffer to create arcs.
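
A minimal arc-standard sketch that applies a fixed transition sequence to a stack and buffer; the sentence and transitions are illustrative, and a real parser would predict each transition with a trained classifier:

    # Arc-standard shift-reduce parsing with a given transition sequence (illustrative).
    def parse(words, transitions):
        stack, buffer, arcs = [], list(range(len(words))), []
        for t in transitions:
            if t == "SHIFT":
                stack.append(buffer.pop(0))
            elif t == "LEFT-ARC":              # top of stack heads the item below it
                dep = stack.pop(-2)
                arcs.append((stack[-1], dep))
            elif t == "RIGHT-ARC":             # item below the top heads the top
                dep = stack.pop()
                arcs.append((stack[-1], dep))
        return arcs                            # (head index, dependent index) pairs

    words = ["they", "eat", "sushi"]
    seq = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
    print(parse(words, seq))                   # [(1, 0), (1, 2)]: "eat" heads both other words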

Feature Structures

List of features (= attributes), and values.




Represented as AVM (attribute value matrix)




Can be recursive




Can be used to enforce constraints (e.g. agreement)
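
A small sketch of feature structures as nested dicts with a simple recursive unification (this ignores reentrancy/structure sharing; the NP and verb entries are illustrative):

    # Feature structures as nested dicts; unification fails on conflicting atomic values.
    def unify(fs1, fs2):
        if isinstance(fs1, dict) and isinstance(fs2, dict):
            result = dict(fs1)
            for key, val in fs2.items():
                if key in result:
                    sub = unify(result[key], val)
                    if sub is None:
                        return None            # conflict: unification fails
                    result[key] = sub
                else:
                    result[key] = val
            return result
        return fs1 if fs1 == fs2 else None     # atomic values must match

    np_fs   = {"cat": "NP", "agr": {"num": "sg"}}
    subj_fs = {"agr": {"num": "sg", "per": 3}}
    print(unify(np_fs, subj_fs))               # {'cat': 'NP', 'agr': {'num': 'sg', 'per': 3}}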

Grammar Formalisms

Formalisms provide a language in which linguistic theories can be expressed and implemented.




Formalisms define elementary objects and recursive operations which generate complex objects from simple objects

Mildly context-sensitive grammars

Contains all context-free grammars/languages




Can be parsed in polynomial time (O(n⁶))




Strong generative capacity: capture certain kinds of dependencies: nested and cross-serial

Tree-Adjoining Grammar (TAG)

Tree-rewriting format that defines operations on trees (elementary objects = trees)




Is lexicalized: each elementary tree is anchored to a lexical item




Is mildly context sensitive

Combinatory Categorial Grammar (CCG)

Categories: specify subcat lists of words/constituents




Combinatory rules: specify how constituents can combine




Lexicon: Specifies which categories a word can have




Derivations: spell out process of combining constituents

CCG Categories

Simple (atomic) categories: NP, S, PP



Complex Categories (functions):


VP, intransitive verbs: S\NP


Transitive Verb: (S\NP)/NP


Adverbs: (S\NP)\(S\NP)


Prepositions:


((S\NP)\(S\NP))/NP


(NP\NP)/NP


PP/NP

Function Application

Forward Application: (S\NP)/NP NP => S\NP


Backward Application: NP (S\NP) => S
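
A tiny sketch of function application, representing complex categories as (result, slash, argument) tuples (this representation is an assumption for illustration):

    # CCG function application over (result, slash, argument) category tuples.
    S, NP = "S", "NP"
    TRANS = ((S, "\\", NP), "/", NP)           # transitive verb: (S\NP)/NP

    def forward_apply(fn, arg):                # X/Y  Y  =>  X
        if isinstance(fn, tuple) and fn[1] == "/" and fn[2] == arg:
            return fn[0]

    def backward_apply(arg, fn):               # Y  X\Y  =>  X
        if isinstance(fn, tuple) and fn[1] == "\\" and fn[2] == arg:
            return fn[0]

    vp = forward_apply(TRANS, NP)              # verb + object  ->  S\NP
    print(vp)                                  # ('S', '\\', 'NP')
    print(backward_apply(NP, vp))              # subject + VP   ->  S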

Why is MT Difficult

1) Correspondences




2) Lexical Divergences:




3) Syntactic divergences

MT - Correspondences

One to one,


One to Many/Many to One,


Many to Many


Reordering Required

MT - Lexical Divergences

Different senses of homonymous words generally have different translations (river bank, vs financial bank).




Different senses of polysemous words may also have different translations:

MT - Lexical Specificity and Morphological Divergences

Lexical Specificity:


German Kürbis = English pumpkin or squash


English brother = Chinese gege (older), didi (younger)




Morphological divergences:


un nouveau, une nouvelle, etc.

MT - Syntactic Divergences

Head-marking vs. Dependent-marking

MT - Semantic Differences

Peter swims vs Peter is swimming

Direct Translation

1) Morphological Analysis of Source String


2) Lexical transfer (using a translation dictionary)


3) Local reordering


4) Morphological generation

Limits of direct translation:

Phrasal reordering

Syntactic Transfer

Requires a syntactic parse of the source sentence, followed by reordering of the tree so that the word order matches the target language.

Interlingua Approach

Based on the assumption that there is a common meaning representation shared by all languages: analyze the source sentence into that representation, then generate the target sentence from it.

Noisy Channel Model

Ê = argmax_E P(F|E) × P(E)




P(F|E) = Translation model, captures the faithfulness of the translation




P(E) = Language Model, captures the fluency of the translation




Used to find the most probable translation E of a foreign sentence F
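
A minimal sketch of noisy-channel decoding over a fixed candidate list; all candidates and log-probabilities below are made-up placeholders:

    # Pick E maximizing log P(F|E) + log P(E) over a toy candidate set (placeholder scores).
    def log_translation_model(f, e):           # log P(F|E): faithfulness
        return {"the house": -0.5, "the home": -0.9, "house the": -0.4}[e]

    def log_language_model(e):                 # log P(E): fluency
        return {"the house": -1.0, "the home": -1.2, "house the": -5.0}[e]

    f = "das Haus"
    candidates = ["the house", "the home", "house the"]
    best = max(candidates, key=lambda e: log_translation_model(f, e) + log_language_model(e))
    print(best)                                # "the house"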

Translation Probability

Use a phrase table and phrase alignment on a parallel corpus to get the counts

IBM Model 1

Based on the noisy channel, word-based translation models




1) Generates length


2) Generate an alignment


3) Generate the words of the translation

IBM Model 1 Parameters

Length Probability: P(m|n): Probability of generating a source sentence of length m given a target sentence of length n




Alignment Probability: P( A|m,n)




Translation Probability: P(fj="lac"|aj = i, ei="lake")

IBM Model 1 EM

Expectation-Maximization (EM)


E: Go through training data to compute counts


M: Use the expected counts to compute a new model


Check for convergence
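
A compact EM sketch for the Model 1 translation probabilities t(f|e), ignoring the length and alignment terms and the NULL word; the two-sentence parallel corpus is an illustrative assumption:

    from collections import defaultdict

    corpus = [("la maison".split(), "the house".split()),
              ("la fleur".split(),  "the flower".split())]

    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))        # t(f|e), initialized uniformly

    for _ in range(10):                                # EM iterations
        count, total = defaultdict(float), defaultdict(float)
        for fs, es in corpus:
            for f in fs:                               # E-step: expected alignment counts
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    count[(f, e)] += t[(f, e)] / norm
                    total[e]      += t[(f, e)] / norm
        for (f, e), c in count.items():                # M-step: re-estimate t(f|e)
            t[(f, e)] = c / total[e]

    print(round(t[("la", "the")], 2))                  # rises toward 1.0 as EM converges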

Phrase-based MT

Fundamental units of translation are phrases.


Splits the target sentence deterministically into phrases and translates each phrase. Reorders the foreign phrases with a distortion probability.

Phrase-based MT - Finding the best translation

Since there is an exponential number of possible translations, we can use a heuristic-search algorithm.




Score the partial translation based on the cost.

MT Evaluation

Automatic: BLEU - Uses n-gram precision to grade a translation




Manual: Use a human to track fluency and adequacy.
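
As a sketch of the core idea behind BLEU, here is modified unigram precision (real BLEU combines modified precision for n = 1..4 with a brevity penalty; the sentences are illustrative):

    from collections import Counter

    def modified_unigram_precision(candidate, reference):
        # Clip each candidate word's count by its count in the reference.
        cand, ref = Counter(candidate.split()), Counter(reference.split())
        clipped = sum(min(c, ref[w]) for w, c in cand.items())
        return clipped / sum(cand.values())

    print(modified_unigram_precision("the the the cat", "the cat sat on the mat"))   # 0.75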

Semantics

Lexical Semantics: meaning of a word


Compositional Semantics: meaning of a sentence


Discourse Semantics: meaning of a longer piece of text

Nouns/Verbs

Nouns: refer to entities in the world


Verbs: define n-ary predicates; depending on the arguments they take, the result can be true or false

Sentences

Declarative Sentences: can be true or false depending on the state of the world.




Consist of a verb + one or more NPs

Principle of compositionality (Frege):

The meaning of an expression depends on the meaning of its parts and how they are put together.

Predicate logic expressions/ First-order predicate language (FOL)

Terms: refers to entities


Predicates: relations between entities


Formulas: Can be true or false

Predicate Formulas

Atomic Formulas:


book(x), eat(x, y)




Complex Formulas involve


1) Negation


2) Connectives


3) Quantifiers

Limitations of FOL

Tense, Aspect, Modals, Quantifiers

lambda-expressions

Use lambda expressions along with CCGs to construct sentence meanings compositionally




N = λz.chef(z)
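
A small sketch of this idea using Python lambdas over a toy world (the world, names, and denotations are illustrative):

    # Lambda-calculus-style semantics with Python lambdas (toy denotations).
    CHEFS, COOK_PAIRS = {"alice", "bob"}, {("alice", "pasta")}

    chef  = lambda z: z in CHEFS                        # N:          lambda z. chef(z)
    cooks = lambda y: (lambda x: (x, y) in COOK_PAIRS)  # (S\NP)/NP:  lambda y. lambda x. cook(x, y)

    vp = cooks("pasta")                                 # apply the verb to its object
    print(vp("alice"), vp("bob"))                       # True False
    print(chef("alice"))                                # True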

Quantifier Scope Ambiguity

"Every chef cooks a meal": For every chef, there is a meal which he cooks or There is some meal which every chef cooks

Thematic Roles

Verbs describe events or states.


Thematic roles refer to participants of these events:


Agent: Person who performs the action


Patient: Entity on which the action was performed


Tool: What was used to perform the action

Proposition Bank (PropBank)

PropBank annotations classify the arguments required by verbs.

Diathesis Alternations

Active/passive alternation


Causative alternation


Dative alternation


Locative alternation

Discourse

Discourse is a linguistic unit that consists of multiple sentences.



Speaker describes some state of the world




Speaker attempts to get the listener to construct a similar model of the situation

Entity-based coherence

How we refer to entities influences how coherent a discourse is.

Centering Theory

A linguistic theory of entity-based coherence and salience that predicts which entities are salient at any point during a discourse. Also predicts whether a discourse is entity-coherent, based on its referring expressions. It is not a computational model.

Coherence relations

A discourse is coherent if its sentences are connected by coherence relations (e.g. one sentence provides an explanation for another).

Rhetorical Structure Theory (RST)

Describes coherence relations between utterances.




It defines a set of rhetorical relations:


Evidence, Elaboration, Attribution, Contrast, List




These relations hold between a nucleus and a satellite




Some impose constraints

How to understand discourse

1) Coreference Resolution: "the cafe" = "Bevande"




2) Identifying discourse relations: "He wanted to buy lunch" is the reason "John went to Bevande"

Discourse Models

Explicit Representation of the events and entities that a discourse talks about and the relations between them.




Captures: Physical Entities, Events, States, Temporal Relations, Rhetorical Relations

Referring expression ("this book")

Refers to some entity, which is called the referent

Co-reference

Two referring expressions that refer to the same entity





Indefinite NPs

Indefinites usually introduce a new discourse entity. They can refer to a specific entity or not.




No determiner,


The indefinite determiner


Numeral


Indefinite quantifiers


Indefinite this

Definite NPs

Definite NPs refer to an identifiable entity (previously mentioned or not)

Information Status

Hearer-new vs Hearer-old: Assumes entity is (un)known to the listener.




Discourse-new vs. Discourse-old: whether the speaker introduces a new entity into the discourse or refers to one already mentioned

Coref as binary classifier

Represent each NP-NP pair (+ context) as a feature vector.




Pass through the text and for each NP go through the previous sentences to find the best antecedent NP
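
A tiny sketch of the feature-vector step for one NP pair (the mention representation and feature set are illustrative; they mirror the features listed in the next card):

    # Turn an (antecedent, anaphor) NP pair into a feature dict for a binary classifier.
    def np_pair_features(np1, np2):
        return {
            "same_head":         np1["head"] == np2["head"],
            "gender_match":      np1["gender"] == np2["gender"],
            "number_match":      np1["number"] == np2["number"],
            "sentence_distance": np2["sent"] - np1["sent"],
        }

    np1 = {"head": "company", "gender": "neuter", "number": "sg", "sent": 0}
    np2 = {"head": "it",      "gender": "neuter", "number": "sg", "sent": 1}
    print(np_pair_features(np1, np2))   # e.g. fed to logistic regression or a decision tree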

Features for Coref Resolution

Do the 2 NPs have the same head noun?


Do they contain the same modifier?


Do the gender and number match?


Is one NP a hypernym/synonym of the other?


Are both NPs named entities of the same type?

Evaluation of Coref resolution:

B-Cubed F-Score


Precision P = |C ∩ T| / |C|


Recall R = |C ∩ T| / |T|


F-measure: 2PR/(P+R)


(C = mentions in the system's chain for a mention, T = mentions in its true chain; scores are averaged over all mentions)

Natural Language Generation

Automatic production of natural language text, usually from an underlying semantic representation

NLG Architectures

1) Text Planning: Content determination and discourser planning




2) Sentence Planning: Sentence aggregation, lexicalization and referring expression generation.




3) Linguistic Realization: Syntactic, morphological and orthographic processing

NLG Tasks - Text Planning

1. Content Determination: What information should be communicated.




2. Discourse Planning: How should the message be structured





NLG Tasks - Sentence Planning

3. Sentence Aggregation: Which messages should be combined into individual sentences?



4. Lexicalization: In which words should domain concepts be expressed.




5. Referring expression generation: How should entities be referred to?

NLG Tasks - Linguistic Realization

Linguistic Realization: Generate grammatically and orthographically well-formed text.

Summarization

"The process of distilling the most important information from a text to produce an abridged version for a particular task and user"

Centroid-based content selection

Which sentences are most central in a document?



A) Central Sentences = Salient/Informative sentences (contain many salient words)




B) Central Sentences = the sentences most similar to the other sentences in the document
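
A minimal sketch of variant A, scoring sentences by how many salient (repeated, non-stopword) words they contain; the stopword list, threshold, and sentences are assumptions:

    from collections import Counter

    STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "are", "by", "for"}

    def centroid_select(sentences, top_n=1):
        words = [w.lower() for s in sentences for w in s.split() if w.lower() not in STOPWORDS]
        salient = {w for w, c in Counter(words).items() if c >= 2}        # threshold is an assumption
        return sorted(sentences,
                      key=lambda s: sum(w.lower() in salient for w in s.split()),
                      reverse=True)[:top_n]

    doc = ["The parser builds parse trees for input sentences",
           "Parse trees are scored by the grammar",
           "The weather is nice today"]
    print(centroid_select(doc))           # a sentence containing many salient words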

RST-Based Summarization

Use a discourse parser to identify rhetorical relations between sentences/clauses in a document.




This gives a discourse tree that defines a salience ranking

Dialogue Manager

Controls the architecture and structure of dialogue




1. Finite-State


2. Frame Based


3. Information State


4. AI Planning

Grounding

Dialogue is a collective act performed by speaker and hearer




Common ground: set of things mutually believed by both speaker and hearer.

NLP Pipeline

1. Tokenizer/Segmenter: to identify words and sentences.


2. Morphological Analyzer/POS tagger: to identify the part of speech and structure of words.


3. Word Sense Disambiguation: To find the meaning of the words.


4. Syntactic/Semantic Parser: To obtain the structure and meaning of sentences.


5. Coreference resolution/discourse model: to keep track of the various entities and events mentioned.
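
A sketch of the first few pipeline steps using spaCy (assumes the en_core_web_sm model has been downloaded; spaCy covers tokenization, POS tagging, parsing, and named entities, but not word sense disambiguation):

    import spacy

    nlp = spacy.load("en_core_web_sm")                   # tokenizer, tagger, parser, NER
    doc = nlp("John went to Bevande because he wanted to buy lunch.")

    for sent in doc.sents:                               # sentence segmentation
        for token in sent:
            print(token.text, token.pos_, token.dep_, token.head.text)

    for ent in doc.ents:                                 # entities feed coreference/discourse steps
        print(ent.text, ent.label_)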

Core Problems of NLP

Ambiguity: Natural Language is highly ambiguous




Coverage (Zipf's Law): Any NLP system will come across words that did not occur during training.

Statistical Models for NLP

Probabilistic Models (HMMs, MEMMs, CRFs, PCFGs)

Deep Learning

Neural networks with several hidden layers




Very impressive performance gains in computer vision and speech recognition




Fast Computers have finally made it viable

Challenges in Neural Networks for NLP

Input is discrete; NNs work best with continuous vectors.




Word embeddings and neural language models are used to address this

NLG Architecture

Goal


Text Planner


Sentence Planner


Linguistic Realizer


Surface Text