94 Cards in this Set
What are grammar formalisms? |
The rules of a language, a precise way to detail and describe the structure of sentences. |
|
What are specific grammars? |
Implementations of a particular grammar formalism |
|
Types of verbs? |
1) Intransitive verbs: take only a subject (sleep) 2) Transitive verbs: take one direct object (eat) 3) Ditransitive verbs: take a direct object and an indirect object (give) |
|
Why can't FSA describe grammar? |
FSAs cannot generate the hierarchical structure that most grammars require. |
|
Difference between strong and weak generative capacity? |
Weak generative capacity: generating only the strings of a language. Strong generative capacity: also generating the right structures for those strings. |
|
Use of CFG's? |
CFGs can capture unbounded recursion, which FSAs cannot (an FSA can only embed a bounded number of relative clauses, e.g. one or two). CFGs are equivalent to pushdown automata (PDAs). |
|
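To illustrate the equivalence with pushdown automata, here is a minimal sketch (not from the cards): it recognizes the context-free language aⁿbⁿ, which no FSA can accept, by simulating a PDA's stack with a counter.

```python
def matches_anbn(s: str) -> bool:
    """Recognize a^n b^n (n >= 0) by simulating a pushdown automaton:
    push for each 'a', pop for each 'b'."""
    depth = 0       # stack depth
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after a 'b' is out of order
            depth += 1
        elif ch == "b":
            seen_b = True
            depth -= 1
            if depth < 0:
                return False  # more b's than a's so far
        else:
            return False
    return depth == 0
```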
DG? |
Dependency Grammar: describes the structure of sentences as a directed acyclic graph (words = nodes, dependencies = edges). |
|
Constituent? |
A constituent is a phrase that acts as a single unit. |
|
How to find constituents? |
1) Substitution Test 2) Movement Test 3) Answer Test: can the phrase stand alone as the answer to a question? |
|
Arguments vs Adjuncts |
Arguments: Mandatory Adjuncts: Optional |
|
Chomsky Normal Form |
Can only have two kinds of right-hand sides: 1) two nonterminals 2) one terminal |
|
CKY Algorithm |
An O(n³|G|) dynamic programming algorithm that finds all parse trees of a sentence under a grammar in Chomsky Normal Form. |
|
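A compact sketch of the CKY recognizer, assuming the grammar is already in Chomsky Normal Form and encoded as a dict (the toy grammar below is hypothetical):

```python
from collections import defaultdict

def cky(words, grammar):
    """CKY recognizer. grammar maps a terminal or a (B, C) nonterminal
    pair to the set of left-hand sides that can produce it.
    Returns the nonterminals spanning the whole sentence."""
    n = len(words)
    chart = defaultdict(set)  # chart[(i, j)] = nonterminals deriving words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= grammar.get(w, set())
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        chart[(i, j)] |= grammar.get((B, C), set())
    return chart[(0, n)]

# Hypothetical toy grammar in CNF
toy = {"she": {"NP"}, "sushi": {"NP"}, "eats": {"V"},
       ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
```

`cky(["she", "eats", "sushi"], toy)` yields `{"S"}`, so the sentence is grammatical under the toy grammar.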
Number of Parse Trees in CKY |
Count trees bottom-up, one per way of building each constituent: #(VP → V NP) = #(V) × #(NP); then add up all the possible combinations for each cell. |
|
Probabilistic Context-Free Grammar |
A CFG with a probability attached to each rule. The probabilities of all rules with the same left-hand side must sum to 1. |
|
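The normalization constraint can be checked mechanically; a small sketch (the rule triples used in the test are hypothetical):

```python
from collections import defaultdict

def is_normalized(rules, tol=1e-9):
    """Check that, for every left-hand side, the probabilities of its
    rules sum to 1. rules: iterable of (lhs, rhs, prob) triples."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    return all(abs(total - 1.0) < tol for total in totals.values())
```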
PCFG Modeling |
PCFGs make strong independence assumptions. Flatter trees require fewer rule applications, and hence fewer independence assumptions: the flatter, the better. |
|
Bias/Variance Tradeoff |
A probability model has low bias if it makes few independence assumptions, but this leads to higher variance: the estimates from limited data become unreliable. |
|
Parser Evaluation: Evaluation of phrase-structure trees |
Precision: what percentage of retrieved items are relevant? P = TP/(TP + FP). Recall: what percentage of relevant items were retrieved? R = TP/(TP + FN). F-measure: harmonic mean of P and R. |
|
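The three scores reduce to a few lines; a sketch from raw counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 (their harmonic mean) from true-positive,
    false-positive and false-negative counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)
```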
Penn Treebank |
The Penn Treebank was the first large publicly available syntactically annotated corpus |
|
Ways to improve performance in PFCG |
1) Change the internal grammar 2) Change the probability model |
|
Parent Transformation |
PCFGs assume the expansion of any nonterminal is independent of its parent. We can change the grammar by adding the name of the parent node to each nonterminal. |
|
Lexicalization with PCFG |
PCFGs can't distinguish between "eat sushi with chopsticks" and "eat sushi with tuna"; therefore we need to take head words into account. |
|
Dependency Grammar |
Word-word dependencies are a component of many (most/all) grammar formalisms. Dependency grammar assumes that syntactic structure consists only of dependencies. Purely descriptive. |
|
Kinds of dependencies |
Head-argument: eat sushi Head-modifier: fresh sushi Head-specifier: the sushi Coordination: Sushi and Sashimi |
|
Dependency |
There is a syntactic relation between a head H and a dependent D in a construction C if: 1) H determines the syntactic category of C 2) H determines the semantic category of C; D gives semantic specification 3) H is obligatory 4) H selects D and determines whether D is obligatory or not. |
|
Prague Dependency Treebank |
3 levels of annotation: Morphological Surface-syntactic Semantic |
|
Parsing Algorithms for DG |
1) Transition-based parsers 2) Graph-based parsers |
|
Transition-based parsing |
Transition-based shift-reduce parsing processes the sentence S = w0 w1 ... wn from left to right and constructs a single tree. It uses a stack and a buffer to create arcs. |
|
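A minimal arc-standard sketch (hypothetical, not the full algorithm): it only replays a given action sequence over word indices, producing (head, dependent) arcs; in a real parser a classifier chooses the next action.

```python
def replay_transitions(n_words, actions):
    """Arc-standard shift-reduce sketch: apply a fixed action sequence.
    LEFT makes the stack top the head of the item below it;
    RIGHT makes the item below the head of the stack top."""
    stack, buffer, arcs = [], list(range(n_words)), []
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))        # move next word onto the stack
        elif action == "LEFT":
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT":
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
    return arcs
```

For "she eats sushi" (indices 0, 1, 2), the sequence SHIFT, SHIFT, LEFT, SHIFT, RIGHT yields the arcs eats→she and eats→sushi.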
Feature Structures |
A list of features (= attributes) and values, represented as an AVM (attribute-value matrix). Can be recursive. Can be used to express constraints. |
|
Grammar Formalisms |
Formalisms provide a language in which linguistic theories can be expressed and implemented. Formalisms define elementary objects and recursive operations which generate complex objects from simple objects |
|
Mildly context-sensitive grammars |
Contain all context-free grammars/languages. Can be parsed in polynomial time (O(n⁶)). Strong generative capacity: capture certain kinds of dependencies, both nested and cross-serial. |
|
Tree-Adjoining Grammar (TAG) |
Tree-rewriting formalism that defines operations on trees (elementary objects = trees). Is lexicalized: each elementary tree is anchored to a lexical item. Is mildly context-sensitive. |
|
Combinatory Categorial Grammar (CCG) |
Categories: specify subcat lists of words/constituents Combinatory rules: specify how constituents can combine Lexicon: Specifies which categories a word can have Derivations: spell out process of combining constituents |
|
CCG Categories |
Simple (atomic) categories: NP, S, PP
VP, intransitive verbs: S\NP. Transitive verbs: (S\NP)/NP. Adverbs: (S\NP)\(S\NP). Prepositions: ((S\NP)\(S\NP))/NP, (NP\NP)/NP, PP/NP |
|
Function Application |
Forward Application: X/Y Y => X, e.g. (S\NP)/NP NP => S\NP. Backward Application: Y X\Y => X, e.g. NP S\NP => S. |
|
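A naive sketch that treats CCG categories as plain strings — sufficient for the examples above, not a full parser:

```python
def strip_parens(cat):
    """Remove outer parentheses only if they wrap the whole category."""
    if not (cat.startswith("(") and cat.endswith(")")):
        return cat
    depth = 0
    for i, ch in enumerate(cat):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i < len(cat) - 1:
                return cat  # the "(" closes early: not a wrapper
    return cat[1:-1]

def forward_apply(fn, arg):
    """Forward application: X/Y  Y  =>  X."""
    if fn.endswith("/" + arg):
        return strip_parens(fn[: -(len(arg) + 1)])
    return None

def backward_apply(arg, fn):
    """Backward application: Y  X\\Y  =>  X."""
    if fn.endswith("\\" + arg):
        return strip_parens(fn[: -(len(arg) + 1)])
    return None
```

Deriving "she eats sushi": forward-apply the transitive verb (S\NP)/NP to the object NP to get S\NP, then backward-apply to the subject NP to get S.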
Why is MT Difficult |
1) Correspondences 2) Lexical divergences 3) Syntactic divergences |
|
MT - Correspondences |
One to one, One to Many/Many to One, Many to Many Reordering Required |
|
MT - Lexical Divergences |
Different senses of homonymous words generally have different translations (river bank vs. financial bank). Different senses of polysemous words may also have different translations. |
|
MT - Lexical Specificity and Morphological Divergences |
Lexical Specificity: German Kürbis = English pumpkin or squash English brother = Chinese gege (older), didi (younger) Morphological divergences: un nouveau, une nouvelle, etc. |
|
MT - Syntactic Divergences |
Head-marking vs. Dependent-marking |
|
MT - Semantic Differences |
Peter swims vs Peter is swimming |
|
Direct Translation |
1) Morphological analysis of the source string 2) Lexical transfer (using a translation dictionary) 3) Local reordering 4) Morphological generation |
|
Limits of direct translation: |
Phrasal reordering |
|
Syntactic Transfer |
Requires a syntactic parse of the source language, followed by reordering of the tree, which moves constituents into target-language order. |
|
Interlingua Approach |
Based on the assumption that there is a common meaning representation (interlingua) shared by all languages: analyze the source into it, then generate the target from it. |
|
Noisy Channel Model |
Ê = argmax_E P(F|E) × P(E). P(F|E) = translation model, captures the faithfulness of the translation. P(E) = language model, captures the fluency of the translation. |
|
Translation Probability |
Use a phrase table and phrase alignment on a parallel corpus to get the counts |
|
IBM Model 1 |
A word-based translation model for the noisy channel: 1) generate the length 2) generate an alignment 3) generate the words of the translation |
|
IBM Model 1 Parameters |
Length probability P(m|n): probability of generating a source sentence of length m given a target sentence of length n. Alignment probability P(A|m,n). Translation probability P(fj = "lac" | aj = i, ei = "lake") |
|
IBM Model 1 EM |
Expectation-Maximization (EM) E: Go through training data to compute counts M: Use the expected counts to compute a new model Check for convergence |
|
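The E and M steps above, applied to IBM Model 1's translation table t(f|e), fit in a short sketch (the two-sentence parallel corpus in the usage below is a hypothetical toy):

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f|e) with EM (IBM Model 1 sketch).
    corpus: list of (source_words, target_words) sentence pairs."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalizer over alignments
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f|e) from the expected counts
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```

On the toy pairs ("la maison" / "the house", "la fleur" / "the flower"), EM gradually concentrates probability on the correct pairings (la↔the, maison↔house, fleur↔flower).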
Phrase-based MT |
The fundamental units of translation are phrases. Segments the source sentence into phrases and translates each phrase; reorders the foreign phrases with a distortion probability. |
|
Phrase-based MT - Finding the best translation |
Since there is an exponential number of possible translations, use a heuristic search algorithm and score each partial translation by its cost. |
|
MT Evaluation |
Automatic: BLEU uses n-gram precision to score a translation. Manual: humans rate fluency and adequacy. |
|
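The core of BLEU is clipped n-gram precision; a sketch for a single reference (real BLEU combines several n-gram orders and adds a brevity penalty):

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it appears in the reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(cand.values()), 1)
```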
Semantics |
Lexical Semantics: meaning of a word Compositional Semantics: meaning of a sentence Discourse Semantics: meaning of a longer piece of text |
|
Nouns/Verbs |
Nouns: refer to entities in the world. Verbs: define n-ary predicates; depending on the arguments they take, the result can be true or false. |
|
Sentences |
Declarative sentences can be true or false depending on the state of the world. Consist of a verb + one or more NPs. |
|
Principle of compositionality (Frege): |
The meaning of an expression depends on the meaning of its parts and how they are put together. |
|
Predicate logic expressions/ First-order predicate language (FOL) |
Terms: refer to entities. Predicates: relations between entities. Formulas: can be true or false. |
|
Predicate Formulas |
Atomic Formulas: book(x), eat(x, y) Complex Formulas involve 1) Negation 2) Connectives 3) Quantifiers |
|
Limitations of FOL |
Tense, Aspect, Modals, Quantifiers |
|
lambda-expressions |
Use lambda expressions along with CCGs to construct sentence meanings, e.g. N = λz. chef(z) |
|
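Python's own lambdas can mimic the beta-reduction involved; a hypothetical toy (the predicates chef/cooks and the entities are made up for illustration):

```python
# N: λz. chef(z) — a unary predicate over entities
chef = lambda z: z in {"alice"}

# Transitive verb: λy.λx. cooks(x, y), curried so the object applies first
cooks = lambda y: lambda x: (x, "cooks", y)

# "alice cooks soup": apply to the object, then to the subject
meaning = cooks("soup")("alice")
```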
Quantifier Scope Ambiguity |
"Every chef cooks a meal": For every chef, there is a meal which he cooks or There is some meal which every chef cooks |
|
Thematic Roles |
Verbs describe events or states. Thematic roles refer to the participants of these events: Agent: the person who performs the action. Patient: the entity the action is performed on. Tool (instrument): what was used to perform the action. |
|
Proposition Bank (PropBank) |
PropBank annotates the arguments that verbs take. |
|
Diathesis Alternations |
Active/passive alternation Causative alternation Dative alternation Locative alternation |
|
Discourse |
Discourse is a linguistic unit that consists of multiple sentences.
Speaker attempts to get the listener to construct a similar model of the situation |
|
Entity-based coherence |
How we refer to entities influences how coherent a discourse is. |
|
Centering Theory |
A linguistic theory of entity-based coherence and salience that predicts which entities are salient at any point during a discourse. Also predicts whether a discourse is entity-coherent based on its referring expressions. It is not a computational model. |
|
Coherence relations |
A sentence is coherent with its context if it stands in a meaningful relation to it, e.g. provides an explanation. |
|
Rhetorical Structure Theory (RST) |
Describes coherence relations between utterances. It defines a set of rhetorical relations (Evidence, Elaboration, Attribution, Contrast, List) that hold between a nucleus and a satellite. Some impose constraints. |
|
How to understand discourse |
1) Coreference Resolution: "the cafe" = "Bevande" 2) Identifying discourse relations: "He wanted to buy lunch" is the reason "John went to Bevande" |
|
Discourse Models |
Explicit Representation of the events and entities that a discourse talks about and the relations between them. Captures: Physical Entities, Events, States, Temporal Relations, Rhetorical Relations |
|
Referring expression ("this book") Co-reference |
Refers to some entity which is called the referent
Two referring expressions that refer to the same entity |
|
Indefinite NPs |
Indefinites usually introduce a new discourse entity. They can refer to a specific entity or not. Forms: no determiner; the indefinite determiner ("a"); numerals; indefinite quantifiers ("some"); indefinite "this". |
|
Definite NPs |
Definite NPs refer to an identifiable entity (previously mentioned or not) |
|
Information Status |
Hearer-new vs. hearer-old: whether the entity is assumed (un)known to the listener. Discourse-new vs. discourse-old: whether the speaker introduces a new entity into the discourse. |
|
Coref as binary classifier |
Represent each NP-NP pair (+ context) as a feature vector. Pass through the text and for each NP go through the previous sentences to find the best antecedent NP |
|
Features for Coref Resolution |
Do the two NPs have the same head noun? Do they contain the same modifier? Do the gender and number match? Is one NP a hypernym/synonym of the other? Are both NPs named entities of the same type? |
|
Evaluation of Coref resolution: |
B-Cubed F-Score. For each mention: precision P = |C ∩ T| / |C| and recall R = |C ∩ T| / |T|, where C is the computed cluster and T the true cluster containing the mention. F-measure: 2PR/(P+R). |
|
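A sketch of the per-mention B-cubed computation, assuming both clusterings cover the same mentions:

```python
def b_cubed(system, gold):
    """B-cubed precision/recall/F for coreference.
    system, gold: lists of sets clustering the same mentions."""
    mentions = {m for cluster in gold for m in cluster}

    def cluster_of(m, clustering):
        return next(c for c in clustering if m in c)

    p = r = 0.0
    for m in mentions:
        c, t = cluster_of(m, system), cluster_of(m, gold)
        overlap = len(c & t)
        p += overlap / len(c)   # precision contribution of this mention
        r += overlap / len(t)   # recall contribution of this mention
    n = len(mentions)
    p, r = p / n, r / n
    return p, r, 2 * p * r / (p + r)
```

For example, if the system splits a true cluster {1, 2, 3} into {1, 2} and {3}, precision stays 1.0 but recall drops to 5/9.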
Natural Language Generation |
Automatic production of natural language text usually from underlying semantic representation |
|
NLG Architectures |
1) Text Planning: content determination and discourse planning 2) Sentence Planning: sentence aggregation, lexicalization and referring expression generation 3) Linguistic Realization: syntactic, morphological and orthographic processing |
|
NLG Tasks - Text Planning |
1. Content Determination: What information should be communicated. 2. Discourse Planning: How should the message be structured |
|
NLG Tasks - Sentence Planning |
3. Sentence Aggregation: Which messages should be combined into individual sentences?
4. Lexicalization: In which words should domain concepts be expressed? 5. Referring expression generation: How should entities be referred to? |
|
NLG Tasks - Linguistic Realization |
Linguistic Realization: generate grammatically and orthographically well-formed text. |
|
Summarization |
"The process of distilling the most important information from a text to produce an abridged version for a particular task and user" |
|
Centroid-based content selection |
Which sentences are most central in a document? The central sentences are those most similar to the other sentences in the document. |
|
RST-Based Summarization |
Use a discourse parser to identify rhetorical relations between sentences/clauses in a document. This gives a discourse tree that defines a salience ranking. |
|
Dialogue Manager |
Controls the architecture and structure of the dialogue. 1. Finite-State 2. Frame-Based 3. Information State 4. AI Planning |
|
Grounding |
Dialogue is a collective act performed by speaker and hearer. Common ground: the set of things mutually believed by both speaker and hearer. |
|
NLP Pipeline |
1. Tokenizer/Segmenter: to identify words and sentences. 2. Morphological Analyzer/POS tagger: to identify the part of speech and structure of words. 3. Word Sense Disambiguation: To find the meaning of the words. 4. Syntactic/Semantic Parser: To obtain the structure and meaning of sentences. 5. Coreference resolution/discourse model: to keep track of the various entities and events mentioned. |
|
Core Problems of NLP |
Ambiguity: Natural Language is highly ambiguous Coverage (Zipf's Law): Any NLP system will come across words that did not occur during training. |
|
Statistical Models for NLP |
Probabilistic models (HMMs, MEMMs, CRFs, PCFGs) |
|
Deep Learning |
Neural networks with several hidden layers. Very impressive performance gains in computer vision and speech recognition. Fast computers have finally made them viable. |
|
Challenges in Neural Networks for NLP |
The input is discrete, while neural networks work best with continuous vectors. Addressed with word embeddings and neural language models. |
|
NLG Architecture |
Goal → Text Planner → Sentence Planner → Linguistic Realizer → Surface Text |