Nlp

by JOURNEYMARIZ, Mar. 2024

43 Cards in this Set

Front
Back

	study of language at the level of sounds	Phonetics
	study of the combination of sounds	phonology
	study of the patterns of formation of words by the combination of sounds	morphology
	study of how words combine to form phrases, phrases combine to form clauses and clauses join to make sentences	syntax/syntactic knowledge
	concerns the meaning of the words and sentences	semantic knowledge
	extension of the meanings or semantics	pragmatic knowledge
	concerns connected sentences	Discourse knowledge
	everyday knowledge that all speakers share about the world	world knowledge
	applications of NLP	text analytics, smart assistants, predictive text, machine translation
	converts unstructured text data into meaningful data for analysis	text analytics
	recognize patterns in speech thanks to voice recognition, then infer meaning and provide a useful response	smart assistants
	predict things to say based on what you type, finishing the word or suggesting a relevant one	predictive text
	generally translating phrases from one language to another	machine translation
	used to specify strings we might want to extract from a document	regular expressions
	putting characters in sequence	concatenation
	used, with a leading caret (^), to specify what a single character cannot be	square braces
	Set of operators that allow us to say things like "some number of a's", based on the asterisk (*)	Kleene * (pronounced "cleany star")
	one or more occurrences of the immediately preceding character or regex	Kleene+
	Process of cleaning your corpus is called	Text cleaning
	Diacritics, often loosely called	accents
	regex method used to remove punctuation from corpora	sub() (substitution)
	An international encoding standard for use with different languages and scripts, by which each letter, digit, or symbol is assigned a unique numeric value	Unicode
	Combination of words that are shortened by dropping letters and replacing them with apostrophes	Contractions
	Refers to the process of converting a sequence of text into smaller parts known as tokens	Tokenization
	Breaks up text into smaller chunks or segments with more focused information content	Segmentation
	Builds a vocabulary containing words, but tokens are limited to	Words, punctuation marks, numbers
	Carry sentiment and meaning	Graphemes
	Are parts of words that contain meaning in and of themselves	Morphemes
	Word extraction includes	Single words, pairs, triplets, quadruplets
	Enables your machine to know about "ice cream" as well as the "ice" and "cream" that comprise it	n-grams
	Simplest way to tokenize a sentence is to use white space within a string as the	Delimiter
	Occurrences of tokens in the sentence/ paragraph/corpora	One hot vector
	Word frequency	Frequency vector
	Presence or absence of a particular word in a particular sentence	Binary vector
	When a sequence of tokens is vectorized it loses a lot of meaning inherent in the order of words	N grams
	Common words in any language that occur with the high frequency but carry less substantive information about the meaning of a phrase	Stop words
	Removes suffixes from words in an attempt to combine words with similar meanings together under their common stem	Stemming
	One of the most popular stemming methods, proposed in 1980 by a British computer scientist named Martin F. Porter	Porter stemming
	Multilingual, as it can handle non-English words	Snowball stemmer
	More aggressive and dynamic compared to the other two stemmers	Lancaster stemmer
	Saves its rules externally and uses an iterative algorithm	Lancaster stemmer
	Is way more aggressive than the Porter stemmer and is also referred to as the Porter2 stemmer	Snowball stemmer
	Stemming algorithm that utilizes regular expressions to identify and remove suffixes from words	Regexp stemmer
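
Several of the cards above (Kleene * and Kleene +, sub(), diacritics, whitespace tokenization, n-grams, frequency and binary vectors, stop words, regexp stemming) can be tried out with a short Python sketch that uses only the standard library. This is an illustration under assumptions, not the method of any particular library: the helper names, the sample sentence, the tiny stop word set, and the three-suffix stemming rule are all hypothetical.

```python
import re
import unicodedata

# Kleene * allows zero or more of the preceding character; Kleene + needs at least one.
star_matches_b = re.fullmatch(r"ba*", "b") is not None   # zero a's is allowed
plus_matches_b = re.fullmatch(r"ba+", "b") is not None   # at least one a is required
# A caret right after the opening square brace negates the set: "not a vowel".
non_vowels = re.findall(r"[^aeiou]", "ice")

def strip_accents(text):
    """Drop diacritics by decomposing characters and removing combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def clean(text):
    """Lowercase, strip accents, then use re.sub() to delete punctuation."""
    text = strip_accents(text.lower())
    return re.sub(r"[^\w\s]", "", text)

def tokenize(text):
    """Simplest tokenizer: whitespace is the delimiter."""
    return text.split()

def ngrams(tokens, n):
    """Join each run of n consecutive tokens so word order is partly kept."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical, deliberately tiny stop word set.
STOP_WORDS = {"the", "a", "is", "of"}

def regexp_stem(token):
    """Toy regexp stemmer: remove -ing, -ed, or -s if 3+ characters precede it."""
    return re.sub(r"(?<=\w{3})(ing|ed|s)$", "", token)

sentence = "The café sells ice cream; ice cream melts!"
tokens = tokenize(clean(sentence))
bigrams = ngrams(tokens, 2)

# Frequency vector: how often each vocabulary word occurs in the sentence.
vocabulary = sorted(set(tokens))
frequency_vector = [tokens.count(word) for word in vocabulary]

# Binary vector: presence (1) or absence (0) of each vocabulary word
# in a second, shorter sentence.
second = tokenize(clean("Ice melts"))
binary_vector = [int(word in second) for word in vocabulary]

# Stop word removal followed by stemming.
stems = [regexp_stem(t) for t in tokens if t not in STOP_WORDS]

print(tokens)
print(bigrams)
print(vocabulary, frequency_vector, binary_vector)
print(stems)
```

The bigram list keeps "ice cream" as a single unit alongside the separate "ice" and "cream" tokens, which is the point of the n-grams card. Real stemmers such as Porter, Snowball, and Lancaster apply much larger rule sets than the single pattern used here.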
