67 Cards in this Set

  • Front
  • Back

Key

A set of attributes is a key if it is a minimal set of identifying attributes: removing any one attribute would mean the set no longer uniquely identifies every entity instance.

Super Key

A set of attributes is a superkey for an entity if those attributes, taken together, always uniquely identify every entity instance.

Composite Key

A Primary Key made up of multiple attributes.

Primary Key

The one candidate key chosen to identify records. It must be unique and not null.

Candidate Key

A candidate key can be used to uniquely identify a database record. Candidate keys are not null and unique.
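
The difference between a superkey and a key can be checked mechanically on sample data. The sketch below is illustrative only: the rows and attribute names (`student_id`, `course`, `year`) are made up, and a check on sample rows can only show that a set of attributes fails to identify instances, never prove that it "always" will.

```python
def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """True if attrs is a superkey and removing any one attribute breaks it."""
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != removed])
        for removed in attrs
    )


# Hypothetical sample data, for illustration only.
rows = [
    {"student_id": 1, "course": "DA", "year": 2020},
    {"student_id": 1, "course": "OOP", "year": 2020},
    {"student_id": 2, "course": "DA", "year": 2020},
]

all_three = ["student_id", "course", "year"]
pair = ["student_id", "course"]

superkey_all = is_superkey(rows, all_three)  # identifies every row
key_all = is_key(rows, all_three)            # not minimal: year can be dropped
key_pair = is_key(rows, pair)                # minimal, so it is a key
```

Here the three-attribute set is a superkey but not a key, while the two-attribute set is both.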

Representative Sample

A corpus should contain a similar mix of text to the language variety for which it is being developed. For example, Shakespeare's works on their own are not a good representation of Elizabethan English.

Finiteness

A corpus should be finite. When building a corpus, it is usually decided at the outset how the language is to be sampled and how much data to include.

Monitor Corpora

These capture the growth and change of a language. They remain finite but extend over time.

Name 4 distinctive features of a Machine Readable Corpus compared to books with printed text

• They can be huge in size; up to a billion words
• They can be searched and analysed efficiently
• They can be made available to many users simultaneously, at large distances
• They can easily (and sometimes automatically) be annotated with additional useful information.

What is the advantage of using a standard reference?

Having a standard reference allows competing theories about the language variety to be compared against each other on the same sample data.

Name 4 notable English Language Corpora

• Oxford English Corpus (OEC)
• Corpus of Contemporary American English (COCA)
• British National Corpus (BNC)
• Brown Corpus

Balancing

Balancing ensures that the linguistic content of a corpus represents the full variety of the language sources for which the corpus is intended to provide a reference. For example, a balanced text corpus includes material from sources such as books, newspapers, magazines and letters.

Sampling

Sampling ensures that the material is representative of each type of source. For example, sampling newspaper text involves selecting texts randomly from different newspapers, issues and sections.

List some of the "dimensions" of the source material that balancing would affect.

• Language Type: Edited Text, Spontaneous, Scripted
• Genre
• Domain (What is the text about?)
• Medium

Tokenization

Tokenization divides textual data into tokens such as words, numbers and punctuation marks.
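
As a rough illustration of the idea, a regular expression can split text into word, number and punctuation tokens. This is a minimal sketch, not a production tokenizer: the pattern and the name `tokenize` are assumptions made for this example, and real corpora need more careful handling of hyphens, apostrophes, abbreviations and so on.

```python
import re

# Numbers (optionally with a decimal part), then words, then any single
# character that is neither a word character nor whitespace (punctuation).
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")


def tokenize(text):
    """Return the list of tokens found in text, in order."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("The corpus has 1.5 million words, roughly.")
print(tokens)
```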

Sentence Boundary Detection

Identify the start and end of individual sentences.
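
A naive version of this step can be sketched by splitting after sentence-final punctuation that is followed by whitespace and a capital letter. The function name `split_sentences` and the rule itself are assumptions for illustration; the rule breaks on abbreviations such as "Dr. Smith", which is part of why sentence boundary detection needs more than a simple pattern.

```python
import re


def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("A corpus should be finite. Is it balanced? Yes!")
print(sentences)
```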

Why Annotate A Corpus?

It adds information to the corpus that is not explicit in the data itself. This is often specific to a particular application, and a single corpus may be annotated in multiple ways.

Annotation Scheme

An annotation scheme is the basis for annotation. It is made up of a tag set and annotation guidelines.

Annotation Guidelines

Annotation guidelines tell annotators (domain experts) how a tag set should be applied. They ensure consistency across annotators.

Tag Set

An inventory of labels for markup.

Relationship

A relationship is an association between entities.

Relationship Instance

Each individual occurrence of the relationship is a relationship instance.

Relationship Set

The collection of all instances of a relationship.

What is this in an ER diagram?

Attribute

What sort of key is used to identify Course?

Composite Key

What is Atomicity?

All or Nothing: a transaction either runs to completion, or fails and leaves the database unchanged.

This may involve a rollback mechanism to undo a partially complete transaction.
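
The rollback behaviour can be seen in a small sketch using Python's built-in `sqlite3` module. The `account` table, its data and the `transfer` function are hypothetical and exist only for this example. The transfer credits one account and then fails a `CHECK` constraint when debiting the other, so the whole transaction is undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return False if the transaction failed."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 500)  # alice cannot afford this
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

Although the credit to `bob` succeeded before the debit failed, both balances end up unchanged.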

What is Consistency?

Applying a transaction in a valid state of the database will always give a valid result state.

What is Isolation?

Concurrent transactions have the same effect as sequential ones: the outcome is as if they were done in order.

(NOTE: Transactions may, in fact, run at the same time, but should never see each other's intermediate state.)

What are the ACID properties and why are they needed?

The ACID properties (Atomicity, Consistency, Isolation, Durability) are a key benchmark for assessing database systems.

What is the symbol for total participation in an ER diagram?

Double lines

Which way should an arrow face in a one to many relationship in an ER diagram?

The arrow should point from the many to the one. It should go from the many to the relationship block.

What is a weak entity?

A weak entity is an entity whose own attributes may not be enough to identify it uniquely; it also needs its identifying relationship and identifying owner.

In a tree of an XPath data model, what do the positions of the nodes relative to each other show?

They show the order in which the nodes appear in the XML document. Those appearing first in the document appear leftmost in the diagram.

In a DTD, what does the order of the declaration lines mean?

Nothing! As a DTD is declarative, the lines can appear in any order.

What would the ELEMENT line for this node look like? *Insert publisher node from paper*

<!ELEMENT publisher (name,imprint+)>




The publisher must have exactly one name; the plus indicates that there must be at least one imprint and may be more than one.

In a DTD document, when would you use the tag #PCDATA?

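#PCDATA (parsed character data) is used in an element declaration to say that the element contains text, i.e. it corresponds to a text node.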

What would the attribute line(s) look like for this node?

<!ATTLIST book code CDATA #REQUIRED>


<!ATTLIST book type (hardback|paperback) "paperback">

When is CDATA used in a DTD document?

CDATA is used for an open attribute: its value can be any character string, with no restriction on the allowed values.

When is #REQUIRED used in a DTD?

#REQUIRED is used when an attribute must always be given a value on the element; it cannot be omitted.

When should not null be used in a SQL schema?

Not null should be used on a non-primary-key attribute that cannot be left empty, or on a foreign key whose referenced record must exist for the relationship to exist.

Do primary keys require the not null tag in a SQL schema?

No. Primary keys are by nature required to identify a record in a database, so they are not null by default.

How do you create a composite key in a SQL schema?

primary key (field1, field2,...)

How do you create a foreign key in a SQL schema?

foreign key (x) references table(y)

How is text represented in a SQL schema?

VARCHAR(x) where x is the number of characters
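
The schema cards above can be tried out together with Python's built-in `sqlite3` module. This is a sketch under assumptions: the tables `student` and `takes` and their columns are invented for illustration, and SQLite is used only because it needs no setup. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and treats `VARCHAR(x)` lengths as advisory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks

conn.executescript("""
CREATE TABLE student (
    matric VARCHAR(8),
    name   VARCHAR(40) NOT NULL,
    PRIMARY KEY (matric)
);
CREATE TABLE takes (
    matric VARCHAR(8),
    code   VARCHAR(10),
    mark   INTEGER,
    PRIMARY KEY (matric, code),                      -- composite key
    FOREIGN KEY (matric) REFERENCES student(matric)  -- foreign key
);
""")

conn.execute("INSERT INTO student VALUES ('s001', 'Ada')")
conn.execute("INSERT INTO takes VALUES ('s001', 'DA', 70)")


def try_insert(sql):
    """Run an INSERT; return False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Same (matric, code) pair again: violates the composite primary key.
duplicate_ok = try_insert("INSERT INTO takes VALUES ('s001', 'DA', 80)")
# No student 's999': violates the foreign key.
orphan_ok = try_insert("INSERT INTO takes VALUES ('s999', 'DA', 50)")
# Same student, different course: allowed.
other_course_ok = try_insert("INSERT INTO takes VALUES ('s001', 'OOP', 65)")
# Missing name: violates NOT NULL.
nameless_ok = try_insert("INSERT INTO student VALUES ('s002', NULL)")
```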

In an SQL query, what does distinct mean?

distinct is used after select to remove duplicate entries in the result.

How do you create a SQL query that involves more than one table?

select table1.field
from table1, table2
where table1.field = table2.field and ...

Note: the field may be called different things in different tables.
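
A concrete version of this pattern, combined with `distinct`, can be run with Python's built-in `sqlite3` module. The tables, columns and rows below are hypothetical; the join field is deliberately named `matric` in one table and `student_id` in the other to match the note above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (matric VARCHAR(8) PRIMARY KEY, name VARCHAR(40));
CREATE TABLE takes (student_id VARCHAR(8), code VARCHAR(10));
INSERT INTO student VALUES ('s001', 'Ada'), ('s002', 'Alan');
INSERT INTO takes VALUES ('s001', 'DA'), ('s001', 'OOP'), ('s002', 'DA');
""")

# Join the two tables on the shared student identifier.
join_query = """
SELECT student.name
FROM student, takes
WHERE student.matric = takes.student_id
ORDER BY student.name
"""
# The same query with DISTINCT, which removes duplicate result rows.
distinct_query = join_query.replace("SELECT", "SELECT DISTINCT", 1)

all_names = [row[0] for row in conn.execute(join_query)]
unique_names = [row[0] for row in conn.execute(distinct_query)]
print(all_names, unique_names)
```

Without `distinct`, a student appears once per course taken; with it, each name appears once.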

What is categorical data?

Categorical data (a categorical scale) is data that has no numerical value or natural order.

What is ordinal data?

Ordinal data (an ordinal scale) gives a recognised order between data items, but there is no arithmetic content. Numbers may still be used as labels, but there is no meaningful way to apply arithmetic to them.

What is an interval scale?

An interval scale assigns a numeric value to data, but these values are only meaningful relative to each other. Values can be compared, averaged and subtracted, but not added together or multiplied.

What is a ratio scale?

A ratio scale uses numeric values which have an absolute notion of zero. This means they can sensibly be added, and multiplied by real numbers.

Give 3 examples of categorical data

Classifying words (nouns, verbs, adjective)


Eye Color


Places in a town

Give 3 examples of ordinal data

T-Shirt Size (XS,S,M,L)


1st, 2nd, 3rd


Shoe size

Give 2 examples of interval data

Times Of Day


Temperature (Celsius)



Give 3 examples of ratio data

Temperature (Kelvin)
Wind Speed


Height

What is the formula for the mean?

For values $x_1, \dots, x_n$: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$

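What is the formula for the standard deviation with large data sets?

$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$, where $\mu$ is the mean of the values $x_1, \dots, x_n$.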

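What is the formula for estimating the standard deviation of a population from a sample?

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^2}$, where $\mu$ is the mean of the sample $x_1, \dots, x_n$. Dividing by $n - 1$ rather than $n$ is known as Bessel's correction.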

How do you calculate the median?

Order the data (if possible) from smallest to largest, then find the value that lies in the (n+1)/2 position. If (n+1)/2 is not an integer, take the value midway between the values in the n/2 and (n/2)+1 positions.

How do you calculate the mode?

Count how often each value occurs; the mode is the most frequent value.
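
The summary statistics on the cards above can be checked with Python's standard `statistics` module. The data set is made up for illustration. `pstdev` divides by n (the large data set form) and `stdev` divides by n - 1 (Bessel's correction).

```python
import statistics

# Hypothetical data set: n = 8, sum = 40.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # sum of values divided by n
median = statistics.median(data)    # n is even, so midway between 4th and 5th
mode = statistics.mode(data)        # most frequent value
pop_sd = statistics.pstdev(data)    # divides the squared deviations by n
sample_sd = statistics.stdev(data)  # divides by n - 1 (Bessel's correction)

print(mean, median, mode, pop_sd, sample_sd)
```

The squared deviations from the mean sum to 32, so the two standard deviations are the square roots of 32/8 and 32/7 respectively.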

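What is the cosine formula?

For vectors $x$ and $y$: $\cos(x, y) = \frac{x \cdot y}{|x|\,|y|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$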
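
As a sketch, the cosine of the angle between two vectors is their dot product divided by the product of their lengths. The function name `cosine` and the example vectors are assumptions for illustration; the function assumes both vectors have the same length and neither is all zeros.

```python
import math


def cosine(x, y):
    """Cosine of the angle between equal-length, non-zero vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


same_direction = cosine([1, 2, 3], [2, 4, 6])  # parallel vectors
orthogonal = cosine([1, 0], [0, 1])            # vectors at right angles
```

Parallel vectors give a value of (approximately) 1 and orthogonal vectors give 0.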

What is Precision?

Precision is the proportion of the documents returned by the system that are relevant.

What is Recall?

Recall is the proportion of all the relevant documents that are returned by the system.

Name each section of the contingency table:


*picture*

a = true positives


b = false negatives


c = false positives


d = true negatives

How do you calculate precision?

TP/ (TP+FP)

How do you calculate recall?

TP/(TP+FN)

What is the F Score of a system?

The F score is the harmonic mean of precision and recall, combining the two into a single measure. It ranges from 0 to 1: it is close to one only when both precision and recall are high, and it is pulled towards the lower of the two.

How do you calculate F Score?

(2PR)/(P+R), where P is precision and R is recall.
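
The three formulas above can be put together in a few lines. The counts used here (30 true positives, 10 false positives, 20 false negatives) are made up for illustration, and the functions assume the denominators are non-zero.

```python
def precision(tp, fp):
    """TP / (TP + FP): share of returned documents that are relevant."""
    return tp / (tp + fp)


def recall(tp, fn):
    """TP / (TP + FN): share of relevant documents that are returned."""
    return tp / (tp + fn)


def f_score(p, r):
    """(2PR) / (P + R): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts for one query.
tp, fp, fn = 30, 10, 20

p = precision(tp, fp)  # 30 / 40
r = recall(tp, fn)     # 30 / 50
f = f_score(p, r)

print(p, r, f)
```

The F score here lies between the precision and the recall, closer to the lower of the two.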