65 Cards in this Set

  • Front
  • Back

a. Determinant

The attribute on the left-hand side of the arrow in a functional dependency.

b. Functional dependency
A constraint between two attributes or two sets of attributes.
c. Transitive dependency
A functional dependency between two (or more) nonkey attributes.

d. Recursive foreign key

A foreign key in a relation that references the primary key values of that same relation.

e. Normalization

The process of decomposing relations with anomalies to produce smaller, well-structured relations.

f. Composite key

A primary key that consists of more than one attribute.

g. Relation
A named, two-dimensional table of data.
h. Normal form

A state of a relation that results from applying simple rules regarding functional dependencies (or relationships between attributes) to that relation.

i. Partial functional dependency

A functional dependency in which one or more nonkey attributes (such as Name) are functionally dependent on part (but not all) of the primary key.

j. Enterprise key

A primary key whose value is unique across all relations.

k. Surrogate primary key

A serial number or other system-assigned primary key for a relation.

Well-structured relation

A relation that contains minimal redundancy and allows users to insert, modify, and delete rows without errors or inconsistencies.

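
The functional-dependency definitions above can be checked mechanically. Below is a small sketch in Python; the helper name and sample data are invented for illustration. A dependency determinant -> dependent holds when every determinant value maps to exactly one dependent value.

```python
# Hypothetical helper, not from the text: test whether the functional
# dependency  determinant -> dependent  holds in a set of rows.
def fd_holds(rows, determinant, dependent):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        val = tuple(row[a] for a in dependent)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, two dependent values
    return True

# Invented sample data: EmpID determines DeptName, but DeptID does not
# determine EmpID (two employees share department 10).
emps = [
    {"EmpID": 1, "DeptID": 10, "DeptName": "Sales"},
    {"EmpID": 2, "DeptID": 10, "DeptName": "Sales"},
    {"EmpID": 3, "DeptID": 20, "DeptName": "Parts"},
]
```

Here EmpID is a determinant (and a candidate key), while DeptID is a determinant that is not a candidate key.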
Six important properties of relations are:

a. Each relation in a database has a unique name.
b. An entry at the intersection of each row and column is atomic (or single valued).
c. Each row is unique.
d. Each attribute within a table has a unique name.
e. The sequence of columns is insignificant.
f. The sequence of rows is insignificant.

5. Describe two properties that must be satisfied by candidate keys:

a. Unique identification: For every row, the value of the key must uniquely identify that row.
b. Nonredundancy: No attribute in the key can be deleted without destroying the property of unique identification.

6. Three types of anomalies in tables:

a. Insertion anomaly: A new row cannot be inserted unless all primary key values are supplied.
b. Deletion anomaly: Deleting a row results in the loss of important information not stored elsewhere.
c. Modification anomaly: A simple update must be applied to multiple rows.
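
The modification anomaly in (c) is easy to demonstrate with a toy table. This sketch uses Python's built-in sqlite3 module as a stand-in DBMS; the table and column names are invented. Because the supplier's city is repeated on every order row, a single logical change touches multiple rows.

```python
import sqlite3

# Invented denormalized table: supplier city repeated on every order row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SupplierOrder (OrderID INTEGER PRIMARY KEY,"
            " SupplierID INTEGER, SupplierCity TEXT)")
con.executemany("INSERT INTO SupplierOrder VALUES (?, ?, ?)",
                [(1, 100, "Dayton"), (2, 100, "Dayton"), (3, 200, "Tulsa")])

# A "simple" change (supplier 100 moves to Akron) must touch several rows,
# which is the modification anomaly.
cur = con.execute(
    "UPDATE SupplierOrder SET SupplierCity = 'Akron' WHERE SupplierID = 100")
rows_touched = cur.rowcount
```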

9. Describe how the following components of an E-R diagram are transformed into relations:

a. Regular entity type: Each entity type is transformed into a simple relation. Each simple attribute of the entity type becomes an attribute of the relation.
b. Relationship (1:M): A relation is created for each of the two entity types participating in the relationship. The primary key attribute of the entity on the one-side of the relationship becomes a foreign key in the relation on the many-side of the relationship.
c. Relationship (M:N): A new relation is created to represent this relationship. The primary key for each of the participating entity types is included in this new relation.
d. Relationship (supertype/subtype): A separate relation is created for the supertype and each of its subtypes. The primary key of the supertype is assigned to each subtype, as well as attributes that are unique to the subtype.
e. Multivalued attribute: A new relation is created to replace the multivalued attribute. The primary key of this new relation consists of two attributes: the primary key of the original relation, plus the multivalued attribute itself.
f. Weak entity: A new relation is created corresponding to the weak entity. The primary key of this relation consists of the primary key of the owner relation, plus the partial identifier of the weak entity type.
g. Composite attribute: The simple component attributes of the composite attribute are included in the new relation.
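
Transformations (b) and (c) can be sketched in SQL. The example below uses Python's sqlite3 module with invented table names: for the 1:M relationship, Customer's primary key becomes a foreign key in Order_T; for the M:N relationship, a new relation OrderLine is created whose composite primary key combines both participants' keys.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- 1:M: Customer's primary key becomes a foreign key in Order_T.
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Order_T  (OrderID    INTEGER PRIMARY KEY,
                       CustomerID INTEGER REFERENCES Customer(CustomerID));

-- M:N: a new relation whose primary key combines both participants' keys.
CREATE TABLE Product   (ProductID INTEGER PRIMARY KEY, Descr TEXT);
CREATE TABLE OrderLine (OrderID   INTEGER REFERENCES Order_T(OrderID),
                        ProductID INTEGER REFERENCES Product(ProductID),
                        Qty       INTEGER,
                        PRIMARY KEY (OrderID, ProductID));
""")
tables = {r[0] for r in
          con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
```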

10. Four typical problems in merging relations:

a. Synonyms: Two (or more) attributes have different names but the same meaning. Solution: Convince users to standardize on a single name.
b. Homonyms: A single attribute has more than one meaning.
Solution: Create new attribute names that capture the separate meanings.
c. Transitive dependency: Merging relations produces transitive dependencies.
Solution: Create 3NF relations by removing the transitive dependency.
d. Supertype/subtype: May be implied by content of existing relations.
Solution: Create new relations that explicitly recognize this relationship.

1NF

Multivalued attributes removed

2NF

contains no partial functional dependencies

3NF

transitive dependencies eliminated
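
The 3NF step can be illustrated with a small decomposition. In this hypothetical example, EmpID -> DeptID -> DeptName is a transitive dependency, so DeptID and DeptName are moved to their own relation; rejoining the two relations reproduces the original rows (a lossless decomposition).

```python
# Invented rows with the transitive dependency EmpID -> DeptID -> DeptName.
raw = [
    (1, "Adams", 10, "Sales"),
    (2, "Baker", 10, "Sales"),
    (3, "Chen",  20, "Parts"),
]

# 3NF decomposition: the nonkey determinant (DeptID) gets its own relation.
employee   = [(e, n, d) for (e, n, d, _) in raw]
department = sorted({(d, dn) for (_, _, d, dn) in raw})

# Rejoining reproduces the original rows, so the decomposition is lossless.
dept_name = dict(department)
rejoined  = [(e, n, d, dept_name[d]) for (e, n, d) in employee]
```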

a. Contrast: Normal form; normalization

Normal form is a state of a particular relation regarding functional dependencies, while normalization is the process of decomposing relations with anomalies to produce smaller, well-structured relations.
b. Contrast: Candidate key; primary key

A primary key is an attribute (or combination of attributes) that uniquely identifies a row in a relation. When a relation has more than one such attribute (or combination of attributes), each is called a candidate key. The primary key is then the one chosen by users to uniquely identify the rows in the relation.
c. Contrast: Partial dependency; transitive dependency

A partial functional dependency exists when a nonkey attribute is functionally dependent on part (but not all) of a composite primary key; a transitive dependency is a functional dependency between two or more nonkey attributes.
d. Contrast: Composite key; recursive foreign key

A composite key is a primary key that consists of more than one attribute, while a recursive foreign key is a foreign key in a relation that references the primary key values of that same relation.
e. Contrast: Determinant; candidate key

A determinant is the attribute on the left-hand side of the arrow in a functional dependency, while a candidate key is an attribute (or combination of attributes) that uniquely identifies a row in a relation. Every candidate key is a determinant, but not every determinant is a candidate key.
f. Contrast: Foreign key; primary key

A primary key is an attribute (or combination of attributes) that uniquely identifies each row in a relation, while a foreign key is an attribute in one relation that serves as the primary key of another relation in the same database, thereby linking the two relations.
g. Contrast: Enterprise key; surrogate primary key



An enterprise key is a primary key whose value is unique across all relations in the whole database and is likely to hold no business meaning. A surrogate primary key is a primary key whose value is a serial number or other system-assigned value and is unique to the relation.


Three conditions, any one of which ensures that a relation is in second normal form:


a. The primary key consists of a simple attribute
b. No nonkey attributes exist in the relation
c. Every nonkey attribute is functionally dependent on the full set of primary key attributes

12. Integrity constraints enforced in SQL CREATE TABLE commands:


a. Entity integrity: enforced by the NOT NULL clause
b. Referential integrity: enforced by the FOREIGN KEY REFERENCES clause
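
A minimal sketch of both constraints, using Python's sqlite3 module as a stand-in DBMS; the table names are invented, and note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
con.executescript("""
CREATE TABLE Dept (DeptID INTEGER PRIMARY KEY,
                   DeptName TEXT NOT NULL);          -- entity integrity
CREATE TABLE Emp  (EmpID INTEGER PRIMARY KEY,
                   DeptID INTEGER NOT NULL
                     REFERENCES Dept(DeptID));       -- referential integrity
""")
con.execute("INSERT INTO Dept VALUES (10, 'Sales')")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```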

a. File organization

A technique for physically arranging the records of a file on secondary storage devices.
b. Sequential file organization

Records in the file are stored in sequence according to a primary key value.
c. Indexed file organization

Records are either stored sequentially or non-sequentially, and an index is created that allows software to locate individual records.
d. Hashing file organization

The address for each record is determined using a hashing algorithm.
e. Denormalization

The process of transforming normalized relations into unnormalized physical record specifications.
f. Composite key

A key made up of more than one column.
g. Secondary key

One or a combination of fields for which more than one record may have the same combination of values.
h. Data type

Each unit of a detailed coding scheme recognized by system software, such as a DBMS, for representing organizational data.
i. Bitmap index

A table of bits in which each row represents the distinct values of a key and each column is a bit, which when on indicates that the record for that bit column position has the associated field value.
j. Redundant Arrays of Inexpensive Disks (RAID)

A set, or array, of physical disk drives that appear to the database user (and programs) as if they form one large, logical storage unit.
k. Join index

An index on columns from two or more tables that come from the same domain of values.
l. Stripe

The set of pages on all disks in a RAID that are the same relative distance from the beginning of the disk drive.
m. Explain plan


A command available in most database management systems that displays how the query optimizer will access indexes, use parallel servers, and join tables to prepare query results.

a. Horizontal partitioning; vertical partitioning

Horizontal partitioning distributes the rows of a relation into several separate files. It is similar to creating a supertype/subtype relationship, because different types of the entity (where the subtype discriminator is the field used to segregate rows) are involved in different relationships and hence different processing. Vertical partitioning distributes the columns of a relation into several separate files, repeating the primary key in each so the rows can be reassembled. Neither horizontal nor vertical partitioning prohibits the ability to treat the original relation as a whole.
b. Physical file; tablespace

A physical file is a named portion of secondary memory (magnetic tape, hard disk) allocated for the purpose of storing records; a tablespace is a named set of disk storage elements in which physical files for database tables may be stored.
c. Physical record; physical file

A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit; a physical file is a named portion of secondary memory (a magnetic tape or hard disk) allocated for the purpose of storing physical records. Physical records of several types can be clustered together into one physical file in order to place records frequently used together close to one another in secondary memory.
d. Page; physical record

Often a page will store multiple physical records. Depending on the computer system, a lengthy physical record may or may not be allowed to span two pages.

e. Secondary key; primary key


A secondary key is one or a combination of fields for which more than one record may have the same combination of values; the primary key is one or a combination of fields for which every record has a unique value. Hence, the primary key is a unique identifier for a row.

Three major inputs to physical design:



a. Logical database structures developed during logical design
b. User processing requirements identified during requirements definition
c. Characteristics for the DBMS and other components of the computer operating environment

5. Key decisions in physical database design:


a. Choosing the storage format (called data type) for each attribute from the logical data model: the format is chosen to minimize storage space and to maximize data integrity.
b. Grouping attributes from the logical data model into physical records: you will discover that although the columns of a relational table are a natural definition for the contents of a physical record, this is not always the most desirable grouping of attributes.
c. Arranging similarly structured records in secondary memory (primarily hard disks) so that individual and groups of records can be stored, retrieved, and updated rapidly (called file organizations): consideration must also be given to protecting data and recovering data after errors are found.
d. Selecting structures for storing and connecting files to make retrieving related data more efficient (called indexes and database architectures).
e. Preparing strategies for handling queries against the database that will optimize performance and take advantage of the file organizations and indexes that you have specified: efficient database structures will be of benefit only if queries and the database management systems that handle those queries are tuned to intelligently use those structures.

7. Developing field specifications:


a. Define the data type used to represent values of the field.
b. Establish data integrity controls for the field, including default values, ranges, null value controls, and referential integrity.
c. Determine how missing values for the field will be handled.
d. Other field specifications, such as display format, must be made as part of the total specification of the information system; those specifications are typically handled by programs rather than by the DBMS.

8. Selecting a field data type:
These four objectives will have varying relative importance for different applications:


a. Minimize storage space
b. Represent all possible values
c. Improve data integrity
d. Support all data manipulations

9. Coding or compressing field values:


Where attributes have a sparse set of values or a volume so large that considerable storage space will be consumed, possibilities for coding or compressing field values should be considered. Large data fields mean that data are further apart, which yields slower data processing. Where the set of valid values is small, translation into a code that requires less space is a possibility. Data compression techniques also use coding to reduce the storage space required for commonly recurring patterns of data.

10. Controlling field integrity:

a. Specify default values.
b. Specify a range or list of permissible values.
c. Set null value permissions.
d. Establish referential integrity.

11. Three ways to handle missing field values:


a. Substitute an estimate of the missing value: for example, for a missing sales value when computing monthly product sales, use a formula involving the mean of the existing monthly sales values for that product indexed by total sales for that month across all products. Such estimates must be marked so users know that these are not actual values.
b. Track missing data so that special reports and other system elements cause people to quickly resolve unknown values. Setting up a trigger in the database can accomplish this. A trigger is a routine that will automatically execute when some event occurs or time period passes. One trigger could log the missing entry to a file when a null or other missing value is stored, and another trigger could run periodically to create a report of the contents of this log file.
c. Perform sensitivity testing so that missing data are ignored unless knowing a value might significantly change results. For example, if total monthly sales for a particular salesperson were almost over a threshold that would make a difference in that person’s compensation, then attention would be drawn to the missing value. Otherwise, it would be ignored. This is the most complex of the methods mentioned; it requires the most sophisticated programming, which must be written in application programs since DBMSs do not have the sophistication to handle this method.
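
Approach (b) above can be sketched with a database trigger. The example below uses SQLite trigger syntax via Python's sqlite3 module; the trigger, table, and column names are invented. The trigger logs any row stored with a missing Amount so a periodic report can be run against the log.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Sales   (SalesID INTEGER PRIMARY KEY, Amount REAL);
CREATE TABLE NullLog (TableName TEXT, RowID INTEGER);

-- Trigger fires automatically when a row with a missing Amount is stored:
CREATE TRIGGER log_missing AFTER INSERT ON Sales
WHEN NEW.Amount IS NULL
BEGIN
    INSERT INTO NullLog VALUES ('Sales', NEW.SalesID);
END;
""")
con.execute("INSERT INTO Sales VALUES (1, 250.0)")
con.execute("INSERT INTO Sales VALUES (2, NULL)")   # logged by the trigger
logged = con.execute("SELECT TableName, RowID FROM NullLog").fetchall()
```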

12. Effect of normalizing relations on physical record storage:


One goal of physical record design is efficient use of storage space. In most cases, the second goal of physical record design—efficient data processing—dominates the design process. Efficient processing of data, just like efficient accessing of books in a library, depends on how close together related data (or books) are. Often, all the attributes that appear within a relation are not used together, and data from different relations are needed together to answer a query or produce a report. Thus, although normalized relations solve data maintenance anomalies, normalized relations, if implemented one for one as physical records, may not yield efficient data processing.

13. Situations that suggest the possibility of denormalization:


a. Two entities with a one-to-one relationship: Even if one of the entities is an optional participant, if the matching entity exists most of the time, then it may be wise to combine these two relations into one record definition (especially if the access frequency between these two entity types is high). Figure 6-3 shows student data with optional data from a standard scholarship application a student may complete. In this case, one record could be formed with four fields from the STUDENT and SCHOLARSHIP APPLICATION normalized relations. (Note: In this case, fields from the optional entity must have null values allowed.)
b. A many-to-many relationship (associative entity) with nonkey attributes: Rather than joining three files to extract data from the two basic entities in the relationship, it may be advisable to combine attributes from one of the entities into the record representing the many-to-many relationship, thus avoiding one join in many data access modules. Again, this would be most advantageous if this joining occurs frequently. Figure 6-4 shows price quotes for different items from different vendors. In this case, fields from ITEM and PRICE QUOTE relations might be combined into one record to avoid having to join all three files together. (Note: This may create considerable duplication of data—in the example, the ITEM fields, such as Description, would repeat for each price quote—and excessive updating if duplicated data changes.)
c. Reference data: Reference data exists in an entity on the one-side of a one-to-many relationship, and this entity participates in no other database relationships. You should seriously consider merging the two entities in this situation into one record definition when there are few instances of the entity on the many-side for each entity instance on the one-side. See Figure 6-5 in which several ITEMs have the same STORAGE INSTRUCTIONs, and STORAGE INSTRUCTIONs only relate to ITEMs. In this case, the storage instruction data could be stored in the ITEM record, creating, of course, redundancy and potential for extra data maintenance.

14. Advantages and disadvantages of horizontal and vertical partitioning:
Advantages of partitioning:


a. Efficiency: Data used together are stored close to one another and separate from data not used together.
b. Local Optimization: Each partition of data can be stored to optimize performance for its own use.
c. Security: Data not relevant to one group of users can be segregated from data they are allowed to use.
d. Recovery and uptime: Smaller files will take less time to recover, and other files are still accessible if one file is damaged, so the effects of damage are isolated.
e. Load balancing: Files can be allocated to different storage areas (disks or other media), which minimizes contention for access to the same storage area or even allows for parallel access to the different areas.



Disadvantages of partitioning:
a. Inconsistent access speed: Different partitions may yield different access speeds, thus confusing users. Also, when data must be combined across partitions, users may have to deal with significantly slower response times.
b. Complexity: Partitioning is usually not transparent to programmers, who will have to write more complex programs due to violations of third normal form.
c. Anomalies: Insertion, deletion, and update anomalies are possible, and special programming is required to avoid these problems.
d. Extra space and update time: Data may be duplicated across the partitions, taking extra storage space, compared to storing all the data in normalized files. Updates, which affect data in multiple partitions, can take more time than if one file were used.
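
One common way to keep a horizontally partitioned relation usable as a whole is a UNION ALL view over the partitions. A sketch with Python's sqlite3 module; the partition and column names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Rows split by region into two physical tables:
CREATE TABLE Cust_East (CustID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Cust_West (CustID INTEGER PRIMARY KEY, Name TEXT);

-- A view lets the original relation still be treated as a whole:
CREATE VIEW Customer AS
    SELECT * FROM Cust_East UNION ALL SELECT * FROM Cust_West;
""")
con.execute("INSERT INTO Cust_East VALUES (1, 'Ames')")
con.execute("INSERT INTO Cust_West VALUES (2, 'Burns')")
all_rows = con.execute(
    "SELECT CustID, Name FROM Customer ORDER BY CustID").fetchall()
```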

15. Seven criteria for selecting a file organization:

a. Data retrieval speed
b. Data input and maintenance transaction processing throughput rate
c. Storage efficiency
d. Failure or data loss protection level
e. Frequency of data reorganization required
f. Ability to accommodate growth
g. Security protection provided

16. The desirability of a bitmap index:


A bitmap index is ideal for attributes that have only a few possible values, which is exactly where a conventional tree index performs poorly. A bitmap index also often requires less storage space (possibly as little as 25 percent of a conventional tree index) (Schumacher, 1997), but for an attribute with many distinct values a bitmap index can exceed the storage space of a conventional tree index. Bitmaps for several values or keys can be combined with bitwise operations to perform searches on elements that satisfy more than one condition at a time.
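
A bitmap index can be sketched with plain integers used as bit vectors, one per distinct attribute value; combining bitmaps with bitwise OR (or AND) answers multi-condition queries without scanning the rows. The data below is invented.

```python
# Invented data: one bit vector (a Python int) per distinct attribute value.
rows = ["Oak", "Birch", "Oak", "Walnut", "Oak", "Birch"]

bitmaps = {}
for i, value in enumerate(rows):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << i)  # set bit for row i

def matching_rows(bitmap):
    return [i for i in range(len(rows)) if bitmap >> i & 1]

# Bitwise OR answers "Finish = 'Oak' OR Finish = 'Birch'" without a scan.
oak_or_birch = bitmaps["Oak"] | bitmaps["Birch"]
```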

17. The benefits of a hash index table:


Because the hashing algorithm is applied to an index rather than to the data file itself, rows can be stored independently of their hash addresses, so whatever file organization makes sense can be used for the data. Also, because an index table is much smaller than a data table, the index can be more easily designed to reduce the likelihood of key collisions or overflows.
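
A toy hash index, with an intentionally crude hashing algorithm: the index maps each key to a bucket holding (key, record address) pairs, while the data file itself can use any organization. All names and data are invented.

```python
N_BUCKETS = 4

def bucket_for(key):
    # Deliberately crude hashing algorithm; real DBMSs spread keys better.
    return sum(key.encode()) % N_BUCKETS

# The index stores (key, record address) pairs per bucket; the data file
# itself can use whatever file organization makes sense.
data_file = ["rec-C101", "rec-C205", "rec-C333"]  # addresses 0..2
index = {b: [] for b in range(N_BUCKETS)}
for addr, rec in enumerate(data_file):
    key = rec.split("-")[1]
    index[bucket_for(key)].append((key, addr))

def lookup(key):
    # Only one small bucket is searched, not the whole file.
    for k, addr in index[bucket_for(key)]:
        if k == key:
            return data_file[addr]
    return None
```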

18. The purpose of clustering data in a file:


Some database systems allow physical files to contain records with different structures, e.g., rows from different tables may be stored in the same disk area. This clustering reduces the time to access related records compared to the normal allocation of different files to different areas of a disk. Time is reduced since related records will be closer to each other than if the records are stored in separate files in separate areas of the disk.

19. Seven rules of thumb for choosing indexes for relational databases.


a. Indexes are more useful on larger tables.
b. Specify a unique index for the primary key of each table.
c. Indexes are more useful for columns that frequently appear in WHERE clauses of SQL commands, either to qualify the rows to select (e.g., WHERE FINISH = ‘Oak’, for which an index on Finish would speed retrieval) or for linking (joining) tables (e.g., WHERE PRODUCT.PRODUCT_ID = ORDER_LINE.PRODUCT_ID, for which a secondary key index on Product_ID in the Order_Line table and a primary key index on Product_ID in the Product table would improve retrieval performance). In this second case, the index is on a foreign key in the Order_Line table that is used in joining tables.
d. Use an index for attributes referenced in ORDER BY (sorting) and GROUP BY (categorizing) clauses. You do have to be careful, though, about these clauses. Be sure that the DBMS will, in fact, use indexes on attributes listed in these clauses (e.g., Oracle uses indexes on attributes in ORDER BY clauses but not GROUP BY clauses).
e. Use an index when there is significant variety in the values of an attribute. Oracle suggests that an index is not useful when there are fewer than 30 different values for an attribute, and an index is clearly useful when there are 100 or more different values for an attribute. Similarly, an index will be helpful only if the results of a query that uses that index do not exceed roughly 20 percent of the total number of records in the file (Schumacher, 1997).
f. Check your DBMS for the limit, if any, on the number of indexes allowable per table. Many systems permit no more than 16 indexes and may limit the size of an index key value (e.g., no more than 2000 bytes for each composite value). So, you may have to choose those secondary keys that will most likely lead to improved performance.
g. Be careful about indexing attributes that have null values. For many DBMSs, rows with a null value will not be referenced in the index (so they cannot be found from an index search of ATTRIBUTE = NULL). Such a search will have to be done by scanning the file.
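
Rules (b) and (c) can be tried out directly. This sketch uses Python's sqlite3 module with invented names; SQLite's EXPLAIN QUERY PLAN plays the role of Oracle's EXPLAIN PLAN here. The exact wording of the plan output varies by SQLite version, but an indexed WHERE column should appear as an index search rather than a full table scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Finish TEXT)")
con.executemany("INSERT INTO Product VALUES (?, ?)",
                [(i, "Oak" if i % 50 == 0 else "Birch")
                 for i in range(1, 301)])
con.execute("CREATE INDEX idx_finish ON Product (Finish)")

# The plan's detail column should mention the index for this WHERE clause.
plan = " ".join(row[-1] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Product WHERE Finish = 'Oak'"))
```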

20. Two views of multidimensional databases:


a. Multidimensional table: Each cell contains one or more simple attributes, and the dimensions are ways to categorize the raw data. These categories, or dimensions, are the factors on which users want to summarize or segment the data, such as time periods, geography, lines of business, or people. A cell contains data relevant to the intersection of all of its dimension values. For example, a cell might hold the number of units sold attribute for a given time period, location, line of business, and salesperson.
b. Star-schema: At the center is a fact table, equivalent to the cell in the multidimensional view. This table contains all the raw attributes and a composite key made up of the primary keys of all the surrounding dimension tables. The surrounding dimension tables define each of the ways to categorize data, such as all the description data about each salesperson.

21. How can the use of the EXPLAIN command help in writing a more efficient query?


An EXPLAIN plan shows exactly how the DBMS will process a query. We can see which indexes will be used, which servers will be used, and how tables will be joined. Comparing execution plans for the same query written in several different ways helps identify the least-cost execution for a desired result, and the version with the best performance can then be chosen.

22. Four options for optimizing query performance:


a. The most common approach is to replicate the query so that each copy works against a portion of the database, usually a horizontal partition (sets of rows). The partitions need to be defined in advance by the database designer. The same query is run against each portion in parallel on separate processors, and the intermediate results from each processor are combined to create the final query result as if the query were run against the whole database.
b. A second option for parallel scanning of a table occurs when the query is written. When the DBMS parses a query, it decides, based on the query and on statistics it keeps about the database structure (such as the number of distinct values for fields), what the best strategy (called a query plan) is for accessing each index and table. The DBMS includes rules that consider these statistics, and it uses these rules to pick the best query plan. The module of the DBMS that does these calculations is called the cost-based optimizer. The query plan calculated by the cost-based optimizer says what table or index to use first and how to use it, then what table or index to use next, and so on. Oracle includes an ANALYZE command that collects these statistics and stores them for use by the DBMS. In some DBMSs it is possible to give the DBMS a hint or suggestion within the query to force it to process the query in a certain way.
c. Avoid the use of subqueries. SQL allows nesting of queries (writing one query inside another). Nested queries can be less efficient than queries that retrieve the same data set without a subquery, for example, by using a join instead.
d. Break complex queries into multiple, simple parts. Because a DBMS may use only one index per query, it is often good to break a complex query down into multiple, simpler parts that each use an index. Then combine the results of the smaller queries.
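
Point (c) can be demonstrated by retrieving the same result set with a subquery and with a join. The sketch below uses Python's sqlite3 module with invented tables; on larger tables the join form is often cheaper, and here we simply confirm that the two formulations return the same rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Order_T  (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customer VALUES (1, 'Dayton'), (2, 'Tulsa');
INSERT INTO Order_T  VALUES (10, 1), (11, 1), (12, 2);
""")

# Nested (subquery) form:
subquery = con.execute("""
    SELECT OrderID FROM Order_T
    WHERE CustomerID IN (SELECT CustomerID FROM Customer
                         WHERE City = 'Dayton')
""").fetchall()

# Equivalent join form, often more efficient:
join = con.execute("""
    SELECT O.OrderID FROM Order_T O
    JOIN Customer C ON C.CustomerID = O.CustomerID
    WHERE C.City = 'Dayton'
""").fetchall()
```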