• Shuffle
    Toggle On
    Toggle Off
  • Alphabetize
    Toggle On
    Toggle Off
  • Front First
    Toggle On
    Toggle Off
  • Both Sides
    Toggle On
    Toggle Off
  • Read
    Toggle On
    Toggle Off
Reading...
Front

Card Range To Study

through

image

Play button

image

Play button

image

Progress

1/22

Click to flip

Use LEFT and RIGHT arrow keys to navigate between flashcards;

Use UP and DOWN arrow keys to flip the card;

H to show hint;

A reads text to speech;

22 Cards in this Set

  • Front
  • Back

What is the difference between a Clustered and NonClustered index? For what type of queries each index is good?

A clustered index determines the order in which the rows of a table are stored on disk. If a table has a clustered index, then the rows of that table will be stored on disk in the same exact order as the clustered index. An example will help clarify what we mean by that.




An example of a clustered indexSuppose we have a table named Employee which has a column named EmployeeID. Let’s say we create a clustered index on the EmployeeID column. What happens when we create this clustered index? Well, all of the rows inside the Employee table will be physically – sorted (on the actual disk) – by the values inside the EmployeeID column. What does this accomplish? Well, it means that whenever a lookup/search for a sequence of EmployeeID’s is done using that clustered index, then the lookup will be much faster because of the fact that the sequence of employee ID’s are physically stored right next to each other on disk – that is the advantage with the clustered index. This is because the rows in the table are sorted in the exact same order as the clustered index, and the actual table data is stored in the leaf nodes of the clustered index.Remember that an index is usually a tree data structure – and leaf nodes are the nodes that are at the very bottom of that tree. In other words, a clustered index basically contains the actual table level data in the index itself. This is very different from most other types of indexes as you can read about below.When would using a clustered index make sense?Let’s go through an example of when and why using a clustered index would actually make sense. Suppose we have a table named Owners and a table named Cars. This is what the simple schema would look like – with the column names in each table:OwnersOwner_NameOwner_AgeCarsCar_TypeOwner_NameLet’s assume that a given owner can have multiple cars – so a single Owner_Name can appear multiple times in the Cars table.Now, let’s say that we create a clustered index on the Owner_Name column in the Cars table. What does this accomplish for us? Well, because a clustered index is stored physically on the disk in the same order as the index, it would mean that a given Owner_Name would have all his/her car entries stored right next to each other on disk. In other words, if there is an owner named “Joe Smith” or “Raj Gupta”, then each owner would have all of his/her entries in the Cars table stored right next to each other on the disk.When is using a clustered index an advantage?What is the advantage of this? Well, suppose that there is a frequently run query which tries to find all of the cars belonging to a specific owner. With the clustered index, since all of the car entries belonging to a single owner would be right next to each other on disk, the query will run much faster than if the rows were being stored in some random order on the disk. And that is the key point to remember!Why is it called a clustered index?In our example, all of the car entries belonging to a single owner would be right next to each other on disk. This is the “clustering”, or grouping of similar values, which is referred to in the term “clustered” index.Note that having an index on the Owner_Name would not necessarily be unique, because there are many people who share the same name. So, you might have to add another column to the clustered index to make sure that it’s unique.What is a disadvantage to using a clustered index?A disadvantage to using a clustered index is the fact that if a given row has a value updated in one of it’s (clustered) indexed columns what typically happens is that the database will have to move the entire row so that the table will continue to be sorted in the same order as the clustered index column. Consider our example above to clarify this. Suppose that someone named “Rafael Nadal” buys a car – let’s say it’s a Porsche – from “Roger Federer”. Remember that our clustered index is created on the Owner_Name column. This means that when we do a update to change the name on that row in the Cars table, the Owner_Name will be changed from “Roger Federer” to “Rafael Nadal”.But, since a clustered index also tells the database in which order to physically store the rows on disk, when the Owner_Name is changed it will have to move an updated row so that it is still in the correct sorted order. So, now the row that used to belong to “Roger Federer” will have to be moved on disk so that it’s grouped (or clustered) with all the car entries that belong to “Rafael Nadal”. Clearly, this is a performance hit. This means that a simple UPDATE has turned into a DELETE and then an INSERT – just to maintain the order of the clustered index. For this exact reason, clustered indexes are usually created on primary keys or foreign keys, because of the fact that those values are less likely to change once they are already a part of a table.A comparison of a non-clustered index with a clustered index with an exampleAs an example of a non-clustered index, let’s say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the EmployeeID AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID – so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like EmployeeName, EmployeeAddress, etc. will all actually be stored in the leaf node of the clustered index itself.This means that with a non-clustered index extra work is required to follow that pointer to the row in the table to retrieve any other desired values, as opposed to a clustered index which can just access the row directly since it is being stored in the same order as the clustered index itself. So, reading from a clustered index is generally faster than reading from a non-clustered index.A table can have multiple non-clustered indexesA table can have multiple non-clustered indexes because they don’t affect the order in which the rows are stored on disk like clustered indexes.Why can a table have only one clustered index?Because a clustered index determines the order in which the rows will be stored on disk, having more than one clustered index on one table is impossible. Imagine if we have two clustered indexes on a single table – which index would determine the order in which the rows will be stored? Since the rows of a table can only be sorted to follow just one index, having more than one clustered index is not allowed.Summary of the differences between clustered and non-clustered indexesHere’s a summary of the differences:A clustered index determines the order in which the rows of the table will be stored on disk – and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored.Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index.A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.A table can have multiple non-clustered indexes. But, a table can have only one clustered index.Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the clustered index actually stores the row-level data in it’s leaf nodes.

Why OLTP database design not generally a good idea for a Database Warehouse?

The data warehouse obtains data from one or more source systems which could be OLTP systems. So it could happen that the OLTP system and the data warehouse contain the same data. What distinguishes the data warehouse from the OLTP system is the data organisation. The data warehouse organises its data for query, analysis and reporting.

What type of schemas you know in DataWarehouse DBs? What are the differences? Is there any performance differences?

The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e. which can be analysed by any or all of a number of independent factors) in a relational database.The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized.Snowflake schema is nothing but one dimension table will be connected to another dimension table and so on.------------Snowflake------------? If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snow flaking may be appropriate. ? A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will hence be easier to implement. ? A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database. ? Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snow flaking would typically be required to permit simple query tools such as Cognos Power play to form such a query, especially if provision for these forms of query weren't anticipated when the data warehouse was first designed. ---------Star----------The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single "fact table" with a compound primary key, with one segment for each "dimension" and with additional columns of additive, numeric facts.The star schema makes multi-dimensional database (MDDB) functionality possible using a traditional relational database. Because relational databases are the most common data management system in organizations today, implementing multi-dimensional views of data using a relational database is very appealing. Even if you are using a specific MDDB solution, its sources likely are relational databases. Another reason for using star schema is its ease of understanding. Fact tables in star schema are mostly in third normal form (3NF), but dimensional tables are in de-normalized second normal form (2NF). If you want to normalize dimensional tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise - you need complex queries and business users cannot easily understand the meaning of data. Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.

What is Dimensional Hierarchy in OLAP ? Why we need it?

1. Dimensional Hierarchy in OBIEEDimensions are categories of attributes by which the business is defined. Common dimensions are time periods, products, markets, customers, suppliers, promotion conditions, raw materials, manufacturing plants, transportation methods, media types, and time of day. Within a given dimension, there may be many attributes. For example, the time period dimension can contain the attributes day, week, month, quarter, and year. Exactly what attributes a dimension contains depends on the way the business is analyzed.A dimensional hierarchy expresses the one-to-many relationships between attributes.Given a sample time dimension, consider the hierarchies it implies, as shown below:With this sample time dimension, days may aggregate, or roll up, into weeks. Months may roll up into quarters, and quarters into years. When one attribute rolls up to another, it implies a one-tomany relationship. The total of all the hierarchy definitions in this sample time dimension make up this time dimension.2. Dimensional Hierarchy in OLAP Analytical Workspace:A hierarchy is a way to organize data at different levels of aggregation. In viewing data, analysts use dimension hierarchies to recognize trends at one level, drill down to lower levels to identify reasons for these trends, and roll up to higher levels to see what affect these trends have on a larger sector of the business.There are two types of hierarchies:Level-based Hierarchy: Level-based dimensions that use parent-child relationships to group members into levels. Most dimensions are level-based.Value-based Hierarchy: Value-based dimensions that have parent-child relationships among their members, but these relationships do not form meaningful levels.

If we need to limit the access to a data in a cube, How do you do it?

Database and Cube Level SecurityDatabase and cube level security can be maintained in either BIDS or Management Studio (if you are using a SSAS Project, which helps with deployment and version control, remember each deployment will overwrite the current security unless you use the deployment wizard (more details on the deployment wizard can be found at http://msdn.microsoft.com/en-us/library/ms174817(v=sql.105).aspx). My preference is to use BIDS, although the screens are very similar in either Management Studio or BIDS. In order to create a new role, simply right click on Roles and Select New Roles.As shown on the below screen print, step 1 is to define the Role Name in the properties windows; use a name which is both descriptive and meaningful. If the properties window is not visible, select View > Properties (or hit F4). Next, fill in the Role Description as needed. As important as the name is, the three other check boxes below the role description play a vital role in defining database level access to the members of this role. Each check box is described next: Full control: This permission grants full access to this particular SSAS database. Members with this permission have similar access rights as the server role noted above. However, these permissions apply to this database only, and not to all the databases within the SSAS instance. Full control grants members access to add other users and to process and maintain databases, cubes, and dimensions.Process database: This permission allows a member to process this database and its related cubes and dimensions.Read definitions: This permission allows role members to read the database metadata. Granting this access, though, does not provide access to read the metadata of other objects within the database such as cube meta data or dimension metadata. This lower level meta data access will be discussed later in the article.Often, none of these check boxes are checked for regular users of the cube.Switching to the members tab allows the administrator to add specific users to the role. Membership can be added either by typing the individual users or by using the windows Advanced > Find option. Again, you can add active directory groups, local windows groups, domain users, or local users; however, SQL users cannot be used! The Add members processes is illustrated in the following screen print. Of course for manageability, using groups is recommended. Moving on to the Data Source tab, intuitively, it would seem that a user would need access to the data source. However, granting access to the data source actually grants permissions to the underlying data sources of the SSAS project or database. Generally, this option should be set to None, unless users are utilizing data mining structures or links to external datasets.Switching to the Cubes tab finally provides methods which will grant read data access to role members. If a SSAS database has multiple cubes, access is given on a cube by cube basis and is broken into the following three categories as described below and illustrated subsequently:AccessNone: Role members are not able to access this particular cubeRead: Users can read data from the cube, but not write data back to the cube. This option is most commonly selectedRead/Write: Users can read data from cube and write data back to the cube (for additional details on write back see:http://ssas-wiki.com/w/Articles#Write_Back )Local Cube/Drillthrough AccessNone: No Drillthough or local cube creationDrill through: Drillthrough is permitted for this cube (see Cube > Actions tab)Drill through and Local Cube: Drillthrough is permitted as is the creation of local cubes from the Server cube.Process:Role members are able to process this particular cube.Remember, if access is not specifically granted to the cube, end users will not even see the cube from their client applications.The Cell Data tab affords administrators the ability to granularly set read, read-contingent, and read/write permissions to role members. Once any of these options are enabled by selecting the appropriate check box, the administrator must enter a MDX expression which defines the cells which are available or restricted for the role members. Although, Cell Data permission restrict access to certain cells or cell ranges, these restrictions work hand and hand with the dimension permissions which will be subsequently discussed. Additionally, if certain cells are restricted, but other non-restricted calculated cells are derived from that restricted values, users may be able to make a determination of a particular value. To see additional details about this complex area of security see http://cwebbbi.wordpress.com/2008/05/20/cell-security-when-read-permissions-are-actually-read-contingent/.The Dimension tab assigns privileges to role members at the dimension level. First, two options exists at the dimensions level, either Read or Read and Write. Roles whose access is set to none will not even see the dimension marked as such. Furthermore, roles can be granted Read Definition access which allows for the role members to read the metadata concerning the dimension. Similar to the Cubes Access Rights, granting Process rights to a role, allows the members to process that particular dimension. The next tab to the right of the dimension tab is the Dimensions data tab; moving from dimension security at the dimension level, the dimension data tab allows the administrator to restrict or deny access to certain attribute values. For example, if a sales group only should have access to sales data in their state geographic territory, then a role could be designed to only allow access the sales person's assigned states. As seen in the next illustration, two options for entering the dimension restrictions: the basic option which allows the selection of specific values to display or restrict. The advanced option relies on the entry of MDX queries for generating the list of available or restrict access points.One caveat in the use of Dimension Data security; total rows will display the total for all values in the dimension even those that restricted which in turn could allow end users to determine the value of the restricted attributes. To have the end client reporting tools only rollup the unrestricted values, check the enable Visual Totals check box, displayed at the bottom of the following illustration, which will then only rollup the unrestricted values. Please note, this feature can cause slower performance.Conclusion-SecurityImplementing SSAS Security is as important as every other methods of restricting organizational data. AS a DBA/ DWA, we are ultimately responsible for providing not only the correct and appropriate data (all very quickly), but we also must provide such data using a method and path which appropriately protects access to a SSAS database with such data. Therefore, SSAS allows us to setup the following security points:who administers the databasewho processes the databases, cubes, and dimensionswho can access the meta data about the cubes and dimensionswho can read data from the SSAS databaseincluding drill throughdimension level and dimension data security

What are different storage mode options in SSAS? How do you decide when to use each of them? What are the differences? What the Advantages and Disadvantages of each technology? How they effect on performance?

Just to save.

If you need to create your own measure that holds some logic, How do you do it? What are Additive, Semi additive and No additive measures?

Just to save.

How would you optimize the dimensions?

Just to save.

What really ‘Optimize schema’ option is SSAS does?

Just to save.

Can you give an example when you need to create two measure groups?

Just to save.

What is the difference between attribute hierarchy and user hierarchy?
By default, an attribute hierarchy is created for every attribute in a dimension, and each hierarchy is available for dimensioning fact data. This hierarchy consists of an "All" level and a detail level containing all members of the hierarchy. [...], you can organize attributes into user-defined hierarchies to provide navigation paths in a cube.So, each attribute is in fact an attribute hierarchy: the all-level and the members of the attribute. (for example: (All - red, blue, yellow, ...)The user-defined hierarchies are the ones you create in the hierarchy pane of your dimension in SSAS. (for example: Product - Subcategory - Category)

What are fact and dimension tables?

Fact tables are measurements of business process. Dimension tables are textual attributes of the measurements in fact tables. The fact has a foreign key relationship with the dimension.

What is data visualization?

A much advanced, direct, precise and ordered way of viewing large volumes of data is called data visualization. It is the visual representation of data in the form of graphs and charts, especially when you can’t define it textually. You can show trends, patters and correlations through various data visualization software and tools; Tableau is one such data visualization software used by businesses and corporates.

2.What are the differences between Tableau desktop and Tableau Server?

While Tableau desktop performs data visualization and workbook creation, Tableau server is used to distribute these interactive workbooks and/or reports to the right audience. Users can edit and update the workbooks and dashboards online or Server but cannot create new ones. However, there are limited editing options when compared to desktop.

3.Define parameters in Tableau and their working.

Tableau parameters are dynamic variables/values that replace the constant values in data calculations and filters. For instance, you can create a calculated field value returning true when the score is greater than 80, and otherwise false. Using parameters, one can replace the constant value of 80 and control it dynamically in the formula.

4.Differentiate between parameters and filters in Tableau.

The difference actually lies in the application. Parameters allow users to insert their values, which can be integers, float, date, string that can be used in calculations. However, filters receive only values users choose to ‘filter by’ the list, which cannot be used to perform calculations.Users can dynamically change measures and dimensions in parameter but filters do not approve of this feature.

5.What are fact table and Dimension table in Tableau?

—>Facts are the numeric metrics or measurable quantities of the data, which can be analyzed by dimension table. Facts are stores in Fact table that contain foreign keys referring uniquely to the associated dimension tables. The fact table supports data storage at atomic level and thus, allows more number of records to be inserted at one time. For instance, a Sales Fact table can have product key, customer key, promotion key, items sold, referring to a specific event.—>Dimensions are the descriptive attribute values for multiple dimensions of each attribute, defining multiple characteristics. A dimension table ,having reference of a product key form the fact table, can consist of product name, product type, size, color, description, etc.

6.What are Quick Filters in Tableau?

Global quick filters are a way to filter each worksheet on a dashboard until each of them contains a dimension. They are very useful for worksheets using the same data source, which sometimes proves to a disadvantage and generate slow results. Thus, parameters are more useful.

7.State limitations of parameters in Tableau.

Parameters facilitate only four ways to represent data on a dashboard (which are seven in quick filters). Further, parameters do not allow multiple selections in a filter.

8.What is aggregation and disaggregation of data in Tableau?

Aggregation and disaggregation in Tableau are the ways to develop a scatterplot to compare and measure data values. As the name suggests, aggregation is the calculated form of a set of values that return a single numeric value. For instance, a measure with values 1,3,5,7 returns 1. You can also set a default aggregation for any measure, which is not user-defined. Tableau supports various default aggregations for a measure like Sum, average, Median, Count and others.Disaggregating data refers to viewing each data source row, while analyzing data both independently and dependently.

9.What is Data Blending?

Unlike Data Joining, Data Blending in tableau allows combining of data from different sources and platforms. For instance, you can blend data present in an Excel file with that of an Oracle DB to create a new dataset.

10. What is a context filter?

The concept of context filter in Tableau makes the process of filtering smooth and straightforward. It establishes a filtering hierarchy where all other filters present refer to the context filter for their subsequent operations. The other filters now process data that has been passed through the context filter.Creating one or more context filters improves performance as users do not have to create extra filters on large data source, reducing the query-execution time.You can create by dragging a filed into ‘Filters’ tab and then, Right-Click that field and select ‘’Add to Context”