Relational Database Case Study

Related Work:
The problem of classifying objects in a relational database has been studied extensively within the multi-relational data mining community. There are two main approaches to handling relational structure. The first approach, based on Inductive Logic Programming (ILP), extends learning techniques so that they can handle relational data directly. The second approach, called Propositionalization, aggregates the data into a single table so that traditional learning techniques can be applied.
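Propositionalization can be illustrated with a small example. The sketch below flattens a one-to-many relation into a single table by summarizing each customer's purchases with aggregate functions. It rests on several assumptions made only for the example: the `customers` and `purchases` tables, their fields, the chosen aggregates (count, mean, max) and the `propositionalize` function are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables: one row per customer, many purchase rows per customer.
customers = [
    {"customer_id": 1, "gender": "M", "age": 33},
    {"customer_id": 2, "gender": "F", "age": 41},
]
purchases = [
    {"customer_id": 1, "price": 900.0},
    {"customer_id": 1, "price": 25.0},
    {"customer_id": 2, "price": 120.0},
]

def propositionalize(customers, purchases):
    """Flatten the one-to-many Customer -> Purchase relation into one row per
    customer by aggregating the related purchase rows (count, mean, max)."""
    prices_by_customer = defaultdict(list)
    for p in purchases:
        prices_by_customer[p["customer_id"]].append(p["price"])

    flat = []
    for c in customers:
        prices = prices_by_customer.get(c["customer_id"], [])
        row = dict(c)  # keep the customer's own attributes unchanged
        row["purchase_count"] = len(prices)
        row["avg_price"] = mean(prices) if prices else None
        row["max_price"] = max(prices) if prices else None
        flat.append(row)
    return flat

for row in propositionalize(customers, purchases):
    print(row)
```

The flat rows can be passed to any conventional classifier. The price of this convenience is whatever the chosen aggregates fail to summarize.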

ILP, proposed by {ref}, starts from the basic facts present in the tables and uses logic-based induction engines (for example, systems written in Prolog) to derive rules that explain why a purchase is returned. For example, Customer(Joe, M, 33) and Product(Laptop, HP, $900)
…
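The ILP view can be made concrete in a few lines. The sketch below represents ground facts as Python tuples in the style of the text's example. It then scores one candidate rule by counting the positive and negative examples it covers, which is the kind of measure an ILP learner uses to guide its search over clauses. It does not perform induction itself. The `purchase` and `returned` facts, the second customer and product, and the candidate rule are all hypothetical.

```python
# Ground facts: Customer(Joe, M, 33) and Product(Laptop, HP, 900) follow the
# text; every other fact here is hypothetical.
customer = {("Joe", "M", 33), ("Ann", "F", 41)}            # (name, gender, age)
product = {("Laptop", "HP", 900), ("Mouse", "Acme", 25)}   # (name, brand, price)
purchase = {("Joe", "Laptop"), ("Joe", "Mouse"), ("Ann", "Laptop")}
returned = {("Joe", "Laptop")}  # positive examples of the target predicate

def candidate_rule(cust, prod):
    """Hypothetical clause:
    returned(C, P) :- purchase(C, P), product(P, _, Price), Price > 500,
                      customer(C, _, Age), Age < 40.
    """
    if (cust, prod) not in purchase:
        return False
    expensive = any(name == prod and price > 500 for name, _, price in product)
    young = any(name == cust and age < 40 for name, _, age in customer)
    return expensive and young

def coverage(rule):
    """Count covered positives and negatives under the closed-world assumption:
    any purchase absent from `returned` is a negative example."""
    covered = [(c, p) for c, p in purchase if rule(c, p)]
    pos = sum(1 for ex in covered if ex in returned)
    return pos, len(covered) - pos

print(coverage(candidate_rule))  # (1, 0)
```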
Sometimes, however, class spoilers are attributes that are functionally dependent on the class. For example, suppose the Purchase table had an attribute reasonForReturn recording why an item was returned (defective product, customer did not like the product, etc.). This attribute would be a class spoiler, because its value is only recorded once the value of the class attribute is known. Including such an attribute in the target table would be incorrect, since the classifier could use it to predict the target attribute with 100% accuracy. Although class spoilers must be excluded from the target table, they can still be used to aggregate information from past data. For example, while we want to exclude the attribute return from the target table, we can include the average value of return among the customer's past purchases. Hence, by taking the purchase date into account, our technique collects important information about past purchasing behavior by exploring rolled paths, while at the same time preventing the generation of class …
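The past-purchase aggregation described above can be sketched under a few assumptions. The `purchases` rows, their `customer`, `date` and `returned` fields, and the `past_return_rate` function are hypothetical, and `returned` plays the role of the class attribute. Each purchase receives the average `returned` value over the same customer's strictly earlier purchases, so neither its own label nor any later label leaks into the feature.

```python
from datetime import date

# Hypothetical Purchase rows; `returned` is the class attribute (1 = returned).
purchases = [
    {"id": 1, "customer": "Joe", "date": date(2023, 1, 5), "returned": 1},
    {"id": 2, "customer": "Joe", "date": date(2023, 3, 2), "returned": 0},
    {"id": 3, "customer": "Joe", "date": date(2023, 6, 9), "returned": 1},
    {"id": 4, "customer": "Ann", "date": date(2023, 2, 1), "returned": 0},
]

def past_return_rate(purchases):
    """Map each purchase id to the average `returned` value over the same
    customer's purchases made strictly before it (None if there are none).
    The purchase's own label and any later labels are never used."""
    features = {}
    for target in purchases:
        past = [p["returned"] for p in purchases
                if p["customer"] == target["customer"]
                and p["date"] < target["date"]]
        features[target["id"]] = sum(past) / len(past) if past else None
    return features

print(past_return_rate(purchases))  # {1: None, 2: 1.0, 3: 0.5, 4: None}
```

The quadratic scan keeps the sketch short. On a real table, one would sort each customer's purchases by date and keep running sums instead. The strict `<` comparison also excludes same-day purchases, which is one conservative choice among several.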
In our case, we have a Retail database and two external data sources: Suppliers and Reviews. The Product table in the Retail database is the link to the external databases. In this paper, we assume that the supplier and review information has already been mapped to the Product table in the Retail database. Our goal is to attach new attributes to the Purchase table, generated by navigating the Retail, Supplier, and Review databases. Our attribute generation procedure is summarized in Algorithm 1.
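Navigating the Retail, Supplier and Review databases can be pictured as walking a graph whose nodes are tables and whose edges are the links between them. The sketch below assumes a hypothetical `SCHEMA` adjacency map; the link structure and the `join_paths` function are assumptions, not taken from the paper. It enumerates every join path of bounded length that starts at the target table Purchase. It is not a reproduction of Algorithm 1.

```python
# Hypothetical schema graph: tables are nodes, links (foreign keys) are edges.
# As in the text, Supplier and Review reach the Retail data only via Product.
SCHEMA = {
    "Purchase": ["Customer", "Product"],
    "Customer": ["Purchase"],
    "Product": ["Purchase", "Supplier", "Review"],
    "Supplier": ["Product"],
    "Review": ["Product"],
}

def join_paths(schema, start, max_len):
    """Return every path that starts at `start` and follows 1..max_len links."""
    paths = []
    frontier = [[start]]
    for _ in range(max_len):
        # Extend each path by every link leaving its last table.
        frontier = [path + [nxt] for path in frontier for nxt in schema[path[-1]]]
        paths.extend(frontier)
    return paths

for path in join_paths(SCHEMA, "Purchase", 2):
    print(" -> ".join(path))
```

Walks that return to Purchase, such as Purchase -> Customer -> Purchase, are kept on purpose. They reach a customer's other purchases, which is the kind of path needed to aggregate past purchasing behavior. A real implementation would also prune the path space, which this sketch does not attempt.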

We generate new attributes with a two-step technique similar to that of previous works: paths are generated first, and information is then aggregated along them. Paths are generated starting from the target table; then, for each path, information is "rolled up" from the end of the path back to the target table using the available aggregate functions, obtaining different …
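The roll-up step can be sketched for a single path, Purchase -> Product -> Review. The tables, the `rating` attribute, the set of aggregate functions in `AGGREGATES` and the `roll_up` function are all hypothetical. The sketch first aggregates review ratings per product, once per aggregate function. It then carries the product-level values back to each purchase of that product, giving one new attribute per function.

```python
from statistics import mean

# Hypothetical rows along the path Purchase -> Product -> Review.
purchases = [{"id": 1, "product": "Laptop"}, {"id": 2, "product": "Mouse"}]
reviews = [
    {"product": "Laptop", "rating": 2},
    {"product": "Laptop", "rating": 4},
    {"product": "Mouse", "rating": 5},
]

# Assumed set of aggregate functions for a numeric attribute.
AGGREGATES = {"count": len, "mean": mean, "min": min, "max": max}

def roll_up(purchases, reviews, attr="rating"):
    """Roll Review.<attr> up to Product with every aggregate function, then
    attach the product-level values to each Purchase of that product."""
    # End of path -> Product: one product has many reviews, so aggregate.
    values_by_product = {}
    for review in reviews:
        values_by_product.setdefault(review["product"], []).append(review[attr])
    per_product = {
        prod: {f"Product.Review.{attr}_{name}": fn(vals)
               for name, fn in AGGREGATES.items()}
        for prod, vals in values_by_product.items()
    }
    # Product -> Purchase: each purchase references one product, so look it up.
    return {p["id"]: per_product.get(p["product"], {}) for p in purchases}

for pid, features in roll_up(purchases, reviews).items():
    print(pid, features)
```

Here, a purchase whose product has no reviews gets an empty feature dictionary. A real pipeline would need an explicit policy for such missing values.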
