The problem of classifying objects in a relational database has been studied extensively within the multi-relational data mining community. There are two main approaches to dealing with relational structure. The first uses Inductive Logic Programming (ILP) to extend learning techniques so that they can handle relational data directly. The second, called propositionalization, aggregates the data into a single table so that traditional learning techniques can be applied.
ILP, proposed by {ref}, starts from the basic facts present in the tables and uses induction engines, such as Prolog, to derive rules that explain why a purchase is returned. For example, Customer(Joe, M, 33) and Product(Laptop, HP, $900) …
Sometimes, however, class spoilers are attributes that are functionally dependent on the class. For example, if the table Purchase had an attribute reasonForReturn, which indicates the reason for a return (defective product, didn't like the product, etc.), this attribute would be a class spoiler because its value is only filled in once the value of the class attribute is known. Including such an attribute in the target table would be incorrect, as the classifier could use it to predict the target attribute with 100% accuracy. Although class spoilers should be excluded from the target table, they can be used to aggregate information from past data. For example, while we want to exclude the attribute return from the target table, we can include the average value of return among the customer's past purchases. Hence, by taking the purchase date into account, our technique collects important information about past purchasing behavior by exploring rolled paths, while at the same time preventing the generation of class …
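The idea of using a class spoiler only through past data can be illustrated with a minimal sketch. It assumes a hypothetical Purchase table reduced to four fields (purchase id, customer, purchase date, and the return flag) and is not the procedure used in this paper. For each purchase, it averages the return flag over the same customer's strictly earlier purchases, so the label of the purchase being classified never enters its own feature.

```python
from datetime import date

# Hypothetical Purchase rows: (purchase_id, customer, purchase_date, returned).
# The field layout and the values are illustrative assumptions.
purchases = [
    (1, "Joe", date(2020, 1, 5), 1),
    (2, "Joe", date(2020, 2, 10), 0),
    (3, "Joe", date(2020, 3, 15), 1),
    (4, "Ann", date(2020, 1, 20), 0),
    (5, "Ann", date(2020, 4, 1), 1),
]


def past_return_rate(rows):
    """Map purchase_id to the mean of `returned` over the same customer's
    strictly earlier purchases, or None when there is no earlier purchase."""
    rates = {}
    for pid, customer, when, _ in rows:
        # Only purchases dated before the current one may contribute,
        # so the current row's own label is never used.
        past = [r for (_, c, d, r) in rows if c == customer and d < when]
        rates[pid] = sum(past) / len(past) if past else None
    return rates


rates = past_return_rate(purchases)
```

The strict comparison on the purchase date is the design choice that matters here: using `<=` would let a purchase contribute its own return flag and reintroduce the spoiler.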
In our case, we have a Retail database and two external data sources, Suppliers and Reviews. The Product table in the Retail database is the link to the external databases. In this paper, we assume that the supplier and review information is already mapped to the Product table in the Retail database. Our goal is to attach new attributes to the Purchase table, generated by navigating the Retail, Supplier and Review databases. Our attribute generation procedure is summarized in Algorithm 1.
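To make the goal concrete, the following is a small sketch of attaching externally derived attributes to Purchase through Product. It is not Algorithm 1. All table layouts, column names, and the helper attach_attributes are hypothetical, and for simplicity the sketch ignores the purchase date.

```python
from statistics import mean

# Hypothetical tables; every column name here is an illustrative assumption.
purchase = [
    {"purchase_id": 1, "product_id": "P1"},
    {"purchase_id": 2, "product_id": "P2"},
    {"purchase_id": 3, "product_id": "P1"},
]
product = {"P1": {"supplier_id": "S1"}, "P2": {"supplier_id": "S2"}}
supplier = {"S1": {"country": "US"}, "S2": {"country": "DE"}}
review = [
    {"product_id": "P1", "rating": 4},
    {"product_id": "P1", "rating": 2},
    {"product_id": "P2", "rating": 5},
]


def attach_attributes(purchase, product, supplier, review):
    """Return copies of the Purchase rows extended with attributes reached
    through Product: aggregated review ratings and a supplier attribute."""
    # Group review ratings by product (a one-to-many link from Product).
    ratings = {}
    for r in review:
        ratings.setdefault(r["product_id"], []).append(r["rating"])

    enriched = []
    for row in purchase:
        pid = row["product_id"]
        new_row = dict(row)
        product_ratings = ratings.get(pid, [])
        # One-to-many link: aggregate before attaching to the purchase.
        new_row["avg_rating"] = mean(product_ratings) if product_ratings else None
        new_row["review_count"] = len(product_ratings)
        # Many-to-one link: the supplier attribute can be copied directly.
        new_row["supplier_country"] = supplier[product[pid]["supplier_id"]]["country"]
        enriched.append(new_row)
    return enriched


enriched = attach_attributes(purchase, product, supplier, review)
```

The sketch distinguishes the two kinds of links it crosses: a many-to-one link lets a value be copied as is, while a one-to-many link needs an aggregate function such as a mean or a count.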
As in previous work, we generate new attributes with a two-step technique: we first generate paths and then aggregate information along them. Paths are generated from the target table; then, for each path, information is "rolled up" from the end of the path back to the target table using the available aggregate functions, obtaining different