ISCA: Insignificant Component Analysis
Vinodh Kumar Sunkara,
Microsoft Redmond, WA
This assignment has two main parts:
1. Running the following clustering algorithms on two data sets
a. K-Means clustering
b. Expectation Maximization
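The two clustering runs above were done in Weka, but the same setup can be sketched with scikit-learn (an assumption for illustration, not the author's actual tool; the data here is a random stand-in for the real datasets):

```python
# Hedged sketch: K-Means and EM (Gaussian mixture) clustering, mirroring the
# two Weka runs described above on a synthetic stand-in dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # placeholder for the real feature matrix

# K-Means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Expectation Maximization via a Gaussian mixture model
em = GaussianMixture(n_components=3, random_state=0).fit(X)

print(kmeans.labels_[:5], em.predict(X)[:5])
```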
2. Running the following dimensionality reduction algorithms
a. Principal Component Analysis (PCA)
b. Independent Component Analysis (ICA)
c. Randomized Projections (RCA)
d. Any other feature selection algorithm: ISCA (Insignificant Component Analysis)
I used Weka for all my experiments. ICA wasn't available in Weka, so I installed the StudentFilters Weka plugin containing the FastICA algorithm implementation, which added the missing IndependentComponents filter under the unsupervised attribute filters.
I considered the Abalone and WineQualityWhite data sets. The first data set is continuous, and the wine quality data is mostly …
Keeping it as -1 takes the total number of independent features as the number of components; I set this value to 3.
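The component-count parameter can be sketched with scikit-learn's FastICA, a stand-in I'm assuming behaves like Weka's IndependentComponents filter (Weka's -1 setting corresponds roughly to `n_components=None`, i.e. keep everything; here the count is fixed at 3 as in the report):

```python
# Hedged sketch: FastICA with a fixed number of components, approximating the
# Weka IndependentComponents filter configuration described above.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # placeholder for the real feature matrix

# n_components=None would keep all features (the -1 setting in Weka);
# the report fixes it at 3 instead.
ica = FastICA(n_components=3, random_state=0, max_iter=500)
S = ica.fit_transform(X)
print(S.shape)  # (200, 3)
```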
Insignificant Component Analysis is the opposite of Principal Component Analysis: it picks the bottom few eigenvalues instead of the top ones. I picked the bottom 4 based on a threshold value.
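The ISCA idea described above can be sketched directly with NumPy: eigendecompose the covariance matrix and project onto the eigenvectors with the *smallest* eigenvalues, the reverse of PCA's selection (synthetic data and the bottom-4 cutoff are illustrative assumptions):

```python
# Hedged sketch of ISCA: keep the bottom-k eigenvectors of the covariance
# matrix instead of the top-k that PCA would keep.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))      # placeholder for the real feature matrix
Xc = X - X.mean(axis=0)            # center the data

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

k = 4                              # bottom 4, per the threshold in the report
W = eigvecs[:, :k]                 # eigenvectors of the k smallest eigenvalues
X_isca = Xc @ W                    # project onto the insignificant components
print(X_isca.shape)  # (200, 4)
```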
Divergence from initial dataset
After applying each dimensionality reduction algorithm, the squared sum and RMSE give the total divergence from the initial dataset, i.e. the element-wise error between the initial dataset and the dataset obtained after applying the dimensionality reduction filter. Below are the squared sum and RMSE values, rounded to two decimals.
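The divergence measures can be sketched as follows, assuming (as an illustration) PCA as the reducer and reconstruction back into the original feature space before comparing element-wise:

```python
# Hedged sketch: squared-sum and RMSE divergence between the original data and
# its reconstruction after dimensionality reduction (PCA used illustratively).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # placeholder for the real feature matrix

pca = PCA(n_components=3).fit(X)
X_rec = pca.inverse_transform(pca.transform(X))  # back to original space

diff = X - X_rec
squared_sum = float(np.sum(diff ** 2))          # total squared error
rmse = float(np.sqrt(np.mean(diff ** 2)))       # element-wise RMSE
print(round(squared_sum, 2), round(rmse, 2))
```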
Different cluster evaluation metrics for the Abalone and WineQuality datasets, corresponding to all four dimensionality reduction algorithms under K-Means and EM clustering, are shown below:
RCA and ICA degraded cluster quality, though RCA slightly less than ICA. In the case of PCA, there was no drastic change in the metrics before and after the dimensionality reduction. Based on my results above, PCA is the best choice for reducing dimensions.
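The before/after comparison behind this conclusion can be sketched with one metric; the silhouette score here is an assumed stand-in for the Weka evaluation metrics, with PCA as the example reducer:

```python
# Hedged sketch: cluster quality measured on the raw features and again after
# dimensionality reduction, the comparison pattern used in the report.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # placeholder for the real feature matrix

labels_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_pca = PCA(n_components=3).fit_transform(X)
labels_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)

score_raw = silhouette_score(X, labels_raw)
score_pca = silhouette_score(X_pca, labels_pca)
print(score_raw, score_pca)  # higher is better; compare before vs after
```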