Detecting Outliers in High Dimensional Categorical Data through Feature Selection

Authors

  • N N R Ranga Suri, Centre for Artificial Intelligence and Robotics (CAIR)
  • M Narasimha Murty, Department of CSA, Indian Institute of Science (IISc)
  • G Athithan, Centre for Artificial Intelligence and Robotics (CAIR)

Keywords:

Data Mining, Outlier detection, Categorical data, Entropy, Mutual information

Abstract

Extensive use of qualitative features to describe categorical data leads to a high-dimensional scenario in which outlier detection becomes challenging because of data sparseness. The curse of dimensionality has been well addressed for numerical data through various feature selection methods, whereas the categorical data scenario is still being actively explored. Because outlier detection is generally unsupervised in nature, owing to the lack of knowledge about the various types of outliers, this paper proposes a novel unsupervised feature selection method for effective detection of outliers in categorical data. The proposed algorithm establishes the relevance and the redundancy of a feature through entropy and mutual information computation. A threshold, derived by measuring the inherent redundancy of the features describing a data set, is applied to the maximum redundancy allowed between a candidate feature and the already selected subset of features. Selecting features among the relevant ones in this way results in a feature subset with less redundancy. A comparison with information gain based feature selection shows the effectiveness of the proposed algorithm for outlier detection. Its efficacy is demonstrated on various high-dimensional benchmark data sets using two existing outlier detection methods.
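The selection scheme outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the exact relevance criterion, the redundancy measure, or how the threshold is derived. The sketch assumes that relevance is ranked by feature entropy (higher first), that redundancy is the pairwise mutual information between a candidate and each already selected feature, and that the threshold is passed in as a hypothetical parameter `max_redundancy`. The function names are likewise illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned categorical sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def select_features(columns, max_redundancy):
    """Greedy unsupervised selection over a dict of name -> list of values.

    Assumption: features are visited in order of decreasing entropy, and a
    candidate is kept only if its mutual information with every feature
    selected so far does not exceed max_redundancy.
    """
    ranked = sorted(columns, key=lambda f: entropy(columns[f]), reverse=True)
    selected = []
    for f in ranked:
        if all(
            mutual_information(columns[f], columns[g]) <= max_redundancy
            for g in selected
        ):
            selected.append(f)
    return selected


# Toy data: "b" is a relabelled copy of "a", "c" is independent of "a".
columns = {
    "a": ["x", "y", "x", "y", "z", "z"],
    "b": ["p", "q", "p", "q", "r", "r"],
    "c": ["u", "u", "v", "v", "u", "v"],
}
selected = select_features(columns, max_redundancy=0.5)
print(selected)
```

In this toy data, "b" carries the same information as "a", so its mutual information with "a" equals the entropy of "a" and it is rejected, while "c" shares essentially no information with "a" and is retained. The selected columns would then be passed to whichever outlier detection method is in use.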

Published

2013-01-01

How to Cite

N N R Ranga Suri, M Narasimha Murty, & G Athithan. (2013). Detecting Outliers in High Dimensional Categorical Data through Feature Selection. Journal of Network and Innovative Computing, 1, 10. Retrieved from https://cspub-jnic.org/index.php/jnic/article/view/13

Section

Original Article