A probabilistic method for detecting multivariate extreme outliers

Jibrin, Shafiu and Pressman, Irwin S. and Salibian-Barrera, Matias (2004) A probabilistic method for detecting multivariate extreme outliers. International Journal of Nonlinear Sciences and Numerical Simulation, 5 (2). 157–170. ISSN 2191-0294

Publisher’s or external URL: http://dx.doi.org/10.1515/IJNSNS.2004.5.2.157

Abstract

Given a data set arising from a series of observations, an outlier is a value that deviates so substantially from the natural variability of the data set as to arouse suspicion that it was generated by a different mechanism. We call an observation an extreme outlier if it lies at an abnormal distance from the "center" of the data set. We introduce the Monte Carlo SCD algorithm for detecting extreme outliers. The algorithm finds extreme outliers in terms of a subset of the data set called the outer shell. Each iteration of the algorithm runs in polynomial time, and the running time could be reduced by preprocessing the data to reduce its size. This approach has an interesting new feature: it estimates a relative measure of the degree to which a data point on the outer shell is an outlier (its "outlierness"). This measure has potential for serendipitous discoveries in data mining, where unusual or special behavior is of interest. Other applications include spatial filtering and smoothing in digital image processing. We apply this method to baseball data and identify the ten most exceptional pitchers of the 1998 American League. To illustrate another useful application, we also show that the SCD algorithm can be used to reduce the solution time of the D-optimal experimental design problem.
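
The idea of a Monte Carlo estimate of "outlierness" for points on an outer shell can be illustrated with a small sketch. This is not the SCD algorithm of the paper, whose details are not given in the abstract; it is a simpler stand-in that samples random directions and records which data point lies farthest along each one. Only vertices of the convex hull can be recorded (ties aside), and the fraction of directions won by a point serves here as a rough relative score. The function name `outer_shell_hit_frequencies` and its parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def outer_shell_hit_frequencies(points, n_directions=2000, seed=0):
    """Illustrative stand-in, NOT the paper's SCD algorithm.

    For each of n_directions random directions, find the point with the
    largest projection onto that direction and count it as "hit".
    Returns a dict mapping point index -> fraction of directions won.
    Points that never win (e.g. interior points) are absent from the dict.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    hits = Counter()
    for _ in range(n_directions):
        # A vector of independent Gaussians points in a uniformly random
        # direction; its length does not affect which point is farthest.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        farthest = max(
            range(len(points)),
            key=lambda i: sum(points[i][k] * direction[k] for k in range(dim)),
        )
        hits[farthest] += 1
    return {i: count / n_directions for i, count in hits.items()}


if __name__ == "__main__":
    # Hypothetical toy data: a small cluster plus one distant point.
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
    for index, freq in sorted(outer_shell_hit_frequencies(data).items()):
        print(index, data[index], round(freq, 3))
```

In this sketch a point far from the bulk of the data tends to win a larger share of directions than hull points close to the cluster, which is the sense in which the hit frequency acts as a relative measure. The paper's own measure is defined through its SCD procedure and may behave differently.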

Item Type: Article
Publisher’s Statement: The final publication is available at www.degruyter.com
ID number or DOI: 10.1515/IJNSNS.2004.5.2.157
Keywords: depth; D-optimal design; extreme outliers; location; matrix inequality constraints; Monte Carlo; outlierness; redundancy; semidefinite programming
Subjects: Q Science > QA Mathematics
NAU Depositing Author Academic Status: Faculty/Staff
Department/Unit: College of Engineering, Forestry, and Natural Science > Mathematics and Statistics
Date Deposited: 16 Feb 2016 23:31
URI: http://openknowledge.nau.edu/id/eprint/262
