Document Type
Thesis - Open Access
Award Date
2018
Degree Name
Master of Science (MS)
Department / School
Mathematics and Statistics
First Advisor
Semhar Michael
Keywords
Clustering, directional distributions, expectation maximization, mixtures, variable selection, von Mises-Fisher
Abstract
Mixtures of von Mises-Fisher distributions have been shown to be an effective model for clustering data on a unit hypersphere, but variable selection for these models remains an important and challenging problem. In this paper, we derive two variants of the expectation-maximization framework, each of which is used to identify a specific type of irrelevant variable for these models. The first type consists of noise variables, which are not useful for separating any pair of clusters. The second type consists of redundant variables, which may be useful for separating pairs of clusters but do not enable any separation beyond that already provided by other variables. Removing these irrelevant variables is shown to improve cluster quality in both simulated and benchmark datasets.
Library of Congress Subject Headings
Data mining.
Cluster analysis.
Description
Includes bibliographical references
Format
application/pdf
Number of Pages
41
Publisher
South Dakota State University
Recommended Citation
Bayer, Damon, "Variable Selection Techniques for Clustering on the Unit Hypersphere" (2018). Electronic Theses and Dissertations. 2652.
https://openprairie.sdstate.edu/etd/2652