Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plots

dc.contributor.author: Thrun, Michael C. (en_US)
dc.contributor.editor: Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko (en_US)
dc.date.accessioned: 2020-05-24T13:27:43Z
dc.date.available: 2020-05-24T13:27:43Z
dc.date.issued: 2020
dc.description.abstract: For many applications, it is crucial to decide whether a dataset possesses cluster structures. This property is called clusterability and is usually investigated with statistical testing. Here, it is proposed to extend statistical testing with the Mirrored-Density plot (MDplot). The MDplot allows the distributions of many variables to be investigated, with automatic sampling for large datasets. Statistical testing of clusterability is compared with MDplots of the first principal component and of the distance distribution of the data. Contradicting results are evaluated with topographic maps of cluster structures derived from planar projections using the generalized U-Matrix technique. A collection of artificial and natural datasets is used for the comparison. This collection is specifically designed to cover a variety of clustering problems that any algorithm should be able to handle. The results demonstrate that the MDplot improves statistical testing, but even then, nearly touching cluster structures with low intercluster distances and no predominant direction of variance remain challenging. (en_US)
dc.description.sectionheaders: Papers
dc.description.seriesinformation: Machine Learning Methods in Visualisation for Big Data
dc.identifier.doi: 10.2312/mlvis.20201102
dc.identifier.isbn: 978-3-03868-113-7
dc.identifier.pages: 19-23
dc.identifier.uri: https://doi.org/10.2312/mlvis.20201102
dc.identifier.uri: https://diglib.eg.org:443/handle/10.2312/mlvis20201102
dc.publisher: The Eurographics Association (en_US)
dc.subject: Information systems
dc.subject: Clustering
dc.title: Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plots (en_US)
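
Illustrative sketch (not part of the record): the abstract names two one-dimensional distributions that are inspected for clusterability, the scores on the first principal component and the pairwise distances of the data. The Python below is a minimal sketch of how those two distributions might be computed, assuming Euclidean distances and using plain power iteration as a stand-in for a full PCA routine. The function names and the two-blob toy data are hypothetical and are not taken from the paper; the MDplot itself and the statistical tests are not implemented here.

```python
import math
import random


def first_pc_scores(rows, iterations=200):
    """Project rows onto the first principal component.

    Assumption: the leading direction is found by power iteration on the
    sample covariance matrix, started from the all-ones vector. This fails
    if that start vector is orthogonal to the leading direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # constant data: no direction of variance
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def pairwise_distances(rows):
    """Euclidean distance for every unordered pair of rows."""
    return [math.dist(rows[i], rows[j])
            for i in range(len(rows)) for j in range(i + 1, len(rows))]


# Hypothetical toy data: two well-separated Gaussian blobs in 2D.
rng = random.Random(0)
cluster_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(60)]
cluster_b = [[rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)] for _ in range(60)]
data = cluster_a + cluster_b

pc1 = first_pc_scores(data)          # one score per data point
distances = pairwise_distances(data)  # one value per pair of points
```

For data with clear cluster structure, both `pc1` and `distances` would be expected to look multimodal when their densities are plotted, which is the kind of visual evidence the abstract proposes to combine with statistical testing.
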
Files
Original bundle
Name: 019-023.pdf
Size: 8.59 MB
Format: Adobe Portable Document Format
Name: 1013-file2.gz
Size: 2.58 MB
Format: Unknown data format