Article information

2020, Volume 25, No. 3, p. 119-129

Oleinik N.S., Schekoldin V.Y.

Development for modification of Torgerson projection method using cumulative curve analysis in outlier detection problem for high-dimensional data

Purpose. The paper aims to develop methods for presenting multidimensional data in classification problems, based on cumulative curve analysis. It considers the outlier detection problem for high-dimensional data using multidimensional scaling, with the goal of constructing high-quality data visualizations. An abnormal observation (or outlier), according to D. Hawkins, is an observation that differs so much from the others that it may be assumed to have appeared in the sample in a fundamentally different way.

Methods. One of the conceptual approaches to classifying sample observations is multidimensional scaling, represented by the classical Orloci method, the Torgerson principal projections method, and others. The Torgerson method places the origin at the center of gravity of the analyzed data, computes the matrix of scalar products of the vectors measured from that origin, selects the two largest eigenvalues with their corresponding eigenvectors, and evaluates the projection matrix. The method also assumes that regular and anomalous observations are linearly separable, which is rarely the case. It is therefore reasonable to choose, among the possible projection axes, those that give more effective results in detecting outlying observations. A modified CC-ABOD procedure (Cumulative Curves for Angle Based Outlier Detection) is applied to estimate the visualization quality. It is based on estimating the variances of the angles formed between a particular observation and the remaining observations in multidimensional space. Cumulative curve analysis is then applied, which makes it possible to separate groups of closely localized observations (according to the chosen metric) and to form classes of regular, intermediate, and anomalous observations.
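The projection and angle-variance steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `torgerson_projection` and `angle_variance`, the `axes` parameter, and the synthetic data are all hypothetical, and the angle variance is the plain unweighted variance rather than the modified CC-ABOD statistic. The cumulative curve step is not shown.

```python
import numpy as np


def torgerson_projection(X, axes=(0, 1)):
    """Project the rows of X onto the chosen principal axes.

    axes holds eigenvalue ranks: 0 is the largest eigenvalue, 1 the
    second largest, and so on (hypothetical parameter for illustration).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # origin at the center of gravity
    B = Xc @ Xc.T                            # matrix of scalar products
    eigvals, eigvecs = np.linalg.eigh(B)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    idx = list(axes)
    lam = np.clip(eigvals[idx], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, idx] * np.sqrt(lam)    # coordinates on the chosen axes


def angle_variance(X):
    """Unweighted variance of the angles each observation forms with all
    pairs of other observations (a simplification of ABOD; assumes the
    sample contains no duplicate rows)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    pair_idx = np.triu_indices(n - 1, k=1)   # each unordered pair once
    scores = np.empty(n)
    for i in range(n):
        D = np.delete(X, i, axis=0) - X[i]   # vectors to the other points
        U = D / np.linalg.norm(D, axis=1)[:, None]
        cosines = np.clip(U @ U.T, -1.0, 1.0)
        scores[i] = np.var(np.arccos(cosines[pair_idx]))
    return scores


# Synthetic data: 30 regular points and one far-away observation.
rng = np.random.default_rng(0)
regular = rng.normal(size=(30, 5))
outlier = np.full((1, 5), 50.0)
data = np.vstack([regular, outlier])

Y = torgerson_projection(data, axes=(0, 1))
scores = angle_variance(data)
print(Y.shape)
print(int(np.argmin(scores)))  # index of the smallest angle variance
```

In this sketch a small angle variance marks a candidate outlier, because a point far from the bulk of the data sees all other points within a narrow cone. Passing, say, `axes=(1, 2)` projects onto the axes of the second and third largest eigenvalues instead of the first two.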

Results. A modification of the Torgerson method is proposed. The distribution of the F1-measure is constructed and analyzed for different projection options applied to the source data. Analysis of the empirical distribution showed that in a number of cases the best axes correspond to the second, third, or even fourth largest eigenvalues.
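For readers unfamiliar with the F1-measure, the following sketch shows how it could be used to compare candidate axis pairs. The labels and predictions are made-up illustrative values, not results from the paper, and the names `f1_measure` and `candidates` are hypothetical.

```python
def f1_measure(true_labels, predicted_labels):
    """F1-measure for binary labels, where 1 marks an outlier."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0                       # no outlier was correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative ground truth and predictions for two axis pairs
# (eigenvalue ranks, 0 = largest); the values are invented.
truth = [0, 0, 0, 1, 1]
candidates = {
    (0, 1): [0, 0, 1, 1, 0],   # one hit, one false alarm, one miss
    (1, 2): [0, 0, 0, 1, 1],   # both outliers found, no false alarms
}

f1_by_axes = {axes: f1_measure(truth, pred) for axes, pred in candidates.items()}
best_axes = max(f1_by_axes, key=f1_by_axes.get)
print(f1_by_axes)
print(best_axes)
```

Repeating such a comparison over many simulated samples would give an empirical distribution of the F1-measure for each axis pair.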

Findings. Multidimensional scaling methods for visualizing multidimensional data and solving outlier detection problems have been considered. It was found that choosing the projection axes is an ambiguous problem.


Keywords: outliers, multidimensional data, Torgerson's main projection method, cumulative curves, CC-ABOD, classification quality measure

doi: 10.25743/ICT.2020.25.3.013

Author(s):
Oleinik Nikita Sergeevich
Position: Lead Expert
Office: Expobank
Address: 630007, Russia, Novosibirsk, 5 Sovetskaya str.
E-mail: olejnik.2015@stud.nstu.ru

Schekoldin Vladislav Yurjevich
PhD, Associate Professor
Position: Associate Professor
Office: Novosibirsk State Technical University
Address: Russia, Novosibirsk, Marx avenue, 20, 5 Sovetskaya str.
Phone Office: (383) 346-31-72
E-mail: raix@ngs.ru

References:
1. Timofeev V.S., Faddeenkov A.V., Shchekoldin V.Yu. Ekonometrika: uchebnik dlya akademicheskogo bakalavriata [Econometrics: A textbook for academic baccalaureate]. Izdanie 2-e, pererabotannoe i dopolnennoe. Moscow: YuRAYT; 2017: 328. (In Russ.)

2. Teryokhina A.Yu. Methods of multidimensional data scaling and visualization (Survey). Automation and Remote Control. 1973; 34(7):1109–1121.

3. Groshev S.V., Pivovarova N.V. Using the Andrews plots to visualize multidimensional data in multicriteria optimization. Nauka i obrazovanie. 2015; (12):197–214. (In Russ.)

4. Dai F., Zhu Y., Maitra R. Three-dimensional radial visualization of high-dimensional continuous or discrete datasets. ArXiv e-prints; 2019: 20.

5. Demsar J., Leban G., Zupan B. FreeViz: an intelligent multivariate visualization approach to explorative analysis of biomedical data. Journal of Biomedical Informatics. 2007; (40):661–671.

6. Torgerson W.S. Mnogomernoe shkalirovanie. Teoriya i metod. V kn.: Statisticheskoe izmerenie kachestvennykh kharakteristik [Multidimensional scaling. Theory and Method. In book.: Statistical measurement of performance]. Moscow: Statistika; 1972: 95. (In Russ.)

7. Powers D. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. of Machine Learning Technologies. 2011; 2(1):37–63.

8. Torgerson W.S. Theory and methods of scaling. N.Y.: Wiley; 1958: 245.

9. Oleinik N.S., Schekoldin V.Yu. Identification of anomalous observations in large-dimensional data based on the geometric ABOD approach. Science. Technologies. Innovation: collection of scientific papers: in 9 parts. Novosibirsk: NGTU; 2018: 253–257. (In Russ.)

10. Kriegel H., Schubert M., Zimek A. Angle-based outlier detection in high-dimensional data. Proc. of the 14th ACM SIGKDD Intern. Conf. on Knowledge Discovery & Data Mining. Las Vegas; 2008: 444–452.

11. Hawkins D. Identification of outliers. Chapman and Hall; 1980: 127.

12. Oleinik N.S., Shchekoldin V.Yu. Study of the properties of geometric ABOD-approach modifications for outlier detection by statistical simulation. Applied methods of statistical analysis. Statistical computation and simulation, AMSA’2019: Proc. of the Intern. Conf. Novosibirsk: NGTU; 2019: 389–395.

13. Shchekoldin V. Developing the risk classification based on ABC-analysis of possible damage and its probability. Intern. Forum: Proc. of 11th Intern. Forum on Strategic Technology (IFOST-2016). Novosibirsk; 2016:317–319.

14. Shchekoldin V.Yu. Vyyavlenie potrebiteley uslug internet-magazinov na osnove AVS-modifikatsii faktornogo analiza [Identification of consumers of online store services based on ABC-modification of factor analysis]. Pt 2. Krasnoyarsk: KGAU; 2011: 186–192. (In Russ.)

15. Arnold B.C., Sarabia J.M. Majorization and the Lorenz order with applications in applied mathematics and economics. 2nd edition. Switzerland: Springer; 2018: 251.

Bibliography link:
Oleinik N.S., Schekoldin V.Y. Development for modification of Torgerson projection method using cumulative curve analysis in outlier detection problem for high-dimensional data // Computational technologies. 2020. V. 25. No. 3. P. 119-129
ISSN 1560-7534
© 2024 FRC ICT