Article information
2013, Volume 18, No. 6, p. 62-74
Zagoruiko N.G., Barakhnin V.B., Borisova I.A., Tkachev D.A.
Clusterization of text documents from the database of publications using FRiS-Tax algorithm
This paper describes a successful application of the FRiS-Tax algorithm, which is based on the function of rival similarity, to the clustering of text documents. For this type of task, the advantages of the algorithm over classical clustering algorithms are shown. Rules, selected a posteriori, for choosing the weighting coefficients in the document similarity measure are found. A way of using parallel computation at some steps of the FRiS-Tax algorithm to speed up text document clustering is proposed. Quantitative estimates of the running time are given, which demonstrate the advantage of the parallel implementation at different stages of the program: both in the preliminary analysis of the texts, including the calculation of similarity measures, and at some steps of the FRiS-Tax algorithm.
Keywords: text documents clustering, parallel algorithm for clustering, FRiS-Tax algorithm
Author(s):
Zagoruiko Nikolay Grigoryevich, Dr., Professor
Position: Head of Laboratory
Office: Sobolev Institute of Mathematics SB RAS
Address: 630090, Russia, Novosibirsk, Acad. Koptyug avenue 4
Phone Office: (383)363 46 83
E-mail: zag@math.nsc.ru

Barakhnin Vladimir Borisovich, Dr., Associate Professor
Position: Leading research officer
Office: Federal Research Center for Information and Computational Technologies
Address: 630090, Russia, Novosibirsk, Ac. Lavrentiev ave, 6
Phone Office: (383) 330 78 26
E-mail: bar@ict.nsc.ru
SPIN-code: 1541-0448

Borisova Irina Artemovna, PhD
Position: Senior Research Scientist
Office: Sobolev Institute of Mathematics SB RAS
Address: 630090, Russia, Novosibirsk, Acad. Koptyug avenue 4
Phone Office: (383)36-34-671
E-mail: biamia@mail.ru

Tkachev Dmitry Alexandrovich
Office: Institute of Computational Technologies SB RAS
Address: 630090, Russia, Novosibirsk, prospect Akademika Lavrentjeva, 6
Phone Office: (383) 33-07-826
E-mail: relk-tda@yandex.ru
Bibliography link: Zagoruiko N.G., Barakhnin V.B., Borisova I.A., Tkachev D.A. Clusterization of text documents from the database of publications using FRiS-Tax algorithm // Computational technologies. 2013. V. 18. No. 6. P. 62-74