This paper describes the successful application of the FRiS-Tax algorithm, which is based on the function of rival similarity, to the clustering of text documents. For this type of task, the advantages of the algorithm over classical clustering algorithms are shown. Rules for choosing the weighting coefficients in the document similarity measure, selected a posteriori, are found. A way of using parallel computation in some steps of the FRiS algorithm to speed up text document clustering is proposed. Quantitative estimates of the processing time are given, which confirm the advantage of the parallel implementation at different stages of the program: both in the preliminary analysis of the texts, including the calculation of similarity measures, and in some steps of the FRiS-Tax algorithm.
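As a minimal illustration of the function of rival similarity mentioned above, the sketch below uses its commonly cited form, F = (r_rival - r_own) / (r_rival + r_own), where r_own is the distance from an object to its own nearest prototype and r_rival is the distance to the nearest competing one. The abstract does not state the formula, so this form, the function name `rival_similarity`, and the zero-distance convention are assumptions made for illustration only, not the authors' implementation.

```python
def rival_similarity(dist_to_own: float, dist_to_rival: float) -> float:
    """Rival similarity in [-1, 1] (assumed form, not taken from the paper).

    Values near 1 mean the object is much closer to its own prototype
    than to the rival; values near -1 mean the opposite.
    """
    total = dist_to_own + dist_to_rival
    if total == 0:
        # Assumption: an object coinciding with both prototypes is neutral.
        return 0.0
    return (dist_to_rival - dist_to_own) / total


if __name__ == "__main__":
    # Hypothetical distances, chosen only to show the range of the measure.
    print(rival_similarity(1.0, 3.0))
    print(rival_similarity(2.0, 2.0))
    print(rival_similarity(3.0, 1.0))
```

Because the measure is relative, it depends on the competing prototype as well as on the object's own, which is what distinguishes it from a plain distance-based similarity.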
Author(s):
Zagoruiko Nikolay Grigoryevich, Dr., Professor
Position: Head of Laboratory
Office: Sobolev Institute of Mathematics SB RAS
Address: 630090, Russia, Novosibirsk, Acad. Koptyug avenue 4
Phone Office: (383)363 46 83
E-mail: zag@math.nsc.ru
Barakhnin Vladimir Borisovich, Dr., Associate Professor
Position: Leading research officer
Office: Federal Research Center for Information and Computational Technologies
Address: 630090, Russia, Novosibirsk, Ac. Lavrentiev ave, 6
Phone Office: (383) 330 78 26
E-mail: bar@ict.nsc.ru
SPIN-code: 1541-0448
Borisova Irina Artemovna, PhD
Position: Senior Research Scientist
Office: Sobolev Institute of Mathematics SB RAS
Address: 630090, Russia, Novosibirsk, Acad. Koptyug avenue 4
Phone Office: (383)36-34-671
E-mail: biamia@mail.ru
Tkachev Dmitry Alexandrovich
Office: Institute of Computational Technologies SB RAS
Address: 630090, Russia, Novosibirsk, prospect Akademika Lavrentjeva, 6
Phone Office: (383) 33-07-826
E-mail: relk-tda@yandex.ru