Article information
2025, Volume 30, No. 6, p. 98-108
Krasnov F.V., Kurushin F.I.
Reducing the long tail in e-commerce via self-attention: C2T model
Purpose. To develop self-attention methods for improving the efficiency of product search on a high-load e-commerce platform. Method. In e-commerce, the "long tail" of queries is a well-known problem that prevents building effective reverse indexes over string representations of queries. Deep learning systems for product retrieval cannot process the full query stream within the required time, so popular queries must be indexed (cached) to speed up processing. However, according to analysts, popular queries account for less than 10% of all search queries, which makes indexing ineffective. Most queries are hard to map to the index because of typos (up to 15%) and differing query formulations, so bringing equivalent search queries to a single form by paraphrasing and deleting words is a pressing task. Approaches to reducing the diversity of search queries based on textual proximity, fuzzy hashing and the collaborative paradigm are widely used, but they are limited in handling subject entities (brands, product characteristics and objects) that are critical for product search. The growing effectiveness of large language models (LLMs) on text transformation tasks prompted the authors to use individual elements of the LLM architecture for problems where running a full LLM is too expensive under heavy load. Findings. The study tested a new approach to positional weighting of search query tokens based on the influence of the query context, taking subject entities into account. Implemented as the C2T machine learning model, this approach reduced the variety of search queries by 48% according to the Perplexity metric. Value. The simplicity of the proposed method is essential for high-load e-commerce environments.
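As a rough illustration of how perplexity can express query variety, the sketch below computes the perplexity of the empirical distribution over distinct query strings before and after a normalization step. This is an assumption about how the metric might be applied, since the abstract does not give the exact definition. The toy query log and the `canonical` function are hypothetical; `canonical` simply sorts tokens and is a stand-in, not the C2T model.

```python
import math
from collections import Counter


def perplexity(queries):
    """Perplexity (exp of Shannon entropy) of the empirical distribution
    over distinct query strings. Lower values mean less variety."""
    counts = Counter(queries)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


def canonical(query):
    # Hypothetical stand-in normalizer (NOT the C2T model):
    # lowercases and sorts tokens so that word order is ignored.
    return " ".join(sorted(query.lower().split()))


# Hypothetical toy log: three surface forms of one intent plus one other query.
raw = ["red nike shoes", "nike shoes red", "shoes nike red", "iphone case"]
normalized = [canonical(q) for q in raw]

print(perplexity(raw), perplexity(normalized))
```

In this toy log the three reordered variants collapse into one canonical string, so fewer distinct keys carry the same traffic and the perplexity drops. A cache keyed on the canonical form would then cover more of the query stream.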
Keywords: query reformulation, query index, semantic assessment, reverse index, perplexity
doi: 10.25743/ICT.2025.30.6.007
Author(s):
Krasnov Fedor Vladimirovich
PhD.
Office: Skolkovo Innovation Center
Address: 121205, Russia, Moscow, Nobel St., 5
E-mail: krasnov.fedor2@wb.ru
SPIN-code: 8650-1127

Kurushin Fedor Ivanovich
Position: Student
Office: Skolkovo Innovation Center
Address: 121205, Russia, Moscow, Nobel St., 5
E-mail: kurushin.fedor@wb.ru
SPIN-code: 8985-1880

Bibliography link:
Krasnov F.V., Kurushin F.I. Reducing the long tail in e-commerce via self-attention: C2T model // Computational technologies. 2025. V. 30. No. 6. P. 98-108