Article information
2025, Volume 30, No. 1, p. 51-63
Avdeenko T.V., Timofeeva A.Y.
Method for optimal grouping of factor levels in ANOVA model
Purpose. The purpose of this work is to formulate the problem of grouping factor levels in the ANOVA model, for a fixed number of factor level groups, as a mixed integer linear programming problem whose solution is a global optimum. Methodology. The problem of grouping the levels of input factors in the ANOVA model can be considered a special case of feature selection, where the features are paired comparisons between factor levels. Such a problem can be solved by embedded methods (the CAS-ANOVA algorithm) or wrapper methods (the agglomerative merging method). These methods find a globally optimal solution (according to the given criterion) only in particular cases. In contrast, the proposed linearization-based approach always leads to the best solution. Results. Two methods were proposed for linearizing the explained sum of squares when grouping levels in the ANOVA model. The linearization methods were compared in terms of computation time. A comparative study of methods for grouping factor levels was carried out using barley yield analysis as an example. Findings. The second linearization method, which uses fewer binary variables, has an advantage in computation time. The proposed approach, in contrast to the CAS-ANOVA algorithm and the agglomerative merging method, guarantees that a global optimum is reached. Because of its computational complexity, the method is recommended when the optimal number of factor level groups is small.
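The notion of a global optimum for a fixed number of groups can be illustrated with a minimal sketch. It is not the authors' mixed integer linear programming formulation: it assumes plain exhaustive enumeration of all partitions of the factor levels into k groups and picks the one with the largest explained (between-group) sum of squares. The function names and the small data set are hypothetical, and the enumeration is practical only for a handful of levels.

```python
def explained_ss(groups, data):
    """Between-group sum of squares for a grouping of factor levels."""
    all_obs = [y for ys in data.values() for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    ss = 0.0
    for group in groups:
        obs = [y for level in group for y in data[level]]
        group_mean = sum(obs) / len(obs)
        ss += len(obs) * (group_mean - grand_mean) ** 2
    return ss


def partitions_into_k(levels, k):
    """Yield every partition of `levels` into exactly k non-empty groups."""
    def extend(i, groups):
        if i == len(levels):
            if len(groups) == k:
                yield [tuple(g) for g in groups]
            return
        # Prune branches that can no longer reach k groups.
        if len(groups) + (len(levels) - i) < k:
            return
        # Put the next level into one of the existing groups ...
        for idx in range(len(groups)):
            groups[idx].append(levels[i])
            yield from extend(i + 1, groups)
            groups[idx].pop()
        # ... or open a new group if fewer than k exist so far.
        if len(groups) < k:
            groups.append([levels[i]])
            yield from extend(i + 1, groups)
            groups.pop()

    yield from extend(0, [])


def best_grouping(data, k):
    """Brute-force search: grouping into k groups with maximal explained SS."""
    levels = sorted(data)
    return max(partitions_into_k(levels, k),
               key=lambda groups: explained_ss(groups, data))


# Hypothetical observations for four factor levels (not the barley data).
data = {
    "A": [1.0, 1.2],
    "B": [1.1, 0.9],
    "C": [3.0, 3.2],
    "D": [2.9, 3.1],
}
best = best_grouping(data, k=2)
print(best, explained_ss(best, data))
```

The number of partitions grows very quickly with the number of levels, which is one way to see why an exact method is attractive mainly when the number of groups is small.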
Keywords: analysis of variance, CAS-ANOVA, grouping, factor level, linearization, global optimum
doi: 10.25743/ICT.2025.30.1.006
Author(s):
Avdeenko Tatiana Vladimirovna
Dr., Professor
Position: Professor
Office: Novosibirsk State Technical University
Address: 630073, Russia, Novosibirsk, 20 Prospekt K. Marksa
E-mail: avdeenko@corp.nstu.ru
SPIN-code: 1085-2099

Timofeeva Anastasiia Yurievna
PhD
Position: Associate Professor
Office: Novosibirsk State Technical University
Address: 630073, Russia, Novosibirsk, 20 Prospekt K. Marksa
E-mail: a.timofeeva@corp.nstu.ru
SPIN-code: 8980-1611

References:
1. Ladha L., Deepa T. Feature selection methods and algorithms. International Journal on Computer Science and Engineering. 2011; 3(5):1787–1797.
2. Flach P. Machine learning: the art and science of algorithms that make sense of data. Cambridge University Press; 2012: 396.
3. Boullé M., Ridgeway G. A Bayes optimal approach for partitioning the values of categorical attributes. Journal of Machine Learning Research. 2005; 6(9):1431–1452.
4. Boullé M. A robust method for partitioning the values of categorical attributes. Revue des Nouvelles Technologies de l’Information, Extraction et Gestion des Connaissances. 2004; (II):173–182. Available at: https://www.researchgate.net/publication/220786392_A_robust_method_for_partitioning_the_values_of_categorical_attributes.
5. Bondell H.D., Reich B.J. Simultaneous factor selection and collapsing levels in ANOVA. Biometrics. 2009; 65(1):169–177.
6. Post J.B., Bondell H.D. Factor selection and structural identification in the interaction ANOVA model. Biometrics. 2013; 69(1):70–79.
7. Avdeenko T.V., Timofeeva A.Y., Murtazina M.S., Razumnikova O.M. Changes in the intelligence levels and structure in Russia: an ANOVA method based on discretization and grouping of factors. Applied Sciences. 2021; 11(13):Art. 5864.
8. Prokopyev O.A., Meneses C.N., Oliveira C.A.S., Pardalos P.M. On multiple-ratio hyperbolic 0–1 programming problems. Pacific Journal of Optimization. 2005; 1(2):327–345.
9. Asghari M., Fathollahi-Fard A.M., Mirzapour Al-e-Hashem S.M.J., Dulebenets M.A. Transformation and linearization techniques in optimization: a state-of-the-art survey. Mathematics. 2022; 10(2):Art. 283.
10. Immer R.F., Hayes H.K., Powers L. Statistical determination of barley varietal adaptation. Journal of the American Society of Agronomy. 1934; (26):403–419.
11. R: a language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria; 2021. Available at: https://www.academia.edu/30403703/R_A_Language_and_Environment_for_Statistical_Computing.
12. Sallan J.M., Lordan O., Fernandez V. Modeling and solving linear programming with R. OmniaScience. 2015: 106. Available at: https://www.omniascience.com/books/index.php/scholar/catalog/download/34/154/184-1?inline=1.

Bibliography link:
Avdeenko T.V., Timofeeva A.Y. Method for optimal grouping of factor levels in ANOVA model // Computational technologies. 2025. V. 30. No. 1. P. 51-63