Article information
2008, Volume 13, Special issue, p. 93-101
Hmelnov A.E., Shigarov A.O.
A method for tables extraction from a plain text
The problem of table extraction is a part of document analysis. Approaches to this problem are usually based on particular media and formats. This paper considers a heuristic method for extracting tables from the plain text of unformatted and formatted documents. The method uses certain properties of statistical tables, and it can also be applied to tables of similar structure. Additionally, a model of the table structure is proposed, which makes it possible to automatically transform the contents of the extracted tables into relational tables.
Keywords: Document analysis and processing, information extraction, table extraction
Author(s):
Hmelnov Alexey Evgenievich, PhD, Associate Professor
Position: Head of Laboratory
Office: Matrosov Institute for System Dynamics and Control Theory of Siberian Branch of Russian Academy of Sciences
Address: 664033, Russia, Irkutsk, 134 Lermontov str.
Phone Office: (3952) 45-30-71
E-mail: hmelnov@icc.ru
SPIN-code: 8041-3667

Shigarov Alexei Olegovich, PhD
Position: Senior Research Scientist
Office: Institute for System Dynamics and Control Theory, Siberian Branch of RAS
Address: 664033, Russia, Irkutsk, 134 Lermontov str.
Phone Office: (3952) 45-31-02
E-mail: shigarov@icc.ru
Bibliography link: Hmelnov A.E., Shigarov A.O. A method for tables extraction from a plain text // Computational technologies. 2008. V. 13. Special issue 1. P. 93-101