Clustering text data streams is an important issue in the data mining community and has a number of applications, such as newsgroup filtering, text crawling, document organiza...
DIANE is an EU-funded project in the ACTS Program that started in September 1995. The goal of DIANE is to establish a service that enables a user to annotate anythin...
Geometric layout analysis plays an important role in document image understanding. Many algorithms known in the literature work well on standard document images, achieving high text l...
Faisal Shafait, Joost van Beusekom, Daniel Keysers...
The standard layout model used by web browsers is to lay text out in a vertical scroll using a single column. The horizontal-scroll layout model--in which text is laid out in colu...
Cameron Braganza, Kim Marriott, Peter Moulder, Mic...
Government regulations are semi-structured text documents that are often voluminous, heavily cross-referenced between provisions and even ambiguous. Multiple sources of regulation...