Statistical machine learning techniques for data classification usually assume that all entities are i.i.d. (independent and identically distributed). However, real-world entities...
Tables on web pages contain a huge amount of semantically explicit information, which makes them a worthwhile target for automatic information extraction and knowledge acquisition...
A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the e...
Data reduction plays an important role in machine learning and pattern recognition with a high-dimensional data. In real-world applications data usually exists with hybrid formats...
Personalization of web search results as a technique for improving user satisfaction has received notable attention in the research community over the past decade. Much of this wo...