We propose a new method for automated large scale gathering of Web images relevant to specified concepts. Our main goal is to build a knowledge base associated with as many conce...
Today, the Web is increasingly used as a platform for distributed services, which transcend organizational boundaries to form federated applications. Consequently, there is a grow...
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if...
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...