We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...
Portal Catalogs are a popular means of searching for information on the Web. They provide querying and browsing capabilities on data organized in a hierarchy, on a categor...
Eleni G. Christodoulou, Theodore Dalamagas, Timos ...
Complex graphs, in which multi-type nodes are linked to each other, frequently arise in many important applications, such as Web mining, information retrieval, bioinformatics, and...
Bo Long, Zhongfei (Mark) Zhang, Philip S. Yu, Tian...
This paper presents a ranking scheme that combines full text, anchor text, and URL structure for homepage finding in hybrid peer-to-peer networks. The experimental results show...
Text mining concerns the discovery of knowledge from unstructured textual data. One important task is the discovery of rules that relate specific words and phrases. Although exist...