A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Along with the blossom of open source projects comes the convenience for software plagiarism. A company, if less self-disciplined, may be tempted to plagiarize some open source pr...
Chao Liu 0001, Chen Chen, Jiawei Han, Philip S. Yu
In this paper, we present a technique for visual analysis of documents based on the semantic representation of text in the form of a directed graph, referred to as semantic graph....
Delia Rusu, Blaz Fortuna, Dunja Mladenic, Marko Gr...
Complex network analysis is a growing research area in a wide variety of domains and has recently become closely associated with data, text and web mining. One of the most active ...
Cristian Klen dos Santos, Alexandre Evsukoff, Beat...
— Web usage mining exploits data mining techniques to discover valuable information from navigation behavior of World Wide Web (WWW) users. The required information is captured b...
Murat Ali Bayir, Ismail Hakki Toroslu, Ahmet Cosar