Hierarchical categorization of documents is a task receiving growing interest due to the widespread proliferation of topic hierarchies for text documents. The worst problem of hie...
News articles about the same event published over time have properties that challenge NLP and IR applications. A cluster of such texts typically exhibits instances of paraphrase a...
Several algorithms based on link analysis have been developed to measure the importance of nodes on a graph such as pages on the World Wide Web. PageRank and HITS are the most pop...
We propose a new technique for clustering of text documents that relies on a biclustering structure constructed on terms and documents. Our approach makes use of a greedy algorith...
Most clustering algorithms are partitional in nature, assigning each data point to exactly one cluster. However, several real world datasets have inherently overlapping clusters i...