We analyze the persistence of information on the web, looking at the percentage of invalid URLs contained in academic articles within the CiteSeer (ResearchIndex) database. The nu...
Steve Lawrence, Frans Coetzee, Gary William Flake,...
Completely handling SQL injection consists of two activities: properly protecting the system from malicious input, and preventing any resultant error messages caused by SQL injecti...
: The World-Wide-Web contains data that cannot be constrained by a schema. Another source for such data is heterogenous corporate systems which are integrated in order to get bette...
Recently, semantic text portion (STP) is getting popular in the field of Web mining. STP is a text portion in the original page which is semantically related to the anchor pointing...
Abstract. We present a novel, simple, unsupervised method for characterizing the semantic relations that hold between nouns in noun-noun compounds. The main idea is to discover pre...