As the largest online marketplace in the world, eBay has a huge inventory where there are plenty of great rare items with potentially large, even rapturous buyers. These items are...
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Social media such as Web forum often have dense interactions between user and content where network models are often appropriate for analysis. Joint non-negative matrix factorizat...
Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with ot...
Yang Song, Jian Huang 0002, Isaac G. Councill, Jia...
Documents in the Web are often organized using category trees by information providers (e.g. CNN, BBC) or search engines (e.g. Google, Yahoo!). Such category trees are commonly kn...