Sciweavers

2190 search results - page 342 / 438
» Unweaving a web of documents
DRR
2008
15 years 7 months ago
Hybrid approach combining contextual and statistical information for identifying MEDLINE citation terms
There is a strong demand for developing automated tools for extracting pertinent information from the biomedical literature that is a rich, complex, and dramatically growing resou...
In-Cheol Kim, Daniel X. Le, George R. Thoma
DGO
2006
15 years 7 months ago
Next steps in near-duplicate detection for eRulemaking
Large volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...
Hui Yang, Jamie Callan, Stuart W. Shulman
WWW
2005
ACM
16 years 7 months ago
A search engine for natural language applications
Many modern natural language-processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search e...
Michael J. Cafarella, Oren Etzioni
EDBT
2002
ACM
16 years 6 months ago
Cut-and-Pick Transactions for Proxy Log Mining
Web logs collected by proxy servers, referred to as proxy logs or proxy traces, contain information about Web document accesses by many users against many Web sites. This "man...
Wenwu Lou, Guimei Liu, Hongjun Lu, Qiang Yang
WWW
2010
ACM
16 years 1 month ago
Not so creepy crawler: easy crawler generation with standard XML queries
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...