A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
This paper investigates a novel approach to unsupervised morphology induction relying on community detection in networks. In a first step, morphological transformation rules are a...
Knowledge discovery is the most desirable end product of an enterprise information system. Researches from different areas recognize that a new generation of intelligent tools for...
Statistical MT has made great progress in the last few years, but current translation models are weak on re-ordering and target language fluency. Syntactic approaches seek to reme...
Michel Galley, Jonathan Graehl, Kevin Knight, Dani...
While conventional malware detection approaches increasingly fail, modern heuristic strategies often perform dynamically, which is not possible in many applications due to related ...