Sciweavers

GECCO
2008
Springer
Informative sampling for large unbalanced data sets
Selective sampling is a form of active learning which can reduce the cost of training by only drawing informative data points into the training set. This selected training set is ...
Zhenyu Lu, Anand I. Rughani, Bruce I. Tranmer, Jos...
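The snippet above describes selective sampling only in general terms: from an unlabeled pool, draw just the "informative" points into the training set. Below is a minimal sketch of one common selection criterion, uncertainty sampling. It is not necessarily the criterion used in this paper. The sketch assumes a binary classifier that already outputs a positive-class probability for each pool point. The function name `select_informative` is hypothetical.

```python
from typing import Sequence


def select_informative(probs: Sequence[float], k: int) -> list[int]:
    """Return indices of the k pool points the current model is least sure about.

    Hypothetical helper (uncertainty sampling, an assumed criterion):
    probs[i] is the model's estimated probability that unlabeled point i
    is positive. Uncertainty is highest when that probability is near 0.5.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Distance from the decision boundary: 0.0 means maximally uncertain.
    margin = [abs(p - 0.5) for p in probs]
    # Rank pool indices by margin; ties are broken by index for determinism.
    ranked = sorted(range(len(probs)), key=lambda i: (margin[i], i))
    return ranked[:k]


if __name__ == "__main__":
    pool_probs = [0.95, 0.52, 0.10, 0.47, 0.70]
    # Points 1 and 3 lie closest to 0.5, so they would be sent for labeling.
    print(select_informative(pool_probs, 2))  # [1, 3]
```

In a full active-learning loop, the selected points would be labeled and added to the training set, and the model retrained before the next round. Training cost then grows with the number of selected points rather than with the size of the pool.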
CGA
1999
Visualizing Large Telecommunication Data Sets
displays to abstract network data and let users interact with it. We have implemented a full-scale Swift-3D prototype, which generated the examples we present here. Swift-3D We developed Sw...
Eleftherios Koutsofios, Stephen C. North, Daniel A...
SIGMOD
2010
ACM
Data warehousing and analytics infrastructure at facebook
Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook, both engineering and non-engineering. Apart from ad hoc analysis of data and ...
Ashish Thusoo, Zheng Shao, Suresh Anthony, Dhruba ...
ICML
2003
IEEE
Learning on the Test Data: Leveraging Unseen Features
This paper addresses the problem of classification in situations where the data distribution is not homogeneous: Data instances might come from different locations or times, and t...
Benjamin Taskar, Ming Fai Wong, Daphne Koller
KDD
2005
ACM
Email data cleaning
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang