We investigate the impact of input data scale in corpus-based learning using a study style of Zipf's law. In our research, Chinese word segmentation is chosen as the study ca...
Different features have different relevance to a particular learning problem. Some features are less relevant; while some very important. Instead of selecting the most relevant fe...
Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "qu...
There is an increasing interest in techniques that support measurement and analysis of fielded software systems. One of the main goals of these techniques is to better understand ...
Murali Haran, Alan F. Karr, Alessandro Orso, Adam ...
When building an application that requires object class recognition, having enough data to learn from is critical for good performance, and can easily determine the success or fai...