Seminar: Labeling Large Scale Social Media Data using Budget-driven One-class SVM classification
Co-supervisors: Dr. Jian Tang & Dr. Minglun Gong
Department of Computer Science
Tuesday, June 30, 2015, 1:00 pm, Room EN 2022
Social media classification problems have drawn increasing attention over the past few years. With the rapid development of the Internet and the popularity of computers, social networks (social media platforms) now hold an enormous amount of information. These data are generally large scale and often corrupted by noise, and noise in the training set strongly degrades the performance of supervised learning (classification) techniques. My thesis presents a budget-driven One-class SVM approach that is suitable for classifying large-scale social media data.
Our approach is based on an existing online One-class SVM learning algorithm, referred to as STOCS (Self-Tuning One-Class SVM). To justify this choice, we first analyze the noise resilience of STOCS using synthetic data. Next, to handle big data classification of social media data, we introduce several budget-driven features that allow the algorithm to be trained within limited time and limited memory. Compared with two state-of-the-art approaches, Lib-Linear and kNN, our approach is shown to be competitive while requiring less memory and time.
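The abstract does not describe how STOCS or the budget-driven features work internally. Purely as an illustration of the general idea, the sketch below is a generic budgeted online one-class SVM trained by stochastic gradient descent (in the style of NORMA-type online kernel learning), with a hard cap on the number of stored support vectors. It is not the STOCS algorithm: the class name `BudgetedOnlineOCSVM`, the parameters `nu`, `eta`, `lam`, `gamma`, `budget`, and the "drop the smallest-weight support vector" rule are all hypothetical choices made for this example.

```python
import math
import random
from typing import List, Sequence, Tuple


def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class BudgetedOnlineOCSVM:
    """Hypothetical budgeted online one-class SVM (SGD sketch, NOT STOCS).

    Decision function: f(x) = sum_i alpha_i * k(sv_i, x).
    A point is treated as "normal" when f(x) >= rho.
    """

    def __init__(self, nu=0.1, eta=0.1, lam=0.01, gamma=1.0, budget=50):
        self.nu = nu          # target fraction of margin violations (outliers)
        self.eta = eta        # learning rate
        self.lam = lam        # regularisation strength
        self.gamma = gamma    # RBF kernel width
        self.budget = budget  # hypothetical cap on stored support vectors (memory bound)
        self.svs: List[Tuple[float, ...]] = []
        self.alphas: List[float] = []
        self.rho = 0.0

    def score(self, x: Sequence[float]) -> float:
        return sum(a * rbf(s, x, self.gamma) for s, a in zip(self.svs, self.alphas))

    def predict(self, x: Sequence[float]) -> bool:
        return self.score(x) >= self.rho

    def partial_fit(self, x: Sequence[float]) -> None:
        f = self.score(x)
        # Gradient step on the regulariser lam/2 * ||f||^2 shrinks all coefficients.
        decay = 1.0 - self.eta * self.lam
        self.alphas = [a * decay for a in self.alphas]
        if f < self.rho:
            # Margin violation: store x as a support vector and lower rho.
            self.svs.append(tuple(x))
            self.alphas.append(self.eta)
            self.rho -= self.eta * (1.0 - self.nu)
            # Budget maintenance: evict the support vector with the smallest weight.
            if len(self.svs) > self.budget:
                i = min(range(len(self.alphas)), key=lambda j: self.alphas[j])
                del self.svs[i]
                del self.alphas[i]
        else:
            # Point already inside the region: raise rho slightly.
            self.rho += self.eta * self.nu


# Usage: stream 500 points from a 2-D Gaussian cluster through the model.
random.seed(0)
model = BudgetedOnlineOCSVM(nu=0.1, eta=0.1, lam=0.01, gamma=0.5, budget=50)
for _ in range(500):
    model.partial_fit((random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)))
print(len(model.svs), round(model.rho, 3))
print(model.predict((0.0, 0.0)), model.predict((50.0, 50.0)))
```

Each example is processed once and then discarded, and the model never stores more than `budget` support vectors, so both training time per example and memory stay bounded regardless of stream length. The `rho` updates push the long-run fraction of margin violations toward `nu`, which is what makes a one-class model tolerant of a small proportion of noisy examples.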