Seminar: A Comparative Study Of Feature Clustering On Gene Selection

Hanieh Marvi Korasani
M.Sc. Thesis Proposal
Supervisor: Dr. Hamid Usefi

Department of Computer Science
Friday, November 23, 2018, 2:40 p.m., Room EN 2022


Abstract

Cancer datasets typically contain a small number of samples, while each sample has a large number of genes as features. Selecting the relevant genes that are involved in a cancer is a challenging task. Feature selection, one of the methods of dimensionality reduction, is used to select a subset of features that leads to better classifier performance. However, most feature selection algorithms suffer from high computational complexity. To tackle this issue, in this project we intend to incorporate feature clustering into gene selection. First, genes are clustered into gene groups with a clustering algorithm so that genes belonging to the same cluster have similar expression profiles. Then, from each cluster, one or more representative genes are selected. After representative genes have been selected from all clusters, feature selection is applied to this set of representatives. This method decreases the computational complexity and the redundancy among genes, which leads to better classification accuracy. In this project, we conduct a comprehensive study by examining widely used clustering, feature selection, and classification methods and applying them to different cancer datasets, including Colon, Leukemia, and Lymphoma. The results of this study will show that, by incorporating feature clustering into gene selection, computational complexity is reduced and better classification accuracy is achieved.
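
The three steps described above (cluster genes, pick representatives, apply feature selection to the representatives) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the proposal: the greedy correlation-based clustering, the 0.9 threshold, the use of absolute correlation with the class label as a relevance score, the function names, and the toy expression values. The study itself compares widely used clustering and feature selection methods, which this sketch does not attempt to reproduce.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_genes(profiles, threshold=0.9):
    """Step 1 (assumed method): greedy clustering by expression similarity.

    A gene joins the first cluster whose first member (the "leader")
    correlates with it at or above `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of gene indices
    for g, profile in enumerate(profiles):
        for cluster in clusters:
            if pearson(profiles[cluster[0]], profile) >= threshold:
                cluster.append(g)
                break
        else:
            clusters.append([g])
    return clusters


def relevance(profile, labels):
    """Assumed relevance score: absolute correlation with the class label."""
    return abs(pearson(profile, labels))


def select_genes(profiles, labels, threshold=0.9, k=2):
    clusters = cluster_genes(profiles, threshold)
    # Step 2: one representative per cluster (its most relevant member).
    reps = [max(c, key=lambda g: relevance(profiles[g], labels)) for c in clusters]
    # Step 3: feature selection on the representatives only (keep the top k).
    ranked = sorted(reps, key=lambda g: relevance(profiles[g], labels), reverse=True)
    return clusters, ranked[:k]


# Toy data (hypothetical): rows are genes, columns are samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # gene 0
    [1.0, 1.5, 3.5, 4.0],  # gene 1, similar profile to gene 0
    [5.0, 5.1, 4.9, 5.0],  # gene 2, nearly flat
    [4.0, 3.0, 2.0, 1.0],  # gene 3, opposite trend to gene 0
]
labels = [0, 0, 1, 1]  # two classes, e.g. normal vs. tumour

clusters, selected = select_genes(profiles, labels, threshold=0.9, k=2)
print("clusters:", clusters)
print("selected genes:", selected)
```

In this sketch the final scoring step runs over one gene per cluster rather than over every gene, which is where the reduction in computational cost and redundancy described above would come from. A classifier would then be trained on the selected genes only.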