Seminar

Wenbin Zhang
M.Sc. Candidate
Supervisor: Dr. Jian Tang

A novel method combining statistical approach and machine learning method to predict patient survival from gene expression profiles

Friday, December 18, 2015, 2:00 p.m., Room EN 2022
Department of Computer Science


Survival analysis with high-dimensional data deals with predicting the survival probabilities of patients from their gene expression data. A crucial task for accurate survival analysis in this context is to select the features most highly correlated with the patients' survival times. Because class labels are not observed, existing supervised feature selection methods cannot be applied directly. Whereas classical statistical methods address this issue with the Cox score, we propose to discretize the patients' survival times into a suitable number of subgroups, chosen using the silhouette cluster-validity index. To cope with censored patients, we use k-nearest neighbors (k-NN) based on clinical parameters that are truly associated with survival time. These parameters are selected using penalized logistic regression and penalized Cox proportional hazards models with the EM algorithm, and are then used to estimate the censored survival times. Next, the estimated class labels are combined with feature selection to identify a list of genes correlated with survival time, and classifiers are applied to this subset of genes to determine which subtype a future patient belongs to. In this way, we expect the resulting subgroups to be not only biologically meaningful but also distinct in terms of survival. The effectiveness of the proposed method is demonstrated through comparisons with classical statistical methods on an existing real dataset and a simulated dataset.
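The abstract discretizes survival times into a number of subgroups chosen by silhouette validity. As a minimal sketch of that step, assuming the groups are formed by one-dimensional k-means on survival times (with censored times already estimated) and that candidate group counts run from 2 to 5, the silhouette width can be computed for each candidate and the best one kept. The clustering algorithm, the candidate range, and the function names `kmeans_1d`, `mean_silhouette`, and `choose_num_groups` are illustrative assumptions, not the author's implementation.

```python
from statistics import mean


def kmeans_1d(times, k, n_iter=100):
    """Lloyd's k-means on one-dimensional survival times; returns a label per patient."""
    s = sorted(times)
    # Deterministic quantile initialisation so results are reproducible.
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assign each time to its nearest center (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: abs(t - centers[j])) for t in times]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = [t for t, lab in zip(times, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels


def mean_silhouette(times, labels):
    """Average silhouette width of a 1-D clustering (singleton clusters score 0)."""
    clusters = {}
    for t, lab in zip(times, labels):
        clusters.setdefault(lab, []).append(t)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    scores = []
    for t, lab in zip(times, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(abs(t - u) for u in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(mean(abs(t - u) for u in other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def choose_num_groups(times, candidates=range(2, 6)):
    """Return (k, score) for the candidate k with the largest mean silhouette."""
    best = (None, float("-inf"))
    for k in candidates:
        if k >= len(times):
            break
        score = mean_silhouette(times, kmeans_1d(times, k))
        if score > best[1]:
            best = (k, score)
    return best
```

For example, on survival times `[2, 3, 4, 20, 21, 22, 50, 51, 52]`, `choose_num_groups` picks k = 3, matching the three visibly separated groups. Any other clustering of the survival times could be scored the same way.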

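The abstract estimates censored survival times with k-nearest neighbors over selected clinical parameters. A minimal sketch, assuming the clinical parameters have already been selected and that a censored patient's time is replaced by the mean death time of its k nearest uncensored neighbors who died after that patient's censoring time, is shown below. The "died after the censoring time" restriction, Euclidean distance, and the name `impute_censored_times` are hypothetical choices for illustration only.

```python
import math


def impute_censored_times(features, times, events, k=3):
    """Replace each censored time with the mean event time of its k nearest neighbors.

    features: per-patient vectors of the (already selected) clinical parameters
    times:    observed time, i.e. death time if events[i] == 1, else censoring time
    events:   1 = death observed, 0 = censored
    """
    imputed = list(times)
    for i, (x, c, e) in enumerate(zip(features, times, events)):
        if e == 1:
            continue  # uncensored: keep the observed survival time
        # Hypothetical rule: only borrow from uncensored patients who died after
        # this patient's censoring time, since this patient is known to survive past c.
        candidates = sorted(
            (math.dist(x, features[j]), times[j])
            for j in range(len(times))
            if events[j] == 1 and times[j] > c
        )
        if candidates:
            nearest = candidates[:k]
            imputed[i] = sum(t for _, t in nearest) / len(nearest)
        # If no eligible neighbor exists, the censoring time is kept as a lower bound.
    return imputed
```

Restricting neighbors to later deaths keeps every estimate at or above the censoring time, which is the only thing known for sure about a censored patient.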

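Once every patient has an estimated survival-group label, ordinary supervised tools apply. The abstract does not name the feature selection method or the classifiers, so the sketch below swaps in two simple stand-ins: ranking genes by a one-way ANOVA F statistic across the groups, then assigning a new patient to the group with the nearest mean expression profile. `f_score`, `select_genes`, `fit_centroids`, and `predict_group` are hypothetical names, and `X` is assumed to be a patients-by-genes list of expression values.

```python
import math
from statistics import mean


def f_score(values, labels):
    """One-way ANOVA F statistic of one gene's expression across survival groups."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    means = {g: mean(vs) for g, vs in groups.items()}
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    if ss_within == 0:
        return math.inf
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_genes(X, labels, m):
    """Indices of the m genes with the largest F statistic (X: patients x genes)."""
    scores = [f_score([row[j] for row in X], labels) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]


def fit_centroids(X, labels, genes):
    """Mean expression profile of each survival group on the selected genes."""
    rows_by_group = {}
    for row, g in zip(X, labels):
        rows_by_group.setdefault(g, []).append([row[j] for j in genes])
    return {g: [mean(col) for col in zip(*rows)] for g, rows in rows_by_group.items()}


def predict_group(centroids, row, genes):
    """Assign a new patient to the group with the nearest centroid."""
    x = [row[j] for j in genes]
    return min(centroids, key=lambda g: math.dist(x, centroids[g]))
```

The F statistic is a common filter for multi-class labels. Any classifier could replace the nearest-centroid rule once the gene subset is fixed.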

Department of Computer Science

230 Elizabeth Ave, St. John's, NL, CANADA, A1B 3X9

Postal Address: P.O. Box 4200, St. John's, NL, CANADA, A1C 5S7

Tel: (709) 864-8000