Seminar: Extracting topics from books: distinguishing books between/within genres

Kunal Dhir
M.Sc. Candidate
Supervisor: Lourdes Peña-Castillo

Extracting topics from books: distinguishing books between/within genres

Department of Computer Science
Thursday, November 29, 2018, 9:30 a.m., Room EN-2022


Abstract

Text analysis is the computational analysis of unstructured documents to extract relevant information. Topic modeling is a text analysis technique for extracting the latent themes, or topics, in such documents. Analysis of large texts, such as books, can benefit significantly from the extraction of broad themes. We use topic modeling to analyze classic books from different genres, including classic literature and philosophy. To implement topic modeling, we use Latent Dirichlet Allocation (LDA), an unsupervised probabilistic model. We assess how the model's performance varies with the number of topics it is asked to find.
