COMP 3550: Introduction to Bioinformatics

This course is an elective for the Smart Systems (SS) Stream.

This course is designed as an interdisciplinary introduction to bioinformatics for both Computer Science and Biology students, and as a bridge between the two disciplines. It is intended for a mixed audience of students with different backgrounds (e.g., computer science and biology). The course will focus on the fundamental concepts, ideas and related biological applications of existing bioinformatics tools. Its purpose is to give students hands-on experience with the major computational approaches applied to a wide variety of bioinformatics problems.

Lab: In addition to classes, this course has one structured laboratory session per week.

Prerequisites: Biology 1001; one of COMP 1001, COMP 1002, or COMP 1510; and 6 credit hours in Computer Science or Biology courses at the 2000 level or above, excluding Biology 2040, 2041, and 2120; or permission of the course instructor.

Availability: This course is usually offered once per year, in Fall or Winter.

Course Objectives

Bioinformatics deals with the development and application of computational methods to address biological problems. The course will focus on the fundamental concepts, ideas and related biological applications of existing bioinformatics tools. This course will provide hands-on experience in applying bioinformatics software tools and online databases to analyze experimental biological data, and it will also introduce scripting language tools typically used to automate some biological data analysis tasks.

Biology students will appreciate the impact of these approaches on addressing biological questions and will gain insight into their strengths and limitations. Computer Science students will see the practical use of concepts taught in other courses. Most importantly, they will see the challenges posed by biological questions and the need for robust algorithms that can handle the very large, noisy datasets typical of biology. Both computer scientists and biologists will recognize the wide diversity of questions addressed by bioinformatics applications. Many industry and research jobs now require cross-disciplinary collaboration. Through this course, students will begin to understand the interdisciplinary nature of bioinformatics and to appreciate the contributions of people outside their own field of study.

Representative Workload
  • Assignments and Projects 25%
  • Lab Work and Quizzes 20%
  • In-class Exam 25%
  • Final Exam 30%
Representative Course Outline
  • Introduction
    • What is Bioinformatics?
    • Why is Bioinformatics required?
    • Importance of interdisciplinary collaboration
  • Sequences
    • Why compare sequences?
    • Sequence similarity
    • Where to look for information about a sequence
    • Sequence alignment: Pairwise and multiple
  • Genomics
    • How are genomes sequenced?
    • How are genomes annotated?
    • Genomic variation
  • Gene expression
    • How is gene expression measured?
    • Pre-processing the data: denoising and normalization
    • Differential analysis
  • Interpreting a list of genes
    • Gene functional annotation - Gene Ontology (GO)
    • Finding over-represented gene functions in gene lists
    • Other sources of annotations
    • Gene function prediction
  • Proteomics
    • Protein Interaction Networks
    • Protein Domains
    • How are proteins measured and identified?
  • Transcriptomics
    • Motif finding
    • Determining binding preferences
    • Inferring regulatory networks
  • Metabolomics
    • Detection and identification of metabolites
    • Human metabolome project
Labs

Students will be expected to attend a weekly lab session, and to submit a lab report or to answer a lab quiz at the end of each lab.

  • Script programming and using bioinformatics libraries (BioPerl)
  • Sequences
    • Using BLAST, BLAT
    • Using alignment tools (such as ProbCons, M-Coffee)
  • Working with sequenced genomes
    • Ensembl, BioMart, UCSC Genome Browser
    • Linking your own data to a genome browser
  • Analysis of gene expression data using existing tools (such as Babelomics, GeneXPress, GenePattern)
  • Annotating a list of genes with functional annotations
  • Using over-representation or enrichment analysis tools (such as GSEA, DAVID, GenMAPP, GOMiner)
  • Using gene function prediction systems (such as GeneMANIA, FuncBase, NBrowse, STRING, FunCoup)
  • Using motif finding tools on a set of sequences (such as MEME, AlignACE)
  • Using regulatory network prediction systems (such as COALESCE, Allegro)
Notes
  • Credit cannot be obtained for both Computer Science 3550 and Biology 3951.

Page last updated May 24th 2021