Sequence Analysis and Omics

Overview

  • Credit value: 30 credits at Level 7
  • Convenor: Dr Irene Nobeli
  • Assessment: open-book online tests (60%) and coursework (40%)

Module description

The analysis of biological sequences is at the core of much of bioinformatics. Sequence analysis is also central to processing the large-scale datasets of genes, transcripts and proteins that recent advances in experimental methods produce from biological samples, and to linking these data to biological processes, phenotypic differences and disease.

In this module we cover classical methods of biological sequence analysis and their applications to the problems of modern biology. We also discuss different aspects of molecular evolution, from sequence to structure and function. Through a series of practicals, you are introduced to major online bioinformatics resources (e.g. Ensembl) and how they can be queried to answer questions relating to biological sequences and the data stored on these sequences. Additional practicals reinforce your programming skills through short coding challenges in the context of sequence analysis.

In the second half of the module, you are introduced to experimental methods of surveying molecules in biological samples and provided with basic training in the skills required to analyse the large-scale data generated by these methods. Currently, approximately one quarter of the module is dedicated to lectures covering the applications and challenges of next-generation sequencing (NGS). The corresponding practicals reinforce your skills in R and introduce you to several popular Bioconductor packages, as well as Unix-based software used to process NGS data. You will also be introduced to proteomics and immunoinformatics.

Indicative syllabus

  • Introduction to the module; genome organisation and function
  • Measuring sequence similarity
  • Optimal vs heuristic methods for pairwise sequence alignment
  • Multiple sequence alignment
  • Profiles, position-specific score matrices and motif discovery
  • Hidden Markov Models in sequence analysis
  • Models for analysing RNA sequences
  • Sequence alignment in the next-generation sequencing era
  • Comparing and classifying protein domain structures
  • Evolution of protein function
  • Introduction to mass spectrometry and its application to proteomics
  • Introduction to high-throughput sequencing technologies (NGS) and computational analysis of relevant data:
    • Genomics
    • Transcriptomics
    • Immunoinformatics

Learning objectives

By the end of this module, you will be able to:

  • describe the basics of genome organisation (with emphasis on the human genome)
  • understand the differences between genes, transcripts and proteins and be familiar with the biological mechanisms linking them
  • define and differentiate between the concepts of homology and sequence similarity
  • describe the general principles of algorithms used to align sequences (both optimal and heuristic approaches)
  • understand how dynamic programming can be used in the context of sequence analysis and describe basic algorithms used in dynamic programming
  • build a position weight matrix from an alignment of biological sequences
  • describe how Hidden Markov Models (HMMs) are constructed and how they are used to represent protein families
  • understand the basics of protein structure classification schemes and how we use such schemes to annotate new genes and proteins
  • use a number of established genome browsers and bioinformatics servers to extract information on genes and proteins
  • write basic Python programs to carry out simple tasks in sequence analysis
  • describe the fundamentals and applications of selected high-throughput technologies ('omics')
  • outline the computational steps required to process and analyse data derived from omics technologies, including but not necessarily limited to proteomics, genomics and transcriptomics
  • distinguish between the various file formats common in omics applications and demonstrate an understanding of how they are used
  • apply bioinformatics pipelines for pre-processing and cleaning up raw NGS data, demonstrating competence in the application of relevant software and in critically analysing the outputs in different contexts
  • describe and use basic statistical methods applied in the context of analysing high-throughput data in the various omics fields.
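
As an illustration of the kind of short Python task these objectives refer to, the sketch below builds a position weight matrix from an ungapped alignment. It is not taken from the module materials. The function names (build_pwm, score_sequence), the uniform background frequencies, the pseudocount of 1 and the example sequences are all assumptions chosen for the illustration.

```python
import math
from collections import Counter


def build_pwm(sequences, alphabet="ACGT", pseudocount=1.0):
    """Return a list of {symbol: log-odds score} dicts, one per alignment column.

    Assumptions made for this sketch: the alignment is ungapped, all
    sequences have equal length, and the background distribution is uniform.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    background = 1.0 / len(alphabet)  # uniform background (assumption)
    total = len(sequences) + pseudocount * len(alphabet)

    pwm = []
    for i in range(length):
        # Count how often each symbol occurs in column i.
        counts = Counter(seq[i] for seq in sequences)
        column = {}
        for symbol in alphabet:
            # The pseudocount avoids taking the log of zero for unseen symbols.
            freq = (counts[symbol] + pseudocount) / total
            column[symbol] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score_sequence(pwm, seq):
    """Sum the per-column log-odds scores for a sequence of matching length."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix length")
    return sum(column[symbol] for column, symbol in zip(pwm, seq))


if __name__ == "__main__":
    # Hypothetical toy alignment, used only to exercise the functions.
    alignment = ["ACGT", "ACGA", "ACGT", "TCGT"]
    matrix = build_pwm(alignment)
    for candidate in ("ACGT", "TTTT"):
        print(candidate, round(score_sequence(matrix, candidate), 3))
```

A sequence that resembles the aligned sequences receives a higher total score than one that does not. In practice the background frequencies would usually be estimated from the genome or proteome of interest.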