The Institute assembles a world-class training initiative in statistical genetics in the Middle East. The program brings together a select group of statistical geneticists, all leaders in the development of methodology in the field with several years of experience in teaching these topics and providing hands-on experience to trainees. The Institute is an extension of the annual Summer Institute in Statistical Genetics (SISG) organized at the University of Washington. For more details on SISG, visit http://www.biostat.washington.edu/suminst/sisg
The program consists of a series of two-and-a-half day workshops designed to introduce students and researchers to modern methods of statistical analysis. The modular nature of the Institute enables participants to select from the following program modules:
Introduction to R
Population Genetic Data Analysis
Gene Expression Profiling
Association Mapping: GWAS and Sequencing Data
Pathway and Network Analysis for Omics Data
The WISG Program
Modules and Instructors
The WISG consists of a series of two-and-a-half day workshops designed to introduce students and researchers to modern methods of statistical analysis and to the challenges posed by modern genetic data. Statistical and genetic prerequisites are minimal, and the modular nature of the WISG enables participants to design a program best suited to their backgrounds and interests. Participants are encouraged to take two modules, one in each half of the week.
Ken Rice, Associate Professor, University of Washington
Thomas Lumley, Professor and Chair of Biostatistics, University of Auckland
This module introduces the R statistical environment, assuming no prior knowledge. It provides a foundation for the use of R for computation in later modules, including discussion of the content of Bioconductor and other open source software resources.
In addition to discussing basic data management tasks in R, such as reading in data and producing summaries through R scripts, we will also introduce R’s graphics functions, its powerful package system, and simple methods of looping.
Examples and exercises will use data drawn from biological and medical applications, including infectious diseases and genetics. Hands-on use of R is a major component of this module; users require a laptop and will use it in all sessions.
Jerome Goudet, Associate Professor, Lausanne University
Bruce Weir, Professor and Chair of Biostatistics, University of Washington; Director, Summer Institute in Statistical Genetics
This module deals with statistical methods for analyzing the distribution of allele frequencies in populations and is suitable for biologists studying all types of organisms. It offers a unified treatment for the analysis of discrete genetic data, starting with estimates and sample variances of allele frequencies to illustrate genetic vs statistical sampling and Bayesian approaches.
We take a detailed look at Hardy-Weinberg and linkage disequilibrium, including the use of exact tests with mid-p-values and a new look at X-chromosome Hardy-Weinberg testing. A novel characterization of population structure with F-statistics, based on allelic matching within and between populations, will be presented, with individual relationship estimation offered as a special case.
Analyses are illustrated with applications to forensic science and association mapping, with particular reference to rare variants. Concepts will be illustrated with R exercises.
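As a rough illustration of the kind of calculation this module starts from, the sketch below estimates an allele frequency and its sampling variance from genotype counts, then tests for Hardy-Weinberg equilibrium. It is written in Python for illustration only (the module itself uses R exercises), the function names and genotype counts are hypothetical, and it uses the simple large-sample chi-square test rather than the exact test with mid-p-values covered in the module.

```python
import math


def allele_frequency(n_AA, n_Aa, n_aa):
    """Estimate the frequency of allele A and its sampling variance.

    The variance formula p(1 - p) / (2n) assumes binomial sampling of
    alleles, i.e. genotypes in Hardy-Weinberg proportions.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    var_p = p * (1 - p) / (2 * n)
    return p, var_p


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Large-sample chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p, _ = allele_frequency(n_AA, n_Aa, n_aa)
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical genotype counts for a sample of 100 individuals
p_hat, var_hat = allele_frequency(30, 40, 30)
stat, p_value = hwe_chi_square(30, 40, 30)
print(f"p = {p_hat:.3f}, variance = {var_hat:.5f}")
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")
```

With small samples or rare alleles the chi-square approximation is unreliable, which is one reason exact tests are preferred in practice.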
The gene expression module will cover the theory and application of transcriptomics based on RNA-Seq methodologies. The focus of the module will be on the statistical basis of hypothesis testing including fundamentals of ANOVA and significance thresholds, but also covering the central role of normalization strategies as they impact inference of differential expression.
Students will have the opportunity to work through examples using open source modules in R, processing read counts and performing hypothesis tests. We then discuss options for downstream processing by clustering and module detection/comparison; extensions to methylation profiling, proteomics, and metabolomics; eQTL analysis including fine mapping of regulatory variation; and finally the relationship between transcriptomic and phenotypic variation.
This module deals primarily with upstream data processing methods that lead to the delineation of networks and pathways that are then considered in Module 6.
This course is concerned with multivariate statistical analysis of microbiome data. We will briefly cover foundational concepts in microbial ecology, molecular biology, bioinformatics, and DNA sequencing. The main focus of the course will be on developing an understanding of multivariate analysis of microbiome data.
Practical skills to be developed in this course include managing high-dimensional and structured data in metagenomics, visualization and representation of high-dimensional data, normalization, filtering, and mixture-model noise modeling of count data, as well as clustering and predictive model building.
Bruce Weir, Professor and Chair of Biostatistics, University of Washington; Director, Summer Institute in Statistical Genetics
Elizabeth Blue, Assistant Professor, University of Washington
This module assumes some familiarity with probability and statistical inference, as well as some familiarity with R such as that obtained in Module 1.
Topics covered include: basic probability and Mendelian genetics; Hardy-Weinberg equilibrium; inbreeding coefficients; population structure; recombination and genetic linkage; linkage disequilibrium; measures of relatedness; haplotype frequency estimation with unphased genotypes; genetic association testing; association testing in the presence of population structure and/or relatedness.
Many methods are illustrated with implementation in R, and the module is most useful for participants with basic familiarity with R. Other public domain software will be introduced, such as HAPLOVIEW, LOCUSZOOM, and PLINK.
Networks represent the interactions among components of biological systems. In the context of high dimensional omics data, relevant networks include gene regulatory networks, protein-protein interaction networks, and metabolic networks. These networks provide a window into biological systems as well as complex diseases, and can be used to understand how biological functions are implemented and how homeostasis is maintained. On the other hand, pathway-based analyses can be used to leverage biological knowledge available from literature, gene ontologies or previous experiments in order to identify the pathways associated with disease or an outcome of interest.
In this module, various statistical learning methods for reconstruction and analysis of networks from omics data are discussed, as well as methods of pathway enrichment analysis. Particular attention will be paid to omics datasets with a large number of variables, e.g., genes, and a small number of samples, e.g., patients. The techniques discussed will be demonstrated in R.
This course assumes a previous course in regression and familiarity with R or other command line programming languages. Users require a laptop and will use it in all sessions.
Convened by Youssef Idaghdour, Assistant Professor of Biology, NYU Abu Dhabi
Organized by Greg Gibson, Professor, Georgia Institute of Technology
Bruce Weir, Professor and Chair of Biostatistics, Director, Summer Institute in Statistical Genetics, University of Washington
The Winter Institute is an extension of the annual Summer Institute in Statistical Genetics (SISG) organized at the University of Washington, which has been offered annually in the US since 1996, and outside the US since 2001.