Project Overview

PTmtest: Bayesian Nonparametric Multiple Testing Software

Faculty Sponsor

Will Cipolli (wcipolli@colgate.edu)

Department(s)

Mathematics

Abstract

Multiple testing, or multiplicity, problems often require testing many means under the assumption that few null hypotheses will be rejected, as motivated by the need to analyze DNA microarray data. This type of problem can be thought of as the usual t-test from introductory statistics performed hundreds, thousands, or possibly millions of times. The goal is to keep the combined rate of false discoveries and non-discoveries as small as possible, which is similar in spirit to Type I and Type II error in introductory statistics. A discrete approximation to a Polya tree prior, centered at the usual Gaussian distribution and enjoying fast, conjugate updating, shows promising false discovery rates and estimation of key values in the mixture model at very reasonable computational speed. This new technique and its advantages have been demonstrated through extensive simulation and data analysis, leading to publication. Because the approach is nonparametric, it relaxes many of the assumptions of the multiple testing setting, and this flexibility makes it desirable.

An increasing trend among statisticians is to provide R packages, perhaps the most widely available venue for disseminating new statistical methods, for fitting complex methodology. These packages allow others familiar with the R computing environment to implement methods that are otherwise not readily available, and hence allow the routine use of new methodology. Many scientists use specialized software or Excel to implement existing parametric statistical methods regardless of whether those parametric assumptions hold. During this project, students will help build an R package that carries out the already-developed Bayesian nonparametric testing so that scientists can use it in microarray data analysis and perhaps in other settings. R is slow at loops and some other tasks, so some parts will need to be written in C++, for which R provides an interface.
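To make the multiplicity problem concrete, the following is a minimal sketch of the "same test repeated thousands of times" setting described above. It is not the Polya tree method and not part of the PTmtest package. It assumes simulated Gaussian data with known unit variance, so a two-sided z-test stands in for the t-test, and it rejects at an uncorrected 5% level. The function name `run_many_tests` and all of its parameters are hypothetical, chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def run_many_tests(n_tests=2000, n_obs=10, frac_nonnull=0.1, effect=1.0,
                   alpha=0.05, seed=1):
    """Simulate n_tests two-sided z-tests of H0: mean = 0 and tally the outcomes.

    Assumptions (illustrative only): unit-variance Gaussian data, a fraction
    frac_nonnull of tests with true mean `effect`, no multiplicity correction.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, used for the p-value
    counts = {"true_disc": 0, "false_disc": 0, "missed": 0, "true_null_kept": 0}

    for _ in range(n_tests):
        # Decide whether this test has a real effect, then draw its sample.
        is_nonnull = rng.random() < frac_nonnull
        mu = effect if is_nonnull else 0.0
        sample = [rng.gauss(mu, 1.0) for _ in range(n_obs)]

        # z statistic for a sample mean with known standard deviation 1.
        z = (sum(sample) / n_obs) * math.sqrt(n_obs)
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        rejected = p_value < alpha

        if rejected and is_nonnull:
            counts["true_disc"] += 1
        elif rejected:
            counts["false_disc"] += 1      # false discovery
        elif is_nonnull:
            counts["missed"] += 1          # non-discovery
        else:
            counts["true_null_kept"] += 1

    return counts


if __name__ == "__main__":
    result = run_many_tests()
    print(result)
    print("false discoveries + non-discoveries:",
          result["false_disc"] + result["missed"])
```

Running the script prints the four tallies and the combined count of false discoveries and non-discoveries, which is the quantity the abstract says the method aims to keep small.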

Student Qualifications

1) MA105 or AP Statistics in high school
2) Programming skills from COSC 102 (Introduction to Computing II)
   - Experience with object-oriented programming languages

Number of Student Researchers

2 students

Project Length

10 weeks


Applications open on 01/15/2017 and close on 02/07/2017


If you have questions, please contact Karyn Belanger (kgbelanger@colgate.edu).