Recent & Upcoming Events

Jun 30, 2016

CPCP Second Annual Retreat

A day-long retreat highlighting recent research in the Center, featuring talks, posters, and lunch.

Apr 21, 2016

CPCP Seminar: Mining Structures from Massive Bio-Text Data: A Data-Driven Approach by Dr. Jiawei Han

Jiawei Han from the BD2K KnowEng Center-UIUC discussed mining structures from massive bio-text data.

Nov 10, 2015

CPCP Seminar: Transforming Your Research with High-Throughput Computing by Lauren Michael

Lauren Michael from the CHTC discussed high-throughput computing approaches to Big Data.

Oct 15, 2015

Big Privacy: Policy Meets Data Science Symposium

A symposium on the legal, policy, and technical issues at the intersection of privacy and data science.

Jun 12, 2015

CPCP First Annual Retreat

Training Resources

CPCP Seminar: Transforming Your Research Through High Throughput Computing Seminar Video

Presented by Lauren Michael

Big Privacy Symposium: Introductory and Welcoming Remarks Symposium Video

Presented by David Page, PhD

Big Privacy Symposium: Big Data, Big Headaches: Cultivating Public Trust in an Age of Unconsented Access to Identifiable Data Symposium Video

Presented by Barbara J. Evans PhD, JD, LLM

Big Privacy Symposium: Does Publishing a Predictive Model for Precision Medicine Put Patient Privacy at Risk? Symposium Video

Presented by Matt Fredrikson, PhD

Big Privacy Symposium: Panel Discussion Symposium Video

Panel Members: Barbara Evans, Matt Fredrikson, Arvind Narayanan, Pilar Ossorio, Vitaly Shmatikov

Recent Publications

A hierarchical framework for state space matrix inference and clustering. Zuo C, Chen K, Hewitt K, Bresnick EH, Keles S. Annals of Applied Statistics, 2016

Anytime exploration for multi-armed bandits using confidence information. Jun K-S, Nowak R. Proceedings of the 33rd International Conference on Machine Learning, 2016

A multi-task graph-clustering approach for chromosome conformation capture data sets identifies conserved modules of chromosomal interactions. Siahpirani A, Ay F, Roy S. Genome Biology 17:114, 2016

A MAD-Bayes algorithm for state-space inference and clustering with application to querying large collections of ChIP-seq data sets. Zuo C, Chen K, Keles S. Proceedings of the 20th Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2016

Distance shrinkage and Euclidean embedding via regularized kernel estimation. Zhang L, Wahba G, Yuan M. Journal of the Royal Statistical Society B, doi:10.1111/rssb.12138, 2016

Recent Resources

GADGET software

GADGET is a web tool for finding and ranking genes and metabolites that are associated with a given query in the biomedical literature. It's like a version of PubMed that returns genes and metabolites instead of articles.

rvalues software

rvalues is an R package for computing "r-values" from various kinds of user input such as a list of effect size estimates and associated standard errors. Given a large collection of measurement units, the r-value, r, of a particular unit is a reported percentile that may be interpreted as the smallest percentile at which the unit should be placed in the top r-fraction of units.

atSNP software

atSNP (Affinity Test for regulatory SNP detection) is an R package for computing and testing large-scale motif-SNP interactions. It provides three main functions: (1) computing the binding affinity scores for both the reference and the SNP alleles based on position weight matrices; (2) computing the p-values of the affinity scores for each allele; and (3) computing the p-values of the affinity score changes between the reference and the SNP alleles.
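To make the first of those functions concrete, here is a minimal Python sketch of scoring a reference allele and a SNP allele against a position weight matrix. It is an illustration only, not atSNP's API or its exact scoring method: the matrix `PWM`, the helper names, and the example sequence are all hypothetical, the score is a plain log-likelihood on the forward strand, and no p-values are computed.

```python
import math

# Hypothetical 4-position PWM (not taken from atSNP): one dict per motif
# position, giving the probability of each base at that position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_likelihood(window, pwm):
    """Sum of log-probabilities of the window's bases under the PWM."""
    return sum(math.log(pos[base]) for pos, base in zip(pwm, window))


def best_score(seq, snp_index, pwm):
    """Highest log-likelihood over all motif-width windows covering snp_index."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    return max(
        log_likelihood(seq[start:start + width], pwm)
        for start in range(first, last + 1)
    )


def allele_scores(seq, snp_index, alt_base, pwm):
    """Return (reference score, SNP-allele score, score change)."""
    alt_seq = seq[:snp_index] + alt_base + seq[snp_index + 1:]
    ref = best_score(seq, snp_index, pwm)
    alt = best_score(alt_seq, snp_index, pwm)
    return ref, alt, alt - ref


# Example: a G-to-A substitution at index 3 of a made-up sequence.
ref, alt, change = allele_scores("TTAGCTTT", 3, "A", PWM)
print(ref, alt, change)
```

A negative score change means the SNP allele matches the motif less well than the reference allele under this simplified score. Assessing whether such a change is statistically significant is what the package's p-value functions address, and that step is not sketched here.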

MBASIC software

MBASIC (Matrix Based Analysis for State-space Inference and Clustering) is a statistical framework for integrative analysis of many related experiments where observations are collected over a set of units. The MBASIC framework allows for simultaneous projection of the observations onto a discrete state space and clustering of the units based on their state-space profiles. MBASIC is applicable to many high-throughput sequencing datasets. For example, it enables analysis of large collections of ChIP-seq datasets by simultaneously identifying genomic loci with peaks and similar peak patterns across multiple conditions.