Results for: Training
CPCP Retreat 2016: High-Throughput Computing in Support of High-Throughput Phenotyping Symposium Video
Dr. Miron Livny describes the High-Throughput Computing (HTC) opportunities available at UW-Madison. In the past year, his team worked with close to 200 research teams, using a total of 320 million computing hours. The HTC group facilitates data processing by "submitting locally and running globally," drawing on many resources, including the Open Science Grid (OSG). One of the most important resources the group offers is its team of expert consultants and liaisons, who help scientists learn to use HTC to accomplish their research goals effectively and efficiently.
CPCP Retreat 2016: High-Throughput Predictive Phenotyping from Electronic Health Records Symposium Video
Ross Kleiman describes his work on the EHR-based Phenotyping project at the CPCP, which creates predictive models of diseases such as heart attack or breast cancer from electronic health records (EHRs). Using the extensive health record data collected by Marshfield Clinic in Marshfield, WI, over the past 40 years, together with high-throughput computing resources, the project can predict patients' medical outcomes. Recent work predicts the risk that individual patients will develop specific diseases, as well as the risk that patients will be readmitted to the hospital within thirty days. This work provides a machine learning pipeline for forming diagnoses from EHRs and an initial baseline for the new area of pan-diagnostic machine learning research.
CPCP Retreat 2016: Using Active Learning to Phenotype Electronic Medical Records Symposium Video
In the analysis of Electronic Medical Records, labeled examples are patient records for which we know which medical conditions, such as cataracts or diabetes, the record indicates. Labeled records are essential for developing robust machine learning models because the labels serve as the ground truth against which researchers test their models. Unfortunately, labeled examples are difficult and expensive to obtain because labeling is typically done by a medical expert, who may spend anywhere from 30 minutes to 6 hours determining each disease label for each patient. In this talk, Ari Biswas describes how his research in the EHR-based Phenotyping group at the CPCP addresses this labeling problem with an active learning approach. The method learns to label EHRs iteratively: it fits a labeling model, then improves that model by asking a medical expert to label the examples whose labels the model is most uncertain about. Results on a test case show that this active learning method learns to label patients from fewer labeled examples than a model that learns from randomly selected labeled records.
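The query-the-most-uncertain-example loop can be sketched in Python. This is a toy illustration of uncertainty sampling, not the method from the talk: it assumes each record has been reduced to a single hypothetical risk score, that the true label is a simple threshold on that score, and it replaces the medical expert with a hypothetical `expert_label` function. The baseline asks about records in random order.

```python
import random


def expert_label(score, threshold=0.62):
    """Hypothetical stand-in for the medical expert: 1 = condition present."""
    return int(score >= threshold)


def unresolved(pool, labeled):
    """Scores whose label the labeled set does not yet pin down.

    Under a threshold model, anything below the largest known negative is
    negative and anything above the smallest known positive is positive;
    only scores strictly between the two are still uncertain.
    """
    lo = max((s for s, y in labeled.items() if y == 0), default=float("-inf"))
    hi = min((s for s, y in labeled.items() if y == 1), default=float("inf"))
    return sorted(s for s in pool if lo < s < hi)


def active_learning(pool):
    """Uncertainty sampling: always ask about the least certain record."""
    labeled = {}
    while True:
        candidates = unresolved(pool, labeled)
        if not candidates:
            return labeled
        # If every consistent threshold is equally likely, the median candidate
        # is (roughly) the one whose label is closest to a 50/50 call.
        query = candidates[len(candidates) // 2]
        labeled[query] = expert_label(query)


def random_learning(pool, seed=0):
    """Baseline: ask the expert about records in random order."""
    order = list(pool)
    random.Random(seed).shuffle(order)
    labeled = {}
    for score in order:
        if not unresolved(pool, labeled):
            break
        labeled[score] = expert_label(score)
    return labeled


pool = [i / 1000 for i in range(1000)]  # hypothetical risk scores
active = active_learning(pool)
baseline = random_learning(pool)
print(len(active), "expert labels with uncertainty sampling")
print(len(baseline), "expert labels with random selection")
```

In this idealized setting uncertainty sampling behaves like a binary search, so it needs at most about log2(1000) expert labels, while random selection keeps asking until it happens to label the two records on either side of the threshold. Real EHR models are far noisier, so the gap is smaller in practice.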
CPCP Retreat 2016: Entity Matching for EHR- and Transcriptome-based Phenotyping Symposium Video
Dr. AnHai Doan describes the task of entity matching across EHR- and transcriptome-based data and introduces Magellan, a new tool that allows non-experts to perform entity matching on their data sets. Entity matching identifies records that refer to the same entity across multiple data sets, for example, gathering all of a patient's data when they have been treated at different medical offices, or selecting all patients who have been treated with a specific drug. The task is challenging because of variations in the data, such as spelling mistakes and abbreviations. Magellan fills an important gap in the data science pipeline by providing a step-by-step workflow that lets individuals perform entity matching on their own data without becoming experts in the field. This can potentially save research groups thousands of dollars that would otherwise be spent hiring an expert. Magellan will be released in 2016 as a Python package.
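To show why spelling mistakes and abbreviations make matching hard, here is a minimal string-similarity matcher built on Python's standard `difflib`. It is not Magellan's API or workflow; the abbreviation table, the 0.85 threshold, and the sample records are all hypothetical, chosen only to illustrate normalizing text before comparing it.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real project would curate its own.
ABBREVIATIONS = {"wm": "william", "st": "street", "ave": "avenue"}


def normalize(text):
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between the normalized forms of two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match(source_a, source_b, threshold=0.85):
    """Return (id_a, id_b, score) for cross-source pairs at or above threshold.

    This compares every pair; practical matchers add a blocking step first
    so they do not have to score all len(a) * len(b) pairs.
    """
    results = []
    for id_a, text_a in source_a.items():
        for id_b, text_b in source_b.items():
            score = similarity(text_a, text_b)
            if score >= threshold:
                results.append((id_a, id_b, round(score, 2)))
    return results


clinic_a = {"A1": "Wm. Smith, 12 Oak St.", "A2": "Maria Gonzalez, 4 Elm Ave."}
clinic_b = {
    "B3": "John Carter, 9 Pine Street",
    "B7": "William Smith, 12 Oak Street",
    "B9": "Maria Gonzales, 4 Elm Avenue",
}
for pair in match(clinic_a, clinic_b):
    print(pair)
```

Normalization makes the abbreviated "Wm. Smith, 12 Oak St." identical to "William Smith, 12 Oak Street", while the similarity score tolerates the one-letter spelling difference in "Gonzalez"/"Gonzales".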
CPCP Retreat 2016: Computational Phenotyping for Breast Cancer Risk Assessment Symposium Video
Recommendations for mammogram frequency vary widely, which causes confusion among primary care physicians and their patients. Dr. Beth Burnside describes her work on the Phenotype Models for Breast Cancer Screening project, in conjunction with the Low-dimensional Representations Lab at the CPCP, to model and predict risk factors for individual patients. Dr. Burnside and her research group integrate genetic and imaging data to predict patients' risk of breast cancer. By applying machine learning methods to these data, the lab is working to create a clinical decision support tool that supports evidence-based conversations between physicians and patients. Used in the clinic, such a tool will help physicians and their patients make informed decisions about how often an individual patient should have a mammogram.
CPCP Retreat 2016: Multi-Armed Bandit Algorithms and Applications to Experiment Selection Symposium Video
Dr. Kwang-Sung Jun works in the Value of Information Lab at the CPCP. In this talk, he describes a new computational method that suggests which experiments researchers should perform next. Biology labs often have a large number of experiments they could perform, so knowing which to perform first saves researchers time and money. The method adapts multi-armed bandit algorithms, which were originally framed around gambling: choosing which slot machine to play in order to maximize winnings. Dr. Jun also describes applying the new algorithm to the New Yorker's cartoon caption selection task as part of its crowdsourcing algorithm.
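The bandit framing maps onto experiment selection directly: each candidate experiment is an "arm," and running it yields a success or a failure. The sketch below uses the classic UCB1 rule as a stand-in; the algorithm presented in the talk is likely different, and the success probabilities are hypothetical values that the algorithm never sees directly.

```python
import math
import random


def ucb1(success_probs, rounds, seed=0):
    """Run UCB1 on simulated experiments; return how often each was chosen.

    success_probs[i] is the (hypothetical, hidden) chance that experiment i
    produces a useful result.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    pulls = [0] * k
    wins = [0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # run every experiment once before comparing them
        else:
            # Observed success rate plus an exploration bonus that shrinks
            # as an experiment is tried more often.
            arm = max(
                range(k),
                key=lambda i: wins[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        pulls[arm] += 1
        wins[arm] += 1 if rng.random() < success_probs[arm] else 0
    return pulls


print(ucb1([0.2, 0.5, 0.8], rounds=5000))
```

The exploration bonus is what separates this from simply repeating the best-looking experiment so far: an experiment that has been tried only a few times keeps a large bonus, so it still gets occasional trials until its success rate is known well enough to rule it out.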
CPCP Retreat 2016: Neuroimage-Based Phenotyping and the Problem of AD Symposium Video
Alzheimer's disease (AD) is characterized pathologically by plaques and tangles in the brain; its phenotypic manifestation in patients is dementia. Preventing AD requires a predictive computational framework that can distinguish the effects of normal aging on the brain from those of preclinical AD. Dr. Sterling Johnson describes the work of the Neuroimage-based Phenotyping Project at the CPCP toward this predictive goal. The results will be used to identify people who are candidates for early intervention trials aimed at preventing Alzheimer's disease during its earliest stages, which often occur more than 20 years before the onset of dementia.
CPCP Seminar: Transforming Your Research Through High Throughput Computing Seminar Video
Lauren Michael works as a research computing facilitator in the high throughput computing (HTC) center at the University of Wisconsin. In the first part of this talk, she defines HTC and describes how it can be used to facilitate research. Knowing what HTC is and how it works helps researchers determine which types of problems it can address. HTC is a type of parallel computing in which a problem is broken up into smaller, independent problems that can be solved simultaneously on multiple computers. The second part of the talk covers specific information about using HTCondor, the HTC system that we have here at the University of Wisconsin-Madison.
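The split, run independently, combine pattern behind HTC can be sketched on a single machine. In this sketch threads stand in for the many computers an HTC system would use; on an HTCondor pool each chunk would typically be submitted as its own job instead. The sum-of-squares task is a hypothetical placeholder for real work, and because of CPython's global interpreter lock these threads will not actually speed up CPU-bound code: the point is the decomposition into tasks that share nothing.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n_items, n_tasks):
    """Break range(n_items) into at most n_tasks contiguous, independent chunks.

    Assumes n_items and n_tasks are both positive.
    """
    size = -(-n_items // n_tasks)  # ceiling division
    return [range(start, min(start + size, n_items)) for start in range(0, n_items, size)]


def run_task(chunk):
    """One independent unit of work: it needs nothing from any other chunk."""
    return sum(i * i for i in chunk)


def run_all(n_items, n_tasks, workers=4):
    """Run every chunk concurrently, then combine the partial results."""
    chunks = split(n_items, n_tasks)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        partials = list(executor.map(run_task, chunks))
    return sum(partials)


print(run_all(1_000_000, n_tasks=100))
```

Problems that fit this shape, such as running the same analysis over many files, parameter values, or patients, are the ones HTC handles well; problems whose pieces must constantly exchange data are a poorer fit.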
Big Privacy Symposium: Introductory and Welcoming Remarks Symposium Video
Sponsored by the CPCP, the Big Privacy Symposium focuses on solving the privacy problems that arise with Big Data. The solution lies at the intersection of policy and computational approaches, and the symposium brings these two fields together to address privacy concerns in Big Data.
Big Privacy Symposium: Big Data, Big Headaches: Cultivating Public Trust in an Age of Unconsented Access to Identifiable Data Symposium Video
Dr. Barbara Evans discusses the numerous challenges associated with the use of Big Data. Drawing on her background in computational modeling and her years of practicing law, Dr. Evans describes the legal challenges associated with privacy. People are more willing to share their data if they know it will be used for their own good or the good of society. However, once data is released for research purposes, how do we guarantee that it will be used for the good of society? And how should the issue be addressed if the data is used in a biased or otherwise nefarious manner? Dr. Evans also addresses the de-identification of data. Research shows that people are more willing to share their data if they know it will be de-identified. Unfortunately, as the collection of rich datasets becomes increasingly ubiquitous, more and more datasets can be re-identified precisely because of the richness of the data. Based on these concerns, Dr. Evans makes recommendations for future policies that protect the privacy of those who share their data while ensuring that society can continue to benefit from the computational mining of Big Data.
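The link between richness and re-identification can be made concrete with a uniqueness count. The more attributes a dataset records about each person, the more records become one-of-a-kind, and a one-of-a-kind record can be linked back to a person by anyone who knows those same facts from another source. This sketch is an illustration, not Dr. Evans's analysis: the records, the attribute names (`zip`, `birth_year`, `sex`), and the `unique_fraction` helper are all hypothetical. A record that is unique on a set of attributes violates k-anonymity for any k of 2 or more.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Fraction of records whose combination of `attributes` appears only once."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


# Hypothetical de-identified records: no names, only "harmless" attributes.
records = [
    {"zip": "53705", "birth_year": 1961, "sex": "F"},
    {"zip": "53705", "birth_year": 1961, "sex": "M"},
    {"zip": "53705", "birth_year": 1974, "sex": "F"},
    {"zip": "53711", "birth_year": 1961, "sex": "F"},
    {"zip": "53711", "birth_year": 1974, "sex": "M"},
    {"zip": "53711", "birth_year": 1974, "sex": "M"},
]

for attrs in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
    print(attrs, f"{unique_fraction(records, attrs):.2f} of records are unique")
```

No single attribute here identifies anyone, but each added column splits the groups further, which is why richer datasets are harder to keep de-identified.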