Data Management Lab

This lab focuses on designing a scalable infrastructure for predictive modeling, developing solutions for data extraction, cleaning, and integration, and scaling up predictive modeling algorithms so that they run efficiently on very large data sets.

Related CPCP Publications

Magellan: toward building entity matching management systems. Konda P, Das S, Suganthan P, Doan A, Ardalan A, Ballard JR, Li H, Panahi F, Zhang H, Naughton J, Prasad S, Krishnan G, Deep R, Raghavendra V. Proceedings of the 42nd International Conference on Very Large Data Bases, 2016.

Lead

AnHai Doan

Investigators

Jignesh Patel

Udip Pant

Haojun Zhang

Resources

Magellan software

MetaSRA pipeline software

MetaSRA: normalized metadata for the Sequence Read Archive data

CPCP Retreat 2016: Entity Matching for EHR- and Transcriptome-based Phenotyping Symposium Video

CPCP Seminar: Towards Demystifying Big Data Technologies Seminar Video