Data Management Lab

This lab focuses on three goals: designing a scalable infrastructure for predictive modeling; developing solutions for data extraction, cleaning, and integration; and scaling up predictive modeling algorithms so that they run efficiently on very large data sets.

Related CPCP Publications

MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive. Bernstein M, Doan A, Dewey C. Bioinformatics, 2017

Falcon: Scaling up hands-off crowdsourced entity matching to build cloud services. Das S, Suganthan P, Doan A, Naughton J, Krishnan G, Deep R, Arcaute E, Raghavendra V, Park Y. Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2017

Towards interactive debugging of rule-based entity matching. Panahi F, Wu W, Doan A, Naughton J. Proceedings of the International Conference on Extending Database Technology (EDBT), 2017

Ava: From data to insights through conversation. John RJL, Potti N, Patel J. Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2017

Magellan: toward building entity matching management systems. Konda P, Das S, Suganthan P, Doan A, Ardalan A, Ballard JR, Li H, Panahi F, Zhang H, Naughton J, Prasad S, Krishnan G, Deep R, Raghavendra V. Proceedings of the 42nd International Conference on Very Large Data Bases (VLDB), 2016

Lead

AnHai Doan

Investigators

Jignesh Patel

Udip Pant

Haojun Zhang

Resources

BigGorilla software

Magellan software

MetaSRA pipeline software

MetaSRA: normalized metadata for the Sequence Read Archive data

CPCP Retreat 2016: Entity Matching for EHR- and Transcriptome-based Phenotyping Symposium Video

CPCP Seminar: Towards Demystifying Big Data Technologies Seminar Video