Data Management Lab

This lab focuses on designing scalable infrastructure for predictive modeling; developing solutions for data extraction, cleaning, and integration; and scaling up predictive-modeling algorithms so that they run efficiently on very large data sets.

Related CPCP Publications

MatchCatcher: A debugger for blocking in entity matching. Li H, Konda P, Suganthan P, Doan A, Snyder B, Park Y, Krishnan G, Deep R, Raghavendra V. Proceedings of the International Conference on Extending Database Technology (EDBT), 2018

CloudMatcher: A cloud/crowd service for entity matching. Govind Y, Paulson E, Ashok M, Suganthan G.C. P, Hitawala A, Doan A, Park Y, Peissig P, LaRose E, Badger J. KDD Workshop on Big Data Analytics-as-a-Service: Architecture, Algorithms, and Applications in Health Informatics, 2017

MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive. Bernstein M, Doan A, Dewey C. Bioinformatics 33(18):2914–2923, 2017

Falcon: Scaling up hands-off crowdsourced entity matching to build cloud services. Das S, Suganthan P, Doan A, Naughton J, Krishnan G, Deep R, Arcaute E, Raghavendra V, Park Y. Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2017

Towards interactive debugging of rule-based entity matching. Panahi F, Wu W, Doan A, Naughton J. Proceedings of the International Conference on Extending Database Technology (EDBT), 2017


AnHai Doan


Jignesh Patel

Udip Pant

Haojun Zhang


CPCP 2017 Retreat: Entity Matching Using Magellan - Matching Drug Reference Tables Symposium Video

BigGorilla software

Magellan software

MetaSRA pipeline software

MetaSRA: normalized metadata for the Sequence Read Archive data

CPCP Retreat 2016: Entity Matching for EHR- and Transcriptome-based Phenotyping Symposium Video

CPCP Seminar: Towards Demystifying Big Data Technologies Seminar Video