Privacy concerns are a major obstacle to deriving the scientific insights now possible from increasing data collection and powerful new analysis techniques. The goal of privacy technology is to permit data mining and analysis to be carried out over a collection of sensitive records donated by individuals. Ideally, individuals receive a guarantee that the analysis does not lead to harmful disclosures about them. At the same time, data miners and scientists hope to study the data with little disruption to their methods and results.
Differential privacy has emerged as an important standard for the protection of individuals' sensitive information. It guarantees that the output the analyst receives is statistically indistinguishable from the output the analyst would have received if any one individual had opted out of the collection. Differentially private algorithms have been developed to support many common data mining tasks. However, many of these algorithms have yet to see widespread adoption in real-world systems. The goal of this research project is to work towards closing the gap between theory and practice.
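As a rough illustration of the guarantee described above, the following minimal Python sketch answers a counting query using the Laplace mechanism, a standard technique for achieving differential privacy. It is not code from the project: the function names (laplace_noise, private_count), the sample ages, and the value of epsilon are all assumptions chosen for illustration. The sketch assumes that opting out means removing one record, so a count changes by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponential variables, each with
    # mean `scale`, follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    `epsilon` is the privacy parameter: smaller values add more noise
    and give stronger privacy. (Hypothetical helper for illustration.)
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one individual changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up example data: ages of six survey participants.
ages = [34, 29, 61, 45, 52, 38]


def is_forty_or_older(age):
    return age >= 40


# The same query with and without the last participant. The noisy answers
# are drawn from distributions that differ by a factor of at most e^epsilon.
print(private_count(ages, is_forty_or_older, epsilon=0.5))
print(private_count(ages[:-1], is_forty_or_older, epsilon=0.5))
```

Because the noise is random, repeated runs print different values. This sketch uses ordinary floating-point arithmetic, which is known to weaken the guarantee in practice, so it should be read as a teaching aid rather than a deployable implementation.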
This summer project is part of a larger, multi-year NSF-sponsored project. A website describing that effort is available at https://www.dpcomp.org/. Interested students are strongly encouraged to review this website before applying.
Responsibilities: Duties will include software development and data analysis. Research assistants are expected to be flexible and willing to learn new skills as needed. The specific job description will be crafted based on the current needs of the project and the skill set of the student.
Student qualifications: Sufficient background in computer science is required, namely the equivalent of COSC 101 and 102. Strong computer programming skill (in any language) is desired. This can include programming completed through coursework and/or projects completed outside of courses, such as internships or individual projects. Additional beneficial skills include data analysis and statistics.