@ -14,10 +14,6 @@ date: September 04, 2019
## It's really difficult!
# What topics to cover?
## A really, really vast field
@ -173,7 +169,7 @@ date: September 04, 2019
## Todos for you
0. Complete the [course survey](https://forms.gle/NvYx3BM7HVkuzYdG6)
1. Explore the [course website](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/)
2. Think about which lecture you want to present and summarize
2. Think about which lecture you want to present
3. Think about which lecture you want to summarize
4. Form project groups and brainstorm topics
@ -217,10 +213,28 @@ date: September 04, 2019
- Really, really hard to think about side information
- May not even be public at time of data release!
## Netflix challenge
## Netflix prize
- Database of movie ratings
- Published: ID number, movie rating, and rating date
- Attack: from public IMDB ratings, recover names for Netflix data
- Competition: predict which movies each ID will like
- Tons of teams competed
- Winner: beat Netflix's best by **10%**
> A triumph for machine learning contests!
## Privacy flaw?
- Public info on IMDB: names, ratings, dates
- Reconstruct names for Netflix IDs
- Netflix settled lawsuit ($10 million)
- Netflix canceled future challenges
## "Blending in a crowd"
- Only release records that are similar to others
@ -233,14 +247,17 @@ date: September 04, 2019
- First few queries fine, then suddenly total violation
- Again, interacts poorly with side-information
## Differential privacy
- Proposed by Dwork, McSherry, Nissim, Smith (2006)
# Differential privacy
## Yet another privacy definition
> A new approach to formulating privacy goals: the risk to one’s privacy, or in
> general, any type of risk... should not substantially increase as a result of
> participating in a statistical database. This is captured by differential
## Basic setting
- Private data: set of records from individuals
- Each individual: one record
@ -256,3 +273,10 @@ subset $S$ of outputs, we have:
\Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta
## Basic reading
> Output of program doesn't depend too much on any single person's data
- Property of the algorithm/query/program
- No: "this data is differentially private"
- Yes: "this query is differentially private"
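
To make the inequality concrete, here is a minimal sketch that checks it exhaustively for a tiny query. It assumes a database holding one person's bit, so the neighboring databases `db` and `db'` are the bits 0 and 1, and it uses randomized response as the query $Q$. Randomized response is chosen here only for illustration; the function names are hypothetical.

```python
import math
from itertools import chain, combinations


def randomized_response(bit, eps):
    """Exact output distribution: report the true bit with probability e^eps / (1 + e^eps)."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def satisfies_dp(dist_a, dist_b, eps, delta=0.0):
    """Check Pr[Q(db) in S] <= e^eps * Pr[Q(db') in S] + delta for every subset S, in both directions."""
    outputs = sorted(set(dist_a) | set(dist_b))
    subsets = chain.from_iterable(
        combinations(outputs, k) for k in range(len(outputs) + 1)
    )
    tol = 1e-12  # slack for floating-point rounding
    for s in subsets:
        pa = sum(dist_a.get(o, 0.0) for o in s)
        pb = sum(dist_b.get(o, 0.0) for o in s)
        if pa > math.exp(eps) * pb + delta + tol:
            return False
        if pb > math.exp(eps) * pa + delta + tol:
            return False
    return True


eps = math.log(3)  # truthful answer with probability 3/4

# Randomized response on neighboring inputs 0 and 1.
private_ok = satisfies_dp(randomized_response(0, eps), randomized_response(1, eps), eps)

# Releasing the bit exactly: output distributions are point masses.
identity_ok = satisfies_dp({0: 1.0}, {1: 1.0}, eps)

print(private_ok, identity_ok)
```

The check takes two output distributions, not a dataset: the guarantee is a property of the query, which is the point of the "No"/"Yes" bullets above.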