FAIR AND ACCURATE: “Data comes from people and people are noisy,” UC Santa Cruz Computer Science Associate Professor Yang Liu said. Liu joined the UC Santa Cruz faculty in 2019 after completing his postdoctoral training at Harvard University. There, he began looking at how the large government and financial data sets that algorithms use to make life-changing decisions, like mortgage approvals or bail amounts, can be riddled with prejudice; such systems have been excoriated for unfairly penalizing African Americans. “If data encodes the bias of the people, machine learning algorithms are learning this bias too.” Machine learning identifies patterns in large data sets, whether it’s understanding a spoken word, recognizing an image, or predicting Alzheimer’s, and then uses feedback to train itself to identify those patterns more accurately. In the process of optimizing itself, however, machine learning can inadvertently reinforce preexisting bias. Liu wants to create algorithms with fairness programmed into them. “The data science community has been working on it for the past three years,” Liu said. It’s a three-stage process: “First we have to define what fairness is, for example that credit applicants should not be penalized by race. Then we must express that definition mathematically. Finally we add it into the program so it will continue to optimize for accuracy, but under the constraints we’ve created, so that we can be fair.”
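Liu’s three stages map naturally onto code. The sketch below is a minimal illustration, not Liu’s actual method: it assumes hypothetical synthetic credit data, picks demographic parity (equal approval rates across groups) as the fairness definition, formalizes it as the gap between group-wise approval rates, and adds that gap to a logistic regression loss as a penalty, so the model keeps optimizing for accuracy under the fairness constraint.

```python
# A minimal sketch of the three-stage recipe on synthetic credit data.
# The fairness definition (demographic parity) and the penalty-based
# formulation are illustrative assumptions, not a published method.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic applicants: a sensitive group label and two features,
# one of which is a proxy that correlates with group (e.g. zip code).
n = 2000
group = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2))
X[:, 1] += group
# Biased historical labels: group 1 is unfairly penalized in the data.
logits = X @ np.array([1.0, 0.5]) - 1.5 * group
y = ((logits + rng.normal(scale=0.5, size=n)) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, lam):
    """Logistic loss plus a demographic-parity penalty.

    Stage 1: fairness = approval rates should not depend on group.
    Stage 2: formalize it as the gap E[p | group=0] - E[p | group=1].
    Stage 3: add the (squared) gap to the objective with weight lam.
    """
    p = sigmoid(X @ w)
    # Standard cross-entropy loss (the accuracy term).
    ce = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    gap = p[group == 0].mean() - p[group == 1].mean()
    loss = ce + lam * gap**2

    # Gradients of both terms with respect to the weights.
    g_ce = X.T @ (p - y) / n
    dp = p * (1 - p)  # derivative of the sigmoid
    g_gap = (X[group == 0] * dp[group == 0, None]).mean(axis=0) \
          - (X[group == 1] * dp[group == 1, None]).mean(axis=0)
    grad = g_ce + 2 * lam * gap * g_gap
    return loss, grad, gap

# Train with gradient descent; lam trades accuracy against fairness.
w = np.zeros(2)
for _ in range(500):
    _, grad, _ = loss_and_grad(w, lam=5.0)
    w -= 0.5 * grad

_, _, gap = loss_and_grad(w, lam=5.0)
print(f"approval-rate gap between groups after fair training: {gap:.3f}")
```

Raising the penalty weight shrinks the approval-rate gap at some cost in raw accuracy, which is exactly the “optimize for accuracy under fairness constraints” trade-off Liu describes.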
TALKATIVE FRUIT: Has your iPhone begun to anticipate what you’re about to write with unnerving accuracy? Big data projects like text prediction often require processing extremely sensitive data, and keeping it secure is a top priority. UC Santa Cruz Computer Science and Engineering Associate Professor Abhradeep Guha Thakurta helped build a system for Apple based on differential privacy, which allows algorithms to learn from keyboard entries without disclosing them. The system is designed from the ground up for privacy. “Privacy is an English word,” Thakurta said. “To protect it you must create a semantic statement, meaning a mathematical statement that tells what it means for a system to be protected, and that can be refuted. To that end there’s this notion called differential privacy, which is one kind of formalized statement that gives it semantic meaning.” Differential privacy essentially allows a large, decentralized system to process confidential information in situ, without any other part of the pipeline coming into contact with that information. So those confidential messages you’ve typed to your spouse or doctor never leave your device.
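Here is a minimal sketch of the on-device flavor of differential privacy described above, assuming a toy vocabulary and a k-ary randomized-response mechanism; these are illustrative choices, not Apple’s actual pipeline. Each device randomizes its own report before anything leaves the phone, and the server debiases the noisy aggregate without ever seeing a raw entry.

```python
# Illustrative local differential privacy: the vocabulary, epsilon value,
# and randomized-response mechanism are assumptions for this sketch.
import math
import random
from collections import Counter

VOCAB = ["hello", "thanks", "meeting", "dinner", "love"]
EPSILON = 2.0  # privacy budget: smaller means more noise, stronger privacy

def randomize_on_device(true_word: str) -> str:
    """k-ary randomized response, run locally on the device.

    Report the true word with probability e^eps / (e^eps + k - 1),
    otherwise report a uniformly random other word. The raw word
    itself is never transmitted.
    """
    k = len(VOCAB)
    p_true = math.exp(EPSILON) / (math.exp(EPSILON) + k - 1)
    if random.random() < p_true:
        return true_word
    others = [w for w in VOCAB if w != true_word]
    return random.choice(others)

def estimate_frequencies(reports: list[str]) -> dict[str, float]:
    """Server-side debiasing: invert the known randomization to recover
    approximate population frequencies without seeing any raw data."""
    k, n = len(VOCAB), len(reports)
    p_true = math.exp(EPSILON) / (math.exp(EPSILON) + k - 1)
    p_other = (1 - p_true) / (k - 1)
    counts = Counter(reports)
    return {w: (counts[w] / n - p_other) / (p_true - p_other) for w in VOCAB}

# Simulate 10,000 devices, each reporting one (randomized) typed word.
true_words = ["hello"] * 5000 + ["thanks"] * 3000 + ["love"] * 2000
reports = [randomize_on_device(w) for w in true_words]
print(estimate_frequencies(reports))
```

Because the noise is injected on the device itself, even a compromised server learns nothing reliable about any one person’s typing, yet population-level word frequencies, the signal text prediction needs, are still recoverable.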