We can have security, privacy and confidentiality in pay gap analysis across many companies
Boston University (BU) has been working with the Boston Women's Workforce Council (BWWC) since 2014 to help them understand the root causes and range of the gender wage gap and measure progress toward the goal of 100% equal pay for equal work. Secure multiparty computation (secure MPC) was the only way to get this measurement without compromising the privacy of the data, and without the risk and expense of a third-party data arbiter.
There is a lot of value in sharing data, but the more data you hold, the more there is to be breached or used by others in ways you don’t want. MPC allows secure collaborative analysis of private data.
To learn more, we spoke with Mayank Varia, a research associate professor in the computer science department at Boston University (BU).
His personal research interests center around cryptography—both innovations within the field of cryptography and connecting it to problems throughout the rest of computer science and beyond such as the social sciences. He is also the co-director of the Center for Reliable Information Systems and Cyber Security (RISCS).
Trusted CI: Please tell us about the scope of your work, and how secure MPC fits into that.
M.V. RISCS is a group of people throughout BU who have either interest in cybersecurity research or interest in research studies dealing with cybersecurity generally. That includes areas like law, economics, philosophy, or areas beyond traditional computer science that are impacted by things like cybercrime, nation-state influencing, legal questions, and so on. Our group wants to see secure multiparty computation deployed around the world.
Trusted CI: What is secure MPC? What do you see as the primary benefits of secure MPC? What value does it bring to users?
M.V. Secure MPC allows organizations, state officials, companies, governments, etc., that each have private data to do collaborative analysis to learn about the data without sharing any of the data with anyone else.
You can do collaborative analysis of data that remains in these organizations’ own systems without it being breached or revealed to any other party. They don't need to find some trusted arbiter who holds the data in order to compute things for them. They can get the benefits of doing collaborative analysis, such as any kind of data science, without sacrificing the privacy or the security of the underlying data.
Trusted CI: Tell us about the use case example with the city of Boston, who the client(s) are, how the connection was made between the researchers and the users, what value they received from secure MPC, how long it took for the project, and whether they still use it.
M.V. BU has been working with the Boston Women's Workforce Council since 2014, but the story starts in 2013. BWWC was created by Mayor Thomas Menino before he retired. Creating it was one of his last big initiatives. His goal was to make Boston the premier city for working women. He brought together lots of people who had been thinking about the gender equity problem.
They wanted to understand the root causes of the wage gap and address them: what gets measured gets done. The goal was 100% equal pay for equal work.
They also wanted to measure how well they were doing towards that goal. The initial pledge called on any company that signs on to agree to participate in data analysis to determine what the wage gap was across the city of Boston.
When Mayor Menino retired, he met Azer Bestavros here at BU and mentioned they were stuck on the measurement component because the data was sensitive and had to remain private. Azer was familiar with secure MPC. Within a year, they convinced about 90 companies to join the compact.
MPC was the only way to get this measurement to happen. It's much cheaper, safer, easier, and more effective for nobody to have access to the data than for somebody to have access to the data.
Trusted CI: How does secure MPC keep the data secure?
M.V. Each company has their own payroll information. Each company goes to the website (100talent.org) where they can drag and drop a spreadsheet that represents their payroll information. It’s the same format they already use for the Equal Employment Opportunity Commission.
They click a button, sending the data to two different places. Data is being split such that the real data is not going anywhere, but fake encoded data is going to two separate places.
The data is being encoded in such a way that one piece of the data is going to the Boston Women's Workforce Council and one piece of the data is going to Boston University. And these two pieces have the property that individually they look like random garbage. There's no meaning in the data that Boston University gets or that the Boston Women's Workforce Council gets. But the data has the property that the two of us working together can still do an analysis over the data even though each one of us individually has no idea what it says.
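The splitting described above can be illustrated with additive secret sharing, one common way to build this kind of two-party encoding. The sketch below is an assumption made for illustration, not a description of the deployed software: the modulus, the function names, and the payroll figures are all hypothetical. Each value is split into two shares that individually look random, each recipient sums only the shares it holds, and only the combined totals reveal the aggregate.

```python
import secrets

# Assumed ring size for the arithmetic; any modulus larger than the true total works.
MODULUS = 2**64


def split(value):
    """Split a value into two shares that sum to it modulo MODULUS."""
    share_a = secrets.randbelow(MODULUS)   # uniformly random, reveals nothing alone
    share_b = (value - share_a) % MODULUS  # also uniformly random when seen alone
    return share_a, share_b


def aggregate(shares):
    """Sum the shares held by one recipient, without seeing any real value."""
    return sum(shares) % MODULUS


def reconstruct(total_a, total_b):
    """Combine the two recipients' totals to recover the overall sum."""
    return (total_a + total_b) % MODULUS


# Made-up payroll totals from three hypothetical companies.
payroll_totals = [1_250_000, 980_000, 2_400_000]

# Each company splits its own value before anything leaves its machine.
shares = [split(value) for value in payroll_totals]

# One recipient receives only the first shares, the other only the second.
total_a = aggregate(a for a, _ in shares)
total_b = aggregate(b for _, b in shares)

# Only the citywide total is revealed, never an individual company's value.
combined = reconstruct(total_a, total_b)
print(combined)
```

Neither recipient's total says anything about an individual company; the aggregate appears only when the two totals are added together.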
Trusted CI: Who would be the broader set of target users for secure MPC? What challenges would they have that secure MPC might solve?
M.V. MPC has value, but there are a few constraints for an application to be amenable to MPC. It must involve multiple organizations or, more precisely, cross privacy silos. These could even be divisions within one company that are not interested in sharing data with each other. It doesn't have to be a corporate boundary, but there must be a privacy/security boundary that's being crossed.
You'd want scenarios where there's some interesting data analysis that has either commercial or social value—where the result of the calculation is something that is safe to share, safe to make public. It should have social benefit, but the data is sensitive, protected, and can't be revealed. This is when MPC can help.
It takes a while to tease out of the researchers what they really want, as opposed to just the questions they think they can answer. The benefit of MPC is it helps them figure out what is the real question they are after. Social scientists are very good at thinking about those questions. And we can help them with how to go about doing that in a way that doesn't breach privacy and confidentiality.
Trusted CI: What if people want to use the secure MPC assets, how do they access them?
M.V. We have several software packages that are available and are open-source on GitHub (github.com/multiparty) that anyone can use:
1. web-MPC - very easy to use
2. JIFF – web-based but more flexible, can do more complicated analysis, but requires more tuning
3. Conclave - for high-performance data processing at scale, where you have hundreds of gigabytes of data, and can run on the cloud
Trusted CI: Is there any type of support structure?
M.V. We have a group of professional software engineers and we are happy to collaborate with any interested parties. We also have a Collaboratory of many different interested companies. And we're always happy to have new members join. We have a website that's separate from our GitHub repository: multiparty.org.
Trusted CI: Please tell us more about the secure MPC Transition to Practice journey.
M.V. After I joined BU in 2015, I connected with Professor Azer Bestavros who, as previously stated, had learned about Boston’s need through former Mayor Menino. Azer is not a cryptographer, but he knew of MPC, so he started working with our group. It was all very serendipitous.
We've been working with the Boston Women's Workforce Council for the past five years. The goal is to do an analysis every year or every other year to get a longitudinal analysis of whether we are moving rapidly towards a world of equal pay for equal work. The first calculation happened in 2015 once we built the software and started running it. The second followed in 2016 and the third in 2017. They chose not to do one in 2018. The most recent one happened in 2019. All of the 2016-17 data analyses are publicly available on the City of Boston website.
Not only did they have a problem where MPC could help, they tried solving the problem without MPC and failed. But one of the hardest pieces towards getting adoption of MPC is for people to even know that it's possible. If they had found a trusted third party that they were all somehow magically willing to use and that was willing to take the data, then this probably never would have happened.
Trusted CI: What is the chronology of MPC?
M.V. MPC has been researched since the mid-1980s as a theoretical concept, but rapid advances in the last 5 years, helped by faster computers, have made it practical and taken it out of the lab. While BU has been doing theoretical research in MPC for a while, the interaction with the Mayor's office has spurred several tech transition opportunities and catalyzed even more research from our group. We are very grateful to the National Science Foundation for sponsoring all of these recent endeavors under grants #1430145 (SCOPE), #1414119 (MACS), #1718135, #1739000, #1915763 and #1931714.
Trusted CI: Are you creating a business model for transitioning secure MPC to practice, like a services model?
M.V. We are very interested in working with technology transfer partners to deploy this technology. Companies like Red Hat and Honda are interested in partnering with us and giving us grants as a university to continue this development. Because it's a symbiotic relationship, it's in their interest to see these products continue to be developed, to continue to be matured, to continue to be made faster, to be made better. Everything is also open source, so anyone is free to use it.
This work is partially supported by the National Science Foundation under Grants #1430145 (SCOPE), #1414119 (MACS), #1718135, #1739000, #1915763 and #1931714. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.