Trusted CI Blog: trustworthy data

Tuesday, April 9, 2024

Trusted CI Webinar: SPHERE - Security and Privacy Heterogeneous Environment for Reproducible Experimentation, Monday April 22nd

Dr. Jelena Mirkovic and David Balenson are presenting the talk, SPHERE - Security and Privacy Heterogeneous Environment for Reproducible Experimentation, on April 22nd at 12pm Eastern time.

NOTE: This webinar is scheduled one hour later than the usual time.

Please register here.

Cybersecurity and privacy threats increasingly impact our daily lives, our national infrastructures, and our industry. Recent newsworthy attacks targeted nationally important infrastructure, our government, our researchers, and research facilities. The landscape of what needs to be protected and from what threats is rapidly evolving as new technologies are released and threat actors improve their capabilities through experience and close collaboration. Meanwhile, defenders often work in isolation, use private data and facilities, and produce defenses that are quickly outpaced by new threats. To transform cybersecurity and privacy research into a highly integrated, community-wide effort, researchers need a common, rich, representative research infrastructure that meets the needs across all members of the community, and facilitates reproducible science.

To meet these needs, USC Information Sciences Institute and Northeastern University have been funded by the NSF mid-scale research infrastructure program to build Security and Privacy Heterogeneous Environment for Reproducible Experimentation (SPHERE). This infrastructure will offer access to an unprecedented variety of hardware, software, and other resources connected by user-configurable network substrate, and protected by a set of security policies uniquely aligned with cybersecurity and privacy research needs. SPHERE will offer six user portals, closely aligned with needs of different user groups. It will support reproducible research through a combination of infrastructure services (easy experiment packaging, sharing and reuse) and community engagement activities (development of realistic experimentation environments and contribution of high-quality research artifacts).

Speaker Bios:

Dr. Jelena Mirkovic is Principal Scientist at USC-ISI and Research Associate Professor at USC. She received her MS and PhD from UCLA, and her BSc from University of Belgrade, Serbia. Jelena's research interests span networking and cybersecurity fields, as well as testbed experimentation. Her current research is focused on authentication, use of machine learning for network attack detection, large-scale dataset labeling for security, and user privacy. She is the lead PI on the SPHERE project.

Mr. David Balenson is Senior Supervising Computer Scientist and Associate Director of the Networking and Cybersecurity Division at USC-ISI. He received his MS and BS in Computer Science from the University of Maryland. His current research interests include cybersecurity and privacy for critical infrastructure and cyber-physical systems including automotive and autonomous vehicles, experimentation and test, technology transition, and multidisciplinary research. He is the Community Outreach Director for SPHERE.

---

Join Trusted CI's announcements mailing list for information about upcoming events. To submit topics or requests to present, see our call for presentations. Archived presentations are available on our site under "Past Events."

Monday, March 4, 2024

Trusted CI Webinar: Lessons from the ACCORD project, March 18th @11am Eastern

Ron Hutchins and Tho Nguyen are presenting the talk, Lessons from the ACCORD Project, on March 18th at 11am Eastern time.

Please register here.

The ACCORD cyberinfrastructure project at the University of Virginia (UVA) successfully developed and deployed a community infrastructure providing access to secure research computing resources for users at underserved, minority-serving, and non-PhD-granting institutions. ACCORD's operational model is built around balancing data protection with accessibility. In addition to providing secure research computing resources and services, key outcomes of ACCORD include creation of a set of policies that enable researchers external to UVA to access and use ACCORD. While the ACCORD expedition achieved its technical and operational goals, its broader mission of broadening access to underserved users had limited success. Toward gaining a better understanding of the barriers to researchers accessing ACCORD, our team carried out two community outreach efforts to engage with researchers and computing service leaders to hear their pain points as well as solicit their input for an accessible community infrastructure.

In this talk, we will describe the ACCORD infrastructure and its operational model. We will also discuss insights from our effort to develop policies to balance accessibility with security. Finally, we will share lessons learned from community outreach efforts to understand institutional and social barriers to access.

Speaker Bios:

Ron Hutchins: In the early 1980s, Ron worked at the Georgia Institute of Technology to create a networking laboratory in the College of Computing, teaching data communications courses there. After moving to the role of Director of Campus Networks in 1991, Ron founded and led the Southern Crossroads network aggregation (SoX) across the Southeast. In 2001, after receiving his PhD in computer networks, he took on the role of Chief Technology Officer for the campus. In August of 2015, Ron moved into the role of Vice President of Information Technology for the University of Virginia, working to build partnerships across the campus. Recently, Ron has moved from VP to research faculty in the Computer Science department at UVA and is participating broadly across networking and research computing in general, including work with the State of California building out the broadband fiber network backbone across the state.

Tho Nguyen is a computer science and policy expert. He served as project manager for the ACCORD effort from 2019-2021, and continues to support the project implementation and growth. Nguyen is currently a Senior Program Officer at the National Academies of Sciences, Engineering, and Medicine. From 2015-2021 Nguyen was on the research staff in the Department of Computer Science at the University of Virginia where he worked on compute-in-memory and developing HPCs for research. Prior to UVA, he was a AAAS Science and Technology Policy Fellow at the National Science Foundation where he worked primarily on the Cyber Physical Systems program. Nguyen holds a PhD in Systems & Controls (Electrical Engineering) from the University of Washington.

---

Tuesday, September 12, 2023

Trusted CI Webinar: Improving the Privacy and Security of Data for Wastewater-based Epidemiology, Sept. 25th @ 11am ET

Arizona State University's Ni Trieu is presenting the talk, Improving the Privacy and Security of Data for Wastewater-based Epidemiology, on September 25th at 11am Eastern time.

Please register here.

As the use of wastewater for public health surveillance continues to expand, inevitably sample collection will move from centralized wastewater treatment plants to sample collection points within the sewer collection system to isolate individual neighborhoods and communities. Collecting data at this geospatial resolution will help identify variation in select biomarkers within neighborhoods, ultimately making the wastewater-derived data more actionable. However, a challenge in achieving this is the nature of the wastewater collection system, which aggregates and commingles wastewater from various municipalities. Thus, various stakeholders from different cities must collectively provide information to separate wastewater catchments to achieve neighborhood-specific public health information. Data sharing restrictions and the need for anonymity complicate this process.

This talk presents our approaches to enabling data privacy in wastewater-based epidemiology. Our methodology is built upon a cryptographic technique, Homomorphic Encryption (HE), ensuring privacy. Additionally, we outline a technique to enhance the performance of HE, which could be of independent interest.
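To give a feel for what Homomorphic Encryption makes possible, here is a minimal sketch of an additively homomorphic scheme. The announcement does not say which HE scheme or parameters the speaker uses, so everything here is an assumption for illustration: the scheme is a toy version of Paillier, the primes are far too small to be secure, and the function names and the per-city values are hypothetical. It shows an aggregator summing encrypted values from several contributors without ever seeing the individual values.

```python
import math
import random


def keygen(p: int, q: int):
    """Build a toy Paillier key from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, lam, mu  # n is public; lam and mu are private


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (0 <= m < n) under public key n, using generator g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n: int, lam: int, mu: int, c: int) -> int:
    """Recover the plaintext with the private values lam and mu."""
    n2 = n * n
    l_value = (pow(c, lam, n2) - 1) // n
    return (l_value * mu) % n


# Hypothetical per-city measurements; none of these numbers come from the talk.
n, lam, mu = keygen(1009, 1013)
counts = [120, 75, 310]
ciphertexts = [encrypt(n, c) for c in counts]

# The aggregator combines ciphertexts without decrypting any single one.
total_ct = ciphertexts[0]
for ct in ciphertexts[1:]:
    total_ct = add(n, total_ct, ct)

total = decrypt(n, lam, mu, total_ct)
print(total)  # 505
```

Only the holder of the private values can read the total, and the individual contributions stay hidden from the party doing the addition. A real deployment would use a vetted library and much larger keys.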

Speaker Bio:

Ni Trieu is currently an Assistant Professor at Arizona State University (ASU). Her research interests lie in the area of cryptography and security, with a specific focus on secure computation and its applications such as private set intersection, private database queries, and privacy-preserving machine learning. Prior to joining ASU, she was a postdoc at UC Berkeley. She received her Ph.D. degree from Oregon State University.

---

Monday, June 12, 2023

Trusted CI Webinar: SecureMyResearch at Indiana University: Effective Cybersecurity for Research, June 26th @ 11am Eastern

Members from Indiana University's Center for Applied Cybersecurity Research are presenting the talk, SecureMyResearch at Indiana University: Effective Cybersecurity for Research, on June 26th at 11am (Eastern).

Please register here.

The tension between research and cybersecurity has long hampered efforts to secure research. It has kept past institutional cybersecurity efforts concentrated on the most sensitive research, but new threats to research integrity and recent federal initiatives such as NSPM-33 are now pointing to a future where securing research holistically is no longer optional. Indiana University launched a pilot in 2020 called SecureMyResearch to expand to the entire campus a research cybersecurity model culminating from years of interaction with biomedical researchers in the School of Medicine. Turning the traditional approach on its head, it aimed to reduce the cybersecurity and compliance burden on the researcher by making cybersecurity invisible. It was laser-focused on the research mission and on accommodating the pace of research. Three years later, the results are showing great promise in breaking the research versus security impasse. Not only have we reached 80 percent penetration on campus, but researchers are embracing the service voluntarily and research is being accelerated measurably. In this webinar we will share IU’s research cybersecurity journey and the SecureMyResearch implementation.

https://cacr.iu.edu/projects/SecureMyResearch/index.html

Speaker Bios:

Anurag Shankar provides leadership at CACR in regulatory compliance (HIPAA, FISMA, and DFARS/CMMC), research cybersecurity, and cyber risk management. He developed and leads the SecureMyResearch effort at IU. He has over three decades of experience conducting research, developing and delivering research computing services, building HIPAA compliant solutions for biomedical researchers, conducting cybersecurity assessments, and providing consulting. He is a computational astrophysicist by training (Ph.D. 1990, U. of Illinois).

Will Drake is a senior security analyst, CISO at CACR, and the SecureMyResearch lead. Will has worked in various IT roles with Indiana University since 2012, including Operations Supervisor for UITS Data Center Operations and Lead Systems Engineer for the Campus Communications Infrastructure team where he was responsible for ensuring the security of IU’s critical telecommunications infrastructure. Will holds an Associate’s Degree in Computer Information Technology from Ivy Tech and is currently pursuing a Bachelor’s Degree in Informatics with a specialization in Legal Informatics from IUPUI’s School of Informatics and Computing.

Tim Daniel is an information security analyst at CACR and a member of the SecureMyResearch team. Previously, Tim worked for a contract research organization carrying out phase 1 and pre-phase 1 clinical trials for veterinary medicine. He holds a bachelor’s degree in biology with a focus in chemistry, and an associate's degree in applied biotechnology. After high school, Tim worked for Stone Belt, a nonprofit that provides resources and supports for individuals with disabilities, where he learned patience and listening skills.

---

Friday, November 20, 2020

Open Science Cyber Risk Profile (OSCRP), and Data Confidentiality and Data Integrity Reports Updated

In April 2017, Trusted CI released the Open Science Cyber Risk Profile (OSCRP), a document designed to help principal investigators and their supporting information technology professionals assess cybersecurity risks related to open science projects. The OSCRP was the culmination of extensive discussions with research and education community leaders, and has since become a widely-used resource, including numerous references in recent National Science Foundation (NSF) solicitations.

The OSCRP has always been intended to be a living document. In order to gather material for continued refreshing of ideas, Trusted CI has spent the past couple of years performing in-depth examination of additional topics for inclusion in a revised OSCRP. In 2019, Trusted CI examined the causes of random bit flips in scientific computing and common measures used to mitigate the effects of “bit flips.” Its report, “An Examination and Survey of Random Bit Flips and Scientific Computing,” was issued in December 2019. In order to address the community's need for insights on how to start thinking about computing on sensitive data, in 2020, Trusted CI examined data confidentiality issues and solutions in academic research computing. Its report, “An Examination and Survey of Data Confidentiality Issues and Solutions in Academic Research Computing,” was issued in September 2020.

Both reports have now been updated, with the current versions being made available at the links to the report titles above. In conjunction, the Open Science Cyber Risk Profile (OSCRP) itself has also been refreshed with insights from both data confidentiality and data integrity reports.

All of these documents will continue to be living reports that will be updated over time to serve community needs. Comments, questions, and suggestions about this post and these documents are always welcome at info@trustedci.org.

Thursday, November 19, 2020

Trusted CI Webinar: Trustworthy Data panel Mon Dec 7 @11am Eastern

The Trustworthy Data Working Group is hosting a panel on Monday December 7th at 11am (Eastern) to discuss tools, standards, and community practices for trustworthy scientific data sharing. Our panelists are:

Jim Basney: Deputy Director, Trusted CI
Sandra Gesing: Associate Research Professor, Department of Computer Science and Engineering; Computational Scientist, Center for Research Computing; University of Notre Dame (PresQT)
Bob Hanisch: Director of Office of Data & Informatics, NIST
Rebecca Koskela: Executive Director, Research Data Alliance - US (RDA-US)

Please register here. Be sure to check spam/junk folder for registration confirmation email.

The Trustworthy Data Working Group (TDWG) is a collaborative effort of Trusted CI, the four NSF Big Data Innovation Hubs, the NSF CI CoE Pilot, the Ostrom Workshop on Data Management and Information Governance, the NSF Engagement and Performance Operations Center (EPOC), the Indiana Geological and Water Survey, the Open Storage Network, and other interested community members. The goal of the working group is to understand scientific data security concerns and provide guidance on ensuring the trustworthiness of data.

This year the TDWG published a survey report about data security concerns and practices amongst the scientific community. And, building off the insights of the survey report, the working group published a guidance report on trustworthy data for science projects, including science gateways, that covers the topics that the panel will be discussing.

This panel is an opportunity to discuss the work of the TDWG in the larger context of related work by PresQT, NIST, and RDA-US.

Monday, November 2, 2020

PEARC20: Another successful workshop and training at PEARC

Trusted CI had another successful exhibition at PEARC20.

We hosted our Fourth Workshop on Trustworthy Scientific Cyberinfrastructure for our largest audience to date. The topics covered during this year's workshop were:

Community Survey Results from the Trustworthy Data Working Group (slides)

Presenters: Jim Basney, NCSA / Trusted CI; Jeannette Dopheide, NCSA / Trusted CI; Kay Avila, NCSA / Trusted CI; Florence Hudson, Northeast Big Data Innovation Hub / Trusted CI

Characterization and Modeling of Error Resilience in HPC Applications (slides)

Presenter: Luanzheng Guo, University of California-Merced

Trusted CI Fellows Panel (slides)

Moderator: Dana Brunson, Internet2
Panelists: Jerry Perez, University of Texas at Dallas; Laura Christopherson, Renaissance Computing Institute; Luanzheng Guo, University of California, Merced; Songjie Wang, University of Missouri; Smriti Bhatt, Texas A&M University - San Antonio; Tonya Davis, Alabama A&M University

Analysis of attacks targeting remote workers and scientific computing infrastructure during the COVID19 pandemic at NCSA/UIUC (slides)

Presenters: Phuong Cao, NCSA / University of Illinois at Urbana-Champaign; Yuming Wu, Coordinated Science Laboratory / University of Illinois at Urbana-Champaign; Satvik Kulkarni, University of Illinois at Urbana-Champaign; Alex Withers, NCSA / University of Illinois at Urbana-Champaign; Chris Clausen, NCSA / University of Illinois at Urbana-Champaign

Regulated Data Security and Privacy: DFARS/CUI, CMMC, HIPAA, and GDPR (slides)

Presenters: Erik Deumens, University of Florida; Gabriella Perez, University of Iowa; Anurag Shankar, Indiana University

Securing Science Gateways with Custos Services (slides)

Presenters: Marlon Pierce, Indiana University; Enis Afgan, Johns Hopkins University; Suresh Marru, Indiana University; Isuru Ranawaka, Indiana University; Juleen Graham, Johns Hopkins University

We will post links to the recordings when they are made public.

In addition to the workshop, Trusted CI team member Kay Avila co-presented a Jupyter security tutorial titled “The Streetwise Guide to Jupyter Security” (event page) with Rick Wagner. This presentation was based on the “Jupyter Security” training developed by Rick Wagner, Matthias Bussonnier, and Trusted CI’s Ishan Abhinit and Mark Krenz for the 2019 NSF Cybersecurity Summit.

Monday, October 12, 2020

Trusted CI Webinar: Enforcing Security and Privacy Policies to Protect Research Data Mon Oct 26 @11am Eastern

University of Virginia's Yuan Tian is presenting the webinar, Enforcing Security and Privacy Policies to Protect Research Data, on Monday October 26th at 11am (Eastern).

Please register here. Be sure to check spam/junk folder for registration confirmation email.

Advances in computer systems over the past decade have laid a solid foundation for data collection at a staggering scale. Data generated from end-user devices has tremendous value to the research community. For example, mobile and Internet-of-Things devices can participate in large-scale Internet-based measurement or monitoring of patients' health conditions. While ground-breaking discoveries may occur, malicious attacks or unintentional data leaks threaten the research data. Such a threat is hard to predict and difficult to recover from once it happens. Preventative and defensive measures should be taken where data is generated in order to protect private, valuable data from attackers. Currently, there are efforts that try to regulate data management; for example, a research application might have a privacy policy that describes how the user data is being collected and protected. However, there is a disconnect between these documented policies and the implementations of a research project.

In this talk, I’ll present our research, which interprets the documented policies automatically with NLP (natural language processing) and enforces them in the code of research projects, in order to protect the privacy of research data. This work can significantly reduce researchers' overhead in implementing policy-compliant code and reduce the complexity of protecting research datasets.

Speaker Bio:

Yuan Tian is an Assistant Professor of Computer Science at the University of Virginia. Her research focuses on security and privacy and their interactions with systems and machine learning. Her work has had real-world impact on platforms such as iOS, Chrome, and Azure. She is a recipient of the NSF CAREER Award 2020, Amazon Faculty Fellowship 2019, CSAW Best Paper Award 2019, and Rising Stars in EECS 2016.

Thursday, October 1, 2020

Requesting Feedback on Initial Report and Upcoming Webinar: Guidance for Trustworthy Data Management in Science Projects

The Trustworthy Data Working Group has published an initial draft report at https://doi.org/10.5281/zenodo.4056241 on guidance for trustworthy data management in science projects.

We invite the community’s feedback on the initial version of this report and input toward our revisions via the working group mailing list. You may also send input directly to Jim Basney at jbasney@illinois.edu. Please attend the Science Gateways webinar on Wednesday, October 7th at 1pm Eastern, where Jim will be presenting an overview of the guidance report.

This report builds off key findings from its previously published survey report regarding trustworthy data and provides recommendations to address those concerns. The report covers stakeholders of trustworthy data, the definition of trustworthiness, findings from the survey report, barriers to trustworthiness, tools and technologies for trustworthy data, and communication of trustworthiness.

We thank all the members of the Trustworthy Data Working Group for their help with developing this guidance as well as their participation throughout the year. The working group will be revising its guidance in November, incorporating community input received in October, to be included in the working group's final report in December.

Working group membership is open to all who are interested. Please visit https://www.trustedci.org/2020-trustworthy-data for details.

Thursday, September 10, 2020

Data Confidentiality Issues and Solutions in Academic Research Computing

Many universities have needs for computing with “sensitive” data, such as data containing protected health information (PHI), personally identifiable information (PII), or proprietary information. Sometimes this data is subject to legal restrictions, such as those imposed by HIPAA, CUI, FISMA, DFARS, GDPR, or the CCPA, and at other times, data may simply not be sharable per a data use agreement. It may be tempting to think that such data is typically only in the domain of DOD and NIH funded research, but it turns out that this assumption is far from reality. While this issue arises in numerous scientific domains, including ones that people might immediately think of, such as medical research, it also arises in numerous others, including economics, sociology, and other social sciences that might look at financial data, student data or psychological records; chemistry and biology particularly that which relates to genomic analysis and pharmaceuticals, manufacturing, and materials; engineering analyses, such as airflow dynamics; underwater acoustics; and even computer science and data analysis, including advanced AI research, quantum computing, and research involving system and network logs. Such research is funded by an array of sponsors, including the National Science Foundation (NSF) and private foundations.

Few organizations currently have computing resources appropriate for sensitive data. Many universities have started thinking about how to enable computing on sensitive data, but may not know where to start.

In order to address the community need for insights on how to start thinking about computing on sensitive data, in 2020, Trusted CI examined data confidentiality issues and solutions in academic research computing. Its report, “An Examination and Survey of Data Confidentiality Issues and Solutions in Academic Research Computing,” was issued in September 2020. The report is available at the following URL:

https://escholarship.org/uc/item/7cz7m1ws

The report examined both the varying needs involved in analyzing sensitive data and also a variety of solutions currently in use, ranging from campus and PI-operated clusters to cloud and third-party computing environments to technologies like secure multiparty computation and differential privacy. We also discussed procedural and policy issues involved in campuses handling sensitive data.

Our report was the result of numerous conversations with members of the community. We thank all of them and are pleased to acknowledge those who were willing to be identified here and also in the report:

Thomas Barton, University of Chicago, and Internet2
Sandeep Chandra, Director for the Health Cyberinfrastructure Division and Executive Director for Sherlock Cloud, San Diego Supercomputer Center, University of California, San Diego
Erik Deumens, Director of Research Computing, University of Florida
Robin Donatello, Associate Professor, Department of Mathematics and Statistics, California State University, Chico
Carolyn Ellis, Regulated Research Program Manager, Purdue University
Bennet Fauber, University of Michigan
Forough Ghahramani, Associate Vice President for Research, Innovation, and Sponsored Programs, Edge, Inc.
Ron Hutchins, Vice President for Information Technology, University of Virginia
Valerie Meausoone, Research Data Architect & Consultant, Stanford Research Computing Center
Mayank Varia, Research Associate Professor of Computer Science, Boston University

For the time being, this report is intended as a standalone initial draft for use by the academic computing community. Later in 2020, this report will be accompanied by an appendix with additional technical details on some of the privacy-preserving computing methods currently available.

Finally, in late 2020, we also expect to integrate issues pertaining to data confidentiality into a future version of the Open Science Cyber Risk Profile (OSCRP). The OSCRP is a document that was first created in 2016 to develop a “risk profile” for scientists to help understand risks to their projects via threats posed through scientific computing. While the first version included issues in data confidentiality, a revised version will include some of our additional insights gained in developing this report.

As with many Trusted CI reports, both the data confidentiality report and the OSCRP are intended to be living reports that will be updated over time to serve community needs. It is our hope that this new report helps answer many of the questions that universities are asking, but also that it begins conversations in the community and results in questions and feedback that will help us to make improvements to this report over time. Comments, questions, and suggestions about this post and both documents are always welcome at info@trustedci.org.

Going forward, the community can expect additional reports from us on the topics mentioned above, as well as a variety of other topics. Please watch this space for future blog posts on these studies.

Thursday, July 9, 2020

PEARC20: Join us at the Fourth Workshop on Trustworthy Scientific Cyberinfrastructure

Join us at the Fourth Workshop on Trustworthy Scientific Cyberinfrastructure at PEARC20 on Monday July 27th, 8:00am - 12:00pm Pacific Time (11:00am - 3:00pm Eastern Time / 15:00 - 19:00 UTC). The workshop provides an opportunity for sharing experiences, recommendations, and solutions for addressing cybersecurity challenges in research computing. It also provides a forum for information sharing and discussion among a broad range of attendees, including cyberinfrastructure operators, developers, and users.

The workshop is organized according to the following goals:

Increase awareness of activities and resources that support the research computing community's cybersecurity needs.
Share information about cybersecurity challenges, opportunities, and solutions among a broad range of participants in the research computing community.
Identify shared cybersecurity approaches and priorities among workshop participants through interactive discussions.

Schedule

See our workshop page for the full presentation abstracts. The order of presentations is subject to change and will be posted to the workshop page.

8:00 am Pacific / 11:00 am Eastern

Community Survey Results from the Trustworthy Data Working Group

Presenters: Jim Basney, NCSA / Trusted CI
Jeannette Dopheide, NCSA / Trusted CI
Kay Avila, NCSA / Trusted CI
Florence Hudson, Northeast Big Data Innovation Hub / Trusted CI

8:30 am Pacific / 11:30 am Eastern

Characterization and Modeling of Error Resilience in HPC Applications

Presenter: Luanzheng Guo, University of California-Merced

9:00 am Pacific / 12:00 pm Eastern

Trusted CI Fellows Panel

Moderator: Dana Brunson, Internet2
Panelists: Jerry Perez, University of Texas at Dallas
Laura Christopherson, Renaissance Computing Institute
Luanzheng Guo, University of California, Merced
Songjie Wang, University of Missouri
Smriti Bhatt, Texas A&M University - San Antonio
Tonya Davis, Alabama A&M University

9:30 - 10:30 am Pacific / 12:30 pm - 1:30 pm Eastern

Break/Lunch

10:30 am Pacific / 1:30 pm Eastern

Analysis of attacks targeting remote workers and scientific computing infrastructure during the COVID19 pandemic at NCSA/UIUC

Presenters: Phuong Cao, NCSA/U of Illinois at Urbana-Champaign
Yuming Wu, Coordinated Science Lab/UIUC
Satvik Kulkarni, U of Illinois at Urbana-Champaign
Alex Withers, NCSA/U of Illinois at Urbana-Champaign
Chris Clausen, NCSA/U of Illinois at Urbana-Champaign

11:00 am Pacific / 2:00 pm Eastern

Regulated Data Security and Privacy: DFARS/CUI, CMMC, HIPAA, and GDPR

Presenters: Erik Deumens, University of Florida
Gabriella Perez, University of Iowa
Anurag Shankar, Indiana University

11:30 am Pacific / 2:30 pm Eastern

Securing Science Gateways with Custos Services

Presenters: Marlon Pierce, Indiana University
Enis Afgan, Johns Hopkins University
Suresh Marru, Indiana University
Isuru Ranawaka, Indiana University
Juleen Graham, Johns Hopkins University

For any questions regarding this workshop, please contact workshop-cfp@trustedci.org.

Thursday, July 2, 2020

Survey Report: Scientific Data Security Concerns and Practices

The Trustworthy Data Working Group has published a report at https://doi.org/10.5281/zenodo.3906865 that summarizes the results from our survey of scientific data security concerns and practices. 111 participants completed the survey from a wide range of positions and roles within their organizations and projects. We invite the community’s feedback on this report and input to the ongoing work of the working group via the working group mailing list. You may also send input directly to Jim Basney at jbasney@illinois.edu.

Next, the working group will be developing guidance on trustworthy data for science projects and cyberinfrastructure developers, based on the survey results and on resources from NIST, RDA, ESIP and others. Related work includes NIST 1800-25, the TRUST Principles for Digital Repositories, and Risk Assessment for Scientific Data. The working group will also be providing input into the next revision of the Open Science Cyber Risk Profile (OSCRP).

Working group membership is open to all who are interested. Please visit https://www.trustedci.org/2020-trustworthy-data for details.

Tuesday, June 23, 2020

Fantastic Bits and Why They Flip

In 2019, Trusted CI examined the causes of random bit flips in scientific computing and common measures used to mitigate the effects of “bit flips.” (In a separate effort, we will also be issuing a similar report on data confidentiality needs in science.) Its report, “An Examination and Survey of Random Bit Flips and Scientific Computing,” was issued a few days before the winter holidays in December 2019. As news of the report was buried amidst the winter holidays and New Year, we are pleased to highlight the report in a bit more detail now. This post is longer than most of Trusted CI’s blog posts to give you a feel for the report and hopefully entice you to read it.

For those reading this who are not computer scientists, some background: what in the world is a “bit,” how can one “flip,” and what makes a flip occur randomly? Binary notation is the base-2 representation of numbers as combinations of the digits 0 and 1, in contrast to the decimal notation most of us use in our daily lives, which represents numbers as combinations of the digits 0 through 9. In binary notation, a “bit” is the atomic element of the representation: a single 1 or 0. Bits (0s or 1s) can be combined to represent numbers larger than 1 in the same way that decimal digits can be combined to represent numbers larger than 9.
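As a small illustration of how bits combine into larger numbers, here is a minimal Python sketch. The function names `to_bits` and `from_bits` are made up for this example and do not come from the report.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Write a non-negative integer in base 2, zero-padded to `width` digits."""
    return format(value, f"0{width}b")


def from_bits(bits: str) -> int:
    """Combine a string of 0s and 1s back into the number it represents."""
    total = 0
    for digit in bits:
        # Each step shifts the running total one binary place and adds the new bit,
        # just as reading "42" in decimal is 4 * 10 + 2.
        total = total * 2 + int(digit)
    return total


print(to_bits(9))          # 00001001
print(from_bits("1010"))   # 10
```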

Binary notation has been in use for many hundreds of years. The manipulation of binary numbers made significant advances in the mid-19th century through the efforts of George Boole, who introduced what was later referred to as Boolean algebra or Boolean logic. This advance in mathematics, combined with electronic advances in switching circuits and logic gates by Claude Shannon (and others) in the 1930s, led to binary storage and logic as the basis of computing. As such, binary notation, with numbers represented as bits, has been the basis of how most computers have stored and processed information since the inception of electronic computers.

However, while we see the binary digits 0 and 1 as discrete, opposite, and rigid representations, in the same way that North and South represent directions, the components of a computer that underlie these representations are analog, and at that level 0 and 1 are closer to shades of grey. Bits are typically stored magnetically and transmitted as electrical charges, and both magnetism and electrical charge can degrade or be altered by external forces, including cosmic rays or other forms of radiation and magnetism. To a computer, a “bit flip” is the change of the representation of a number from a 0 to a 1 or vice versa. Underlying that “flip” could be a burst of radiation that instantly altered magnetic storage or electrical transmission, or the slow degradation of a magnetically stored bit from something close to 1, or a “full” magnetic charge, to something less than 0.5, at which point it would be recognized and interpreted as a 0.
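To make the effect of a flip concrete, the following sketch flips single bits in software. It is only an illustration: the helper names are hypothetical, and it assumes the 64-bit IEEE 754 layout that Python uses for floating-point numbers. It shows that the damage depends heavily on which bit is hit.

```python
import struct


def flip_bit(value: int, position: int) -> int:
    """Flip one bit of an integer: XOR with a mask that has a single 1."""
    return value ^ (1 << position)


def flip_float_bit(x: float, position: int) -> float:
    """Flip one bit (0 = least significant) of a 64-bit IEEE 754 float."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", flip_bit(as_int, position)))[0]


print(flip_bit(0b00001001, 1))   # 11, i.e. 00001011
print(flip_float_bit(1.0, 0))    # lowest fraction bit: a change near the 16th decimal place
print(flip_float_bit(1.0, 62))   # highest exponent bit: 1.0 becomes inf
print(flip_float_bit(1.0, 63))   # sign bit: 1.0 becomes -1.0
```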

The use of error correction in computing and communication was pioneered in the 1940s and 1950s by Richard Hamming, who used some form of redundancy to help identify and mask the effects of bit flips. Despite the creation of these techniques 70–80 years ago, error correction is still not universally used. And, even when it is, there are limits to the number of errors that a particular blob of data (a number, a file, a database) can incur before the errors can no longer be corrected, or even detected at all.
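One classic example of such redundancy is the Hamming(7,4) code, which stores 4 data bits alongside 3 parity bits. The sketch below is a minimal illustration rather than a description of any scheme the report recommends. The function names are made up for this example. It corrects any single flipped bit and also shows the limit: two flips in one codeword are silently "corrected" to the wrong value.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 0 means no error was detected
    if position:
        c[position - 1] ^= 1          # the syndrome names the flipped position
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
sent = encode(word)

one_flip = list(sent)
one_flip[4] ^= 1
print(decode(one_flip) == word)   # True: a single flip is corrected

two_flips = list(sent)
two_flips[0] ^= 1
two_flips[4] ^= 1
print(decode(two_flips) == word)  # False: two flips exceed what this code can correct
```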

The report that Trusted CI published last year describes the ways in which bit flips occur. These include isolated single errors due to some kind of interference; bursty faults affecting a number of sequential bits, due to some kind of mechanical failure or electrical interference; and malicious tampering. The document then narrows to focus on isolated errors. Malicious tampering, for example, is the focus of future reports, as are data errors or loss due to improper scientific design, mis-calibrated sensors, and outright bugs, including unaccounted-for non-determinism in computational workflows, improper roundoff and truncation errors, hardware failures, and “natural” faults.

The report then describes why single-bit faults occur (for example, via cosmic rays, ionizing radiation, and corrosion in metal), the odds of such faults occurring in a variety of computing components, and potential mitigation mechanisms. The goal is to help scientists understand the risk that bit faults lead either to scientific data that is in some way incorrect or to an inability to reproduce scientific results in the future; reproducibility is of course a cornerstone of the scientific process.

As part of the process of documenting mitigation mechanisms, the authors of the report surveyed an array of scientists with scientific computing workflows, as well as operators of data repositories and of computing systems ranging from small clusters to large-scale DOE and NSF high-performance computing systems. The report also discusses the impact of bit flips on science. For example, in some cases, including certain types of metadata, corrupt data might be catastrophic. In other cases, such as images, or situations where multiple data streams are already being collected that cross-validate each other, the flip of a single bit or even a small handful of bits is largely or entirely lost in the noise. Finally, the report collects these mechanisms into a set of practices, divided by components involved in scientific computing, that scientists may wish to consider implementing in order to protect their data and computation: for example, using strong hashing before storing or transmitting data, file systems with automated integrity repair built in, disks with redundancy built in, and even leveraging fault-tolerant algorithms where possible.
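The "strong hashing" practice can be sketched in a few lines of Python. This example assumes SHA-256 as the hash (the post does not name a specific algorithm), and the data and helper names are invented for illustration. Note that a hash only detects corruption; repairing it requires a redundant copy or an error-correcting code.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Check data read back from storage or a network against a recorded digest."""
    return sha256_hex(data) == expected_digest


original = b"temperature_K,293.15\n"   # made-up stand-in for a data file
recorded = sha256_hex(original)        # computed before storing or transmitting

corrupted = bytearray(original)
corrupted[3] ^= 0b00000100             # simulate a single bit flip

print(is_intact(original, recorded))          # True
print(is_intact(bytes(corrupted), recorded))  # False
```

For large files, the same digest can be built up incrementally by feeding chunks to the hash object's `update()` method instead of loading the whole file into memory.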

For the time being, this report is intended as a standalone first draft for use by the scientific computing community. Later in 2020, this report will be combined with insights from the Trusted CI “annual challenge” on trustworthy data to more broadly offer guidance on integrity issues beyond bit flips. Finally, in late 2020, we expect to integrate issues pertaining to bit flips into a future version of the Open Science Cyber Risk Profile (OSCRP). The OSCRP is a document that was first created in 2016 to develop a “risk profile” for scientists to help understand risks to their projects via threats posed through scientific computing. While the first version included issues in data integrity, a revised version will include bit flips more directly and in greater detail.

As with many Trusted CI reports, both the bit flip report and the OSCRP are intended to be living reports that will be updated over time to serve community needs. As such, comments, questions, and suggestions about this post and both documents are always welcome at info@trustedci.org.

Going forward, the community can expect additional reports from us on the topics mentioned above, as well as a variety of other topics. Please watch this space for future blog posts on these studies.

Tuesday, April 21, 2020

Study of scientific data security concerns and practices

The Trustworthy Data Working Group invites scientific researchers and the cyberinfrastructure professionals who support them to complete a short survey about scientific data security concerns and practices.

The working group is a collaborative effort of Trusted CI, the four NSF Big Data Innovation Hubs, the NSF CI CoE Pilot, the Ostrom Workshop on Data Management and Information Governance, the NSF Engagement and Performance Operations Center (EPOC), the Indiana Geological and Water Survey, the Open Storage Network, and other interested community members. The goal of the working group is to understand scientific data security concerns and provide guidance on ensuring the trustworthiness of data.

The purpose of this survey is to:

Improve broad understanding of the range of data security concerns and practices for open science
Provide input and help shape new guidance for science projects and cyberinfrastructure providers
Serve as an opportunity to consider local data security concerns during a voluntary, follow-up interview

Please visit https://surveys.illinois.edu/sec/281601 to complete the survey (before May 31st), and please share this announcement to help us obtain a broad set of responses representing a diversity of perspectives across the scientific community. Multiple individuals from the same organization/project are welcome to take the survey.

Survey results, along with the analysis and applicable guidance, will be published by the Trustworthy Data Working Group as a freely available report by the end of 2020. Please visit https://trustedci.org/trustworthy-data for updated information about the study.

For any questions or comments, please contact Jim Basney at jbasney@illinois.edu.

Friday, January 24, 2020

Invitation to Join Trustworthy Data Working Group

In February 2020, Trusted CI will launch a new Trustworthy Data Working Group. With the recent renewal by NSF, Trusted CI is focusing each year on a new challenge to NSF science, and this working group represents our inaugural effort for 2020. The group will survey science projects to learn about data security concerns and practices. We will analyze the survey results and develop guidance for science projects and cyberinfrastructure developers, including references to existing resources. We will then publish the survey results, along with the analysis and guidance, as a freely-available report by the end of 2020, and we will share the results in a Trusted CI webinar and in other venues.

To form this working group, Trusted CI is collaborating with the four NSF Big Data Innovation Hubs, the NSF CI CoE Pilot, the Ostrom Workshop on Data Management and Information Governance, the NSF Engagement and Performance Operations Center (EPOC), the Indiana Geological and Water Survey, the Open Storage Network, and other interested community members. Participation in the working group is open to all.

To participate:

Visit https://trustedci.org/2020-trustworthy-data to comment on the working group’s draft charter and join the mailing list.
Complete the poll at https://doodle.com/poll/hhswbyu4927638dr to help us select a weekly time for a working group call.
Respond to our survey when it is announced and comment on our draft report when it is published.

Any questions/comments? Join the discussion on the mailing list or contact the working group chair (Jim Basney).

Tuesday, July 10, 2018

CCoE Webinar July 23rd at 11am ET: Trustworthy Computing for Scientific Workflows

Mayank Varia and Andrei Lapets are presenting the talk "RSARC: Trustworthy Computing over Protected Datasets" on Monday July 23rd at 11am (Eastern).

Please register here. Be sure to check your spam/junk folder for the registration confirmation email.

There has been an unprecedented increase in the quantity of research data available in digital form. Combining these information sources within analyses that leverage cloud computing frameworks and big data analytics platforms has the potential to lead to groundbreaking innovations and scientific insights. As developers and operators of the widely used Dataverse repository and the Massachusetts Open Cloud platform, we have been working to advance this innovative revolution by colocating datasets in common platforms, curating and tagging datasets with both functional and legal access policies, offering helper services such as search and easy citation to promote sharing, and providing on-demand computational platforms to ease analytics. Unfortunately, we observe that a certain segment of our scientific user base cannot enjoy the full transformative capacity achievable within our cyberinfrastructure. Due to concerns over the privacy and confidentiality of their data sources, or the potential of commercial exploitation of their raw data sets, these researchers are isolating themselves within siloed data repositories and well-protected computational enclaves rather than sharing their datasets with fellow scientists.

This talk will describe cryptographic technological enhancements that are ready to provide scientific researchers with mechanisms to do collaborative analytics over their datasets while keeping those datasets protected and confidential. Secure multi-party computation (MPC) is a cryptographic technology that allows independent organizations to compute an analytic jointly over their data in such a manner that nobody learns anything other than the desired output. Hence, MPC empowers organizations to make their data available for collective data aggregation and analysis while still adhering to pre-existing confidentiality constraints, legal restrictions, or corporate policies governing data sharing. Our new Conclave framework can connect to many existing backend stacks where the data already live, can automatically analyze a query to identify when a computation must cross data silos, and can leverage MPC in a scalable and usable manner when it is necessary to enable the computation.
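To give a feel for how parties can compute jointly without revealing their inputs, here is a minimal sketch of additive secret sharing, one common building block of MPC. It is not Conclave's API or protocol. All names are hypothetical, the parties are simulated inside a single process, and the sketch assumes every party follows the protocol honestly.

```python
import secrets

MODULUS = 2**64  # all arithmetic is modulo this value; inputs and their sum must be smaller


def share(secret: int, parties: int) -> list[int]:
    """Split `secret` into random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs: list[int]) -> int:
    """Simulate parties jointly computing the sum of their private inputs."""
    n = len(private_inputs)
    # Row i holds the shares produced by party i; share j is sent to party j.
    dealt = [share(value, n) for value in private_inputs]
    # Each party adds up the shares it received, seeing only random-looking numbers.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published; combining them yields the total.
    return sum(partial_sums) % MODULUS


print(secure_sum([120, 45, 300]))  # 465
```

Any single share, or any single partial sum, looks like a uniformly random number on its own, which is why no party learns another party's input from the messages it receives in this simplified setting.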

In summary, while data sharing cyberinfrastructures today are intended to allow everyone to benefit from the initial cost of having one researcher collect data, privacy concerns (and the resulting breakdown of data sharing) transform this burden into a marginal cost that every researcher who wants access to the data must pay. We will describe how a holistic integration of secure MPC into a scientific computing infrastructure addresses a growing need in research computing: enabling scientific workflows involving collaborative experiments or replication/extension of existing results when the underlying data are encumbered by privacy constraints.

Mayank Varia is a research associate professor of computer science at Boston University and the co-director of the Center on Reliable Information Systems & Cyber Security (RISCS). His research interests span theoretical and applied cryptography and their application to problems throughout and beyond computer science. He currently directs an NSF Frontier project that addresses grand challenges in cloud security, aiming to design an architecture where the security of the system as a whole can be derived in a modular, composable fashion from the security of its components (bu.edu/macs). He received a Ph.D. in mathematics from MIT for his work on program obfuscation.

Andrei Lapets is Associate Professor of the Practice in Computer Science, Director of Research Development at the Hariri Institute for Computing, and Director of the Software & Application Innovation Lab at Boston University. His research interests include cybersecurity, formal methods and domain-specific programming language design, and data science. He holds a Ph.D. from Boston University, and A.B. and S.M. degrees from Harvard University.

Presentations are recorded and include time for questions with the audience.

Join Trusted CI's announcements mailing list for information about upcoming events. To submit topics or requests to present, see our call for presentations. Archived presentations are available on our site under "Past Events."

Tuesday, April 2, 2013

trust-HUB: an online community for hardware security and trust

A colleague pointed out the NSF-funded trust-HUB project to me last week: http://www.trust-hub.org/. trust-HUB, similar to CTSC's trustedci.org website, looks to build an online community working with hardware security and trust.