Wednesday, April 11, 2018

Trusted CI Webinar April 23rd at 11am ET: Toward Security-Managed Virtual Science Networks

Duke University's Jeff Chase and RENCI's Paul Ruth are presenting the talk, "Toward Security-Managed Virtual Science Networks" on April 23rd at 11am (Eastern).

Please register here. Be sure to check your spam/junk folder for the registration confirmation email.
Data-intensive science collaborations increasingly provision dedicated network circuits to share and exchange datasets securely at high speed, leveraging national-footprint research fabrics such as ESnet or I2/AL2S. This talk first gives an overview of new features to automate circuit interconnection of science resources across campuses and in network cloud testbeds, such as GENI (e.g., ExoGENI) and NSFCloud (e.g., Chameleon). Taken together, these tools can enable science teams to deploy secure, bandwidth-provisioned virtual science networks that link multiple campuses and/or virtual testbed slices, with integrated in-network processing on virtual cloud servers.

Next, we outline a software framework to address security issues arising in these virtual science networks. We show how to deploy virtual science networks with integrated security management programmatically, using software-defined networking and network function virtualization (SDN/NFV). As an example, we describe a prototype virtual Network Service Provider that implements SDX-like functionality for policy-based interconnection of its customers, and incorporates out-of-band monitoring of permitted flows using Bro intrusion detection instances hosted on cloud VMs. We also describe how to use a new logical trust system called SAFE to express and enforce access policies for edge peering and permitted flows, and to validate IP prefix ownership and routing authority (modeling RPKI and BGPSEC protocols) in virtual science networks.

This material is based upon work supported by the National Science Foundation under Grants ACI-1642140, ACI-1642142, CNS-1330659, and CNS-1243315, and through the Global Environment for Network Innovations (GENI) program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.

Jeffrey S. Chase is a Professor of Computer Science at Duke University. He joined Duke in 1995 after receiving his PhD in Computer Science from the University of Washington (Seattle). He was an early leader in automated management for cluster services, cloud hosting systems, and server energy management. He served as an architect in NSF's GENI project and is a principal of ExoGENI, a multi-campus networked cloud testbed.

Paul Ruth is a Senior Research Scientist at RENCI-UNC Chapel Hill.  He received his PhD in Computer Science from Purdue University in 2007.  He has been a primary contributor to the ExoGENI testbed since 2011 and is currently the networking lead for the NSF Chameleon testbed.

Join Trusted CI's announcements mailing list for information about upcoming events. To submit topics or requests to present, see our call for presentations. Archived presentations are available on our site under "Past Events."

Monday, April 9, 2018

Cyberinfrastructure Vulnerabilities 2018 Q1 Report

The Cyberinfrastructure Vulnerabilities team provides concise announcements on critical vulnerabilities that affect science cyberinfrastructure (CI) of research and education centers, including those threats which may impact scientific instruments. This service is available to all CI community members by subscribing to Trusted CI’s mailing lists.

We monitor a number of sources for software vulnerabilities of interest. For those issues which warrant alerts to the Trusted CI mailing lists, we also provide guidance on how operators and developers can reduce risks and mitigate threats. We coordinate with XSEDE and the NSF supercomputing centers on drafting and distributing alerts to minimize duplication of effort and benefit from community expertise.

Some of the sources we monitor for possible threats to CI include:


In 1Q2018 the Cyberinfrastructure Vulnerabilities team issued the following 3 vulnerability alerts to 91 subscribers:


If you wish to subscribe to the Cyberinfrastructure Vulnerability Alerts mailing list you may do so through https://list.iu.edu/sympa/subscribe/cv-announce-l. This mailing list is public and the archives are available through https://list.iu.edu/sympa/arc/cv-announce-l.

If you believe you have information on a cyberinfrastructure vulnerability, let us know by sending us an email at alerts@trustedci.org.

Tuesday, April 3, 2018

Single vs multiple users on a cluster node?

Trusted CI recently received the following query from Chester Langin and, with his permission, we are sharing his question and our answer:
As a security person, can you tell me the advantages and disadvantages of allowing more than one user on a cluster node at a time?  I ask because we just moved from Rocks/SGE to OpenHPC/SLURM.  Our old cluster allowed multiple users per node, so, with 20 cores as an example, users with jobs running 8, 8, and 4 cores could all be running on the same compute node.  This provides high efficiency.  Our new cluster apparently restricts this, so if the first user runs a job with, say, 8 cores, nobody else can use that same node and 12 cores go unused.  As a result, our users will notice jobs backing up in the queue.
Should we configure SLURM to allow multiple users per node?  Do you have a recommendation?  Can you give me pros and cons?
This is a classic example of a risk/reward trade-off. As you note in your question, allowing only a single user per node has the downside of lower efficiency. So what do you gain?

There are risks in allowing multiple users per node: user accounts are not as strong a guarantee of isolating users from each other as placing them on separate nodes. Bugs in the underlying system (and in the hypervisor, if we're talking about virtual machines), misconfigurations of the operating system, and errors in setting file permissions can allow information, potentially including sensitive data and credentials, to leak between users on the same node. Some examples include CVE-2017-15566, CVE-2017-5715, and CVE-2017-4924. Additionally, in two recent software assessments we found file system permissions set too permissively, allowing users to see each other's data.
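To make the file-permission risk concrete, here is a small self-contained sketch (the file names and the sandbox directory are hypothetical, for illustration only) of how such a leak can be spotted with a standard find:

```shell
# Toy demonstration of the file-permission leak described above:
# any file whose mode grants read access to "other" can be read by
# every co-located user on a shared node.
tmp=$(mktemp -d)
touch "$tmp/private.key" "$tmp/results.csv"
chmod 600 "$tmp/private.key"   # owner-only: isolated
chmod 644 "$tmp/results.csv"   # world-readable: visible to other users
# The check: list regular files readable by "other"
leaks=$(find "$tmp" -type f -perm -o=r)
echo "$leaks"
rm -rf "$tmp"
```

On a real cluster, the same kind of check run over home or scratch directories will surface files that co-located users could read.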

Hence you gain some risk reduction. We assume you can estimate the cost of the lost efficiency in terms of lost CPU time, but how do you estimate the benefit of the risk reduction so you can compare the two?

Unfortunately, quantifying this trade-off isn't trivial; it's a judgement call. Some questions to ask when determining which path makes sense for your system involve gauging the consequences of the security risks:
  • How big and diverse is your user community? If your users are all from a collaborating community or within the same institution, the consequences of data leakage could be lower. But if your users include competing research groups or companies, the stakes could be higher.
  • What type of data does your system handle? Is it regulated data or other sensitive data that would increase the impact of the risks in question?
  • How you handle an incident can greatly affect its consequences. How poised are you to handle an incident if one occurs? Do you have an incident response plan in place that you regularly exercise?
  • What is the risk tolerance of your stakeholders? Are you expected to squeeze every ounce of performance out of the system or is reputation considered more important? Is there any recent history related to security incidents that may impact this?
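If, after weighing these questions, you decide that node sharing fits your risk tolerance, SLURM supports it through its consumable-resource scheduler. A minimal slurm.conf sketch follows; the node names, counts, and partition names are placeholders, and plugin names and defaults vary by SLURM version, so check the documentation for your release:

```
# Schedule individual cores and memory rather than whole nodes
# (the default select/linear plugin allocates entire nodes).
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# Shared partition: jobs from different users may land on one node,
# each confined to its allocated cores.
PartitionName=shared Nodes=node[01-16] Default=YES State=UP

# Middle ground: a node may run several jobs concurrently, but only
# from a single user at a time.
PartitionName=peruser Nodes=node[17-20] ExclusiveUser=YES State=UP
```

Users with especially sensitive jobs can still request a whole node on a per-job basis with sbatch --exclusive, regardless of the partition policy.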