Thursday, March 12, 2020

Transition to Practice success story: Simplifying scientist access to cyberinfrastructure with CILogon

Service provides identity management, so research projects don’t have to.

[Want to learn the basics about Transition to Practice? Read an introduction to the Trusted CI Cybersecurity Technology Transition to Practice (TTP) program >>] 

CILogon enables researchers to log on to cyberinfrastructure (CI). CILogon provides an integrated open source identity and access management platform for research collaborations, combining federated identity management (Shibboleth, InCommon) with collaborative organization management (COmanage).

Jim Basney is a senior research scientist, cybersecurity division, National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign. Jim is also deputy director for Trusted CI. We spoke with Jim about CILogon and about its transition to practice.

TRUSTED CI: Please tell us about the scope of your work, and how CILogon fits into that.

I'm here in the security group at NCSA. We are focused on enabling secure access to computational resources for scientists.

One aspect of that is working with Trusted CI. In my role as the deputy director for Trusted CI, I help researchers with their cybersecurity challenges. That includes identity and access management but also cybersecurity policies, data management, and operational security topics -- a wide range of cybersecurity topics.

Outside of my Trusted CI work, I mainly focus on the topic of identity and access management. CILogon is one of the projects that I work on in that context.

I also work on a related project called SciTokens, which is about using JSON Web Tokens for access to scientific cyberinfrastructure.

We are integrating the research that's coming out of the SciTokens project into the CILogon service.
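For readers unfamiliar with JSON Web Tokens, the sketch below shows the general shape of one: three base64url-encoded parts (header, payload, signature) joined by dots. It is a minimal illustration using only the Python standard library, not CILogon or SciTokens code. The helper names, the issuer URL, and the claim values are all hypothetical, and the demo token is unsigned, so a real service would have to verify the signature before trusting any claim.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, the encoding used inside JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url by restoring the stripped '=' padding."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def unverified_claims(token: str) -> dict:
    """Return the payload claims WITHOUT checking the signature.

    Suitable for inspection only; never use unverified claims for
    access decisions.
    """
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Hypothetical header and payload, for illustration only.
demo_header = {"alg": "none", "typ": "JWT"}
demo_payload = {"iss": "https://issuer.example.org", "sub": "user-1234"}

# An unsigned demo token: header.payload.(empty signature)
demo_token = ".".join([
    b64url_encode(json.dumps(demo_header).encode("utf-8")),
    b64url_encode(json.dumps(demo_payload).encode("utf-8")),
    "",
])

print(unverified_claims(demo_token))
```

In practice a maintained JWT library would handle decoding and signature verification together; the manual decoding here is only meant to show what a token carries.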

TRUSTED CI: How will that help CILogon?

It's going to give researchers more options for authorizing access to the variety of scientific services that they're using. Right now, CILogon is providing ID tokens that identify the researcher. This allows research collaborations to do attribute-based access control and identity-based access control using the researcher’s login.
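The difference between identity-based and attribute-based access control can be sketched as two small checks on the claims from an ID token. This is an illustration under stated assumptions, not CILogon's implementation: the claims are assumed to have been verified already, `sub` is the standard OpenID Connect subject claim, and the `groups` claim name and all values are hypothetical.

```python
from typing import Any, Mapping, Set


def identity_allows(claims: Mapping[str, Any], allowed_subjects: Set[str]) -> bool:
    """Identity-based check: is this specific person on the allow list?"""
    return claims.get("sub") in allowed_subjects


def attributes_allow(claims: Mapping[str, Any], required_group: str) -> bool:
    """Attribute-based check: does the person hold the required group?

    The "groups" claim name is an assumption made for this sketch.
    """
    return required_group in claims.get("groups", [])


# Hypothetical claims, assumed to come from an already-verified ID token.
example_claims = {
    "sub": "user-1234",
    "email": "researcher@example.edu",
    "groups": ["example-analysts", "example-wiki-editors"],
}

print(identity_allows(example_claims, {"user-1234"}))
print(attributes_allow(example_claims, "example-analysts"))
```

The identity-based policy has to list every researcher, while the attribute-based policy only names a group, so membership changes do not require touching the policy itself.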

SciTokens also adds capability-based access control so that you can have a least-privilege access control policy based on a potentially complex set of policy rules to say, “Yes, you are authorized to access this file” or “You're authorized to access this cloud resource or this space on the wiki.” It does not need to be based on your individual identity.
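A capability-based check can be sketched as follows. The token lists what its bearer may do, and the service compares the requested action and path against that list without looking at who the bearer is. The sketch assumes a space-separated `scope` string whose entries look like `action:/path`; that format, the function name, and the example paths are assumptions made for illustration, and a real service would first verify the token's signature, issuer, and expiry.

```python
import posixpath


def scope_permits(scope_claim: str, action: str, path: str) -> bool:
    """Return True if any scope entry grants `action` on `path`.

    An entry such as "read:/data/public" is treated as covering that
    path and everything beneath it. An entry with no path, such as
    "read", is treated as covering "/".
    """
    # Normalize so that ".." segments cannot escape a granted prefix.
    target = posixpath.normpath(path)
    for entry in scope_claim.split():
        granted_action, _sep, granted_path = entry.partition(":")
        if granted_action != action:
            continue
        base = posixpath.normpath(granted_path or "/")
        if base == "/" or target == base or target.startswith(base + "/"):
            return True
    return False


# Hypothetical scope string, for illustration only.
example_scope = "read:/data/public write:/scratch/alice"

print(scope_permits(example_scope, "read", "/data/public/run1.h5"))
print(scope_permits(example_scope, "write", "/data/public/run1.h5"))
```

Because the decision depends only on what the token grants, the same check supports least-privilege policies: a token can be issued for a single directory and a single action, regardless of the researcher's broader roles.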

TRUSTED CI: Users can get lots of information on the CILogon website. Tell us in your own words what you see as the primary benefit and what value it brings to users.

Our goal is to enable logon to scientific cyberinfrastructure. We want to make it seamless for researchers to access the cyberinfrastructure that they need to conduct their research and their scientific collaborations.

Part of making that seamless is letting researchers use their existing identities. In most cases that's a campus identity through their campus identity provider. That could be part of the InCommon Federation or globally part of the eduGAIN interfederation service, in many cases using the open source Shibboleth single sign-on software. But it could also be identities from other providers like Google or GitHub or ORCID.

In addition to enabling that logon, we want to enable the providers of cyberinfrastructure to manage access to those resources through onboarding and offboarding procedures. Those procedures control how researchers log on, the duration of the collaboration, and the collaboration-specific attributes, groups, and roles. We want providers to do that in one place so that researchers have a consistent level of access across all the different cyberinfrastructure services that they're using.

Enabling that consistency means that we need to provide a service that supports many APIs and protocols for integrating identity management with the variety of research applications that the scientists need to use.

In CILogon, we support a long list of standards including OpenID Connect, OAuth, JSON Web Tokens, SAML [Security Assertion Markup Language], LDAP, certificates, and public keys.

We provide all these capabilities in a nonprofit, open-source, reliable, hosted software-as-a-service offering from NCSA, which manages our resources, contracting, and subscription process.

We provide it as a service because we understand that identity and access management software is fairly complex to operate, so we have a team on the CILogon project with the needed operational experience. We offer that to a variety of research projects so they don't have to become experts in the software themselves -- they can just rely on us.

Institutions can make it available to the research projects that their researchers are part of. Because we're using standards like SAML, Shibboleth, and the InCommon Federation, we connect with what the institutions are already doing; so many institutions in the US and around the world are part of these academic research and education federations.

We are compatible with the identity and access management services that are already on campus, and we're providing the glue to make that work with research cyberinfrastructure.

TRUSTED CI: Can you give some specific examples or scenarios of the kind of infrastructure you're describing; who might be connecting to that and why?

First, I'll talk about different types of applications.

We see in different science projects that scientists may use a science gateway, which is a web portal that hosts a variety of science applications and data through a web interface. They may be logging in to an HPC cluster to submit a large simulation. They may create a Jupyter Notebook to develop their reproducible workflow for their scientific work. They may be posting results and having discussions on wikis or mailing lists. They might also be developing services and deploying them on Kubernetes. These are some of the services that we get requests to integrate with a common identity and access management system.

LIGO [Laser Interferometer Gravitational-Wave Observatory] is an example of a scientific collaboration that uses many of these services and is a CILogon subscriber. LIGO is an international collaboration, and CILogon makes it possible for the researchers that are part of that collaboration to access all of these different applications in a convenient way. This means that they can get access to the signals from the scientific instrument so that they can quickly analyze them and publish their scientific results in a collaborative and secure way.

We're focused on the academic research and scholarship use case, and that's a very broad set of researchers -- thousands of researchers on thousands of campuses across the US and many more globally.

On one end of the scale, we serve the research project that is only one or two investigators with some grad students on one campus. Then on the other end of the scale are international collaborations that may have thousands of participants. By offering a software-as-a-service platform that has these common integration points and is easy to get connected to, we intend to make it easy both for the small projects and larger projects to take advantage of the services.

TRUSTED CI: Do they pay for this service?

We have a free tier and then we have paid tiers that provide additional functionality and that also provide the contracted service-level agreements that especially the larger research projects depend on.

TRUSTED CI: Any restrictions on your target audience? In other words, do you have to be a US facility to be a paid client or a free client or could it be any other country?

It's not restricted to US facilities or just to NSF projects. Our requirement is that you do need to be focused on academic research. We're not serving the commercial research space.

In part, our target audience is meant to be compatible with what's called the REFEDS Research and Scholarship Entity Category. That's an internationally recognized identity management policy about information sharing between academic institutions to support research using federated identity. That really enables all the work that we do with CILogon.

It's very important for us to stay within the bounds of that policy focused on the academic research use case.

TRUSTED CI: Do you have many international users?

Yes. We currently have about 8,000 active users each month and a significant percentage of those users are international. For example, we have over 100 active users from CERN [the European Organization for Nuclear Research]. We also see users from Germany, the UK, Italy, the Czech Republic, South Korea, Australia, and elsewhere.

TRUSTED CI: Anything else our readers need to know that is not documented on the website?

Everything should be documented on the CILogon website, and users can log in right from there.

TRUSTED CI: Talk a bit more about your support structure and particularly the paid tiers.

We have three tiers that are described on the website where your readers can find more details.

We call the no-charge tier our basic authentication tier. As the name implies, it's just providing our authentication service without any group management or attribute management -- just a basic authentication service with best-effort support.

The first paid tier is called Essential Collaboration Management. That adds the collaboration support -- the onboarding and offboarding, groups, attributes, and roles that are managed through open source software called COmanage. We publish that information into an LDAP directory and a SAML attribute authority, which provide multiple standard interfaces to the information about the researcher’s role in the collaboration. When a collaboration subscribes to that tier, it gets the ability to manage that information about its collaboration in our environment.

The full-service tier includes all those capabilities, adds the SciTokens capability and Grouper for advanced access management, and provides dedicated service instances for more customized capabilities and improved performance.

TRUSTED CI: What is the chronology of CILogon?

CILogon grew out of NSF grants back in 2004 for a project called GridShib, a name that stands for grid computing and Shibboleth. Combining those two technologies, we've built up the capability thanks to several NSF grants over the years, along with a Department of Energy grant. We had our first CILogon award from NSF in 2009, but we built that using software that was developed under the 2004 GridShib award [NSF award 0438385]. CILogon went live in 2010 with the free service tier.

In 2019, we transitioned from grant funding to the subscription funding model. We're now in our second year of subscription funding support.

Except for some core operational support that we get from XSEDE [the Extreme Science and Engineering Discovery Environment], which is really critical for the sustainability of that free tier, we are fully subscriber-funded.

TRUSTED CI: Are there other collaborators that you want to mention?

Scott Koranda is my co-PI. Scott works for a company called Spherical Cow Group. And of course, none of this would be possible without InCommon.

TRUSTED CI: Are there other things you've spawned from CILogon that are adding additional value?

Grouper and COmanage are existing products that we integrated into the CILogon service offering. Out of CILogon, SciTokens is one example where we spun off research building on some of the existing CILogon technology, developed new capabilities, and are bringing it back into the CILogon operational service.

TRUSTED CI: Is the software available to others?

All of our software is open source and published on GitHub.

The RCauth.eu service in Europe is an example of offering similar services using our open source software. Other large infrastructure providers can take the software and operate it themselves if they’d like, though we believe there is significant value provided by the CILogon operational team through our software-as-a-service offering.
___
This material is based upon work supported by the National Science Foundation under grant numbers 0850557, 0943633, 1053575, 1440609, 1547268, and 1548562 and by the Department of Energy under award number DE-SC0008597. CILogon operations is supported by subscribers.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the United States Government or any agency thereof.