1. Introduction
Open source license compliance takes effort. No single tool or approach yet solves every aspect of license compliance for you. This document describes one approach and provides a framework for thinking about the major steps involved. Adapt any and all aspects of it to fit your own development processes.
2. Identifying Software and Dependencies
Before you can take any meaningful action towards open source license compliance, you need to know what software and dependencies are present and relevant.
The most obvious place to start is by inspecting your own source code. Presumably, a significant part of your code will consist of code you have in fact written! But it is also common for developers to incorporate code from third parties directly into their projects. You or your fellow developers may have copied code from a third party directly into your own source code files, or copied third party files into your project directories.
This is easier to identify and manage when third-party code is stored in designated third-party directories. Some programming language ecosystems may have standardized approaches to handling third-party dependencies in project repositories — e.g. /node_modules/ or /vendor/ directories.
The scope of relevant software goes beyond just what is checked into your project repo. You may also want to examine the build-time and install-time dependencies for your project. Take a look at the applicable files for dependency manifests — e.g., POM files for Java projects, package.json files for JavaScript projects, requirements.txt for Python projects, and so on.
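As a rough illustration of reading dependency manifests, the following Python sketch lists the names declared in a package.json and in a simple requirements.txt. The function names are hypothetical, and the requirements parser assumes one plain requirement per line; it ignores options such as -r and does not handle URLs or other advanced syntax.

```python
import json
import re


def parse_package_json(text):
    """Return {name: version_spec} for runtime and development dependencies."""
    data = json.loads(text)
    deps = {}
    for section in ("dependencies", "devDependencies"):
        deps.update(data.get(section, {}))
    return deps


def parse_requirements(text):
    """Return the package names listed in a simple requirements.txt."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):   # skip options such as -r or -e
            continue
        # The name ends at the first version operator, extra, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names
```

Manifests only list what is declared directly. Transitive dependencies usually have to be resolved with the ecosystem's own package manager or one of the tools described later in this document.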
Also think about the full picture of how your project is bundled and shipped. In an increasingly container-centric development environment, it is easy for developers to make their project dependent on containerized third-party software that bundles together a deep set of unseen dependencies. If you are shipping this software, it is likely relevant whether it is in your repo or in a three-levels-deep base container.
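One small starting point for the container case is to list the base images that a Dockerfile builds on. The Python sketch below is only an approximation under stated assumptions: the function name is hypothetical, it reads FROM lines only, and it skips references to earlier build stages.

```python
import re

# Matches "FROM [--platform=...] image [AS stage]".
FROM_LINE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+(?P<stage>\S+))?",
    re.IGNORECASE,
)


def base_images(dockerfile_text):
    """Return the distinct external images named in FROM lines, in order."""
    stages = set()   # names of earlier build stages, which are not external images
    images = []
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image = match.group("image")
        if image not in stages and image not in images:
            images.append(image)
        if match.group("stage"):
            stages.add(match.group("stage"))
    return images
```

This reveals only the first level. Each base image may itself be built on other images and may install many packages, so the result is a list of places to look further rather than a complete inventory.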
3. Identifying Licenses
A LICENSE.txt file at the top of a project does not guarantee that it states the only relevant license. Files within the project code may have different licenses. Your project may also depend upon code from somebody else who didn’t pay attention to license compliance, making your job harder.
There are a variety of tools to help with this, but the easiest initial approach doesn’t require any special tooling. In your project, just search for some common license-related terms:
- grep -nri (or your favorite command line arguments)
- or press Ctrl-F (or the equivalent) in your favorite editor
Here are some terms you could search for:
- licen
- redist
- copyright
- common license fragments: bsd, gpl, general public, cddl, …
This will give you an idea of the files with licenses and third-party notices in your codebase. If you’ve gathered a list of dependencies (build-time, install-time, run-time, containers, etc.), you can also take a look at their websites or source code repositories for license information.
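If you would rather script the search than run it by hand, a minimal Python sketch might look like the following. The term list is taken from the list above; the function names are hypothetical. Short fragments such as "bsd" or "gpl" will also match unrelated words, so expect false positives.

```python
import re
from pathlib import Path

# Search terms from the list above; extend them to suit your codebase.
LICENSE_TERMS = re.compile(
    r"licen|redist|copyright|bsd|gpl|general public|cddl", re.IGNORECASE
)


def find_license_mentions(text):
    """Return (line_number, line) pairs for lines containing a search term."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if LICENSE_TERMS.search(line)
    ]


def scan_tree(root):
    """Yield (path, line_number, line) for every match under a directory."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue   # unreadable file; skip it
        for number, line in find_license_mentions(text):
            yield str(path), number, line
```

Calling scan_tree(".") from the project root produces roughly what a recursive, case-insensitive grep for the same terms would.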
This is a lightweight approach for a basic look at your project’s license profile. A variety of tools exist that can help you get a more complete view of applicable licenses. There are several proprietary / commercial products and offerings. There are also a variety of open source tools, including the following:
FOSSology is a Linux Foundation project that scans for license and copyright notices contained in your project’s source code (as well as object code, documentation, images and any other content). It contains two primary scanners: a regular expression engine and a keyword bulk text matching agent. It is designed to flag for review not only exact matches to license texts, but also keywords and phrases that might be of interest in potentially identifying relevant licenses. It is most useful if you are planning to do a deep analysis of your project code’s licenses, and want to have an element of human review to clean up false positives and false negatives.
ScanCode toolkit, a project started by nexB, is a command-line application that similarly scans for license and copyright notices in a source code directory. ScanCode is particularly useful when you are looking to get a fully-automated one-time analysis of a source code directory without a human review step within the tool itself.
OSS Review Toolkit (ORT), a project started by HERE Technologies, provides several tools designed to assist in the workflow of looking across your project’s dependencies and licenses. It includes tooling to assist with analyzing dependencies, retrieving the source code for those dependencies, and using existing scanners to generate license reports. It is particularly useful if you want to focus on a deep review of license notices contained in your project’s dependencies.
Tern, a project started by VMware, is focused on identifying components and licenses that are installed by a container image or manifest. Although still in an early stage of development, it is particularly relevant if you want to understand the licenses that apply to the dependencies (and subdependencies) when you use containers for your product.
Quartermaster (QMSTR), a project started by Endocode, is focused on providing tooling to assist with open source compliance as part of a CI/CD toolchain, and analyzing the results of a project’s build process. Although still in an early stage of development, it is particularly relevant if you are looking to integrate open source license compliance into your CI/CD infrastructure.
4. Understanding Context of Uses
Now you hopefully know what third-party software your project contains or otherwise uses, and what license(s) apply to it. The next step is to take a closer look at how your project uses that software.
For example, in many open source licenses, the primary obligations — e.g. to make source code available, to provide attribution notices, etc. — are conditioned upon distribution of the software. So, to know what your obligations are, you need to know whether you are distributing the software. (Note that different licenses will have different definitions of “distribution” or similar words — or no definition at all!)
In other licenses, obligations might be triggered by, for example, providing users with access to the software over a network. Obligations may also differ depending on whether you have modified the software.
In addition, it can be highly relevant to understand the purpose for which your project uses a particular component. For example, the compliance considerations and impact on your project may be different for a dependency that provides run-time functionality, vs. a dependency that is used as a testing framework or a build tool. All open source software used by your project will have licenses that should be complied with, but the effect and extent of required actions may vary depending on the way a dependency is used.
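Some manifests already record part of this context. As a first approximation for a JavaScript project, the Python sketch below separates the run-time entries of a package.json from the development-only ones. The function name is hypothetical, and the split is only as reliable as the manifest: a bundler may still ship code from a package declared under devDependencies.

```python
import json


def split_by_scope(package_json_text):
    """Group declared dependency names by how package.json classifies them."""
    data = json.loads(package_json_text)
    return {
        "runtime": sorted(data.get("dependencies", {})),
        "development": sorted(data.get("devDependencies", {})),
    }
```

The output is a prompt for review, not a conclusion about which obligations apply.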
Thus, understanding the context of uses of your software means understanding, in practice, how your project, product or service offering interacts with and uses its third-party dependencies.
Note that this likely has benefits beyond just license compliance. Understanding how your software uses its dependencies likely has security benefits as well. For instance, it may help you in evaluating whether a vulnerability in a dependency is likely to affect your project, and in determining whether you are able to remediate it easily.
5. Addressing Any Incompatibilities
After you have a picture of the applicable software and licenses, and the context of how they interact, the next step is to deal with any incompatible licenses.
What counts as an “incompatibility” is often a case-by-case consideration, depending on your project’s own goals and license(s), and the full set of dependencies you are using. As a non-exhaustive list of examples, incompatibilities might include any of the following:
- A proprietary product is using a third party proprietary library without a license.
- A proprietary product has incorporated GPL-2.0-or-later source code into its own code, in a way that subjects the proprietary product to the applicable copyleft provisions — and does not want to release its own source code under the GPL.
- A permissively-licensed open source project has incorporated GPL-2.0-or-later source code into its own code, in a way that subjects the project to the applicable copyleft provisions — and does not want to relicense the project to the GPL.
- An Apache-2.0 project, with a policy that it wants to be fully licensable under Apache-2.0, is making use of a library that is only licensed for non-commercial use.
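Once you have a list of dependencies and their declared licenses, a simple policy check can flag candidates for a closer look. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, licenses are given as SPDX identifiers, and the contents of the review set are a project policy decision rather than a legal determination.

```python
def flag_for_review(dependency_licenses, needs_review):
    """Return the names of dependencies that a person should look at.

    dependency_licenses maps a dependency name to its declared license
    identifier, or to None when no license could be determined.
    needs_review is the set of identifiers your own policy wants checked.
    """
    return sorted(
        name
        for name, license_id in dependency_licenses.items()
        if license_id is None or license_id in needs_review
    )
```

For example, with a policy set of {"GPL-2.0-or-later", "CC-BY-NC-4.0"}, a dependency declared as MIT would pass, while one declared as GPL-2.0-or-later, or one with no known license, would be flagged. A flag does not mean an incompatibility exists; that still depends on the context of use discussed in the previous section.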
When incompatibilities are discovered, the project or product can first evaluate whether it is able to resolve the incompatibility by relicensing part or all of its applicable source code. For example, if a project or proprietary product has sufficient rights to do so, it could decide to relicense under a copyleft license in order to resolve an incompatibility with a copyleft dependency.
Otherwise, a product or project will typically deal with an incompatibility by removing and remediating the dependency. This may require finding an alternative third-party dependency under a more compatible license, or rewriting the applicable functionality yourself (without reference to the incompatible dependency). Or, it may require the product or project to move forward without the removed functionality.
6. Communicating License Information
At this stage, the remaining steps focus on outputs that you provide to external parties. One of these outputs is license information about your project and about its dependencies. There are two reasons for this.
One reason is focused on meeting your own compliance requirements. As mentioned above, most open source licenses — whether they are permissive, copyleft, weak copyleft, etc. — include attribution requirements. Typically, these are some form of a requirement to retain or reproduce copyright and license notices from the original code. If your project is redistributing dependencies in source code form, then you may already be all set here, particularly if the dependencies’ source code itself contains those notices. If you are providing a proprietary offering without source code, then you may be including these notices in your physical or electronic documentation, and/or in a location accessible from within your offering itself. Individuals and organizations take many different approaches for how these notices are provided.
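As one possible way to assemble such notices, the Python sketch below renders a plain-text attribution block per component. The function name, field names, and example components are all made up for illustration. Many licenses also require the full license text to be reproduced, which this sketch does not do.

```python
def build_notices(components):
    """Render one plain-text attribution block per component, sorted by name."""
    blocks = []
    for component in sorted(components, key=lambda item: item["name"].lower()):
        blocks.append(
            f"{component['name']} {component['version']}\n"
            f"License: {component['license']}\n"
            f"{component['copyright']}\n"
        )
    return "\n".join(blocks)


# Made-up components, for illustration only.
EXAMPLE_COMPONENTS = [
    {"name": "example-zip", "version": "2.1.0", "license": "Zlib",
     "copyright": "Copyright (c) 2019 Example Zip Authors"},
    {"name": "example-http", "version": "0.9.4", "license": "MIT",
     "copyright": "Copyright (c) 2021 Example HTTP Contributors"},
]

if __name__ == "__main__":
    print(build_notices(EXAMPLE_COMPONENTS))
```

The copyright lines should be copied from the notices in each component's own source, not reconstructed.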
The second reason, however, is focused on improving ease of compliance for everyone. If all you are providing are copyright and license notices in a text format, this may not be especially helpful to downstream users and redistributors. They will likely be going through a compliance process similar to the one you’ve just gone through. If you can make their lives easier by providing relevant information in a machine-readable format, that can lead to improved automation of compliance efforts, increased use of your project, more contributions back to it, and a cleaner ecosystem overall.
SPDX
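SPDX (Software Package Data Exchange) is an open standard for exchanging software component, license, and copyright information in a machine-readable format. The SPDX project and community make various tools available to assist in creating, using, and transforming SPDX documents. These include the official Java-based toolset, as well as similar toolsets written in Python, Golang (for older and newer versions of the SPDX spec), and others.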
Additionally, many of the tools described in the Identifying Licenses section above can generate SPDX documents as part of their scanning functionality, and some tools such as FOSSology are additionally able to import SPDX documents.
Within your own source code, another good way to assist in communicating license information is to use SPDX short-form license identifiers. These are one-line comments that you can add to the top of your source code files (as well as documentation and other files), to clearly communicate the applicable license(s) in a human- and machine-readable fashion. Several significant projects, notably including the Linux kernel, have adopted SPDX short-form license IDs in their code for this reason.
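A short-form identifier is a comment such as "# SPDX-License-Identifier: Apache-2.0" near the top of a file. Because the tag has a fixed prefix, it is easy to read back. The Python sketch below is a minimal reader, not a validator: the function name and the ten-line search window are assumptions, and the license expression is returned as-is without being checked against the SPDX license list.

```python
import re

SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_expression(text, max_lines=10):
    """Return the SPDX license expression found near the top of a file, or None."""
    for line in text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            expression = match.group(1).strip()
            # Drop a trailing block-comment terminator such as */ or -->.
            for closer in ("*/", "-->"):
                if expression.endswith(closer):
                    expression = expression[: -len(closer)].strip()
            return expression
    return None
```

Files for which this returns None are good candidates for adding an identifier.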
REUSE Software Initiative
The Free Software Foundation Europe’s REUSE Software initiative provides additional best practices for communicating license information, utilizing SPDX short-form license identifiers together with additional recommendations for structuring files and folders to improve the clarity and consistency of license information.
7. Providing Source Code, If Required
Finally, as described above, a key aspect of compliance with copyleft-style licenses in particular is to provide or make available the applicable source code. Again, your specific obligations here will depend both on the particular license and on the context of your uses of those dependencies. Depending on what is permitted or required under the license, you might decide to deliver the source code together with your product or service offering; or to provide a written offer to enable recipients to come to you in the future and obtain a copy of the source code; or to make it available in another permitted fashion.
If you’ve made changes to a third-party dependency, also consider contributing them back upstream to the dependency’s project. You might be focusing your compliance efforts on your obligations to downstream recipients, which is fine. But by contributing your modifications back upstream, you can help minimize the challenges that come from maintaining a separate “fork” of a project. If the upstream maintainers accept and release your changes, you can then consume a subsequent “unmodified” version of that dependency. In addition to easing your compliance responsibilities, this will also make it much easier for you to manage security vulnerabilities and other bug fixes as they arise.
Here are a couple of useful resources relating to publishing and releasing source code:
- Publishing Source Code for FOSS Compliance: Lightweight Process and Checklists, by Ibrahim Haddad
- Practical GPL Compliance, by Armijn Hemel and Shane Coughlan