Some open source projects make it very simple to understand what license applies to the published works. Unfortunately, despite all of the best intentions to share with the world, determining what licensing terms apply is more complicated than just looking for the one license file. In this article, we will explore license declarations in open source code, packages, and projects, and how these may apply to your final product.
By now, if you are a developer, you have probably downloaded an open source package, played with it a bit, and found it helpful enough to use in your product. Then, you probably looked to see if the license fits within your organization’s rules. But wait – where’s the license hiding?
By definition, open source projects come with licenses. Often you discover not one, but many, and wonder what to make of it. Sometimes you can choose between multiple licenses. Other times, they refer to various subcomponents of composite open source projects. And there are times when you just can’t find the license.
While open source developers are passionate software artists (I really love those “Code Poet” shirts), when they venture into the world of licenses (read: legalese) they typically rely more on the right side of their brain. Often, they get it right, but sometimes the context is too complex. Even for the trained professional, open source licensing terms can be subjective at times. Many licenses were not written by lawyers, and those that were have gone through many revisions.
It is equally challenging to arrive at a conclusion as to which license or licenses apply to a package. There are many files in a package, and sometimes the licensing information associated with them is incomplete, ambiguous, or even contradictory.
Why bother with the license?
Software is like a piece of art (back to that “Code Poet” shirt). The author automatically owns the copyright, and no one can use the work without the author’s permission. An open source license is that blanket permission, complete with its conditions and provisos. No license, no permission to use – it is that simple.
Where to look
License information may be hiding in any and all of the files in a package. You need to collect this information, categorize it, and then synthesize it in order to come to a conclusion as to what license(s) should apply to the package. You can find most licensing information here:
- License files – files that contain the complete license text of the applicable license(s).
- License header in source files – a reference to the license, or even the full license text found in the comments of the source code files.
- License reference – a reference to the licenses and their applicability, generally found in other documentation files.
In the best possible case, you will find a clear reference to the licensing terms in each source code file, the same license file in the root folder, and a license reference in the README file. If the information all agrees, you can determine the licensing terms with a very high degree of certainty. Unfortunately, this does not happen very often. In a survey of Protecode’s GIPS database (for Global IP Signatures, essentially a database of all open source software in the world), only 13 percent of the open source packages in circulation contain license header information for all source files.
License files
Most licenses require the full license text to be included in the package. Search for file names like LICENSE, LICENSE.txt, COPYING, and COPYING.txt, and even ones named after the license itself, such as MIT.txt. Including the full license text with the package implies that at least some, if not all, of the files within the package are released under those licensing terms.
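The file-name search described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete scanner: the function name `find_license_files` is hypothetical, and the set of name stems is an assumption that goes slightly beyond the names listed here.

```python
from pathlib import Path

# Name stems that commonly hold full license text. LICENSE, COPYING and MIT
# come from the article; the remaining entries are illustrative assumptions.
LICENSE_STEMS = {"license", "licence", "copying", "mit", "apache", "gpl", "bsd"}
# Only plain-text style extensions are considered, to skip files like license.py.
TEXT_SUFFIXES = {"", ".txt", ".md"}


def find_license_files(root):
    """Return package-relative paths of files whose name suggests license text."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if (
            path.is_file()
            and path.stem.lower() in LICENSE_STEMS
            and path.suffix.lower() in TEXT_SUFFIXES
        ):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

A match only tells you where to start reading. The file contents still have to be checked, since a file named LICENSE may hold several licenses or a custom text.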
Sometimes, more than one such license file is found in the package, or more than one license appears in the same file. This could be the case if the author intended to release the package under a choice of licenses. We see this when an author wants the package to be usable in a GPL distribution, yet also wants to offer flexibility to commercial entities that wish to use the package with fewer restrictions.
The presence of more than one license in a package may also indicate that it contains contents from other projects that have differing licensing terms. For such composite projects, it is possible that all licenses come into play for the whole package or for the individual subcomponents of the package. If the licenses permit sublicensing, the main package author may opt to use a single overarching license.
The location of the license files is also important to note, as it may imply that the scope is limited to that directory. A license file located in the root directory would imply that the scope is the entire package, whereas a license file located in a subdirectory would imply a scope limited to that portion of the package, typically an indication of repackaged third-party components.
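One way to apply this scoping rule is to walk upward from a source file until a license file turns up. The sketch below assumes the "closest directory wins" reading of the paragraph above. The function name `nearest_license_file` and the short list of file names are hypothetical choices for illustration.

```python
from pathlib import Path

# File names treated as license files (lowercased); an illustrative subset.
LICENSE_NAMES = {"license", "license.txt", "copying", "copying.txt"}


def nearest_license_file(source_file, package_root):
    """Return the license file closest to source_file, searching upward
    from its directory to package_root, or None if none is found."""
    root = Path(package_root).resolve()
    directory = Path(source_file).resolve().parent
    directory.relative_to(root)  # raises ValueError if the file is outside the package
    while True:
        for entry in sorted(directory.iterdir()):
            if entry.is_file() and entry.name.lower() in LICENSE_NAMES:
                return entry
        if directory == root:
            return None
        directory = directory.parent
```

For a file under a vendored subdirectory that ships its own COPYING file, this returns that COPYING file rather than the LICENSE in the package root, which mirrors the repackaged third-party component case.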
License headers in source code
Most source code files will contain some comments at the top of the file, commonly known as the header. In the header, you often find copyright and license information, along with documentation about the code itself.
Some of the well-known licenses, such as GPL, have defined header templates to simplify adoption and minimize ambiguity. With GPL being a copyleft (share-alike) license, developers are more inclined to mark their “territory”, hopefully increasing participation in their projects.
According to the Protecode GIPS database, approximately 44 percent of the open source code files have license information in the header. That is a significant but not surprising number, considering that the most popular open source license is GPL (41 percent of those open source files with license headers are GPL).
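As a rough illustration of header scanning, the sketch below looks at the first lines of a source file for a few tell-tale phrases. It is a simplified heuristic, not how any particular scanning tool works: the function name `detect_header_license` is hypothetical, the phrase list is a small assumed sample, and the `SPDX-License-Identifier` tag is a convention from the SPDX project that the article does not itself describe.

```python
import re

# Phrases that typically appear in license headers. An assumed, incomplete sample.
HEADER_PATTERNS = [
    ("GPL", re.compile(r"GNU General Public License", re.IGNORECASE)),
    ("Apache-2.0", re.compile(r"Apache License,? Version 2\.0", re.IGNORECASE)),
    ("MIT", re.compile(r"Permission is hereby granted, free of charge", re.IGNORECASE)),
]
# Short machine-readable tag used by some projects in place of a full header.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def detect_header_license(text, max_lines=40):
    """Return a license label found in the file header, or None."""
    header = "\n".join(text.splitlines()[:max_lines])
    tag = SPDX_TAG.search(header)
    if tag:
        # Drop a trailing comment terminator such as "*/".
        return tag.group(1).rstrip("*/ ").strip()
    # Collapse whitespace and comment markers so wrapped phrases still match.
    flattened = re.sub(r"[\s*#/]+", " ", header)
    for label, pattern in HEADER_PATTERNS:
        if pattern.search(flattened):
            return label
    return None
```

A `None` result means only that this heuristic found nothing in the header, and the nearby license files still need to be consulted.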
When there are no license headers, you have to look for nearby license files. This also means that if the context of the file changes (for example, it is copied into another project, or moved to a different directory) then the concluded license may be different. This is why it is important for open source developers to clearly identify their licensing intentions at the file level. Moreover, you may also look deeper into the source files and find some code snippets that come from other projects with different licensing terms.
Many public hosting services, such as GitHub, now encourage owners of public repositories to choose an open source license, but unfortunately do not promote the use of license references in each source file. As a consequence, many newer open source projects have less clear licensing terms.
License references
Often, there are some auxiliary files in the package, such as README files and build scripts, which contain further licensing details. These additional license references are indispensable for resolving ambiguity or conflicts between licenses. They may contain clarification on the will of the author, along with a project copyright statement, or some permission notice. This text is just as important as the license itself, because any additional restrictions should be considered part of the license.
In build files, you often find information on third party dependencies, which may help clarify the licensing information. In README and other documentation files, you will often find information on whether the presence of multiple licenses in a package has a conjunctive (and) or disjunctive (or) effect on the package.
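The difference between a conjunctive and a disjunctive combination matters most when you compare a package against your organization's rules. The sketch below shows the logic under simple assumptions: the function name `package_is_acceptable`, the `"and"`/`"or"` flag, and the idea of an allow-list of license names are all hypothetical and are not part of any standard.

```python
def package_is_acceptable(licenses, combination, allowed):
    """Check a package's licenses against an allow-list.

    licenses    -- license names concluded for the package
    combination -- "and" if all licenses apply together (conjunctive),
                   "or" if the user may choose one (disjunctive)
    allowed     -- set of license names the organization accepts
    """
    if not licenses:
        # No license, no permission to use.
        return False
    if combination == "and":
        return all(name in allowed for name in licenses)
    if combination == "or":
        return any(name in allowed for name in licenses)
    raise ValueError("combination must be 'and' or 'or'")
```

With an allow-list of MIT and Apache-2.0, a package offered under GPL or MIT passes because you can pick MIT, while a package bound by both GPL and MIT does not.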
There is also an emerging standard called the Software Package Data Exchange (SPDX) that defines a file format for storing licensing information about a package or set of packages. If an SPDX file is present in the package, which is still very rare, then someone else has done the work for you and you can probably rely on this information.
Some packages only contain references to the project website. Unfortunately, license information on websites can change over time, which is why it is best to trust the information found in the package itself.
Completing the puzzle
Now that you’ve located all the licensing information in the package, determining which license applies to a given package can still be quite the puzzle. The information you have found may be ambiguous, or perhaps even contradictory, where certain terms in the licenses can’t be reconciled.
Alternatively, if your product uses packages that contain only binary artifacts of open source projects, the challenge does not stop there, as you don’t see the full picture unless you have access to the source code with its license declarations – it is open source after all.
If you can’t find all the source code, or are faced with ambiguous or contradictory information, your only hope at this point is to reach out to the author of the package and ask for clarification. Rectification of the package license “bug” may even be necessary. Otherwise, you can ask yourself this – if I don’t know where the license is, how can I be certain it is open source?