Appendix: License Documentation in Python and Other Projects
Abstract
Licenses are documented in many different ways, whether by established practice or by official recommendation. This document contains the results of a comprehensive survey of license documentation in Python and other languages.
The key takeaways from the survey, which have guided the recommendations of PEP 639, are as follows:
- Most package formats use a single License field.
- Many modern package systems use some form of license expression to optionally combine more than one license identifier together. SPDX and SPDX-like syntaxes are the most popular in use.
- SPDX license identifiers are becoming the de facto way to reference common licenses everywhere, whether or not a full license expression syntax is used.
- Several package formats support documenting both a license expression and the paths of the corresponding files that contain the license text. Most Free and Open Source Software licenses require package authors to include their full text in a Distribution Package.
License Documentation in Python
Core Metadata
There are two overlapping Core Metadata fields to document a license: the license Classifier strings prefixed with License :: and the License field as free text.
The Core Metadata License field documentation is currently:
License
=======
.. versionadded:: 1.0
Text indicating the license covering the distribution where the license
is not a selection from the "License" Trove classifiers. See
:ref:`"Classifier" <metadata-classifier>` below.
This field may also be used to specify a
particular version of a license which is named via the ``Classifier``
field, or to indicate a variation or exception to such a license.
Examples::

    License: This software may only be obtained by sending the
            author a postcard, and then the user promises not
            to redistribute it.

    License: GPL version 3, excluding DRM provisions

Even though there are two fields, it is at times difficult to convey anything but the simplest licensing. For instance, some classifiers lack precision (GPL without a version), and when multiple license classifiers are listed, it is not clear whether all of the licenses apply or the user may choose between them. Furthermore, the list of available license classifiers is rather limited and out of date.
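The overlap between the two fields can be seen by reading both from a metadata file. The following is a minimal sketch that uses only the standard library; the SAMPLE_METADATA content and the license_info() helper are hypothetical and exist only for illustration.

```python
from email.parser import HeaderParser

# Hypothetical METADATA content for illustration; not from a real package.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-package
Version: 1.0
License: GPL version 3, excluding DRM provisions
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: License :: OSI Approved :: MIT License
"""


def license_info(metadata_text):
    """Return the free-text License value and all license classifiers."""
    message = HeaderParser().parsestr(metadata_text)
    classifiers = message.get_all("Classifier") or []
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]
    return message.get("License"), license_classifiers


license_text, license_classifiers = license_info(SAMPLE_METADATA)
print(license_text)
for classifier in license_classifiers:
    print(classifier)
```

In this sample the GPL classifier carries no version, and nothing in the metadata says whether the two classifiers apply together or offer a choice, which is the ambiguity described above.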
Setuptools and Wheel
Beyond a license code or qualifier, license text files are documented and included in a built package either implicitly or explicitly, and this is another possible source of confusion:
- In the Setuptools and Wheel projects, license files are automatically added to the distribution (at their source location in a source distribution/sdist, and in the .dist-info directory of a built wheel) if they match one of a number of common license file name patterns (LICEN[CS]E*, COPYING*, NOTICE* and AUTHORS*). Alternatively, a package author can specify a list of license file paths to include in the built wheel under the license_files key in the [metadata] section of the project’s setup.cfg, or as an argument to the setuptools.setup() function. At present, following the Wheel project’s lead, Setuptools flattens the collected license files into the metadata directory, clobbering files with the same name, and dumps license files directly into the top-level .dist-info directory, but there is a desire to resolve both these issues, contingent on PEP 639 being accepted.
- Both tools also support an older, singular license_file parameter that allows specifying only one license file to add to the distribution, which has been deprecated for some time but still sees some use.
- Following the publication of an earlier draft of PEP 639, Setuptools added support for License-File in distribution metadata as described in this specification. This allows other tools consuming the resulting metadata to unambiguously locate the license file(s) for a given package.
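The pattern matching and the flattening problem described above can be sketched as follows. This is an illustration built on the standard fnmatch module, not the actual Setuptools or Wheel implementation; find_license_files() and flatten() are hypothetical helpers, and the file names are invented.

```python
import posixpath
from fnmatch import fnmatchcase

# The common license file name patterns listed above.
DEFAULT_PATTERNS = ("LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*")


def find_license_files(filenames, patterns=DEFAULT_PATTERNS):
    """Return the names that match at least one pattern, in sorted order."""
    return sorted(
        name
        for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )


def flatten(paths):
    """Map each base name to its source path, as a flat directory would."""
    flattened = {}
    for path in paths:
        # A later file with the same base name overwrites the earlier one.
        flattened[posixpath.basename(path)] = path
    return flattened


# Hypothetical top-level file names in a project directory.
candidates = [
    "LICENSE.txt", "LICENCE", "COPYING.LESSER", "NOTICE",
    "README.md", "setup.cfg", "AUTHORS.rst",
]
matched = find_license_files(candidates)
print(matched)

# Two license files that share a base name collide once flattened.
collapsed = flatten(["LICENSE", "vendor/lib/LICENSE"])
print(collapsed)
```

The second call shows why flattening is lossy: only one of the two files named LICENSE survives in the flat mapping.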
PyPA Packaging Guide and Sample Project
Both the PyPA beginner packaging tutorial and its more comprehensive packaging guide state that it is important that every package include a license file. They point to the LICENSE.txt in the official PyPA sample project as an example, which is explicitly listed under the license_files key in its setup.cfg, following existing practice formally specified by PEP 639.
Both the beginner packaging tutorial and the sample project only use classifiers to declare a package’s license, and do not include or mention the License field.
The full packaging guide does mention this field, but
states that authors should use the license classifiers instead, unless the
project uses a non-standard license (which the guide discourages).
Python source code files
Note: Documenting licenses in source code is not in the scope of PEP 639.
Besides using comments and/or SPDX-License-Identifier conventions, the license is sometimes documented in Python code files using a “dunder” module-level constant, typically named __license__.
This convention, while perhaps somewhat antiquated, is recognized by the built-in help() function and the standard pydoc module. The dunder variable will show up in the help() DATA section for a module.
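As a small illustration of this convention, the sketch below builds a throwaway module in memory and renders its help text with pydoc. The module name example_module and the license value are hypothetical.

```python
import pydoc
import types

# Build a throwaway module in memory; "example_module" is a hypothetical name.
module = types.ModuleType("example_module", "A module used only for illustration.")
module.__license__ = "MIT"

# render_doc() produces the text that help() displays; plain() strips the
# overstrike characters that pydoc uses to render bold text.
help_text = pydoc.plain(pydoc.render_doc(module))
print(help_text)
```

The rendered text includes a DATA section that lists the __license__ value alongside the module name and docstring.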
License Documentation in Other Projects
Linux distribution packages
Note: in most cases, the texts of the most common licenses are included globally in a shared documentation directory (e.g. /usr/share/doc).
- Debian documents package licenses with machine readable copyright files. It defines its own license expression syntax and list of identifiers for common licenses, both of which are closely related to those of SPDX.
- Fedora packages specify how to include License Texts and use a License field that must be filled with appropriate short license identifier(s) from an extensive list of “Good Licenses”. Fedora uses SPDX license expression syntax.
- OpenSUSE packages use SPDX license expressions with SPDX license IDs and a list of additional license identifiers.
- Gentoo ebuild uses a LICENSE variable. This field is specified in GLEP-0023 and in the Gentoo development manual. Gentoo also defines a list of allowed licenses and a license expression syntax, which is rather different from SPDX.
- The FreeBSD package Makefile provides LICENSE and LICENSE_FILE fields with a list of custom license symbols. For non-standard licenses, FreeBSD recommends using LICENSE=UNKNOWN and adding LICENSE_NAME and LICENSE_TEXT fields, as well as sophisticated LICENSE_PERMS to qualify the license permissions and LICENSE_GROUPS to document a license grouping. The LICENSE_COMB allows documenting more than one license and how they apply together, forming a custom license expression syntax. FreeBSD also recommends the use of SPDX-License-Identifier in source code files.
- Arch Linux PKGBUILD defines its own license identifiers. The value 'unknown' can be used if the license is not defined.
- OpenWRT ipk packages use the PKG_LICENSE and PKG_LICENSE_FILES variables and recommend the use of SPDX License identifiers.
- NixOS uses SPDX identifiers and some extra license IDs in its license field.
- GNU Guix (based on NixOS) has a single License field, uses its own license symbols list and specifies how to use one license or a list of them.
- Alpine Linux packages recommend using SPDX identifiers in the license field.
Language and application packages
- In Java, Maven POM defines a licenses XML tag with a list of licenses, each with a name, URL, comments and “distribution” type. This is not mandatory, and the content of each field is not specified.
- The JavaScript NPM package.json uses a single license field with an SPDX license expression, or the UNLICENSED ID if none is specified. A license file can be referenced as an alternative using SEE LICENSE IN <filename> in the single license field.
- Rubygems gemspec specifies either a single license string or a list of license strings. The relationship between multiple licenses in a list is not specified. They recommend using SPDX license identifiers.
- CPAN Perl modules use a single license field, which is either a single string or a list of strings. The relationship between the licenses in a list is not specified. There is a list of custom license identifiers plus these generic identifiers: open_source, restricted, unrestricted, unknown.
- Rust Cargo specifies the use of an SPDX license expression (v2.1) in the license field. It also supports an alternative expression syntax using slash-separated SPDX license identifiers, and there is also a license-file field. The crates.io package registry requires that either the license or the license-file field is set when uploading a package.
- PHP composer.json uses a license field with an SPDX license ID or proprietary. The license field is either a single string resembling the SPDX license expression syntax with "and" and "or" keywords, or a list of strings if there is a (disjunctive) choice of licenses.
- NuGet packages previously used only a simple license URL, but now specify using an SPDX license expression and/or the path to a license file within the package. The NuGet.org repository states that they only accept license expressions that are “approved by the Open Source Initiative or the Free Software Foundation.”
- Go language modules go.mod have no provision for any metadata beyond dependencies. Licensing information is left for code authors and other community package managers to document.
- The Dart/Flutter spec recommends using a single LICENSE file that should contain all the license texts, each separated by a line with 80 hyphens.
- The JavaScript Bower license field is either a single string or list of strings using either SPDX license identifiers, or a path/URL to a license file.
- The Cocoapods podspec license field is either a single string, or a mapping with type, file and text keys. This is mandatory unless there is a LICENSE/LICENCE file provided.
- Haskell Cabal accepts an SPDX license expression since version 2.2. The version of the SPDX license list used is a function of the Cabal version. The specification also provides a mapping between legacy (pre-SPDX) and SPDX license identifiers. Cabal also specifies a license-file(s) field that lists license files to be installed with the package.
- Erlang/Elixir mix/hex package specifies a licenses field as a required list of license strings, and recommends using SPDX license identifiers.
- D Language dub packages define their own list of license identifiers and license expression syntax, similar to the SPDX standard.
- The R Package DESCRIPTION defines its own sophisticated license expression syntax and list of license identifiers. R has a unique way of supporting specifiers for license versions (such as LGPL (>= 2.0, < 3)) in its license expression syntax.
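Several of the formats above can be mapped onto an SPDX-style expression with very little code. The sketch below is illustrative only: the three helper functions are hypothetical, and reading Cargo's slash-separated form as a choice between licenses (OR) is an assumption not stated in this survey. The Composer list handling follows the disjunctive reading described above.

```python
def cargo_slash_to_spdx(value):
    """Rewrite a slash-separated "A/B" value as "A OR B" (assumed meaning)."""
    return " OR ".join(part.strip() for part in value.split("/"))


def composer_to_spdx(value):
    """Join a Composer list as a disjunctive choice; pass strings through."""
    if isinstance(value, str):
        return value
    return " OR ".join(value)


def npm_license_file(value):
    """Return the file named by "SEE LICENSE IN <filename>", else None."""
    prefix = "SEE LICENSE IN "
    if value.startswith(prefix):
        return value[len(prefix):].strip()
    return None


print(cargo_slash_to_spdx("MIT/Apache-2.0"))
print(composer_to_spdx(["LGPL-2.1-only", "GPL-3.0-or-later"]))
print(npm_license_file("SEE LICENSE IN LICENSE.txt"))
```

Formats that leave the relationship between listed licenses unspecified, such as Rubygems and CPAN, cannot be converted this way without guessing the author's intent.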
Other ecosystems
- The SPDX-License-Identifier header is a simple convention to document the license inside a file.
- The Free Software Foundation (FSF) promotes the use of SPDX license identifiers for clarity in the GPL and other versioned free software licenses.
- The Free Software Foundation Europe (FSFE) REUSE project promotes using SPDX-License-Identifier.
- The Linux kernel uses SPDX-License-Identifier and parts of the FSFE REUSE conventions to document its licenses.
- U-Boot spearheaded using SPDX-License-Identifier in code and now follows the Linux approach.
- The Apache Software Foundation projects use RDF DOAP with a single license field pointing to SPDX license identifiers.
- The Eclipse Foundation promotes using SPDX-License-Identifier.
- The ClearlyDefined project promotes using SPDX license identifiers and expressions to improve license clarity.
- The Android Open Source Project uses MODULE_LICENSE_XXX empty tag files, where XXX is a license code such as BSD, APACHE, GPL, etc. It also uses a NOTICE file that contains license and notice texts.
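The SPDX-License-Identifier convention mentioned throughout this section is simple enough to read with a regular expression. The following is a minimal sketch, not a conforming SPDX or REUSE tool; the find_spdx_expression() helper and its max_lines limit are assumptions made for illustration.

```python
import re

SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(?P<expression>.+)")


def find_spdx_expression(source_text, max_lines=20):
    """Return the first SPDX-License-Identifier value near the top of a file."""
    for line in source_text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            # Drop a trailing block-comment terminator such as "*/".
            return match.group("expression").replace("*/", "").strip()
    return None


# Hypothetical file contents in two comment styles.
python_source = "# SPDX-License-Identifier: GPL-2.0-only OR MIT\nimport os\n"
c_source = "/* SPDX-License-Identifier: Apache-2.0 */\nint main(void) { return 0; }\n"

print(find_spdx_expression(python_source))
print(find_spdx_expression(c_source))
```

The helper returns the raw expression text; checking that it is a valid SPDX license expression would need a dedicated parser.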