PEP: 545
Title: Python Documentation Translations
Author: Julien Palard <julien@palard.fr>,
        Inada Naoki <songofacandy@gmail.com>,
        Victor Stinner <vstinner@python.org>
Status: Final
Type: Process
Content-Type: text/x-rst
Created: 04-Mar-2017
Resolution: https://mail.python.org/pipermail/python-dev/2017-May/147957.html

Abstract

The intent of this PEP is to make existing translations of the Python
Documentation more accessible and discoverable. By doing so, we hope to
attract and motivate new translators and new translations.

Translated documentation will be hosted on python.org. Examples of two
active translation teams:

-   http://docs.python.org/fr/: French
-   http://docs.python.org/ja/: Japanese

http://docs.python.org/en/ will redirect to http://docs.python.org/.

Sources of translated documentation will be hosted in the Python
organization on GitHub: https://github.com/python/. Contributors will
have to accept a Documentation Contribution Agreement.

Motivation

On the French #python-fr IRC channel on freenode, it's not rare to meet
people who don't speak English and so are unable to read the Python
official documentation. Python wants to be widely available to all users
in any language: this is also why Python 3 supports non-ASCII
identifiers (see the rationale of PEP 3131).

There are at least 4 groups of people who are translating the Python
documentation to their native language (French[1][2][3], Japanese[4][5],
Spanish[6], Hungarian[7][8]) even though their translations are not
visible on d.p.o. Other, less visible and less organized groups are
also translating the documentation: we've heard of Russian[9], Chinese
and Korean. Others we haven't found yet might also exist. This PEP
defines rules describing how to move translations onto docs.python.org so
they can easily be found by developers, newcomers and potential
translators.

The Japanese team has (as of March 2017) translated ~80% of the
documentation, the French team ~20%. French translation went from 6% to
23% in 2016[10] with 7 contributors[11], proving a translation team can
be faster than the rate the documentation mutates.

Quoting Xiang Zhang about Chinese translations:

  I have seen several groups trying to translate part of our official
  doc. But their efforts are disperse and quickly become lost because
  they are not organized to work towards a single common result and
  their results are hold anywhere on the Web and hard to find. An
  official one could help ease the pain.

Rationale

Translation

Issue tracker

Considering that issues opened about translations may be written in the
translation language, which is at the very least inconsistent with the
rest of the tracker and can be considered noise, such issues should be
placed outside bugs.python.org (b.p.o).

As every translation must have its own GitHub project (see Repository
for PO Files), each must use the associated GitHub issue tracker.

Considering the noise induced by translation issues written in any
language, which may land in b.p.o despite every warning, triage will
have to be done. Considering that translations already exist and are not
currently a source of noise in b.p.o, an unmanageable amount of work is
not to be expected. Considering that Xiang Zhang and Victor Stinner are
already triaging, and Julien Palard is willing to help with this task,
noise on b.p.o is not to be expected.

Also, language team coordinators (see Language Team) should help with
triaging b.p.o by properly indicating, in the language of the issue
author if required, the right issue tracker.

Branches

Translation teams should focus on the latest stable versions, and use
tools (scripts, translation memory, …) to automatically carry over what
is translated in one branch to other branches.

Note

Translation memories are a kind of database of previously translated
paragraphs, even removed ones. See also Sphinx Internationalization.
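
As an illustration of the idea, a translation memory can be sketched as a
plain mapping from source paragraphs to their translations, harvested
from one branch and applied to the untranslated entries of another. The
function names and the dict-based catalogs below are assumptions made for
this sketch; teams would normally rely on their translation tool or on
gettext utilities instead.

```python
def build_memory(catalog):
    """Collect every translated paragraph of a catalog (msgid -> msgstr)."""
    return {msgid: msgstr for msgid, msgstr in catalog.items() if msgstr}


def apply_memory(memory, catalog):
    """Return a copy of catalog where empty entries are filled from memory."""
    return {
        msgid: msgstr or memory.get(msgid, "")
        for msgid, msgstr in catalog.items()
    }


# Hypothetical catalogs for two branches of a French translation.
branch_36 = {"Built-in Functions": "Fonctions natives", "Glossary": "Glossaire"}
branch_35 = {"Built-in Functions": "", "Glossary": "", "What's New": ""}

memory = build_memory(branch_36)
branch_35 = apply_memory(memory, branch_35)
```

Entries that the memory does not know, such as "What's New" here, stay
empty and remain to be translated by hand.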

The three currently stable branches that will be translated are[12]:
2.7, 3.5, and 3.6. The scripts that build the documentation of older
branches would need to be modified to support translation[13], but these
branches now accept security fixes only.

The development branch (main) should have a lower translation priority
than stable branches. But docsbuild-scripts should build it anyway so it
is possible for a team to work on it to be ready for the next release.

Hosting

Domain Name, Content negotiation and URL

Different translations can be identified by changing one of the
following: Country Code Top Level Domain (CCTLD), path segment,
subdomain or content negotiation.

Buying a CCTLD for each translation is expensive, time-consuming, and
sometimes almost impossible when the domain is already registered, so
this solution should be avoided.

Using subdomains like "es.docs.python.org" or "docs.es.python.org" is
possible but confusing ("is it es.docs.python.org or
docs.es.python.org?"). Hyphens in subdomains like pt-br.docs.python.org
are uncommon and SEOMoz[14] correlated the presence of hyphens as a
negative factor. Usage of underscores in subdomains is prohibited by
RFC 1123, section 2.1. Finally, using subdomains means creating TLS
certificates for each language. This not only requires more maintenance
but will also cause issues in the language switcher if, as for the
version switcher, we want a preflight to check if the translation exists
in the given version: the preflight will probably be blocked by the
same-origin policy. Wildcard TLS certificates are very expensive.

Using content negotiation (HTTP headers Accept-Language in the request
and Vary: Accept-Language) leads to a bad user experience where users
can't easily change the language. According to Mozilla: "This header is
a hint to be used when the server has no way of determining the language
via another way, like a specific URL, that is controlled by an explicit
user decision."[15]. As we want to be able to easily change the
language, we should not use content negotiation as the main way of
determining the language, so we need something else.

The last solution is to use the URL path, which looks readable, allows
for an easy switch from one language to another, and nicely accepts
hyphens. Typically something like: "docs.python.org/de/" or, by using a
hyphen: "docs.python.org/pt-BR/".

As with versions, sphinx-doc does not support compiling for multiple
languages at once, so we'll have one full build per language rooted
under its own path, exactly like we're already doing with versions.

So we can have "docs.python.org/de/3.6/" or "docs.python.org/3.6/de/". A
question that arises is: "Does the language contain multiple versions or
does the version contain multiple languages?". As versions exist in any
case and translations for a given version may or may not exist, we may
prefer "docs.python.org/3.6/de/", but doing so scatters languages
everywhere. Having "/de/3.6/" is clearer, meaning: "everything under
/de/ is written in German". Having the version at the end is also a
habit taken by readers of the documentation: they like to easily change
the version by changing the end of the path.
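
The language-first layout argued for here can be sketched as a small URL
builder. The function name, the "page" parameter and the special-casing
of "en" (English stays at the root rather than under "/en/") are
assumptions made for this illustration, not part of docsbuild-scripts.

```python
BASE_URL = "https://docs.python.org"


def doc_url(language_tag, version, page=""):
    """Build a documentation URL with the language first, the version last."""
    if language_tag == "en":
        # English is not moved under /en/: it stays at the root.
        return f"{BASE_URL}/{version}/{page}"
    return f"{BASE_URL}/{language_tag}/{version}/{page}"


german_36 = doc_url("de", "3.6", "library/stdtypes.html")
german_35 = doc_url("de", "3.5", "library/stdtypes.html")
```

Both German URLs share the "/de/" prefix, so everything under that prefix
is written in German and only the version segment differs.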

So we should use the following pattern:
"docs.python.org/LANGUAGE_TAG/VERSION/".

The current documentation is not moved to "/en/"; instead,
"docs.python.org/en/" will redirect to "docs.python.org".

Language Tag

A common notation for language tags is the IETF Language Tag (RFC 5646)
[16] based on ISO 639, although gettext uses ISO 639 tags joined with
underscores (ex: pt_BR) instead of dashes[17] (ex: pt-BR). Examples of
IETF Language Tags: fr (French), ja (Japanese), pt-BR (Orthographic
formulation of 1943 - Official in Brazil).

It is more common to see dashes instead of underscores in URLs[18], so
we should use IETF language tags, even if sphinx uses gettext
internally: URLs are not meant to leak the underlying implementation.

It's uncommon to see capital letters in URLs, and docs.python.org
doesn't use any, so they may hurt readability by drawing the eye, as
in: "https://docs.python.org/pt-BR/3.6/library/stdtypes.html".
RFC 5646 (Tags for Identifying Languages), section 2.1.1, states that
tags are not case sensitive. As the RFC allows lower case, and it
enhances readability, we should use lowercased tags like pt-br.

We may drop the region subtag when it does not add distinguishing
information, for example: "de-DE" or "fr-FR". (Although it might make
sense, respectively meaning "German as spoken in Germany" and "French as
spoken in France"). But when the region subtag actually adds
information, for example "pt-BR", meaning "Portuguese as spoken in
Brazil", it should be kept.

So we should use IETF language tags, lowercased, like /fr/, /pt-br/,
/de/ and so on.
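
These rules can be sketched as a conversion from a gettext-style locale
name to the tag used in URLs. The function name and the set of redundant
regions are assumptions made for this illustration; deciding which region
subtags are redundant remains a human judgment.

```python
# Assumption for this sketch: (language, region) pairs whose region subtag
# adds no distinguishing information.
REDUNDANT_REGIONS = {("de", "de"), ("fr", "fr")}


def url_language_tag(locale_name):
    """Turn a gettext-style locale name (pt_BR) into a URL tag (pt-br)."""
    parts = locale_name.replace("_", "-").lower().split("-")
    if len(parts) == 2 and (parts[0], parts[1]) in REDUNDANT_REGIONS:
        return parts[0]
    return "-".join(parts)


tags = [url_language_tag(name) for name in ("pt_BR", "fr_FR", "ja")]
```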

Fetching And Building Translations

Currently docsbuild-scripts are building the documentation[19]. These
scripts should be modified to fetch and build translations.

Building new translations is like building new versions so, while we're
adding complexity, it is not much.

Two steps should be configurable independently: building a new language,
and adding it to the language switcher. This allows a transition step
between "we accepted the language" and "it is translated enough to be
made public". During this step, translators can review their
modifications on d.p.o without having to build the documentation
locally.

From the translation repositories, only the .po files should be opened
by docsbuild-scripts, to keep the attack surface and the likely sources
of bugs at a minimum. This means no translation can patch sphinx to
advertise its translation tool. (This specific feature should be
handled by sphinx anyway[20]).
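
A minimal sketch of that restriction, assuming the translation repository
has already been cloned locally: the build only ever enumerates regular
.po files and ignores everything else in the checkout. The function name
is hypothetical and does not come from docsbuild-scripts.

```python
from pathlib import Path


def iter_po_files(checkout):
    """Yield the regular .po files of a translation checkout, sorted."""
    root = Path(checkout)
    for path in sorted(root.rglob("*.po")):
        # Skip symlinks and directories: only plain files are considered.
        if path.is_file() and not path.is_symlink():
            yield path
```

Anything else found in the repository (configuration files, scripts,
templates) is never read, whatever its content.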

Community

Mailing List

The doc-sig mailing list will be used to discuss cross-language changes
on translated documentation.

There is also the i18n-sig list but it's more oriented towards i18n APIs
[21] than translating the Python documentation.

Chat

Due to the Python community being highly active on IRC, we should create
a new IRC channel on freenode, typically #python-doc for consistency
with the mailing list name.

Each language coordinator can organize their own team, even by choosing
another chat system if local usage calls for it. As local teams will
write in their native languages, we don't want every team in a single
channel. It's also natural for the local teams to reuse their local
channels like "#python-fr" for French translators.

Repository for PO Files

Considering that each translation team may want to use different
translation tools, and that those tools should easily be synchronized
with git, all translations should expose their .po files via a git
repository.

Considering that each translation will be exposed via git repositories,
and that Python has migrated to GitHub, translations will be hosted on
GitHub.

For consistency and discoverability, all translations should be in the
same GitHub organization and named according to a common pattern.

Given that we want translations to be official, and that Python already
has a GitHub organization, translations should be hosted as projects of
the Python GitHub organization.

For consistency, translation repositories should be called
python-docs-LANGUAGE_TAG[22], using the language tag used in paths:
without region subtag if redundant, and lowercased.

The docsbuild-scripts may enforce this rule by refusing to fetch outside
of the Python organization or a wrongly named repository.
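
Such a check could look like the sketch below. The regular expression is
an assumption made for this illustration: it accepts only a lowercased
two- or three-letter language subtag with an optional two-letter region,
which is narrower than what IETF language tags allow in general.

```python
import re

# Lowercased tag: a language subtag, optionally followed by a region.
REPO_URL = re.compile(
    r"^https://github\.com/python/"
    r"python-docs-([a-z]{2,3}(?:-[a-z]{2})?)(?:\.git)?$"
)


def language_of_repository(url):
    """Return the language tag of an accepted repository URL, else None."""
    match = REPO_URL.match(url)
    return match.group(1) if match else None


accepted = language_of_repository("https://github.com/python/python-docs-pt-br")
refused = language_of_repository("https://github.com/someone/python-docs-fr")
```

A repository outside the Python organization, or one named with an
underscore or capital letters, yields None and would not be fetched.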

The CLA bot may be used on the translation repositories, but with a
limited effect as local coordinators may synchronize themselves with
translations from an external tool, like transifex, and lose track of
who translated what in the process.

Versions can be hosted on different repositories, different directories
or different branches. Storing them on different repositories will
probably pollute the Python GitHub organization. As it is typical and
natural to use branches to separate versions, branches should be used to
do so.

Translation tools

Most of the translation work is currently done on Transifex[23].

Other tools may be used later like https://pontoon.mozilla.org/ and
http://zanata.org/.

Documentation Contribution Agreement

Documentation does require a license from the translator, as it involves
creativity in the expression of the ideas.

There are multiple solutions. Quoting Van Lindberg from the PSF, when
asked about the subject:

  1.  Docs should either have the copyright assigned or be under CCO. A
      permissive software license (like Apache or MIT) would also get
      the job done, although it is not quite fit for task.
  2.  The translators should either sign an agreement or submit a
      declaration of the license with the translation.
  3.  We should have in the project page an invitation for people to
      contribute under a defined license, with acceptance defined by
      their act of contribution. Such as:

  "By posting this project on Transifex and inviting you to participate,
  we are proposing an agreement that you will provide your translation
  for the PSF's use under the CC0 license. In return, you may noted that
  you were the translator for the portion you translate. You signify
  acceptance of this agreement by submitting your work to the PSF for
  inclusion in the documentation."

It looks like having a "Documentation Contribution Agreement" is the
simplest thing we can do, as we can use multiple means (GitHub bots,
invitation page, …) in different contexts to ensure contributors
agree with it.

Language Team

Each language team should have one coordinator responsible for:

-   Managing the team.
-   Choosing and managing the tools the team will use (chat, mailing
    list, …).
-   Ensuring contributors understand and agree with the documentation
    contribution agreement.
-   Ensuring quality (grammar, vocabulary, consistency, filtering spam,
    ads, …).
-   Redirecting issues posted on b.p.o to the correct GitHub issue
    tracker for the language.

Alternatives

Simplified English

It would be possible to introduce a "simplified English" version like
Wikipedia did[24], as discussed on python-dev[25], targeting English
learners and children.

Pros: It yields a single translation, theoretically readable by everyone
and reviewable by current maintainers.

Cons: Subtle details may be lost, and translators from English to
English may be hard to find as stated by Wikipedia:

  The main English Wikipedia has 5 million articles, written by nearly
  140K active users; the Swedish Wikipedia is almost as big, 3M articles
  from only 3K active users; but the Simple English Wikipedia has just
  123K articles and 871 active users. That's fewer articles than
  Esperanto!

Changes

Get a Documentation Contribution Agreement

The Documentation Contribution Agreement has to be written by the PSF,
then listed at https://www.python.org/psf/contrib/ and have its own page
like https://www.python.org/psf/contrib/doc-contrib-form/.

Migrate GitHub Repositories

We (authors of this PEP) already own French and Japanese Git
repositories, so moving them to the Python documentation organization
will not be a problem. We'll however be following the New Translation
Procedure.

Set up a GitHub bot for Documentation Contribution Agreement

To help ensure that contributors from GitHub have signed the
Documentation Contribution Agreement, we can set up "The Knights Who Say
Ni" GitHub bot, customized for this agreement, on the migrated
repositories[26].

Patch docsbuild-scripts to Compile Translations

docsbuild-scripts must be patched to:

-   List the language tags to build along with the branches to build.
-   List the language tags to display in the language switcher.
-   Find translation repositories by formatting
    github.com:python/python-docs-{language_tag}.git (see Repository for
    PO Files).
-   Build translations for each branch and each language.

Patched docsbuild-scripts must only open .po files from translation
repositories.
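
The configuration described above can be sketched as follows. The
variable names, the "in_switcher" flag and the "pt-br" entry are
hypothetical and chosen for this illustration; only the repository
pattern and the list of branches come from this PEP.

```python
from itertools import product

# Hypothetical configuration: which languages to build, and which to show.
BRANCHES = ["2.7", "3.5", "3.6"]
LANGUAGES = {
    "fr": {"in_switcher": True},
    "ja": {"in_switcher": True},
    "pt-br": {"in_switcher": False},  # built for review, not yet advertised
}


def repository_url(language_tag):
    """Format the translation repository location for a language tag."""
    return f"github.com:python/python-docs-{language_tag}.git"


def build_plan():
    """List every (language, branch, repository) combination to build."""
    return [
        (language, branch, repository_url(language))
        for language, branch in product(LANGUAGES, BRANCHES)
    ]


def switcher_languages():
    """List the language tags displayed in the language switcher."""
    return [tag for tag, options in LANGUAGES.items() if options["in_switcher"]]
```

Keeping the build list and the switcher list separate is what allows a
translation to be built and reviewed on d.p.o before it is advertised.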

List coordinators in the devguide

Add a page or a section with an empty list of coordinators to the
devguide. Each new coordinator will be added to this list.

Create sphinx-doc Language Switcher

Highly similar to the version switcher, a language switcher must be
implemented. This language switcher must be configurable to hide or show
a given language.

The language switcher will only have to update or add the language
segment to the path like the current version switcher does. Unlike the
version switcher, no preflight is required as the destination page always
exists (translations do not add or remove pages). Untranslated (but
existing) pages still exist; they should however be rendered as such, see
Enhance Rendering of Untranslated and Fuzzy Translations.

Update sphinx-doc Version Switcher

The patch_url function of the version switcher in version_switch.js has
to be updated to understand and allow the presence of the language
segment in the path.
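
The path handling needed by both switchers can be sketched in Python as
below. This is not the JavaScript of version_switch.js: the function
name, the regular expression and the treatment of "en" as the root are
assumptions made for this illustration, and only numeric versions such as
"3.6" or "3" are recognized.

```python
import re

# Assumption: an optional lowercased language tag, then a numeric version.
PATH = re.compile(
    r"^/(?:(?P<language>[a-z]{2,3}(?:-[a-z]{2})?)/)?"
    r"(?P<version>\d+(?:\.\d+)?)/(?P<rest>.*)$"
)


def patch_path(path, language=None, version=None):
    """Replace the language and/or version segment of a documentation path."""
    match = PATH.match(path)
    if match is None:
        return path  # Not a versioned documentation path: leave untouched.
    language = language or match["language"] or "en"
    version = version or match["version"]
    prefix = "" if language == "en" else f"/{language}"
    return f"{prefix}/{version}/{match['rest']}"


to_french = patch_path("/3.6/library/stdtypes.html", language="fr")
to_35 = patch_path("/fr/3.6/library/stdtypes.html", version="3.5")
```

Switching the language keeps the version and the page, and switching the
version keeps the language segment when one is present.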

Enhance Rendering of Untranslated and Fuzzy Translations

It's an open sphinx issue[27], but we'll need it so we'll have to work
on it. Translated, fuzzy, and untranslated paragraphs should be
differentiated. (Fuzzy paragraphs have to warn readers that what they
are reading may be out of date.)

New Translation Procedure

Designate a Coordinator

The first step is to designate a coordinator (see Language Team). The
coordinator must sign the CLA.

The coordinator should be added to the list of translation coordinators
on the devguide.

Create GitHub Repository

Create a repository named "python-docs-{LANGUAGE_TAG}" (IETF language
tag, without redundant region subtag, with a dash, and lowercased) on
the Python GitHub organization (see Repository for PO Files), and grant
the language coordinator push rights to this repository.

Set up the Documentation Contribution Agreement

The README file should clearly show the following Documentation
Contribution Agreement:

    NOTE REGARDING THE LICENSE FOR TRANSLATIONS: Python's documentation is
    maintained using a global network of volunteers. By posting this
    project on Transifex, GitHub, and other public places, and inviting
    you to participate, we are proposing an agreement that you will
    provide your improvements to Python's documentation or the translation
    of Python's documentation for the PSF's use under the CC0 license
    (available at
    `https://creativecommons.org/publicdomain/zero/1.0/legalcode`_). In
    return, you may publicly claim credit for the portion of the
    translation you contributed and if your translation is accepted by the
    PSF, you may (but are not required to) submit a patch including an
    appropriate annotation in the Misc/ACKS or TRANSLATORS file. Although
    nothing in this Documentation Contribution Agreement obligates the PSF
    to incorporate your textual contribution, your participation in the
    Python community is welcomed and appreciated.

    You signify acceptance of this agreement by submitting your work to
    the PSF for inclusion in the documentation.

Add support for translations in docsbuild-scripts

As soon as the translation receives its first commits, update the
docsbuild-scripts configuration to build the translation (without
displaying it in the language switcher).

Add Translation to the Language Switcher

As soon as the translation hits:

-   100% of bugs.html with proper links to the language repository issue
    tracker.
-   100% of tutorial.
-   100% of library/functions (builtins).

the translation can be added to the language switcher.
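
The completion thresholds above can be sketched as a simple check. The
"progress" mapping (.po path to translated and total entry counts) and
the file names in REQUIRED are assumptions made for this illustration;
verifying that bugs.html links to the right issue tracker is left to a
human reviewer.

```python
# Assumption: progress maps a .po path to (translated, total) entry counts.
REQUIRED = ("bugs.po", "tutorial/", "library/functions.po")


def is_ready_for_switcher(progress):
    """Check that every required file is present and fully translated."""
    for required in REQUIRED:
        matching = [
            counts
            for path, counts in progress.items()
            if path == required
            or (required.endswith("/") and path.startswith(required))
        ]
        if not matching:
            return False
        if any(translated < total for translated, total in matching):
            return False
    return True


# Hypothetical progress report for one language.
progress = {
    "bugs.po": (12, 12),
    "tutorial/index.po": (5, 5),
    "tutorial/classes.po": (80, 80),
    "library/functions.po": (300, 300),
    "library/os.po": (0, 900),
}
ready = is_ready_for_switcher(progress)
```

Files outside the required set, such as library/os.po here, do not
prevent the translation from being added to the switcher.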

Previous Discussions

[Python-ideas] Cross link documentation translations (January, 2016)

[Python-Dev] Translated Python documentation (February 2016)

[Python-ideas] https://docs.python.org/fr/ ? (March 2016)

References

[2] [Doc-SIG] Localization of Python docs
(https://mail.python.org/pipermail/doc-sig/2013-September/003948.html)

Copyright

This document has been placed in the public domain.

[1] French translation (https://www.afpy.org/doc/python/)

[2] French translation on Gitea
(https://git.afpy.org/AFPy/python-docs-fr)

[3] French mailing list
(http://lists.afpy.org/mailman/listinfo/traductions)

[4] Japanese translation (http://docs.python.jp/3/)

[5] Japanese translation on GitHub
(https://github.com/python-doc-ja/python-doc-ja)

[6] Spanish translation
(https://docs.python.org/es/3/tutorial/index.html)

[7] Python-oktató
(http://web.archive.org/web/20170526080729/http://harp.pythonanywhere.com/python_doc/tutorial/index.html)

[8] The Python-hu Archives
(https://mail.python.org/pipermail/python-hu/)

[9] Документация Python 2.7!
(http://python-lab.ru/documentation/index.html)

[10] French translation progression (https://mdk.fr/pycon2016/#/11)

[11] French translation contributors
(https://github.com/AFPy/python_doc_fr/graphs/contributors?from=2016-01-01&to=2016-12-31&type=c)

[12] Passing options to sphinx from Doc/Makefile
(https://github.com/python/cpython/commit/57acb82d275ace9d9d854b156611e641f68e9e7c)

[13] Passing options to sphinx from Doc/Makefile
(https://github.com/python/cpython/commit/57acb82d275ace9d9d854b156611e641f68e9e7c)

[14] Domains - SEO Best Practices | Moz
(https://moz.com/learn/seo/domain)

[15] Accept-Language
(https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language)

[16] IETF language tag (https://en.wikipedia.org/wiki/IETF_language_tag)

[17] GNU Gettext manual, section 2.3.1: Locale Names
(https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html)

[18] Semantic URL: Slug (https://en.wikipedia.org/wiki/Clean_URL#Slug)

[19] Docsbuild-scripts GitHub repository
(https://github.com/python/docsbuild-scripts/)

[20] i18n: Highlight untranslated paragraphs
(https://github.com/sphinx-doc/sphinx/issues/1246)

[21] [I18n-sig] Hello Python members, Do you have any idea about Python
documents?
(https://mail.python.org/pipermail/i18n-sig/2013-September/002130.html)

[22] [Python-Dev] Translated Python documentation: doc vs docs
(https://mail.python.org/pipermail/python-dev/2017-February/147472.html)

[23] Python-doc on Transifex
(https://www.transifex.com/python-doc/public/)

[24] Wikipedia: Simple English
(https://simple.wikipedia.org/wiki/Main_Page)

[25] Python-dev discussion about simplified English
(https://mail.python.org/pipermail/python-dev/2017-February/147446.html)

[26] [Python-Dev] PEP 545: Python Documentation Translations
(https://mail.python.org/pipermail/python-dev/2017-April/147752.html)

[27] i18n: Highlight untranslated paragraphs
(https://github.com/sphinx-doc/sphinx/issues/1246)