PEP: 444
Title: Python Web3 Interface
Version: $Revision$
Last-Modified: $Date$
Author: Chris McDonough <chrism@plope.com>, Armin Ronacher <armin.ronacher@active-4.com>
Discussions-To: web-sig@python.org
Status: Deferred
Type: Informational
Content-Type: text/x-rst
Created: 19-Jul-2010

Abstract

This document specifies a proposed second-generation standard interface
between web servers and Python web applications or frameworks.

PEP Deferral

Further exploration of the concepts covered in this PEP has been
deferred for lack of a current champion interested in promoting the
goals of the PEP and collecting and incorporating feedback, and with
sufficient available time to do so effectively.

Note that since this PEP was first created, PEP 3333 was created as a
more incremental update that permitted use of WSGI on Python 3.2+.
However, an alternative specification that furthers the Python 3 goals
of a cleaner separation of binary and text data may still be valuable.

Rationale and Goals

This protocol and specification are influenced heavily by the Web
Server Gateway Interface (WSGI) 1.0 standard described in PEP 333. The
high-level rationale for having any standard that allows Python-based
web servers and applications to interoperate is outlined in PEP 333.
This document essentially uses PEP 333 as a template, and changes its
wording in various places for the purpose of forming a different
standard.

Python currently boasts a wide variety of web application frameworks
which use the WSGI 1.0 protocol. However, due to changes in the
language, the WSGI 1.0 protocol is not compatible with Python 3. This
specification describes a standardized WSGI-like protocol that lets
Python 2.6, 2.7 and 3.1+ applications communicate with web servers. Web3
is clearly a WSGI derivative; it only uses a different name than "WSGI"
in order to indicate that it is not in any way backwards compatible.

Applications and servers which are written to this specification are
meant to work properly under Python 2.6.X, Python 2.7.X and Python 3.1+.
It is not easy to write an application or a server that implements the
Web3 specification and also works under Python 2 versions earlier than
2.6 or Python 3 versions earlier than 3.1.

Note

The true minimum Python 3 version is whichever release fixed
http://bugs.python.org/issue4006, so that os.environ['foo'] returns
surrogates (as in PEP 383) instead of failing with a KeyError when the
value of 'foo' cannot be decoded using the current locale. In any case,
Python 3.0 is not supported.

Note

Python 2.6 is the first Python version that supported an alias for bytes
and the b"foo" literal syntax. This is why it is the minimum version
supported by Web3.

Explicability and documentability are the main technical drivers for the
decisions made within the standard.

Differences from WSGI

-   All protocol-specific environment names are prefixed with web3.
    rather than wsgi., e.g. web3.input rather than wsgi.input.
-   All values present as environment dictionary values are explicitly
    bytes instances instead of native strings. (Environment keys,
    however, are native strings: always str regardless of platform.)
-   All values returned by an application must be bytes instances,
    including status code, header names and values, and the body.
-   Wherever WSGI 1.0 referred to an app_iter, this specification refers
    to a body.
-   No start_response() callback (and therefore no write() callable nor
    exc_info data).
-   The readline() function of web3.input must support a size hint
    parameter.
-   The read() function of web3.input must be length delimited. A call
    without a size argument must not read more than the Content-Length
    header specifies. If the Content-Length header is absent, the stream
    must not return anything on read. It must never request more data
    from the client than specified.
-   No requirement for middleware to yield an empty string if it needs
    more information from an application to produce output (e.g. no
    "Middleware Handling of Block Boundaries").
-   File-like objects passed to a "file_wrapper" must have an __iter__
    which returns bytes (never text).
-   wsgi.file_wrapper is not supported.
-   QUERY_STRING, SCRIPT_NAME and PATH_INFO values are required to be
    placed in environ by the server (each as the empty bytes instance if
    no associated value is received in the HTTP request).
-   web3.path_info and web3.script_name should be put into the Web3
    environment, if possible, by the origin Web3 server. When available,
    each is the original, plain 7-bit ASCII, URL-encoded variant of its
    CGI equivalent derived directly from the request URI (with %2F
    segment markers and other meta-characters intact). If the server
    cannot provide one (or both) of these values, it must omit the
    value(s) it cannot provide from the environment.
-   This requirement was removed: "middleware components must not block
    iteration waiting for multiple values from an application iterable.
    If the middleware needs to accumulate more data from the application
    before it can produce any output, it must yield an empty string."
-   SERVER_PORT must be a bytes instance (not an integer).
-   The server must not inject an additional Content-Length header by
    guessing the length from the response iterable. This must be set by
    the application itself in all situations.
-   If the origin server advertises that it has the web3.async
    capability, a Web3 application callable used by the server is
    permitted to return a callable that accepts no arguments. When it
    does so, this callable is to be called periodically by the origin
    server until it returns a non-None response, which must be a normal
    Web3 response tuple.
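The polling rule in the last item can be sketched from the server's side. This is a minimal, blocking illustration only: the helper names `resolve_response` and `make_countdown_app` are hypothetical, and the fixed `poll_interval` sleep is an assumption made for simplicity; a real asynchronous server would interleave polling with its own event loop.

```python
import time


def resolve_response(rv, poll_interval=0.01):
    """Turn an application's return value into a (body, status, headers)
    tuple, polling an asynchronous callable until it is ready."""
    if not hasattr(rv, '__call__'):
        # Synchronous application: rv is already the response tuple.
        return rv
    while True:
        result = rv()
        if result is not None:
            # A non-None value must be a normal Web3 response tuple.
            return result
        # Not ready yet. This sketch simply sleeps; a real asynchronous
        # server would service other connections between polls instead.
        time.sleep(poll_interval)


def make_countdown_app(polls_needed):
    """Hypothetical async application whose response becomes ready only
    after it has been polled polls_needed times."""
    def application(environ):
        state = {'polls': 0}

        def poll():
            state['polls'] += 1
            if state['polls'] < polls_needed:
                return None
            headers = [(b'Content-Type', b'text/plain')]
            return [b'done\n'], b'200 OK', headers
        return poll
    return application
```

Because a tuple is never callable, `hasattr(rv, '__call__')` is enough to tell the two kinds of return value apart.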

Specification Overview

The Web3 interface has two sides: the "server" or "gateway" side, and
the "application" or "framework" side. The server side invokes a
callable object that is provided by the application side. The specifics
of how that object is provided are up to the server or gateway. It is
assumed that some servers or gateways will require an application's
deployer to write a short script to create an instance of the server or
gateway, and supply it with the application object. Other servers and
gateways may use configuration files or other mechanisms to specify
where an application object should be imported from, or otherwise
obtained.

In addition to "pure" servers/gateways and applications/frameworks, it
is also possible to create "middleware" components that implement both
sides of this specification. Such components act as an application to
their containing server, and as a server to a contained application, and
can be used to provide extended APIs, content transformation,
navigation, and other useful functions.

Throughout this specification, we will use the term "application
callable" to mean "a function, a method, or an instance with a __call__
method". It is up to the server, gateway, or application implementing
the application callable to choose the appropriate implementation
technique for their needs. Conversely, a server, gateway, or application
that is invoking a callable must not have any dependency on what kind of
callable was provided to it. Application callables are only to be
called, not introspected upon.

The Application/Framework Side

The application object is simply a callable object that accepts one
argument. The term "object" should not be misconstrued as requiring an
actual object instance: a function, method, or instance with a __call__
method are all acceptable for use as an application object. Application
objects must be able to be invoked more than once, as virtually all
servers/gateways (other than CGI) will make such repeated requests. If
this cannot be guaranteed by the implementation of the actual
application, it has to be wrapped in a function that creates a new
instance on each call.
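For example, an application class that keeps per-request state on `self` is not safe to reuse across requests; wrapping it in a function that builds a fresh instance for every call satisfies the requirement above. `CounterPage` and `application` are hypothetical names used only for this sketch.

```python
class CounterPage(object):
    """Hypothetical application that stores per-request state on self,
    so a single instance must not be reused across requests."""

    def __init__(self):
        self.chunks = []

    def __call__(self, environ):
        self.chunks.append(b'method=' + environ['REQUEST_METHOD'] + b'\n')
        headers = [(b'Content-Type', b'text/plain')]
        return self.chunks, b'200 OK', headers


def application(environ):
    # Create a brand-new instance for each request, so that state left
    # behind by one invocation can never leak into the next one.
    return CounterPage()(environ)
```

Calling a single shared `CounterPage` instance twice would return two chunks on the second call; the wrapper always returns exactly one.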

Note

Although we refer to it as an "application" object, this should not be
construed to mean that application developers will use Web3 as a web
programming API. It is assumed that application developers will continue
to use existing, high-level framework services to develop their
applications. Web3 is a tool for framework and server developers, and is
not intended to directly support application developers.

An example of an application which is a function (simple_app):

    def simple_app(environ):
        """Simplest possible application object"""
        status = b'200 OK'
        headers = [(b'Content-type', b'text/plain')]
        body = [b'Hello world!\n']
        return body, status, headers

An example of an application which is an instance (simple_app):

    class AppClass(object):

        """Produce the same output, but using an instance.  An
        instance of this class must be instantiated before it is
        passed to the server.  """

        def __call__(self, environ):
            status = b'200 OK'
            headers = [(b'Content-type', b'text/plain')]
            body = [b'Hello world!\n']
            return body, status, headers

    simple_app = AppClass()

Alternately, an application callable may return a callable instead of
the tuple if the server supports asynchronous execution. See the
description of web3.async for details.
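As an illustration of the application side, the sketch below starts its work in a thread pool and returns a zero-argument callable that yields None until the result is ready. It assumes Python 3.2+ for `concurrent.futures`; the `slow_render` function and the module-level `executor` are hypothetical, and the application falls back to blocking when `web3.async` is false.

```python
import concurrent.futures

# Hypothetical shared worker pool for slow requests.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def slow_render(path):
    # Hypothetical long-running work (e.g. a database query).
    return b'You asked for ' + path + b'\n'


def async_app(environ):
    future = executor.submit(slow_render, environ['PATH_INFO'])
    headers = [(b'Content-Type', b'text/plain')]

    if not environ.get('web3.async'):
        # The server cannot poll us, so wait for the result synchronously.
        return [future.result()], b'200 OK', headers

    def poll():
        # Called repeatedly by the server; None means "not ready yet".
        if not future.done():
            return None
        return [future.result()], b'200 OK', headers
    return poll
```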

The Server/Gateway Side

The server or gateway invokes the application callable once for each
request it receives from an HTTP client that is directed at the
application. To illustrate, here is a simple CGI gateway, implemented as
a function taking an application object. Note that this simple example
has limited error handling, because by default an uncaught exception
will be dumped to sys.stderr and logged by the web server.

    import locale
    import os
    import sys

    encoding = locale.getpreferredencoding()

    stdout = sys.stdout

    if hasattr(sys.stdout, 'buffer'):
        # Python 3 compatibility; we need to be able to push bytes out
        stdout = sys.stdout.buffer

    def get_environ():
        d = {}
        for k, v in os.environ.items():
            # Python 3 compatibility
            if not isinstance(v, bytes):
                # We must explicitly encode the string to bytes under
                # Python 3.1+
                v = v.encode(encoding, 'surrogateescape')
            d[k] = v
        return d

    def run_with_cgi(application):

        environ = get_environ()
        # web3.input must yield bytes, so use the binary buffer on Python 3
        environ['web3.input']        = getattr(sys.stdin, 'buffer', sys.stdin)
        environ['web3.errors']       = sys.stderr
        environ['web3.version']      = (1, 0)
        environ['web3.multithread']  = False
        environ['web3.multiprocess'] = True
        environ['web3.run_once']     = True
        environ['web3.async']        = False

        if environ.get('HTTPS', b'off') in (b'on', b'1'):
            environ['web3.url_scheme'] = b'https'
        else:
            environ['web3.url_scheme'] = b'http'

        rv = application(environ)
        if hasattr(rv, '__call__'):
            raise TypeError('This webserver does not support asynchronous '
                            'responses.')
        body, status, headers = rv

        CRLF = b'\r\n'

        try:
            stdout.write(b'Status: ' + status + CRLF)
            for header_name, header_val in headers:
                stdout.write(header_name + b': ' + header_val + CRLF)
            stdout.write(CRLF)
            for chunk in body:
                stdout.write(chunk)
                stdout.flush()
        finally:
            if hasattr(body, 'close'):
                body.close()

Middleware: Components that Play Both Sides

A single object may play the role of a server with respect to some
application(s), while also acting as an application with respect to some
server(s). Such "middleware" components can perform such functions as:

-   Routing a request to different application objects based on the
    target URL, after rewriting the environ accordingly.
-   Allowing multiple applications or frameworks to run side by side in
    the same process.
-   Load balancing and remote processing, by forwarding requests and
    responses over a network.
-   Performing content postprocessing, such as applying XSL stylesheets.

The presence of middleware in general is transparent to both the
"server/gateway" and the "application/framework" sides of the interface,
and should require no special support. A user who desires to incorporate
middleware into an application simply provides the middleware component
to the server, as if it were an application, and configures the
middleware component to invoke the application, as if the middleware
component were a server. Of course, the "application" that the
middleware wraps may in fact be another middleware component wrapping
another application, and so on, creating what is referred to as a
"middleware stack".

A middleware must support asynchronous execution if possible or fall
back to disabling itself.

Here is a middleware that changes the HTTP_HOST key if an X-Host header
exists and adds a comment to all HTML responses:

    import time

    def apply_filter(app, environ, filter_func):
        """Helper function that passes the return value from an
        application to a filter function when the results are
        ready.
        """
        app_response = app(environ)

        # synchronous response, filter now
        if not hasattr(app_response, '__call__'):
            return filter_func(*app_response)

        # asynchronous response.  filter when results are ready
        def polling_function():
            rv = app_response()
            if rv is not None:
                return filter_func(*rv)
        return polling_function

    def proxy_and_timing_support(app):
        def new_application(environ):
            def filter_func(body, status, headers):
                now = time.time()
                for key, value in headers:
                    if key.lower() == b'content-type' and \
                       value.split(b';')[0] == b'text/html':
                        # assumes ascii compatible encoding in body,
                        # but the middleware should actually parse the
                        # content type header and figure out the
                        # encoding when doing that.
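                        comment = ('<!-- Execution time: %.2fsec -->' %
                                   (now - then)).encode('ascii')
                        # body may be any iterable of bytes, and
                        # "list += bytes" would append integers, so build a
                        # new list of chunks and close the original body.
                        # (A real middleware would also fix Content-Length.)
                        original_body = body
                        body = list(original_body) + [comment]
                        if hasattr(original_body, 'close'):
                            original_body.close()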
                        break
                return body, status, headers
            then = time.time()
            host = environ.get('HTTP_X_HOST')
            if host is not None:
                environ['HTTP_HOST'] = host

            # use the apply_filter function that applies a given filter
            # function for both async and sync responses.
            return apply_filter(app, environ, filter_func)
        return new_application

    app = proxy_and_timing_support(app)

Specification Details

The application callable must accept one positional argument. For the
sake of illustration, we have named it environ, but it is not required
to have this name. A server or gateway must invoke the application
object using a positional (not keyword) argument. (E.g. by calling
body, status, headers = application(environ) as shown above.)

The environ parameter is a dictionary object, containing CGI-style
environment variables. This object must be a builtin Python dictionary
(not a subclass, UserDict or other dictionary emulation), and the
application is allowed to modify the dictionary in any way it desires.
The dictionary must also include certain Web3-required variables
(described in a later section), and may also include server-specific
extension variables, named according to a convention that will be
described below.

When called by the server, the application object must return a tuple
of three elements, in the order body, status, headers, or, if supported
by an async server, an argumentless callable which either returns None
or a tuple of those three elements.

The status element is a status in bytes of the form b'999 Message here'.

headers is a Python list of (header_name, header_value) pairs describing
the HTTP response headers. The headers structure must be a literal Python
list; it must yield two-tuples. Both header_name and header_value must
be bytes values.

The body is an iterable yielding zero or more bytes instances. This can
be accomplished in a variety of ways, such as by returning a list
containing bytes instances as body, or by returning a generator as body
that yields bytes instances, or by the body being an instance of
a class which is iterable. Regardless of how it is accomplished, the
application object must always return a body iterable yielding zero or
more bytes instances.
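The third option, a class instance that is iterable, is convenient for streaming a file in fixed-size blocks while still giving the server a close() method to call. `FileBody`, `file_app` and the default `block_size` are illustrative choices, not part of this specification.

```python
import io


class FileBody(object):
    """Body iterable that yields a binary file-like object in blocks."""

    def __init__(self, fileobj, block_size=8192):
        self.fileobj = fileobj
        self.block_size = block_size

    def __iter__(self):
        while True:
            chunk = self.fileobj.read(self.block_size)
            if not chunk:
                # An empty bytes instance signals end of file.
                return
            yield chunk

    def close(self):
        # The server calls this when the request is finished, which
        # releases the underlying file even if iteration stopped early.
        self.fileobj.close()


def file_app(environ):
    data = b'0123456789'
    # The application itself sets Content-Length; the server won't guess.
    headers = [(b'Content-Type', b'application/octet-stream'),
               (b'Content-Length', str(len(data)).encode('ascii'))]
    return FileBody(io.BytesIO(data), block_size=4), b'200 OK', headers
```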

The server or gateway must transmit the yielded bytes to the client in
an unbuffered fashion, completing the transmission of each set of bytes
before requesting another one. (In other words, applications should
perform their own buffering. See the Buffering and Streaming section
below for more on how application output must be handled.)

The server or gateway should treat the yielded bytes as binary byte
sequences: in particular, it should ensure that line endings are not
altered. The application is responsible for ensuring that the string(s)
to be written are in a format suitable for the client. (The server or
gateway may apply HTTP transfer encodings, or perform other
transformations for the purpose of implementing HTTP features such as
byte-range transmission. See Other HTTP Features, below, for more
details.)

If the body iterable returned by the application has a close() method,
the server or gateway must call that method upon completion of the
current request, whether the request was completed normally, or
terminated early due to an error. This is to support resource release by
the application and is intended to complement PEP 325's generator
support, and other common iterables with close() methods.
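A server's transmission loop that honours both the unbuffered-write rule and the close() rule might look like the following sketch. `send_body` is a hypothetical helper, and its `write`/`flush` arguments are assumptions standing in for whatever socket or stream a real server uses.

```python
def send_body(body, write, flush):
    """Send each chunk as soon as it is produced, then always close."""
    try:
        for chunk in body:
            if not isinstance(chunk, bytes):
                raise TypeError('body must yield bytes, got %s'
                                % type(chunk).__name__)
            write(chunk)
            # Finish transmitting this chunk before asking for the next.
            flush()
    finally:
        # Runs on normal completion and when an error aborts the loop.
        close = getattr(body, 'close', None)
        if close is not None:
            close()
```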

Finally, servers and gateways must not directly use any other attributes
of the body iterable returned by the application.

environ Variables

The environ dictionary is required to contain various CGI environment
variables, as defined by the Common Gateway Interface specification[1].

The following CGI variables must be present. Each key is a native
string. Each value is a bytes instance.

Note

In Python 3.1+, a "native string" is a str type decoded using the
surrogateescape error handler, as done by os.environ.__getitem__. In
Python 2.6 and 2.7, a "native string" is a str type representing a set
of bytes.
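On Python 3, converting between a native string and the underlying bytes is lossless when both directions use the surrogateescape error handler, which is what lets undecodable bytes survive. This sketch hard-codes UTF-8 for reproducibility (an assumption; a server would use the same encoding as its environment), and the helper names are hypothetical.

```python
ENCODING = 'utf-8'  # assumption; a server would use its locale encoding


def bytes_to_native(value):
    """Decode bytes to a native str without ever raising on bad input."""
    return value.decode(ENCODING, 'surrogateescape')


def native_to_bytes(value):
    """Recover the original bytes from a surrogateescape-decoded str."""
    return value.encode(ENCODING, 'surrogateescape')
```

An invalid byte such as 0xff becomes the lone surrogate U+DCFF when decoded, and encoding it back restores 0xff exactly.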

REQUEST_METHOD

    The HTTP request method, such as "GET" or "POST".

SCRIPT_NAME

    The initial portion of the request URL's "path" that corresponds to
    the application object, so that the application knows its virtual
    "location". This may be the empty bytes instance if the application
    corresponds to the "root" of the server. SCRIPT_NAME will be a bytes
    instance representing a sequence of URL-encoded segments separated
    by the slash character (/). It is assumed that %2F characters will
    be decoded into literal slash characters within SCRIPT_NAME, as per
    CGI.

PATH_INFO

    The remainder of the request URL's "path", designating the virtual
    "location" of the request's target within the application. This may
    be the empty bytes instance if the request URL targets the application root
    and does not have a trailing slash. PATH_INFO will be a bytes
    instance representing a sequence of URL-encoded segments separated
    by the slash character (/). It is assumed that %2F characters will
    be decoded into literal slash characters within PATH_INFO, as per
    CGI.

QUERY_STRING

    The portion of the request URL (in bytes) that follows the "?", if
    any, or the empty bytes instance.

SERVER_NAME, SERVER_PORT

    When combined with SCRIPT_NAME and PATH_INFO (or their raw
    equivalents), these variables can be used to complete the URL. Note,
    however, that HTTP_HOST, if present, should be used in preference to
    SERVER_NAME for reconstructing the request URL. See the URL
    Reconstruction section below for more detail. SERVER_PORT must be
    a bytes instance, not an integer.

SERVER_PROTOCOL

    The version of the protocol the client used to send the request.
    Typically this will be something like "HTTP/1.0" or "HTTP/1.1" and
    may be used by the application to determine how to treat any HTTP
    request headers. (This variable should probably be called
    REQUEST_PROTOCOL, since it denotes the protocol used in the request,
    and is not necessarily the protocol that will be used in the
    server's response. However, for compatibility with CGI we have to
    keep the existing name.)
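A bytes-based adaptation of the URL reconstruction recipe from PEP 333 might look like the following sketch. It assumes the environ layout described in this section, prefers the raw web3.script_name and web3.path_info values (described below) when the server provides them, and otherwise re-quotes the decoded values with `urllib.parse.quote_from_bytes`, which cannot restore an original %2F. `reconstruct_url` is a hypothetical name.

```python
from urllib.parse import quote_from_bytes


def reconstruct_url(environ):
    scheme = environ['web3.url_scheme']
    url = scheme + b'://'

    host = environ.get('HTTP_HOST')
    if host:
        # HTTP_HOST is preferred; it already includes any explicit port.
        url += host
    else:
        url += environ['SERVER_NAME']
        port = environ['SERVER_PORT']  # bytes, e.g. b'8080'
        default = b'443' if scheme == b'https' else b'80'
        if port != default:
            url += b':' + port

    for raw_key, cgi_key in (('web3.script_name', 'SCRIPT_NAME'),
                             ('web3.path_info', 'PATH_INFO')):
        raw = environ.get(raw_key)
        if raw is None:
            # Fall back to re-encoding the already-decoded CGI value.
            raw = quote_from_bytes(environ.get(cgi_key, b'')).encode('ascii')
        url += raw

    if environ.get('QUERY_STRING'):
        url += b'?' + environ['QUERY_STRING']
    return url
```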

The following CGI values may be present in the Web3 environment. Each
key is a native string. Each value is a bytes instance.

CONTENT_TYPE

    The contents of any Content-Type fields in the HTTP request.

CONTENT_LENGTH

    The contents of any Content-Length fields in the HTTP request.

HTTP_ Variables

    Variables corresponding to the client-supplied HTTP request headers
    (i.e., variables whose names begin with "HTTP_"). The presence or
    absence of these variables should correspond with the presence or
    absence of the appropriate HTTP header in the request.
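Following the usual CGI convention, a server derives each HTTP_ key by upper-casing the header name and replacing hyphens with underscores, while Content-Type and Content-Length map to CONTENT_TYPE and CONTENT_LENGTH instead. The hypothetical helper below sketches that mapping; decoding the header name as Latin-1 is an assumption (valid field names are ASCII tokens). Note that the key becomes a native str while the value stays bytes.

```python
def add_request_header(environ, name, value):
    """Store one request header (bytes name, bytes value) in environ
    using CGI-style naming."""
    key = name.decode('latin-1').upper().replace('-', '_')
    if key not in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
        key = 'HTTP_' + key
    # Keys are native strings; values remain bytes.
    environ[key] = value
```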

A server or gateway should attempt to provide as many other CGI
variables as are applicable, each with a native string for its key and a
bytes instance for its value. In addition, if SSL is in use, the server or
gateway should also provide as many of the Apache SSL environment
variables[2] as are applicable, such as HTTPS=on and SSL_PROTOCOL. Note,
however, that an application that uses any CGI variables other than the
ones listed above is necessarily non-portable to web servers that do
not support the relevant extensions. (For example, web servers that do
not publish files will not be able to provide a meaningful DOCUMENT_ROOT
or PATH_TRANSLATED.)

A Web3-compliant server or gateway should document what variables it
provides, along with their definitions as appropriate. Applications
should check for the presence of any variables they require, and have a
fallback plan in the event such a variable is absent.

Note that CGI variable values must be bytes instances, if they are
present at all. It is a violation of this specification for a CGI
variable's value to be of any type other than bytes. On Python 2, this
means they will be of type str. On Python 3, this means they will be of
type bytes.

The keys of all CGI and non-CGI variables in the environ, however, must
be "native strings" (on both Python 2 and Python 3, they will be of type
str).
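A test suite or debugging middleware could check these typing rules mechanically. This is a Python 3 sketch: `check_environ` is a hypothetical helper, and it only treats the required variables, CONTENT_TYPE, CONTENT_LENGTH and HTTP_ keys as CGI variables, leaving operating-system and web3.* entries unchecked.

```python
REQUIRED_CGI = ('REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO',
                'QUERY_STRING', 'SERVER_NAME', 'SERVER_PORT',
                'SERVER_PROTOCOL')
OPTIONAL_CGI = ('CONTENT_TYPE', 'CONTENT_LENGTH')


def check_environ(environ):
    """Return a list of rule violations in a Web3 environ (empty if none).

    Keys must be native strings (str); CGI values must be bytes."""
    if type(environ) is not dict:
        return ['environ must be a builtin dict']
    problems = []
    for key in REQUIRED_CGI:
        if key not in environ:
            problems.append('missing %s' % key)
    for key, value in environ.items():
        if type(key) is not str:
            problems.append('key %r is not a native string' % (key,))
            continue
        is_cgi = (key in REQUIRED_CGI or key in OPTIONAL_CGI
                  or key.startswith('HTTP_'))
        if is_cgi and not isinstance(value, bytes):
            problems.append('%s must be bytes, got %s'
                            % (key, type(value).__name__))
    return problems
```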

In addition to the CGI-defined variables, the environ dictionary may
also contain arbitrary operating-system "environment variables", and
must contain the following Web3-defined variables.

+-------------------+-------------------------------------------------+
| Variable          | Value                                           |
+===================+=================================================+
| web3.version      | The tuple (1, 0), representing Web3 version     |
|                   | 1.0.                                            |
+-------------------+-------------------------------------------------+
| web3.url_scheme   | A bytes value representing the "scheme" portion |
|                   | of the URL at which the application is being    |
|                   | invoked. Normally, this will have the value     |
|                   | b"http" or b"https", as appropriate.            |
+-------------------+-------------------------------------------------+
| web3.input        | An input stream (file-like object) from which   |
|                   | bytes constituting the HTTP request body can be |
|                   | read. (The server or gateway may perform reads  |
|                   | on-demand as requested by the application, or   |
|                   | it may pre-read the client's request body and   |
|                   | buffer it in-memory or on disk, or use any      |
|                   | other technique for providing such an input     |
|                   | stream, according to its preference.)           |
+-------------------+-------------------------------------------------+
| web3.errors       | An output stream (file-like object) to which    |
|                   | error output text can be written, for the       |
|                   | purpose of recording program or other errors in |
|                   | a standardized and possibly centralized         |
|                   | location. This should be a "text mode" stream;  |
|                   | i.e., applications should use "\n" as a line    |
|                   | ending, and assume that it will be converted to |
|                   | the correct line ending by the server/gateway.  |
|                   | Applications may not send bytes to the 'write'  |
|                   | method of this stream; they may only send text. |
|                   |                                                 |
|                   | For many servers, web3.errors will be the       |
|                   | server's main error log. Alternatively, this    |
|                   | may be sys.stderr, or a log file of some sort.  |
|                   | The server's documentation should include an    |
|                   | explanation of how to configure this or where   |
|                   | to find the recorded output. A server or        |
|                   | gateway may supply different error streams to   |
|                   | different applications, if this is desired.     |
+-------------------+-------------------------------------------------+
| web3.multithread  | This value should evaluate true if the          |
|                   | application object may be simultaneously        |
|                   | invoked by another thread in the same process,  |
|                   | and should evaluate false otherwise.            |
+-------------------+-------------------------------------------------+
| web3.multiprocess | This value should evaluate true if an           |
|                   | equivalent application object may be            |
|                   | simultaneously invoked by another process, and  |
|                   | should evaluate false otherwise.                |
+-------------------+-------------------------------------------------+
| web3.run_once     | This value should evaluate true if the server   |
|                   | or gateway expects (but does not guarantee!)    |
|                   | that the application will only be invoked this  |
|                   | one time during the life of its containing      |
|                   | process. Normally, this will only be true for a |
|                   | gateway based on CGI (or something similar).    |
+-------------------+-------------------------------------------------+
| web3.script_name  | The non-URL-decoded SCRIPT_NAME value. Through  |
|                   | a historical inequity, by virtue of the CGI     |
|                   | specification, SCRIPT_NAME is present within    |
|                   | the environment as an already URL-decoded       |
|                   | string. This is the original URL-encoded value  |
|                   | derived from the request URI. If the server     |
|                   | cannot provide this value, it must omit it from |
|                   | the environ.                                    |
+-------------------+-------------------------------------------------+
| web3.path_info    | The non-URL-decoded PATH_INFO value. Through a  |
|                   | historical inequity, by virtue of the CGI       |
|                   | specification, PATH_INFO is present within the  |
|                   | environment as an already URL-decoded string.   |
|                   | This is the original URL-encoded value derived  |
|                   | from the request URI. If the server cannot      |
|                   | provide this value, it must omit it from the    |
|                   | environ.                                        |
+-------------------+-------------------------------------------------+
| web3.async        | This is True if the webserver supports async    |
|                   | invocation. In that case an application is      |
|                   | allowed to return a callable instead of a tuple |
|                   | with the response. The exact semantics are not  |
|                   | specified by this specification.                |
+-------------------+-------------------------------------------------+
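To see how the decoded CGI values relate to the raw web3.script_name and web3.path_info values, consider splitting a raw request path at a known mount point. `split_request_path`, and the assumption that the mount point is supplied in raw (URL-encoded) form, are illustrative only. Note how %2F becomes indistinguishable from a real slash once decoded.

```python
from urllib.parse import unquote_to_bytes


def split_request_path(raw_path, raw_mount):
    """Return environ entries for a raw (still URL-encoded) request path
    served by an application mounted at raw_mount."""
    if raw_path == raw_mount:
        raw_rest = b''
    elif raw_path.startswith(raw_mount + b'/'):
        raw_rest = raw_path[len(raw_mount):]
    else:
        raise ValueError('path is not under the mount point')
    return {
        # Raw values: the original URL-encoded bytes, %2F intact.
        'web3.script_name': raw_mount,
        'web3.path_info': raw_rest,
        # CGI values: URL-decoded, so %2F turns into a literal slash.
        'SCRIPT_NAME': unquote_to_bytes(raw_mount),
        'PATH_INFO': unquote_to_bytes(raw_rest),
    }
```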

Finally, the environ dictionary may also contain server-defined
variables. These variables should have names which are native strings,
composed of only lower-case letters, numbers, dots, and underscores, and
should be prefixed with a name that is unique to the defining server or
gateway. For example, mod_web3 might define variables with names like
mod_web3.some_variable.

Input Stream

The input stream (web3.input) provided by the server must support the
following methods:

  Method              Notes
  ------------------- -------
  read(size)          1,4
  readline([size])    1,2,4
  readlines([size])   1,3,4
  __iter__()          4

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1.  The server is not required to read past the client's specified
    Content-Length, and is allowed to simulate an end-of-file condition
    if the application attempts to read past that point. The application
    should not attempt to read more data than is specified by the
    CONTENT_LENGTH variable.
2.  The implementation must support the optional size argument to
    readline().
3.  The application is free to not supply a size argument to
    readlines(), and the server or gateway is free to ignore the value
    of any supplied size argument.
4.  The read, readline and __iter__ methods must return a bytes
    instance. The readlines method must return a sequence which contains
    instances of bytes.

The methods listed in the table above must be supported by all servers
conforming to this specification. Applications conforming to this
specification must not use any other methods or attributes of the input
object. In particular, applications must not attempt to close this
stream, even if it possesses a close() method.

The input stream should silently ignore attempts to read more than the
content length of the request. If no content length is specified, the
stream must be a dummy stream that does not return anything.
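One way for a server to meet these requirements is to wrap the raw connection stream in an object that counts the bytes it hands out. `LimitedInput` is a hypothetical sketch that assumes the raw stream is a binary file-like object supporting read(size) and readline(size); when no Content-Length is given, the server would pass a limit of 0, which yields the required dummy stream.

```python
class LimitedInput(object):
    """web3.input wrapper that never reads past Content-Length."""

    def __init__(self, stream, content_length):
        self.stream = stream
        self.remaining = content_length  # use 0 when no Content-Length

    def read(self, size=None):
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size) if size else b''
        self.remaining -= len(data)
        return data

    def readline(self, size=None):
        # The optional size argument must be supported (note 2).
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.readline(size) if size else b''
        self.remaining -= len(data)
        return data

    def readlines(self, hint=None):
        # The size hint may be ignored, as note 3 allows.
        return list(self)

    def __iter__(self):
        while True:
            line = self.readline()
            if not line:
                return
            yield line
```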

Error Stream

The error stream (web3.errors) provided by the server must support the
following methods:

  Method            Stream   Notes
  ----------------- -------- -------
  flush()           errors   1
  write(str)        errors   2
  writelines(seq)   errors   2

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1.  Since the errors stream may not be rewound, servers and gateways are
    free to forward write operations immediately, without buffering. In
    this case, the flush() method may be a no-op. Portable applications,
    however, cannot assume that output is unbuffered or that flush() is
    a no-op. They must call flush() if they need to ensure that output
    has in fact been written. (For example, to minimize intermingling of
    data from multiple processes writing to the same error log.)
2.  The write() method must accept a string argument, but need not
    accept a bytes argument. The writelines() method must accept a
    sequence that consists entirely of strings, but need not accept
    bytes instances as members of the sequence.

The methods listed in the table above must be supported by all servers
conforming to this specification. Applications conforming to this
specification must not use any other methods or attributes of the errors
object. In particular, applications must not attempt to close this
stream, even if it possesses a close() method.
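For illustration, an application might log through the error stream as follows. log_error is a hypothetical helper. It writes text rather than bytes (note 2) and flushes explicitly (note 1).

```python
def log_error(environ, message):
    """Write one line of text to web3.errors and make sure it is flushed."""
    errors = environ['web3.errors']
    # Note 2: only string (text) arguments are guaranteed to be accepted.
    errors.write(message + '\n')
    # Note 1: the stream may buffer output, so flush explicitly.
    errors.flush()
```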

Values Returned by a Web3 Application

Web3 applications return a tuple in the form (status, headers, body). If
the server supports asynchronous applications (web3.async), the response
may be a callable object (which accepts no arguments).

The status value is assumed by a gateway or server to be an HTTP
"status" bytes instance like b'200 OK' or b'404 Not Found'. That is, it
consists of a Status-Code and a Reason-Phrase, in that order and
separated by a single space, with no surrounding whitespace or other
characters. (See RFC 2616, Section 6.1.1 for more information.) The
value must not contain control characters, and must not be terminated
with a carriage return, linefeed, or combination thereof.

The headers value is assumed by a gateway or server to be a literal
Python list of (header_name, header_value) tuples. Each header_name must
be a bytes instance representing a valid HTTP header field-name (as
defined by RFC 2616, Section 4.2), without a trailing colon or other
punctuation. Each header_value must be a bytes instance and must not
include any control characters, including carriage returns or linefeeds,
either embedded or at the end. (These requirements are to minimize the
complexity of any parsing that must be performed by servers, gateways,
and intermediate response processors that need to inspect or modify
response headers.)
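The following sketch shows one way a server might check the status and headers values against the rules above. The function names are hypothetical. The header-name check uses the RFC 2616 "token" grammar, and requiring a non-empty Reason-Phrase is an assumption.

```python
import re

# RFC 2616 "token": one or more CHARs excluding CTLs and separators.
_TOKEN_RE = re.compile(br"[!#$%&'*+.^_`|~0-9A-Za-z-]+\Z")
# CTL characters: octets 0-31 and 127 (DEL).
_CTL_RE = re.compile(br'[\x00-\x1f\x7f]')


def is_valid_status(status):
    """Return True if status looks like b'200 OK' (RFC 2616, 6.1.1)."""
    if not isinstance(status, bytes) or _CTL_RE.search(status):
        return False
    code, sep, reason = status.partition(b' ')
    # Assumption: a non-empty Reason-Phrase with no surrounding whitespace.
    return (len(code) == 3 and code.isdigit() and sep == b' '
            and reason != b'' and reason == reason.strip())


def is_valid_headers(headers):
    """Return True if headers is a literal list of (bytes, bytes) tuples."""
    if type(headers) is not list:
        return False
    for item in headers:
        if not (isinstance(item, tuple) and len(item) == 2):
            return False
        name, value = item
        if not (isinstance(name, bytes) and isinstance(value, bytes)):
            return False
        # Names must be bare tokens (no colon); values must contain no CTLs.
        if not _TOKEN_RE.match(name) or _CTL_RE.search(value):
            return False
    return True
```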

In general, the server or gateway is responsible for ensuring that
correct headers are sent to the client: if the application omits a
header required by HTTP (or other relevant specifications that are in
effect), the server or gateway must add it. For example, the HTTP Date:
and Server: headers would normally be supplied by the server or gateway.
However, the gateway must not override values with the same name if
they are emitted by the application.

(A reminder for server/gateway authors: HTTP header names are
case-insensitive, so be sure to take that into consideration when
examining application-supplied headers!)
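This sketch shows how a server might fill in missing headers without overriding the application's own values. add_default_headers and its default Server value are hypothetical. Note that the names are compared case-insensitively.

```python
from email.utils import formatdate


def add_default_headers(headers, server_name=b'ExampleServer/0.1'):
    """Append Date and Server headers only if the application omitted them."""
    # Header names are case-insensitive, so compare lowercased names.
    present = set(name.lower() for name, value in headers)
    result = list(headers)
    if b'date' not in present:
        # An RFC 1123 date in GMT, e.g. 'Tue, 15 Nov 1994 08:12:31 GMT'.
        result.append((b'Date', formatdate(usegmt=True).encode('ascii')))
    if b'server' not in present:
        result.append((b'Server', server_name))
    return result
```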

Applications and middleware are forbidden from using HTTP/1.1
"hop-by-hop" features or headers, any equivalent features in HTTP/1.0,
or any headers that would affect the persistence of the client's
connection to the web server. These features are the exclusive province
of the actual web server, and a server or gateway should consider it a
fatal error for an application to attempt sending them, and raise an
error if they are supplied as return values from an application in the
headers structure. (For more specifics on "hop-by-hop" features and
headers, please see the Other HTTP Features section below.)
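A server could enforce the ban on hop-by-hop headers along these lines. HopByHopError and check_hop_by_hop are hypothetical names. The header list is the one in RFC 2616, section 13.5.1, plus "Trailer", which is the field name RFC 2616 actually defines (section 14.40).

```python
# Hop-by-hop headers from RFC 2616, section 13.5.1 (lowercased). That list
# says "Trailers"; the header defined in section 14.40 is "Trailer".
HOP_BY_HOP = frozenset([
    b'connection', b'keep-alive', b'proxy-authenticate',
    b'proxy-authorization', b'te', b'trailer', b'trailers',
    b'transfer-encoding', b'upgrade',
])


class HopByHopError(Exception):
    """Raised when an application returns a hop-by-hop header."""


def check_hop_by_hop(headers):
    for name, value in headers:
        # HTTP header names are case-insensitive.
        if name.lower() in HOP_BY_HOP:
            raise HopByHopError('hop-by-hop header not allowed: %r' % (name,))
```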

Dealing with Compatibility Across Python Versions

Creating Web3 code that runs under both Python 2.6/2.7 and Python 3.1+
requires some care on the part of the developer. In general, the Web3
specification assumes a certain level of equivalence between the Python
2 str type and the Python 3 bytes type. For example, under Python 2, the
values present in the Web3 environ will be instances of the str type; in
Python 3, these will be instances of the bytes type. The Python 3 bytes
type does not possess all the methods of the Python 2 str type, and some
methods which it does possess behave differently than the Python 2 str
type. Effectively, to ensure that Web3 middleware and applications work
across Python versions, developers must do these things:

1)  Do not assume comparison equivalence between text values and bytes
    values. If you do so, your code may work under Python 2, but it will
    not work properly under Python 3. For example, don't write
    somebytes == 'abc'. This will sometimes be true on Python 2 but it
    will never be true on Python 3, because a sequence of bytes never
    compares equal to a string under Python 3. Instead, always compare a
    bytes value with a bytes value, e.g. "somebytes == b'abc'". Code
    which does this is compatible with and works the same in Python 2.6,
    2.7, and 3.1. The b in front of 'abc' signals to Python 3 that the
    value is a literal bytes instance; under Python 2 it's a forward
    compatibility placebo.
2)  Don't use the __contains__ method (directly or indirectly) of items
    that are meant to be byteslike without ensuring that its argument is
    also a bytes instance. If you do so, your code may work under Python
    2, but it will not work properly under Python 3. For example,
    'abc' in somebytes will raise a TypeError under Python 3, but it
    will return True under Python 2.6 and 2.7 if somebytes contains
    'abc'. However,
    b'abc' in somebytes will work the same on both versions. In Python
    3.2, this restriction may be partially removed, as it's rumored that
    bytes types may obtain a __mod__ implementation.
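3)  Don't use __getitem__ (indexing) on bytes values. Under Python 2,
    somebytes[0] returns a one-character str, but under Python 3 it
    returns an int. Slicing, e.g. somebytes[0:1], returns a bytes
    instance on both versions.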
4)  Don't try to use the format method or the __mod__ method of
    instances of bytes (directly or indirectly). In Python 2, the str
    type, which we treat as equivalent to Python 3's bytes, supports
    these methods, but Python 3's bytes instances don't. If you use
    these methods, your code will work under Python 2, but not under
    Python 3.
5)  Do not try to concatenate a bytes value with a string value. This
    may work under Python 2, but it will not work under Python 3. For
    example, doing 'abc' + somebytes will work under Python 2, but it
    will result in a TypeError under Python 3. Instead, always make sure
    you're concatenating two items of the same type, e.g.
    b'abc' + somebytes.
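You can check the rules above directly in a Python 3 interpreter. This snippet is meant to run only under Python 3; several of its assertions would fail under Python 2, which is exactly the difference the list warns about. Rule 4 is illustrated only through the missing format method.

```python
somebytes = b'xabcx'

# 1) Compare bytes with bytes; bytes never equal text on Python 3.
assert somebytes[1:4] == b'abc'
assert somebytes[1:4] != 'abc'

# 2) The containment operand must be bytes too.
assert b'abc' in somebytes
try:
    'abc' in somebytes
    raised = False
except TypeError:
    raised = True
assert raised

# 3) Indexing yields an int on Python 3; slicing yields bytes.
assert somebytes[0] == ord('x')
assert somebytes[0:1] == b'x'

# 4) Python 3 bytes have no format() method.
assert not hasattr(somebytes, 'format')

# 5) Concatenate bytes with bytes only ('<' + somebytes raises TypeError).
assert b'<' + somebytes + b'>' == b'<xabcx>'
```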

Web3 expects byte values in other places, such as in all the values
returned by an application.

In short, to ensure compatibility of Web3 application code between
Python 2 and Python 3, in Python 2, treat CGI and server variable values
in the environment as if they had the Python 3 bytes API even though
they actually have a more capable API. Likewise for all stringlike
values returned by a Web3 application.

Buffering and Streaming

Generally speaking, applications will achieve the best throughput by
buffering their (modestly-sized) output and sending it all at once. This
is a common approach in existing frameworks: the output is buffered in a
StringIO or similar object, then transmitted all at once, along with the
response headers.

The corresponding approach in Web3 is for the application to simply
return a single-element body iterable (such as a list) containing the
response body as a single bytes instance. This is the recommended
approach for the vast majority of application functions that render
HTML pages whose text easily fits in memory.
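Here is a minimal sketch of such an application. It assumes the (status, headers, body) return convention described earlier, and the page content is only illustrative.

```python
def application(environ):
    """A minimal Web3 application that buffers its whole response."""
    body = b'<html><body><h1>Hello, world!</h1></body></html>'
    status = b'200 OK'
    headers = [
        (b'Content-Type', b'text/html; charset=utf-8'),
        (b'Content-Length', str(len(body)).encode('ascii')),
    ]
    # A single-element list: the whole body is sent as one block.
    return status, headers, [body]
```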

For large files, however, or for specialized uses of HTTP streaming
(such as multipart "server push"), an application may need to provide
output in smaller blocks (e.g. to avoid loading a large file into
memory). It's also sometimes the case that part of a response may be
time-consuming to produce, but it would be useful to send ahead the
portion of the response that precedes it.

In these cases, applications will usually return a body iterator (often
a generator-iterator) that produces the output in a block-by-block
fashion. These blocks may be broken to coincide with multipart
boundaries (for "server push"), or just before time-consuming tasks
(such as reading another block of an on-disk file).
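A block-by-block response could look like the sketch below. file_blocks and download_app are hypothetical names, and the 8192-byte block size is an arbitrary choice. An in-memory io.BytesIO stands in for a file opened with open(path, 'rb').

```python
import io


def file_blocks(fileobj, block_size=8192):
    """Yield the contents of a binary file object one block at a time."""
    while True:
        block = fileobj.read(block_size)
        if not block:
            break
        yield block


def download_app(environ):
    # An in-memory stand-in for open(path, 'rb') on a large file.
    fileobj = io.BytesIO(b'x' * 20000)
    headers = [(b'Content-Type', b'application/octet-stream')]
    # The body is a generator-iterator, so blocks are produced lazily.
    return b'200 OK', headers, file_blocks(fileobj)
```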

Web3 servers, gateways, and middleware must not delay the transmission
of any block; they must either fully transmit the block to the client,
or guarantee that they will continue transmission even while the
application is producing its next block. A server/gateway or middleware
may provide this guarantee in one of three ways:

1.  Send the entire block to the operating system (and request that any
    O/S buffers be flushed) before returning control to the application,
    or
2.  Use a different thread to ensure that the block continues to be
    transmitted while the application produces the next block, or
3.  (Middleware only) send the entire block to its parent
    gateway/server.

By providing this guarantee, Web3 allows applications to ensure that
transmission will not become stalled at an arbitrary point in their
output data. This is critical for proper functioning of e.g. multipart
"server push" streaming, where data between multipart boundaries should
be transmitted in full to the client.
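A server loop that provides the first kind of guarantee might look like this sketch. transmit is a hypothetical name. It assumes out is a buffered binary file object wrapping the client socket, so flush() hands each block to the operating system before the next block is requested.

```python
def transmit(body, out):
    """Send each body block to the client before asking for the next one."""
    for block in body:
        if not block:
            continue  # an empty block carries nothing to send
        out.write(block)
        # Option 1 above: push the block out before resuming the application.
        out.flush()
```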

Unicode Issues

HTTP does not directly support Unicode, and neither does this interface.
All encoding/decoding must be handled by the application; all values
passed to or from the server must be of the Python 3 type bytes or
instances of the Python 2 type str, not Python 2 unicode or Python 3 str
objects.

All "bytes instances" referred to in this specification must:

-   On Python 2, be of type str.
-   On Python 3, be of type bytes.

All "bytes instances" must not:

-   On Python 2, be of type unicode.
-   On Python 3, be of type str.

The result of using a textlike object where a byteslike object is
required is undefined.

Values returned from a Web3 app as a status or as response headers must
follow RFC 2616 with respect to encoding. That is, the bytes returned
must contain a character stream of ISO-8859-1 characters, or the
character stream should use RFC 2047 MIME encoding.
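An application holding a text header value might encode it as in this Python 3 sketch. header_value_to_bytes is a hypothetical helper. It tries ISO-8859-1 first. Otherwise it produces an RFC 2047 encoded-word using the standard library's email.header.Header, with line folding disabled so the result stays free of CR and LF.

```python
from email.header import Header


def header_value_to_bytes(text):
    """Encode a text header value as ISO-8859-1, or RFC 2047 if needed."""
    try:
        return text.encode('iso-8859-1')
    except UnicodeEncodeError:
        # maxlinelen=0 disables folding, so no CR/LF is introduced.
        return Header(text, 'utf-8', maxlinelen=0).encode().encode('ascii')
```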

On Python platforms which do not have a native bytes-like type (e.g.
IronPython), but which instead generally use textlike strings to
represent bytes data, the definition of "bytes instance" can be changed:
their "bytes instances" must be native strings that contain only code
points representable in ISO-8859-1 encoding (\u0000 through \u00FF,
inclusive). It is a fatal error for an application on such a platform to
supply strings containing any other Unicode character or code point.
Similarly, servers and gateways on those platforms must not supply
strings to an application containing any other Unicode characters.

HTTP 1.1 Expect/Continue

Servers and gateways that implement HTTP 1.1 must provide transparent
support for HTTP 1.1's "expect/continue" mechanism. This may be done in
any of several ways:

1.  Respond to requests containing an Expect: 100-continue header with
    an immediate "100 Continue" response, and proceed normally.
2.  Proceed with the request normally, but provide the application with
    a web3.input stream that will send the "100 Continue" response
    if/when the application first attempts to read from the input
    stream. The read request must then remain blocked until the client
    responds.
3.  Wait until the client decides that the server does not support
    expect/continue, and sends the request body on its own. (This is
    suboptimal, and is not recommended.)
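Option 2 could be sketched as an input wrapper like the one below. ContinueInput is hypothetical, and send_continue is an assumed server callback that writes the "100 Continue" interim response to the client. Only read() and readline() are shown.

```python
class ContinueInput(object):
    """Hypothetical web3.input wrapper that sends "100 Continue" lazily."""

    def __init__(self, stream, send_continue):
        self._stream = stream                # raw binary input stream
        self._send_continue = send_continue  # assumed server callback
        self._sent = False

    def _maybe_continue(self):
        # Send the interim response at most once, on the first read.
        if not self._sent:
            self._sent = True
            self._send_continue()

    def read(self, size=-1):
        self._maybe_continue()
        # With a real socket, this blocks until the client sends the body.
        return self._stream.read(size)

    def readline(self, size=-1):
        self._maybe_continue()
        return self._stream.readline(size)
```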

Note that these behavior restrictions do not apply for HTTP 1.0
requests, or for requests that are not directed to an application
object. For more information on HTTP 1.1 Expect/Continue, see RFC 2616,
sections 8.2.3 and 10.1.1.

Other HTTP Features

In general, servers and gateways should "play dumb" and allow the
application complete control over its output. They should only make
changes that do not alter the effective semantics of the application's
response. It is always possible for the application developer to add
middleware components to supply additional features, so server/gateway
developers should be conservative in their implementation. In a sense, a
server should consider itself to be like an HTTP "gateway server", with
the application being an HTTP "origin server". (See RFC 2616, section 1.3,
for the definition of these terms.)

However, because Web3 servers and applications do not communicate via
HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to Web3
internal communications. Web3 applications must not generate any
"hop-by-hop" headers (RFC 2616, section 13.5.1), attempt to use HTTP features
that would require them to generate such headers, or rely on the content
of any incoming "hop-by-hop" headers in the environ dictionary. Web3
servers must handle any supported inbound "hop-by-hop" headers on their
own, such as by decoding any inbound Transfer-Encoding, including
chunked encoding if applicable.
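To illustrate the decoding a server takes on, here is a minimal chunked-body decoder. decode_chunked is hypothetical. It buffers the whole body in memory, ignores chunk extensions, and discards trailers; a production server would probably not make all of these simplifications.

```python
def decode_chunked(stream):
    """Decode an HTTP/1.1 chunked request body read from a binary stream."""
    body = []
    while True:
        line = stream.readline()
        if not line:
            raise ValueError('premature end of chunked body')
        # The chunk size is hexadecimal, optionally followed by ";ext".
        size = int(line.split(b';', 1)[0].strip(), 16)
        if size == 0:
            break
        chunk = stream.read(size)
        # Each chunk must be complete and followed by CRLF.
        if len(chunk) != size or stream.read(2) != b'\r\n':
            raise ValueError('malformed chunk')
        body.append(chunk)
    # Skip any trailer headers up to the terminating empty line.
    while True:
        line = stream.readline()
        if line in (b'\r\n', b'\n', b''):
            break
    return b''.join(body)
```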

Applying these principles to a variety of HTTP features, it should be
clear that a server may handle cache validation via the If-None-Match
and If-Modified-Since request headers and the Last-Modified and ETag
response headers. However, the server is not required to do so, so an
application that wants to support cache validation should perform it
on its own.
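An application doing its own cache validation might compare an ETag against If-None-Match as in this sketch. The names are hypothetical, the ETag is assumed to be a SHA-1 hash of the body, and weak validators (W/"...") are not handled.

```python
import hashlib


def make_etag(body):
    # Assumption: a strong ETag built from a SHA-1 hash of the body.
    return b'"' + hashlib.sha1(body).hexdigest().encode('ascii') + b'"'


def cached_app(environ):
    body = b'Hello, cached world!'
    etag = make_etag(body)
    # If-None-Match holds a comma-separated list of entity tags, or "*".
    if_none_match = environ.get('HTTP_IF_NONE_MATCH', b'')
    candidates = [tag.strip() for tag in if_none_match.split(b',')]
    if etag in candidates or b'*' in candidates:
        # The client's cached copy is current: send no body.
        return b'304 Not Modified', [(b'ETag', etag)], []
    headers = [(b'Content-Type', b'text/plain'), (b'ETag', etag)]
    return b'200 OK', headers, [body]
```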

Similarly, a server may re-encode or transport-encode an application's
response, but the application should use a suitable content encoding on
its own, and must not apply a transport encoding. A server may transmit
byte ranges of the application's response if requested by the client,
and the application doesn't natively support byte ranges. Again,
however, the application should perform this function on its own if
desired.

Note that these restrictions on applications do not necessarily mean
that every application must reimplement every HTTP feature; many HTTP
features can be partially or fully implemented by middleware components,
thus freeing both server and application authors from implementing the
same features over and over again.

Thread Support

Thread support, or lack thereof, is also server-dependent. Servers that
can run multiple requests in parallel should also provide the option of
running an application in a single-threaded fashion, so that
applications or frameworks that are not thread-safe may still be used
with that server.

Implementation/Application Notes

Server Extension APIs

Some server authors may wish to expose more advanced APIs that
application or framework authors can use for specialized purposes. For
example, a gateway based on mod_python might wish to expose part of the
Apache API as a Web3 extension.

In the simplest case, this requires nothing more than defining an
environ variable, such as mod_python.some_api. But, in many cases, the
possible presence of middleware can make this difficult. For example, an
API that offers access to the same HTTP headers that are found in
environ variables might return different data if environ has been
modified by middleware.

In general, any extension API that duplicates, supplants, or bypasses
some portion of Web3 functionality runs the risk of being incompatible
with middleware components. Server/gateway developers should not assume
that nobody will use middleware, because some framework developers
specifically organize their frameworks to function almost entirely as
middleware of various kinds.

So, to provide maximum compatibility, servers and gateways that provide
extension APIs that replace some Web3 functionality must design those
APIs so that they are invoked using the portion of the API that they
replace. For example, an extension API to access HTTP request headers
must require the application to pass in its current environ, so that the
server/gateway may verify that HTTP headers accessible via the API have
not been altered by middleware. If the extension API cannot guarantee
that it will always agree with environ about the contents of HTTP
headers, it must refuse service to the application, e.g. by raising an
error, returning None instead of a header collection, or whatever is
appropriate to the API.
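One way to follow this rule is sketched below. make_header_api is a hypothetical factory. Its snapshot of HTTP_* values stands in for whatever internal request record a real server (e.g. one built on mod_python) would consult. The returned function refuses service by returning None when the environ passed in disagrees with that record.

```python
def make_header_api(original_environ):
    """Hypothetical server extension exposing request headers safely."""
    # Stand-in for the server's own record of the request headers.
    snapshot = dict((k, v) for k, v in original_environ.items()
                    if k.startswith('HTTP_'))

    def get_request_headers(environ):
        current = dict((k, v) for k, v in environ.items()
                       if k.startswith('HTTP_'))
        if current != snapshot:
            # Middleware altered the headers: refuse service.
            return None
        return dict(snapshot)

    return get_request_headers
```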

These guidelines also apply to middleware that adds information such as
parsed cookies, form variables, sessions, and the like to environ.
Specifically, such middleware should provide these features as functions
which operate on environ, rather than simply stuffing values into
environ. This helps ensure that information is calculated from environ
after any middleware has done any URL rewrites or other environ
modifications.
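For example, cookie parsing can be offered as a function of environ rather than as a precomputed environ entry, so it always reflects the current environ. get_cookies is hypothetical and deliberately simplified: it does not handle quoted values, and for duplicate names the last one wins.

```python
def get_cookies(environ):
    """Parse the Cookie header from environ on every call (bytes in and out)."""
    cookies = {}
    for pair in environ.get('HTTP_COOKIE', b'').split(b';'):
        name, sep, value = pair.strip().partition(b'=')
        if sep:
            cookies[name] = value
    return cookies
```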

It is very important that these "safe extension" rules be followed by
both server/gateway and middleware developers, in order to avoid a
future in which middleware developers are forced to delete any and all
extension APIs from environ to ensure that their mediation isn't being
bypassed by applications using those extensions!

Application Configuration

This specification does not define how a server selects or obtains an
application to invoke. These and other configuration options are highly
server-specific matters. It is expected that server/gateway authors will
document how to configure the server to execute a particular application
object, and with what options (such as threading options).

Framework authors, on the other hand, should document how to create an
application object that wraps their framework's functionality. The user,
who has chosen both the server and the application framework, must
connect the two together. However, since both the framework and the
server have a common interface, this should be merely a mechanical
matter, rather than a significant engineering effort for each new
server/framework pair.

Finally, some applications, frameworks, and middleware may wish to use
the environ dictionary to receive simple string configuration options.
Servers and gateways should support this by allowing an application's
deployer to specify name-value pairs to be placed in environ. In the
simplest case, this support can consist merely of copying all operating
system-supplied environment variables from os.environ into the environ
dictionary, since the deployer in principle can configure these
externally to the server, or in the CGI case they may be able to be set
via the server's configuration files.

Applications should try to keep such required variables to a minimum,
since not all servers will support easy configuration of them. Of
course, even in the worst case, persons deploying an application can
create a script to supply the necessary configuration values:

    from the_app import application

    def new_app(environ):
        environ['the_app.configval1'] = b'something'
        return application(environ)

But most existing applications and frameworks will probably only need a
single configuration value from environ, to indicate the location of
their application or framework-specific configuration file(s). (Of
course, applications should cache such configuration, to avoid having to
re-read it upon each invocation.)

URL Reconstruction

If an application wishes to reconstruct a request's complete URL (as a
bytes object), it may do so using the following algorithm:

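    try:
        from urllib.parse import quote_from_bytes  # Python 3
    except ImportError:
        from urllib import quote as quote_from_bytes  # Python 2

    def url_quote(value):
        # Percent-encode a bytes path, returning bytes ('/' stays as-is).
        return quote_from_bytes(value).encode('ascii')

    def reconstruct_url(environ):
        host = environ.get('HTTP_HOST')

        scheme = environ['web3.url_scheme']
        port = environ['SERVER_PORT']
        query = environ['QUERY_STRING']

        url = scheme + b'://'

        if host:
            url += host
        else:
            url += environ['SERVER_NAME']

            if scheme == b'https':
                if port != b'443':
                    url += b':' + port
            else:
                if port != b'80':
                    url += b':' + port

        # web3.script_name and web3.path_info are already URL-encoded;
        # the decoded SCRIPT_NAME and PATH_INFO must be quoted.
        if 'web3.script_name' in environ:
            url += environ['web3.script_name']
        else:
            url += url_quote(environ['SCRIPT_NAME'])
        if 'web3.path_info' in environ:
            url += environ['web3.path_info']
        else:
            url += url_quote(environ['PATH_INFO'])
        if query:
            url += b'?' + query
        return url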

Note that such a reconstructed URL may not be precisely the same URI as
requested by the client. Server rewrite rules, for example, may have
modified the client's originally requested URL to place it in a
canonical form.

Open Questions

-   file_wrapper replacement. Currently nothing is specified here, but
    it's clear that the old system of in-band signalling is broken if it
    does not give middleware a way to find out whether the response is
    a file wrapper.

Points of Contention

Outlined below are potential points of contention regarding this
specification.

WSGI 1.0 Compatibility

Components written using the WSGI 1.0 specification will not
transparently interoperate with components written using this
specification. That's because the goals of this proposal and the goals
of WSGI 1.0 are not directly aligned.

WSGI 1.0 is obliged to provide specification-level backwards
compatibility with versions of Python between 2.2 and 2.7. This
specification, however, ditches Python 2.5 and lower compatibility in
order to provide compatibility between relatively recent versions of
Python 2 (2.6 and 2.7) as well as relatively recent versions of Python 3
(3.1).

It is currently impossible to write components which work reliably under
both Python 2 and Python 3 using the WSGI 1.0 specification, because the
specification implicitly posits that CGI and server variable values in
the environ and values returned via start_response represent a sequence
of bytes that can be addressed using the Python 2 string API. It posits
such a thing because that sort of data type was the sensible way to
represent bytes in all Python 2 versions, and WSGI 1.0 was conceived
before Python 3 existed.

Python 3's str type supports the full API provided by the Python 2 str
type, but Python 3's str type does not represent a sequence of bytes, it
instead represents text. Therefore, using it to represent environ values
also requires that the environ byte sequence be decoded to text via some
encoding. We cannot decode these bytes to text (at least in any way
where the decoding has any meaning other than as a tunnelling mechanism)
without widening the scope of WSGI to include server and gateway
knowledge of decoding policies and mechanics. WSGI 1.0 never concerned
itself with encoding and decoding. It made statements about allowable
transport values, and suggested that various values might be best
decoded as one encoding or another, but it never required a server to
perform any decoding before passing values to an application.

Python 3 does not have a stringlike type that can be used instead to
represent bytes: it has a bytes type. In Python 3.1+, the bytes type
operates quite a bit like a Python 2 str, but it lacks an equivalent of
str.__mod__, and its iteration protocol, containment, sequence
treatment, and equivalence comparisons differ.

In any case, there is no type in Python 3 that behaves just like the
Python 2 str type, and a way to create such a type doesn't exist because
there is no such thing as a "String ABC" which would allow a suitable
type to be built. Due to this design incompatibility, existing WSGI 1.0
servers, middleware, and applications will not work under Python 3, even
after they are run through 2to3.

Existing Web-SIG discussions about updating the WSGI specification so
that it is possible to write a WSGI application that runs in both Python
2 and Python 3 tend to revolve around creating a specification-level
equivalence between the Python 2 str type (which represents a sequence
of bytes) and the Python 3 str type (which represents text). Such an
equivalence becomes strained in various areas, given the different roles
of these types. An arguably more straightforward equivalence exists
between the Python 3 bytes type API and a subset of the Python 2 str
type API. This specification exploits this subset equivalence.

In the meantime, aside from any Python 2 vs. Python 3 compatibility
issue, as various discussions on Web-SIG have pointed out, the WSGI 1.0
specification is too general, providing support (via .write) for
asynchronous applications at the expense of implementation complexity.
This specification uses the fundamental incompatibility between WSGI 1.0
and Python 3 as a natural divergence point to create a specification
with reduced complexity, replacing that general support with specialized
support for asynchronous applications (web3.async).

To provide backwards compatibility for older WSGI 1.0 applications, so
that they may run on a Web3 stack, it is presumed that Web3 middleware
will be created which can be used "in front" of existing WSGI 1.0
applications, allowing those existing WSGI 1.0 applications to run under
a Web3 stack. This middleware will require, when under Python 3, an
equivalence to be drawn between Python 3 str types and the bytes values
represented by the HTTP request and all the attendant encoding-guessing
(or configuration) it implies.

Note

Such middleware might in the future, instead of drawing an equivalence
between Python 3 str and HTTP byte values, make use of a
yet-to-be-created "ebytes" type (aka "bytes-with-benefits"),
particularly if a String ABC proposal is accepted into the Python core
and implemented.

Conversely, it is presumed that WSGI 1.0 middleware will be created
which will allow a Web3 application to run behind a WSGI 1.0 stack on
the Python 2 platform.

Environ and Response Values as Bytes

Casual middleware and application writers may consider the use of bytes
as environment values and response values inconvenient. In particular,
they won't be able to use common string formatting functions such as
('%s' % bytes_val) or bytes_val.format('123') because bytes don't have
the same API as strings on platforms such as Python 3 where the two
types differ. Likewise, on such platforms, stdlib HTTP-related API
support for using bytes interchangeably with text can be spotty. In
places where bytes are inconvenient or incompatible with library APIs,
middleware and application writers will have to decode such bytes to
text explicitly. This is particularly inconvenient for middleware
writers: to work with environment values as strings, they'll have to
decode them from an implied encoding and if they need to mutate an
environ value, they'll then need to encode the value into a byte stream
before placing it into the environ. While the use of bytes by the
specification as environ values might be inconvenient for casual
developers, it provides several benefits.

Using bytes types to represent HTTP and server values to an application
most closely matches reality because HTTP is fundamentally a
bytes-oriented protocol. If the environ values are mandated to be
strings, each server will need to use heuristics to guess about the
encoding of various values provided by the HTTP environment. Using all
strings might increase casual middleware writer convenience, but will
also lead to ambiguity and confusion when a value cannot be decoded to a
meaningful non-surrogate string.

Using bytes as environ values avoids any need for the specification to
mandate that a participating server be informed of encoding
configuration parameters. If environ values are treated as
strings, and so must be decoded from bytes, configuration parameters may
eventually become necessary as policy clues from the application
deployer. Such a policy would be used to guess an appropriate decoding
strategy in various circumstances, effectively placing the burden for
enforcing a particular application encoding policy upon the server. If
the server must serve more than one application, such configuration
would quickly become complex. Many policies would also be impossible to
express declaratively.

In reality, HTTP is a complicated and legacy-fraught protocol which
requires a complex set of heuristics to make sense of. It would be nice
if we could allow this protocol to protect us from this complexity, but
we cannot do so reliably while still providing to application writers a
level of control commensurate with reality. Python applications must
often deal with data embedded in the environment which not only must be
parsed by legacy heuristics, but does not conform even to any existing
HTTP specification. While these eventualities are unpleasant, they crop
up with regularity, making it impossible and undesirable to hide them
from application developers, as application developers are the only
people who are able to decide upon an appropriate action when an HTTP
specification violation is detected.

Some have argued for mixed use of bytes and string values as environ
values. This proposal avoids that strategy. Sole use of bytes as environ
values makes it possible to fit this specification entirely in one's
head; you won't need to guess about which values are strings and which
are bytes.

This protocol would also fit in a developer's head if all environ values
were strings, but this specification doesn't use that strategy. This
will likely be the point of greatest contention regarding the use of
bytes. In defense of bytes: developers often prefer protocols with
consistent contracts, even if the contracts themselves are suboptimal.
If we hide encoding issues from a developer until a value that contains
surrogates causes problems after it has already reached beyond the I/O
boundary of their application, they will need to do a lot more work to
fix assumptions made by their application than if we were to just
present the problem much earlier in terms of "here's some bytes, you
decode them". This is also a counter-argument to the "bytes are
inconvenient" assumption: while presenting bytes to an application
developer may be inconvenient for a casual application developer who
doesn't care about edge cases, they are extremely convenient for the
application developer who needs to deal with complex, dirty
eventualities, because use of bytes allows them the appropriate level of
control with a clear separation of responsibility.

If the protocol uses bytes, it is presumed that libraries will be
created to make working with bytes-only in the environ and within return
values more pleasant; for example, analogues of the WSGI 1.0 libraries
named "WebOb" and "Werkzeug". Such libraries will fill the gap between
convenience and control, allowing the spec to remain simple and regular
while still allowing casual authors a convenient way to create Web3
middleware and application components. This seems to be a reasonable
alternative to baking encoding policy into the protocol, because many
such libraries can be created independently from the protocol, and
application developers can choose the one that provides them the
appropriate levels of control and convenience for a particular job.

Here are some alternatives to using all bytes:

-   Have the server decode all values representing CGI and server
    environ values into strings using the latin-1 encoding, which is
    lossless. Smuggle any undecodable bytes within the resulting string.
-   Decode all CGI and server environ values into strings using the
    utf-8 encoding with the surrogateescape error handler. This does not
    work under any existing Python 2.
-   Encode some values into bytes and other values into strings, as
    decided by their typical usages.
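The codecs behind the first two alternatives can be compared in a Python 3 interpreter. This snippet only illustrates those codecs; the sample bytes are arbitrary.

```python
raw = b'caf\xc3\xa9 \xff'  # UTF-8 for "café", then an invalid byte

# Alternative 1: latin-1 maps each byte to one code point, so it is
# lossless, but multi-byte UTF-8 text turns into mojibake.
as_latin1 = raw.decode('latin-1')
assert as_latin1 == 'caf\xc3\xa9 \xff'
assert as_latin1.encode('latin-1') == raw

# Alternative 2: UTF-8 with surrogateescape turns the invalid byte into
# a lone surrogate, which encodes back to the original byte.
as_utf8 = raw.decode('utf-8', 'surrogateescape')
assert as_utf8 == 'caf\xe9 \udcff'
assert as_utf8.encode('utf-8', 'surrogateescape') == raw
```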

Applications Should be Allowed to Read web3.input Past CONTENT_LENGTH

At [3], Graham Dumpleton makes the assertion that wsgi.input should be
required to return the empty string as a signifier of out-of-data, and
that applications should be allowed to read past the number of bytes
specified in CONTENT_LENGTH, depending only upon the empty string as an
EOF marker. WSGI relies on an application "being well behaved and once
all data specified by CONTENT_LENGTH is read, that it processes the data
and returns any response. That same socket connection could then be used
for a subsequent request." Graham would like WSGI adapters to be
required to wrap raw socket connections: "this wrapper object will need
to count how much data has been read, and when the amount of data
reaches that as defined by CONTENT_LENGTH, any subsequent reads should
return an empty string instead." This may be useful to support chunked
encoding and input filters.

web3.input Unknown Length

There's no documented way to indicate that there is content in
environ['web3.input'], but the content length is unknown.

read() of web3.input Should Support No-Size Calling Convention

At [4], Graham Dumpleton makes the assertion that the read() method of
wsgi.input should be callable without arguments, and that the result
should be "all available request content". Needs discussion.

Comment Armin: I changed the spec to require this of implementations. I
had too much pain with that in the past already. Open for discussion,
though.
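The no-size calling convention can be layered over any stream that only supports sized reads. In this sketch, ReadAllAdapter and its chunk_size parameter are hypothetical; it assumes the underlying read callable returns b'' at end of data, which is exactly the guarantee discussed in the previous section.

```python
import io


class ReadAllAdapter:
    """Hypothetical adapter that gives read() a no-argument form.

    `raw_read` is any callable that takes a byte count and returns b''
    once the request content is exhausted.
    """

    def __init__(self, raw_read, chunk_size=8192):
        self._raw_read = raw_read
        self._chunk_size = chunk_size

    def read(self, size=None):
        if size is not None and size >= 0:
            return self._raw_read(size)
        # No size given: collect "all available request content" until EOF.
        chunks = []
        while True:
            chunk = self._raw_read(self._chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
        return b''.join(chunks)


stream = io.BytesIO(b'x' * 20000)
adapter = ReadAllAdapter(stream.read)
```

Without that end-of-data guarantee, a no-argument read() on a persistent connection could block waiting for bytes that belong to the next request.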

Input Filters should set environ CONTENT_LENGTH to -1

At [5], Graham Dumpleton suggests that an input filter might set
environ['CONTENT_LENGTH'] to -1 to indicate that it mutated the input.
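As an illustration, here is a hypothetical input filter written as middleware that transparently decompresses gzip-encoded request bodies. Because the decompressed size is not known in advance, it sets CONTENT_LENGTH to -1 as suggested. The sketch assumes that CGI values in the environ, such as CONTENT_LENGTH and HTTP_CONTENT_ENCODING, are bytes; the class name and the toy echo_app are illustrative only, and the filter simply passes through whatever the wrapped application returns.

```python
import gzip
import io


class GunzipInputFilter:
    """Hypothetical input filter: decompress gzip request bodies on the fly."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ):
        # Assumption: CGI values in the environ are bytes.
        if environ.get('HTTP_CONTENT_ENCODING') == b'gzip':
            # Wrap the original stream; GzipFile decompresses as it is read.
            environ['web3.input'] = gzip.GzipFile(
                fileobj=environ['web3.input'], mode='rb')
            # The decompressed size is unknown up front, so flag that the
            # client-supplied length no longer describes web3.input.
            environ['CONTENT_LENGTH'] = b'-1'
            # The body the application now sees is no longer encoded.
            del environ['HTTP_CONTENT_ENCODING']
        # Pass the application's return value through unchanged.
        return self.app(environ)


def echo_app(environ):
    """Toy application, for demonstration only."""
    return environ['web3.input'].read(), environ['CONTENT_LENGTH']


payload = gzip.compress(b'hello web3')
environ = {
    'web3.input': io.BytesIO(payload),
    'CONTENT_LENGTH': str(len(payload)).encode('ascii'),
    'HTTP_CONTENT_ENCODING': b'gzip',
}
result = GunzipInputFilter(echo_app)(environ)
```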

headers as Literal List of Two-Tuples

Why do we make applications return a headers structure that is a literal
list of two-tuples? I think the iterability of headers needs to be
maintained while it moves up the stack, but I don't think we need to be
able to mutate it in place at all times. Could we loosen that
requirement?

Comment Armin: Strong yes
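If the requirement were loosened so that headers need only be iterable, middleware could add a header lazily instead of appending to a list in place. A minimal sketch; the helper name with_extra_header is hypothetical.

```python
import itertools


def with_extra_header(headers, name, value):
    """Return an iterator over every header in `headers`, then (name, value).

    Works for any iterable of two-tuples (a list, a tuple or a generator);
    the input is neither copied nor mutated.
    """
    return itertools.chain(headers, [(name, value)])
```

The trade-off is that an arbitrary iterable may be consumable only once, so each layer (and ultimately the server) would have to iterate it a single time, or copy it into a list before inspecting it.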

Removed Requirement that Middleware Not Block

This requirement was removed: "middleware components must not block
iteration waiting for multiple values from an application iterable. If
the middleware needs to accumulate more data from the application before
it can produce any output, it must yield an empty string." This
requirement existed to support asynchronous applications and servers
(see PEP 333's "Middleware Handling of Block Boundaries"). Asynchronous
applications are now serviced explicitly by the web3.async-capable
protocol (a Web3 application callable may itself return a callable).
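With that requirement gone, middleware is free to block while it collects the whole application body, for example to compute a checksum or rewrite the content. A minimal sketch follows; buffer_body is a hypothetical helper, and calling close() on the original iterable follows the WSGI convention, which this sketch assumes carries over to Web3.

```python
def buffer_body(body):
    """Drain an application body iterable into a single-element list.

    This blocks until the application has produced all of its output.
    """
    try:
        return [b''.join(body)]
    finally:
        # Assumed to mirror the WSGI convention of calling close() if present.
        close = getattr(body, 'close', None)
        if close is not None:
            close()
```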

web3.script_name and web3.path_info

These values are required to be placed into the environment by an origin
server under this specification. Unlike SCRIPT_NAME and PATH_INFO, these
must be the original URL-encoded variants derived from the request URI.
We probably need to figure out how these should be computed originally,
and what their values should be if the server performs URL rewriting.
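The following sketch illustrates why the undecoded variants matter. It assumes the server already knows the application's mount point as a raw, still-encoded prefix; split_request_uri is a hypothetical helper, and it makes no attempt to handle URL rewriting, which is the open question above. Decoding the raw path info with urllib.parse.unquote_to_bytes turns %2F into /, so a decoded PATH_INFO can no longer distinguish an encoded slash from a path separator.

```python
from urllib.parse import unquote_to_bytes


def split_request_uri(request_uri, script_name):
    """Split a raw request URI into still-encoded (script_name, path_info).

    `script_name` is the application's mount point in its raw, encoded form.
    """
    path = request_uri.split(b'?', 1)[0]  # drop the query string
    if script_name and not (path == script_name
                            or path.startswith(script_name + b'/')):
        raise ValueError('request URI is outside the mount point')
    return path[:len(script_name)], path[len(script_name):]


raw_script_name, raw_path_info = split_request_uri(
    b'/app/caf%C3%A9/a%2Fb?x=1', b'/app')
# Decoding merges the encoded slash into the path structure:
# two segments become three.
decoded_path_info = unquote_to_bytes(raw_path_info)
```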

Long Response Headers

Bob Brewer notes on Web-SIG [6]:

  Each header_value must not include any control characters, including
  carriage returns or linefeeds, either embedded or at the end. (These
  requirements are to minimize the complexity of any parsing that must
  be performed by servers, gateways, and intermediate response
  processors that need to inspect or modify response headers.) (PEP 333)

That's understandable, but HTTP headers are defined as (mostly) *TEXT,
and "words of *TEXT MAY contain characters from character sets other
than ISO-8859-1 only when encoded according to the rules of RFC 2047."
[7] And RFC 2047 specifies that "an 'encoded-word' may not be more than
75 characters long... If it is desirable to encode more text than will
fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's
(separated by CRLF SPACE) may be used." [8] This satisfies HTTP header
folding rules, as well: "Header fields can be extended over multiple
lines by preceding each extra line with at least one SP or HT." (RFC
2616, section 4.2)

So in my reading of HTTP, some code somewhere should introduce newlines
in longish, encoded response header values. I see three options:

1.  Keep things as they are and disallow response header values if they
    contain words over 75 chars that are outside the ISO-8859-1
    character set.
2.  Allow newline characters in WSGI response headers.
3.  Require/strongly suggest WSGI servers to do the encoding and folding
    before sending the value over HTTP.
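Under option 3, a server would have to do something like the following before writing a header value. This is a minimal sketch, not a complete RFC 2047 encoder: encode_header_value is hypothetical, it always uses the UTF-8 charset with the "B" (base64) encoding, and it re-encodes the growing chunk on every character for simplicity rather than speed.

```python
import base64

PREFIX, SUFFIX = '=?utf-8?b?', '?='


def _encoded_word(text):
    """Wrap `text` in a single RFC 2047 'B' (base64) encoded-word."""
    return PREFIX + base64.b64encode(text.encode('utf-8')).decode('ascii') + SUFFIX


def encode_header_value(value, max_word=75):
    """Encode `value` as encoded-words of at most `max_word` characters,
    folded with CRLF SPACE, and return the result as bytes."""
    words, chunk = [], ''
    for ch in value:
        # Grow the chunk one character at a time so that a multi-byte
        # character is never split across two encoded-words.
        if chunk and len(_encoded_word(chunk + ch)) > max_word:
            words.append(_encoded_word(chunk))
            chunk = ch
        else:
            chunk += ch
    if chunk:
        words.append(_encoded_word(chunk))
    return '\r\n '.join(words).encode('ascii')


folded = encode_header_value('Ünïcödé ' * 20)
```

Note that RFC 7230, which obsoletes RFC 2616, deprecates line folding (obs-fold) in HTTP messages, so a server taking this route today might prefer to emit the encoded-words separated by a single space on one line.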

Request Trailers and Chunked Transfer Encoding

When using chunked transfer encoding on request content, the RFCs allow
there to be request trailers. These are like request headers but come
after the final zero-length chunk. These trailers are only available
when the chunked data stream is of finite length and has been read in
completely. Neither WSGI nor Web3 currently supports them.
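For reference, here is a minimal decoder for a chunked request body that also collects the trailers. It is a sketch that assumes a blocking, file-like stream with readline() and read(); read_chunked is a hypothetical name, and it does no validation of trailer names. It makes the constraint above visible: the trailers are only reachable after the zero-length chunk, that is, after the entire body has been read.

```python
import io


def read_chunked(stream):
    """Decode a chunked request body; return (body, trailers)."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError('unexpected EOF in chunk size line')
        # The size is hexadecimal; ignore any chunk extensions after ';'.
        size = int(size_line.split(b';', 1)[0].strip(), 16)
        if size == 0:
            break  # the last chunk: trailers (if any) follow
        data = stream.read(size)
        if len(data) != size:
            raise ValueError('truncated chunk')
        body += data
        if stream.read(2) != b'\r\n':
            raise ValueError('missing CRLF after chunk data')
    trailers = []
    while True:
        line = stream.readline()
        if line in (b'\r\n', b'\n', b''):
            break  # blank line ends the trailer section
        name, _, value = line.partition(b':')
        trailers.append((name.strip(), value.strip()))
    return bytes(body), trailers


body, trailers = read_chunked(io.BytesIO(
    b'5\r\nhello\r\n6\r\n world\r\n0\r\nX-Checksum: abc\r\n\r\n'))
```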

References

Copyright

This document has been placed in the public domain.

[1] The Common Gateway Interface Specification, v 1.1, 3rd Draft
(https://datatracker.ietf.org/doc/html/draft-coar-cgi-v11-03)

[2] mod_ssl Reference, "Environment Variables"
(http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25)

[3] Details on WSGI 1.0 amendments/clarifications.
(http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html)

[4] Details on WSGI 1.0 amendments/clarifications.
(http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html)

[5] Details on WSGI 1.0 amendments/clarifications.
(http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html)

[6] [Web-SIG] WSGI and long response header values
(https://mail.python.org/pipermail/web-sig/2006-September/002244.html)

[7] The Common Gateway Interface Specification, v 1.1, 3rd Draft
(https://datatracker.ietf.org/doc/html/draft-coar-cgi-v11-03)

[8] "Chunked Transfer Coding" -- HTTP/1.1
(RFC 2616, section 3.6.1)