Python Web 服务器网关接口（原文附译文）

原文地址：PEP-3333

译文项目：

Preface for Readers of PEP 333(致PEP333读者的前言)

Abstract(摘要)

Original Rationale and Goals(from PEP 333)(原理和目标 (来自 PEP 333))

Specification Overview(规范概述)

A Note on String Types(一条关于字符串类型的笔记)

Native strings (which are always implemented using the type named str) that are used for request/response headers and metadata. 原生字符串（始终使用名为str的类型实现）。这种字符串用在请求和响应的包头和元数据中。
Bytestrings (which are implemented using the bytes type in Python 3, and str elsewhere), that are used for the bodies of requests and responses (e.g. POST/PUT input data and HTML page outputs). 字节流字符串（在Python3中使用bytes类型实现，其他版本中使用str类型实现）。这种字符串用在请求和响应的包内容中（比如POST方法或PUT方法的输入数据以及HTML页面的输出）。
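The split between the two string types can be sketched as follows, assuming Python 3. The helper name `header_to_bytes` is hypothetical and not part of the specification; it only illustrates that header text is a native string restricted to ISO-8859-1 code points, while body data must already be bytes.

```python
def header_to_bytes(name, value):
    """Encode one (name, value) header pair of native strings for the wire."""
    line = "%s: %s\r\n" % (name, value)
    # Native strings in WSGI may only hold code points representable
    # in ISO-8859-1, so this encode step is the server's job.
    return line.encode("iso-8859-1")

# Body data is a bytestring; the application chooses the encoding itself.
body = "Hello world!\n".encode("utf-8")

wire_header = header_to_bytes("Content-Type", "text/plain")
print(type(wire_header).__name__, type(body).__name__)
```

The application never encodes header text itself; it hands `str` values to the server and only produces `bytes` for the body.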

The Application/Framework Side(应用/框架端)

HELLO_WORLD = b"Hello world!\n"

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [HELLO_WORLD]

class AppClass:
    """Produce the same output, but using a class

    (Note: 'AppClass' is the "application" here, so calling it
    returns an instance of 'AppClass', which is then the iterable
    return value of the "application callable" as required by
    the spec.

    If we wanted to use *instances* of 'AppClass' as application
    objects instead, we would have to implement a '__call__'
    method, which would be invoked to execute the application,
    and we would need to create an instance for use by the
    server or gateway.
    """

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield HELLO_WORLD
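An application object like the ones above can be exercised without a real server. The following is a minimal sketch, assuming the standard library's `wsgiref.util.setup_testing_defaults` is available; the helper `call_app` is a hypothetical name, and `simple_app` is repeated here only so that the block runs on its own.

```python
from wsgiref.util import setup_testing_defaults

def simple_app(environ, start_response):
    # Same shape as the function-based example above.
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b"Hello world!\n"]

def call_app(app):
    """Invoke a WSGI app once and return (status, headers, body bytes)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in required CGI and wsgi.* keys
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = response_headers
        return lambda data: None  # legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)  # iterating may be what triggers start_response
    finally:
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], captured['headers'], body

status, headers, body = call_app(simple_app)
print(status, body)
```

Because the status is read only after the iterable has been consumed, the same helper should also work for class-based applications that call `start_response` from `__iter__`.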

The Server/Gateway Side(服务器/网关端)

import os, sys

enc, esc = sys.getfilesystemencoding(), 'surrogateescape'

def unicode_to_wsgi(u):
    # Convert an environment variable to a WSGI "bytes-as-unicode" string
    return u.encode(enc, esc).decode('iso-8859-1')

def wsgi_to_bytes(s):
    return s.encode('iso-8859-1')

def run_with_cgi(application):
    environ = {k: unicode_to_wsgi(v) for k,v in os.environ.items()}
    environ['wsgi.input']        = sys.stdin.buffer
    environ['wsgi.errors']       = sys.stderr
    environ['wsgi.version']      = (1, 0)
    environ['wsgi.multithread']  = False
    environ['wsgi.multiprocess'] = True
    environ['wsgi.run_once']     = True

    if environ.get('HTTPS', 'off') in ('on', '1'):
        environ['wsgi.url_scheme'] = 'https'
    else:
        environ['wsgi.url_scheme'] = 'http'

    headers_set = []
    headers_sent = []

    def write(data):
        out = sys.stdout.buffer

        if not headers_set:
             raise AssertionError("write() before start_response()")

        elif not headers_sent:
             # Before the first output, send the stored headers
             status, response_headers = headers_sent[:] = headers_set
             out.write(wsgi_to_bytes('Status: %s\r\n' % status))
             for header in response_headers:
                 out.write(wsgi_to_bytes('%s: %s\r\n' % header))
             out.write(wsgi_to_bytes('\r\n'))

        out.write(data)
        out.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Re-raise original exception if headers sent
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None     # avoid dangling circular ref
        elif headers_set:
            raise AssertionError("Headers already set!")

        headers_set[:] = [status, response_headers]

        # Note: error checking on the headers should happen here,
        # *after* the headers are set.  That way, if an error
        # occurs, start_response can only be re-called with
        # exc_info set.

        return write

    result = application(environ, start_response)
    try:
        for data in result:
            if data:    # don't send headers until body appears
                write(data)
        if not headers_sent:
            write(b'')   # send headers now if body was empty
    finally:
        if hasattr(result, 'close'):
            result.close()

Middleware: Components that Play Both Sides(中间件: 两边都起作用的元素)

Routing a request to different application objects based on the target URL, after rewriting the environ accordingly. 相应地重写environ之后，根据目标URL将请求路由到不同的应用程序对象
Allowing multiple applications or frameworks to run side-by-side in the same process 允许多个应用程序或框架在一个进程中同时运行
Load balancing and remote processing, by forwarding requests and responses over a network 通过转发请求和响应，实现负载均衡和远程处理
Perform content postprocessing, such as applying XSL stylesheets 对内容进行后期处理，比如引入XSL样式表

from piglatin import piglatin

class LatinIter:

    """Transform iterated output to piglatin, if it's okay to do so

    Note that the "okayness" can change until the application yields
    its first non-empty bytestring, so 'transform_ok' has to be a mutable
    truth value.
    """

    def __init__(self, result, transform_ok):
        if hasattr(result, 'close'):
            self.close = result.close
        self._next = iter(result).__next__
        self.transform_ok = transform_ok

    def __iter__(self):
        return self

    def __next__(self):
        if self.transform_ok:
            return piglatin(self._next())   # call must be byte-safe on Py3
        else:
            return self._next()

class Latinator:

    # by default, don't transform output
    transform = False

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):

        transform_ok = []

        def start_latin(status, response_headers, exc_info=None):

            # Reset ok flag, in case this is a repeat call
            del transform_ok[:]

            for name, value in response_headers:
                if name.lower() == 'content-type' and value == 'text/plain':
                    transform_ok.append(True)
                    # Strip content-length if present, else it'll be wrong
                    response_headers = [(name, value)
                        for name, value in response_headers
                            if name.lower() != 'content-length'
                    ]
                    break

            write = start_response(status, response_headers, exc_info)

            if transform_ok:
                def write_latin(data):
                    write(piglatin(data))   # call must be byte-safe on Py3
                return write_latin
            else:
                return write

        return LatinIter(self.application(environ, start_latin), transform_ok)


# Run foo_app under a Latinator's control, using the example CGI gateway
from foo_app import foo_app
run_with_cgi(Latinator(foo_app))
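The first middleware function listed in this section, routing by URL after rewriting the environ, can be sketched as below. The class `PrefixRouter` and the two demo applications are hypothetical names introduced for illustration; the only spec-derived behaviour assumed is that a matched prefix moves from PATH_INFO to SCRIPT_NAME.

```python
class PrefixRouter:
    """Hypothetical middleware: dispatch on a leading PATH_INFO prefix."""

    def __init__(self, routes, default):
        self.routes = routes      # e.g. {'/blog': blog_app}
        self.default = default    # application used when nothing matches

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in self.routes.items():
            if path == prefix or path.startswith(prefix + '/'):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME
                # so the target application sees its own virtual root.
                environ = dict(environ)
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        return self.default(environ, start_response)

def show_location(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    text = '%s|%s' % (environ['SCRIPT_NAME'], environ['PATH_INFO'])
    return [text.encode('iso-8859-1')]

def not_found(environ, start_response):
    start_response('404 Not Found', [('Content-type', 'text/plain')])
    return [b'not found\n']

router = PrefixRouter({'/blog': show_location}, not_found)
statuses = []

def record(status, headers, exc_info=None):
    statuses.append(status)

routed = b''.join(router({'SCRIPT_NAME': '', 'PATH_INFO': '/blog/2024'}, record))
missed = b''.join(router({'SCRIPT_NAME': '', 'PATH_INFO': '/other'}, record))
print(routed, missed, statuses)
```

The router acts as an application toward the server and as a server toward the wrapped applications, which is what makes it middleware in the sense used here.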

Specification Details(规范细节)

environ Variables(环境变量)

REQUEST_METHOD
The HTTP request method, such as GET or POST. This cannot ever be an empty string, and so is always required. HTTP请求的类型，比如「GET」或者「POST」。这个不可能是空字符串，所以是必须给出的。
SCRIPT_NAME
The initial portion of the request URL's path that corresponds to the application object, so that the application knows its virtual location. This may be an empty string, if the application corresponds to the root of the server. 请求URL路径的开始部分，对应于应用程序对象，这样应用程序就知道它的虚拟位置。如果该应用程序对应服务器的根目录的话，它可能是空字符串。
PATH_INFO
The remainder of the request URL's path, designating the virtual location of the request's target within the application. This may be an empty string, if the request URL targets the application root and does not have a trailing slash. 请求URL路径的剩余部分，指定请求的目标在应用程序内部的虚拟位置。如果请求的目标是应用程序根目录并且没有末尾的斜杠的话，可能为空字符串。
QUERY_STRING
The portion of the request URL that follows the ?, if any. May be empty or absent. URL请求中跟在「?」后面的那部分，可能为空或不存在。
CONTENT_TYPE
The contents of any Content-Type fields in the HTTP request. May be empty or absent. HTTP请求中任何Content-Type域的内容，可能为空或不存在。
CONTENT_LENGTH
The contents of any Content-Length fields in the HTTP request. May be empty or absent. HTTP请求中任何Content-Length域的内容，可能为空或不存在。
SERVER_NAME, SERVER_PORT
When combined with SCRIPT_NAME and PATH_INFO, these two strings can be used to complete the URL. Note, however, that HTTP_HOST, if present, should be used in preference to SERVER_NAME for reconstructing the request URL. See the URL Reconstruction section below for more detail. SERVER_NAME and SERVER_PORT can never be empty strings, and so are always required. 这些变量可以和SCRIPT_NAME、PATH_INFO一起组成完整的URL。然而要注意的是，重建请求URL的时候应该优先使用HTTP_HOST而非SERVER_NAME。详细内容参见「URL重建」。SERVER_NAME和SERVER_PORT永远不能为空字符串，也总是必须存在的。
SERVER_PROTOCOL
The version of the protocol the client used to send the request. Typically this will be something like HTTP/1.0 or HTTP/1.1 and may be used by the application to determine how to treat any HTTP request headers. (This variable should probably be called REQUEST_PROTOCOL, since it denotes the protocol used in the request, and is not necessarily the protocol that will be used in the server's response. However, for compatibility with CGI we have to keep the existing name.) 客户端发送请求所使用协议的版本。通常是类似「HTTP/1.0」或「HTTP/1.1」的东西，可以被用来判断如何处理请求包头。（既然这个变量表示的是请求中使用的协议，而且和服务器响应时使用的协议无关，也许它应该被叫做REQUEST_PROTOCOL。不过为了保持和CGI的兼容性，我们还是使用这个名字。）
HTTP_ Variables
Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with HTTP_). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request. 对应客户端提供的HTTP请求包头（即名字以「HTTP_」开头的各种变量）。这些变量的存在与否应该与请求中对应的HTTP包头是否存在相一致。

Variable	Value
wsgi.version	The tuple (1, 0), representing WSGI version 1.0. (1,0)元组，代表WSGI1.0版
wsgi.url_scheme	A string representing the scheme portion of the URL at which the application is being invoked. Normally, this will have the value http or https, as appropriate. 字符串，表示应用请求的URL所属的协议，通常为「http」或「https」。
wsgi.input	An input stream (file-like object) from which the HTTP request body bytes can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client's request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.) 类文件对象的输入流，用于读取HTTP请求包体的内容。（服务端在应用端请求时开始读取，或者预读客户端请求包体内容缓存在内存或磁盘中，或者视情况而定采用任何其他技术提供此输入流。）
wsgi.errors	An output stream (file-like object) to which error output can be written, for the purpose of recording program or other errors in a standardized and possibly centralized location. This should be a "text mode" stream; i.e., applications should use "\n" as a line ending, and assume that it will be converted to the correct line ending by the server/gateway. 类文件对象的输出流，用于写入错误信息，以集中规范地记录程序产生的或其他相关错误信息。这是一个文本流，即应用应该使用「\n」来表示行尾，并假定其会被服务端正确地转换。 (On platforms where the str type is unicode, the error stream should accept and log arbitrary unicode without raising an error; it is allowed, however, to substitute characters that cannot be rendered in the stream's encoding.) （在str类型是Unicode编码的平台上，错误流应该正常接收并记录任意Unicode编码而不报错，并且允许自行替代在该流的编码中无法渲染的字符。） For many servers, wsgi.errors will be the server's main error log. Alternatively, this may be sys.stderr, or a log file of some sort. The server's documentation should include an explanation of how to configure this or where to find the recorded output. A server or gateway may supply different error streams to different applications, if this is desired. 很多Web服务器中wsgi.errors是主要的错误日志，也有一些使用sys.stderr或其他形式的文件来记录。Web服务器的文档中应该说明如何配置错误日志以及在哪里可以找到记录的输出。服务端可以在需要的情况下，向不同的应用提供不同的错误流。
wsgi.multithread	This value should evaluate true if the application object may be simultaneously invoked by another thread in the same process, and should evaluate false otherwise. 如果应用对象可能会被同一进程的另一个线程同时调用，此变量值为真，否则为假。
wsgi.multiprocess	This value should evaluate true if an equivalent application object may be simultaneously invoked by another process, and should evaluate false otherwise. 如果一个等价的应用对象可能会被另一个进程同时调用，此变量值为真，否则为假。
wsgi.run_once	This value should evaluate true if the server or gateway expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a gateway based on CGI (or something similar). 如果服务端期望（但不保证！）应用对象在其所在进程的生命周期中只被调用这一次，此变量值为真，否则为假。一般只有在基于CGI（或类似机制）的网关中此变量才会为真。
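A quick way to see the WSGI-defined variables from the table above is to check an environ for them. This is a sketch only: `missing_wsgi_keys` and `REQUIRED_WSGI_KEYS` are hypothetical names, and the environ is populated with the standard library's `wsgiref.util.setup_testing_defaults` rather than by a real server.

```python
from wsgiref.util import setup_testing_defaults

# The seven wsgi.* variables listed in the table above.
REQUIRED_WSGI_KEYS = (
    'wsgi.version', 'wsgi.url_scheme', 'wsgi.input', 'wsgi.errors',
    'wsgi.multithread', 'wsgi.multiprocess', 'wsgi.run_once',
)

def missing_wsgi_keys(environ):
    """Return the required wsgi.* keys that are absent from environ."""
    return [key for key in REQUIRED_WSGI_KEYS if key not in environ]

environ = {}
setup_testing_defaults(environ)   # test-only defaults, not a real request
missing = missing_wsgi_keys(environ)
print(missing, environ['wsgi.version'])
```

For a fuller conformance check, the standard library also ships `wsgiref.validate.validator`, which wraps an application and asserts on both sides of the interface.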

Input and Error Streams(输入和错误流)

Method	Stream	Notes
read(size)	input	1
readline()	input	1,2
readlines(hint)	input	1,3
iter()	input
flush()	errors	4
write(str)	errors
writelines(seq)	errors

The server is not required to read past the client's specified Content-Length, and should simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable. 不要求Web服务器读取超过客户端指定的Content-Length的内容，并且应该在应用尝试读取越界内容时虚拟出一个文件结束符。应用不应该尝试读取超过Content-Length指定长度的内容。
A server should allow read() to be called without an argument, and return the remainder of the client's input stream. Web服务器应该允许不使用参数调用read()，并返回客户端输入流剩余的部分；
A server should return empty bytestrings from any attempt to read from an empty or exhausted input stream. 同时服务器应该对任何尝试读取空的或到文件尾的输入流的行为返回空字符串。
Servers should support the optional size argument to readline(), but as in WSGI 1.0, they are allowed to omit support for it. Web服务器应该支持readline()函数的可选「size」参数，但是在WSGI1.0版本中可以忽略这一点。
(In WSGI 1.0, the size argument was not supported, on the grounds that it might have been complex to implement, and was not often used in practice... but then the cgi module started using it, and so practical servers had to start supporting it anyway!) （在WSGI1.0中，「size」参数并不要求提供，因为可能很难实现也不常用。但是后来cgi模块开始使用它了，所以实际使用的Web服务器还是得支持「size」参数。）
Note that the hint argument to readlines() is optional for both caller and implementer. The application is free not to supply it, and the server or gateway is free to ignore it. readlines()函数的「hint」参数对于调用者和实现者来说都是可选的。应用端完全可以忽略它，服务端亦然。
Since the errors stream may not be rewound, servers and gateways are free to forward write operations immediately, without buffering. In this case, the flush() method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that flush() is a no-op. They must call flush() if they need to ensure that output has in fact been written. (For example, to minimize intermingling of data from multiple processes writing to the same error log.) 由于错误流不能回退，服务端可以不经缓冲立即转发写操作。在这种情况下，flush()函数可以是空操作。但是具有良好可移植性的应用不能假设输出是无缓冲的或flush()函数是空操作，而必须在需要确保输出确实已被写入的时候调用flush()函数。（比如为了尽量减少多个进程向同一个错误日志写数据时造成的内容交错。）
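The first note above, that an application should not read past CONTENT_LENGTH, can be sketched as a small helper. The name `read_request_body` is hypothetical, and `io.BytesIO` stands in for a real server's `wsgi.input` stream.

```python
import io

def read_request_body(environ):
    """Read at most CONTENT_LENGTH bytes from wsgi.input."""
    try:
        # CONTENT_LENGTH may be absent or an empty string.
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    if length <= 0:
        return b''
    return environ['wsgi.input'].read(length)

environ = {
    'CONTENT_LENGTH': '5',
    'wsgi.input': io.BytesIO(b'hello, extra bytes'),
}
body = read_request_body(environ)
empty = read_request_body({'wsgi.input': io.BytesIO(b'ignored')})
print(body, empty)
```

Bounding the read explicitly keeps the application portable to servers that do not simulate end-of-file at the declared length.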

The start_response() Callable(可调用start_response())

raise exc_info[1].with_traceback(exc_info[2])

def start_response(status, response_headers, exc_info=None):
    if exc_info:
        try:
            # do stuff w/exc_info here
            pass
        finally:
            exc_info = None    # Avoid circular ref.

Handling the Content-Length Header(处理Content-Length头)

Buffering and Streaming(缓冲和流)

Send the entire block to the operating system (and request that any O/S buffers be flushed) before returning control to the application, OR 在将控制权交还给应用之前，将整个数据块发送给操作系统（并请求刷新所有操作系统缓冲区）；或者
Use a different thread to ensure that the block continues to be transmitted while the application produces the next block. 使用另一个单独的线程保证数据块在应用生成下一个数据块的时候继续传送。
(Middleware only) send the entire block to its parent gateway/server 中间件还可以将整个数据块传送给其上层的网关服务器或Web服务器。

Middleware Handling of Block Boundaries(块边界的中间件处理)

The write() Callable(可调用 write())

Unicode Issues(Unicode的问题)

Error Handling(错误处理)

try:
    # regular application code here
    status = "200 Froody"
    response_headers = [("content-type", "text/plain")]
    start_response(status, response_headers)
    return [b"normal body goes here"]   # response bodies must be bytes
except:
    # XXX should trap runtime issues like MemoryError, KeyboardInterrupt
    #     in a separate handler before this bare 'except:'...
    status = "500 Oops"
    response_headers = [("content-type", "text/plain")]
    start_response(status, response_headers, sys.exc_info())
    return [b"error body goes here"]

Always provide exc_info when beginning an error response 开始错误响应时总是提供exc_info参数。
Never trap errors raised by start_response when exc_info is being provided 提供了exc_info参数的情况下不要捕获任何由start_response抛出的异常。

HTTP 1.1 Expect/Continue(HTTP 1.1预期/继续)

Respond to requests containing an Expect: 100-continue request with an immediate 100 Continue response, and proceed normally. 对于任何「Expect:100-continue」的请求返回一个即时的「100Continue」响应，然后正常继续运行。
Proceed with the request normally, but provide the application with a wsgi.input stream that will send the 100 Continue response if/when the application first attempts to read from the input stream. The read request must then remain blocked until the client responds. 继续正常运行，但是提供给应用一个wsgi.input流，这个流会在应用第一次尝试读取输入流的时候发送「100Continue」响应。读请求之后必须阻塞，直到客户端响应为止。
Wait until the client decides that the server does not support expect/continue, and sends the request body on its own. (This is suboptimal, and is not recommended.) 阻塞请求直到客户端意识到服务器不支持expect/continue机制，然后自己发送请求包体。（这种方法不是最优的，不推荐使用。）

Other HTTP Features(其他HTTP特性)

Thread Support(线程的支持)

Implementation/Application Notes(实现/应用笔记)

Server Extension APIs(服务器扩展接口)

Application Configuration(应用程序配置)

from the_app import application

def new_app(environ, start_response):
    environ['the_app.configval1'] = 'something'
    return application(environ, start_response)

URL Reconstruction(URL重建)

from urllib.parse import quote
url = environ['wsgi.url_scheme']+'://'

if environ.get('HTTP_HOST'):
    url += environ['HTTP_HOST']
else:
    url += environ['SERVER_NAME']

    if environ['wsgi.url_scheme'] == 'https':
        if environ['SERVER_PORT'] != '443':
           url += ':' + environ['SERVER_PORT']
    else:
        if environ['SERVER_PORT'] != '80':
           url += ':' + environ['SERVER_PORT']

url += quote(environ.get('SCRIPT_NAME', ''))
url += quote(environ.get('PATH_INFO', ''))
if environ.get('QUERY_STRING'):
    url += '?' + environ['QUERY_STRING']
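The standard library offers a ready-made counterpart to the algorithm above: `wsgiref.util.request_uri`. The sketch below assumes a hand-built environ with hypothetical values (`example.com`, `/app`, and so on) in place of a real request.

```python
from wsgiref.util import request_uri

# Hypothetical request data; a real server would supply these keys.
environ = {
    'wsgi.url_scheme': 'https',
    'HTTP_HOST': 'example.com',
    'SERVER_NAME': 'example.com',
    'SERVER_PORT': '443',
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/items/a b',
    'QUERY_STRING': 'page=2',
}

url = request_uri(environ)                       # full URL with query string
base = request_uri(environ, include_query=False)  # same URL, query dropped
print(url)
print(base)
```

As in the hand-written version, the path components are percent-quoted while the query string is appended unchanged.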

Supporting Older (<2.2) Versions of Python(支持更早版本(<2.2)的Python)

You may not return a file object and expect it to work as an iterable, since before Python 2.2, files were not iterable. (In general, you shouldn't do this anyway, because it will perform quite poorly most of the time!) Use wsgi.file_wrapper or an application-specific file wrapper class. (See Optional Platform-Specific File Handling for more on wsgi.file_wrapper, and an example class you can use to wrap a file as an iterable.) 你不能返回一个文件对象并期望它能作为可迭代对象使用，因为在Python 2.2之前文件是不可迭代的。（一般而言你也不应该这样做，因为大多数情况下这样做性能很差！）应该使用wsgi.file_wrapper或者应用自定义的文件包装类。（参见「可选的特定于平台的文件处理」小节以了解更多关于wsgi.file_wrapper的信息，以及一个可以用来将文件包装为可迭代对象的样例类。）
If you return a custom iterable, it must implement the pre-2.2 iterator protocol. That is, provide a __getitem__ method that accepts an integer key, and raises IndexError when exhausted. (Note that built-in sequence types are also acceptable, since they also implement this protocol.) 如果你返回一个自定义的可迭代对象，它必须实现2.2版本之前的迭代器协议。亦即提供一个__getitem__方法，这个方法接受一个整数键，并在数据耗尽时抛出IndexError异常。（内建的序列类型也是可接受的，因为它们同样实现了这个协议。）

Optional Platform-Specific File Handling(可选的特定于平台的文件处理)

if 'wsgi.file_wrapper' in environ:
    return environ['wsgi.file_wrapper'](filelike, block_size)
else:
    return iter(lambda: filelike.read(block_size), b'')

class FileWrapper:

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        if hasattr(filelike, 'close'):
            self.close = filelike.close

    def __getitem__(self, key):
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise IndexError

environ['wsgi.file_wrapper'] = FileWrapper
result = application(environ, start_response)

try:
    if isinstance(result, FileWrapper):
        # check if result.filelike is usable w/platform-specific
        # API, and if so, use that API to transmit the result.
        # If not, fall through to normal iterable handling
        # loop below.
        pass

    for data in result:
        # etc.
        pass

finally:
    if hasattr(result, 'close'):
        result.close()
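On the application side, the pattern above can be tried with the standard library's `wsgiref.util.FileWrapper`, used here as a fallback when the server does not provide `wsgi.file_wrapper`. This is a sketch under stated assumptions: `file_app` is a hypothetical application, and an in-memory `io.BytesIO` stands in for a real file.

```python
import io
from wsgiref.util import FileWrapper

def file_app(environ, start_response):
    """Serve a file-like object in fixed-size blocks."""
    filelike = io.BytesIO(b'x' * 20000)   # stand-in for an open binary file
    start_response('200 OK', [('Content-type', 'application/octet-stream')])
    # Prefer the server's wrapper; fall back to wsgiref's generic one.
    wrapper = environ.get('wsgi.file_wrapper', FileWrapper)
    return wrapper(filelike, 8192)

# Call the app with an empty environ and a do-nothing start_response.
chunks = list(file_app({}, lambda status, headers, exc_info=None: None))
sizes = [len(chunk) for chunk in chunks]
print(sizes)
```

A server that recognises its own wrapper type can bypass the iteration entirely and hand the underlying file to a platform API, which is the optimisation this section describes.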