Write Your Own Web Framework

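In this chapter I implement a Python web framework with the following features:

Routing based on regular expressions
Global Request and Response objects for convenient access to request data
Cookie and file-upload support
Serving static files
A template engine that supports variable substitution and including other files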

Test environment:

Python 2.7.3
Windows 10

CONTENTS

Preface
Writing our own framework

Preface

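Python is an easy language to learn and use. It is older than Java, and it was designed from the start to combine the capabilities of C with the ease of use of the shell. Python is now maintained by the open-source community and is used in more and more fields, such as web development, scientific computing, and system administration. A growing number of websites are built with Python, for example Douban, Zhihu, and Toutiao, and new web frameworks keep appearing, so most of the time we pick an existing framework rather than start from scratch.

What does a web framework need to provide? A useful way to think about it is the MVC pattern. M (model) is the data layer, which talks to the database. V (view) is the presentation layer, responsible for the user interface. C (controller) is the control layer, which handles user requests; mapping each request to the right handler function also requires routing. Both the model layer and the view layer are invoked from the control layer. Frameworks place the control layer differently, though. In Django, for instance, the view plays the role of the controller, so Django is better described as following the MTV (Model, Template, View) pattern.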

Let's first look at one of the simplest web frameworks:

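Bottle is an ultra-lightweight framework that ships as a single file, yet it has the basic architecture of a full framework. Its features include: routing with regular-expression support; a template engine, with support for third-party engines such as jinja2; convenient access to form data, cookies, and so on; and a built-in server, plus support for most mainstream WSGI servers and platforms such as GAE.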

Usage example:

# hello.py
from bottle import route, run, template

@route('/hello/<name>')
def index(name):
    return template('<b>Hello {{name}}</b>!', name=name)
run(host='localhost', port=8080, reloader=True)

Writing our own framework

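Parsing HTTP requests and generating responses are the essential parts of any web framework. The first thing to get clear is the relationship between the web application and the server.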

The WSGI interface

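Users access a web application over HTTP, which specifies the text format of the client's request and the server's response. A visit to a website goes like this:

The server software (for example the Apache HTTP Server; not to be confused with server hardware, and referred to below simply as the server) listens on port 80
The browser sends a request, with parameters in the query string or the request body
The server receives and parses the request
The web application running on the server reads the request parameters, generates HTML, and returns it
The server sends the HTML back to the browser as the response body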

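The server software is responsible for accepting HTTP requests, parsing them, and sending HTTP responses. WSGI (Web Server Gateway Interface) defines the standard interface between Python web servers and Python web applications, which means that any WSGI-compliant server can host any WSGI-compliant Python web application. Python ships with a reference implementation of WSGI in the wsgiref module.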

Usage example:

# hello.py

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    # WSGI expects an iterable of byte strings, so wrap the body in a list.
    return ['<h1>Hello, web!</h1>']

# server.py

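# Import make_server from the wsgiref module:
from wsgiref.simple_server import make_server
# Import the application function we wrote ourselves:
from hello import application

# Create a server with an empty host (all interfaces), port 8000, and application as the handler:
httpd = make_server('', 8000, application)
print "Serving HTTP on port 8000..."
# Start listening for HTTP requests: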
httpd.serve_forever()

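So writing a web application is simple: write one handler function and let the WSGI server call it. Inside the handler, read the request parameters from environ and return the processed result. To handle hundreds or thousands of different URLs, however, we need routing.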
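As a minimal sketch of "reading the request from environ", the handler below echoes two standard WSGI keys, PATH_INFO and QUERY_STRING. The call_app helper is hypothetical (it is not part of wsgiref); it only exists so the application can be invoked without starting a server. The code is written to run on Python 2.7 and Python 3.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # PATH_INFO and QUERY_STRING are standard WSGI environ keys.
    path = environ.get('PATH_INFO', '/')
    query = environ.get('QUERY_STRING', '')
    body = 'path=%s query=%s' % (path, query)
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    # The response body is an iterable of byte strings.
    return [body.encode('utf-8')]


def call_app(app, path, query=''):
    """Hypothetical helper: call a WSGI app directly, without a server."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ['PATH_INFO'] = path
    environ['QUERY_STRING'] = query
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    chunks = app(environ, start_response)
    return captured['status'], b''.join(chunks)


status, body = call_app(application, '/hello', 'name=yxr')
print(status)
print(body.decode('utf-8'))
```

Calling the application directly like this is also a convenient way to unit-test handlers.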

Routing

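Inside the WSGI entry function we map each URL to its handler function. To keep things simple we ignore the request method and leave it to the handler to deal with. Borrowing from bottle, we define a decorator to register the mapping, which is used like this: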

@route(r'/hello/(\w+)')
def say_hello(name):
    return 'Hello %s' % name

The argument is a regular expression for the URL. Groups in the regular expression are passed to the callback as positional arguments.

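The implementation keeps every handler function in a global WebApplication object named app. Each handler has a __route__ attribute holding the regular expression of the URLs it handles. When a request arrives, app dispatches it by comparing the request URL against each handler's pattern in turn and calling the first handler that matches.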

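Notes:

If two callbacks have the same URL pattern, the one registered first handles the request;
Callbacks use the global request and response objects to read request parameters and to set the response status and headers.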
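The dispatch rule described above can be tried out in isolation. The following is a stripped-down sketch, not the framework's final code: the module-level routes list and the dispatch function are names chosen for this illustration. It shows captured groups arriving as positional arguments and the first registered handler winning when two patterns match.

```python
import re

# (compiled pattern, handler) pairs, checked in registration order
routes = []


def route(pattern):
    def decorator(func):
        # '$' anchors the end so '/hello/yxr/extra' does not match '/hello/(\w+)'
        routes.append((re.compile(pattern + '$'), func))
        return func
    return decorator


@route(r'/hello/(\w+)')
def say_hello(name):
    return 'Hello %s' % name


@route(r'/hello/(\w+)')
def shadowed(name):
    # Same pattern, registered later: never reached by dispatch().
    return 'never reached'


@route(r'/add/(\d+)/(\d+)')
def add(a, b):
    # Captured groups are always strings, so convert explicitly.
    return str(int(a) + int(b))


def dispatch(path):
    for pattern, handler in routes:
        m = pattern.match(path)
        if m:
            return handler(*m.groups())
    return None  # no route matched


print(dispatch('/hello/yxr'))
print(dispatch('/add/2/3'))
print(dispatch('/missing'))
```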

The complete code

################################################################
# An app instance with a built-in server
# Responsible for routing different request URLs to handlers
################################################################
import os
import re
import threading

class WebApplication(object):

    def __init__(self):
        self.routers = []

    def route(self, path):
        def _decorator(func):
            func.__route__ = path+'$'
            self.routers.append(func)
            return func
        return _decorator

    def callback(self):
        path = ctx.request.path_info
        for fn in self.routers:
            m = re.match(fn.__route__, path)
            if m:
                args = m.groups()
                return fn(*args)
        return error(404)

    def run(self, port=8888):
        '''
        python built in server
        '''
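        def wsgi(env, start_response):
            ctx.request = Request(env)
            ctx.response = Response()
            try:
                r = self.callback()
                if r is None:
                    r = error(404)
            except Exception:
                r = error(500)
            if isinstance(r, unicode):
                r = r.encode('utf-8')
            # WSGI expects an iterable of byte strings: wrap a plain string
            # in a list, but pass generators (e.g. static files) through.
            if isinstance(r, str):
                r = [r]

            start_response(ctx.response.status, ctx.response.headers)
            return r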

        DOCUMENT_ROOT = os.getcwd()
        from wsgiref.simple_server import make_server
        server = make_server('', port, wsgi)
        print 'Serving on port %s' % port
        server.serve_forever()

app = WebApplication()
def route(path):
    return app.route(path)

ctx = threading.local()

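So that users can apply the route decorator directly, a module-level route function is added at the end; the app instance has to be created first, of course. To start the server, the user only needs to call app.run().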

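To let URL handlers conveniently read the Request and set the contents of the Response, we need global Request and Response instances. However, threads share global variables, so concurrent requests would overwrite each other's data. We therefore need a thread-safe global object. ctx = threading.local() is a global whose attributes are local to each thread, so storing request and response as its attributes makes them thread-safe.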
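A small experiment makes the behavior of threading.local concrete. In this sketch (the handle function and the seen dict are illustrative names only), five threads each store a different value in ctx.request; every thread reads back its own value, and the main thread, which never assigned the attribute, does not see one at all.

```python
import threading
import time

ctx = threading.local()
seen = {}  # request_id -> value that the thread read back


def handle(request_id):
    # Each thread gets its own independent ctx.request attribute.
    ctx.request = 'request-%d' % request_id
    time.sleep(0.01)  # give the other threads a chance to run
    seen[request_id] = ctx.request


threads = [threading.Thread(target=handle, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(seen.items()))
# The main thread never set ctx.request, so the attribute does not exist here.
print(hasattr(ctx, 'request'))
```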

Parsing HTTP requests and generating responses

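We have already covered the WSGI interface: the server passes the HTTP request to the web application through the environ argument. Let's extract the parameters from it!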

Request

The format of an HTTP request message (see Wikipedia):

A Request-line
format: Request-Line = Method SP Request-URI SP HTTP-Version CRLF
example: GET /hello.htm HTTP/1.1

Zero or more header (General|Request|Entity) fields followed by CRLF
example:

accept-language:zh-CN,zh;q=0.8,en-US;q=0.6
user-agent:Mozilla/5.0
accept-language:zh-CN
Connection: Keep-Alive
...

An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields

Optionally a message-body
example: name=yxr&age=22

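Note that forms with the (default) type application/x-www-form-urlencoded encode their parameters in the name=value&name=value form shown above, whether they are submitted with GET or POST: GET places the encoded string in the query string of the URL, while POST places it in the message body. File uploads (multipart/form-data) are encoded differently and are sent with POST.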
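To see that GET and POST carry the same urlencoded string in different places, the sketch below builds both by hand with the standard library's urlencode. The request line and body here are only strings assembled for illustration; no request is actually sent.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

fields = [('name', 'yxr'), ('age', '22')]
encoded = urlencode(fields)
print(encoded)

# GET: the encoded string travels in the URL's query string.
get_request_line = 'GET /hello.htm?%s HTTP/1.1' % encoded
# POST: the very same string travels in the message body.
post_body = encoded

# Reserved characters are percent-encoded and spaces become '+'.
escaped = urlencode([('q', 'a b&c')])
print(escaped)
```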

Getting GET values

Parameters submitted with GET are in the query string, available as environ['QUERY_STRING']

query_dict=parse_qs(environ['QUERY_STRING'],keep_blank_values=True)

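environ['QUERY_STRING'] is a string such as name=yxr&age=22. It is parsed with the parse_qs function from the urlparse module; setting keep_blank_values to True keeps fields whose value is empty, which would otherwise be dropped.
parse_qs always returns a dict whose values are lists, so a little post-processing makes the result more convenient to use: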

# parse query string
self._query_dict={}
query_dict=parse_qs(environ['QUERY_STRING'],keep_blank_values=True)
for k,v in query_dict.iteritems():
    # v is a list: store a single value directly, keep multiple values as a list
    if len(v)==1:
        self._query_dict[k]=v[0]
    else:
        self._query_dict[k]=v
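The same flattening step can be checked on its own. The flatten_query function below is a standalone sketch (the name is made up for this example) that runs on both Python 2.7 and Python 3; it shows single values being unwrapped, repeated keys staying as lists, and the effect of keep_blank_values.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs      # Python 2


def flatten_query(query_string, keep_blank_values=True):
    parsed = parse_qs(query_string, keep_blank_values=keep_blank_values)
    flat = {}
    for key, values in parsed.items():
        # parse_qs always yields lists; unwrap the common single-value case.
        flat[key] = values[0] if len(values) == 1 else values
    return flat


qs = 'name=yxr&age=22&tag=a&tag=b&empty='
with_blank = flatten_query(qs)
without_blank = flatten_query(qs, keep_blank_values=False)
print(with_blank)
print(without_blank)
```

One consequence of this convenience is that callers must be prepared for either a string or a list, depending on how many times a key appears.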

Getting POST values

Data submitted with POST is read from environ['wsgi.input'] and can be parsed with the FieldStorage class from the cgi module:

post_data = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ, keep_blank_values=True)

A FieldStorage object behaves like a dict: it supports the in operator, the keys method, and the built-in len function, but it has no get method.

Getting a specific field: post_data[key] returns a FieldStorage object whose value attribute holds the field's value, for example:

post_data['name'].value

If the name field does not exist, however, this raises a KeyError. Prefer FieldStorage's getvalue method, which works like dict's get method, for example:

post_data.getvalue('name','')

If name does not exist, an empty string is returned.

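A special case: several fields share the same name. post_data[key] then returns a list of FieldStorage objects rather than a single FieldStorage, and getvalue returns a list of values. The getlist method also returns a list; the difference is that getlist always returns a list, even when the field has only one value.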

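Cookies are not part of the core HTTP specification; they are better regarded as an industry standard that every browser supports. Cookies were introduced to make up for the statelessness of HTTP, so they are essentially a hack for keeping state, and the mechanism is fairly simple.

The server sets a cookie with the Set-Cookie response header, the browser stores it, and every subsequent request sends it back in the Cookie header. An example on Wikipedia explains clearly how cookies work.

Parsing cookies is similar to parsing the query string (in Python 2, parse_qs splits on ';' as well as '&'), but the parsed keys may carry surrounding whitespace and need to be stripped.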

self._cookies={}
cookie_str=environ.get('HTTP_COOKIE')
if cookie_str:
    cookie_dict=parse_qs(cookie_str, keep_blank_values=True)
    self._cookies = {k.strip():v[0] for k,v in cookie_dict.iteritems()}
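As an alternative to reusing parse_qs, the standard library has a dedicated cookie parser, SimpleCookie. The sketch below is not what the framework in this chapter uses; parse_cookies is a hypothetical helper that turns a Cookie header into a plain dict, and it runs on both Python 2.7 and Python 3.

```python
try:
    from http.cookies import SimpleCookie  # Python 3
except ImportError:
    from Cookie import SimpleCookie        # Python 2


def parse_cookies(cookie_header):
    """Parse the value of a Cookie request header into a name -> value dict."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    # Each entry is a Morsel object; its .value attribute is the cookie value.
    return dict((name, morsel.value) for name, morsel in jar.items())


cookies = parse_cookies('sessionid=abc123; theme=dark')
print(cookies.get('sessionid'))
print(cookies.get('theme'))
```

Unlike parse_qs, SimpleCookie does not URL-decode values or treat '+' as a space, which matches how browsers send cookies.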

File uploads

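If a field represents an uploaded file, post_data[key].value or post_data.getvalue(key) returns the file's contents as a string, so file uploads need some special handling.
Whether a field represents a file can be determined by testing its file or filename attribute: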

up_file = post_data["userfile"]
# print the content of the file
if up_file.file:
    print up_file.value

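I use a FileUpload class to handle uploads. It is initialized with the name of the form field that holds the file, and its save method writes the file to disk; the save_path and save_name arguments specify the directory and the file name to save under: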

class FileUpload(object):
    def __init__(self, field):
        # field is the name of the form field that holds the uploaded file
        self.error = ''
        self.filename = None
        self.storage = None
        file_storage = ctx.request.file(field)
        if file_storage is not None:
            # Keep only the base name: the client-supplied name may contain a path.
            self.filename = os.path.basename(file_storage.filename)
            self.storage = file_storage
        else:
            self.error = 'File Not Found'

    def get_filext(self):
        return os.path.splitext(self.filename)[-1]

    def save(self, save_path='uploads', save_name=None):
        if not os.path.exists(save_path):
            os.makedirs(save_path)
        if save_name:
            self.filename = save_name
        target = os.path.join(save_path, self.filename)
        # Write in binary mode so the uploaded bytes are stored unchanged.
        with open(target, 'wb') as f:
            f.write(self.storage.value)
        return target

The complete Request class:

import cgi
from urlparse import parse_qs

class Request(object):
    def __init__(self,environ):
        # parse query string
        self._query_dict={}
        # QUERY_STRING may be absent, so fall back to an empty string
        query_dict=parse_qs(environ.get('QUERY_STRING',''),keep_blank_values=True)
        for k,v in query_dict.iteritems():
            # v is a list
            if len(v)==1:
                self._query_dict[k]=v[0]
            else:
                self._query_dict[k]=v
        # post data
        self._post_data = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ, keep_blank_values=True)
        # parse cookie
        self._cookies={}
        cookie_str=environ.get('HTTP_COOKIE')
        if cookie_str:
            cookie_dict=parse_qs(cookie_str, keep_blank_values=True)
            self._cookies = {k.strip():v[0] for k,v in cookie_dict.iteritems()}
        # save the origin environment
        self._environ=environ

    @property
    def method(self):
        return self._environ['REQUEST_METHOD']

    @property
    def path_info(self):
        return self._environ['PATH_INFO']

    # a query or post value may be a string, a list (multiple values), or None (key doesn't exist)

    def query(self,key):
        return self._query_dict.get(key)

    def post(self,key):
        return self._post_data.getvalue(key)

    def file(self,key):
        if key in self._post_data and self._post_data[key].filename:
            return self._post_data[key]
        return None

    def get_cookie(self,name):
        return self._cookies.get(name)

Response

The format of an HTTP response message:

A Status-line format: Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
example: HTTP/1.1 200 OK

Zero or more header (General|Response|Entity) fields followed by CRLF
example:

content-length:35
date:Tue, 05 Apr 2016 16:48:53 GMT
last-modified:Sun, 17 May 1998 03:00:00 GMT
content-type:image/gif
...

An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields

Optionally a message-body
example:

<html>
<body>
<h1>Hello, World!</h1>
</body>
</html>

The Response class generates the status line and headers of the HTTP response, and supports setting cookies.

The complete Response class:

_RESPONSE_STATUSES = {
    # Success
    200: 'OK',

    # Redirection
    301: 'Moved Permanently',
    302: 'Found',
    304: 'Not Modified',

    # Client Error
    400: 'Bad Request',
    401: 'Unauthorized',
    402: 'Payment Required',
    403: 'Forbidden',
    404: 'Not Found',

    # Server Error
    500: 'Internal Server Error',
}


class Response(object):
    def __init__(self):
        self._status = '200 OK'
        self._headers = {'CONTENT-TYPE': 'text/html; charset=utf-8'}

    @property
    def status(self):
        return self._status

    @status.setter
    def status(self, value):
        st = _RESPONSE_STATUSES.get(value, '')
        self._status = '%d %s' % (value, st)

    @property
    def headers(self):
        '''
        Return response headers as [(key1, value1), (key2, value2)...] including cookies.
        '''
        L = [(k,v) for k, v in self._headers.iteritems()]
        if hasattr(self, '_cookies'):
            for v in self._cookies.itervalues():
                L.append(('Set-Cookie', v))
        return L

    def header(self, name, value=None):
        # With a value: set the header. Without one: return the current value.
        if value is not None:
            self._headers[name]=value
        else:
            return self._headers[name]

    def set_cookie(self,name,value,max_age=None):
        if not hasattr(self, '_cookies'):
            self._cookies = {}
        L = ['%s=%s' % (name, value)]
        if isinstance(max_age, (int, long)):
            L.append('Max-Age=%d' % max_age)
        self._cookies[name] = '; '.join(L)
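The reason headers are returned as a list of tuples rather than a dict is that a response may carry several Set-Cookie headers, one per cookie. The sketch below restates that logic as two standalone functions (format_cookie and build_headers are names invented for this illustration, not part of the class above).

```python
def format_cookie(name, value, max_age=None):
    """Build the value of one Set-Cookie header."""
    parts = ['%s=%s' % (name, value)]
    if max_age is not None:
        parts.append('Max-Age=%d' % max_age)
    return '; '.join(parts)


def build_headers(headers, cookies):
    """Combine ordinary headers with one Set-Cookie entry per cookie."""
    result = list(headers.items())
    for cookie in cookies:
        # Repeating the same header name is only possible in a list of pairs.
        result.append(('Set-Cookie', cookie))
    return result


headers = build_headers(
    {'Content-Type': 'text/html; charset=utf-8'},
    [format_cookie('sid', 'abc123', max_age=3600), format_cookie('theme', 'dark')],
)
for name, value in headers:
    print('%s: %s' % (name, value))
```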

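For 3XX, 4XX and similar statuses I defined dedicated functions, because 404 (Not Found), 302 (Found) and the like are so common. This is also the place to plug in a custom 404.html and so on.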

def error(code):
    # Assign the integer code; the status setter builds the full status line.
    ctx.response.status = code
    template = '<html><body><h1>%s</h1></body></html>'
    return template % ctx.response.status

def redirect(code,location):
    error(code)
    ctx.response.header('Location', location)
    return ''

Serve Static Files

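Many web servers provide this out of the box, but for safety the web application should implement it itself and serve static files only where explicitly configured.

I use a generator to return the file block by block instead of reading it into memory all at once.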

################################################################
# serve static files such as css, js, images                   #
################################################################
import mimetypes
import os

def _static_file_generator(fpath):
    BLOCK_SIZE = 8192
    with open(fpath, 'rb') as f:
        block = f.read(BLOCK_SIZE)
        while block:
            yield block
            block = f.read(BLOCK_SIZE)

def static_file(fpath):
    if not os.path.isfile(fpath):
        # Missing file: respond with the 404 page.
        return error(404)
    fext = os.path.splitext(fpath)[1]
    ctx.response.header('CONTENT-TYPE', mimetypes.types_map.get(fext.lower(), 'application/octet-stream'))
    return _static_file_generator(fpath)
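Since the path usually comes from the URL, one safety measure worth considering is to reject paths that escape the static directory (for example through ".."). The safe_join helper below is a hypothetical addition, not part of the code above; it is a minimal sketch that normalizes the path and checks that it still lies under the root. It does not resolve symbolic links.

```python
import os


def safe_join(root, requested):
    """Return the absolute path of `requested` under `root`, or None if it escapes."""
    root = os.path.abspath(root)
    # Strip leading separators so the request is always treated as relative.
    relative = requested.lstrip('/\\')
    target = os.path.abspath(os.path.join(root, relative))
    # os.path.join(root, '') yields root with exactly one trailing separator.
    prefix = os.path.join(root, '')
    if target == root or target.startswith(prefix):
        return target
    return None


print(safe_join('static', 'css/site.css'))
print(safe_join('static', '../secret.txt'))
```

A handler could call safe_join first and pass the result to static_file only when it is not None.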

Template

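The template corresponds to the view layer and is responsible for rendering data into HTML. I implement two basic features with one simple function.

The function takes two arguments. The first is the path of the template, for example app/index.html. The second is the data to substitute, as a dict; pass None to skip all substitution. It returns the rendered HTML.

Features:

{{var_name}} is replaced with data[var_name]
{% include template_name.html %} is replaced with the contents of template_name.html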

################################################################
# A very simple template that can
# 1. include other files
# 2. replace {{var_name}} with data['var_name']
# you can add more features yourself
################################################################
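import os
import re

# Assumption: TEMPLATE_DIR is not defined elsewhere in this chapter, so
# templates are resolved relative to the current working directory here.
TEMPLATE_DIR = os.getcwd()

def render(path,data={}):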
    path=os.path.join(TEMPLATE_DIR,path)
    with open(path) as f:
        page=f.read()
    if data is None:
        return page
    dirname=os.path.dirname(path)
    inc=re.compile(r'\{%\s*include\s*(\w+(\.\w+)*)\s*%\}')
    def include(m):
        name=m.group(1)
        path=os.path.join(dirname,name)
        with open(path) as f:
            return f.read()
    page=re.sub(inc,include,page)

    ass=re.compile(r'\{\{\s*(\w+)\s*\}\}')
    def assign(m):
        var_name=m.group(1)
        return str(data.get(var_name,''))
    page=re.sub(ass,assign,page)
    # with open('out.html','w') as f:
    #     f.write(page)
    return page
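The two substitutions are easy to try without touching the file system. In the sketch below, render_string is a hypothetical in-memory variant of render: it uses the same two regular expressions, but looks up included templates in a dict instead of opening files.

```python
import re

INCLUDE_RE = re.compile(r'\{%\s*include\s*(\w+(\.\w+)*)\s*%\}')
VAR_RE = re.compile(r'\{\{\s*(\w+)\s*\}\}')


def render_string(page, data, templates):
    # Step 1: splice in included templates (a single pass, no nested includes).
    page = INCLUDE_RE.sub(lambda m: templates[m.group(1)], page)
    # Step 2: substitute variables; names missing from data become ''.
    return VAR_RE.sub(lambda m: str(data.get(m.group(1), '')), page)


templates = {'header.html': '<h1>{{ title }}</h1>'}
page = '{% include header.html %}<p>Hello {{name}}, you are {{ age }}{{missing}}.</p>'
html = render_string(page, {'title': 'Home', 'name': 'yxr', 'age': 22}, templates)
print(html)
```

Because includes are expanded before variables, variables inside an included file are substituted as well. Values are inserted without HTML escaping, so untrusted data should be escaped by the caller.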

That's it: the web framework is written, and it's time to put it to work.