Protocol Content

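HTTP (Hypertext Transfer Protocol) is an application-layer protocol built on top of the TCP/IP protocol suite. In its 1.x versions it is a text-based protocol: data is exchanged as text, so the payload carried over the TCP connection is text in a specific format.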

Message Format

HTTP messages come in two types, requests (Request) and responses (Response). Both follow the same generic format:

HTTP-message = Request | Response
generic-message = start-line *(message-header CRLF) CRLF [message-body]

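Message headers fall into four groups: General Header, Request Header, Response Header, and Entity Header. The message body is optional and carries the entity data associated with the request or response.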
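The generic-message grammar can be made concrete with a small sketch. The helper name `split_message` is hypothetical and is not part of the server shown later. It assumes the whole message is already in memory as bytes, and only separates the start line, the raw header lines, and the optional body.

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.x message into (start_line, header_lines, body).

    Illustrative helper (hypothetical name); assumes the complete message
    is already available as a single bytes object.
    """
    # The empty line (CRLF CRLF) separates the header section from the body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete message: missing blank line after headers")
    lines = head.split(b"\r\n")
    start_line = lines[0].decode("iso-8859-1")
    header_lines = [line.decode("iso-8859-1") for line in lines[1:]]
    # The body is optional; an empty remainder means there is none.
    return start_line, header_lines, body or None


raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept: text/html\r\n"
    b"\r\n"
)
start_line, header_lines, body = split_message(raw)
print(start_line)    # GET /index.html HTTP/1.1
print(header_lines)  # ['Host: example.com', 'Accept: text/html']
print(body)          # None
```

A real server usually cannot rely on having the full message up front and reads from the socket incrementally instead, as the implementation later in this article does.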

Message Header

Each header line has the format:

message-header = field-name ":" [ field-value ]

Header fields belong to one of four groups: General Header, Request Header, Response Header, and Entity Header.
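As a minimal sketch of the header format above, the function below splits a single header line into its name and value. The name `parse_header_line` and the `X-Empty` field are hypothetical, chosen for illustration. It assumes the trailing CRLF has already been removed, and it does not handle header values folded across several lines.

```python
def parse_header_line(line: str):
    """Parse one 'field-name: field-value' line into (name, value)."""
    # Split on the first colon only: the value may contain colons itself.
    name, sep, value = line.partition(":")
    # Reject lines without a colon, with an empty name, or with
    # whitespace around the field name.
    if not sep or not name or name != name.strip():
        raise ValueError("malformed header line: {!r}".format(line))
    # Field names are case-insensitive, so normalize them for lookups.
    # The value is optional and may be padded with whitespace.
    return name.lower(), value.strip()


print(parse_header_line("Content-Type: text/html"))  # ('content-type', 'text/html')
print(parse_header_line("Host:example.com:8080"))    # ('host', 'example.com:8080')
print(parse_header_line("X-Empty:"))                 # ('x-empty', '')
```

Lowercasing the name is one common way to get case-insensitive lookups. A dictionary keyed this way keeps only the last value when a field is repeated, which may or may not be acceptable depending on the header.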

Message Body

The message body (Message Body/Entity Body) is optional and carries the entity data associated with the request or response.

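General Header

General headers can be used in both request and response messages. They apply to the message as a whole rather than to a particular resource or entity. The fields are:

Cache-Control, Connection, Date, Pragma, Trailer, Transfer-Encoding, Upgrade, Via, Warning, etc.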

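Entity Header

Entity headers describe the entity carried by a request or response. The entity is the payload of the message, that is, the message body, and the entity headers hold metadata about it, such as its length and its content encoding. The fields are:

Allow, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, Expires, Last-Modified, extension-header, etc.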
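To illustrate how entity headers act as metadata about the body, the sketch below derives three of them from a payload. The function name `entity_headers` is hypothetical, and the content type is supplied by the caller rather than detected.

```python
import base64
import hashlib


def entity_headers(body: bytes, content_type: str):
    """Build a few entity headers describing `body` (illustrative only)."""
    md5_digest = hashlib.md5(body).digest()
    return {
        "Content-Type": content_type,
        # Content-Length is the size of the body in bytes, not characters.
        "Content-Length": str(len(body)),
        # Content-MD5 carries the base64 encoding of the 128-bit MD5 digest.
        "Content-MD5": base64.b64encode(md5_digest).decode("ascii"),
    }


headers = entity_headers(b"<html></html>", "text/html")
for name, value in headers.items():
    print("{}: {}".format(name, value))
```

The receiver can recompute the same values from the bytes it actually received, which is what makes these fields useful for framing and integrity checks.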

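Request

A request (Request) consists of three parts: the request line (Request-Line), the headers (general, request, and entity headers), and the message body (Message Body). The request line is made up of the request method (Method), the request URI (Request-URI), and the HTTP version (HTTP-Version).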

Request = Request-Line * ( ( general-header | request-header | entity-header ) CRLF ) CRLF [ message-body ]
Request-Line = Method SP Request-URI SP HTTP-Version CRLF

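HTTP defines a number of request methods. Common ones include:

OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, CONNECT, etc.

The Request-URI takes one of four forms: an asterisk (*), which addresses the server itself; an absolute URI (absoluteURI); an absolute path (abs_path); or an authority (authority), which is used only by CONNECT.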
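The Request-Line grammar and the Request-URI forms can be sketched as follows. Both function names are hypothetical, and the classification is a simplified heuristic for illustration, not a full URI parser.

```python
def parse_request_line(line: str):
    """Split a Request-Line into (method, request_uri, http_version)."""
    # Method SP Request-URI SP HTTP-Version CRLF: exactly three fields.
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError("malformed Request-Line: {!r}".format(line))
    return tuple(parts)


def classify_request_uri(method: str, uri: str):
    """Name which Request-URI form `uri` uses (simplified heuristic)."""
    if uri == "*":
        return "asterisk"     # addresses the server itself, e.g. OPTIONS *
    if uri.startswith("/"):
        return "abs_path"     # the usual form sent to an origin server
    if method == "CONNECT":
        return "authority"    # host:port, used only by CONNECT
    if "://" in uri:
        return "absoluteURI"  # the form sent to a proxy
    raise ValueError("unrecognized Request-URI: {!r}".format(uri))


examples = [
    "GET /index.html HTTP/1.1\r\n",
    "GET http://example.com/a HTTP/1.1\r\n",
    "OPTIONS * HTTP/1.1\r\n",
    "CONNECT example.com:443 HTTP/1.1\r\n",
]
for line in examples:
    method, uri, version = parse_request_line(line)
    print(method, classify_request_uri(method, uri))
# GET abs_path
# GET absoluteURI
# OPTIONS asterisk
# CONNECT authority
```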

Request Header

Request headers carry attributes and properties of the request. The fields are:

Accept, Accept-Charset, Accept-Encoding, Accept-Language, Authorization, Expect, From, Host, If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, TE, User-Agent, etc.

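Response

A response (Response) consists of three parts: the status line (Status-Line), the headers (general, response, and entity headers), and the message body (Message Body). The status line is made up of the HTTP version (HTTP-Version), the status code (Status-Code), and the reason phrase (Reason-Phrase).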

Response = Status-Line * ( ( general-header | response-header | entity-header ) CRLF ) CRLF [ message-body ]
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

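HTTP status codes fall into five classes: 1XX informational, 2XX success, 3XX redirection, 4XX client error, and 5XX server error. The headers and message body of a response are structured in the same way as those of a request.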
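A minimal sketch of reading a Status-Line and mapping its code to one of the five classes is shown below. The names `parse_status_line`, `status_class`, and `STATUS_CLASSES` are hypothetical and used for illustration only.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str):
    """Split a Status-Line into (http_version, status_code, reason_phrase)."""
    # The reason phrase may contain spaces, so split at most twice.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError("malformed Status-Line: {!r}".format(line))
    code = parts[1]
    # The status code is exactly three ASCII digits.
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("malformed Status-Code: {!r}".format(code))
    reason = parts[2] if len(parts) == 3 else ""
    return parts[0], int(code), reason


def status_class(code: int):
    """Return the class of a status code based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, reason)  # HTTP/1.1 404 Not Found
print(status_class(code))     # Client Error
```

Clients generally act on the numeric code and its class. The reason phrase is meant for human readers, so it should not drive program logic.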

Response Header

Response headers carry attributes and properties of the response. The fields are:

Accept-Ranges, Age, ETag, Location, Proxy-Authenticate, Retry-After, Server, Vary, WWW-Authenticate, etc.

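HTTP Server Implementation

Because HTTP is a text protocol, implementing an HTTP server mostly comes down to parsing and producing text in the format the protocol prescribes.

Below is a simple HTTP server, SimpleHTTPServer, built with Python's socket module and a thread pool. It supports GET requests, returns 404 when a resource cannot be found, and returns 501 for methods that are not implemented.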

import socket
from multiprocessing.dummy import Pool as ThreadPool
import traceback
import logging
import os


class Server(object):

    SERVER_STRING = b"Server: SimpleHttpd/1.0.0\r\n"

    def __init__(self, host, port, worker_count=4):
        self._host = host
        self._port = port
        self._listen_fd = None
        self._worker_count = worker_count
        self._worker_pool = ThreadPool(worker_count)
        self._logger = logging.getLogger("simple.httpd")
        self._logger.setLevel(logging.DEBUG)
        self._logger.addHandler(logging.StreamHandler())

    def run(self):
        # Create the socket, bind the address, and start listening
        self._listen_fd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._listen_fd.bind((self._host, self._port))
        self._listen_fd.listen(self._worker_count)
        try:
            while True:
                # Accept a connection and hand it off to the thread pool
                conn, addr = self._listen_fd.accept()
                self._worker_pool.apply_async(self.accept_request, (conn, addr,))
        except Exception as e:
            traceback.print_exc()
        finally:
            # Close the listening socket
            self._listen_fd.close()

    def accept_request(self, conn: socket.socket, addr):
        try:
            # Parse the request and dispatch on the request method
            method, path, http_version, req_headers, req_body = self.recv_request(conn)
            if method == "GET":
                code = self.try_file(conn, path)
            elif method == "POST":
                code = self.unimplemented(conn)
            else:
                code = self.unimplemented(conn)

            # Write an access-log entry
            self._logger.info("{}:{} {} {} {} {}".format(addr[0], addr[1], http_version, method, path, code))
        except Exception as e:
            traceback.print_exc()
        finally:
            # Close the connection
            conn.close()

    def try_file(self, conn: socket.socket, path: str):
        # Try to serve a static file from the "www" directory next to this script
        root = os.path.realpath(os.path.join(os.path.dirname(__file__), "www"))
        target = os.path.realpath(os.path.join(root, path.strip("/")))
        # Refuse paths that escape the document root (e.g. "/../secret")
        if not target.startswith(root + os.sep) or not os.path.isfile(target):
            return self.not_found(conn)
        with open(target, "rb") as target_file:
            data = target_file.read()
        conn.sendall(b"HTTP/1.0 200 OK\r\n")
        conn.sendall(self.SERVER_STRING)
        # Choose the content type from the file extension
        if target.endswith(".html"):
            conn.sendall(b"Content-Type: text/html\r\n")
        else:
            conn.sendall(b"Content-Type: application/octet-stream\r\n")
        conn.sendall(bytes("Content-Length: {}\r\n".format(len(data)), "utf-8"))
        conn.sendall(b"\r\n")
        conn.sendall(data)
        return 200

    def not_found(self, conn: socket.socket):
        # Send a 404 Not Found response
        html = "<html>"
        html += "<head><title>Not Found</title></head>"
        html += "<body>Not Found</body>"
        html += "</html>"
        html = bytes(html, "utf-8")
        conn.sendall(b"HTTP/1.0 404 Not Found\r\n")
        conn.sendall(self.SERVER_STRING)
        # utf-8 is a charset, not a content coding, so it belongs in Content-Type
        conn.sendall(b"Content-Type: text/html; charset=utf-8\r\n")
        conn.sendall(bytes("Content-Length: {}\r\n".format(len(html)), "utf-8"))
        conn.sendall(b"\r\n")
        conn.sendall(html)
        return 404

    def unimplemented(self, conn: socket.socket):
        # Send a 501 response for unsupported methods
        html = "<html>"
        html += "<head><title>Method Not Implemented</title></head>"
        html += "<body>HTTP request method not supported</body>"
        html += "</html>"
        html = bytes(html, "utf-8")
        conn.sendall(b"HTTP/1.0 501 Method Not Implemented\r\n")
        conn.sendall(self.SERVER_STRING)
        # utf-8 is a charset, not a content coding, so it belongs in Content-Type
        conn.sendall(b"Content-Type: text/html; charset=utf-8\r\n")
        conn.sendall(bytes("Content-Length: {}\r\n".format(len(html)), "utf-8"))
        conn.sendall(b"\r\n")
        conn.sendall(html)
        return 501

    def recv_request(self, conn: socket.socket):
        # Read the request line one byte at a time until CRLF
        line = b''
        while not line.endswith(b'\r\n'):
            data = conn.recv(1)
            if not data:
                raise ConnectionError('Connection closed unexpectedly')
            line += data
        method, path, version = line.strip().decode().split(' ', 2)

        # Read the request headers until the blank line
        headers = {}
        while True:
            line = b''
            while not line.endswith(b'\r\n'):
                data = conn.recv(1)
                if not data:
                    raise ConnectionError('Connection closed unexpectedly')
                line += data
            if line == b'\r\n':
                break
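            # Field names are case-insensitive; whitespace after ':' is optional
            key, value = line.strip().decode().split(':', 1)
            headers[key.strip().lower()] = value.strip()

        # Read the request body; recv() may return fewer bytes than requested
        content_length = int(headers.get('content-length', '0'))
        body = b''
        while len(body) < content_length:
            data = conn.recv(content_length - len(body))
            if not data:
                raise ConnectionError('Connection closed unexpectedly')
            body += data
        if content_length == 0:
            body = None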

        # Return the request line fields, the headers, and the body
        return method, path, version, headers, body


if __name__ == "__main__":
    server = Server("0.0.0.0", 3000)
    server.run()
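One way to exercise the server is with a small raw-socket client, sketched below. The names `build_get_request`, `parse_response`, and `fetch` are hypothetical helpers, not part of the server. The client assumes the server closes the connection after each response, as the code above does, and the demonstration at the end parses a canned response so that it runs without a live server.

```python
import socket


def build_get_request(path: str, host: str) -> bytes:
    """Serialize a minimal HTTP/1.0 GET request."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a complete response into (code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status-Line: HTTP-Version SP Status-Code SP Reason-Phrase
    version, code, *rest = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), (rest[0] if rest else ""), headers, body


def fetch(host: str, port: int, path: str):
    """Send a GET request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_get_request(path, host))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))


# Canned response in the shape the server above produces for a missing file.
sample = (
    b"HTTP/1.0 404 Not Found\r\n"
    b"Server: SimpleHttpd/1.0.0\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"Not Found"
)
code, reason, headers, body = parse_response(sample)
print(code, reason)               # 404 Not Found
print(headers["content-length"])  # 9
print(body)                       # b'Not Found'
```

With the server running locally on port 3000, calling `fetch("127.0.0.1", 3000, "/index.html")` should return a 200 response if `www/index.html` exists next to the server script and a 404 response otherwise.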