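Protocol Content
HTTP (Hypertext Transfer Protocol) is an application-layer protocol built on top of the TCP/IP protocol suite. HTTP/1.x is a text-based protocol: the start line and header fields carried over the TCP connection are text in a specific format, while the message body may hold arbitrary bytes.
Message Format
HTTP messages come in two types, requests (Request) and responses (Response). Both follow the generic format below: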
HTTP-message = Request | Response
generic-message = start-line *(message-header CRLF) CRLF [message-body]
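Header fields fall into four categories: General Header, Request Header, Response Header, and Entity Header. The message body is optional and carries the entity data associated with the request or response.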
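To make the generic format concrete, here is a minimal sketch that splits a raw HTTP/1.x message into its start line, header lines, and body. The function name `split_message` and the sample request are made up for illustration; the sketch assumes the whole message is already in memory.

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.x message into (start_line, header_lines, body)."""
    # The first empty line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # The first line is the start line; the remaining lines are header fields.
    start_line = lines[0].decode("iso-8859-1")
    header_lines = [line.decode("iso-8859-1") for line in lines[1:]]
    return start_line, header_lines, body


# Hypothetical sample request used only for this example.
raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept: text/html\r\n"
    b"\r\n"
)
start_line, header_lines, body = split_message(raw)
print(start_line)    # GET /index.html HTTP/1.1
print(header_lines)  # ['Host: example.com', 'Accept: text/html']
print(body)          # b''
```

A real server usually cannot assume the full message has arrived, so it reads the head incrementally and then uses Content-Length to decide how much body to read.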
Message Header
The format is:
message-header = field-name ":" [ field-value ]
Message headers comprise General Header, Request Header, Response Header, and Entity Header fields.
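The header format can be illustrated with a small parser for a single header line. This is a sketch under simple assumptions: the helper name `parse_header` is hypothetical, and it lower-cases field names because they are case-insensitive.

```python
def parse_header(line: str):
    """Parse one 'field-name: field-value' line into a (name, value) pair."""
    # Split at the first colon only; the value itself may contain colons.
    name, sep, value = line.rstrip("\r\n").partition(":")
    if not sep or not name.strip():
        raise ValueError("malformed header line: {!r}".format(line))
    # Field names are case-insensitive, so normalize them for lookups.
    # Whitespace around the value is not part of the value.
    return name.strip().lower(), value.strip()


print(parse_header("Host: example.com:8080"))  # ('host', 'example.com:8080')
print(parse_header("Content-Length:42\r\n"))   # ('content-length', '42')
print(parse_header("X-Empty:"))                # ('x-empty', '')
```

Splitting on the bare colon rather than on a colon followed by a space keeps the parser working for senders that omit the optional space.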
Message Body
The message body (Message Body / Entity Body) is optional and carries the entity data associated with the request or response.
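General Header
General headers can be used in both request and response messages. They apply to the message as a whole rather than to a particular resource or entity. Fields include:
Cache-Control, Connection, Date, Pragma, Trailer, Transfer-Encoding, Upgrade, Via, Warning, etc.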
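Entity Header
Entity headers describe the entity carried by a request or response. The entity is the payload of the message, that is, the message body, and entity headers hold metadata about it, such as its length and content coding. Fields include:
Allow, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, Expires, Last-Modified, extension-header, etc.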
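As a rough illustration of how entity headers describe a body, the sketch below derives Content-Type, Content-Length, and, where applicable, Content-Encoding for a file's bytes. The helper name `entity_headers` is made up for this example. It relies on the standard library's `mimetypes.guess_type`, whose results can vary slightly between systems.

```python
import mimetypes


def entity_headers(filename: str, body: bytes) -> dict:
    """Build a few entity headers describing `body` (illustrative only)."""
    # guess_type returns (media type, content coding), either of which may be None.
    content_type, content_coding = mimetypes.guess_type(filename)
    headers = {
        # Fall back to a generic binary type when the extension is unknown.
        "Content-Type": content_type or "application/octet-stream",
        # Content-Length is the body size in bytes, not in characters.
        "Content-Length": str(len(body)),
    }
    if content_coding:
        # Content-Encoding names a coding such as gzip, not a character set.
        headers["Content-Encoding"] = content_coding
    return headers


print(entity_headers("index.html", b"<html></html>"))
print(entity_headers("backup.tar.gz", b""))
```

Note that a character set such as UTF-8 belongs in the Content-Type value (for example `text/html; charset=utf-8`), not in Content-Encoding.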
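Request
A request consists of three parts: the request line (Request-Line), the header fields (general, request, and entity headers), and an optional message body (Message Body). The request line consists of the request method (Method), the request URI (Request-URI), and the HTTP version (HTTP-Version).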
Request = Request-Line * ( ( general-header | request-header | entity-header ) CRLF ) CRLF [ message-body ]
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
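HTTP defines a number of request methods. Common ones include:
OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, CONNECT, etc.
The Request-URI can be "*", an absolute URI (absoluteURI), an absolute path (abs_path), or an authority (authority).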
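The request line grammar above can be sketched as a small parser, together with a rough classifier for the forms of the Request-URI. The names `parse_request_line`, `classify_request_uri`, and `KNOWN_METHODS` are hypothetical, and the classifier uses simple string heuristics rather than full URI parsing.

```python
# Methods listed in this section; a real server may support others.
KNOWN_METHODS = {"OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "CONNECT"}


def parse_request_line(line: str):
    """Parse 'Method SP Request-URI SP HTTP-Version CRLF' into a 3-tuple."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError("malformed request line: {!r}".format(line))
    method, uri, version = parts
    return method, uri, version


def classify_request_uri(uri: str) -> str:
    """Name the form of a Request-URI using simple heuristics."""
    if uri == "*":
        return "*"               # applies to the server itself, e.g. OPTIONS *
    if uri.startswith("/"):
        return "abs_path"        # the usual form for origin servers
    if "://" in uri:
        return "absoluteURI"     # used when talking to a proxy
    return "authority"           # host:port, used by CONNECT


method, uri, version = parse_request_line("GET /index.html HTTP/1.1\r\n")
print(method, uri, version)                      # GET /index.html HTTP/1.1
print(method in KNOWN_METHODS)                   # True
print(classify_request_uri(uri))                 # abs_path
print(classify_request_uri("example.com:443"))   # authority
```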
Request Header
Request headers carry attributes and characteristics of the request. Fields include:
Accept, Accept-Charset, Accept-Encoding, Accept-Language, Authorization, Expect, From, Host, If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, TE, User-Agent, etc.
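Response
A response consists of three parts: the status line (Status-Line), the header fields (general, response, and entity headers), and an optional message body (Message Body). The status line consists of the HTTP version (HTTP-Version), the status code (Status-Code), and the reason phrase (Reason-Phrase).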
Response = Status-Line * ( ( general-header | response-header | entity-header ) CRLF ) CRLF [ message-body ]
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
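HTTP status codes fall into five classes: 1xx informational, 2xx success, 3xx redirection, 4xx client error, and 5xx server error. The header fields and message body of a response follow the same format as those of a request.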
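The status line and the five status classes can be sketched in a few lines. The names `parse_status_line`, `status_class`, and `STATUS_CLASSES` are made up for this example; the class is taken from the first digit of the status code.

```python
# The five status classes, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str):
    """Parse 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF'."""
    # Split at most twice: the reason phrase may itself contain spaces.
    version, code, *rest = line.rstrip("\r\n").split(" ", 2)
    reason = rest[0] if rest else ""
    return version, int(code), reason


def status_class(code: int) -> str:
    """Return the class name for a status code such as 404."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


version, code, reason = parse_status_line("HTTP/1.0 404 Not Found\r\n")
print(version, code, reason)  # HTTP/1.0 404 Not Found
print(status_class(code))     # Client Error
print(status_class(200))      # Success
```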
Response Header
Response headers carry attributes and characteristics of the response. Fields include:
Accept-Ranges, Age, ETag, Location, Proxy-Authenticate, Retry-After, Server, Vary, WWW-Authenticate, etc.
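HTTP Server Implementation
Because HTTP is a text protocol, implementing an HTTP server mostly comes down to reading and writing text correctly in the protocol's format.
Below is a simple HTTP server, SimpleHTTPServer, implemented with Python's socket module and a pool of worker threads. It supports GET requests, returns 404 when a resource is not found, and returns 501 for methods that are not implemented.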
import logging
import os
import socket
import traceback
from multiprocessing.dummy import Pool as ThreadPool


class Server(object):
    SERVER_STRING = b"Server: SimpleHttpd/1.0.0\r\n"

    def __init__(self, host, port, worker_count=4):
        self._host = host
        self._port = port
        self._listen_fd = None
        self._worker_count = worker_count
        self._worker_pool = ThreadPool(worker_count)
        self._logger = logging.getLogger("simple.httpd")
        self._logger.setLevel(logging.DEBUG)
        self._logger.addHandler(logging.StreamHandler())

    def run(self):
        # Create the listening socket, bind the address, and start listening.
        self._listen_fd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow quick restarts without waiting for old sockets to time out.
        self._listen_fd.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self._listen_fd.bind((self._host, self._port))
        self._listen_fd.listen(self._worker_count)
        try:
            while True:
                # Accept a connection and hand it to the thread pool.
                conn, addr = self._listen_fd.accept()
                self._worker_pool.apply_async(self.accept_request, (conn, addr))
        except Exception:
            traceback.print_exc()
        finally:
            # Close the listening socket.
            self._listen_fd.close()

    def accept_request(self, conn: socket.socket, addr):
        try:
            # Parse the request and dispatch on the request method.
            method, path, http_version, req_headers, req_body = self.recv_request(conn)
            if method == "GET":
                code = self.try_file(conn, path)
            else:
                # POST and all other methods are not implemented.
                code = self.unimplemented(conn)
            # Write the access log.
            self._logger.info("{}:{} {} {} {} {}".format(addr[0], addr[1], http_version, method, path, code))
        except Exception:
            traceback.print_exc()
        finally:
            # Close the connection.
            conn.close()

    def send_response(self, conn: socket.socket, status_line: bytes, content_type: bytes, body: bytes):
        # Send the status line, the headers, an empty line, and then the body.
        conn.sendall(status_line)
        conn.sendall(self.SERVER_STRING)
        conn.sendall(b"Content-Type: " + content_type + b"\r\n")
        conn.sendall(bytes("Content-Length: {}\r\n".format(len(body)), "utf-8"))
        conn.sendall(b"\r\n")
        conn.sendall(body)

    def try_file(self, conn: socket.socket, path: str):
        # Try to serve a static file from the "www" directory next to this script.
        root = os.path.realpath(os.path.join(os.path.dirname(__file__), "www"))
        target = os.path.realpath(os.path.join(root, path.strip("/")))
        # Refuse paths that resolve outside "www" (for example "/../secret").
        if not target.startswith(root + os.sep) or not os.path.isfile(target):
            return self.not_found(conn)
        with open(target, "rb") as target_file:
            data = target_file.read()
        if target.endswith(".html"):
            content_type = b"text/html"
        else:
            content_type = b"application/octet-stream"
        self.send_response(conn, b"HTTP/1.0 200 OK\r\n", content_type, data)
        return 200

    def not_found(self, conn: socket.socket):
        # Reply with a 404 error page.
        html = "<html>"
        html += "<head><title>Not Found</title></head>"
        html += "<body>Not Found</body>"
        html += "</html>"
        # The character set belongs in Content-Type, not in Content-Encoding.
        self.send_response(conn, b"HTTP/1.0 404 Not Found\r\n",
                           b"text/html; charset=utf-8", bytes(html, "utf-8"))
        return 404

    def unimplemented(self, conn: socket.socket):
        # Reply with a 501 error page.
        html = "<html>"
        html += "<head><title>Method Not Implemented</title></head>"
        html += "<body>HTTP request method not supported</body>"
        html += "</html>"
        self.send_response(conn, b"HTTP/1.0 501 Not Implemented\r\n",
                           b"text/html; charset=utf-8", bytes(html, "utf-8"))
        return 501

    def read_line(self, conn: socket.socket):
        # Read one CRLF-terminated line, one byte at a time (simple but slow).
        line = b""
        while not line.endswith(b"\r\n"):
            data = conn.recv(1)
            if not data:
                raise ConnectionError("Connection closed unexpectedly")
            line += data
        return line

    def recv_request(self, conn: socket.socket):
        # Read the request line.
        request_line = self.read_line(conn).strip().decode("iso-8859-1")
        method, path, version = request_line.split(" ", 2)
        # Read header lines until the empty line that ends the head.
        headers = {}
        while True:
            line = self.read_line(conn)
            if line == b"\r\n":
                break
            key, value = line.decode("iso-8859-1").split(":", 1)
            # Field names are case-insensitive, so keys are stored in lower case.
            headers[key.strip().lower()] = value.strip()
        # Read the request body; recv may return fewer bytes than requested.
        content_length = int(headers.get("content-length", "0"))
        body = b""
        while len(body) < content_length:
            data = conn.recv(content_length - len(body))
            if not data:
                raise ConnectionError("Connection closed unexpectedly")
            body += data
        # Return the request line parts, the headers, and the body (None if empty).
        return method, path, version, headers, body or None


if __name__ == "__main__":
    server = Server("0.0.0.0", 3000)
    server.run()
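One way to exercise a server like this is with a tiny raw-socket client. The sketch below is an assumption-laden example, not part of the original server: the names `build_get_request`, `parse_response`, and `fetch` are hypothetical, and it relies on the server closing the connection after each response, as HTTP/1.0 servers typically do.

```python
import socket


def build_get_request(path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request as bytes."""
    return "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status_line, body)."""
    # The empty line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    return status_line, body


def fetch(host: str, port: int, path: str = "/"):
    """Send a GET request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_get_request(path, host))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # an empty read means the peer closed the connection
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

With the server running locally, `fetch("127.0.0.1", 3000, "/index.html")` should return the status line and the file contents if `www/index.html` exists, and a 404 status line otherwise.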