Tutorial: Implementing HTTP 206 Partial Content in Node.js

Introduction

In this article, I will explain the basic concepts of the HTTP 206 Partial Content status and implement it step by step in Node.js. We will also test the code with an example based on its most common use: an HTML5 page that can start playing a video file from any point in time.

A brief introduction to Partial Content

HTTP's 206 Partial Content status code and its related headers provide a mechanism that lets browsers and other user agents receive part of the content from the server instead of the entire content. This mechanism is widely used for transferring video files, and most browsers and players, such as Windows Media Player and VLC Player, support it.

The basic process can be described in the following steps:

1. The browser requests the content.
2. The server uses the Accept-Ranges header to tell the browser that the content can be requested in parts.
3. The browser resends the request and uses the Range header to tell the server which range of the content it needs.

The server then responds in one of two ways:

If the range is valid, the server returns the requested part of the content with a 206 Partial Content status code. It declares the range of the returned content in the Content-Range header.
If the range is unavailable (for example, it extends beyond the total number of bytes of the content), the server returns a 416 Requested Range Not Satisfiable status code. It declares the total size of the content in the Content-Range header.

Let’s take a look at each of the key headers in these steps.

Accept-Ranges: bytes

The server sends this header to indicate that the content can be sent to the browser in parts. Its value declares the unit accepted for range requests, which in most cases is bytes.

Range: bytes=(start)-(end)

The browser uses this header to tell the server which range of the content it needs. Both the start and end positions are inclusive and zero-based. The header does not have to contain both positions:

If the end position is omitted, the server returns the content from the declared start position up to the last available byte of the entire content.
If the start position is omitted, the end value is interpreted as a number of bytes: the server returns that many bytes, counting back from the last available byte.
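The rules above can be sketched as a small function that turns the two positions of a Range header, either of which may be missing, into an inclusive, zero-based byte span. This is an illustrative sketch only. The name resolveByteRange and the use of null for an omitted position are assumptions, and the function does not check whether the result is satisfiable.

```javascript
// Hypothetical helper: resolve the positions of "Range: bytes=(start)-(end)"
// into an inclusive [start, end] span of a resource that is `total` bytes long.
// Pass null for a position that was omitted from the header.
function resolveByteRange(start, end, total) {
  if (start !== null && end === null) {
    // "bytes=1024-": from start to the last byte of the content
    return { start: start, end: total - 1 };
  }
  if (start === null && end !== null) {
    // "bytes=-512": `end` counts bytes back from the end of the content;
    // clamped to 0 when the suffix is longer than the content itself
    return { start: Math.max(0, total - end), end: total - 1 };
  }
  if (start !== null && end !== null) {
    // "bytes=0-1023": both positions given, both inclusive
    return { start: start, end: end };
  }
  return null; // "bytes=-" names no range at all
}

// The three satisfiable requests from the examples below, for a 2048-byte file:
console.log(resolveByteRange(0, 1023, 2048));    // { start: 0, end: 1023 }
console.log(resolveByteRange(1024, null, 2048)); // { start: 1024, end: 2047 }
console.log(resolveByteRange(null, 512, 2048));  // { start: 1536, end: 2047 }
```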

Content-Range: bytes (start)-(end)/(total)

This header accompanies HTTP status code 206. The start and end values give the range of the returned content. As in the Range header, both values are inclusive and start from zero. The total value declares the total number of available bytes.

Content-Range: bytes */(total)

This is the same header in a different format. It is only sent with HTTP status code 416. The total value is the total number of bytes of the content.
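The sketch below puts the two Content-Range formats together with the status codes described earlier. It follows this article's rule: a range is unsatisfiable when either position reaches past the last byte. The function name describeRange and its return shape are assumptions for illustration.

```javascript
// Hypothetical helper: choose the status code and Content-Range value for an
// inclusive range [start, end] of a resource that is `total` bytes long.
function describeRange(start, end, total) {
  if (start >= total || end >= total) {
    // 416: report only the total size, using the "*/(total)" form
    return { status: 416, contentRange: "bytes */" + total };
  }
  // 206: report the range actually returned, plus the total size
  return { status: 206, contentRange: "bytes " + start + "-" + end + "/" + total };
}

console.log(describeRange(0, 1023, 2048));    // { status: 206, contentRange: 'bytes 0-1023/2048' }
console.log(describeRange(1536, 2047, 2048)); // { status: 206, contentRange: 'bytes 1536-2047/2048' }
console.log(describeRange(4096, 8191, 2048)); // { status: 416, contentRange: 'bytes */2048' }
```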

Here are a few examples using a 2048-byte file. Note the difference between omitting the start position and omitting the end position.

Requesting the first 1024 bytes

Browser sends:

GET /dota2/techies.mp4 HTTP/1.1
Host: localhost:8000
Range: bytes=0-1023

Server returns:

HTTP/1.1 206 Partial Content
Date: Mon, 15 Sep 2014 22:19:34 GMT
Content-Type: video/mp4
Content-Range: bytes 0-1023/2048
Content-Length: 1024
 
(Content...)

Requesting without an end position

Browser sends:

GET /dota2/techies.mp4 HTTP/1.1
Host: localhost:8000
Range: bytes=1024-

Server returns:

HTTP/1.1 206 Partial Content
Date: Mon, 15 Sep 2014 22:19:34 GMT
Content-Type: video/mp4
Content-Range: bytes 1024-2047/2048
Content-Length: 1024
 
(Content...)

Note: the server is not required to return all of the remaining bytes in a single response, especially when the body is very large or for other performance reasons. So the following two responses are also acceptable in this case:

Content-Range: bytes 1024-1535/2048
Content-Length: 512

The server returns only half of the remaining body. The next requested range will start at byte 1536.

Content-Range: bytes 1024-1279/2048
Content-Length: 256

The server returns only 256 bytes of the remaining body. The next requested range will start at byte 1280.
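The two partial answers above can be sketched from both sides. On the server side, each response is capped at a maximum chunk size. On the client side, the returned Content-Range is read to find where the next request should start. The chunk size, both function names, and the regular expression are illustrative assumptions and are not part of the original implementation.

```javascript
// Hypothetical server-side helper: never return more than maxBytes per response.
function capRange(start, end, maxBytes) {
  return { start: start, end: Math.min(end, start + maxBytes - 1) };
}

// Hypothetical client-side helper: given a "bytes s-e/t" Content-Range value,
// return the Range header for the next request, or null when nothing is left.
function nextRangeHeader(contentRange) {
  var m = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(contentRange);
  if (m === null) return null; // e.g. "bytes */2048" from a 416 response
  var end = parseInt(m[2], 10);
  var total = parseInt(m[3], 10);
  return end + 1 < total ? "bytes=" + (end + 1) + "-" : null;
}

// "bytes=1024-" on a 2048-byte file, answered at most 512 bytes at a time:
console.log(capRange(1024, 2047, 512));               // { start: 1024, end: 1535 }
console.log(nextRangeHeader("bytes 1024-1535/2048")); // bytes=1536-
console.log(nextRangeHeader("bytes 1536-2047/2048")); // null
```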

Requesting the last 512 bytes

Browser sends:

GET /dota2/techies.mp4 HTTP/1.1
Host: localhost:8000
Range: bytes=-512

Server returns:

HTTP/1.1 206 Partial Content
Date: Mon, 15 Sep 2014 22:19:34 GMT
Content-Type: video/mp4
Content-Range: bytes 1536-2047/2048
Content-Length: 512
 
(Content...)

Requesting an unavailable range:

Browser sends:

GET /dota2/techies.mp4 HTTP/1.1
Host: localhost:8000
Range: bytes=1024-4096

Server returns:

HTTP/1.1 416 Requested Range Not Satisfiable
Date: Mon, 15 Sep 2014 22:19:34 GMT
Content-Range: bytes */2048

Now that we understand the workflow and the headers, we can implement this mechanism in Node.js.

Implementing it in Node.js

Step 1 - Create a simple HTTP server

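We start from a basic HTTP server, as in the example below. It is already enough to handle most browser requests. First, we initialize the objects we need and use initFolder to hold the location of the files. To generate the Content-Type header, we build a dictionary that maps file extensions to their MIME names. In the callback function httpListener(), we only allow the GET method. For any other method the server returns 405 Method Not Allowed, and if the file does not exist in initFolder, it returns 404 Not Found.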

// Initialize the objects we need
var http = require("http");
var fs = require("fs");
var path = require("path");
var url = require("url");

// The initial folder; change it to any folder you like
var initFolder = "C:\\Users\\User\\Videos";

// A dictionary of the file extensions we need and their MIME names
var mimeNames = {
  ".css": "text/css",
  ".html": "text/html",
  ".js": "application/javascript",
  ".mp3": "audio/mpeg",
  ".mp4": "video/mp4",
  ".ogg": "application/ogg",
  ".ogv": "video/ogg",
  ".oga": "audio/ogg",
  ".txt": "text/plain",
  ".wav": "audio/x-wav",
  ".webm": "video/webm"
};

http.createServer(httpListener).listen(8000);

function httpListener (request, response) {
  // We only accept GET requests; otherwise return 405 'Method Not Allowed'
  if (request.method != "GET") {
    sendResponse(response, 405, {"Allow" : "GET"}, null);
    return null;
  }

  // Map the URL path onto initFolder using the platform's path separator
  var filename =
    initFolder + url.parse(request.url, true, true).pathname.split('/').join(path.sep);

  // Check whether the file exists; if not, return 404 Not Found.
  // This must come before fs.statSync(), which throws for a missing file.
  if (!fs.existsSync(filename)) {
    sendResponse(response, 404, null, null);
    return null;
  }

  var responseHeaders = {};
  var stat = fs.statSync(filename);
  responseHeaders["Content-Type"] = getMimeNameFromExt(path.extname(filename));
  responseHeaders["Content-Length"] = stat.size; // File size

  sendResponse(response, 200, responseHeaders, fs.createReadStream(filename));
}

function sendResponse(response, responseStatus, responseHeaders, readable) {
  response.writeHead(responseStatus, responseHeaders);

  if (readable == null)
    response.end();
  else
    readable.on("open", function () {
      // Start piping the file once it has been opened
      readable.pipe(response);
    });

  return null;
}

function getMimeNameFromExt(ext) {
  var result = mimeNames[ext.toLowerCase()];

  // Fall back to a generic binary type for unknown extensions
  if (result == null)
    result = "application/octet-stream";

  return result;
}
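One detail in Step 1 could be hardened. The file name is built by appending the request path to initFolder, so a raw request path containing `..` segments (such as /../secret.txt) may resolve outside that folder. The sketch below shows one possible guard built on Node's path.resolve() and path.relative(). The name resolveInsideRoot is hypothetical. It assumes urlPath is the pathname from url.parse(), which starts with "/". httpListener() would answer 404 (or 403) when the function returns null.

```javascript
var path = require("path");

// Hypothetical guard: map a URL path onto `root`, or return null if the
// resulting file name would fall outside `root` (e.g. "/../secret.txt").
function resolveInsideRoot(root, urlPath) {
  var base = path.resolve(root);
  // Prefixing "." keeps the URL path relative to base instead of the drive root
  var target = path.resolve(base, "." + urlPath);
  var rel = path.relative(base, target);
  // Outside the root if the relative path climbs upward, or is absolute
  // (path.relative returns an absolute path across Windows drives)
  if (rel.split(path.sep)[0] === ".." || path.isAbsolute(rel)) {
    return null;
  }
  return target;
}

console.log(resolveInsideRoot("/srv/videos", "/dota2/techies.mp4")); // a path inside /srv/videos
console.log(resolveInsideRoot("/srv/videos", "/../secret.txt"));     // null
```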

Step 2 - Capture the Range header with a regular expression

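With this HTTP server as a base, we can now handle the Range header with the code below. We use a regular expression to split the header and obtain the start and end strings, then convert them to integers with parseInt(). If a result is NaN (not a number), that position is absent from the header. The totalLength parameter holds the total number of bytes of the current file, and we use it to calculate the start and end positions.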

function readRangeHeader(range, totalLength) {
    /*
     * Example of the method 'split' with a regular expression.
     *
     * Input: bytes=100-200
     * Output: ["", "100", "200", ""]
     *
     * Input: bytes=-200
     * Output: ["", "", "200", ""]
     */

  if (range == null || range.length == 0)
    return null;

  var array = range.split(/bytes=([0-9]*)-([0-9]*)/);
  // parseInt("") and parseInt(undefined) both give NaN for a missing position
  var start = parseInt(array[1], 10);
  var end = parseInt(array[2], 10);
  var result = {
    Start: isNaN(start) ? 0 : start,
    End: isNaN(end) ? (totalLength - 1) : end
  };

  // "bytes=100-": from the start position to the last byte
  if (!isNaN(start) && isNaN(end)) {
    result.Start = start;
    result.End = totalLength - 1;
  }

  // "bytes=-200": the last 200 bytes (the whole file if it is shorter)
  if (isNaN(start) && !isNaN(end)) {
    result.Start = Math.max(0, totalLength - end);
    result.End = totalLength - 1;
  }

  return result;
}
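split() puts empty strings, not null, in the positions where a value is missing. For example, bytes=100-200 yields ["", "100", "200", ""]. parseInt("") is NaN, so readRangeHeader() still works. Because the regular expression is not anchored, it also accepts input that only starts with a valid range, such as the multi-range value bytes=0-1,5-6, and quietly serves just the first range. RFC 7233 allows a server to ignore the Range header entirely. A stricter variant could therefore reject anything other than a single byte range and let the server fall back to a normal 200 response. The sketch below assumes that choice. The name parseSingleRange is hypothetical, and it keeps readRangeHeader()'s NaN convention for an omitted position.

```javascript
// Hypothetical stricter parser: accept exactly one "bytes=(start)-(end)" range.
// Returns { start, end } with NaN for an omitted position, or null when the
// header is missing, malformed, names no position, or lists several ranges.
function parseSingleRange(range) {
  if (range == null) return null;
  var m = /^bytes=(\d*)-(\d*)$/.exec(range.trim());
  if (m === null || (m[1] === "" && m[2] === "")) return null;
  return { start: parseInt(m[1], 10), end: parseInt(m[2], 10) };
}

console.log("bytes=100-200".split(/bytes=([0-9]*)-([0-9]*)/)); // [ '', '100', '200', '' ]
console.log(parseSingleRange("bytes=100-200")); // { start: 100, end: 200 }
console.log(parseSingleRange("bytes=-200"));    // { start: NaN, end: 200 }
console.log(parseSingleRange("bytes=0-1,5-6")); // null
```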

Step 3 - Check whether the range is satisfiable

Back in httpListener(), once the HTTP method check has passed, we check whether the requested range is available. If the browser did not send a Range header, the request is treated as an ordinary one: the server returns the whole file with status 200 OK. Otherwise, we check whether the start or end position is greater than or equal to the file length. If either one is, the requested range cannot be satisfied. We also reject a range whose start lies after its end, because fs.createReadStream() throws a RangeError for such a range. In these cases the server returns 416 Requested Range Not Satisfiable together with a Content-Range header.

  var responseHeaders = {};
  var stat = fs.statSync(filename);
  var rangeRequest = readRangeHeader(request.headers['range'], stat.size);

  // If there is no 'Range' header, return the whole file directly with 200 'OK'.
  if (rangeRequest == null) {
    responseHeaders['Content-Type'] = getMimeNameFromExt(path.extname(filename));
    responseHeaders['Content-Length'] = stat.size; // File size.
    responseHeaders['Accept-Ranges'] = 'bytes';

    sendResponse(response, 200, responseHeaders, fs.createReadStream(filename));
    return null;
  }

  var start = rangeRequest.Start;
  var end = rangeRequest.End;

  // If the range can't be fulfilled, or start is after end.
  if (start >= stat.size || end >= stat.size || start > end) {
    // Indicate the acceptable range.
    responseHeaders['Content-Range'] = 'bytes */' + stat.size; // File size.

    // Return the 416 'Requested Range Not Satisfiable'.
    sendResponse(response, 416, responseHeaders, null);
    return null;
  }
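The check above treats any range whose end position reaches past the last byte as unsatisfiable, which matches the 416 example earlier. RFC 7233 (section 2.1, now carried over into RFC 9110) is more lenient. A last-byte position at or beyond the end of the content means "up to the last byte", and the range is unsatisfiable only when its first position lies beyond the content. If you prefer that behavior, the check could be replaced by something like the sketch below. The function name is hypothetical. Treating start > end as a 416 is a simplification, since the RFC calls such a header invalid and lets the server ignore it.

```javascript
// Hypothetical RFC 7233-style check for an inclusive range [start, end] of a
// `size`-byte file. Returns the range to serve, or null for a 416 response.
function satisfiableRange(start, end, size) {
  if (start >= size || start > end) {
    return null; // nothing to serve: answer 416 with "Content-Range: bytes */size"
  }
  // An end position past the last byte is clamped instead of rejected
  return { start: start, end: Math.min(end, size - 1) };
}

console.log(satisfiableRange(1024, 4096, 2048)); // { start: 1024, end: 2047 }
console.log(satisfiableRange(4096, 8191, 2048)); // null
```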

Step 4 - Fulfill the request

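Now for the last, somewhat confusing piece. For status 206 Partial Content, the Content-Range header uses its other format, which contains the start position, the end position, and the total number of bytes of the current file. We also send a Content-Length header. Because both positions are inclusive, its value is end - start + 1. In the last statement, we call createReadStream() and pass the start and end values in the options object given as its second argument. The returned stream then contains only the bytes from the start position to the end position, both inclusive.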

  // Indicate the current range.
  responseHeaders['Content-Range'] = 'bytes ' + start + '-' + end + '/' + stat.size;
  // Both positions are inclusive, so even start == end means one byte.
  responseHeaders['Content-Length'] = end - start + 1;
  responseHeaders['Content-Type'] = getMimeNameFromExt(path.extname(filename));
  responseHeaders['Accept-Ranges'] = 'bytes';
  responseHeaders['Cache-Control'] = 'no-cache';

  // Return the 206 'Partial Content'.
  sendResponse(response, 206,
    responseHeaders, fs.createReadStream(filename, { start: start, end: end }));
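The range positions are inclusive, and so is the end option of fs.createReadStream(). The number of bytes sent is therefore always end - start + 1, so a request for bytes=0-0 carries one byte, not zero. The small helper below builds the 206 headers and makes that arithmetic explicit. Its name and signature are assumptions for illustration.

```javascript
// Hypothetical helper: headers for a 206 response covering the inclusive
// range [start, end] of a `size`-byte file with the given MIME type.
function partialContentHeaders(start, end, size, mimeType) {
  return {
    "Content-Range": "bytes " + start + "-" + end + "/" + size,
    "Content-Length": end - start + 1, // inclusive range, so add one
    "Content-Type": mimeType,
    "Accept-Ranges": "bytes",
    "Cache-Control": "no-cache"
  };
}

console.log(partialContentHeaders(0, 0, 2048, "video/mp4")["Content-Length"]);      // 1
console.log(partialContentHeaders(1024, 2047, 2048, "video/mp4")["Content-Range"]); // bytes 1024-2047/2048
```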

Here is the complete httpListener() callback.

function httpListener(request, response) {
  // We only accept the 'GET' method; otherwise return 405 'Method Not Allowed'.
  if (request.method != 'GET') {
    sendResponse(response, 405, { 'Allow': 'GET' }, null);
    return null;
  }

  var filename =
    initFolder + url.parse(request.url, true, true).pathname.split('/').join(path.sep);

  // Check whether the file exists; if not, return 404 'Not Found'.
  if (!fs.existsSync(filename)) {
    sendResponse(response, 404, null, null);
    return null;
  }

  var responseHeaders = {};
  var stat = fs.statSync(filename);
  var rangeRequest = readRangeHeader(request.headers['range'], stat.size);

  // If there is no 'Range' header, return the whole file directly with 200 'OK'.
  if (rangeRequest == null) {
    responseHeaders['Content-Type'] = getMimeNameFromExt(path.extname(filename));
    responseHeaders['Content-Length'] = stat.size; // File size.
    responseHeaders['Accept-Ranges'] = 'bytes';

    sendResponse(response, 200, responseHeaders, fs.createReadStream(filename));
    return null;
  }

  var start = rangeRequest.Start;
  var end = rangeRequest.End;

  // If the range can't be fulfilled, or start is after end
  // (fs.createReadStream() would throw a RangeError for start > end).
  if (start >= stat.size || end >= stat.size || start > end) {
    // Indicate the acceptable range.
    responseHeaders['Content-Range'] = 'bytes */' + stat.size; // File size.

    // Return the 416 'Requested Range Not Satisfiable'.
    sendResponse(response, 416, responseHeaders, null);
    return null;
  }

  // Indicate the current range.
  responseHeaders['Content-Range'] = 'bytes ' + start + '-' + end + '/' + stat.size;
  // Both positions are inclusive, so the length is end - start + 1 (1 when start == end).
  responseHeaders['Content-Length'] = end - start + 1;
  responseHeaders['Content-Type'] = getMimeNameFromExt(path.extname(filename));
  responseHeaders['Accept-Ranges'] = 'bytes';
  responseHeaders['Cache-Control'] = 'no-cache';

  // Return the 206 'Partial Content'.
  sendResponse(response, 206,
    responseHeaders, fs.createReadStream(filename, { start: start, end: end }));
}
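With the complete listener in place, one gap remains in sendResponse() from Step 1. The read stream has no 'error' listener. If the file cannot be opened (for example, because it was deleted between the existence check and the read), Node raises an unhandled 'error' event and the server process exits. The sketch below is one way to handle this. It defers writeHead() until the stream's 'open' event, so a failed open can still become a 500. The name sendResponseSafe is hypothetical.

```javascript
// Hypothetical variant of sendResponse(): headers are written only once the
// file has actually been opened, and stream errors no longer crash the server.
function sendResponseSafe(response, responseStatus, responseHeaders, readable) {
  if (readable == null) {
    response.writeHead(responseStatus, responseHeaders);
    response.end();
    return;
  }
  readable.on("open", function () {
    response.writeHead(responseStatus, responseHeaders);
    readable.pipe(response);
  });
  readable.on("error", function () {
    if (!response.headersSent) {
      // The file could not be opened and nothing has been sent yet
      response.writeHead(500);
      response.end();
    } else {
      // Failure mid-transfer: the status line is already out, so abort
      response.destroy();
    }
  });
}
```

In httpListener(), each sendResponse(...) call could be switched to sendResponseSafe(...) with the same arguments.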

Testing the implementation

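How do we test our code? As mentioned in the introduction, the most common use of partial content is streaming and playing video. So we create an HTML5 page with a <video> element whose ID is mainPlayer and which contains a <source> tag. The onLoad() function runs once the player has loaded the video's metadata (the onloadedmetadata event). If the URL's query string is a number, it moves mainPlayer to that many seconds into the video.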

<!DOCTYPE html>
<html>
  <head>
    <script type="text/javascript">
 
      function onLoad() {
        var sec = parseInt(document.location.search.substr(1));
         
        if (!isNaN(sec))
          mainPlayer.currentTime = sec;
      }
     
    </script>
    <title>Partial Content Demonstration</title>
  </head>
  <body>
    <h3>Partial Content Demonstration</h3>
    <hr />
    <video id="mainPlayer" width="640" height="360" 
      autoplay="autoplay" controls="controls" onloadedmetadata="onLoad()">
      <source src="dota2/techies.mp4" />
    </video>
  </body>
</html>

Now save the page as "player.html" and put it in the initFolder directory together with "dota2/techies.mp4". Then open this URL in your browser: http://localhost:8000/player.html

In Chrome, it looks like this:

[Screenshot: player.html in Chrome]

Because the URL has no parameter, the file plays from the very beginning.

Now comes the interesting part. Let's open the following URL and see what happens: http://localhost:8000/player.html?60

[Screenshot: player.html?60 in Chrome]

Press F12 to open Chrome's developer tools, switch to the Network tab, and click the most recent entry to view its details. You will see that your browser sent a Range header:

Range:bytes=225084502-

Interesting, right? When the onLoad() function changes the currentTime property, the browser calculates the byte position that corresponds to 60 seconds into the video. Because mainPlayer has already preloaded the metadata, including the format, bitrate, and other basic information, this starting position is available immediately. The browser can then download and play the video without requesting the first 60 seconds. Success!

Conclusion

We have used Node.js to implement an HTTP server that supports partial content, and we tested it with an HTML5 page. But this is just the beginning. Once you thoroughly understand the headers and the workflow, you can try implementing the same mechanism with other frameworks such as ASP.NET MVC or WCF services. Don't forget to open Task Manager and watch CPU and memory usage. As discussed earlier, the server is not required to return all of the remaining bytes in a single response, so finding the right performance balance will be an important task.
