How to use flv.js? A comprehensive walkthrough of the flv.js code (H5 Tutorial, php.cn)

First, a disclaimer: I don't know much about JavaScript. I am only familiar with the audio and video processing side, so mistakes are inevitable. Corrections are welcome.

The flv.js codebase is fairly large. If you want to study it, I suggest starting with demux. Once you understand demux, you have mastered the key steps of media data processing, and both the upstream media download and the downstream playback become easy to understand.

First, some background: why does HTML5 video playback use the FLV format?

Because of Flash. My title image reads "Flash RIP": Flash is dying, but its influence remains. For the past ten years Flash was the foundation of Internet video, and a large amount of infrastructure, such as CDNs, was built around it and supports RTMP and FLV over HTTP. To stay compatible with Flash playback on the Web, companies doing Internet live streaming almost invariably chose the FLV media format. During the transition from Flash to HTML5, it would be ideal if HTML5 could play these Flash-era streams directly, allowing a smooth migration. However, HTML5 video does not natively support FLV. flv.js fills that gap by remuxing the FLV stream into fragmented MP4 segments and feeding them to the browser's video element through Media Source Extensions. This is the historical background behind the emergence and short-term popularity of flv.js.

The demux in flv.js is a set of parsers for the FLV media format. If you want to understand the FLV format, read the following document carefully:
Adobe's official FLV format specification (Video File Format Specification, Version 10)
http://www.adobe.com/content/dam/Adobe/en/devnet/flv/pdfs/video_file_format_spec_v10.pdf

How to use flv.js? Let's get to the point: a walkthrough of the flv.js code, demux part

Open the source file https://github.com/Bilibili/flv.js/blob/master/src/demux/flv-demuxer.js

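static probe(buffer) {
    let data = new Uint8Array(buffer);
    let mismatch = {match: false};

    // 'F' 'L' 'V' signature followed by FLV version 1
    if (data[0] !== 0x46 || data[1] !== 0x4C || data[2] !== 0x56 || data[3] !== 0x01) {
        return mismatch;
    }
    // ... (flag parsing and the header-size check follow)
}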


The bytes 0x46 0x4C 0x56 are the ASCII codes of 'F', 'L' and 'V', the signature of the FLV file header, and the following 0x01 is the FLV format version. probe uses these four bytes to detect whether the data is in FLV format.

let hasAudio = ((data[4] & 4) >>> 2) !== 0;
let hasVideo = (data[4] & 1) !== 0;


Take the fifth byte (data[4], the type flags). Counting from the most significant bit, its sixth bit (mask 0x04) indicates whether audio is present and its eighth bit (mask 0x01) whether video is present. The other bits are reserved and can be ignored.

probe is called by parseChunks: after at least 13 bytes have been read, it checks whether the data is FLV and, if so, parsing continues. Why 13? The FLV header itself is 9 bytes (see "The FLV header" in the PDF above), and it is followed by a four-byte PreviousTagSize field. Each PreviousTagSize gives the size of the tag before it; since nothing precedes the first tag, this first size (PreviousTagSize0) is always 0.
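
To make the 13-byte layout concrete, here is a minimal sketch in JavaScript that reads the FLV header and PreviousTagSize0 from a Uint8Array. The helper name parseFlvHeader and its return shape are hypothetical, not flv.js APIs; it assumes the whole header is already in memory and honors the DataOffset field instead of hard-coding 9.

```javascript
// Hypothetical helper (not part of flv.js): parse the 9-byte FLV header
// plus the 4-byte PreviousTagSize0 that follows it.
function parseFlvHeader(bytes) {
  if (bytes.length < 13) return null; // need 9-byte header + 4-byte PreviousTagSize0
  // Signature 'F' 'L' 'V' followed by version 1
  if (bytes[0] !== 0x46 || bytes[1] !== 0x4c || bytes[2] !== 0x56 || bytes[3] !== 0x01) {
    return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const flags = bytes[4];
  const dataOffset = view.getUint32(5); // UI32 big-endian header size, 9 for version 1
  if (dataOffset < 9 || bytes.length < dataOffset + 4) return null;
  return {
    hasAudio: (flags & 0x04) !== 0, // TypeFlagsAudio
    hasVideo: (flags & 0x01) !== 0, // TypeFlagsVideo
    dataOffset,
    prevTagSize0: view.getUint32(dataOffset), // always 0
    firstTagOffset: dataOffset + 4,
  };
}
```

For the common header bytes 46 4C 56 01 05 00 00 00 09 00 00 00 00, it reports both audio and video present, a DataOffset of 9, and the first tag starting at byte 13.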

The rest of parseChunks loops over tags. FLV calls each unit of media data a tag, and every tag has a type. Only three types are used in practice: 8, 9 and 18, corresponding to audio, video and script data.

if (tagType !== 8 && tagType !== 9 && tagType !== 18) {
    Log.w(this.TAG, `Unsupported tag type ${tagType}, skipped`);
    // consume the whole tag (skip it)
    offset += 11 + dataSize + 4;
    continue;
}


This code checks the tag type. Note the number 11: the tag header is 11 bytes, followed by the dataSize-byte tag body and then the 4-byte PreviousTagSize field, so adding 11 + dataSize + 4 to offset jumps to the start of the next tag.
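
The same skip arithmetic generalizes into a loop over a whole buffer. The sketch below is hypothetical (listTags is not an flv.js function) and assumes a complete in-memory FLV file whose first tag starts at byte 13; unlike flv.js, which has to cope with data arriving in chunks, it simply stops at the first incomplete tag.

```javascript
// Hypothetical helper (not flv.js code): list the tags in a complete FLV buffer.
function listTags(bytes, firstTagOffset = 13) {
  const tags = [];
  let offset = firstTagOffset;
  // Continue while another 11-byte tag header fits in the buffer
  while (offset + 11 <= bytes.length) {
    const tagType = bytes[offset]; // UI8 TagType: 8 audio, 9 video, 18 script data
    // UI24 DataSize, big-endian, in header bytes 1-3
    const dataSize = (bytes[offset + 1] << 16) | (bytes[offset + 2] << 8) | bytes[offset + 3];
    if (offset + 11 + dataSize + 4 > bytes.length) break; // incomplete tag
    tags.push({ tagType, dataSize, offset });
    // 11-byte header + body + 4-byte PreviousTagSize = start of the next tag
    offset += 11 + dataSize + 4;
  }
  return tags;
}
```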

The tag header layout is as follows (UIn denotes an n-bit unsigned big-endian integer):

UI8 tag type
UI24 data size
UI24 timestamp
UI8 TimestampExtended
UI24 StreamID

That adds up to exactly 11 bytes (1 + 3 + 3 + 1 + 3). To save bandwidth, Adobe never uses 32 bits where 24 will do, yet it still adds an extension byte that holds the timestamp's most significant byte. This awkward design is why the following code looks odd: it reads three bytes and combines them as a big-endian integer, then places the fourth byte in the high bits.

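let ts2 = v.getUint8(4);  // bytes 4-6: UI24 Timestamp, big-endian (ts2 is its high byte)
let ts1 = v.getUint8(5);
let ts0 = v.getUint8(6);
let ts3 = v.getUint8(7);  // byte 7: TimestampExtended, the top 8 bits of the timestamp
let timestamp = ts0 | (ts1 << 8) | (ts2 << 16) | (ts3 << 24);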
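
The whole 11-byte header can be decoded in one place. This is a sketch with a hypothetical helper name, readTagHeader, that works on a Uint8Array rather than flv.js's DataView. One design choice differs from the snippet above: in JavaScript, `<<` yields a signed 32-bit result, so `ts3 << 24` turns negative once TimestampExtended reaches 0x80 (timestamps of roughly 24.8 days or more); the sketch appends `>>> 0` to keep the value unsigned.

```javascript
// Hypothetical helper (not flv.js code): decode the 11-byte FLV tag header at `offset`.
function readTagHeader(bytes, offset) {
  // Read a big-endian UI24 starting at index i
  const u24 = (i) => (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
  const tagType = bytes[offset];         // UI8 TagType
  const dataSize = u24(offset + 1);      // UI24 DataSize
  // UI24 Timestamp in the low 24 bits, TimestampExtended as bits 24-31;
  // ">>> 0" converts the signed 32-bit result back to an unsigned value.
  const timestamp = ((bytes[offset + 7] << 24) | u24(offset + 4)) >>> 0;
  const streamId = u24(offset + 8);      // UI24 StreamID, always 0
  return { tagType, dataSize, timestamp, streamId };
}
```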


After parsing the tag header, different parsing functions are called according to different tag types.

switch (tagType) {
    case 8:  // Audio
        this._parseAudioData(chunk, dataOffset, dataSize, timestamp);
        break;
    case 9:  // Video
        this._parseVideoData(chunk, dataOffset, dataSize, timestamp, byteStart + offset);
        break;
    case 18:  // ScriptDataObject
        this._parseScriptData(chunk, dataOffset, dataSize);
        break;
}


TAG type: 8 audio

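The audio structure is relatively simple. The first byte of AUDIODATA describes the audio format (SoundFormat, SoundRate, SoundSize and SoundType). In practice it is almost always AAC, 16-bit, stereo, 44.1 kHz, so the most common value is 0xAF, followed by an AACAUDIODATA structure. (For AAC the spec requires the rate and channel fields to be set this way regardless; the actual sample rate and channel count come from the AAC stream's own configuration.)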
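
The first AUDIODATA byte packs four fields: SoundFormat UB[4], SoundRate UB[2], SoundSize UB[1] and SoundType UB[1]. Below is a minimal sketch, with the hypothetical helper name parseAudioTagHeader, that unpacks them and, for AAC, also reads the AACPacketType byte that begins AACAUDIODATA (0 = sequence header, 1 = raw frame).

```javascript
// Hypothetical helper (not flv.js code): unpack the AUDIODATA header of an audio tag body.
function parseAudioTagHeader(body) {
  const b = body[0];
  const info = {
    soundFormat: b >>> 4,          // UB[4]: 10 = AAC
    soundRate: (b >>> 2) & 0x03,   // UB[2]: 3 = 44 kHz
    soundSize: (b >>> 1) & 0x01,   // UB[1]: 1 = 16-bit samples
    soundType: b & 0x01,           // UB[1]: 1 = stereo
  };
  if (info.soundFormat === 10 && body.length > 1) {
    // AACAUDIODATA: 0 = AAC sequence header, 1 = raw AAC frame
    info.aacPacketType = body[1];
  }
  return info;
}
```

For 0xAF it yields SoundFormat 10 (AAC), SoundRate 3 (44 kHz), SoundSize 1 (16-bit) and SoundType 1 (stereo).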

TAG type: 9 video

Video is the part to watch most closely:

let frameType = (spec & 240) >>> 4;
let codecId = spec & 15;


Two important values are extracted here. frameType is the frame type: 1 is a keyframe and 2 is a non-keyframe (inter frame). codecId is the codec. Although FLV supports six video codecs, in practice Internet on-demand and live streaming uses only H.264, so codecId is almost always 7. The author writes the masks in decimal (240 and 15). Since they are bit masks, hexadecimal (0xF0 and 0x0F) would be easier to read.
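
Rewritten with hexadecimal masks, the same split reads more naturally. parseVideoTagHeader is a hypothetical name used for illustration; its logic is equivalent to the frameType and codecId lines quoted above.

```javascript
// Hypothetical helper: split the first byte of a video tag body into its two fields.
function parseVideoTagHeader(spec) {
  const frameType = (spec & 0xf0) >>> 4; // high nibble: 1 = keyframe, 2 = inter (non-key) frame
  const codecId = spec & 0x0f;           // low nibble: 7 = AVC (H.264)
  return { frameType, codecId, isKeyframe: frameType === 1, isAvc: codecId === 7 };
}
```

A keyframe H.264 tag therefore starts with 0x17, and an inter frame with 0x27.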

_parseAVCVideoPacket parses the AVCVIDEOPACKET structure, i.e. the H.264 video packet:

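let packetType = v.getUint8(0);                // byte 0: AVCPacketType (0 = sequence header, 1 = NALU)
let cts = v.getUint32(0, !le) & 0x00FFFFFF;    // bytes 1-3: CompositionTime (SI24 in the spec), read here as unsigned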


A word on CTS (CompositionTime). The timestamp we read from the tag header earlier is, for video, the DTS (decoding timestamp). CTS is an offset: the difference between the PTS (presentation timestamp) and the DTS, so PTS = DTS + CTS.

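There is a pitfall here. According to Adobe's document, CTS is a signed 24-bit integer (SI24), which means it can be negative. So I suspect the flv.js CTS parsing code has a bug: it does not handle negative values. Converting a negative 24-bit integer to a 32-bit integer requires manually handling the sign bit and two's complement in the high bits. (This is only a suspicion; I have not confirmed it by debugging. But I did fall into this pit when processing YY live-streaming data: some videos containing B-frames do have negative CTS values.)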
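
One way to handle the signed case is ordinary two's-complement sign extension: move the 24-bit value into the top of a 32-bit integer and arithmetic-shift it back down. The sketch below uses hypothetical helper names and assumes `body` starts at the AVCVIDEOPACKET (byte 0 AVCPacketType, bytes 1-3 CompositionTime); it then applies PTS = DTS + CTS.

```javascript
// Hypothetical helpers (not flv.js code).
// Read the SI24 CompositionTime from bytes 1-3 of an AVCVIDEOPACKET.
function readCompositionTime(body) {
  const unsigned = (body[1] << 16) | (body[2] << 8) | body[3];
  // "<< 8" puts bit 23 in the sign position of a 32-bit int;
  // ">> 8" (arithmetic shift) copies that sign bit back down.
  return (unsigned << 8) >> 8;
}

// PTS = DTS + CTS, where DTS is the timestamp from the tag header.
function computePts(dts, body) {
  return dts + readCompositionTime(body);
}
```

A CompositionTime of FF FF D8 decodes to -40, so a frame with DTS 1000 ms gets PTS 960 ms.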


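There are two packetType values of interest. 0 means AVCDecoderConfigurationRecord, the H.264 video header that carries the SPS and PPS. This structure is not defined by FLV but by the H.264 family of standards (ISO/IEC 14496-15). If you decode with ffmpeg, it can be passed as-is in the codec's extradata for ffmpeg to interpret.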

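The flv.js author chose to parse this structure himself out of necessity, since there is no ffmpeg in a JavaScript environment. The main purpose of parsing it is to extract the SPS and PPS. Although the format allows more than one SPS, in practice there is usually just one.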
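
For readers who want to see what extracting the SPS and PPS involves, here is a sketch that walks the AVCDecoderConfigurationRecord layout from ISO/IEC 14496-15. It is not the flv.js code; parseAvcConfig and its field names are hypothetical, and it does no bounds checking, so it assumes a well-formed record.

```javascript
// Hypothetical helper (not flv.js code): pull SPS/PPS out of an AVCDecoderConfigurationRecord.
function parseAvcConfig(rec) {
  const u16 = (i) => (rec[i] << 8) | rec[i + 1]; // big-endian UI16
  const config = {
    version: rec[0],                     // configurationVersion, 1
    profile: rec[1],                     // AVCProfileIndication
    compatibility: rec[2],               // profile_compatibility
    level: rec[3],                       // AVCLevelIndication
    nalLengthSize: (rec[4] & 0x03) + 1,  // lengthSizeMinusOne + 1, usually 4
    sps: [],
    pps: [],
  };
  let pos = 5;
  const spsCount = rec[pos++] & 0x1f;    // low 5 bits: numOfSequenceParameterSets
  for (let i = 0; i < spsCount; i++) {
    const len = u16(pos);                // sequenceParameterSetLength
    pos += 2;
    config.sps.push(rec.subarray(pos, pos + len));
    pos += len;
  }
  const ppsCount = rec[pos++];           // numOfPictureParameterSets
  for (let i = 0; i < ppsCount; i++) {
    const len = u16(pos);                // pictureParameterSetLength
    pos += 2;
    config.pps.push(rec.subarray(pos, pos + len));
    pos += len;
  }
  return config;
}
```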

let config = SPSParser.parseSPS(sps);


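The PPS information is of little use here, so the author implemented only an SPS parser. That shows how much effort he put into learning the H.264 standard: the Exp-Golomb decoding in it is fairly complex and not easy to get right. (On both PC and mobile I use ffmpeg for this parsing.) The SPS contains key video information such as resolution, frame rate, profile and level.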
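
Exp-Golomb coding is less daunting in code than in the standard's prose. A ue(v) code is N zero bits, a 1, then N more bits, and decodes to 2^N - 1 plus the value of those N bits. se(v) maps the resulting codeNum k to (k + 1) / 2 when k is odd and to -k / 2 when it is even. The BitReader class below is a minimal hypothetical sketch, not the flv.js implementation. It assumes emulation-prevention bytes (00 00 03) have already been stripped from the SPS, which a real parser must do first.

```javascript
// Hypothetical MSB-first bit reader with Exp-Golomb decoding (not flv.js code).
class BitReader {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0; // current bit position
  }

  readBit() {
    if (this.pos >= this.bytes.length * 8) throw new RangeError("out of data");
    const byte = this.bytes[this.pos >>> 3];
    const bit = (byte >>> (7 - (this.pos & 7))) & 1;
    this.pos++;
    return bit;
  }

  readBits(n) {
    let v = 0;
    // Multiplying instead of shifting avoids 32-bit overflow for large n
    for (let i = 0; i < n; i++) v = v * 2 + this.readBit();
    return v;
  }

  // ue(v): count leading zeros, consume the 1, then read that many bits
  readUE() {
    let zeros = 0;
    while (this.readBit() === 0) zeros++;
    return 2 ** zeros - 1 + this.readBits(zeros);
  }

  // se(v): odd codeNum k maps to (k + 1) / 2, even k to -k / 2
  readSE() {
    const k = this.readUE();
    if (k === 0) return 0;
    return k % 2 === 1 ? (k + 1) / 2 : -k / 2;
  }
}
```

For example, the bit string 1 010 011 00100 decodes as the ue(v) values 0, 1, 2, 3.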

packetType 1 means NALU (Network Abstraction Layer Unit), an H.264 concept. To a first approximation, it is one frame of video data.

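NALUs are delimited in one of two ways. One prefixes each NALU with the four-byte start code 00 00 00 01 (the Annex B format). The other, MP4-style, prefixes each NALU with a big-endian size field, usually four bytes. FLV uses the latter, whereas raw H.264 elementary streams usually use the former.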
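
Converting between the two conventions is mechanical. This sketch uses hypothetical helpers, splitNalus and toAnnexB. It splits the NALU data of an AVCVIDEOPACKET with AVCPacketType 1 (the tag body after its first 5 bytes) on its big-endian length prefixes, then re-emits each NALU behind a 00 00 00 01 start code. The prefix width defaults to 4 bytes as an assumption; the real value is lengthSizeMinusOne + 1 from the AVCDecoderConfigurationRecord.

```javascript
// Hypothetical helpers (not flv.js code).
// Split length-prefixed (FLV/MP4-style) NALU data into individual NALUs.
function splitNalus(data, nalLengthSize = 4) {
  const nalus = [];
  let pos = 0;
  while (pos + nalLengthSize <= data.length) {
    let len = 0;
    for (let i = 0; i < nalLengthSize; i++) len = len * 256 + data[pos + i]; // big-endian size
    pos += nalLengthSize;
    if (pos + len > data.length) break; // truncated NALU
    nalus.push(data.subarray(pos, pos + len));
    pos += len;
  }
  return nalus;
}

// Re-emit the same NALUs with Annex B start codes.
function toAnnexB(data, nalLengthSize = 4) {
  const nalus = splitNalus(data, nalLengthSize);
  const out = new Uint8Array(nalus.reduce((sum, n) => sum + 4 + n.length, 0));
  let pos = 0;
  for (const n of nalus) {
    out.set([0, 0, 0, 1], pos); // four-byte start code
    out.set(n, pos + 4);
    pos += 4 + n.length;
  }
  return out;
}
```

The NAL unit type of each piece is its first byte & 0x1F, e.g. 5 for an IDR slice and 1 for a non-IDR slice.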

TAG type: 18 Script Data

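Besides audio and video data there is ScriptData, an object-description format that works something like binary JSON (AMF). JavaScript developers are out of luck and have to write their own implementation; on other platforms you can reuse the code from librtmp.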
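
The script data in FLV is AMF0. A decoder for the value types commonly seen in an onMetaData tag fits in a few dozen lines. decodeAmf0 is a hypothetical sketch, not the flv.js AMF parser. It handles Number, Boolean, String, Object, Null, ECMA array and strict array, and throws on anything else (Date, long string, and so on).

```javascript
// Hypothetical minimal AMF0 decoder (not flv.js code).
function decodeAmf0(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const decoder = new TextDecoder("utf-8");
  let pos = 0;

  // UI16 big-endian length followed by UTF-8 bytes
  const readString = () => {
    const len = view.getUint16(pos);
    pos += 2;
    const s = decoder.decode(bytes.subarray(pos, pos + len));
    pos += len;
    return s;
  };

  // Key/value pairs terminated by an empty key plus the object-end marker 0x09
  const readProperties = () => {
    const obj = {};
    for (;;) {
      const key = readString();
      if (key === "" && bytes[pos] === 0x09) {
        pos++;
        return obj;
      }
      obj[key] = readValue();
    }
  };

  const readValue = () => {
    const marker = bytes[pos++];
    switch (marker) {
      case 0x00: { // Number: IEEE 754 double, big-endian
        const n = view.getFloat64(pos);
        pos += 8;
        return n;
      }
      case 0x01: return bytes[pos++] !== 0;  // Boolean
      case 0x02: return readString();         // String
      case 0x03: return readProperties();     // Object
      case 0x05: return null;                 // Null
      case 0x08:                              // ECMA array: skip the UI32 count hint
        pos += 4;
        return readProperties();
      case 0x0a: {                            // Strict array: UI32 count, then values
        const count = view.getUint32(pos);
        pos += 4;
        const arr = [];
        for (let i = 0; i < count; i++) arr.push(readValue());
        return arr;
      }
      default:
        throw new Error(`unsupported AMF0 marker 0x${marker.toString(16)}`);
    }
  };

  const values = [];
  while (pos < bytes.length) values.push(readValue());
  return values;
}
```

A script tag body typically decodes to two values: the string "onMetaData" and an ECMA array of properties such as duration, width and height.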

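I think that, besides solving FLV playback, the author has also contributed foundational front-end code such as an AMF parser, an SPS parser and Exp-Golomb decoding, which can be reused in other projects.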

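Once the FLV data stream has been fetched over a transport protocol, demux separates out the properties and packets of the audio and video data, laying the foundation for playback. Demux is a good entry point for reading the code, and it must be read alongside the FLV file format spec, several times over, until you know it by heart. I can now analyze FLV packets by hand from Wireshark captures, which is a great help for debugging.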

