How to use flv.js? Comprehensive interpretation of flv.js code
First of all, let me state that I don’t know much about JavaScript. I am only familiar with the audio and video processing part. It is inevitable that I will make mistakes. Corrections are welcome.
The code base of the flv.js project is fairly large. If you want to study it, I suggest starting with demux. Once you understand demux, you have mastered the key step of media data processing, and the media data download that comes before it and the media data playback that comes after it become easy to understand.
First, let’s spread some background knowledge. Why does HTML5 video playback use flv format?
Because of Flash. My title picture uses "flash RIP". Flash is dying, but its influence is still there. Flash has been the basic technology for Internet video over the past 10 years, and a large amount of infrastructure was built around it; CDNs, for example, support the RTMP and FLV-over-HTTP protocols. To stay compatible with Flash playback on the Web, companies doing Internet live broadcasts invariably chose the FLV media format. During the transition from Flash to HTML5, it would be ideal if HTML5 could support the Flash protocols, which would allow a smooth migration. However, HTML5 does not support them natively. The flv.js project solves exactly this problem, and that is the historical background of its emergence and rapid popularity.
The demux in flv.js is a set of parsers for the FLV media data format. If you want to understand the FLV format, the following document must be read carefully.
Adobe’s official flv format description
http://www.adobe.com/content/dam/Adobe/en/devnet/flv/pdfs/video_file_format_spec_v10.pdf
How to use flv.js? Let's get to the point. flv.js code interpretation: the demux part
Open the code https://github.com/Bilibili/flv.js/blob/master/src/demux/flv-demuxer.js
static probe(buffer) {
    let data = new Uint8Array(buffer);
    let mismatch = {match: false};

    if (data[0] !== 0x46 || data[1] !== 0x4C || data[2] !== 0x56 || data[3] !== 0x01) {
        return mismatch;
    }
The numbers 0x46 0x4C 0x56 are the ASCII codes of 'F' 'L' 'V', the signature at the start of the FLV file header. The 0x01 that follows is the version number of the FLV format. Together they are used to detect whether the data is in FLV format.
    let hasAudio = ((data[4] & 4) >>> 2) !== 0;
    let hasVideo = (data[4] & 1) !== 0;
Take the fifth byte. Counting from the most significant bit, its sixth and eighth bits (masks 0x04 and 0x01) indicate whether audio and video data are present, respectively. The other bits are reserved and can be ignored.
This probe is called by parseChunks. Once at least 13 bytes have been read, it checks whether the data is FLV and then continues with the rest of the parsing. Why 13? Strictly speaking, the FLV header itself is 9 bytes (see "The FLV header" in the PDF above); the 13 bytes also include the four-byte size field that follows it. That field holds the size of the previous tag, but since no tag precedes the first one, the first size is always 0.
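The 13 bytes described above can be read in one go. The following is only a minimal sketch, not the flv.js implementation: parseFlvHeader and the field names it returns are hypothetical, and it assumes the input is an ArrayBuffer holding the start of the stream.

```javascript
// Sketch: parse the 9-byte FLV header plus the 4-byte size field after it.
// parseFlvHeader is a hypothetical helper, not a flv.js API.
function parseFlvHeader(buffer) {
    const data = new Uint8Array(buffer);
    if (data.length < 13) {
        return null; // not enough data yet
    }
    // 'F' 'L' 'V' signature followed by version 1
    if (data[0] !== 0x46 || data[1] !== 0x4C || data[2] !== 0x56 || data[3] !== 0x01) {
        return null;
    }
    const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
    const flags = data[4];
    // Header size is a big-endian 32-bit value; it is 9 for FLV version 1.
    const headerSize = view.getUint32(5, false);
    if (data.length < headerSize + 4) {
        return null;
    }
    // Size of the "previous tag" before the first tag; expected to be 0.
    const previousTagSize0 = view.getUint32(headerSize, false);
    return {
        hasAudio: (flags & 0x04) !== 0,
        hasVideo: (flags & 0x01) !== 0,
        headerSize: headerSize,
        previousTagSize0: previousTagSize0,
        firstTagOffset: headerSize + 4
    };
}

// Usage: a hand-written header with both the audio and video flags set.
const sampleHeader = new Uint8Array([
    0x46, 0x4C, 0x56, 0x01, 0x05,
    0x00, 0x00, 0x00, 0x09,
    0x00, 0x00, 0x00, 0x00
]);
console.log(parseFlvHeader(sampleHeader.buffer));
```

Reading the header size from the stream instead of hard-coding 9 keeps the sketch correct if a file ever declares a longer header.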
The rest of parseChunks is a loop that parses tags one after another. FLV calls each piece of media data a tag, and each tag has a type. In practice only three types are used: 8, 9, and 18, corresponding to audio, video, and Script Data.
if (tagType !== 8 && tagType !== 9 && tagType !== 18) {
    Log.w(this.TAG, `Unsupported tag type ${tagType}, skipped`);
    // consume the whole tag (skip it)
    offset += 11 + dataSize + 4;
    continue;
}
This code checks the tag type. Note the number 11: the tag header is 11 bytes, followed by the tag body of dataSize bytes and then the 4-byte size field that records the size of the tag just read. Adding all three to offset jumps to the start of the next tag.
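The same stride of 11 + dataSize + 4 is enough to walk a whole buffer of tags without looking inside them. A minimal sketch, assuming the buffer starts at the first tag; readUint24 and listTags are hypothetical helpers, not flv.js functions:

```javascript
// Read an unsigned 24-bit big-endian integer (UI24 in the FLV spec).
function readUint24(data, offset) {
    return (data[offset] << 16) | (data[offset + 1] << 8) | data[offset + 2];
}

// Hypothetical helper: list the tags found in `data`, starting at `offset`.
function listTags(data, offset) {
    const tags = [];
    while (offset + 11 <= data.length) {
        const tagType = data[offset];
        const dataSize = readUint24(data, offset + 1);
        if (offset + 11 + dataSize + 4 > data.length) {
            break; // incomplete tag: stop and wait for more data
        }
        tags.push({ tagType: tagType, dataSize: dataSize, offset: offset });
        // 11-byte tag header + tag body + 4-byte size of this tag
        offset += 11 + dataSize + 4;
    }
    return tags;
}

// Usage: one audio tag (2-byte body) followed by one video tag (1-byte body).
const sampleTags = new Uint8Array([
    8, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0xAF, 0x01, 0, 0, 0, 13,
    9, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0x17, 0, 0, 0, 12
]);
console.log(listTags(sampleTags, 0));
```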
The format of the tag header is as follows. UI means unsigned integer, and the number after it is the width in bits.
UI8 tag type
UI24 data size
UI24 timestamp
UI8 TimestampExtended
UI24 StreamID
Add them up: exactly 11 bytes. To save traffic, Adobe never uses 32 bits where 24 bits will do, yet it still reserves an extension byte to hold the highest byte of the timestamp. This design is painful and leads to the odd code below: it first takes three bytes and combines them into an integer in big-endian order, then puts the fourth byte into the high bits.
let ts2 = v.getUint8(4);
let ts1 = v.getUint8(5);
let ts0 = v.getUint8(6);
let ts3 = v.getUint8(7);

let timestamp = ts0 | (ts1 << 8) | (ts2 << 16) | (ts3 << 24);
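Putting the five fields of the 11-byte tag header together gives something like the sketch below. parseTagHeader is a hypothetical helper written for illustration. Unlike the snippet above, it combines the extension byte by multiplication rather than by a shift, an assumption made here so that the result stays non-negative even when the extension byte is 0x80 or larger.

```javascript
// Hypothetical helper: decode one 11-byte FLV tag header from a DataView.
function parseTagHeader(view, offset) {
    const tagType = view.getUint8(offset);
    // UI24 data size: read 4 bytes big-endian and mask off the tag type byte.
    const dataSize = view.getUint32(offset, false) & 0x00FFFFFF;
    // UI24 timestamp (low 24 bits), big-endian.
    const tsLow = (view.getUint8(offset + 4) << 16) |
                  (view.getUint8(offset + 5) << 8) |
                  view.getUint8(offset + 6);
    // UI8 TimestampExtended: the highest byte of the 32-bit timestamp.
    const tsExt = view.getUint8(offset + 7);
    const timestamp = tsExt * 0x1000000 + tsLow;
    // UI24 StreamID.
    const streamId = (view.getUint8(offset + 8) << 16) |
                     (view.getUint8(offset + 9) << 8) |
                     view.getUint8(offset + 10);
    return { tagType, dataSize, timestamp, streamId };
}

// Usage: a video tag header with dataSize 258 and an extended timestamp.
const sampleTagHeader = new Uint8Array([9, 0x00, 0x01, 0x02, 0x00, 0x03, 0xE8, 0x01, 0, 0, 0]);
console.log(parseTagHeader(new DataView(sampleTagHeader.buffer), 0));
```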
After parsing the tag header, different parsing functions are called according to different tag types.
switch (tagType) {
    case 8:  // Audio
        this._parseAudioData(chunk, dataOffset, dataSize, timestamp);
        break;
    case 9:  // Video
        this._parseVideoData(chunk, dataOffset, dataSize, timestamp, byteStart + offset);
        break;
    case 18:  // ScriptDataObject
        this._parseScriptData(chunk, dataOffset, dataSize);
        break;
}
TAG type: 8 audio
The audio structure is relatively simple. The first byte of AUDIODATA describes the audio format. In practice it is almost always AAC, 16-bit, stereo, 44.1 kHz sampling, so the most common value is 0xAF, followed by AACAUDIODATA.
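To see why 0xAF means AAC, 16-bit, stereo, 44 kHz, the byte can be split into its four bit fields as laid out in the Adobe spec. This is an illustrative sketch; parseAudioFlags is a hypothetical name.

```javascript
// Hypothetical helper: split the first byte of AUDIODATA into its fields.
function parseAudioFlags(firstByte) {
    return {
        soundFormat: (firstByte & 0xF0) >>> 4, // 10 = AAC
        soundRate: (firstByte & 0x0C) >>> 2,   // 3 = 44 kHz
        soundSize: (firstByte & 0x02) >>> 1,   // 1 = 16-bit samples
        soundType: firstByte & 0x01            // 1 = stereo
    };
}

console.log(parseAudioFlags(0xAF));
// soundFormat 10, soundRate 3, soundSize 1, soundType 1
```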
TAG type: 9 video
The video tag is the one to focus on.
let frameType = (spec & 240) >>> 4;
let codecId = spec & 15;
Two important values are extracted here. frameType indicates the frame type: 1 is a key frame and 2 is a non-key frame. codecId is the encoding type. Although FLV supports six video formats, in practice only H.264 is used for Internet on-demand and live video, so codecId is almost always 7. The author writes the masks in decimal here, but they are really bit masks (240 is 0xF0 and 15 is 0x0F), so hexadecimal would be easier to read.
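Written with hexadecimal masks, the same extraction might look like the sketch below. parseVideoFlags and the two constants are hypothetical names used only for illustration.

```javascript
const FRAME_TYPE_KEY = 1; // key frame
const CODEC_ID_AVC = 7;   // H.264

// Hypothetical helper: split the first byte of VIDEODATA into its two nibbles.
function parseVideoFlags(spec) {
    return {
        frameType: (spec & 0xF0) >>> 4, // high four bits
        codecId: spec & 0x0F            // low four bits
    };
}

// 0x17 is an H.264 key frame, 0x27 an H.264 non-key frame.
const flags = parseVideoFlags(0x17);
console.log(flags.frameType === FRAME_TYPE_KEY, flags.codecId === CODEC_ID_AVC);
```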
_parseAVCVideoPacket parses the AVCVIDEOPACKET structure, that is, the H.264 video packet.
let packetType = v.getUint8(0);
let cts = v.getUint32(0, !le) & 0x00FFFFFF;
First, the concept of CTS, the CompositionTime. The timestamp we got from the tag header corresponds to the video's DTS, the decoding timestamp. CTS is actually an offset: it is the offset of the PTS relative to the DTS, that is, the difference between PTS and DTS.
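There is a pitfall here. According to Adobe's document, CTS is a signed 24-bit integer (SI24), which means it can be negative. I therefore suspect that the code in flv.js that parses cts has a bug, because it does not handle the negative case: when a negative 24-bit integer is converted to a 32-bit one, the sign bit and the two's complement representation have to be handled manually. (This is only a suspicion. I have not confirmed it by debugging, but I did hit this pitfall when processing YY live-streaming data, where some videos containing B-frames do have a negative CTS.)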
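One common way to handle the sign manually is to shift the 24-bit value to the top of a 32-bit integer and shift it back with an arithmetic shift. The sketch below illustrates that technique; readCompositionTime is a hypothetical helper, and the offset is assumed to point at the AVCPacketType byte that precedes the three CTS bytes.

```javascript
// Hypothetical helper: read the signed 24-bit CompositionTime (SI24).
// `offset` points at AVCPacketType; the SI24 occupies the next three bytes.
function readCompositionTime(view, offset) {
    // Read 4 bytes big-endian and drop the packet type in the top byte.
    const raw = view.getUint32(offset, false) & 0x00FFFFFF;
    // Move bit 23 into the sign position, then shift back arithmetically
    // so the sign bit is copied into the upper 8 bits.
    return (raw << 8) >> 8;
}

// Usage: packet type 1 followed by 0xFFFFB0, which is -80 as an SI24.
const samplePacket = new Uint8Array([0x01, 0xFF, 0xFF, 0xB0]);
const cts = readCompositionTime(new DataView(samplePacket.buffer), 0);
const dts = 1000;
console.log(cts, dts + cts); // PTS = DTS + CTS
```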
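Two packetType values matter here. 0 means AVCDecoderConfigurationRecord, the H.264 video information header, which contains the SPS and PPS. The format of AVCDecoderConfigurationRecord is not defined by FLV; it comes from the H.264 family of standards (the Adobe document refers to ISO/IEC 14496-15 for it). If you decode with ffmpeg, this structure can be placed directly into the codec's extradata and handed to ffmpeg to interpret.
The author of flv.js chose to parse this data structure himself, and had little choice, because there is no ffmpeg in a JS environment. The main purpose of parsing it is to extract the SPS and PPS. In theory there can be more than one SPS, but in practice there is usually just one.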
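For readers who want to see the layout, the sketch below pulls the SPS and PPS out of an AVCDecoderConfigurationRecord. It follows the field order in ISO/IEC 14496-15 but is not the flv.js code: parseAvcConfig and its returned field names are hypothetical, and it does no bounds checking.

```javascript
// Hypothetical helper: extract SPS and PPS from an AVCDecoderConfigurationRecord.
function parseAvcConfig(data) {
    const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
    const config = {
        version: data[0],                     // configurationVersion
        profile: data[1],                     // AVCProfileIndication
        profileCompatibility: data[2],
        level: data[3],                       // AVCLevelIndication
        naluLengthSize: (data[4] & 0x03) + 1, // lengthSizeMinusOne + 1
        sps: [],
        pps: []
    };
    let offset = 5;
    const spsCount = data[offset] & 0x1F; // low 5 bits
    offset += 1;
    for (let i = 0; i < spsCount; i++) {
        const length = view.getUint16(offset, false); // big-endian
        offset += 2;
        config.sps.push(data.subarray(offset, offset + length));
        offset += length;
    }
    const ppsCount = data[offset];
    offset += 1;
    for (let i = 0; i < ppsCount; i++) {
        const length = view.getUint16(offset, false);
        offset += 2;
        config.pps.push(data.subarray(offset, offset + length));
        offset += length;
    }
    return config;
}

// Usage: a made-up record with one 2-byte SPS and one 2-byte PPS.
const sampleRecord = new Uint8Array([
    0x01, 0x64, 0x00, 0x1F, 0xFF,
    0xE1, 0x00, 0x02, 0x67, 0x64,
    0x01, 0x00, 0x02, 0x68, 0xEE
]);
console.log(parseAvcConfig(sampleRecord));
```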
let config = SPSParser.parseSPS(sps);
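The PPS information is of little use, so the author implemented only an SPS parser. This shows that the author put a lot of effort into studying the H.264 standard: the Golomb decoding in it is fairly complex and not easy to get right. On PC and mobile platforms I always use ffmpeg for this parsing. The SPS contains important video information such as resolution, frame rate, and profile/level.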
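To give a feel for the Golomb decoding mentioned above, here is a minimal Exp-Golomb reader. It is an illustrative sketch, not the parser in flv.js: BitReader is a hypothetical class, and it assumes the input has already had H.264 emulation-prevention bytes removed.

```javascript
// Hypothetical bit reader with unsigned and signed Exp-Golomb decoding.
class BitReader {
    constructor(bytes) {
        this.bytes = bytes;
        this.pos = 0; // position in bits
    }

    readBit() {
        if (this.pos >= this.bytes.length * 8) {
            throw new Error('read past end of data');
        }
        const byte = this.bytes[this.pos >> 3];
        const bit = (byte >> (7 - (this.pos & 7))) & 1; // most significant bit first
        this.pos += 1;
        return bit;
    }

    readBits(count) {
        let value = 0;
        for (let i = 0; i < count; i++) {
            value = value * 2 + this.readBit();
        }
        return value;
    }

    // ue(v): n leading zeros, a 1 bit, then n more bits; value = 2^n - 1 + suffix.
    readUE() {
        let zeros = 0;
        while (this.readBit() === 0) {
            zeros += 1;
        }
        return Math.pow(2, zeros) - 1 + this.readBits(zeros);
    }

    // se(v): code numbers 1, 2, 3, 4 ... map to +1, -1, +2, -2 ...
    readSE() {
        const codeNum = this.readUE();
        return (codeNum & 1) ? (codeNum + 1) / 2 : 0 - codeNum / 2;
    }
}

// Usage: 0xA6 is the bit string 1 010 011 0, which decodes to 0, 1, 2.
const reader = new BitReader(new Uint8Array([0xA6]));
console.log(reader.readUE(), reader.readUE(), reader.readUE());
```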
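A packetType of 1 means NALU. NALU stands for network abstraction layer unit, an H.264 concept. A simple way to think of it is as one frame of video data.
There are two conventions for the NALU prefix. One begins with the four bytes 00 00 00 01, which is called the start code. The other, known as the MP4 style, begins with a four-byte big-endian size. FLV uses the latter, whereas raw H.264 streams usually use the former.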
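The difference between the two conventions is easy to show in code. The sketch below splits a size-prefixed payload into NALUs and rewrites them with start codes; splitNalus and toAnnexB are hypothetical helpers, and the 4-byte size is an assumed default that matches the description above.

```javascript
// Hypothetical helper: split an MP4-style payload into individual NALUs.
function splitNalus(payload, lengthSize = 4) {
    const nalus = [];
    let offset = 0;
    while (offset + lengthSize <= payload.length) {
        // Big-endian size prefix.
        let size = 0;
        for (let i = 0; i < lengthSize; i++) {
            size = size * 256 + payload[offset + i];
        }
        offset += lengthSize;
        if (offset + size > payload.length) {
            break; // truncated NALU
        }
        nalus.push(payload.subarray(offset, offset + size));
        offset += size;
    }
    return nalus;
}

// Hypothetical helper: join NALUs using the 00 00 00 01 start code.
function toAnnexB(nalus) {
    const total = nalus.reduce((sum, nalu) => sum + 4 + nalu.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const nalu of nalus) {
        out.set([0, 0, 0, 1], offset);
        out.set(nalu, offset + 4);
        offset += 4 + nalu.length;
    }
    return out;
}

// Usage: two NALUs of 2 bytes and 1 byte.
const samplePayload = new Uint8Array([0, 0, 0, 2, 0x65, 0x88, 0, 0, 0, 1, 0x41]);
console.log(toAnnexB(splitNalus(samplePayload)));
```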
TAG type: 18 Script Data
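Besides audio and video data there is also ScriptData, an object description format that resembles a binary JSON. JavaScript is out of luck here and has to ship its own implementation, while other platforms can reuse the code in librtmp.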
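The "binary JSON" in question is AMF0. The sketch below decodes just enough of it to read a typical onMetaData tag: numbers, booleans, strings, objects, and ECMA arrays. It is a simplified illustration rather than the flv.js AMF parser; all function names are hypothetical, other AMF types are rejected, and it relies on the global TextDecoder available in browsers and recent Node.js versions.

```javascript
// Hypothetical helper: read a 16-bit length-prefixed UTF-8 string.
function readAmfString(view, offset) {
    const length = view.getUint16(offset, false);
    const bytes = new Uint8Array(view.buffer, view.byteOffset + offset + 2, length);
    return { value: new TextDecoder('utf-8').decode(bytes), offset: offset + 2 + length };
}

// Read key/value pairs until the end marker: empty key followed by 0x09.
function readAmfProperties(view, offset) {
    const result = {};
    while (true) {
        const key = readAmfString(view, offset);
        offset = key.offset;
        if (key.value === '' && view.getUint8(offset) === 0x09) {
            return { value: result, offset: offset + 1 };
        }
        const item = readAmfValue(view, offset);
        result[key.value] = item.value;
        offset = item.offset;
    }
}

// Read one AMF0 value; returns the value and the offset just past it.
function readAmfValue(view, offset) {
    const type = view.getUint8(offset);
    offset += 1;
    switch (type) {
        case 0x00: // Number: 8-byte big-endian double
            return { value: view.getFloat64(offset, false), offset: offset + 8 };
        case 0x01: // Boolean
            return { value: view.getUint8(offset) !== 0, offset: offset + 1 };
        case 0x02: // String
            return readAmfString(view, offset);
        case 0x03: // Object
            return readAmfProperties(view, offset);
        case 0x08: // ECMA array: skip the 32-bit count, then read properties
            return readAmfProperties(view, offset + 4);
        default:
            throw new Error('AMF type ' + type + ' is not handled in this sketch');
    }
}

// A script tag body is a name (usually "onMetaData") followed by one value.
function parseScriptData(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const name = readAmfValue(view, 0);
    const data = readAmfValue(view, name.offset);
    return { name: name.value, data: data.value };
}
```

Calling parseScriptData on the body of a type 18 tag would then return an object such as a name of onMetaData plus a map of properties like duration, provided the tag uses only the types handled here.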
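In my view, besides solving the FLV playback problem, the author has also contributed basic building blocks to the front-end world, such as AMF parsing, SPS parsing, and Golomb decoding, which can be reused in other projects.
Once the FLV data stream has been fetched over the transport protocol, demux separates out the properties and data packets of the audio and video, which lays the groundwork for playback. Starting from demux is a good entry point for reading the code, and it should be read side by side with the FLV file format spec. Go over both several times until you know them by heart. I can now analyze FLV packets by hand from Wireshark captures, which helps a great deal when debugging.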
Related article:
What do you think of Bilibili open-sourcing its HTML5 player core, flv.js?