Table of Contents
1. Introduction
2. Backend Infrastructure
3. Why you need Vitess
3.1 Master-Slave Replica
3.2 Sharding
3.3 Disaster Management
4. Vitess: A system for horizontal expansion of MySQL database cluster
5. Deploy to the cloud
6. CDN
7. Data storage: How does YouTube store such a huge amount of data?
7.1 Plug-and-play commercial servers
7.2 Storage Disks Designed for Data Centers

How does YouTube save huge video files?

Apr 10, 2023 am 11:21 AM
document video storage

Hello everyone, I am Bucai Chen~

YouTube is the second most popular website after Google. In May 2019, more than 500 hours of video content were uploaded to the platform every minute.

The video sharing platform has more than 2 billion users, with more than 1 billion hours of video played every day, generating billions of views. These are incredible numbers.

This article will provide an in-depth explanation of the database and back-end data infrastructure used by YouTube, which allows the video platform to store such huge amounts of data and scale to billions of users.

Then let’s get started.

1. Introduction

The YouTube journey began in 2005. As the venture capital-funded technology startup continued to find success, it was acquired by Google in November 2006 for $1.65 billion.

Before being acquired by Google, their team consisted of the following people:

  • Two system administrators
  • Two scalability software architects
  • Two feature developers
  • Two network engineers
  • One DBA

2. Backend Infrastructure

YouTube's backend microservices are written in Python, C, C++, Java (using the Guice framework), and Go. The user interface is written in JavaScript.

The main database is MySQL, supported by Vitess, a database clustering system for scaling MySQL horizontally. In addition, Memcache is used for caching and ZooKeeper for node coordination.

Popular videos are served through a CDN, while less frequently played videos are fetched from the database.

When a video is uploaded, it is given a unique identifier and processed by a batch job. This job runs multiple automated processes, such as generating thumbnails, metadata, and video transcripts, encoding the video, and setting its monetization status.
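
The upload flow described above can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `UploadedVideo` class, the step functions, and the use of a random UUID as the identifier are hypothetical and are not YouTube's actual pipeline.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UploadedVideo:
    """Hypothetical record for one upload; not YouTube's real data model."""
    video_id: str
    completed_steps: list = field(default_factory=list)


# Each step only records that it ran; a real job would do the actual work.
def generate_thumbnails(video):
    video.completed_steps.append("thumbnails")


def extract_metadata(video):
    video.completed_steps.append("metadata")


def encode_video(video):
    video.completed_steps.append("encoding")


def set_monetization_status(video):
    video.completed_steps.append("monetization")


# Assumed order of the batch job's steps.
PIPELINE = [generate_thumbnails, extract_metadata, encode_video, set_monetization_status]


def process_upload() -> UploadedVideo:
    """Assign a unique identifier, then run every batch step in order."""
    video = UploadedVideo(video_id=uuid.uuid4().hex)
    for step in PIPELINE:
        step(video)
    return video
```

Calling `process_upload()` returns a record with a fresh identifier and the list of steps that ran, which is the shape of the batch job sketched in the text.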

The VP9 and H.264/MPEG-4 AVC (Advanced Video Coding) codecs are used for video compression and can encode HD and 4K video using half the bandwidth of other codecs.

Video streaming uses Dynamic Adaptive Streaming over HTTP (DASH), an adaptive bitrate streaming technology that delivers high-quality video from conventional HTTP web servers. With this technology, content can be served to viewers at different bitrates. The YouTube client automatically adapts video rendering to the viewer's internet connection speed to minimize buffering time.
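
To make the idea of adaptive bitrate selection concrete, here is a minimal sketch. The bitrate ladder, the safety factor, and the function name are assumptions chosen for illustration; real players use more elaborate heuristics (buffer level, throughput history), and this is not the YouTube client's actual logic.

```python
# Hypothetical bitrate ladder in kbit/s; real ladders depend on the encoder setup.
BITRATE_LADDER_KBPS = [400, 1000, 2500, 5000, 16000]


def pick_bitrate(measured_throughput_kbps: float, safety_factor: float = 0.8) -> int:
    """Pick the highest rendition that fits within a fraction of measured throughput.

    The safety factor leaves headroom so that small dips in connection speed
    do not immediately cause buffering.
    """
    budget = measured_throughput_kbps * safety_factor
    affordable = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    # If even the lowest rendition exceeds the budget, fall back to it anyway.
    return max(affordable) if affordable else min(BITRATE_LADDER_KBPS)
```

A client would call `pick_bitrate` again for each new segment, so the chosen quality follows the connection speed up and down.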

I once discussed YouTube's video transcoding process in a dedicated article, see "How YouTube provides high-quality videos with low latency".

That was a quick introduction to the platform's back-end technology. The main database used by YouTube is MySQL. Now, let's find out why YouTube's engineering team felt the need to write Vitess. What problems did they face with their original MySQL environment that led them to build an additional framework on top of it?

3. Why you need Vitess

The website initially had only one database instance. As it grew, developers had to scale the database horizontally to meet the increasing QPS (queries per second) requirements.

3.1 Master-Slave Replica

Replicas are added to the master database instance. Read requests are routed to both the master and the replicas, which reduces the load on the master. Adding replicas helps alleviate bottlenecks, increases read throughput, and improves the durability of the system.

The master node handles write traffic, while both the master and the replica nodes handle read traffic.
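
The routing rule above can be sketched in a few lines. The class name, the node names, and the "starts with SELECT" check are assumptions made for illustration; a production router would classify statements far more carefully.

```python
import itertools


class ReplicatedDatabaseRouter:
    """Hypothetical router: writes go to the master, reads rotate over all nodes."""

    def __init__(self, master: str, replicas: list):
        self.master = master
        # Reads are spread round-robin across the master and every replica.
        self._read_pool = itertools.cycle([master, *replicas])

    def route(self, sql: str) -> str:
        """Return the name of the node that should execute the statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_pool) if is_read else self.master
```

For example, with one master and two replicas, three consecutive `SELECT` statements land on three different nodes, while every `UPDATE` goes to the master.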

However, in this scenario it is possible to read stale data from a replica. If a request reads from the replica before the master has propagated an update to it, the viewer gets stale data.

At that moment, the data on the master and the replica is inconsistent. In this case, the inconsistent data is the view count of a specific video.

Actually, this is not a problem at all. Viewers won't mind a slight inconsistency in view counts, right? What matters more is that the video renders in their browser.

The data between the master node and the replica node will eventually be consistent.
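
A toy simulation helps show how a stale read arises and then disappears. The `Master` and `Replica` classes and the explicit `replicate` step are assumptions for illustration; real MySQL replication ships a binary log continuously rather than on demand.

```python
class Master:
    """Toy master: applies writes locally and queues them for replication."""

    def __init__(self):
        self.data = {}
        self.pending = []  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    """Toy replica: only sees changes once they have been shipped."""

    def __init__(self):
        self.data = {}

    def apply(self, changes):
        for key, value in changes:
            self.data[key] = value


def replicate(master: Master, replica: Replica) -> None:
    """Ship all pending changes; after this call the two nodes agree."""
    replica.apply(master.pending)
    master.pending = []


master, replica = Master(), Replica()
master.write("views:video-1", 100)
replicate(master, replica)
master.write("views:video-1", 101)           # not yet replicated
stale_read = replica.data["views:video-1"]   # replica still holds the old count
replicate(master, replica)
fresh_read = replica.data["views:video-1"]   # replica has caught up
```

Between the second write and the second `replicate` call the replica serves the older view count; once replication catches up, both nodes return the same value.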

So the engineers were very happy and the audience was also very happy. With the introduction of replicas, things are progressing smoothly.

The website continues to be popular and QPS continues to rise. The master-slave replica strategy is now having difficulty keeping up with the growth of website traffic.

What should we do now?

3.2 Sharding

The next strategy is to shard the database. Sharding is one of several ways to scale a relational database, alongside master-slave replication, master-master replication, federation, and denormalization.

Database sharding is not a simple process. It greatly increases the complexity of the system and makes management more difficult.

However, the database must be sharded to meet the growth of QPS. After developers shard the database, the data is spread across multiple machines. This increases the write throughput of the system. Now, instead of just one master instance handling writes, write operations can occur across multiple sharded machines.

At the same time, each shard machine gets its own replica for redundancy and throughput.
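
As a rough illustration of how rows end up spread across machines, the sketch below maps a key to a shard by hashing it. Hash-modulo placement and the `shard_for` name are assumptions chosen for simplicity; the article does not say which sharding scheme YouTube used, and Vitess has its own keyspace and vindex concepts.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash.

    A cryptographic digest is used only because it is stable across processes,
    unlike Python's built-in hash() for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Because different keys hash to different shards, writes for different videos can proceed on different machines in parallel. One known drawback of plain modulo placement is that changing `shard_count` remaps most keys, which is part of why sharding adds operational complexity.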

The popularity of the platform continues to rise, with large amounts of data being added to the database by content creators.

In order to prevent data loss or service unavailability caused by machine failure or unknown external events, it is necessary to add disaster management functions to the system.

3.3 Disaster Management

Disaster management refers to emergency measures for events such as power outages and natural disasters (earthquakes, fires, and so on). The system needs redundancy, with user data backed up to data centers in different geographic regions of the world. Losing user data or letting the service become unavailable is not acceptable.

Having multiple data centers around the world also helps YouTube reduce system latency, as user requests are routed to the nearest data center instead of being routed to origin servers located on different continents.

Now, you can imagine how complex the infrastructure can become.

Unoptimized full table scans can often bring the entire database down. Databases must be protected from bad queries, and all servers need to be tracked to ensure efficient service.

Developers needed a system that abstracts away this complexity, lets them solve scalability challenges, and keeps management costs to a minimum. All this led YouTube to develop Vitess.

4. Vitess: A system for horizontal expansion of MySQL database cluster

Vitess is a database cluster system running on MySQL that enables MySQL to expand horizontally. It has built-in sharding features that allow developers to scale the database without having to add any sharding logic to the application. This is similar to what NoSQL does.

Vitess also handles failover and backup automatically. It manages servers and improves database performance by intelligently rewriting resource-intensive queries and implementing caching. In addition to YouTube, the framework is also used by other well-known players in the industry, such as GitHub, Slack, Square, and New Relic.

Vitess comes into play when you need support for ACID transactions and strong consistency, and at the same time want to quickly scale a relational database like a NoSQL database.

At YouTube, each MySQL connection had an overhead of 2 MB. Each connection also carried a computational cost, and additional RAM had to be added as the number of connections grew.

Vitess manages these connections at very low cost through a connection pool built on the Go programming language's concurrency support. It uses ZooKeeper to manage the cluster and keep it up to date.
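
The sketch below illustrates both points: the memory cost of holding many connections open, and how a pool lets many requests share a few connections. It is written in Python for consistency with the rest of this article, not in Go, and it is not Vitess code; the `ConnectionPool` class and the `open_connection` callback are hypothetical.

```python
import queue
from contextlib import contextmanager

PER_CONNECTION_OVERHEAD_MB = 2  # figure quoted in the text


def connection_memory_mb(connection_count: int) -> int:
    """Memory consumed by open connections at 2 MB each."""
    return connection_count * PER_CONNECTION_OVERHEAD_MB


class ConnectionPool:
    """Hypothetical fixed-size pool: connections are opened once and reused."""

    def __init__(self, open_connection, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(open_connection())

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Blocks (up to the timeout) while every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)
```

With these numbers, 10,000 direct connections would cost about 20,000 MB of RAM, whereas a pool of a few hundred connections keeps the overhead fixed no matter how many requests arrive.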

5. Deploy to the cloud

Vitess is cloud native and is well suited for cloud deployment because, like the cloud model, capacity is gradually added to the database. It can run as a Kubernetes-aware, cloud-native distributed database.

At YouTube, Vitess runs in a containerized environment and uses Kubernetes as the container orchestration tool.

In today's computing era, every large-scale service runs in the cloud in a distributed environment. There are many benefits to running services in the cloud.

Google Cloud Platform is a set of cloud computing services based on the same infrastructure used by Google's internal end-user products such as Google Search and YouTube.

Every large-scale online service has a polyglot persistence architecture because no one data model, whether relational or NoSQL, can handle all usage scenarios of the service.

While researching this article, I was unable to find a list of the specific Google Cloud databases used by YouTube, but I am fairly sure it uses GCP products such as Google Cloud Spanner, Cloud SQL, Cloud Datastore, Memorystore, and others to run different features of the service.

A separate article details the databases used by other Google services, such as Google AdWords, Google Finance, and Google Trends.

6. CDN

YouTube uses Google's global network for low-latency, low-cost content delivery. With globally distributed edge points of presence (PoPs), it lets clients fetch data faster, without having to go to the origin server.
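
The benefit of an edge location can be sketched as a small cache that sits in front of the origin. The `EdgeCache` class, the least-recently-used eviction policy, and the `fetch_from_origin` callback are assumptions for illustration; the article does not describe how Google's edge nodes actually decide what to keep.

```python
from collections import OrderedDict


class EdgeCache:
    """Hypothetical edge node that keeps recently requested videos close to viewers."""

    def __init__(self, capacity: int, fetch_from_origin):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self._store = OrderedDict()
        self.origin_fetches = 0  # counts the slow trips to the origin server

    def get(self, video_id: str):
        if video_id in self._store:
            # Cache hit: mark as most recently used and serve locally.
            self._store.move_to_end(video_id)
            return self._store[video_id]
        # Cache miss: fetch from the origin, then keep a local copy.
        self.origin_fetches += 1
        content = self.fetch_from_origin(video_id)
        self._store[video_id] = content
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used item
        return content
```

Popular videos stay in the cache because they keep being requested, so most viewers are served from the edge and only rarely requested videos cause a trip to the origin.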

So, so far, I have talked about the databases, frameworks and technologies used by YouTube. Now, it's time to talk about storage.

How does YouTube store such a huge amount of data (500 hours of video content uploaded every minute)?

7. Data storage: How does YouTube store such a huge amount of data?

Videos are stored on hard drives in Google data centers. This data is managed by Google File System and BigTable.

Google File System (GFS) is a distributed file system developed by Google for managing large-scale data in distributed environments.

BigTable is a low-latency distributed data storage system built on the Google File System, used to process petabytes of data distributed across thousands of machines. It is used in more than 60 Google products.

Therefore, the video is stored on the hard drive. Relationships, metadata, user preferences, profile information, account settings, related data needed to get the video from storage, etc. are all stored in MySQL.
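
The split between video bytes and relational metadata can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for MySQL, a plain dictionary stands in for the distributed file storage, and the table layout and function names are hypothetical.

```python
import sqlite3

# Stand-in for the distributed file storage that holds the raw video bytes.
blob_store = {}

# Stand-in for MySQL: holds metadata plus the location of the video bytes.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE videos (video_id TEXT PRIMARY KEY, title TEXT, storage_path TEXT)"
)


def save_video(video_id: str, title: str, content: bytes) -> None:
    """Write the bytes to blob storage and the metadata row to the database."""
    path = f"/videos/{video_id}"
    blob_store[path] = content
    db.execute("INSERT INTO videos VALUES (?, ?, ?)", (video_id, title, path))


def load_video(video_id: str):
    """Look up the metadata first, then follow the stored path to the bytes."""
    row = db.execute(
        "SELECT title, storage_path FROM videos WHERE video_id = ?", (video_id,)
    ).fetchone()
    if row is None:
        raise KeyError(video_id)
    title, path = row
    return title, blob_store[path]
```

The database row stays small no matter how large the video is, because it stores only the information needed to find the file, not the file itself.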

7.1 Plug-and-play commercial servers

Google data centers have homogeneous hardware, and the software that manages clusters of thousands of independent servers is built in-house.

The servers Google deploys to expand the storage capacity of its data centers are all commodity servers, also known as commercial off-the-shelf servers. These servers are inexpensive, widely available, and purchased in bulk, and they can be swapped for or configured alongside identical hardware in the data center at minimal cost.

As the need for additional storage increases, new commodity servers will be plugged into the system.

After problems occur, commercial servers are often replaced instead of repaired. They are not custom-made, and using them allows businesses to reduce infrastructure costs to a significant extent compared to running custom-made servers.

7.2 Storage Disks Designed for Data Centers

YouTube requires over a petabyte of new storage every day. Spinning hard drives are the primary storage medium due to their low cost and high reliability.
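
A quick back-of-the-envelope calculation ties this figure to the upload rate quoted earlier (500 hours of video per minute). It assumes a decimal petabyte (10^15 bytes) and treats "over a petabyte" as exactly one petabyte, so the result is a rough lower bound, not a measured value; the two figures may also refer to different points in time.

```python
UPLOAD_HOURS_PER_MINUTE = 500        # figure quoted earlier in the article
NEW_STORAGE_BYTES_PER_DAY = 10**15   # assumption: 1 PB (decimal) as a lower bound

# Hours of video uploaded in one day.
hours_uploaded_per_day = UPLOAD_HOURS_PER_MINUTE * 60 * 24

# Storage consumed per uploaded hour, in gigabytes (10^9 bytes).
gb_per_uploaded_hour = NEW_STORAGE_BYTES_PER_DAY / hours_uploaded_per_day / 10**9

print(hours_uploaded_per_day)          # 720000
print(round(gb_per_uploaded_hour, 2))  # 1.39
```

Under these assumptions, each uploaded hour accounts for at least about 1.39 GB of new storage once all stored copies of the video are counted.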

Solid-state drives (SSDs) offer higher performance than spinning disks because they are semiconductor-based, but using them at large scale is not cost-effective.

They are quite expensive and prone to losing data over time, which makes them unsuitable for archival storage.

In addition, Google is developing a new series of disks suitable for large-scale data centers.

There are five key metrics that can be used to judge the quality of hardware built for data storage:

  • The hardware should support a high rate of input/output operations per second (IOPS).
  • It should comply with the security standards specified by the organization.
  • It should have higher storage capacity than ordinary storage hardware.
  • Hardware purchase costs, electricity costs, and maintenance costs should all be acceptable.
  • The disk should be reliable, with stable latency.
