Betting the Farm on MongoDB
This is a guest post by Jon Dokulil, VP of Engineering at Hudl. Hudl's CTO, Brian Kaiser, will be speaking at MongoDB World about migrating from SQL Server to MongoDB.
Hudl helps coaches win. We give sports teams from peewee to the pros online tools to make working with and analyzing video easy. Today we store well over 600 million video clips in MongoDB spread across seven shards. Our clips dataset has grown to over 350GB of data with over 70GB of indexes. From our first year of a dozen beta high schools we’ve grown to service the video needs of over 50,000 sports teams worldwide.
Why MongoDB
When we began hacking away on Hudl we chose SQL Server as our database. Our backend is written primarily in C#, so it was a natural choice. After a few years and solid company growth we realized SQL Server was quickly becoming a bottleneck. Because we run in EC2, vertically scaling our DB was not a great option. That’s when we began to look at NoSQL seriously and specifically MongoDB. We wanted something that was fast, flexible and developer-friendly.
After comparing a few alternative NoSQL databases and running our own benchmarks, we settled on MongoDB. Then came the task of moving our existing data from SQL Server to MongoDB. Video clips were not only our biggest dataset, they were also our most frequently accessed data. During our busy season we average 75 clip views per second but peak at over 800 per second. We wanted to migrate the dataset with zero downtime and zero data loss. We also wanted to have fail-safes ready during each step of the process so we could recover immediately from any unanticipated problems during the migration.
In this post we’ll take a look at our schema design choices, our migration plan and the performance we’ve seen with MongoDB.
Schema Design
In SQL Server we normalized our data model. Pulling together data from multiple tables is SQL’s bread-and-butter. In the NoSQL world joins are not an option and we knew that simply moving the SQL tables directly over to MongoDB and doing joins in code was a bad idea. So, we looked at how our application interacted with SQL and created an optimized schema in MongoDB.
Before I get into the schema we chose, I’ll try to provide context to Hudl’s product. Below is a screenshot of our ‘Library’ page. This is where coaches spend much of their time reviewing and analyzing video.
You see above a video playing and a kind of spreadsheet underneath. The video represents one angle of one clip (many of our teams film two or three angles each game). The spreadsheet contains rows of clips and columns of breakdown data. The breakdown data gives context to what happened in the clip. For example, the second clip was a defensive play from the 30 yard line. It was first and ten and was a run play to the left. This breakdown data is incredibly important for coaches to spot patterns and trends in their opponents' play (as well as to make sure they don't have obvious patterns that could be used against them).
When we translated this schema to MongoDB we wanted to optimize for the most-common operations. Watching video clips and editing clip metadata are our two highest frequency operations. To maximize performance we made a few important decisions.
- We chose to encapsulate an entire clip per document (an example document is sketched just after this list). Watching a clip involves a single document lookup. Because MongoDB stores each document contiguously on disk, this minimizes the number of disk seeks when fetching a clip that is not in memory, which means faster clip loads.
- We denormalized our column names to speed up both writes and reads. Writes are faster because we no longer have to look up or track Column IDs. A write operation is as simple as:

db.clips.update({teamId:205, _id:123}, {$set: {'data.PLAY TYPE':'Pass'}})

Reads are also faster because we no longer have to join on the ClipDataColumn table to get the column names. This comes at a cost of greater storage and memory requirements, as we store the same column names in multiple documents. Despite that, we felt the performance benefits were worth the cost.
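To make the denormalization concrete, here is a rough sketch of what a single clip document could look like. The exact field names (teamId, playlistIds, angles and the breakdown keys) are illustrative, not our production schema:

// Hypothetical clip document; field names are illustrative
{
  _id: 123,
  teamId: 205,                      // shard key
  playlistIds: [11, 42],            // a clip can live in multiple playlists
  angles: [
    { name: 'Sideline', fileId: 'a1' },
    { name: 'End Zone', fileId: 'a2' }
  ],
  data: {                           // denormalized breakdown columns
    'ODK': 'D',
    'YARD LN': 30,
    'DN': 1,
    'DIST': 10,
    'PLAY TYPE': 'Run'
  }
}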
One of the most important considerations when designing a schema in MongoDB is choosing a shard key. Having a good shard key is critical for effective horizontal scaling. Data is stored in shards (each shard is a replica set) and we can add new shards easily as our dataset grows. Replica sets don't need to know about each other; they are only concerned with their own data. The MongoDB router (mongos) is the piece that sees the whole picture. It knows which shard houses each document.
When you perform a query against a sharded collection, the shard key is not required. However, there is a cost penalty for not providing it: the key is what tells mongos which shard contains the answer to your query. Without it, the query has to be sent to all shards in your cluster. To illustrate this, picture a four-shard cluster where the shard key is TeamId (the property is named 't'): clips belonging to teams 1-100 live on Shard 1, teams 101-200 live on Shard 2, and so on. Given a query to find clip '123', only Shard 3 will respond with results, but Shards 1, 2 and 4 must also process and execute the query. This is known as a scatter/gather query. At low volume this is OK, but you won't see the benefits of horizontal scalability if every query has to be sent to all shards. Only when the shard key is provided can the query be sent directly to Shard 3. This is known as a targeted query.
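As a sketch of the difference (using the teamId field from the earlier update example rather than the shortened 't' property):

// Scatter/gather: no shard key, so mongos must ask every shard
db.clips.find({_id: 123})

// Targeted: the shard key lets mongos route the query to a single shard
db.clips.find({teamId: 205, _id: 123})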
For our Clips collection, we chose TeamId as our shard key. We looked at a few different possible shard keys:
- We considered sharding by clipId (_id) but decided against it because we let coaches organize clips into playlists (similar to a song playlist in iTunes or Spotify). While queries for all clips in a playlist are less common than grabbing an individual clip, they are common enough that we wanted them to use targeted queries.
- We also considered sharding by the playlist Id, but we wanted the ability for clips to be a part of multiple playlists. The shard key, once set, is immutable. Clips can be added or removed from playlists at any time.
- We finally settled on TeamId. TeamId is easily available to us when making the vast majority of our queries to the Clips collection. Only for a few infrequent operations would we need to use scatter/gather queries.
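For reference, setting up that layout in the mongo shell looks roughly like this; the database name and exact field name are illustrative:

// Illustrative only; database and field names may differ from ours
sh.enableSharding('hudl')
sh.shardCollection('hudl.clips', { teamId: 1 })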
The Transition
As I mentioned, we needed to transition from SQL Server to MongoDB with zero downtime. In case anything went wrong, we needed fallbacks and fail-safes along the way. Our approach was two-fold. In the background we ran a process that ‘fork-lifted’ data from SQL Server to MongoDB. While that ran in the background, we created a multiplexed DAO (data access object, our db abstraction layer) that would only read from SQL but would write to both SQL and MongoDB. That allowed us to batch-move all clips without having to worry about stale data. Once the two databases were completely synced up, we switched over to perform all reads from MongoDB. We continued to dual-write so we could easily switch back to SQL Server if problems arose. After we felt confident in our MongoDB solution, we pulled the plug on SQL Server.
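A heavily simplified sketch of the multiplexed DAO idea follows. Our real implementation is in C#; this JavaScript version, including a readFromMongoPercent knob that mimics how read traffic could be shifted gradually, is purely illustrative:

// Hypothetical multiplexed DAO: reads come from one store, writes go to both.
// sqlClips and mongoClips are assumed to expose the same async interface.
class MultiplexedClipDao {
  constructor(sqlClips, mongoClips, readFromMongoPercent = 0) {
    this.sqlClips = sqlClips;
    this.mongoClips = mongoClips;
    this.readFromMongoPercent = readFromMongoPercent; // 0..100, raised gradually
  }

  async getClip(teamId, clipId) {
    // Route a percentage of reads to MongoDB; SQL Server stays available as the fallback.
    if (Math.random() * 100 < this.readFromMongoPercent) {
      return this.mongoClips.getClip(teamId, clipId);
    }
    return this.sqlClips.getClip(teamId, clipId);
  }

  async saveClip(clip) {
    // Dual-write so either database can serve as the source of truth.
    await this.sqlClips.saveClip(clip);
    await this.mongoClips.saveClip(clip);
  }
}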
In step one we took a look at how we read and wrote clip data. That let us design an optimal MongoDB schema. We then refactored our existing database abstraction layer to use data-structures that matched the MongoDB schema. This gave us a chance to prove out the schema ahead of time.
Next we began sending write operations to both SQL and MongoDB. This was an important step because it allowed our data fork-lifting process to work through all clips one after another while protecting us from data corruption.
The data fork-lifting process took about a week to complete. The time was due to both the large size of the dataset and our own throttling logic. We throttled the rate of data migration to minimize the impact on normal operations. We didn’t want coaches to feel any pain during this migration.
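The fork-lift itself can be pictured as a batch loop with a pause between batches. This is only a sketch; readClipBatchFromSql and writeClipsToMongo are hypothetical helpers, and the batch size and delay are made up:

// Hypothetical throttled fork-lift loop
async function forkLiftClips(readClipBatchFromSql, writeClipsToMongo, batchSize = 500, pauseMs = 1000) {
  let lastId = 0;
  while (true) {
    const batch = await readClipBatchFromSql(lastId, batchSize); // clips with id > lastId, in order
    if (batch.length === 0) break;
    await writeClipsToMongo(batch);  // upserts, so clips already dual-written are not duplicated
    lastId = batch[batch.length - 1].id;
    await new Promise(resolve => setTimeout(resolve, pauseMs)); // throttle to protect live traffic
  }
}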
After the data fork-lift was complete we began the process of reading from MongoDB. We built in the ability to progressively send more and more read traffic to MongoDB. That allowed us to gain confidence in our code and the MongoDB cluster without having to switch all at once. After a while with dual writes but all MongoDB reads, we turned off dual writes and dropped the tables in SQL Server. It was both a scary moment (sure, we had backups… but still!) and very satisfying. Our SQL database size was reduced by over 80GB. Of that total amount, 20GB was index data, which means our memory footprint was also greatly reduced.
Performance
We have been thrilled with the performance of MongoDB. MongoDB exceeded our average performance goal of 100ms and, just as important, is consistently performant. While it’s good to keep an eye on average times, it’s more important to watch the 90th and 99th percentile performance metrics. With MongoDB our average clip load time is around 18ms and our 99th percentile times are typically at or under 100ms.
Chart: clip load times during the same time period in season
Conclusion
Our transition from SQL Server to MongoDB started with our largest and most critical dataset. Having gone through it, we are very happy with the performance and scalability of MongoDB and appreciate how developer-friendly it is to work with. Moving from a relational to a NoSQL database naturally has a learning curve. Now that we are over it, we feel very good about our ability to scale well into the future. Perhaps most telling of all, most new feature development at Hudl is done in MongoDB. We feel MongoDB lets us focus more on writing features that help coaches win and spend less time crafting database scripts.