New Hash-based Sharding Feature in MongoDB 2.4-MySQL 튜토리얼-php.cn

Hash-based sharding in an existing collection

Sharding a new collection

Manual chunk assignment and other caveats

집

데이터 베이스

MySQL 튜토리얼

New Hash-based Sharding Feature in MongoDB 2.4

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 04:30 PM

new sharding

Lots of MongoDB users enjoy the flexibility of custom shard keys in organizing a sharded collection’s documents. For certain common workloads though, like key/value lookup, using the natural choice of _id as a shard key isn’t optimal because default ObjectId’s are ascending, resulting in poor write distribution. ?Creating randomized _ids or choosing another well-distributed field is always possible, but this adds complexity to an app and is another place where something could go wrong.

To help keep these simple workloads simple, in 2.4 MongoDB added the new Hash-based shard key feature. ?The idea behind Hash-based shard keys is that MongoDB will do the work to randomize data distribution for you, based on whatever kind of document identifier you like. ?So long as the identifier has a high cardinality, the documents in your collection will be spread evenly across the shards of your cluster. ?For heavy workloads with lots of individual document writes or reads (e.g. key/value), this is usually the best choice. ?For workloads where getting ranges of documents is more important (i.e. find recent documents from all users), other choices of shard key may be better suited.

Hash-based sharding in an existing collection

To start off with Hash-based sharding, you need the name of the collection you’d like to shard and the name of the hashed “identifier" field for the documents in the collection. ?For example, we might want to create a sharded “mydb.webcrawler" collection, where each document is usually found by a “url" field. ?We can populate the collection with sample data using:

shell$ wget http://en.wikipedia.org/wiki/Web_crawler -O web_crawler.html
shell$ mongo 
connecting to: /test
> use mydb
switched to db mydb
> cat("web_crawler.html").split("\n").forEach( function(line){
... var regex = /a href="http://blog.mongodb.org/post/\""([^\"]*)\"/; if (regex.test(line)) { db.webcrawler.insert({ "url" : regex.exec(line)[1] }); }})
> db.webcrawler.find()
...
{ "_id" : ObjectId("5162fba3ad5a8e56b7b36020"), "url" : "/wiki/OWASP" }
{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603d"), "url" : "/wiki/Image_retrieval" }
{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603e"), "url" : "/wiki/Video_search_engine" }
{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603f"), "url" : "/wiki/Enterprise_search" }
{ "_id" : ObjectId("5162fba3ad5a8e56b7b36040"), "url" : "/wiki/Semantic_search" }
...

로그인 후 복사

Just for this example, we multiply this data ~x2000 (otherwise we won’t get any pre-splitting in the collection because it’s too small):

> for (var i = 0; i 
<p><span>Next, we create a hashed index on this field:</span></p>
<pre class="brush:php;toolbar:false">> db.webcrawler.ensureIndex({ url : "hashed" })

로그인 후 복사

As usual, the creation of the hashed index doesn’t prevent other types of indices from being created as well.

Then we shard the “mydb.webcrawler" collection using the same field as a Hash-based shard key:

> db.printShardingStatus(true)
--- Sharding Status ---
sharding version: {
  "_id" : 1,
  "version" : 3,
  "minCompatibleVersion" : 3,
  "currentVersion" : 4,
  "clusterId" : ObjectId("5163032a622c051263c7b8ce")
}
shards:
  {  "_id" : "test-rs0",  "host" : "test-rs0/nuwen:31100,nuwen:31101" }
  {  "_id" : "test-rs1",  "host" : "test-rs1/nuwen:31200,nuwen:31201" }
  {  "_id" : "test-rs2",  "host" : "test-rs2/nuwen:31300,nuwen:31301" }
  {  "_id" : "test-rs3",  "host" : "test-rs3/nuwen:31400,nuwen:31401" }
databases:
  {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
  {  "_id" : "mydb",  "partitioned" : true,  "primary" : "test-rs0" }
      mydb.webcrawler
          shard key: { "url" : "hashed" }
          chunks:
              test-rs0    4
          { "url" : { "$minKey" : 1 } } -->> { "url" : NumberLong("-4837773290201122847") } on : test-rs0 { "t" : 1, "i" : 3 }
          { "url" : NumberLong("-4837773290201122847") } -->> { "url" : NumberLong("-2329535691089872938") } on : test-rs0 { "t" : 1, "i" : 4 }
          { "url" : NumberLong("-2329535691089872938") } -->> { "url" : NumberLong("3244151849123193853") } on : test-rs0 { "t" : 1, "i" : 1 }
          { "url" : NumberLong("3244151849123193853") } -->> { "url" : { "$maxKey" : 1 } } on : test-rs0 { "t" : 1, "i" : 2 }

로그인 후 복사

you can see that the chunk boundaries are 64-bit integers (generated by hashing the “url" field). ?When inserts or queries target particular urls, the query can get routed using the url hash to the correct chunk.

Sharding a new collection

Above we’ve sharded an existing collection, which will result in all the chunks of a collection initially living on the same shard. ?The balancer takes care of moving the chunks around, as usual, until we get an even distribution of data.

Much of the time though, it’s better to shard the collection before we add our data - this way MongoDB doesn’t have to worry about moving around existing data. ?Users of sharded collections are familiar with pre-splitting - where empty chunks can be quickly balanced around a cluster before data is added. ?When sharding a new collection using Hash-based shard keys, MongoDB will take care of the presplitting for you. Similarly sized ranges of the Hash-based key are distributed to each existing shard, which means that no initial balancing is needed (unless of course new shards are added).

Let’s see what happens when we shard a new collection webcrawler_empty the same way:

> sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
> db.webcrawler_empty.ensureIndex({ url : "hashed" })
> sh.shardCollection("mydb.webcrawler_empty", { url : "hashed" })
{ "collectionsharded" : "mydb.webcrawler_empty", "ok" : 1 }
> db.printShardingStatus(true)
--- Sharding Status ---
...
      mydb.webcrawler_empty
          shard key: { "url" : "hashed" }
          chunks:
              test-rs0    2
              test-rs1    2
              test-rs2    2
              test-rs3    2
          { "url" : { "$minKey" : 1 } } -->> { "url" : NumberLong("-6917529027641081850") } on : test-rs0 { "t" : 4, "i" : 2 }
          { "url" : NumberLong("-6917529027641081850") } -->> { "url" : NumberLong("-4611686018427387900") } on : test-rs0 { "t" : 4, "i" : 3 }
          { "url" : NumberLong("-4611686018427387900") } -->> { "url" : NumberLong("-2305843009213693950") } on : test-rs1 { "t" : 4, "i" : 4 }
          { "url" : NumberLong("-2305843009213693950") } -->> { "url" : NumberLong(0) } on : test-rs1 { "t" : 4, "i" : 5 }
          { "url" : NumberLong(0) } -->> { "url" : NumberLong("2305843009213693950") } on : test-rs2 { "t" : 4, "i" : 6 }
          { "url" : NumberLong("2305843009213693950") } -->> { "url" : NumberLong("4611686018427387900") } on : test-rs2 { "t" : 4, "i" : 7 }
          { "url" : NumberLong("4611686018427387900") } -->> { "url" : NumberLong("6917529027641081850") } on : test-rs3 { "t" : 4, "i" : 8 }
          { "url" : NumberLong("6917529027641081850") } -->> { "url" : { "$maxKey" : 1 } } on : test-rs3 { "t" : 4, "i" : 9 }

로그인 후 복사

As you can see, the new empty collection is already well-distributed and ready to use. ?Be aware though - any balancing currently in progress can interfere with moving the empty initial chunks off the initial shard, balancing will take priority (hence the initial stopBalancer step). Like before, eventually the balancer will distribute all empty chunks anyway, but if you are preparing for a immediate data load it’s probably best to stop the balancer beforehand.

That’s it - you now have a pre-split collection on four shards using Hash-based shard keys. ?Queries and updates on exact urls go to randomized shards and are balanced across the cluster:

> db.webcrawler_empty.find({ url: "/wiki/OWASP" }).explain()
{
  "clusteredType" : "ParallelSort",
  "shards" : {
      "test-rs2/nuwen:31300,nuwen:31301" : [ ... ]
...

로그인 후 복사

However, the trade-off with Hash-based shard keys is that ranged queries and multi-updates must hit all shards:

> db.webcrawler_empty.find({ url: /^\/wiki\/OWASP/ }).explain()
{
  "clusteredType" : "ParallelSort",
  "shards" : {
      "test-rs0/nuwen:31100,nuwen:31101" : [ ... ],
     "test-rs1/nuwen:31200,nuwen:31201" : [ ... ],
     "test-rs2/nuwen:31300,nuwen:31301" : [ ... ],
     "test-rs3/nuwen:31400,nuwen:31401" : [ ... ]
...

로그인 후 복사

…

Manual chunk assignment and other caveats

The core benefits of the new Hash-based shard keys are:

Easy setup of randomized shard key
Automated pre-splitting of empty collections
Better distribution of chunks on shards for isolated document writes and reads

The standard split and moveChunk functions do work with Hash-based shard keys, so it’s still possible to balance your collection’s chunks in any way you like. ?However, the usual “find” mechanism used to select chunks can behave a bit unexpectedly since the specifier is a document which is hashed to get the containing chunk. ?To keep things simple, just use the new “bounds” parameter when manually manipulating chunks of hashed collections (or all collections, if you prefer):

> use admin
> db.runCommand({ split : "mydb.webcrawler_empty", bounds : [{ "url" : NumberLong("2305843009213693950") }, { "url" : NumberLong("4611686018427387900") }] })
> db.runCommand({ moveChunk : "mydb.webcrawler_empty", bounds : [{ "url" : NumberLong("2305843009213693950") }, { "url" : NumberLong("4611686018427387900") }], to : "test-rs3" })

로그인 후 복사

There are a few other caveats as well - in particular with tag-aware sharding. ?Tag-aware sharding is a feature we released in MongoDB 2.2, which allows you to attach labels to a subset of shards in a cluster. This is valuable for “pinning" collection data to particular shards (which might be hosted on more powerful hardware, for example). ?You can also tag ranges of a collection differently, such that a collection sharded by { “countryCode" : 1 } would have chunks only on servers in that country.

Hash-based shard keys are compatible with tag-aware sharding. ?As in any sharded collection, you may assign chunks to specific shards, but since the chunk ranges are based on the value of the randomized hash of the shard key instead of the shard key itself, this is usually only useful for tagging the whole range to a specific set of shards:

1 2	`> sh.addShardTag("test-rs2",` `"DC1")` `sh.addShardTag("test-rs3",` `"DC1")`

로그인 후 복사

The above commands assign a hypothetical data center tag “DC1” to shards -rs2 and -rs3, which could indicate that -rs2 and -rs3 are in a particular location. ?Then, by running:

1	`> sh.addTagRange("mydb.webcrawler_empty", { url : MinKey }, { url : MaxKey },` `"DC1"` `)`

로그인 후 복사

we indicate to the cluster that the mydb.webcrawler_empty collection should only be stored on “DC1” shards. ?After letting the balancer work:

> db.printShardingStatus(true)
--- Sharding Status ---
...
      mydb.webcrawler_empty
          shard key: { "url" : "hashed" }
          chunks:
              test-rs2    4
              test-rs3    4
          { "url" : { "$minKey" : 1 } } -->> { "url" : NumberLong("-6917529027641081850") } on : test-rs2 { "t" : 5, "i" : 0 }
          { "url" : NumberLong("-6917529027641081850") } -->> { "url" : NumberLong("-4611686018427387900") } on : test-rs3 { "t" : 6, "i" : 0 }
          { "url" : NumberLong("-4611686018427387900") } -->> { "url" : NumberLong("-2305843009213693950") } on : test-rs2 { "t" : 7, "i" : 0 }
          { "url" : NumberLong("-2305843009213693950") } -->> { "url" : NumberLong(0) } on : test-rs3 { "t" : 8, "i" : 0 }
          { "url" : NumberLong(0) } -->> { "url" : NumberLong("2305843009213693950") } on : test-rs2 { "t" : 4, "i" : 6 }
          { "url" : NumberLong("2305843009213693950") } -->> { "url" : NumberLong("4611686018427387900") } on : test-rs2 { "t" : 4, "i" : 7 }
          { "url" : NumberLong("4611686018427387900") } -->> { "url" : NumberLong("6917529027641081850") } on : test-rs3 { "t" : 4, "i" : 8 }
          { "url" : NumberLong("6917529027641081850") } -->> { "url" : { "$maxKey" : 1 } } on : test-rs3 { "t" : 4, "i" : 9 }
           tag: DC1  { "url" : { "$minKey" : 1 } } -->> { "url" : { "$maxKey" : 1 } }

로그인 후 복사

Again, it doesn’t usually make a lot of sense to tag anything other than the full hashed shard key collection to particular shards - by design, there’s no real way to know or control what data is in what range.

Finally, remember that Hash-based shard keys can (right now) only distribute documents based on the value of a single field. ?So, continuing the example above, it isn’t directly possible to use “url" + “timestamp" as a Hash-based shard key without storing the combination in a single field in your application, for example:

1	`url_and_ts : { url : <url>, timestamp : <timestamp> }</timestamp></url>`

로그인 후 복사

The sub-document will be hashed as a unit.

If you’re interested in learning more about Hash-based sharding, register for the Hash-based sharding feature demo on May 2.

原文地址：New Hash-based Sharding Feature in MongoDB 2.4, 感谢原作者分享。

본 웹사이트의 성명

본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.

핫 AI 도구

Undresser.AI Undress

사실적인 누드 사진을 만들기 위한 AI 기반 앱

AI Clothes Remover

사진에서 옷을 제거하는 온라인 AI 도구입니다.

Undress AI Tool

무료로 이미지를 벗다

Clothoff.io

AI 옷 제거제

Video Face Swap

완전히 무료인 AI 얼굴 교환 도구를 사용하여 모든 비디오의 얼굴을 쉽게 바꾸세요!

뜨거운 도구

메모장++7.3.1

사용하기 쉬운 무료 코드 편집기

SublimeText3 중국어 버전

중국어 버전, 사용하기 매우 쉽습니다.

스튜디오 13.0.1 보내기

강력한 PHP 통합 개발 환경

드림위버 CS6

시각적 웹 개발 도구

SublimeText3 Mac 버전

신 수준의 코드 편집 소프트웨어(SublimeText3)

뜨거운 주제

자바 튜토리얼

1666

Cakephp 튜토리얼

1425

라라벨 튜토리얼

1328

PHP 튜토리얼

1273

C# 튜토리얼

1253

Related knowledge

Go 언어에서 make와 new의 차이점은 무엇입니까 Jan 09, 2023 am 11:44 AM

차이점: 1. Make는 Slice, Map 및 chan 유형의 데이터를 할당하고 초기화하는 데만 사용할 수 있는 반면 new는 모든 유형의 데이터를 할당할 수 있습니다. 2. 새 할당은 "*Type" 유형인 포인터를 반환하고 make는 유형인 참조를 반환합니다. 3. new에 의해 할당된 공간은 지워집니다. make가 공간을 할당한 후에는 초기화됩니다.

Java에서 new 키워드를 사용하는 방법 May 03, 2023 pm 10:16 PM

1. 개념 Java 언어에서 "new" 표현식은 인스턴스를 생성하는 역할을 하며 생성자가 인스턴스를 초기화하기 위해 호출됩니다. 생성자 자체의 반환 값 유형은 "생성자가 새로 생성된 값을 반환합니다." 개체 참조"이지만 새 표현식의 값은 새로 생성된 개체에 대한 참조입니다. 2. 목적: 새 클래스의 객체를 생성합니다. 3. 작동 메커니즘: 객체 멤버에 대한 메모리 공간을 할당하고, 멤버 변수를 명시적으로 초기화하고, 생성 방법 계산을 수행하고, 참조 값을 자주 반환합니다. 메모리에서 새로운 메모리를 여는 것을 의미합니다. 메모리 공간은 메모리의 힙 영역에 할당되며 jvm에 의해 제어되며 자동으로 메모리를 관리합니다. 여기서는 String 클래스를 예로 사용합니다. 푸

Redis에서 구현하는 데이터 분할(Sharding)에 대한 자세한 설명 Jun 20, 2023 pm 04:52 PM

Redis는 캐싱 및 순위 지정과 같은 애플리케이션 시나리오에 자주 사용되는 고성능 키-값 스토리지 시스템입니다. 데이터의 양이 점점 더 많아지면 단일 시스템의 Redis는 성능 병목 현상에 직면할 수 있습니다. 이때 데이터를 분할하고 여러 Redis 노드에 저장하면 수평 확장을 달성할 수 있습니다. 이것이 Redis의 데이터 분할(Sharding)입니다. Redis 데이터 세분화는 다음 단계를 통해 완료할 수 있습니다. 샤딩 규칙을 설정하려면 먼저 샤딩 규칙을 설정해야 합니다. Redis 샤딩은 키 값을 기반으로 할 수 있습니다.

Oukitel은 저렴한 가격으로 새로운 C50, 견고한 WP39 및 WP50 스마트폰을 출시합니다. Jun 21, 2024 am 07:10 AM

new 연산자는 js에서 어떻게 작동하나요? Feb 19, 2024 am 11:17 AM

js의 new 연산자는 어떻게 작동하나요? 구체적인 코드 예제가 필요합니다. js의 new 연산자는 객체를 생성하는 데 사용되는 키워드입니다. 그 기능은 지정된 생성자를 기반으로 새 인스턴스 개체를 만들고 개체에 대한 참조를 반환하는 것입니다. new 연산자를 사용할 때 실제로 다음 단계가 수행됩니다. 빈 개체의 프로토타입을 생성자의 프로토타입 개체에 지정하고 생성자의 범위를 새 개체에 할당합니다. 객체) 생성자에서 코드를 실행하고 새 객체를 제공합니다.

새로운 후지필름 고정 렌즈 GFX 카메라, 새로운 중형 포맷 센서 선보이며 완전히 새로운 시리즈 시작 가능 Sep 27, 2024 am 06:03 AM

Fujifilm은 필름 시뮬레이션과 소셜 미디어에서의 소형 레인지핑거 스타일 카메라의 인기 덕분에 최근 몇 년 동안 많은 성공을 거두었습니다. 그러나 Fujirumors에 따르면, 현재의 성공에 안주하지 않는 것 같습니다. 당신

new 키워드 대신 Java에서 clone() 메소드를 사용하는 방법은 무엇입니까? Apr 22, 2023 pm 07:55 PM

new 대신 clone() 사용 Java에서 새 객체 인스턴스를 생성하는 가장 일반적인 방법은 new 키워드를 사용하는 것입니다. new에 대한 JDK의 지원은 매우 훌륭합니다. new 키워드를 사용하여 경량 객체를 생성하면 매우 빠릅니다. 그러나 중량이 큰 개체의 경우 개체가 생성자에서 일부 복잡하고 시간이 많이 소요되는 작업을 수행할 수 있으므로 생성자의 실행 시간이 더 길어질 수 있습니다. 결과적으로 시스템은 단기간에 많은 수의 인스턴스를 확보할 수 없습니다. 이 문제를 해결하려면 Object.clone() 메서드를 사용할 수 있습니다. Object.clone() 메서드는 생성자를 우회하고 객체 인스턴스를 빠르게 복사할 수 있습니다. 그러나 기본적으로 clone() 메서드에 의해 생성된 인스턴스는 원본 객체의 단순 복사본일 뿐입니다.

Java에서 새 인스턴스화를 사용하는 방법 May 16, 2023 pm 07:04 PM

1. 개념은 "Java 객체 생성"입니다. ----- 메모리를 할당하고 해당 메모리에 대한 참조를 반환합니다. 2. 참고 사항 (1) Java 키워드 new는 연산자입니다. +, -, *, / 및 기타 연산자와 동일하거나 유사한 우선순위를 갖습니다. (2) Java 객체를 생성하려면 참조 변수 선언, 인스턴스화 및 객체 인스턴스 초기화의 세 단계가 필요합니다. (3) 인스턴스화 전에는 기본적으로 부모 클래스의 매개변수 없는 생성자가 호출됩니다. 즉, 부모 클래스의 객체를 생성해야 합니다. 3. 두 가지 인스턴스화 방법 (1) 객체 이름 = 새 클래스 이름(매개변수 1, 매개변수 2... 매개변수 n), 메소드(); (2) 새 클래스 이름(매개변수 1, 매개변수 2... 매개변수 n). 예제에서는 메소드를 만드는 방법을 보여줍니다. 진짜 물건

See all articles

New Hash-based Sharding Feature in MongoDB 2.4

Hash-based sharding in an existing collection

Sharding a new collection

Manual chunk assignment and other caveats

핫 AI 도구

Undresser.AI Undress

AI Clothes Remover

Undress AI Tool

Clothoff.io

Video Face Swap

인기 기사

뜨거운 도구

메모장++7.3.1

SublimeText3 중국어 버전

스튜디오 13.0.1 보내기

드림위버 CS6

SublimeText3 Mac 버전

뜨거운 주제