Table of Contents
Setup
Sharded cluster of replica sets
Profiling
Experiments
Targeted queries versus scatter-gather
Queries with read preferences
Updates
Commands
Go Forth And Scale

Real-time Profiling a MongoDB Cluster

Jun 07, 2016, 4:30 PM

by A. Jesse Jiryu Davis, Python Evangelist at 10gen

In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations among its servers, you can predict how your application will scale as you add shards and add members to shards.

Operations are routed according to the type of operation, your shard key, and your read preference. Let’s set up a cluster and use the system profiler to see where each operation is run. This is an interactive, experimental way to learn how your cluster really behaves and how your architecture will scale.


Setup

You’ll need a recent install of MongoDB (I’m using 2.4.4), Python, a recent version of PyMongo (at least 2.4—I’m using 2.5.2) and the code in my cluster-profile repository on GitHub. If you install the Colorama Python package you’ll get cute colored output. These scripts were tested on my Mac.

Sharded cluster of replica sets

Run the cluster_setup.py script in my repository. It sets up a standard sharded cluster for you, running on your local machine. There's a mongos, three config servers, and two shards, each of which is a three-member replica set. The first shard's replica set is running on ports 4000 through 4002, the second shard is on ports 5000 through 5002, and the three config servers are on ports 6000 through 6002:

[Figure: The setup]
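
To sanity-check the topology before going further, you can ask mongos for its shard list. This is just a quick sketch, not part of cluster_setup.py; it assumes mongos is listening on the default port 27017, which is how the rest of this article connects to it:

from pymongo import MongoClient

client = MongoClient()  # mongos on the default port.

# listShards is an admin command; each shard document reports the
# replica set name and its member host:port pairs.
for shard in client.admin.command('listShards')['shards']:
    print('%s -> %s' % (shard['_id'], shard['host']))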

For the finale, cluster_setup.py makes a collection named sharded_collection, sharded on a key named shard_key.

In a normal deployment, we’d let MongoDB’s balancer automatically distribute chunks of data among our two shards. But for this demo we want documents to be on predictable shards, so my script disables the balancer. It makes a chunk for all documents with shard_key less than 500 and another chunk for documents with shard_key greater than or equal to 500. It moves the high chunk to replset_1:

client = MongoClient()  # Connect to mongos.
admin = client.admin  # admin database.

# Pre-split.
admin.command(
    'split', 'test.sharded_collection',
    middle={'shard_key': 500})
admin.command(
    'moveChunk', 'test.sharded_collection',
    find={'shard_key': 500},
    to='replset_1')

If you connect to mongos with the MongoDB shell, sh.status() shows there’s one chunk on each of the two shards:

{ "shard_key" : { "$minKey" : 1 } } -->> { "shard_key" : 500 } on : replset_0 { "t" : 2, "i" : 1 }
{ "shard_key" : 500 } -->> { "shard_key" : { "$maxKey" : 1 } } on : replset_1 { "t" : 2, "i" : 0 }

The setup script also inserts a document with a shard_key of 0 and another with a shard_key of 500. Now we’re ready for some profiling.
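
Those two documents sit on opposite sides of the split point, so each shard holds exactly one of them. A rough sketch of equivalent inserts, in the same PyMongo 2.x style the rest of this article uses:

from pymongo import MongoClient

collection = MongoClient().test.sharded_collection  # via mongos

# shard_key 0 falls in the low chunk on replset_0;
# shard_key 500 falls in the high chunk we moved to replset_1.
collection.insert({'shard_key': 0})
collection.insert({'shard_key': 500})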

Profiling

Run the tail_profile.py script from my repository. It connects to all the replica set members. On each, it sets the profiling level to 2 (“log everything”) on the test database, and creates a tailable cursor on the system.profile collection. The script filters out some noise in the profile collection—for example, the activities of the tailable cursor show up in the system.profile collection that it’s tailing. Any legitimate entries in the profile are spat out to the console in pretty colors.
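
The real script watches every member at once and colors its output, but the core mechanism is small. Here is a rough sketch of the idea for a single member (say, the replset_0 primary on port 4000), using PyMongo 2.x-style keyword arguments; it is not the actual tail_profile.py:

import time
from pymongo import MongoClient

db = MongoClient('localhost', 4000).test  # one replica set member

# Profiling level 2: log every operation to test.system.profile.
db.command('profile', 2)

# system.profile is capped, so a tailable cursor keeps yielding new
# entries as the server writes them.
cursor = db.system.profile.find(tailable=True)
while cursor.alive:
    try:
        entry = next(cursor)
        # Skip the noise produced by tailing system.profile itself.
        if entry.get('ns') != 'test.system.profile':
            print('%(op)s %(ns)s' % entry)
    except StopIteration:
        time.sleep(1)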

Experiments

Targeted queries versus scatter-gather

Let’s run a query from Python in a separate terminal:

>>> from pymongo import MongoClient
>>> # Connect to mongos.
>>> collection = MongoClient().test.sharded_collection
>>> collection.find_one({'shard_key': 0})
{'_id': ObjectId('51bb6f1cca1ce958c89b348a'), 'shard_key': 0}

tail_profile.py prints:

replset_0 primary on 4000: query test.sharded_collection {"shard_key": 0}

The query includes the shard key, so mongos reads from the shard that can satisfy it. Adding shards can scale out your throughput on a query like this. What about a query that doesn't contain the shard key?

>>> collection.find_one({})

mongos sends the query to both shards:

replset_0 primary on 4000: query test.sharded_collection {"shard_key": 0}
replset_1 primary on 5000: query test.sharded_collection {"shard_key": 500}

For fan-out queries like this, adding more shards won’t scale out your query throughput as well as it would for targeted queries, because every shard has to process every query. But we can scale throughput on queries like these by reading from secondaries.

Queries with read preferences

We can use read preferences to read from secondaries:

>>> from pymongo.read_preferences import ReadPreference
>>> collection.find_one({}, read_preference=ReadPreference.SECONDARY)

tail_profile.py shows us that mongos chose a random secondary from each shard:

replset_0 secondary on 4001: query test.sharded_collection {"$readPreference": {"mode": "secondary"}, "$query": {}}
replset_1 secondary on 5001: query test.sharded_collection {"$readPreference": {"mode": "secondary"}, "$query": {}}

Note how PyMongo passes the read preference to mongos in the query, as the $readPreference field. mongos targets one secondary in each of the two replica sets.
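
Passing the read preference on every call is convenient for experiments like these. In PyMongo 2.x you can also set a default when you construct the client, and databases and collections derived from it inherit that preference; a minimal sketch against the same mongos:

from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference

# Queries through this client prefer secondaries unless overridden.
client = MongoClient(read_preference=ReadPreference.SECONDARY)
collection = client.test.sharded_collection

collection.find_one({})  # still scatter-gather, but routed to secondaries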

Updates

With a sharded collection, updates must either include the shard key or be “multi-updates”. An update with the shard key goes to the proper shard, of course:

>>> collection.update({'shard_key': -100}, {'$set': {'field': 'value'}})

replset_0 primary on 4000: update test.sharded_collection {"shard_key": -100}

mongos only sends the update to replset_0, because we put the chunk of documents with shard_key less than 500 there.

A multi-update hits all shards:

>>> collection.update({}, {'$set': {'field': 'value'}}, multi=True)

replset_0 primary on 4000: update test.sharded_collection {}
replset_1 primary on 5000: update test.sharded_collection {}

A multi-update on a range of the shard key need only involve the proper shard:

>>> collection.update({'shard_key': {'$gt': 1000}}, {'$set': {'field': 'value'}}, multi=True)

replset_1 primary on 5000: update test.sharded_collection {"shard_key": {"$gt": 1000}}

So targeted updates that include the shard key can be scaled out by adding shards. Even multi-updates can be scaled out if they include a range of the shard key, but multi-updates without the shard key won’t benefit from extra shards.

Commands

In version 2.4, mongos can use secondaries not only for queries, but also for some commands. You can run count on secondaries if you pass the right read preference:

>>> cursor = collection.find(read_preference=ReadPreference.SECONDARY)
>>> cursor.count()

replset_0 secondary on 4001: command count: sharded_collection
replset_1 secondary on 5001: command count: sharded_collection

Whereas findAndModify, since it modifies data, is run on the primaries no matter your read preference:

>>> db = MongoClient().test
>>> db.command(
...     'findAndModify',
...     'sharded_collection',
...     query={'shard_key': -1},
...     remove=True,
...     read_preference=ReadPreference.SECONDARY)

replset_0 primary on 4000: command findAndModify: sharded_collection

Go Forth And Scale

To scale a sharded cluster, you should understand how operations are distributed: are they scatter-gather, or targeted to one shard? Do they run on primaries or secondaries? If you set up a cluster and test your queries interactively like we did here, you can see how your cluster behaves in practice, and design your application for future growth.

Read Jesse's blog, Emptysquare, and follow him on GitHub.
