Real-time Profiling a MongoDB Cluster-MySQL-Tutorial-php.cn

by A. Jesse Jiryu Davis, Python Evangelist at 10gen In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations

by A. Jesse Jiryu Davis, Python Evangelist at 10gen

In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations among its servers, you can predict how your application will scale as you add shards and add members to shards.

Operations are routed according to the type of operation, your shard key, and your read preference. Let’s set up a cluster and use the system profiler to see where each operation is run. This is an interactive, experimental way to learn how your cluster really behaves and how your architecture will scale.

Setup

You’ll need a recent install of MongoDB (I’m using 2.4.4), Python, a recent version of PyMongo (at least 2.4—I’m using 2.5.2) and the code in my cluster-profile repository on GitHub. If you install the Colorama Python package you’ll get cute colored output. These scripts were tested on my Mac.

Sharded cluster of replica sets

Run the cluster_setup.py script in my repository. It sets up a standard sharded cluster for you running on your local machine. There’s a mongos, three config servers, and two shards, each of which is a three-member replica set. The first shard’s replica set is running on ports 4000 through 4002, the second shard is on ports 5000 through 5002, and the three config servers are on ports 6000 through 6002:

The setup

For the finale, cluster_setup.py makes a collection named sharded_collection, sharded on a key named shard_key.

In a normal deployment, we’d let MongoDB’s balancer automatically distribute chunks of data among our two shards. But for this demo we want documents to be on predictable shards, so my script disables the balancer. It makes a chunk for all documents with shard_key less than 500 and another chunk for documents with shard_key greater than or equal to 500. It moves the high chunk to replset_1:

client = MongoClient()  # Connect to mongos.
admin = client.admin  # admin database.

Nach dem Login kopieren

Pre-split.

admin.command(
    'split', 'test.sharded_collection',
    middle={'shard_key': 500})
admin.command(
    'moveChunk', 'test.sharded_collection',
    find={'shard_key': 500},
    to='replset_1')

Nach dem Login kopieren

If you connect to mongos with the MongoDB shell, sh.status() shows there’s one chunk on each of the two shards:

{ "shard_key" : { "$minKey" : 1 } } -->> { "shard_key" : 500 } on : replset_0 { "t" : 2, "i" : 1 }
{ "shard_key" : 500 } -->> { "shard_key" : { "$maxKey" : 1 } } on : replset_1 { "t" : 2, "i" : 0 }

Nach dem Login kopieren

The setup script also inserts a document with a shard_key of 0 and another with a shard_key of 500. Now we’re ready for some profiling.

Profiling

Run the tail_profile.py script from my repository. It connects to all the replica set members. On each, it sets the profiling level to 2 (“log everything”) on the test database, and creates a tailable cursor on the system.profile collection. The script filters out some noise in the profile collection—for example, the activities of the tailable cursor show up in the system.profile collection that it’s tailing. Any legitimate entries in the profile are spat out to the console in pretty colors.

Experiments

Targeted queries versus scatter-gather

Let’s run a query from Python in a separate terminal:

>>> from pymongo import MongoClient
>>> # Connect to mongos.
>>> collection = MongoClient().test.sharded_collection
>>> collection.find_one({'shard_key': 0})
{'_id': ObjectId('51bb6f1cca1ce958c89b348a'), 'shard_key': 0}

Nach dem Login kopieren

tail_profile.py prints:

replset_0 primary on 4000: query test.sharded_collection {“shard_key”: 0}

The query includes the shard key, so mongos reads from the shard that can satisfy it. Adding shards can scale out your throughput on a query like this. What about a query that doesn’t contain the shard key?:

>>> collection.find_one({})

Nach dem Login kopieren

mongos sends the query to both shards:

replset_0 primary on 4000: query test.sharded_collection {“shard_key”: 0}
replset_1 primary on 5000: query test.sharded_collection {“shard_key”: 500}

For fan-out queries like this, adding more shards won’t scale out your query throughput as well as it would for targeted queries, because every shard has to process every query. But we can scale throughput on queries like these by reading from secondaries.

Queries with read preferences

We can use read preferences to read from secondaries:

>>> from pymongo.read_preferences import ReadPreference
>>> collection.find_one({}, read_preference=ReadPreference.SECONDARY)

Nach dem Login kopieren

tail_profile.py shows us that mongos chose a random secondary from each shard:

replset_0 secondary on 4001: query test.sharded_collection {“$readPreference”: {“mode”: “secondary”}, “$query”: {}}
replset_1 secondary on 5001: query test.sharded_collection {“$readPreference”: {“mode”: “secondary”}, “$query”: {}}

Note how PyMongo passes the read preference to mongos in the query, as the $readPreference field. mongos targets one secondary in each of the two replica sets.

Updates

With a sharded collection, updates must either include the shard key or be “multi-updates”. An update with the shard key goes to the proper shard, of course:

>>> collection.update({'shard_key': -100}, {'$set': {'field': 'value'}})

Nach dem Login kopieren

replset_0 primary on 4000: update test.sharded_collection {“shard_key”: -100}

mongos only sends the update to replset_0, because we put the chunk of documents with shard_key less than 500 there.

A multi-update hits all shards:

>>> collection.update({}, {'$set': {'field': 'value'}}, multi=True)

Nach dem Login kopieren

replset_0 primary on 4000: update test.sharded_collection {}
replset_1 primary on 5000: update test.sharded_collection {}

A multi-update on a range of the shard key need only involve the proper shard:

>>> collection.update({'shard_key': {'$gt': 1000}}, {'$set': {'field': 'value'}}, multi=True)

Nach dem Login kopieren

replset_1 primary on 5000: update test.sharded_collection {“shard_key”: {“$gt”: 1000}}

So targeted updates that include the shard key can be scaled out by adding shards. Even multi-updates can be scaled out if they include a range of the shard key, but multi-updates without the shard key won’t benefit from extra shards.

Commands

In version 2.4, mongos can use secondaries not only for queries, but also for some commands. You can run count on secondaries if you pass the right read preference:

>>> cursor = collection.find(read_preference=ReadPreference.SECONDARY)
>>> cursor.count()

Nach dem Login kopieren

replset_0 secondary on 4001: command count: sharded_collection
replset_1 secondary on 5001: command count: sharded_collection

Whereas findAndModify, since it modifies data, is run on the primaries no matter your read preference:

>>> db = MongoClient().test
>>> test.command(
...     'findAndModify',
...     'sharded_collection',
...     query={'shard_key': -1},
...     remove=True,
...     read_preference=ReadPreference.SECONDARY)

Nach dem Login kopieren

replset_0 primary on 4000: command findAndModify: sharded_collection

Go Forth And Scale

To scale a sharded cluster, you should understand how operations are distributed: are they scatter-gather, or targeted to one shard? Do they run on primaries or secondaries? If you set up a cluster and test your queries interactively like we did here, you can see how your cluster behaves in practice, and design your application for future growth.

Read Jesse’s blog, Emptysquare and follow him on Github

原文地址：Real-time Profiling a MongoDB Cluster, 感谢原作者分享。

Erklärung dieser Website

Der Inhalt dieses Artikels wird freiwillig von Internetnutzern beigesteuert und das Urheberrecht liegt beim ursprünglichen Autor. Diese Website übernimmt keine entsprechende rechtliche Verantwortung. Wenn Sie Inhalte finden, bei denen der Verdacht eines Plagiats oder einer Rechtsverletzung besteht, wenden Sie sich bitte an admin@php.cn

Heiße KI -Werkzeuge

Undresser.AI Undress

KI-gestützte App zum Erstellen realistischer Aktfotos

AI Clothes Remover

Online-KI-Tool zum Entfernen von Kleidung aus Fotos.

Undress AI Tool

Ausziehbilder kostenlos

Clothoff.io

KI-Kleiderentferner

AI Hentai Generator

Erstellen Sie kostenlos Ai Hentai.

Heißer Artikel

R.E.P.O. Energiekristalle erklärten und was sie tun (gelber Kristall)

4 Wochen vor By 尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Beste grafische Einstellungen

4 Wochen vor By 尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle -Lösung

2 Wochen vor By DDD

R.E.P.O. So reparieren Sie Audio, wenn Sie niemanden hören können

4 Wochen vor By 尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Chat -Befehle und wie man sie benutzt

4 Wochen vor By 尊渡假赌尊渡假赌尊渡假赌

Heiße Werkzeuge

Notepad++7.3.1

Einfach zu bedienender und kostenloser Code-Editor

SublimeText3 chinesische Version

Chinesische Version, sehr einfach zu bedienen

Senden Sie Studio 13.0.1

Leistungsstarke integrierte PHP-Entwicklungsumgebung

Dreamweaver CS6

Visuelle Webentwicklungstools

SublimeText3 Mac-Version

Codebearbeitungssoftware auf Gottesniveau (SublimeText3)

Heiße Themen

Wo ist der Login-Zugang für Gmail-E-Mail?

7522

CakePHP-Tutorial

1378

Wie lautet das Format des Kontonamens von Steam?

Win11 -Aktivierungsschlüssel dauerhaft

NYT -Verbindungen Hinweise und Antworten

Related knowledge

Was tun, wenn Navicat abläuft? Apr 23, 2024 pm 12:12 PM

Zu den Lösungen zur Behebung von Navicat-Ablaufproblemen gehören: Erneuern der Lizenz; Deaktivieren der automatischen Updates; Wenden Sie sich an den Navicat-Kundendienst.

So verbinden Sie Navicat mit Mongodb Apr 24, 2024 am 11:27 AM

Um mit Navicat eine Verbindung zu MongoDB herzustellen, müssen Sie: Navicat installieren. Eine MongoDB-Verbindung erstellen: a. Geben Sie den Verbindungsnamen, die Hostadresse und den Port ein. b. Geben Sie die Authentifizierungsinformationen ein (falls erforderlich). Überprüfen Sie die Verbindung Speichern Sie die Verbindung

Was nützt net4.0? May 10, 2024 am 01:09 AM

.NET 4.0 wird zum Erstellen einer Vielzahl von Anwendungen verwendet und bietet Anwendungsentwicklern umfangreiche Funktionen, darunter objektorientierte Programmierung, Flexibilität, leistungsstarke Architektur, Cloud-Computing-Integration, Leistungsoptimierung, umfangreiche Bibliotheken, Sicherheit, Skalierbarkeit, Datenzugriff und Mobilgeräte Entwicklungsunterstützung.

Integration von Java-Funktionen und Datenbanken in serverlose Architektur Apr 28, 2024 am 08:57 AM

In einer serverlosen Architektur können Java-Funktionen in die Datenbank integriert werden, um auf Daten in der Datenbank zuzugreifen und diese zu bearbeiten. Zu den wichtigsten Schritten gehören: Erstellen von Java-Funktionen, Konfigurieren von Umgebungsvariablen, Bereitstellen von Funktionen und Testen von Funktionen. Durch Befolgen dieser Schritte können Entwickler komplexe Anwendungen erstellen, die nahtlos auf in Datenbanken gespeicherte Daten zugreifen.

So konfigurieren Sie die automatische Expansion von MongoDB auf Debian Apr 02, 2025 am 07:36 AM

In diesem Artikel wird vorgestellt, wie MongoDB im Debian -System konfiguriert wird, um eine automatische Expansion zu erzielen. Die Hauptschritte umfassen das Einrichten der MongoDB -Replikat -Set und die Überwachung des Speicherplatzes. 1. MongoDB Installation Erstens stellen Sie sicher, dass MongoDB im Debian -System installiert ist. Installieren Sie den folgenden Befehl: sudoaptupdatesudoaptinstall-emongoDB-org 2. Konfigurieren von MongoDB Replika-Set MongoDB Replikate sorgt für eine hohe Verfügbarkeit und Datenreduktion, was die Grundlage für die Erreichung der automatischen Kapazitätserweiterung darstellt. Start MongoDB Service: SudosystemctlstartMongodsudosysys

Wie Sie eine hohe Verfügbarkeit von MongoDB bei Debian gewährleisten Apr 02, 2025 am 07:21 AM

In diesem Artikel wird beschrieben, wie man eine hoch verfügbare MongoDB -Datenbank für ein Debian -System erstellt. Wir werden mehrere Möglichkeiten untersuchen, um sicherzustellen, dass die Datensicherheit und -Dienste weiter funktionieren. Schlüsselstrategie: ReplicaSet: Replicaset: Verwenden Sie Replikaten, um Datenreduktion und automatisches Failover zu erreichen. Wenn ein Master -Knoten fehlschlägt, wählt der Replikate -Set automatisch einen neuen Masterknoten, um die kontinuierliche Verfügbarkeit des Dienstes zu gewährleisten. Datensicherung und Wiederherstellung: Verwenden Sie den Befehl mongodump regelmäßig, um die Datenbank zu sichern und effektive Wiederherstellungsstrategien zu formulieren, um das Risiko eines Datenverlusts zu behandeln. Überwachung und Alarme: Überwachungsinstrumente (wie Prometheus, Grafana) bereitstellen, um den laufenden Status von MongoDB in Echtzeit zu überwachen, und

So verbinden Sie NodeJS mit der Datenbank Apr 21, 2024 am 06:16 AM

Um eine Verbindung zur Datenbank herzustellen, stellt Node.js mehrere Datenbank-Connector-Pakete für MySQL, PostgreSQL, MongoDB und Redis bereit. Die Verbindungsschritte umfassen: 1. Installieren Sie das entsprechende Connector-Paket. 2. Erstellen Sie einen Verbindungspool, um wiederverwendbare Verbindungen aufrechtzuerhalten. 3. Stellen Sie eine Verbindung mit der Datenbank her. Hinweis: Der Vorgang ist asynchron und Fehler müssen behandelt werden, um die Sicherheit zu gewährleisten und die Leistung zu optimieren.

Kann Navicat eine Verbindung zu Mongodb herstellen? Apr 23, 2024 pm 05:15 PM

Ja, Navicat kann eine Verbindung zur MongoDB-Datenbank herstellen. Zu den spezifischen Schritten gehören: Öffnen Sie Navicat und erstellen Sie eine neue Verbindung. Wählen Sie den Datenbanktyp MongoDB aus. Geben Sie die MongoDB-Hostadresse, den Port und den Datenbanknamen ein. Geben Sie Ihren MongoDB-Benutzernamen und Ihr Passwort ein (falls erforderlich). Klicken Sie auf die Schaltfläche „Verbinden“.

See all articles