How to understand what distributed databases are

Oct 30, 2020, 02:24 PM
Commonly used distributed databases include: 1. Elasticsearch, whose shards can live on a single node or be spread across multiple nodes; 2. Redis, which supports rich data types; 3. MongoDB, whose document model makes data convenient to retrieve; 4. MySQL distributed cluster, which offers high availability.

Distributed databases include:

1. Elasticsearch database

1. Introduction to Elasticsearch

Elasticsearch is a distributed, real-time document store in which every field is indexed and searchable. It is also a distributed, real-time search and analytics engine.

It can scale out to hundreds of servers and handle petabytes of structured or unstructured data.

2. Elasticsearch application scenarios

Elasticsearch serves as a distributed search engine and data analysis engine, covering full-text retrieval, structured retrieval, and data analysis.

Typical uses are near-real-time processing of massive data sets, on-site search (e-commerce, recruitment, portals, etc.), search inside IT systems (OA, CRM, ERP, etc.), and data analysis.

3. Advantages and disadvantages of Elasticsearch

Disadvantages: older releases shipped without built-in user authentication and permission control (basic security features have been part of the free distribution since versions 6.8 and 7.1); there is no concept of transactions and no rollback support; accidentally deleted data cannot be restored; and it runs on the JVM, so a Java environment is required (releases from 7.0 onward bundle a JDK).

Advantages: Documents are split into separate containers called shards, which can live on a single node or be spread across multiple nodes.

Each shard is replicated, which provides a backup copy and prevents data loss when hardware fails.

A request can be sent to any node in the cluster, and that node routes it to the nodes holding the data you need. As the cluster grows, new nodes are integrated without downtime, and shards are redistributed to recover the data of a lost node.
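The sharding, replication, and routing ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: CRC32 stands in for the hash function Elasticsearch really uses, and `pick_shard`, `place_replicas`, and the node names are hypothetical helpers, not part of any Elasticsearch API.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (for example a document ID) to a primary shard."""
    # Assumption: CRC32 is a stand-in hash; the point is only the
    # "hash modulo number of shards" shape of the routing decision.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def place_replicas(num_shards: int, nodes: list, replicas: int = 1) -> dict:
    """Give each shard a primary node plus `replicas` copies on other nodes."""
    if replicas >= len(nodes):
        raise ValueError("need more nodes than replicas so copies land on distinct nodes")
    layout = {}
    for shard in range(num_shards):
        # Consecutive nodes (wrapping around) hold the copies of this shard.
        layout[shard] = [nodes[(shard + i) % len(nodes)] for i in range(replicas + 1)]
    return layout


layout = place_replicas(num_shards=3, nodes=["node-a", "node-b", "node-c"], replicas=1)
shard = pick_shard("order-1001", 3)
print(f"document 'order-1001' -> shard {shard}, copies on {layout[shard]}")
```

Because every node can compute the same mapping, any node can accept a request and forward it to a node that holds the right shard. If one node is lost, each shard still has a copy on another node.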

4. Elasticsearch persistence solution

The gateway is the component that persists Elasticsearch index data. By default, Elasticsearch first keeps index changes in memory and persists them to disk when memory fills up; a background queue also writes the index to disk automatically when the system is idle. When the cluster is shut down and restarted, index data is read back from the gateway. Older releases of Elasticsearch supported several gateway types: the local file system (the default), distributed file systems, Hadoop's HDFS, and Amazon's S3 cloud storage service. Current releases store indices on local disk and reach remote storage such as HDFS or S3 through snapshot repositories.
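The buffer-then-persist behavior described above can be illustrated with a small Python sketch. It is not how Elasticsearch is implemented: `BufferedIndexWriter`, the JSON-lines file, and the `max_buffered` threshold are assumptions chosen only to show the idea of collecting writes in memory and flushing them to disk in batches.

```python
import json
import os
import tempfile


class BufferedIndexWriter:
    """Hypothetical writer: keeps documents in memory, flushes them in batches."""

    def __init__(self, path, max_buffered=3):
        self.path = path
        self.max_buffered = max_buffered
        self.buffer = []

    def index(self, doc):
        self.buffer.append(doc)
        # "Memory is full" is modeled as a simple document-count threshold.
        if len(self.buffer) >= self.max_buffered:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            for doc in self.buffer:
                f.write(json.dumps(doc) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the disk
        self.buffer.clear()


def read_docs(path):
    """Read back whatever has reached the disk so far."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


with tempfile.TemporaryDirectory() as workdir:
    writer = BufferedIndexWriter(os.path.join(workdir, "index.jsonl"), max_buffered=3)
    for i in range(4):
        writer.index({"id": i, "title": f"doc {i}"})
    on_disk_before = len(read_docs(writer.path))  # fourth document is still buffered
    writer.flush()                                # e.g. triggered when the system is idle
    on_disk_after = len(read_docs(writer.path))

print(on_disk_before, on_disk_after)
```

The gap between `on_disk_before` and `on_disk_after` is the data that would be lost in a crash if nothing else protected the buffered writes.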

2. Redis database

1. Introduction to Redis

Redis is an open-source (originally BSD-licensed) advanced key-value store, usually classed as NoSQL. Besides simple key-value data, its values can be strings, hashes, lists, sets, and sorted sets (zset), so it is often described as a data structure server. Redis supports persistence: it can save in-memory data to disk and load it again after a restart. Redis also supports data backup through master-slave replication.

2. Redis application scenarios

A) Counters: number of followers, number of Weibo posts

B) User information that changes frequently

C) Caching, for example as a cache in front of MySQL

D) Queue systems: prioritized queues and log collection

3. Advantages and disadvantages of Redis

Advantages:

(1) It is fast because data is held in memory. Like a HashMap, key lookups and updates take O(1) time on average.

(2) It supports rich data types: string, list, set, sorted set, and hash.

(3) It supports transactions, and each individual command is atomic. Commands queued with MULTI/EXEC run as one uninterrupted unit. Note that Redis does not roll back commands that have already run if a later command in the same transaction fails.

(4) Rich features: it can be used for caching and messaging, and an expiration time can be set per key, after which the key is deleted automatically.
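To make the caching and per-key expiration points concrete, here is a minimal in-process sketch in Python. It does not talk to Redis at all: `ExpiringCache`, `get_user`, and `load_from_db` are hypothetical stand-ins that mimic the behavior of setting a key with a time to live and reading through the cache before falling back to a slower database.

```python
import time


class ExpiringCache:
    """In-process stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired keys are dropped when they are read
            return None
        return value


def get_user(user_id, cache, load_from_db, ttl_seconds=60):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl_seconds)
    return value


# A fake clock keeps the demo deterministic.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
db_calls = []


def load_from_db(user_id):
    db_calls.append(user_id)  # record every trip to the slow database
    return {"id": user_id, "name": "demo"}


get_user(7, cache, load_from_db)  # miss: loads from the database
get_user(7, cache, load_from_db)  # hit: served from the cache
now[0] = 61.0                     # more than 60 seconds later
get_user(7, cache, load_from_db)  # expired: loads from the database again
print(db_calls)
```

Only the first and third reads reach the database, which is the effect a cache in front of MySQL is meant to achieve.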

Disadvantages:

(1) A plain master-slave deployment has no automatic fault tolerance or recovery (Redis Sentinel and Redis Cluster were added to address this). When the master or a slave goes down, some front-end read and write requests fail until the machine restarts or the front-end IP is switched manually.

(2) If the master goes down before some data has been synchronized to the slave, switching IPs introduces data inconsistency, which reduces the availability of the system.

(3) Master-slave replication relies on full synchronization. The master forks a child process, which takes a snapshot of memory, saves it as a file, and sends it to the slave. The master must have enough free memory for this, and a large snapshot file noticeably affects the cluster's ability to serve requests. A full synchronization runs whenever a slave newly joins the cluster and, in versions before 2.8, whenever a slave reconnects after a network interruption. In those versions, network fluctuations therefore trigger full data copies between master and slave, which causes considerable operational trouble. Since Redis 2.8, a reconnecting slave can often resume with a partial resynchronization instead.

(4) Online expansion is difficult. Once the cluster reaches its capacity limit, expanding it online becomes very complicated. To avoid this, operations staff must reserve enough space when the system goes live, which wastes a lot of resources.

4. Redis persistence solution

Redis provides two persistence methods. RDB persistence periodically dumps the in-memory data set to disk. AOF (append-only file) persistence appends every write operation that Redis executes to a log file.

RDB persistence writes a snapshot of the in-memory data set to disk at specified intervals. In practice, Redis forks a child process that first writes the data set to a temporary file. Once the write succeeds, the temporary file replaces the previous snapshot file, which is stored in a compressed binary format.
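Both persistence styles can be sketched in plain Python. This is an illustration only, under clear assumptions: JSON replaces Redis's binary formats, no child process is forked, and `save_snapshot`, `append_command`, and `replay_log` are hypothetical helper names rather than Redis commands.

```python
import json
import os
import tempfile


def save_snapshot(data, path):
    """RDB-style: write the whole data set to a temp file, then swap it in."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    # The old snapshot stays intact until the new one is completely written.
    os.replace(tmp_path, path)


def append_command(log_path, command, key, value=None):
    """AOF-style: append one write operation to the end of the log."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps([command, key, value]) + "\n")


def replay_log(log_path):
    """Rebuild the data set by re-running every logged operation in order."""
    data = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            command, key, value = json.loads(line)
            if command == "SET":
                data[key] = value
            elif command == "DEL":
                data.pop(key, None)
    return data


with tempfile.TemporaryDirectory() as workdir:
    snapshot_path = os.path.join(workdir, "dump.json")
    log_path = os.path.join(workdir, "appendonly.log")

    save_snapshot({"fans": 10}, snapshot_path)
    save_snapshot({"fans": 11}, snapshot_path)  # replaces the previous snapshot
    with open(snapshot_path, encoding="utf-8") as f:
        restored_snapshot = json.load(f)

    append_command(log_path, "SET", "fans", 10)
    append_command(log_path, "SET", "posts", 3)
    append_command(log_path, "DEL", "posts")
    restored_from_log = replay_log(log_path)

print(restored_snapshot, restored_from_log)
```

The snapshot restores the state as of the last dump, while the log restores the state after the last logged operation. That difference is the usual trade-off between the two methods.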

3. MongoDB database

1. Introduction to MongoDB

MongoDB is a non-relational database. Each record is a document, and each document consists of a set of key-value pairs. MongoDB documents are similar to JSON objects, and the value of a field may itself be another document, an array, and so on.
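A document of this shape can be pictured as a nested Python dictionary. The field names below and the `get_path` helper are made up for illustration; the helper only imitates, in a simplified way, how a dotted path reaches into embedded documents and arrays.

```python
import json

# A hypothetical blog-post document with an embedded document and arrays.
post = {
    "_id": "post-1",
    "title": "Distributed databases",
    "author": {"name": "Li", "followers": 120},        # embedded document
    "tags": ["database", "nosql"],                      # array of values
    "comments": [{"user": "wang", "text": "Nice"}],     # array of documents
}


def get_path(document, dotted_path):
    """Follow a dotted path such as 'author.name' through nested values."""
    current = document
    for part in dotted_path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric parts index into arrays
        else:
            current = current[part]
    return current


print(get_path(post, "author.name"))
print(get_path(post, "comments.0.user"))
print(json.dumps(post, indent=2))
```

Because the whole post, its author, and its comments live in one record, reading it back does not require joining several tables.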

2. MongoDB application scenarios

MongoDB's main goal is to bridge key/value stores (high performance and high scalability) and traditional RDBMS systems (rich functionality), combining the strengths of both. MongoDB suits the following scenarios:

a. Website data: MongoDB is well suited to real-time inserts, updates, and queries, and it offers the replication and high scalability that real-time website data storage requires.

b. Caching: Thanks to its high performance, MongoDB also works as a caching layer for information infrastructure. After a system restart, the persistent cache built on MongoDB keeps the underlying data source from being overloaded.

c. Large-volume, low-value data: Storing such data in a traditional relational database can be expensive. Before MongoDB, many programmers fell back on plain files for it.

d. High-scalability scenarios: MongoDB is well suited to databases made up of dozens or hundreds of servers.

e. Storing objects and JSON data: MongoDB's BSON data format is well suited to storing and querying document-structured data.

3. Advantages and disadvantages of MongoDB

Advantages:

(1) Weak consistency (eventual consistency), which favors fast user access

(2) The document-structured storage model makes data convenient to retrieve

(3) Built-in GridFS supports large-capacity storage

(4) In reported use cases with tens of millions of document objects (nearly 10 GB of data), queries on the indexed ID were no slower than in MySQL, while queries on non-indexed fields were faster across the board.

Disadvantages:

(1) No transaction support in older releases (multi-document transactions were added in MongoDB 4.0)

(2) Uses a lot of space, which wastes disk

(3) Relatively poor reliability on a single machine

(4) Write performance fluctuates considerably under sustained bulk inserts

4. MongoDB persistence solution and exception handling

When performing a write operation, MongoDB creates a journal entry containing the exact disk location and the changed bytes. If the server crashes suddenly, the journal is used on the next start to replay any writes that had not been flushed to disk before the crash.

By default, data files are flushed to disk every 60 seconds, so the journal only needs to hold about 60 seconds of written data. The journal preallocates several empty files for this purpose, located in /data/db/journal and named _j.0, _j.1, and so on.

After MongoDB has been running for a long time, the journal directory contains files such as _j.6217, _j.6218, and _j.6219. These are the current journal files, and the numbers keep increasing as long as MongoDB is running. When MongoDB shuts down cleanly, these files are removed because they are not needed after a clean shutdown.

If the server crashes or is killed with kill -9, MongoDB replays the journal files on the next start and prints lengthy, hard-to-read verification lines, which indicate a normal recovery.
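The journal-and-replay mechanism can be modeled in a few lines of Python. This is a toy model, not MongoDB's storage engine: `JournaledStore` is a hypothetical class in which a byte array plays the role of the data file, and each journal entry records an offset and the changed bytes, as described above.

```python
class JournaledStore:
    """Toy model of journaled writes over a fixed-size data file."""

    def __init__(self, size):
        self.disk = bytearray(size)    # stands in for the data file on disk
        self.memory = bytearray(size)  # in-memory view that takes writes first
        self.journal = []              # entries of (offset, changed bytes)

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > len(self.memory):
            raise ValueError("write falls outside the data file")
        # Record the change in the journal before applying it.
        self.journal.append((offset, bytes(payload)))
        self.memory[offset:offset + len(payload)] = payload

    def flush(self):
        """Periodic flush: the data file catches up, so the journal can be dropped."""
        self.disk[:] = self.memory
        self.journal.clear()

    def crash_and_recover(self):
        """Simulate a crash: memory is lost, the data file and journal survive."""
        self.memory = bytearray(self.disk)
        for offset, payload in self.journal:
            # Replay every write that had not reached the data file.
            self.memory[offset:offset + len(payload)] = payload
        self.flush()


store = JournaledStore(8)
store.write(0, b"abcd")
store.flush()               # first write is safely in the data file
store.write(4, b"efgh")     # second write exists only in memory and the journal
store.crash_and_recover()
print(bytes(store.disk))
```

After recovery the data file contains both writes, even though the second one had never been flushed before the simulated crash.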

4. MySQL distributed cluster

1. Introduction to MySQL distributed cluster

MySQL Cluster is a storage solution based on a shared-nothing, distributed node architecture, designed to provide fault tolerance and high performance.

Data updates use the read-committed isolation level to keep data consistent across all nodes, and a two-phase commit mechanism ensures that all nodes hold the same data (if any write operation fails, the whole update fails).
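The two-phase commit idea can be sketched as follows. This is a simplified model and not MySQL Cluster's actual protocol: `Node` and `two_phase_commit` are hypothetical names, failures are simulated with a flag, and network errors and coordinator crashes are ignored.

```python
class Node:
    """Hypothetical data node that can stage a write and then commit it."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # False simulates a node that cannot write
        self.data = {}
        self._pending = None

    def prepare(self, key, value):
        """Phase 1: stage the write and vote yes or no."""
        if not self.can_commit:
            return False
        self._pending = (key, value)
        return True

    def commit(self):
        """Phase 2: make the staged write visible."""
        key, value = self._pending
        self.data[key] = value
        self._pending = None

    def rollback(self):
        """Discard the staged write."""
        self._pending = None


def two_phase_commit(nodes, key, value):
    """Apply the write on every node, or on none of them."""
    prepared = []
    for node in nodes:
        if node.prepare(key, value):
            prepared.append(node)
        else:
            # One "no" vote aborts the update everywhere.
            for staged in prepared:
                staged.rollback()
            return False
    for node in nodes:
        node.commit()
    return True


healthy = [Node("a"), Node("b"), Node("c")]
committed = two_phase_commit(healthy, "stock", 5)

degraded = [Node("a"), Node("b", can_commit=False), Node("c")]
aborted = two_phase_commit(degraded, "stock", 5)

print(committed, [n.data for n in healthy])
print(aborted, [n.data for n in degraded])
```

In the second run a single failing node causes the whole update to fail, so no node ends up with data the others lack.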

Shared-nothing peer nodes make an update performed on one server immediately visible on the other servers. Updates are propagated through a sophisticated communication mechanism designed for high throughput across the network.

Load is distributed across multiple MySQL servers to maximize application performance, and data is stored in different locations to provide high availability and redundancy.
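As a rough illustration of spreading load across several MySQL servers, the sketch below rotates through a list of servers and skips any that are marked as down. `RoundRobinPool` and the server names are hypothetical; real deployments usually rely on a proxy or a connector feature for this, and MySQL Cluster does not necessarily use round-robin selection.

```python
import itertools


class RoundRobinPool:
    """Hypothetical pool that hands out servers in rotation, skipping failed ones."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.down = set()
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy server available")


pool = RoundRobinPool(["sql-1", "sql-2", "sql-3"])
first_round = [pool.next_server() for _ in range(3)]

pool.mark_down("sql-2")  # simulate a failed server
after_failure = [pool.next_server() for _ in range(4)]

print(first_round)
print(after_failure)
```

Requests keep flowing to the remaining servers after one fails, which is the availability benefit of having redundant servers behind a balancer.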

2. MySQL distributed cluster application scenarios

Solving mass-storage problems, such as the MySQL distributed cluster used by Jingdong's B2B platform.

Databases that must serve billions of page views (PV).

3. Advantages and disadvantages of MySQL distributed cluster

Advantages:

a) High availability

b) Fast automatic failover

c) Flexible distributed architecture with no single point of failure

d) High throughput and low latency

e) Strong scalability, with support for online expansion

Disadvantages:

a) Many limitations; for example, foreign keys are not supported in releases before MySQL Cluster 7.3

b) Deployment, management, and configuration are complex

c) It uses a lot of disk space and memory

d) Backup and recovery are inconvenient

e) On restart, data nodes take a long time to load data into memory

4. MySQL distributed cluster persistence solution

Load balancing.

Manage node backup.
