Several currently commonly used solutions for metadata management

Mar 12, 2018, 09:16 AM
Metadata manage solution

Metadata is data that describes data: descriptive information about data and information resources.

More precisely, metadata is structured data that provides information about a resource, such as an information resource or a data object. It is used to identify and evaluate resources, track how they change during use, manage large volumes of networked data simply and efficiently, and support the discovery, lookup, integrated organization, and effective management of information resources.

Three approaches to metadata management are in common use today: managing metadata on a central node, managing it on distributed nodes, and a metadata-free design. This article describes the characteristics of each.

1. Managing metadata on a central node

When designing a distributed (storage) system, using a central node is a simple and clear solution. The central node usually handles metadata storage and queries, cluster node status management, decision-making, and task dispatch.

Advantages:

A. Because metadata is managed centrally, the statistical analysis needed for cluster operation and maintenance is easy to carry out;

B. The central node records the status information of user data (that is, the metadata). When the cluster is expanded, the rebalance operation can be skipped (the data migration caused by a rebalance can carry a huge performance overhead), and existing data can still be addressed normally.
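Advantage B can be illustrated with a minimal sketch. The class `CentralMetadataNode`, its least-loaded placement policy, and the node names are all hypothetical and chosen only for illustration; real systems record far more per object. The point is that lookups read a recorded location, so adding a node does not invalidate any existing entry.

```python
class CentralMetadataNode:
    """Toy central node: records which data node holds each object."""

    def __init__(self, data_nodes):
        self.data_nodes = list(data_nodes)
        self.locations = {}  # object key -> data node name (the metadata)
        self.usage = {node: 0 for node in self.data_nodes}

    def add_node(self, node):
        # Expansion: the new node is only a candidate for future writes.
        # Nothing already stored is migrated.
        self.data_nodes.append(node)
        self.usage[node] = 0

    def place(self, key):
        # Hypothetical policy: write to the least-loaded node and record it.
        node = min(self.data_nodes, key=lambda n: self.usage[n])
        self.locations[key] = node
        self.usage[node] += 1
        return node

    def lookup(self, key):
        # Addressing is a query against recorded metadata, not a computation.
        return self.locations[key]


meta = CentralMetadataNode(["node-a", "node-b"])
for i in range(4):
    meta.place(f"obj-{i}")

before = dict(meta.locations)
meta.add_node("node-c")   # expand without rebalancing
meta.place("obj-4")       # the new object lands on the empty node
```

After the expansion, every key in `before` still resolves to the same node, and only the new object uses `node-c`.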

Disadvantages and solutions:

a. Single point of failure. This is one of the problems most to be avoided when designing distributed systems, and the simple central node design introduces it, which raises the question of how to implement HA. Solution: (1) Use an active-standby model in which the standby receives incremental or full data synchronously or asynchronously (as in TFS, MFS, HDFS 2.0, and others), or keep the metadata on remote shared storage (as HDFS 2.0 can; the remote storage itself must be highly available);

b. Performance and capacity expansion have an upper limit. The hardware of a single central node can only be scaled up so far, and addressing depends on querying that node. Even if clients cache metadata or a cache cluster is used, the limit cannot be fundamentally removed, and in some scenarios (such as massive numbers of small files) the problem persists. Solution: (1) Upgrade the hardware, for example with SSDs and large-memory machines; (2) When this limit is reached, consider a distributed metadata management solution.
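The active-standby idea in point a can be sketched as follows. This is a toy model of synchronous, incremental replication; the classes `MetadataReplica` and `ActiveNode` are hypothetical and do not reflect how TFS, MFS, or HDFS actually implement HA (failure detection, fencing, and log persistence are all omitted).

```python
class MetadataReplica:
    """Holds a copy of the metadata table and applies log entries to it."""

    def __init__(self):
        self.table = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, entry):
        key, data_node = entry
        self.table[key] = data_node
        self.applied += 1


class ActiveNode(MetadataReplica):
    """Active node: applies each write locally and forwards it to the standby."""

    def __init__(self, standby):
        super().__init__()
        self.standby = standby
        self.log = []

    def write(self, key, data_node):
        entry = (key, data_node)
        self.log.append(entry)
        self.apply(entry)
        # Synchronous, incremental replication: the write is complete only
        # after the standby has applied the same entry.
        self.standby.apply(entry)


standby = MetadataReplica()
active = ActiveNode(standby)
active.write("obj-1", "node-a")
active.write("obj-2", "node-b")

# If the active node fails now, the standby can serve the same lookups.
failover_view = dict(standby.table)
```

An asynchronous variant would return from `write` before the standby applies the entry, trading a window of possible data loss for lower write latency.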

2. Distributed management of metadata

This approach is similar to the central node solution, except that the metadata is sharded and stored on a set of distributed nodes. It keeps the advantages of the central node solution while removing the upper limit on performance and capacity expansion. Because multiple nodes serve metadata queries at the same time, system performance also improves.
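A minimal sketch of sharded metadata, assuming the simplest possible routing rule (hash of the key modulo the shard count). The class `ShardedMetadata` is hypothetical, and each shard is modeled as an in-memory dictionary standing in for a separate metadata node.

```python
import hashlib


class ShardedMetadata:
    """Toy metadata service split across several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for one metadata node.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # Stable hash so every client routes a key to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def record(self, key, data_node):
        self.shards[self._shard_for(key)][key] = data_node

    def lookup(self, key):
        return self.shards[self._shard_for(key)][key]


service = ShardedMetadata(shard_count=4)
for i in range(100):
    service.record(f"obj-{i}", f"node-{i % 3}")

entries_per_shard = [len(shard) for shard in service.shards]
```

Each key lives in exactly one shard, so query load and storage are spread across the metadata nodes. Modulo routing is used here only for brevity: changing the shard count would remap most keys, which is one reason real systems need a more careful scheme.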

Disadvantages:

Systems of this type are relatively rare. Their structure is complex and they are difficult to implement:

a. The system contains two relatively independent kinds of distributed nodes: metadata nodes and data nodes. Both are stateful, so the distributed module formed by each kind must make its own trade-offs under the CAP theorem and must be scalable. Metadata in particular has stricter consistency requirements;

b. The metadata nodes must jointly maintain the status of the data nodes and reach consistent decisions when that status changes. These requirements make the system very challenging to design and implement;

c. The storage hardware needed for a large volume of metadata is also a cost that cannot be ignored.

Both of the above solutions share the same idea: record and maintain the status of the data (that is, the metadata). To address a piece of data, the client first queries the metadata server and then accesses the actual data.

3. Metadata-free design

Ceph is the main example here. Unlike the two approaches above, systems of this type compute addresses with an algorithm. One input to the addressing algorithm is the cluster status (some description of the data nodes' distribution topology, weights, process status, and so on). Common algorithms of this kind include consistent hashing and the CRUSH algorithm of the Ceph RADOS system. Such algorithms usually do not manage user data directly. Instead, they introduce an intermediate layer of logical shards (such as the ring segments of consistent hashing or the placement groups of Ceph). These shards are coarser-grained, limited in number, and relatively fixed. Each piece of user data belongs to exactly one shard, and the system manages and maintains user data by managing and maintaining the shards. Some such systems also have central configuration management nodes (such as the Ceph RADOS monitor), but these only manage and maintain important state such as cluster and shard status; they do not store or serve metadata for user data.
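The two-level computation described above (object to logical shard, then shard to data nodes) can be sketched as follows. This is not CRUSH: the second step uses rendezvous (highest-random-weight) hashing as a simple stand-in, it ignores topology and weights, and the function names, the `cluster` dictionary, and the `osd-*` node names are all assumptions made for illustration.

```python
import hashlib


def _h(*parts):
    """Stable 64-bit hash of the given parts."""
    data = "/".join(str(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def placement_group(key, pg_count):
    # Step 1: every object maps to exactly one logical shard.
    return _h("obj", key) % pg_count


def nodes_for_pg(pg, nodes, replicas):
    # Step 2: rank the nodes by a per-(shard, node) score and keep the top ones.
    ranked = sorted(nodes, key=lambda n: _h("pg", pg, n), reverse=True)
    return ranked[:replicas]


def locate(key, cluster):
    """Compute where an object lives from the cluster description alone."""
    pg = placement_group(key, cluster["pg_count"])
    return pg, nodes_for_pg(pg, cluster["nodes"], cluster["replicas"])


# The only state a client needs: a small, relatively fixed cluster description.
cluster = {
    "pg_count": 64,
    "nodes": ["osd-0", "osd-1", "osd-2", "osd-3"],
    "replicas": 2,
}
pg, targets = locate("photos/cat.jpg", cluster)
```

No per-object record is stored anywhere: any client holding the same `cluster` description computes the same result for the same key.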

Advantages:

A. As described above, the system only manages and maintains logical shards, cluster status, and similar information; it does not store or manage metadata for user data. This greatly improves scalability, and the benefit is especially clear in scenarios with large amounts of metadata;

B. The parameter data required by the addressing algorithm is small and relatively fixed. Each client can cache it, so many clients can compute addresses in parallel, which avoids an addressing performance bottleneck.

Disadvantage analysis:

a. When the cluster is expanded (or even when weights are changed), a rebalance is required. For clusters with large data volumes (PB level and above) in particular, the resulting mass data migration keeps the cluster under high load, which degrades the latency, IOPS, and other performance indicators of normal business requests. Yet in some expansion scenarios a rebalance is exactly what is not wanted (for example, when the cluster is being expanded because capacity is running out). A common strategy is to evaluate the performance and capacity needs of each cluster in advance and, when expansion is needed, create a new cluster directly. If a single cluster must be rebalanced, reduce the cluster load through manual intervention and rate limiting. As for the root cause of rebalancing, I believe it is that expansion changes the cluster status, which changes the results of the addressing algorithm, so the final data distribution has to change as well;

b. The replica locations of the data are computed by the addressing algorithm. They are relatively fixed and can hardly be adjusted manually, although the overall data distribution can usually be changed by changing the weights;

c. The central configuration management node only manages shard information and knows nothing about individual pieces of user data. Statistical analysis requirements must therefore be met by periodically collecting information from the data nodes and then storing and maintaining it.
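The root cause named in point a (a change in cluster status changes the output of the addressing algorithm) can be observed with a small experiment. As a hedged stand-in for CRUSH or a consistent hashing ring, the sketch below uses rendezvous hashing to pick one primary node per logical shard; the shard count and the `osd-*` node names are hypothetical.

```python
import hashlib


def score(pg, node):
    """Stable per-(shard, node) score."""
    data = f"{pg}/{node}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def primary(pg, nodes):
    # The node with the highest score owns the shard.
    return max(nodes, key=lambda n: score(pg, n))


PG_COUNT = 256
old_nodes = ["osd-0", "osd-1", "osd-2", "osd-3"]
new_nodes = old_nodes + ["osd-4"]  # cluster expansion

# Shards whose computed owner differs before and after the expansion.
moved = [
    pg for pg in range(PG_COUNT)
    if primary(pg, old_nodes) != primary(pg, new_nodes)
]
print(f"{len(moved)} of {PG_COUNT} logical shards change owner")
```

Every shard in `moved` now maps to `osd-4`, so all of its data must be migrated there before the cluster matches what the algorithm computes. With this scheme roughly one shard in five is expected to move when a fifth node is added; on a PB-scale cluster that is the migration load described above.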

Summary: As the comparison above shows, the addressing strategy of each of the three types of system gives it its own advantages and disadvantages. None of them is perfect, but each has scenarios and workloads it suits. When designing or selecting a system, all of these factors need to be weighed together.
