How to Write a NoSQL Database in Python (with Sample Code)
黄舟
Release: 2017-07-18 11:12:57
The term NoSQL has become ubiquitous in recent years. But what exactly does "NoSQL" refer to? How and why is it useful? In this article, we will answer these questions by writing a NoSQL database in pure Python (or, as I prefer to call it, "lightly structured pseudocode").
OldSQL
In many cases, SQL has become a synonym for "database". In fact, SQL is an acronym for Structured Query Language and does not refer to the database technology itself. Rather, it refers to the language used to retrieve data from an RDBMS (Relational Database Management System). MySQL, MS SQL Server and Oracle are all examples of RDBMSs.
The R in RDBMS, that is, "Relational", is the richest part. Data is organized into tables, and each table is made up of columns, each with an associated type. The tables, their columns and the columns' types are together called the schema of the database. The schema completely describes the structure of the database through the description of each table. For example, a table called Car may have the following columns:
Make: a string
Model: a string
Year: a four-digit number; alternatively, a date
Color: a string
VIN (Vehicle Identification Number): a string
In a table, each single entry is called a row, or a record. To distinguish one record from another, a primary key is usually defined. The primary key of a table is one of its columns that uniquely identifies each row. In the table Car, VIN is a natural choice of primary key because it guarantees that each car has a unique identifier. Two different rows may have the same values in the Make, Model, Year and Color columns, but different cars will always have different VINs. Conversely, if two rows have the same VIN, we can conclude that they refer to the same car without checking the other columns.
Querying
SQL allows us to obtain useful information by querying the database. Simply put, a query asks the RDBMS a question in a structured language and interprets the returned rows as the answer. Assume that the database represents all registered vehicles in the United States. To obtain all records, we can run the following SQL query against the database:
SELECT Make, Model FROM Car;
Roughly translated into plain English:
"SELECT" : "Show me"
"Make, Model" : "the values of Make and Model"
"FROM Car" : "for each row in table Car"
That is, "Show me the values of Make and Model for each row in table Car". Executing the query returns a set of results, each consisting of a Make and a Model. If we only care about the colors of cars registered in 1994, we can run:
SELECT Color FROM Car WHERE Year = 1994;
At this point, we will get a list similar to the following:
Black
Red
Red
White
Blue
Black
White
Yellow
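The two queries above can be tried from Python with the standard-library sqlite3 module. This is only a sketch: the in-memory database, the sample rows and the short VIN values are made up for illustration, and SQLite merely stands in for whichever RDBMS is actually used.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Car ("
    "VIN TEXT PRIMARY KEY, Make TEXT, Model TEXT, Year INTEGER, Color TEXT)"
)

# Made-up sample rows, for illustration only.
sample_cars = [
    ("VIN0001", "Lexus", "RX350", 2013, "Black"),
    ("VIN0002", "Honda", "Civic", 1994, "Red"),
    ("VIN0003", "Ford", "Taurus", 1994, "White"),
]
conn.executemany("INSERT INTO Car VALUES (?, ?, ?, ?, ?)", sample_cars)

# "Show me the values of Make and Model for each row in table Car."
makes_and_models = conn.execute("SELECT Make, Model FROM Car").fetchall()

# Only the colors of the cars from 1994.
colors_1994 = [
    row[0]
    for row in conn.execute("SELECT Color FROM Car WHERE Year = ?", (1994,))
]

print(makes_and_models)
print(colors_1994)
```

Without an ORDER BY clause the order of the returned rows is not guaranteed, which is why the sketch does not promise a particular ordering.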
Finally, we can select a single vehicle by using the table's primary key, here the VIN:
SELECT * FROM Car WHERE VIN = '2134AFGER245267'
The above query returns the attributes of the specified vehicle.
A primary key is, by definition, unique: a vehicle with a given VIN can appear at most once in the table. This is very important. Why? Let's look at an example.
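Before turning to that example, the uniqueness rule itself can be checked with a small sqlite3 sketch. The reduced Car table and the second (Honda) row are assumptions made for illustration; the point is only that the database refuses a second row with the same VIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (VIN TEXT PRIMARY KEY, Make TEXT, Model TEXT)")
conn.execute("INSERT INTO Car VALUES ('2134AFGER245267', 'Lexus', 'RX350')")

try:
    # Same VIN again: the primary key constraint rejects the row.
    conn.execute("INSERT INTO Car VALUES ('2134AFGER245267', 'Honda', 'Civic')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Car").fetchone()[0]
print(duplicate_rejected, row_count)
```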
Relations
Suppose we are running a car repair business. Among other things, we need to track each car's service history, that is, the record of all repairs performed on that car. We might create a ServiceHistory table containing the following columns:
VIN
Make
Model
Year
Color
Service Performed
Mechanic
Price
Date
In this way, every time a vehicle is repaired, we add a new row to the table recording what was done, which mechanic did it, how much it cost, when it was done, and so on.
But wait: for a given vehicle, the columns that describe the vehicle itself never change. In other words, if I have my black 2014 Lexus RX 350 serviced 10 times, the Make, Model, Year and Color are recorded again every time even though they do not change. Rather than repeating this information redundantly, a more reasonable approach is to store it only once and look it up when needed.
So what can we do? We can create a second table, Vehicle, which has the following columns:
VIN
Make
Model
Year
Color
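In this way, the ServiceHistory table can be simplified: it no longer needs to repeat the vehicle's Make, Model, Year and Color, because the VIN alone is enough to look them up in Vehicle.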
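The split into Vehicle and ServiceHistory can be sketched with sqlite3 as follows. It assumes that ServiceHistory keeps only the VIN plus the service-specific columns; the column name ServicePerformed (written without a space), the mechanics' names, the prices and the dates are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Vehicle ("
    "VIN TEXT PRIMARY KEY, Make TEXT, Model TEXT, Year INTEGER, Color TEXT)"
)
# ServiceHistory keeps only the VIN; the other vehicle columns live in Vehicle.
conn.execute(
    "CREATE TABLE ServiceHistory ("
    "VIN TEXT REFERENCES Vehicle(VIN), ServicePerformed TEXT, "
    "Mechanic TEXT, Price REAL, Date TEXT)"
)

conn.execute(
    "INSERT INTO Vehicle VALUES "
    "('2134AFGER245267', 'Lexus', 'RX350', 2014, 'Black')"
)
# Hypothetical service records for the same vehicle.
conn.executemany(
    "INSERT INTO ServiceHistory VALUES (?, ?, ?, ?, ?)",
    [
        ("2134AFGER245267", "Oil change", "Alice", 40.0, "2017-01-10"),
        ("2134AFGER245267", "Brake pads", "Bob", 120.0, "2017-03-02"),
    ],
)

# Look the vehicle details up only when needed, by joining on VIN.
rows = conn.execute(
    "SELECT v.Make, v.Model, s.ServicePerformed, s.Price "
    "FROM ServiceHistory AS s JOIN Vehicle AS v ON v.VIN = s.VIN "
    "ORDER BY s.Date"
).fetchall()
for row in rows:
    print(row)
```

The vehicle's details are stored once, however many times it is serviced, and the join recovers them on demand.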
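One last small point to note is the DATA dictionary. Because it looks so unimportant, you may well have overlooked it. DATA holds the key-value pairs that are actually stored; they are what actually make up our database.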
Command Parser
Next, let's look at the command parser, which is responsible for interpreting the messages it receives:
def parse_message(data):
    """Return a tuple containing the command, the key, and (optionally) the
    value cast to the appropriate type."""
    command, key, value, value_type = data.strip().split(';')
    if value_type:
        if value_type == 'LIST':
            value = value.split(',')
        elif value_type == 'INT':
            value = int(value)
        else:
            value = str(value)
    else:
        value = None
    return command, key, value


def update_stats(command, success):
    """Update the STATS dict with info about if executing *command* was a
    *success*"""
    if success:
        STATS[command]['success'] += 1
    else:
        STATS[command]['error'] += 1

def handle_put(key, value):
    """Return a tuple containing True and the message to send back to the
    client."""
    DATA[key] = value
    return (True, 'key [{}] set to [{}]'.format(key, value))


def handle_get(key):
    """Return a tuple containing True if the key exists and the message to send
    back to the client"""
    if key not in DATA:
        return (False, 'Error: Key [{}] not found'.format(key))
    else:
        return (True, DATA[key])

def handle_putlist(key, value):
    """Return a tuple containing True if the command succeeded and the message
    to send back to the client."""
    return handle_put(key, value)

def handle_getlist(key):
    """Return a tuple containing True if the key contained a list and the
    message to send back to the client."""
    return_value = exists, value = handle_get(key)
    if not exists:
        return return_value
    elif not isinstance(value, list):
        return (False, 'ERROR: Key [{}] contains non-list value ([{}])'.format(
            key, value))
    else:
        return return_value

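def handle_increment(key):
    """Return a tuple containing True if the key's value could be incremented
    and the message to send back to the client."""
    return_value = exists, value = handle_get(key)
    if not exists:
        return return_value
    elif not isinstance(value, int):
        return (False, 'ERROR: Key [{}] contains non-int value ([{}])'.format(
            key, value))
    else:
        DATA[key] = value + 1
        return (True, 'Key [{}] incremented'.format(key))


def handle_append(key, value):
    """Return a tuple containing True if the key's value could be appended to
    and the message to send back to the client."""
    return_value = exists, list_value = handle_get(key)
    if not exists:
        return return_value
    elif not isinstance(list_value, list):
        return (False, 'ERROR: Key [{}] contains non-list value ([{}])'.format(
            key, list_value))
    else:
        DATA[key].append(value)
        return (True, 'Key [{}] had value [{}] appended'.format(key, value))
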
def handle_delete(key):
    """Return a tuple containing True if the key could be deleted and the
    message to send back to the client."""
    if key not in DATA:
        return (
            False,
            'ERROR: Key [{}] not found and could not be deleted.'.format(key))
    else:
        del DATA[key]
        # Report success so that the caller always receives a tuple.
        return (True, 'Key [{}] deleted'.format(key))


def handle_stats():
    """Return a tuple containing True and the contents of the STATS dict."""
    return (True, str(STATS))
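To see how the parser and the handlers fit together, here is a minimal, self-contained sketch. It assumes the message format implied by parse_message, namely COMMAND;key;value;type with empty fields left blank. The dispatch helper, the reduced STATS layout and the trimmed copies of the handlers are hypothetical and cover only PUT and GET.

```python
DATA = {}    # the key-value store itself
STATS = {    # per-command success/error counters (assumed layout)
    'PUT': {'success': 0, 'error': 0},
    'GET': {'success': 0, 'error': 0},
}


def parse_message(data):
    """Split 'COMMAND;key;value;type' and cast the value."""
    command, key, value, value_type = data.strip().split(';')
    if value_type == 'LIST':
        value = value.split(',')
    elif value_type == 'INT':
        value = int(value)
    elif not value_type:
        value = None
    return command, key, value


def handle_put(key, value):
    DATA[key] = value
    return (True, 'Key [{}] set to [{}]'.format(key, value))


def handle_get(key):
    if key not in DATA:
        return (False, 'ERROR: Key [{}] not found'.format(key))
    return (True, DATA[key])


def dispatch(message):
    """Hypothetical helper: parse, run the handler, record the outcome."""
    command, key, value = parse_message(message)
    if command == 'PUT':
        result = handle_put(key, value)
    elif command == 'GET':
        result = handle_get(key)
    else:
        return (False, 'Unknown command [{}]'.format(command))
    STATS[command]['success' if result[0] else 'error'] += 1
    return result


print(dispatch('PUT;foo;1;INT'))
print(dispatch('GET;foo;;'))
print(dispatch('GET;missing;;'))
print(STATS)
```

Every handler returns a (success, message) tuple, so the caller can update the statistics and reply to the client without knowing which command was executed.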
Let's take a look at handle_append. If we call handle_get and the key does not exist, we simply return whatever handle_get returned. For that, we want to refer to the tuple returned by handle_get as a single value, so that when the key does not exist we can just write return return_value.
If the key does exist, we need to inspect the returned value, so we also want to refer to the elements of the tuple as separate variables. To handle both cases we use multiple assignment, which avoids extra lines of code while keeping the code clear. return_value = exists, list_value = handle_get(key) makes explicit that we are going to refer to the return value of handle_get in at least two different ways.
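The multiple-assignment idiom can be tried in isolation. In this sketch, lookup is a hypothetical stand-in for handle_get and store is a hypothetical stand-in for DATA.

```python
def lookup(store, key):
    """Stand-in for handle_get: return (exists, value_or_message)."""
    if key not in store:
        return (False, 'not found')
    return (True, store[key])


store = {'colors': ['Red', 'Blue']}

# One statement binds the whole tuple AND its two unpacked elements.
return_value = exists, list_value = lookup(store, 'colors')

print(return_value)
print(exists)
print(list_value)
```

The right-hand side is evaluated once; the result is then bound first to return_value and then unpacked into exists and list_value.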
How Is This a Database?
The above program is obviously not an RDBMS, but it can definitely be called a NoSQL database. The reason it is so easy to create is that we never really interact with the data: we do minimal type checking and store whatever the user sends. If we needed to store more structured data, we would probably have to create a schema for the database and use it to store and retrieve the data. So if NoSQL databases are easier to write, easier to maintain, and easier to implement, why don't we all just use MongoDB? There is a reason, of course. Every gain has its price: the data flexibility that NoSQL databases give us has to be weighed against the searchability of the database.
Querying Data
Suppose we use the NoSQL database above to store the earlier Car data. We might use the VIN as the key and a list of the column values as the value, that is,
2134AFGER245267 = ['Lexus', 'RX350', 2013, 'Black']
Of course, we have lost the meaning of each index in the list. We simply have to remember that index 1 stores the Model of the car and index 2 stores the Year.
Worse, what happens when we want to run the earlier queries? Finding the colors of all the 1994 cars becomes a nightmare. We must traverse every value in DATA, determine whether the value holds car data or something unrelated, check whether index 2 equals 1994, and then fetch the value at index 3. This is worse than a table scan, because it not only visits every row of data but also has to apply some complicated rules to answer the query.
The authors of NoSQL databases are certainly aware of these problems, and (given that querying is a very useful feature) they have come up with ways to make querying less "out of reach". One approach is to give the stored data a structure, such as JSON, and to allow references to other rows in order to represent relationships. In addition, most NoSQL databases have a concept of namespaces: data of a single type can be stored in a "section" of the database reserved for that type, which lets the query engine take advantage of the "shape" of the data being queried.
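The scan described above can be sketched in a few lines. The sample contents of DATA, the looks_like_car heuristic and the NAMESPACED layout are assumptions made for illustration; they are not part of the database built in this article.

```python
# Values are positional lists: [Make, Model, Year, Color].
DATA = {
    '2134AFGER245267': ['Lexus', 'RX350', 2013, 'Black'],
    'VIN0002': ['Honda', 'Civic', 1994, 'Red'],
    'page_views': 42,               # unrelated data in the same store
    'tags': ['python', 'nosql'],    # a list, but not a car
}


def looks_like_car(value):
    """Guess whether a value is a car record; a heuristic, not a schema."""
    return (isinstance(value, list) and len(value) == 4
            and isinstance(value[2], int))


def colors_for_year(data, year):
    """Scan every value, keep the cars from *year*, and return their colors."""
    return [value[3] for value in data.values()
            if looks_like_car(value) and value[2] == year]


print(colors_for_year(DATA, 1994))

# With a per-type namespace and dict-shaped (JSON-like) records, the scan
# touches only car records and no longer depends on list positions.
NAMESPACED = {
    'car': {
        'VIN0002': {'make': 'Honda', 'model': 'Civic',
                    'year': 1994, 'color': 'Red'},
    },
    'misc': {'page_views': 42},
}
namespaced_colors = [car['color'] for car in NAMESPACED['car'].values()
                     if car['year'] == 1994]
print(namespaced_colors)
```

Both variants still visit every candidate record. The namespace only narrows the set of candidates and removes the guesswork about what each value represents.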
Of course, although more sophisticated techniques exist (and have been implemented) to improve queryability, the trade-off between storing less schema and improving queryability is unavoidable. In this example, our database supports querying only by key. Supporting richer queries would make things much more complicated.
Summary
At this point, I hope the concept of "NoSQL" is clear. We learned a little about SQL and saw how an RDBMS works. We saw how to retrieve data from an RDBMS (using SQL queries). By building a toy NoSQL database, we learned about the trade-offs between queryability and simplicity, and discussed some of the approaches database authors have taken to deal with them.