Key Takeaways
MongoDB 3.2 introduced the $lookup operator, which can perform LEFT-OUTER-JOIN-like operations on two or more collections and so allows relational-style data management. However, the operator can only be used in aggregation operations, which are more complex and usually slower than simple find queries.
The $lookup operator requires four parameters: localField (the field to look up in the input document), from (the collection to join), foreignField (the field to match in the from collection) and as (the name of the output field). It can be used in an aggregation query to match posts, sort them in order, limit the number of items, join user data, flatten the user array and return only the necessary fields.
Although the $lookup operator is useful and can help manage small amounts of relational data in a NoSQL database, it is not a replacement for the more powerful JOIN clause in SQL. MongoDB also offers no constraints: if a user document is deleted, orphaned post documents remain.
Therefore, if you need the $lookup operator frequently, you may be using the wrong data store, and a relational (SQL) database may be more suitable.
Thanks to Julian Motz for his peer review help.
One of the biggest differences between SQL and NoSQL databases is JOIN. In a relational database, the SQL JOIN clause allows you to combine rows from two or more tables using a common field between them. For example, if you have book and publisher tables, you can write the following SQL command:
SELECT book.title, publisher.name FROM book LEFT JOIN publisher ON book.publisher_id = publisher.id;
In other words, the book table has a publisher_id field that references the id field in the publisher table.
This is practical because a single publisher can provide thousands of books. If we need to update the publisher's details in the future, we can change a single record. Data redundancy is minimized because we do not need to repeat the publisher's information for each book. This technique is called normalization.
SQL databases provide a range of normalization and constraint options to ensure relationships are maintained.
This is not always the case…
Document-oriented databases (such as MongoDB) are designed to store denormalized data. Ideally, there should be no relationships between collections. If the same data is needed in two or more documents, it must be repeated.
This can be frustrating because there are few situations where you never need relational data. Fortunately, MongoDB 3.2 introduced a new $lookup operator that can perform LEFT-OUTER-JOIN-like operations on two or more collections. But there is a catch...
$lookup is only permitted in aggregation operations. Think of an aggregation as a pipeline of operators that query, filter and group a result. The output of one operator is used as the input to the next.
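The pipeline idea can be sketched in plain JavaScript without a database. This is only an illustration under simplifying assumptions: the helpers match, sortBy, limit and runPipeline are hypothetical names, not MongoDB APIs, and the sample documents are invented.

```javascript
// In-memory sketch of an aggregation pipeline: each stage is a function
// that takes an array of documents and returns a new array.
// These helpers are illustrative only; they are not MongoDB APIs.
const match = (criteria) => (docs) =>
  docs.filter((doc) =>
    Object.entries(criteria).every(([field, value]) => doc[field] === value)
  );

// direction is 1 for ascending and -1 for descending.
const sortBy = (field, direction) => (docs) =>
  [...docs].sort(
    (a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0) * direction
  );

const limit = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical sample data.
const samplePosts = [
  { text: "First", rating: "important", date: "2016-09-01" },
  { text: "Second", rating: "trivial", date: "2016-09-02" },
  { text: "Third", rating: "important", date: "2016-09-03" },
];

const newestImportant = runPipeline(samplePosts, [
  match({ rating: "important" }),
  sortBy("date", -1),
  limit(1),
]);

console.log(newestImportant);
```

Each stage only sees what the previous stage returned, which is why filtering and limiting early keeps later stages cheap.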
Aggregations are harder to understand than simple find queries and usually run slower. However, they are powerful and a valuable option for complex search operations.
Aggregation is best explained with an example. Suppose we are creating a social media platform with a user collection. It stores the details of each user in a separate document. For example:
{ "_id": ObjectId("45b83bda421238c76f5c1969"), "name": "User One", "email": "userone@email.com", "country": "UK", "dob": ISODate("1999-09-13T00:00:00.000Z") }
We can add as many fields as we want, but all MongoDB documents require an _id field with a unique value. The _id is similar to a SQL primary key and is inserted automatically if it is not supplied.
Our social network now needs a post collection that stores a large number of insightful updates from users. Each document stores the text, the date, a rating and a reference to the user who wrote it in the user_id field:
{ "_id": ObjectId("17c9812acff9ac0bba018cc1"), "user_id": ObjectId("45b83bda421238c76f5c1969"), "date": ISODate("2016-09-05T03:05:00.123Z"), "text": "My life story so far", "rating": "important" }
We now want to display the last twenty posts with an "important" rating from all users in reverse chronological order. Each returned document should contain the text, the time of the post, and the name and country of the associated user.
A MongoDB aggregation query is passed an array of pipeline operators that define each operation in order. First, we need to extract all documents with the correct rating from the post collection using the $match filter:
{ "$match": { "rating": "important" } }
We must now sort the matched items into reverse date order using the $sort operator:
{ "$sort": { "date": -1 } }
Since we only need twenty posts, we can apply the $limit stage so that MongoDB only needs to process the data we want:
{ "$limit": 20 }
We can now use the new $lookup operator to join data from the user collection. It requires an object with four parameters:
localField: the field to look up in the input document
from: the collection to join
foreignField: the field to match in the from collection
as: the name of the output field
Our operator is therefore:
{ "$lookup": { "localField": "user_id", "from": "user", "foreignField": "_id", "as": "userinfo" } }
This creates a new field in our output named userinfo. It contains an array where each value is a matching user document:
"userinfo": [ { "name": "User One", ... } ]
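To make the left-outer-join behavior of the four $lookup parameters concrete, here is a minimal in-memory sketch. It assumes plain string ids instead of ObjectId values, and lookupStage is a hypothetical helper rather than a MongoDB API; the second post, by a user that does not exist, is invented to show the unmatched case.

```javascript
// In-memory sketch of $lookup's left-outer-join behavior.
// lookupStage is a hypothetical helper, not a MongoDB API.
function lookupStage(inputDocs, foreignDocs, { localField, foreignField, as }) {
  return inputDocs.map((doc) => ({
    ...doc,
    // Every input document is kept; unmatched ones get an empty array.
    [as]: foreignDocs.filter((other) => other[foreignField] === doc[localField]),
  }));
}

// Simplified sample data: string ids stand in for ObjectId values.
const userDocs = [{ _id: "u1", name: "User One", country: "UK" }];
const postDocs = [
  { _id: "p1", user_id: "u1", text: "My life story so far" },
  { _id: "p2", user_id: "u404", text: "Post by a deleted user" },
];

const joinedPosts = lookupStage(postDocs, userDocs, {
  localField: "user_id",
  foreignField: "_id",
  as: "userinfo",
});

console.log(JSON.stringify(joinedPosts, null, 2));
```

Both posts survive the join. The first carries a one-item userinfo array, while the second carries an empty array because no user matched.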
We have a one-to-one relationship between post.user_id and user._id, because a post can only have one author. Therefore, our userinfo array will only ever contain one item. We can use the $unwind operator to deconstruct it into a subdocument:
{ "$unwind": "$userinfo" }
The output is now converted to a more practical format that other operators can work with:
"userinfo": { "name": "User One", ... }
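The effect of $unwind on a joined array can be sketched in memory as follows. unwindStage is a hypothetical helper that only approximates the stage's default behavior (one output document per array element, with missing, null and empty arrays producing no output); it does not cover the stage's extra options.

```javascript
// In-memory sketch of $unwind's default behavior on one field.
// unwindStage is a hypothetical helper, not a MongoDB API.
function unwindStage(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing or null fields produce no output documents.
    if (value === undefined || value === null) return [];
    // A non-array value is treated like a single-element array.
    const items = Array.isArray(value) ? value : [value];
    // One output document per element; empty arrays yield nothing.
    return items.map((item) => ({ ...doc, [field]: item }));
  });
}

// Simplified sample data shaped like the output of a $lookup stage.
const lookedUp = [
  { text: "My life story so far", userinfo: [{ name: "User One", country: "UK" }] },
  { text: "Post by a deleted user", userinfo: [] },
];

const unwound = unwindStage(lookedUp, "userinfo");
console.log(JSON.stringify(unwound, null, 2));
```

Note that the post with an empty userinfo array disappears from the output, so unwinding also filters out posts whose author was not found.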
Finally, we can use the $project stage in the pipeline to return the text, the time of the post, and the user's name and country:
{ "$project": { "text": 1, "date": 1, "userinfo.name": 1, "userinfo.country": 1 } }
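Our final aggregation query matches posts, sorts them in order, limits the result to the latest twenty items, joins user data, flattens the user array and returns only the necessary fields. The complete command:
db.post.aggregate([
  { "$match": { "rating": "important" } },
  { "$sort": { "date": -1 } },
  { "$limit": 20 },
  { "$lookup": { "localField": "user_id", "from": "user", "foreignField": "_id", "as": "userinfo" } },
  { "$unwind": "$userinfo" },
  { "$project": { "text": 1, "date": 1, "userinfo.name": 1, "userinfo.country": 1 } }
]);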
The result is a set of up to twenty documents. For example:
[ { "_id": ObjectId("17c9812acff9ac0bba018cc1"), "text": "My life story so far", "date": ISODate("2016-09-05T03:05:00.123Z"), "userinfo": { "name": "User One", "country": "UK" } }, ... ]
MongoDB's $lookup is useful and powerful, but even this basic example requires a complex aggregation query. It cannot replace the more powerful JOIN clause in SQL. MongoDB also does not provide constraints; if a user document is deleted, the orphaned post documents remain.
Ideally, the $lookup operator should rarely be needed. If you need it frequently, you may be using the wrong data store...
If you have relational data, use a relational (SQL) database!
That said, $lookup is a welcome addition to MongoDB 3.2. It overcomes some of the more frustrating problems when using a small amount of relational data in a NoSQL database.
In a SQL database, a join operation combines rows from two or more tables based on a related column between them. MongoDB, as a NoSQL database, does not support traditional SQL joins. Instead, it provides two ways to perform similar operations within an aggregation: the $lookup stage and the $graphLookup stage. Both allow you to combine data from multiple collections into a single result set.
The $lookup stage in MongoDB joins documents from another collection (the "joined" collection) and adds the joined documents to each input document. The stage specifies the "from" collection, the "localField" and "foreignField" used to match documents, and the "as" field that receives the output. It is similar to a left outer join in SQL: it returns all documents from the input collection together with the matching documents from the "from" collection.
MongoDB supports recursive searches through the $graphLookup stage. The stage performs a recursive search on the specified collection and can optionally limit the depth and breadth of the search. It is useful for querying hierarchical data or graphs where the number of levels is unknown or may change.
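As a rough illustration of a recursive search, the pipeline below walks a reporting chain. The "employee" collection and its "name" and "reports_to" fields are hypothetical and do not appear elsewhere in this article; the $graphLookup parameters themselves are the stage's standard options.

```javascript
// Hypothetical "employee" collection: each document has a "name" and a
// "reports_to" field holding the name of that person's manager.
const reportingChainPipeline = [
  {
    $graphLookup: {
      from: "employee",               // collection to search recursively
      startWith: "$reports_to",       // value that starts the search
      connectFromField: "reports_to", // field followed at each step
      connectToField: "name",         // field it is matched against
      maxDepth: 5,                    // optional cap on recursion depth
      depthField: "level",            // optional field recording the depth
      as: "managers",                 // output array field
    },
  },
];

// In mongosh this could be run as: db.employee.aggregate(reportingChainPipeline)
console.log(JSON.stringify(reportingChainPipeline, null, 2));
```

Each output document would gain a managers array containing every document reached by following reports_to, up to the given depth.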
To keep joins efficient, use $match and $project stages before the $lookup stage to filter and transform the documents.
You can join multiple MongoDB collections by chaining multiple $lookup stages in an aggregation pipeline. Each $lookup stage adds the joined documents from another collection to the input document.
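A chained join might look like the sketch below. The "user" collection and the user_id field come from the example earlier in the article; the "comment" collection and its "post_id" field are hypothetical and used only for illustration.

```javascript
// Two $lookup stages applied to the post collection.
// "comment" and "post_id" are hypothetical names.
const multiJoinPipeline = [
  // First join: attach the author of each post.
  { $lookup: { from: "user", localField: "user_id", foreignField: "_id", as: "userinfo" } },
  // Second join: attach every comment that references the post.
  { $lookup: { from: "comment", localField: "_id", foreignField: "post_id", as: "comments" } },
];

// In mongosh this could be run as: db.post.aggregate(multiJoinPipeline)
console.log(JSON.stringify(multiJoinPipeline, null, 2));
```

Each stage writes to its own "as" field, so the two joined arrays sit side by side in every output document.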
When using MongoDB joins, if a document in the input collection does not match any document in the "from" collection, the $lookup stage adds an empty array to the "as" field. You can handle these null or missing values by adding a $match stage after the $lookup stage to filter out documents with an empty "as" field.
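The empty-array case can be handled in either direction, as this sketch suggests. It reuses the user and post collections from the article's example; the variable names are arbitrary.

```javascript
// Shared join stage: attach the author of each post as "userinfo".
const joinUserStage = {
  $lookup: { from: "user", localField: "user_id", foreignField: "_id", as: "userinfo" },
};

// Keep only posts whose author was found.
const postsWithAuthor = [joinUserStage, { $match: { userinfo: { $ne: [] } } }];

// Keep only orphaned posts whose author no longer exists.
const orphanedPosts = [joinUserStage, { $match: { userinfo: { $size: 0 } } }];

// In mongosh either could be run as: db.post.aggregate(postsWithAuthor)
console.log(JSON.stringify({ postsWithAuthor, orphanedPosts }, null, 2));
```

The second pipeline is one way to find the orphaned post documents that MongoDB leaves behind when a user is deleted.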
Starting from MongoDB 5.1, the $lookup and $graphLookup stages can accept a sharded collection as the "from" collection. However, because of the additional network overhead, performance may not be as good as with unsharded collections.
You can sort joined documents in MongoDB by adding a $sort stage after the $lookup stage in the aggregation pipeline. The $sort stage sorts the documents by the specified field in ascending or descending order.
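Sorting on a joined field could be sketched like this, again using the article's user and post collections. Unwinding first is an assumption made here so that the sort key is a plain subdocument field rather than an array.

```javascript
// Join the author, flatten the array, then sort on a joined field.
const sortedByAuthorPipeline = [
  { $lookup: { from: "user", localField: "user_id", foreignField: "_id", as: "userinfo" } },
  { $unwind: "$userinfo" },
  // Author name ascending, then newest post first for each author.
  { $sort: { "userinfo.name": 1, date: -1 } },
];

// In mongosh this could be run as: db.post.aggregate(sortedByAuthorPipeline)
console.log(JSON.stringify(sortedByAuthorPipeline, null, 2));
```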
MongoDB joins cannot be used with the find() method. The $lookup and $graphLookup stages are part of the aggregation framework, which provides more advanced data processing capabilities than the find() method.
To debug or troubleshoot MongoDB joins, you can use the explain() method to analyze the execution plan of the aggregation pipeline. The explain() method provides detailed information about each stage, including the number of documents processed, the time taken and index usage.
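One way to request an execution plan is sketched below. explainPostPipeline is a hypothetical helper, and it assumes db is the database handle that mongosh provides and that the collection is named post as in the article's example.

```javascript
// explainPostPipeline is a hypothetical helper, not a MongoDB API.
// "db" is assumed to be the database handle available in mongosh.
function explainPostPipeline(db, pipeline) {
  // "executionStats" asks the server to run the pipeline and report
  // statistics such as documents examined and time taken.
  return db.post.explain("executionStats").aggregate(pipeline);
}

// Example call inside mongosh (not executed here):
// explainPostPipeline(db, [{ $match: { rating: "important" } }, { $limit: 20 }]);
```

Taking db as a parameter keeps the helper free of globals, so it can be loaded into a shell session or exercised with a stub.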