Bonjour, je voulais partager un exemple du modèle de pagination du curseur (ou modèle de pagination du curseur) car lorsque j'en cherchais un, je ne pouvais trouver qu'un exemple des cas qui avancent mais pas en arrière, ni comment gérer les données au début et à la fin.
Vous pouvez voir le référentiel pour cela ici mais je vais essayer de tout expliquer ici.
J'utilise Python Poetry comme outil de gestion de packages, donc pour cet exemple, je suppose que vous l'avez déjà. La première chose à faire est d’installer les dépendances avec l’installation de poésie. Vous pouvez également utiliser pip pour les installer avec : pip install pymongo loguru.
Maintenant, nous avons également besoin d'une base de données Mongo, vous pouvez télécharger MongoDB Community Edition ici, et vous pouvez la configurer avec ce guide.
Maintenant que nous avons installé les dépendances et la base de données, nous pouvons y ajouter des données. Pour cela nous pouvons utiliser ceci :
from pymongo import MongoClient # Data to add sample_posts = [ {"title": "Post 1", "content": "Content 1", "date": datetime(2023, 8, 1)}, {"title": "Post 2", "content": "Content 2", "date": datetime(2023, 8, 2)}, {"title": "Post 3", "content": "Content 3", "date": datetime(2023, 8, 3)}, {"title": "Post 4", "content": "Content 4", "date": datetime(2023, 8, 4)}, {"title": "Post 5", "content": "Content 5", "date": datetime(2023, 8, 5)}, {"title": "Post 6", "content": "Content 6", "date": datetime(2023, 8, 6)}, {"title": "Post 7", "content": "Content 7", "date": datetime(2023, 8, 7)}, {"title": "Post 8", "content": "Content 8", "date": datetime(2023, 8, 8)}, {"title": "Post 9", "content": "Content 9", "date": datetime(2023, 8, 9)}, {"title": "Post 10", "content": "Content 10", "date": datetime(2023, 8, 10)}, {"title": "Post 11", "content": "Content 11", "date": datetime(2023, 8, 11)}, ] # Creating connection token = "mongodb://localhost:27017" client = MongoClient(token) cursor_db = client.cursor_db.content cursor_db.insert_many(sample_posts)
Avec cela, nous créons une connexion à une base de données locale vers le contenu de la collection. Ensuite, nous y ajoutons les valeurs de sample_posts. Maintenant que nous avons des données à rechercher, nous pouvons commencer à les interroger. Commençons par rechercher et lire les données jusqu'à la fin.
# Import libraries from bson.objectid import ObjectId from datetime import datetime from loguru import logger from pymongo import MongoClient # Use token to connect to local database token = "mongodb://localhost:27017" client = MongoClient(token) # Access cursor_db collection (it will be created if it does not exist) cursor_db = client.cursor_db.content default_page_size = 5 def fetch_next_page(cursor, page_size = None): # Use the provided page_size or use a default value page_size = page_size or default_page_size # Check if there is a cursor if cursor: # Get documents with `_id` greater than the cursor query = {"_id": {'$gt': cursor}} else: # Get everything query = {} # Sort in ascending order by `_id` sort_order = 1 # Define the aggregation pipeline pipeline = [ {"$match": query}, # Filter based on the cursor {"$sort": {"_id": sort_order}}, # Sort documents by `_id` {"$limit": page_size + 1}, # Limit results to page_size + 1 to check if there's a next page # {"$project": {"_id": 1, "title": 1, "content": 1}} # In case you want to return only certain attributes ] # Execute the aggregation pipeline results = list(cursor_db.aggregate(pipeline)) # logger.debug(results) # Validate if some data was found if not results: raise ValueError("No data found") # Check if there are more documents than the page size if len(results) > page_size: # Deleting extra document results.pop(-1) # Set the cursor for the next page next_cursor = results[-1]['_id'] # Set the previous cursor if cursor: # in case the cursor have data prev_cursor = results[0]['_id'] else: # In case the cursor don't have data (first page) prev_cursor = None # Indicate you haven't reached the end of the data at_end = False else: # Indicate that there are not more pages available (last page reached) next_cursor = None # Set the cursor for the previous page prev_cursor = results[0]['_id'] # Indicate you have reached the end of the data at_end = True return results, next_cursor, prev_cursor, at_end @logger.catch def main(): """Main function.""" # Get the first page results, next_cursor, prev_cursor, at_end = fetch_next_page(None) logger.info(f"{results = }") logger.info(f"{next_cursor = }") logger.info(f"{prev_cursor = }") logger.info(f"{at_end = }") if __name__: main() logger.info("--- Execution end ---")
Ce code renvoie ceci :
2024-09-02 08:55:24.388 | INFO | __main__:main:73 - results = [{'_id': ObjectId('66bdfdcf7a0667fd1888c20c'), 'title': 'Post 1', 'content': 'Content 1', 'date': datetime.datetime(2023, 8, 1, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20d'), 'title': 'Post 2', 'content': 'Content 2', 'date': datetime.datetime(2023, 8, 2, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20e'), 'title': 'Post 3', 'content': 'Content 3', 'date': datetime.datetime(2023, 8, 3, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c20f'), 'title': 'Post 4', 'content': 'Content 4', 'date': datetime.datetime(2023, 8, 4, 0, 0)}, {'_id': ObjectId('66bdfdcf7a0667fd1888c210'), 'title': 'Post 5', 'content': 'Content 5', 'date': datetime.datetime(2023, 8, 5, 0, 0)}] 2024-09-02 08:55:24.388 | INFO | __main__:main:74 - next_cursor = ObjectId('66bdfdcf7a0667fd1888c210') 2024-09-02 08:55:24.388 | INFO | __main__:main:75 - prev_cursor = None 2024-09-02 08:55:24.388 | INFO | __main__:main:76 - at_end = False 2024-09-02 08:55:24.388 | INFO | __main__:<module>:79 - --- Execution end ---
Vous pouvez voir que le curseur pointe vers une page suivante et que la précédente est Aucune, également, il identifie qu'il ne s'agit pas de la fin des données. Pour obtenir ces valeurs, il faut mieux regarder la fonction fetch_next_page. Là, nous pouvons voir que nous avons défini la taille de la page, la requête, l'ordre de tri, puis nous créons le pipeline vers l'opération d'agrégation. Pour identifier s'il existe une autre page d'informations, nous utilisons l'opérateur $limit, nous donnons la valeur de page_size + 1 pour vérifier s'il existe, en fait, une autre page avec cela + 1. Pour la vérifier, nous utilisons l'expression len( résultats) > page_size, si le nombre de données renvoyées est supérieur à page_size alors il existe une autre page ; au contraire, c'est la dernière page.
Dans le cas où il y a une page suivante, nous devons supprimer le dernier élément de la liste des informations que nous avons interrogées, car c'était le + 1 dans le pipeline, nous devons définir next_cursor avec le _id de la dernière valeur actuelle de la liste, et définir le prev_cursor (le curseur précédent) selon les cas, s'il y avait un curseur ça veut dire qu'il y a des données avant celui-ci, dans l'autre cas, ça veut dire que c'est le premier groupe de données, donc il y a aucune information précédente, par conséquent, le curseur doit être le premier _id des données trouvées ou Aucun.
Maintenant que nous savons comment rechercher les données et ajouter une validation importante, nous devons activer un moyen de les parcourir vers l'avant, pour cela nous utiliserons la commande d'entrée pour demander à l'utilisateur exécutant le script d'écrire la direction à suivre, cependant, pour le moment, ce ne sera que vers l'avant (f). Nous pouvons mettre à jour notre fonction principale pour le faire comme ceci :
@logger.catch def main(): """Main function.""" # Get the first page results, next_cursor, prev_cursor, at_end = fetch_next_page(None) logger.info(f"{results = }") logger.info(f"{next_cursor = }") logger.info(f"{prev_cursor = }") logger.info(f"{at_end = }") # Checking if there is more data to show if next_cursor: # Enter a cycle to traverse the data while(True): print(125 * "*") # Ask for the user to move forward or cancel the execution inn = input("Can only move Forward (f) or Cancel (c): ") # Execute action acording to the input if inn == "f": results, next_cursor, prev_cursor, at_end = fetch_next_page(next_cursor, default_page_size) elif inn == "c": logger.warning("------- Canceling execution -------") break else: # In case the user sends something that is not a valid option print("Not valid action, it can only move in the opposite direction.") continue logger.info(f"{results = }") logger.info(f"{next_cursor = }") logger.info(f"{prev_cursor = }") logger.info(f"{at_end = }") else: logger.warning("There is not more data to show")
Avec cela, nous sommes capables de parcourir les données jusqu'à la fin, mais quand elles atteignent la fin, elles reviennent au début et le cycle recommence, nous devons donc ajouter quelques validations pour éviter cela et aussi reculer. Pour cela nous allons créer la fonction fetch_previous_page et ajouter quelques modifications à la fonction main :
def fetch_previous_page(cursor, page_size = None): # Use the provided page_size or fallback to the class attribute page_size = page_size or default_page_size # Check if there is a cursor if cursor: # Get documents with `_id` less than the cursor query = {'_id': {'$lt': cursor}} else: # Get everything query = {} # Sort in descending order by `_id` sort_order = -1 # Define the aggregation pipeline pipeline = [ {"$match": query}, # Filter based on the cursor {"$sort": {"_id": sort_order}}, # Sort documents by `_id` {"$limit": page_size + 1}, # Limit results to page_size + 1 to check if there's a next page # {"$project": {"_id": 1, "title": 1, "content": 1}} # In case you want to return only certain attributes ] # Execute the aggregation pipeline results = list(cursor_db.aggregate(pipeline)) # Validate if some data was found if not results: raise ValueError("No data found") # Check if there are more documents than the page size if len(results) > page_size: # Deleting extra document results.pop(-1) # Reverse the results to maintain the correct order results.reverse() # Set the cursor for the previous page prev_cursor = results[0]['_id'] # Set the cursor for the next page next_cursor = results[-1]['_id'] # Indicate you are not at the start of the data at_start = False else: # Reverse the results to maintain the correct order results.reverse() # Indicate that there are not more previous pages available (initial page reached) prev_cursor = None # !!!! next_cursor = results[-1]['_id'] # Indicate you have reached the start of the data at_start = True return results, next_cursor, prev_cursor, at_start
Extrêmement similaire à fetch_next_page, mais la requête (si les conditions sont remplies) utilise l'opérateur $lt et sort_order doit être -1 pour amener les données dans l'ordre nécessaire. Désormais, lors de la validation si len(results) > page_size, si la condition est true, alors il supprime l'élément supplémentaire et inverse l'ordre des données pour qu'elles s'affichent correctement puis définit le curseur précédent sur le premier élément des données et le curseur suivant jusqu'au dernier. Au contraire, les données sont inversées, le curseur précédent est défini sur Aucun (car il n'y a pas de données précédentes) et le curseur suivant est défini sur la dernière valeur de la liste. Dans les deux cas, une variable booléenne appelée at_start est définie pour identifier cette situation. Il faut maintenant ajouter l'interaction avec l'utilisateur pour revenir en arrière dans la fonction principale, il y a donc 3 situations à gérer au cas où nous serions au début, à la fin ou au milieu des données : seulement avancer, seulement reculer , et en avant ou en arrière :
@logger.catch def main(): """Main function.""" # Get the first page results, next_cursor, prev_cursor, at_end = fetch_next_page(None) logger.info(f"{results = }") logger.info(f"{next_cursor = }") logger.info(f"{prev_cursor = }") logger.info(f"{at_end = }") # Checking if there is more data to show if not(at_start and at_end): # Enter a cycle to traverse the data while(True): print(125 * "*") # Ask for the user to move forward or cancel the execution if at_end: inn = input("Can only move Backward (b) or Cancel (c): ") stage = 0 elif at_start: inn = input("Can only move Forward (f) or Cancel (c): ") stage = 1 else: inn = input("Can move Forward (f), Backward (b), or Cancel (c): ") stage = 2 # Execute action acording to the input if inn == "f" and stage in [1, 2]: results, next_cursor, prev_cursor, at_end = fetch_next_page(next_cursor, page_size) # For this example, you must reset here the value, otherwise you lose the reference of the cursor at_start = False elif inn == "b" and stage in [0, 2]: results, next_cursor, prev_cursor, at_start = fetch_previous_page(prev_cursor, page_size) # For this example, you must reset here the value, otherwise you lose the reference of the cursor at_end = False elif inn == "c": logger.warning("------- Canceling execution -------") break else: print("Not valid action, it can only move in the opposite direction.") continue logger.info(f"{results = }") logger.info(f"{next_cursor = }") logger.info(f"{prev_cursor = }") logger.info(f"{at_start = }") logger.info(f"{at_end = }") else: logger.warning("There is not more data to show")
Nous avons ajouté une validation à l'entrée des utilisateurs pour identifier l'étape où nous nous trouvons lors du parcours des données. Notez également que at_start et at_end après l'exécution de fetch_next_page et fetch_previous_page respectivement qui sont nécessaires pour être réinitialisés après avoir atteint ces valeurs. limites. Vous pouvez maintenant atteindre la fin des données et revenir en arrière jusqu'au début. La validation après avoir obtenu la première page de données a été mise à jour pour vérifier si les indicateurs at_start et at_end sont True, ce qui indiquera qu'il n'y a plus de données à afficher.
Note: I was facing a bug at this point which I cannot reproduce right now, but it was causing problems when going backward and reaching the start, the cursor was pointing to the wrong place and when you wanted to go forward it skip 1 element. To solve it I added a validation in fetch_previous_page if a parameter called prev_at_start (which is the previous value of at_start) to assing next_cursor the value results[0]['_id'] or, results[-1]['_id'] in case the previous stage was not at the beginning of the data. This will be ommited from now on, but I think is worth the mention.
Now that we can traverse the data from beginning to end and going forward or backward in it, we can create a class that have all this functions and call it to use the example. Also we must add the docstring so everything is documents correctly. The result of that are in this code:
"""Cursor Paging/Pagination Pattern Example.""" from bson.objectid import ObjectId from datetime import datetime from loguru import logger from pymongo import MongoClient class cursorPattern: """ A class to handle cursor-based pagination for MongoDB collections. Attributes: ----------- cursor_db : pymongo.collection.Collection The MongoDB collection used for pagination. page_size : int Size of the pages. """ def __init__(self, page_size: int = 5) -> None: """Initializes the class. Sets up a connection to MongoDB and specifying the collection to work with. """ token = "mongodb://localhost:27017" client = MongoClient(token) self.cursor_db = client.cursor_db.content self.page_size = page_size def add_data(self,) -> None: """Inserts sample data into the MongoDB collection for demonstration purposes. Note: ----- It should only use once, otherwise you will have repeated data. """ sample_posts = [ {"title": "Post 1", "content": "Content 1", "date": datetime(2023, 8, 1)}, {"title": "Post 2", "content": "Content 2", "date": datetime(2023, 8, 2)}, {"title": "Post 3", "content": "Content 3", "date": datetime(2023, 8, 3)}, {"title": "Post 4", "content": "Content 4", "date": datetime(2023, 8, 4)}, {"title": "Post 5", "content": "Content 5", "date": datetime(2023, 8, 5)}, {"title": "Post 6", "content": "Content 6", "date": datetime(2023, 8, 6)}, {"title": "Post 7", "content": "Content 7", "date": datetime(2023, 8, 7)}, {"title": "Post 8", "content": "Content 8", "date": datetime(2023, 8, 8)}, {"title": "Post 9", "content": "Content 9", "date": datetime(2023, 8, 9)}, {"title": "Post 10", "content": "Content 10", "date": datetime(2023, 8, 10)}, {"title": "Post 11", "content": "Content 11", "date": datetime(2023, 8, 11)}, ] self.cursor_db.insert_many(sample_posts) def _fetch_next_page( self, cursor: ObjectId | None, page_size: int | None = None ) -> tuple[list, ObjectId | None, ObjectId | None, bool]: """Retrieves the next page of data based on the provided cursor. Parameters: ----------- cursor : ObjectId | None The current cursor indicating the last document of the previous page. page_size : int | None The number of documents to retrieve per page (default is the class's page_size). Returns: -------- tuple: - results (list): The list of documents retrieved. - next_cursor (ObjectId | None): The cursor pointing to the start of the next page, None in case is the last page. - prev_cursor (ObjectId | None): The cursor pointing to the start of the previous page, None in case is the start page. - at_end (bool): Whether this is the last page of results. """ # Use the provided page_size or fallback to the class attribute page_size = page_size or self.page_size # Check if there is a cursor if cursor: # Get documents with `_id` greater than the cursor query = {"_id": {'$gt': cursor}} else: # Get everything query = {} # Sort in ascending order by `_id` sort_order = 1 # Define the aggregation pipeline pipeline = [ {"$match": query}, # Filter based on the cursor {"$sort": {"_id": sort_order}}, # Sort documents by `_id` {"$limit": page_size + 1}, # Limit results to page_size + 1 to check if there's a next page # {"$project": {"_id": 1, "title": 1, "content": 1}} # In case you want to return only certain attributes ] # Execute the aggregation pipeline results = list(self.cursor_db.aggregate(pipeline)) # logger.debug(results) # Validate if some data was found if not results: raise ValueError("No data found") # Check if there are more documents than the page size if len(results) > page_size: # Deleting extra document results.pop(-1) # Set the cursor for the next page next_cursor = results[-1]['_id'] # Set the previous cursor if cursor: # in case the cursor have data prev_cursor = results[0]['_id'] else: # In case the cursor don't have data (first time) prev_cursor = None # Indicate you haven't reached the end of the data at_end = False else: # Indicate that there are not more pages available (last page reached) next_cursor = None # Set the cursor for the previous page prev_cursor = results[0]['_id'] # Indicate you have reached the end of the data at_end = True return results, next_cursor, prev_cursor, at_end def _fetch_previous_page( self, cursor: ObjectId | None, page_size: int | None = None, ) -> tuple[list, ObjectId | None, ObjectId | None, bool]: """Retrieves the previous page of data based on the provided cursor. Parameters: ----------- cursor : ObjectId | None The current cursor indicating the first document of the current page. page_size : int The number of documents to retrieve per page. prev_at_start : bool Indicates whether the previous page was the first page. Returns: -------- tuple: - results (list): The list of documents retrieved. - next_cursor (ObjectId | None): The cursor pointing to the start of the next page, None in case is the last page. - prev_cursor (ObjectId | None): The cursor pointing to the start of the previous page, None in case is the start page. - at_start (bool): Whether this is the first page of results. """ # Use the provided page_size or fallback to the class attribute page_size = page_size or self.page_size # Check if there is a cursor if cursor: # Get documents with `_id` less than the cursor query = {'_id': {'$lt': cursor}} else: # Get everything query = {} # Sort in descending order by `_id` sort_order = -1 # Define the aggregation pipeline pipeline = [ {"$match": query}, # Filter based on the cursor {"$sort": {"_id": sort_order}}, # Sort documents by `_id` {"$limit": page_size + 1}, # Limit results to page_size + 1 to check if there's a next page # {"$project": {"_id": 1, "title": 1, "content": 1}} # In case you want to return only certain attributes ] # Execute the aggregation pipeline results = list(self.cursor_db.aggregate(pipeline)) # Validate if some data was found if not results: raise ValueError("No data found") # Check if there are more documents than the page size if len(results) > page_size: # Deleting extra document results.pop(-1) # Reverse the results to maintain the correct order results.reverse() # Set the cursor for the previous page prev_cursor = results[0]['_id'] # Set the cursor for the next page next_cursor = results[-1]['_id'] # Indicate you are not at the start of the data at_start = False else: # Reverse the results to maintain the correct order results.reverse() # Indicate that there are not more previous pages available (initial page reached) prev_cursor = None # if prev_at_start: # # in case before was at the starting page # logger.warning("Caso 1") # next_cursor = results[0]['_id'] # else: # # in case before was not at the starting page # logger.warning("Caso 2") # next_cursor = results[-1]['_id'] next_cursor = results[-1]['_id'] # Indicate you have reached the start of the data at_start = True return results, next_cursor, prev_cursor, at_start def start_pagination(self): """Inicia la navegacion de datos.""" # Change page size in case you want it, only leave it here for reference page_size = None # Retrieve the first page of results results, next_cursor, prev_cursor, at_end = self._fetch_next_page(None, page_size) at_start = True logger.info(f"{results = }") logger.info(f"{next_cursor = }") logger.info(f"{prev_cursor = }") logger.info(f"{at_start = }") logger.info(f"{at_end = }") # if next_cursor: if not(at_start and at_end): while(True): print(125 * "*") if at_end: inn = input("Can only move Backward (b) or Cancel (c): ") stage = 0 # ===================================================== # You could reset at_end here, but in this example that # will fail in case the user sends something different # from Backward (b) or Cancel (c) # ===================================================== # at_end = False elif at_start: inn = input("Can only move Forward (f) or Cancel (c): ") stage = 1 # ===================================================== # You could reset at_end here, but in this example that # will fail in case the user sends something different # from Forward (f) or Cancel (c) # ===================================================== # at_start = False else: inn = input("Can move Forward (f), Backward (b), or Cancel (c): ") stage = 2 # Execute action acording to the input if inn == "f" and stage in [1, 2]: results, next_cursor, prev_cursor, at_end = self._fetch_next_page(next_cursor, page_size) # For this example, you must reset here the value, otherwise you lose the reference of the cursor at_start = False elif inn == "b" and stage in [0, 2]: # results, next_cursor, prev_cursor, at_start = self._fetch_previous_page(prev_cursor, at_start, page_size) results, next_cursor, prev_cursor, at_start = self._fetch_previous_page(prev_cursor, page_size) # For this example, you must reset here the value, otherwise you lose the reference of the cursor at_end = False elif inn == "c": logger.warning("------- Canceling execution -------") break else: print("Not valid action, it can only move in the opposite direction.") continue logger.info(f"{results = }") logger.info(f"{next_cursor = }") logger.info(f"{prev_cursor = }") logger.info(f"{at_start = }") logger.info(f"{at_end = }") else: logger.warning("There is not more data to show") @logger.catch def main(): """Main function.""" my_cursor = cursorPattern(page_size=5) # my_cursor.add_data() my_cursor.start_pagination() if __name__: main() logger.info("--- Execution end ---")
The page_size was added as an attribute to the class cursorPattern for it to be easier to define the size of every page and added docstrings to the class and its methods.
Hope this will help/guide someone that needs to implement Cursor Pagination.
Ce qui précède est le contenu détaillé de. pour plus d'informations, suivez d'autres articles connexes sur le site Web de PHP en chinois!