Object-relational mapping (ORM) makes it simpler to interact with SQL databases, but it has a reputation for being less efficient and slower than raw SQL.
Using an ORM effectively means understanding how it queries the database. In this article I will focus on how to use the Django ORM effectively when working with medium to large data sets.
A Django queryset corresponds to a number of records (rows) in the database, optionally narrowed down by filters. For example, the following code gets every person named 'Dave' in the database:
person_set = Person.objects.filter(first_name="Dave")
The above code does not run any database query. You can refine person_set, add further filter conditions, or pass it to a function, and none of these operations will touch the database. That is a good thing, because database queries are one of the factors that most affect the performance of a web application.
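Here is a minimal sketch of that laziness (the last_name and is_active fields on the Person model are assumptions for illustration): each statement only builds up the queryset, and no SQL is sent until the queryset is evaluated.

person_set = Person.objects.filter(first_name="Dave")

# Adding more filters still does not hit the database.
person_set = person_set.filter(last_name="Hall")      # assumed field
person_set = person_set.exclude(is_active=False)      # assumed field

def report(queryset):
    # Passing the queryset to a function does not execute it either.
    return queryset

person_set = report(person_set)   # still no query has been run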
To actually fetch data from the database, you need to iterate over the queryset:
for person in person_set:
    print(person.last_name)
When you iterate over a queryset, all matching records are fetched from the database and converted into Django model instances. This is called evaluation. The instances are stored in the queryset's built-in cache, so if you iterate over the queryset again, the same query does not have to be run a second time.
For example, the following code will only execute a database query once:
pet_set = Pet.objects.filter(species="Dog")

# The query is executed and cached.
for pet in pet_set:
    print(pet.first_name)

# The cache is used for the subsequent iteration.
for pet in pet_set:
    print(pet.last_name)
The most useful feature of the queryset cache is that it lets you efficiently test whether a queryset contains any data, and only iterate over it if it does:
restaurant_set = Restaurant.objects.filter(cuisine="Indian")

# The `if` statement evaluates the queryset.
if restaurant_set:
    # The iteration below uses the data from the cache.
    for restaurant in restaurant_set:
        print(restaurant.name)
Sometimes you only want to know whether any data exists, without needing the data itself. In that case, an if statement alone still evaluates the entire queryset and loads the results into the cache, even though you never use them!
city_set = City.objects.filter(name="Cambridge")

# The `if` statement evaluates the queryset.
if city_set:
    # We don't need all the data, but the ORM fetches every record anyway!
    print("At least one city called Cambridge still stands!")
To avoid this, use the exists() method to check whether any data exists:
tree_set = Tree.objects.filter(type="deciduous")

# The `exists()` check avoids putting any data into the queryset cache.
if tree_set.exists():
    # No records are fetched from the database, saving bandwidth and memory.
    print("There are still hardwood trees in the world!")
When processing thousands of records, loading them all into memory at once is wasteful. Worse, a huge queryset can tie up the server process and bring your program to the verge of crashing.
To avoid building the queryset cache while still iterating over all the data, use the iterator() method, which fetches the records and discards each one after it has been processed.
star_set = Star.objects.all()

# `iterator()` fetches only a small batch of rows from the database at a time,
# which saves memory.
for star in star_set.iterator():
    print(star.name)
Of course, using iterator() to avoid the cache means the query is executed again every time you iterate over the same queryset. So be careful with iterator(), and make sure your code does not repeatedly run the query when working with a large queryset.
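As a sketch of that pitfall (the Comet model here is hypothetical, used only for illustration): iterating over the same queryset twice through iterator() sends the same SQL to the database twice, because nothing is cached in between.

comet_set = Comet.objects.all()   # hypothetical model, for illustration only

for comet in comet_set.iterator():    # first database query
    print(comet.name)

for comet in comet_set.iterator():    # second, identical database query
    print(comet.name)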
As mentioned above, the queryset cache is powerful when you combine an if statement with a for loop, because it lets you loop over a queryset conditionally. For very large querysets, however, the queryset cache is not an option.
The simplest solution is to combine exists() with iterator(), which avoids the queryset cache at the cost of two database queries.
molecule_set = Molecule.objects.all()

# One database query to test if any rows exist.
if molecule_set.exists():
    # Another database query to start fetching the rows in batches.
    for molecule in molecule_set.iterator():
        print(molecule.velocity)
A more complicated solution is to use Python's advanced iteration features to peek at the first element of iterator() before deciding whether to loop.
from itertools import chain

atom_set = Atom.objects.all()

# One database query to start fetching the rows in batches.
atom_iterator = atom_set.iterator()

# Peek at the first item in the iterator.
try:
    first_atom = next(atom_iterator)
except StopIteration:
    # No rows were found, so do nothing.
    pass
else:
    # At least one row was found, so iterate over all the rows,
    # including the first one. Chaining over `atom_iterator` (rather than
    # `atom_set`) avoids re-running the query and building the cache.
    for atom in chain([first_atom], atom_iterator):
        print(atom.mass)
The queryset cache exists to reduce the number of queries your program makes to the database, and under normal use it ensures the database is queried only when needed.
The exists() and iterator() methods help you optimize your program's memory use. However, because they bypass the queryset cache, they can cause additional database queries.
So pay attention while coding. If the program starts to slow down, look at where the bottlenecks are and whether a few small optimizations like these will help.
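One way to look for those bottlenecks is Django's query log. Here is a minimal sketch, assuming DEBUG = True (Django only records executed queries in debug mode), reusing the Pet model from the earlier example.

from django.db import connection, reset_queries

reset_queries()   # clear any previously recorded queries

pet_set = Pet.objects.filter(species="Dog")
if pet_set.exists():                # one query
    for pet in pet_set.iterator():  # a second query, fetched without caching
        print(pet.first_name)

# With DEBUG = True, every executed query is recorded here.
print(len(connection.queries), "queries were executed")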