Firestore method to get random documents in a collection

Question

It is crucial for my application to be able to randomly select multiple documents from a collection in Firebase. Since there is no built-in native function in Firebase (that I know of) to run a query that does this, my first thought was to use query cursors to pick a random start and end index, given that I know the number of documents in the collection. This approach works, but only in a limited way, because each document is always served in sequence with its neighbors. I could get truly random documents if I were able to select a document by its index in its parent collection. How can I do that?

P粉668113768 · Answer

Posting this to help anyone who encounters this problem in the future.

If you use auto IDs, you can generate a new auto ID and query for the closest existing auto ID, as described in Dan McGrath's answer.

I recently created a random quotes API that needed to get random quotes from a Firestore collection.
This is how I solved this problem:

const admin = require("firebase-admin");
admin.initializeApp(); // uses the environment's default credentials

const db = admin.firestore();
const quotes = db.collection("quotes");

// doc() with no arguments only generates a new auto ID locally; nothing is written.
const key = quotes.doc().id;

quotes.where(admin.firestore.FieldPath.documentId(), '>=', key).limit(1).get()
.then(snapshot => {
    if (!snapshot.empty) {
        snapshot.forEach(doc => {
            console.log(doc.id, '=>', doc.data());
        });
    } else {
        // No ID at or after the key: wrap around and take the lowest ID below it.
        return quotes.where(admin.firestore.FieldPath.documentId(), '<', key).limit(1).get()
        .then(wrapped => {
            wrapped.forEach(doc => {
                console.log(doc.id, '=>', doc.data());
            });
        });
    }
})
.catch(err => {
    // Returning the inner promise above routes its errors here too.
    console.log('Error getting documents', err);
});

The key to the query is this:

.where(admin.firestore.FieldPath.documentId(), '>=', key)

If no document is found, run the query again with the opposite operator ('<'), which wraps around to the lowest document ID in the collection.

Hope this helps!

P粉985686557 · Answer

Using a randomly generated index and a simple query, you can randomly select documents from a collection or collection group in Cloud Firestore.

This answer is divided into 4 sections, each with different options:

How to generate random index
How to query random index
Select multiple random documents
Reseed for consistent randomness

How to generate random index

The basis of this answer is to create an index field that, when sorted in ascending or descending order, will cause all documents to be sorted randomly. There are a number of different ways to create this, so let's look at 2, starting with the most accessible method.

Auto-ID version

If you use the randomly generated auto-IDs provided by our client libraries, you can use the same system to randomly select documents. In this case, the randomly ordered index is the document ID.

In this case, the random value you generate in the query section below is a new auto-ID (iOS, Android, Web), the field you query is the __name__ field, and the "low value" mentioned later is an empty string. This is by far the simplest way to generate a random index, and it works regardless of language and platform.

By default, the document name (__name__) is only indexed in ascending order, and you cannot rename an existing document short of deleting and recreating it. If you need either of these (a descending index, or the ability to give a document a new random value), you can still use this method: just store an auto-ID as an actual field named random instead of overloading the document name for this purpose.
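For illustration, here is a minimal Node.js sketch of generating a random key shaped like an auto-ID, assuming the 20-character alphanumeric format the client SDKs use. In practice, calling doc().id on a collection reference (as in the first answer) gives you a real auto-ID without writing anything, so randomAutoId below is a hypothetical stand-in, not an SDK function.

```javascript
const crypto = require("crypto");

// Alphabet and length assumed to match Firestore's client-side auto-IDs.
const AUTO_ID_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
const AUTO_ID_LENGTH = 20;

// Hypothetical helper: builds a random key to use as the starting point of a
// query on __name__, or on a "random" field that stores auto-IDs.
function randomAutoId() {
  let id = "";
  for (let i = 0; i < AUTO_ID_LENGTH; i++) {
    // crypto.randomInt(max) returns a uniformly distributed integer in [0, max).
    id += AUTO_ID_CHARS[crypto.randomInt(AUTO_ID_CHARS.length)];
  }
  return id;
}

console.log(randomAutoId());
```

Because every such key compares greater than or equal to "", the empty string works as the "low value" for the wrap-around query.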

Random integer version

When you write a document, first generate a random integer in a bounded range and store it in a field named random. Depending on the number of documents you expect, you can use a different bounded range to save space or to reduce the risk of collisions (collisions reduce the effectiveness of this technique).

You should consider which languages you need to support, as each has different considerations. Swift is straightforward, but JavaScript has a notable gotcha:

32-bit integers: great for small datasets (~10K documents, where a collision is unlikely)
64-bit integers: large datasets (note: JavaScript numbers cannot represent every 64-bit integer exactly; integers are only exact up to 2^53 - 1)

This creates an index in which the documents are sorted randomly. In the query section below, the random value you generate will be another value of this kind, and the "low value" mentioned later will be 0 (any value at or below the smallest possible one works, so -1 would also do).
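A minimal Node.js sketch of the integer version, assuming an unsigned 32-bit range; randomUint32 is a hypothetical helper name. The last lines show the JavaScript gotcha for 64-bit values: ordinary numbers are IEEE-754 doubles and lose precision above 2^53 - 1.

```javascript
const crypto = require("crypto");

// Assumed range: unsigned 32-bit integers, [0, 2^32).
// crypto.randomInt requires (max - min) < 2^48, so this range is allowed.
function randomUint32() {
  return crypto.randomInt(0, 2 ** 32);
}

console.log(randomUint32());

// Why 64-bit values are a problem in plain JavaScript numbers:
console.log(Number.MAX_SAFE_INTEGER);       // 9007199254740991
console.log(2 ** 53 + 1 === 2 ** 53);       // true: the +1 is lost
console.log(Number.isSafeInteger(2 ** 53)); // false
```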

How to query random index

Now that you have a random index, you need to query it. Below are some simple variations for selecting 1 random document, as well as options for selecting more than one.

For all of these options, you need to generate a new random value in the same form as the index values you created when writing the documents, represented by the variable random below. This value is used to find a random point in the index.

Wrap-around

Now that you have random values, you can query individual documents:

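import FirebaseFirestore

let db = Firestore.firestore()
let postsRef = db.collection("posts")
// `random` is a freshly generated value of the same form as the stored index values.
let queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
                       .order(by: "random")
                       .limit(to: 1)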

Check whether a document was returned. If not, query again, this time using the "low value" of the random index. For example, if you used random integers, lowValue is 0:

// Wrap around: start from the lowest possible value instead.
let wrapQueryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue)
                           .order(by: "random")
                           .limit(to: 1)

As long as the collection contains at least one document, this is guaranteed to return a document.
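To make the wrap-around logic concrete without a Firestore connection, here is an in-memory sketch: a sorted array stands in for the ascending index on random, and each find stands in for one ">= random, order by random, limit 1" query. The function name pickWrapAround and the document shape are hypothetical.

```javascript
// In-memory stand-in for the ascending index on the "random" field.
// Each element mimics a document: { id, random }.
function pickWrapAround(sortedDocs, random, lowValue) {
  // First query: random >= value, ascending, limit 1.
  const hit = sortedDocs.find(doc => doc.random >= random);
  if (hit) return hit;
  // Nothing at or above the value: wrap around and query from lowValue.
  return sortedDocs.find(doc => doc.random >= lowValue) ?? null;
}

const docs = [
  { id: "A", random: 409496 },
  { id: "B", random: 436496 },
  { id: "C", random: 818992 },
];

console.log(pickWrapAround(docs, 420000, 0).id); // B
console.log(pickWrapAround(docs, 900000, 0).id); // A (wrapped around)
```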

Both directions

The wrap-around method is simple to implement and lets you optimize storage with only an ascending index enabled. One downside is that some values can be unfairly shielded. For example, if the first 3 documents (A, B, C) out of 10K have random index values A:409496, B:436496 and C:818992, then A and C each have just under a 1/10K chance of being selected, while B is effectively shielded by its proximity to A and has only about a 1/160K chance.
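These figures can be checked with a short calculation, assuming random values are drawn uniformly from [0, 2^32) and the ">=" wrap-around query is used. A document is returned when the random value falls between the previous document's value (exclusive) and its own value (inclusive), so its chance is the width of that gap divided by 2^32.

```javascript
const RANGE = 2 ** 32; // assumed: unsigned 32-bit random values

// First three entries of the index, in ascending order.
const values = { A: 409496, B: 436496, C: 818992 };

// With a ">=" query, a document wins when the random value lands in
// (previous value, own value]. A, as the first entry, owns [0, 409496];
// it also receives values above the highest document, which wrap around
// to it, but that tail depends on the last document and is ignored here.
const gaps = {
  A: values.A + 1,
  B: values.B - values.A,
  C: values.C - values.B,
};

for (const [id, gap] of Object.entries(gaps)) {
  const p = gap / RANGE;
  console.log(`${id}: about 1 in ${Math.round(1 / p)}`);
}
// A: about 1 in 10488
// B: about 1 in 159073
// C: about 1 in 11229
```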

Rather than querying in one direction and wrapping around when no value is found, you can randomly choose between >= and <=. This halves the probability of values being unfairly shielded, at the cost of doubling index storage.

If no result is returned in one direction, switch to the other direction:

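// Downward: the largest value at or below `random`.
let downQueryRef = postsRef.whereField("random", isLessThanOrEqualTo: random)
                           .order(by: "random", descending: true)
                           .limit(to: 1)

// Upward: the smallest value at or above `random`.
let upQueryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
                         .order(by: "random")
                         .limit(to: 1)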
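Using the same in-memory analogy (a sorted array standing in for the ascending and descending indexes on random), a hypothetical pickBidirectional helper chooses a direction at random and falls back to the other one when the first finds nothing:

```javascript
// sortedDocs is ascending by "random"; goDownFirst picks the first direction.
function pickBidirectional(sortedDocs, random, goDownFirst = Math.random() < 0.5) {
  // "<= random, order by random descending, limit 1": largest value at or below.
  const down = () => {
    for (let i = sortedDocs.length - 1; i >= 0; i--) {
      if (sortedDocs[i].random <= random) return sortedDocs[i];
    }
    return null;
  };
  // ">= random, order by random ascending, limit 1": smallest value at or above.
  const up = () => sortedDocs.find(doc => doc.random >= random) ?? null;

  // Try the chosen direction first; if it returns nothing, switch direction.
  return goDownFirst ? (down() ?? up()) : (up() ?? down());
}

const docs = [
  { id: "A", random: 409496 },
  { id: "B", random: 436496 },
  { id: "C", random: 818992 },
];

console.log(pickBidirectional(docs, 420000, true).id);  // A
console.log(pickBidirectional(docs, 420000, false).id); // B
console.log(pickBidirectional(docs, 100, true).id);     // A (nothing below, so it switches up)
```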

Select multiple random documents

Typically, you need to select multiple random documents at once. There are two different ways to adapt the above techniques depending on the trade-offs you want.

Rinse and repeat

This method is very simple. Just repeat the process, including choosing a new random integer each time.

This method will give you a random sequence of documents without having to worry about seeing the same pattern repeatedly.

The trade-off is that it is slower than the next method, since it requires a separate round trip for each document.
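A minimal in-memory sketch of "rinse and repeat", assuming the integer version with wrap-around; pickOne and pickManyRepeat are hypothetical names, and nextRandom is injected so the example is deterministic. Each loop iteration corresponds to one separate query (one round trip) in Firestore. Because every pick is independent, the same document can come back more than once.

```javascript
// In-memory stand-in for one wrap-around query. Falling back to sortedDocs[0]
// is equivalent to re-querying ">= lowValue" when lowValue is at or below
// every stored value.
function pickOne(sortedDocs, random) {
  return sortedDocs.find(doc => doc.random >= random) ?? sortedDocs[0] ?? null;
}

// One fresh random value, and one query, per requested document.
function pickManyRepeat(sortedDocs, count, nextRandom) {
  const picks = [];
  for (let i = 0; i < count; i++) {
    picks.push(pickOne(sortedDocs, nextRandom()));
  }
  return picks;
}

const docs = [
  { id: "A", random: 409496 },
  { id: "B", random: 436496 },
  { id: "C", random: 818992 },
];

const seq = [420000, 900000, 0];
let i = 0;
console.log(pickManyRepeat(docs, 3, () => seq[i++]).map(d => d.id)); // [ 'B', 'A', 'A' ]
```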

Keep it coming

In this method, simply increase the limit to the number of documents you want. This is a little more complex, because a single call may return anywhere from 0 to limit documents. You then need to fetch the missing documents in the same way, but with the limit reduced to just the difference. If you know the collection holds more documents in total than you are asking for, you can optimize by ignoring the edge case where the second call (but not the first) never returns enough documents.

The trade-off with this solution is repeated sequences. Although the documents are in random order, whenever two requests cover overlapping ranges of the index, you will see the same sequence you saw before. There are ways to mitigate this, which are discussed in the next section on reseeding.

This method is faster than "rinse and repeat" because you will request all documents in one call in the best case or two calls in the worst case.
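A minimal in-memory sketch of "keep it coming", with a hypothetical pickManyContiguous helper: one query with a larger limit, then at most one more query for whatever is still missing. As a design choice, the second call here filters on values below random rather than starting from the low value, so the two calls can never overlap; when the collection holds more documents than limit, both forms return the same documents.

```javascript
// sortedDocs is ascending by "random".
function pickManyContiguous(sortedDocs, random, limit) {
  // First call: random >= value, ascending, limit `limit`. May return 0..limit docs.
  const first = sortedDocs.filter(doc => doc.random >= random).slice(0, limit);
  if (first.length === limit) return first;

  // Second call: wrap around to the low end with the limit reduced to the
  // difference. Only documents below `random` can still be missing.
  const missing = limit - first.length;
  const second = sortedDocs.filter(doc => doc.random < random).slice(0, missing);
  return first.concat(second);
}

const docs = [
  { id: "A", random: 10 },
  { id: "B", random: 20 },
  { id: "C", random: 30 },
  { id: "D", random: 40 },
  { id: "E", random: 50 },
];

console.log(pickManyContiguous(docs, 35, 3).map(d => d.id)); // [ 'D', 'E', 'A' ]
```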

Reseed for consistent randomness

While this method returns documents at random when the document set is static, the probability of returning each document is also static. This is a problem because some documents may have unfairly low or high probabilities, depending on the random values they were initially assigned. In many use cases this is fine, but in others you may want to improve long-term randomness so that every document has a more even chance of being returned.

Note that inserted documents end up interleaved between existing ones, gradually changing the probabilities, and deleting documents has the same effect. If the insert/delete rate is too low for the number of documents, there are a few strategies to address this.

Multiple random

Rather than worrying about reseeding, you can create multiple random indexes per document and randomly pick one of them each time. For example, let the field random be a map containing subfields 1 to 3:

{'random': {'1': 32456, '2': 3904515723, '3': 766958445}}

Now each query randomly uses one of random.1, random.2 or random.3, which spreads the randomness more evenly. This essentially trades increased storage for the extra computation (document writes) that reseeding would require.
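An in-memory sketch of the multi-random idea, with hypothetical helpers chooseRandomField and pickByField. Document A uses the map from the example above; the values for B and C are made up for illustration. Each subfield behaves like its own independently shuffled index, and every query picks one of them at random.

```javascript
// Documents carry several independent random values in a map (keys "1".."3").
const docs = [
  { id: "A", random: { 1: 32456, 2: 3904515723, 3: 766958445 } },
  { id: "B", random: { 1: 2011245873, 2: 120456, 3: 3309844003 } },
  { id: "C", random: { 1: 4012338541, 2: 1788432110, 3: 95013 } },
];

// Pick which subfield to query this time: "random.1", "random.2" or "random.3".
function chooseRandomField(subfieldCount, rng = Math.random) {
  const key = 1 + Math.floor(rng() * subfieldCount);
  return { path: `random.${key}`, key: String(key) };
}

// In-memory stand-in for a wrap-around ">=" query ordered by the chosen subfield.
function pickByField(docs, key, random) {
  const sorted = [...docs].sort((a, b) => a.random[key] - b.random[key]);
  return sorted.find(doc => doc.random[key] >= random) ?? sorted[0] ?? null;
}

const { path, key } = chooseRandomField(3);
console.log(`querying ${path}`);
console.log(pickByField(docs, key, 2000000000).id);
```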

Reseed on write

Each time a document is updated, regenerate the value of its random field. This moves the document to a new position in the random index.
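A minimal sketch of reseeding on write, assuming the unsigned 32-bit integer version of the index; withReseed is a hypothetical helper that merges a fresh random value into every update payload. In Firestore you would pass its result to the document reference's update() call.

```javascript
const crypto = require("crypto");

// Hypothetical helper: adds (or overwrites) the "random" field in an update,
// so each write moves the document to a new position in the random index.
function withReseed(update) {
  return { ...update, random: crypto.randomInt(0, 2 ** 32) };
}

const payload = withReseed({ title: "Updated title" });
console.log(payload);
```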

Reseeding on read

If the generated random values are not uniformly distributed (they are random, so this is expected), the same document may be selected a disproportionate number of times. This is easily counteracted by updating a randomly selected document with a new random value after it is read.

Since writes are more expensive and can become hotspots, you can choose to update on only a subset of reads (e.g. if (random(0,100) === 0) update;).
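A minimal sketch of sampled reseeding on read, with a hypothetical reseedAfterRead helper. It mirrors the random(0,100) === 0 check above: roughly 1 in sampleEvery reads produces an update with a new 32-bit random value, and all other reads produce no write. The rng parameter is injectable only so the behavior can be tested deterministically.

```javascript
const crypto = require("crypto");

// After a document has been read through the random index, decide whether to
// give it a new random value. Returns the update to apply, or null for no write.
function reseedAfterRead(doc, sampleEvery = 100, rng = () => crypto.randomInt(0, sampleEvery)) {
  if (rng() !== 0) return null; // most reads: skip the write
  // In Firestore this update would be applied to the document that was read.
  return { id: doc.id, update: { random: crypto.randomInt(0, 2 ** 32) } };
}

let writes = 0;
for (let i = 0; i < 10000; i++) {
  if (reseedAfterRead({ id: "A" })) writes++;
}
console.log(`reseeded on ${writes} of 10000 reads`);
```

With the default sampling rate, the count printed should land near 100.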