I am trying to query about 10,000 rows from a table. After trying various other options involving limit
offset
and not finding the desired success, I tried querying each goroutine for a single row. The idea is that each row only takes ~5ms
to query and fetch, but a batch of 10k will take over 20s
.
Shown below is a simplified version of the code:
func queryEmp(IDs[]string, spannerClient *spanner.Client) (Employee,error){ query := "Select name from Employee Where id = @id" g, gCtx := errgroup.WithContext(ctx) for _, ID := range IDs { id := ID g.Go(func() error { tx := spannerClient.Single() defer tx.Close() stmt2 := spanner.NewStatement(query) stmt2.Params = map[string]interface{}{ "ID": id, } qstart := time.Now() it := tx.Query(gCtx, stmt2) defer it.Stop() logrus.Debugf("%s took %v \n", "query execution.", time.Since(qstart)) for { row, err := it.Next() if err == iterator.Done { break } if err != nil { return err } var eID string if err := row.Column(0, &eID); err != nil { return err } } return nil }) } err = g.Wait() }
The results are tracked as follows:
{"message":"query execution. took 39.677µs \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 34.125µs \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 26.634µs \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 29.303µs \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} ... ... ... {"message":"query execution. took 188.749562ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 276.424692ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 188.62849ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 217.067524ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 276.949166ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} ... ... ... {"message":"query execution. took 454.64281ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 452.0848ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 525.748738ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 454.704656ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} {"message":"query execution. took 455.4276ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"} ... ... ... {"message":"query execution. took 6.767574136s \n","severity":"debug","time":"2023-11-03T20:52:00-04:00"} {"message":"query execution. took 6.780578444s \n","severity":"debug","time":"2023-11-03T20:52:00-04:00"} {"message":"query execution. took 6.785085491s \n","severity":"debug","time":"2023-11-03T20:52:00-04:00"} {"message":"query execution. took 6.779527006s \n","severity":"debug","time":"2023-11-03T20:52:00-04:00"}
It starts fine and as expected, but the query time keeps increasing.
MaxSessions
and MinSessions
for spannerClient
are 100
so one would imagine it might see a slight slowdown after 100, but it is not the truth.
Please read here:
<code> Sessions can execute only one transaction at a time. Standalone reads, writes, and queries use a transaction internally, and count toward the one transaction limit. </code>
Non-iterative queries (ReadRow
etc.) give me the same results.
Using tx := spannerClient.Single()
outside a for loop will give similar results.
question:
spannerClient.Single()
in the goroutine, the goroutine is still trying to use the same session/transaction? TLDR: The default maximum session pool size is 400, which means that no more than 400 queries can ever be run in parallel. You need to increase the session pool size to achieve this concurrency.
<小时/>First of all: I don't think sending 10,000 queries in parallel so that each query reads one row is not the most efficient solution to your problem. If you have no other criteria than filtering by employee IDs, and those IDs are scattered all over the place, it would still be more efficient to create the query in the form
select * from employees where id in unnest(@ids)
For a complete example, see this comment: https:// /github.com/googleapis/google-cloud-go/issues/858#issuecomment-550982307
Back to your specific question:
it := tx.Query(gCtx, stmt2)
does not execute the query, it just prepares it to be executed. Executed when row, err := it.Next()
is called for the first time. You can also see this in the recorded execution times. The first statement seems to execute within 30 microseconds, which is impossible. The above is the detailed content of Why does Spanner query using ReadOnlyTransaction in golang goroutine gradually become slower?. For more information, please follow other related articles on the PHP Chinese website!