Threads are a tool that has become indispensable in the development of modern, high-performance solutions. Regardless of the language, the ability to perform tasks in parallel has great appeal. But, of course, there is Uncle Ben's famous quote: "With great power comes great responsibility." How can this feature be used in the best way, aiming for performance, better use of resources, and application health? First, it is necessary to understand the basic concepts of the topic.
Threads are the basic units of execution of a process in an operating system. They allow a program to perform multiple operations simultaneously within the same process. Each thread shares the memory space of its parent process but executes independently, which is useful for tasks that can run in parallel, such as input/output (I/O) operations, complex calculations, or user interface updates.
On many systems, threads are managed by the operating system, which allocates CPU time to each thread and manages context switching between them. In programming languages like Java, Python, and C, there are libraries and frameworks that make it easier to create and manage threads.
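As a quick illustration, here is a minimal Kotlin sketch (not part of the benchmarks that follow) that creates and joins a single thread using only the standard library:

import kotlin.concurrent.thread

fun main() {
    // Launch a background thread; it shares the process memory with the main thread
    val worker = thread(name = "worker") {
        println("Running on ${Thread.currentThread().name}")
    }
    worker.join() // wait for the worker to finish before continuing
    println("Back on ${Thread.currentThread().name}")
}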
Threads are mainly used to improve the efficiency and responsiveness of a program. The main reasons to use threads, especially on the backend, are:
Parallelism: Threads allow you to perform multiple operations simultaneously, making better use of available CPU resources, especially on systems with multiple cores.
Performance: In I/O operations, such as reading and writing files or network communication, threads can help improve performance by allowing the program to keep performing other tasks while waiting for these operations to complete.
Modularity: Threads can be used to divide a program into smaller, more manageable parts, each performing a specific task.
However, it is important to manage threads carefully, as incorrect use can lead to problems such as race conditions, deadlocks, and debugging difficulties. To manage them better, a thread pool is commonly used.
A thread pool is a software design pattern in which a pool of threads is created and managed so that they can be reused to perform tasks. Instead of repeatedly creating and destroying threads for each task, a thread pool keeps a fixed number of threads ready to execute tasks as they arrive. This can significantly improve the performance of applications that need to handle many simultaneous tasks (a minimal sketch follows the list below). The benefits of using a thread pool are:
Improved Performance: Creating and destroying threads is a costly operation in terms of resources. A thread pool minimizes this cost by reusing existing threads.
Resource Management: Controls the number of threads running, avoiding excessive thread creation that can overload the system.
Ease of Use: Simplifies thread management, allowing developers to focus on application logic rather than thread management.
Scalability: Helps scale applications to handle a large number of concurrent tasks efficiently.
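As a minimal sketch of the idea, assuming the plain JDK Executors API (which Kotlin can call directly), a fixed-size pool reuses the same threads for every submitted task:

import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

fun main() {
    // A fixed pool of 4 threads, reused for all submitted tasks
    val pool = Executors.newFixedThreadPool(4)
    repeat(10) { taskId ->
        pool.submit {
            println("Task $taskId executed on ${Thread.currentThread().name}")
        }
    }
    pool.shutdown()                              // stop accepting new tasks
    pool.awaitTermination(10, TimeUnit.SECONDS)  // wait for queued tasks to finish
}

The same idea is what newFixedThreadPoolContext provides for coroutines in the examples further down.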
Okay, so I should create a thread pool to make better use of this feature. But a question comes up quickly: "How many threads should the pool contain?" Following basic logic, the more the merrier, right? If everything can run in parallel, it will finish sooner, because it will be faster. So it is better not to limit the number of threads, or to set a very high number, so that this is never a concern. Correct?
It sounds like a fair assumption, so let's test it. The code for these tests was written in Kotlin purely out of familiarity and ease of writing the examples; the point itself is language agnostic.
Four examples were written, exploring workloads of different natures. Examples 1 and 2 use the CPU heavily: they do a lot of math, that is, massive processing. Example 3 focuses on I/O, reading a file, and example 4 simulates API calls made in parallel, also I/O-bound. All of them were run with pools of different sizes: 1, 2, 4, 8, 16, 32, 50, 100, and 500 threads, and in each case the work is repeated more than 500 times.
import kotlinx.coroutines.*
import kotlin.math.sqrt
import kotlin.system.measureTimeMillis

fun isPrime(number: Int): Boolean {
    if (number <= 1) return false
    for (i in 2..sqrt(number.toDouble()).toInt()) {
        if (number % i == 0) return false
    }
    return true
}

fun countPrimesInRange(start: Int, end: Int): Int {
    var count = 0
    for (i in start..end) {
        if (isPrime(i)) {
            count++
        }
    }
    return count
}

@OptIn(DelicateCoroutinesApi::class)
fun main() = runBlocking {
    val rangeStart = 1
    val rangeEnd = 100_000
    val numberOfThreadsList = listOf(1, 2, 4, 8, 16, 32, 50, 100, 500)

    for (numberOfThreads in numberOfThreadsList) {
        val customDispatcher = newFixedThreadPoolContext(numberOfThreads, "customPool")
        // Split the range into one chunk per thread
        val chunkSize = (rangeEnd - rangeStart + 1) / numberOfThreads

        val timeTaken = measureTimeMillis {
            val jobs = mutableListOf<Deferred<Int>>()
            for (i in 0 until numberOfThreads) {
                val start = rangeStart + i * chunkSize
                val end = if (i == numberOfThreads - 1) rangeEnd else start + chunkSize - 1
                jobs.add(async(customDispatcher) {
                    countPrimesInRange(start, end)
                })
            }
            val totalPrimes = jobs.awaitAll().sum()
            println("Total prime numbers found with $numberOfThreads threads: $totalPrimes")
        }
        println("Time taken with $numberOfThreads threads: $timeTaken ms")
        customDispatcher.close()
    }
}
Total prime numbers found with 1 threads: 9592
Time taken with 1 threads: 42 ms
Total prime numbers found with 2 threads: 9592
Time taken with 2 threads: 17 ms
Total prime numbers found with 4 threads: 9592
Time taken with 4 threads: 8 ms
Total prime numbers found with 8 threads: 9592
Time taken with 8 threads: 8 ms
Total prime numbers found with 16 threads: 9592
Time taken with 16 threads: 16 ms
Total prime numbers found with 32 threads: 9592
Time taken with 32 threads: 12 ms
Total prime numbers found with 50 threads: 9592
Time taken with 50 threads: 19 ms
Total prime numbers found with 100 threads: 9592
Time taken with 100 threads: 36 ms
Total prime numbers found with 500 threads: 9592
Time taken with 500 threads: 148 ms
import kotlinx.coroutines.DelicateCoroutinesApi
import kotlinx.coroutines.launch
import kotlinx.coroutines.newFixedThreadPoolContext
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

fun fibonacci(n: Int): Long {
    return if (n <= 1) n.toLong() else fibonacci(n - 1) + fibonacci(n - 2)
}

@OptIn(DelicateCoroutinesApi::class)
fun main() = runBlocking {
    val numberOfThreadsList = listOf(1, 2, 4, 8, 16, 32, 50, 100, 500)

    for (numberOfThreads in numberOfThreadsList) {
        val customDispatcher = newFixedThreadPoolContext(numberOfThreads, "customPool")

        // 1000 identical CPU-bound tasks: fibonacci(30) computed recursively
        val numbersToCalculate = mutableListOf<Int>()
        for (i in 1..1000) {
            numbersToCalculate.add(30)
        }

        val timeTaken = measureTimeMillis {
            val jobs = numbersToCalculate.map { number ->
                launch(customDispatcher) {
                    fibonacci(number)
                }
            }
            jobs.forEach { it.join() }
        }

        println("Time taken with $numberOfThreads threads: $timeTaken ms")
        customDispatcher.close()
    }
}
Time taken with 1 threads: 4884 ms
Time taken with 2 threads: 2910 ms
Time taken with 4 threads: 1660 ms
Time taken with 8 threads: 1204 ms
Time taken with 16 threads: 1279 ms
Time taken with 32 threads: 1260 ms
Time taken with 50 threads: 1364 ms
Time taken with 100 threads: 1400 ms
Time taken with 500 threads: 1475 ms
import kotlinx.coroutines.*
import java.io.File
import kotlin.system.measureTimeMillis

@OptIn(DelicateCoroutinesApi::class)
fun main() = runBlocking {
    val file = File("numeros_aleatorios.txt")
    if (!file.exists()) {
        println("File not found!")
        return@runBlocking
    }

    val numberOfThreadsList = listOf(1, 2, 4, 8, 16, 32, 50, 100, 500)

    for (numberOfThreads in numberOfThreadsList) {
        val customDispatcher = newFixedThreadPoolContext(numberOfThreads, "customPool")

        val timeTaken = measureTimeMillis {
            val jobs = mutableListOf<Deferred<Int>>()
            // Each line of the file is processed as an independent task on the pool
            file.useLines { lines ->
                lines.forEach { line ->
                    jobs.add(async(customDispatcher) {
                        processLine(line)
                    })
                }
            }
            val totalSum = jobs.awaitAll().sum()
            println("Total sum with $numberOfThreads threads: $totalSum")
        }
        println("Time taken with $numberOfThreads threads: $timeTaken ms")
        customDispatcher.close()
    }
}

fun processLine(line: String): Int {
    return line.toInt() + 10
}
Total sum of 1201 lines with 1 threads: 60192
Time taken with 1 threads: 97 ms
Total sum of 1201 lines with 2 threads: 60192
Time taken with 2 threads: 28 ms
Total sum of 1201 lines with 4 threads: 60192
Time taken with 4 threads: 30 ms
Total sum of 1201 lines with 8 threads: 60192
Time taken with 8 threads: 26 ms
Total sum of 1201 lines with 16 threads: 60192
Time taken with 16 threads: 33 ms
Total sum of 1201 lines with 32 threads: 60192
Time taken with 32 threads: 35 ms
Total sum of 1201 lines with 50 threads: 60192
Time taken with 50 threads: 44 ms
Total sum of 1201 lines with 100 threads: 60192
Time taken with 100 threads: 66 ms
Total sum of 1201 lines with 500 threads: 60192
Time taken with 500 threads: 297 ms
import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.request.*
import kotlinx.coroutines.DelicateCoroutinesApi
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.newFixedThreadPoolContext
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

@OptIn(DelicateCoroutinesApi::class)
fun main() = runBlocking {
    val client = HttpClient(CIO)
    try {
        val numberOfThreadsList = listOf(1, 2, 4, 8, 16, 32, 50, 100, 500)

        for (numberOfThreads in numberOfThreadsList) {
            val customDispatcher = newFixedThreadPoolContext(numberOfThreads, "customPool")

            val timeTaken = measureTimeMillis {
                // Launch the 500 requests concurrently on the pool, then wait for all of them
                val jobs = (1..500).map {
                    launch(customDispatcher) {
                        client.get("http://127.0.0.1:5000/example")
                    }
                }
                jobs.joinAll()
            }
            println("Time taken with $numberOfThreads threads: $timeTaken ms")
            customDispatcher.close()
        }
    } catch (e: Exception) {
        println("Error connecting to the API: ${e.message}")
    } finally {
        client.close()
    }
}
Time taken with 1 threads: 7104 ms
Time taken with 2 threads: 4793 ms
Time taken with 4 threads: 4170 ms
Time taken with 8 threads: 4310 ms
Time taken with 16 threads: 4028 ms
Time taken with 32 threads: 4089 ms
Time taken with 50 threads: 4066 ms
Time taken with 100 threads: 3978 ms
Time taken with 500 threads: 3777 ms
Examples 1 to 3 share a common behavior: they all get faster up to about 8 threads, and beyond that the processing time starts to rise again. Example 4 does not follow this pattern. What does this show? Isn't it always better to use as many threads as possible?
The simple and quick answer is no.
My machine's processor has 8 cores, that is, it can run 8 tasks truly at the same time. Beyond that, the time increases, because the overhead of managing the state of each extra thread (scheduling and context switching) ends up degrading performance.
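If you want to check that number on your own machine, the JVM reports how many logical cores it sees; a one-line sketch, assuming you are on the JVM like the examples above:

fun main() {
    // Logical cores visible to the JVM; a sensible ceiling for CPU-bound pools
    println("Available processors: ${Runtime.getRuntime().availableProcessors()}")
}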
Okay, that explains examples 1 to 3, but what about example 4? Why does performance keep improving as more threads are added?
Simple: since it is an integration, the machine does almost no processing; it basically waits for a response, staying "asleep" until the reply arrives. So yes, here the number of threads can be larger. But be careful: that does not mean it can grow without bound. Threads consume resources, and using them indiscriminately has the opposite effect and will hurt the overall health of the service.
Therefore, to define the number of threads your pool should have, the easiest and safest approach is to classify the nature of the task that will be performed. Tasks fall into two groups:
Tasks that do not require processing:
When the task does not require much processing, more threads can be created than there are processor cores on the machine. This works because the CPU does not need to do significant work for the thread to complete; threads of this nature mostly wait for responses from integrations, such as a write to a database or a reply from an API.
Tasks that require processing:
When the solution actually does processing, that is, the machine is really doing work, the maximum number of threads should be the number of cores in the machine's processor. This is because a processor core cannot do more than one thing at exactly the same time. For example, if the processor the solution runs on has 4 cores, then your thread pool should match the core count: a 4-thread pool.
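Putting the two rules together, here is a sketch of how the two kinds of pool could be sized. The I/O multiplier of 4 is an illustrative assumption, not a measured value; the right factor depends on how long your integrations keep threads waiting.

import kotlinx.coroutines.DelicateCoroutinesApi
import kotlinx.coroutines.newFixedThreadPoolContext

@OptIn(DelicateCoroutinesApi::class)
fun main() {
    val cores = Runtime.getRuntime().availableProcessors()

    // CPU-bound work: cap the pool at the number of cores
    val cpuDispatcher = newFixedThreadPoolContext(cores, "cpuPool")

    // I/O-bound work: threads mostly wait, so a larger (but still bounded) pool is acceptable
    val ioDispatcher = newFixedThreadPoolContext(cores * 4, "ioPool")

    println("CPU pool: $cores threads, I/O pool: ${cores * 4} threads")

    cpuDispatcher.close()
    ioDispatcher.close()
}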
The first thing to define when thinking about a thread pool is not necessarily the number that will limit its size, but the nature of the task being performed. Threads help a lot with service performance, but they must be used well, so that they do not have the opposite effect and degrade performance, or, even worse, harm the health of the entire service. It is clear that smaller pools favor tasks with heavy processing, in other words CPU-bound tasks. If you are not sure whether the solution where threads will be used does massive processing, err on the side of caution and limit your pool to the number of cores on the machine; believe me, it will save you a lot of headaches.