Issue record: “No thread for socket” about Memcach

WBOY
Lepaskan: 2016-06-07 16:31:02
asal
1456 orang telah melayarinya

文章系本人原创,转载请保持完整性并注明出自《四火的唠叨》 We own a service supplying not-real-time data to the downstream clients. In order to reduce the average latency and the pressure on database, Memcached was imported as the cluster ce

文章系本人原创,转载请保持完整性并注明出自《四火的唠叨》

Issue record: No thread for socket about MemcachedWe own a service supplying not-real-time data to the downstream clients. In order to reduce the average latency and the pressure on database, Memcached was imported as the cluster center cache, and our service consumed it. I encountered an issue when Memcached was firstly deployed on production environment:

[WARN] (Acceptor ServerSocket[addr=/0.0.0.0,port=0,localport=3961]) amazon.platform.servertoolkit.http.jetty.JettyServerManager: No thread for Socket[addr=/10.0.77.98,port=10545,localport=3961] : [Acceptor ServerSocket[addr=/0.0.0.0,port=0,localport=3961]]org.mortbay.util.ThreadPool.run(ThreadPool.java:350)
Salin selepas log masuk

The server totally gave no response. So I checked the connection number:

netstat -an | grep ESTABLISHED | wc –l
54
Salin selepas log masuk

And process number:

ps -ef|wc -l
564
Salin selepas log masuk

The both seemed normal. One teammate remaindered me I should have a look at the threads of Jetty:

ps uH p 31636 | wc -l
Salin selepas log masuk

I couldn’t find anything abnormal. I knew I might increase the thread up limit but that would not have any help to resolve the issue, at most it might only delay the issue’s happening.

Actually, this issue was a little more complex than we thought indeed.

The threads were running out from this morning, and client could not connect to our service. I took a snapshot (jstack -F -l 31636) for the stacks in JVM, finding that almost all the threads were hung reading from cache server:

Thread 19795: (state = IN_NATIVE)
 - sun.nio.ch.FileDispatcher.read0(java.io.FileDescriptor, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.SocketDispatcher.read(java.io.FileDescriptor, long, int) @bci=4, line=21 (Compiled frame)
 - sun.nio.ch.IOUtil.readIntoNativeBuffer(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher, java.lang.Object) @bci=116, line=198 (Compiled frame)
 - sun.nio.ch.IOUtil.read(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher, java.lang.Object) @bci=31, line=166 (Compiled frame)
 - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=101, line=245 (Compiled frame)
 - com.schooner.MemCached.SockInputStream.readFromChannel() @bci=25 (Interpreted frame)
 - com.schooner.MemCached.SockInputStream.read() @bci=31 (Compiled frame)
 - com.schooner.MemCached.SockInputStream.getLine() @bci=9 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String, java.lang.String, java.lang.Integer, boolean) @bci=583 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String, java.lang.Integer) @bci=6 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String) @bci=3 (Compiled frame)
 - com.danga.MemCached.MemCachedClient.get(java.lang.String) @bci=5 (Compiled frame)
 - amazon.forecast.util.cache.memcached.BSFMemcachedClient.get(java.lang.String, java.lang.String) @bci=14, line=80 (Compiled frame)
 - amazon.forecast.distribution.Forecaster.getForecastDistributions(amazon.forecast.distribution.IForecastDistributionProvider, amazon.forecast.distribution.ForecastRequest)
 ...
Salin selepas log masuk

On my understanding there should be some error on the connection from the client to cache server, then I checked the cache server but found it’s correctly and stably running, and of course, the network was good. We invoked?Memcached service manually several times thru telnet, it did work normally.?I took several snapshots in series again and got the same result — it seemed all the threads were being hung on the “read0″ method.

However, after restarting the client the issue disappeared. By then nothing could explain why all the threads were being hung reading from cache server. One of my teammate told me if the tcp connection was dropped and the client was trying to read, it would not fail until the retransmission time out. The retransmission would happen in a back-off manner, and it would take long time for “read” to fail. Besides,?It didn’t matter whether or not we were able to connect to the cache server. The exact tcp connection was broken on client, and it didn’t detect and was still using the broken socket to read.

Finally, we were still unclear about the root cause, but found a loop discussing about this issue:?

https://github.com/gwhalin/Memcached-Java-Client/issues/20

A working solution is to set socket timeout to release the connection in time if unforeseen issue happens.

文章系本人原创,转载请保持完整性并注明出自《四火的唠叨》

你可能也喜欢:

Label berkaitan:
sumber:php.cn
Kenyataan Laman Web ini
Kandungan artikel ini disumbangkan secara sukarela oleh netizen, dan hak cipta adalah milik pengarang asal. Laman web ini tidak memikul tanggungjawab undang-undang yang sepadan. Jika anda menemui sebarang kandungan yang disyaki plagiarisme atau pelanggaran, sila hubungi admin@php.cn
Tutorial Popular
Lagi>
Muat turun terkini
Lagi>
kesan web
Kod sumber laman web
Bahan laman web
Templat hujung hadapan