
Issue record: "No thread for socket" about Memcached


This article is my original work. If you repost it, please keep it intact and note that it comes from 《四火的唠叨》.

We own a service supplying non-real-time data to downstream clients. To reduce the average latency and the pressure on the database, Memcached was introduced as the cluster's central cache, and our service consumed it. I ran into an issue when Memcached was first deployed to the production environment:

[WARN] (Acceptor ServerSocket[addr=/0.0.0.0,port=0,localport=3961]) amazon.platform.servertoolkit.http.jetty.JettyServerManager: No thread for Socket[addr=/10.0.77.98,port=10545,localport=3961] : [Acceptor ServerSocket[addr=/0.0.0.0,port=0,localport=3961]]org.mortbay.util.ThreadPool.run(ThreadPool.java:350)

The server gave no response at all, so I checked the number of established connections:

netstat -an | grep ESTABLISHED | wc -l
54

And the number of processes:

ps -ef|wc -l
564

Both seemed normal. One teammate reminded me to have a look at the Jetty threads:

ps uH p 31636 | wc -l

I couldn't find anything abnormal there either. I knew I could raise the thread pool's upper limit, but that would not help resolve the issue; at most it would only delay the next occurrence.

Actually, this issue was a little more complex than we thought.

The threads had been running out since that morning, and clients could not connect to our service. I took a snapshot of the JVM stacks (jstack -F -l 31636) and found that almost all the threads were hanging while reading from the cache server:

Thread 19795: (state = IN_NATIVE)
 - sun.nio.ch.FileDispatcher.read0(java.io.FileDescriptor, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.SocketDispatcher.read(java.io.FileDescriptor, long, int) @bci=4, line=21 (Compiled frame)
 - sun.nio.ch.IOUtil.readIntoNativeBuffer(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher, java.lang.Object) @bci=116, line=198 (Compiled frame)
 - sun.nio.ch.IOUtil.read(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher, java.lang.Object) @bci=31, line=166 (Compiled frame)
 - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=101, line=245 (Compiled frame)
 - com.schooner.MemCached.SockInputStream.readFromChannel() @bci=25 (Interpreted frame)
 - com.schooner.MemCached.SockInputStream.read() @bci=31 (Compiled frame)
 - com.schooner.MemCached.SockInputStream.getLine() @bci=9 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String, java.lang.String, java.lang.Integer, boolean) @bci=583 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String, java.lang.Integer) @bci=6 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String) @bci=3 (Compiled frame)
 - com.danga.MemCached.MemCachedClient.get(java.lang.String) @bci=5 (Compiled frame)
 - amazon.forecast.util.cache.memcached.BSFMemcachedClient.get(java.lang.String, java.lang.String) @bci=14, line=80 (Compiled frame)
 - amazon.forecast.distribution.Forecaster.getForecastDistributions(amazon.forecast.distribution.IForecastDistributionProvider, amazon.forecast.distribution.ForecastRequest)
 ...

My understanding was that there had to be some error on the connection from the client to the cache server, so I checked the cache server, but it was running correctly and stably, and the network was fine. We invoked the Memcached service manually several times through telnet, and it worked normally. I took several snapshots in series again and got the same result: it seemed all the threads were hung in the "read0" method.

However, after restarting the client the issue disappeared. At that point nothing could explain why all the threads had been hanging while reading from the cache server. One of my teammates told me that if the TCP connection is dropped while the client is trying to read, the read will not fail until the retransmission times out. The retransmissions happen with a back-off, so it takes a long time for the read to fail. Besides, it didn't matter whether we were able to connect to the cache server manually: that particular TCP connection was broken on the client side, the client didn't detect it, and it kept using the broken socket to read.
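To illustrate that failure mode, here is a minimal sketch using a plain java.net.Socket rather than the Memcached client; the host, port, and timeout are placeholders. Without a read timeout the thread sits in a native read, just like the stacks above; with setSoTimeout the read fails fast with a SocketTimeoutException.

import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class BlockingReadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; in our case this was the Memcached server.
        Socket socket = new Socket("cache-host.example.com", 11211);

        // Without this call, read() blocks until TCP retransmission finally
        // gives up on the silently broken connection, which can take a very
        // long time; this is exactly the hang seen in the jstack output above.
        socket.setSoTimeout(3000); // fail the read after 3 seconds

        InputStream in = socket.getInputStream();
        try {
            int b = in.read(); // now blocks for at most 3 seconds
            System.out.println("read byte: " + b);
        } catch (SocketTimeoutException e) {
            // The thread regains control instead of hanging in read0().
            System.out.println("read timed out, closing the broken connection");
        } finally {
            socket.close();
        }
    }
}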

In the end we were still unclear about the root cause, but we found a thread discussing this issue:

https://github.com/gwhalin/Memcached-Java-Client/issues/20

A working solution is to set a socket timeout, so that the connection is released in time when an unforeseen issue happens.
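For the gwhalin Memcached-Java-Client visible in the stack trace (the com.danga.MemCached / com.schooner.MemCached packages), that means configuring the socket pool's timeouts. Below is a minimal sketch; the server address and timeout values are only examples, not our production settings.

import com.danga.MemCached.MemCachedClient;
import com.danga.MemCached.SockIOPool;

public class CacheClientSetup {
    public static MemCachedClient createClient() {
        SockIOPool pool = SockIOPool.getInstance();
        pool.setServers(new String[] { "cache-host.example.com:11211" });
        pool.setSocketConnectTO(1000); // connect timeout in milliseconds
        pool.setSocketTO(3000);        // read timeout in milliseconds
        pool.initialize();

        // With a bounded read timeout, a read on a dead connection throws
        // instead of pinning a Jetty worker thread forever.
        return new MemCachedClient();
    }
}

With the read timeout in place, a broken connection surfaces as an exception the client can handle, the worker threads are no longer exhausted, and a restart is not needed to recover.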

This article is my original work. If you repost it, please keep it intact and note that it comes from 《四火的唠叨》.
