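I had been running apache-tomcat-7.0.26 + jdk1.6.0_31 for a long time and it was mostly fine. Occasionally the number of CLOSE_WAIT connections would climb to roughly 2,000-3,000, but it then recovered on its own.
After upgrading to apache-tomcat-8.0.9 + jdk1.7.0_72, Tomcat has to be restarted after 7-8 hours of running. Otherwise the CLOSE_WAIT count climbs past 10,000, requests start failing with socket connect timeout, and only a restart fixes it. The interval is not fixed: usually 7-8 hours, sometimes a bit longer at ten-plus hours. Tomcat's catalina.out contains a large number of errors like the following: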
27-Oct-2015 22:25:33.621 INFO [pool-1300-thread-1] org.apache.coyote.AbstractProcessor.setErrorState An error occurred in processing while on a non-container thread. The connection will be closed immediately
java.io.IOException: APR error: -32
at org.apache.coyote.http11.InternalAprOutputBuffer.writeToSocket(InternalAprOutputBuffer.java:292)
at org.apache.coyote.http11.InternalAprOutputBuffer.writeToSocket(InternalAprOutputBuffer.java:245)
at org.apache.coyote.http11.InternalAprOutputBuffer.flushBuffer(InternalAprOutputBuffer.java:214)
at org.apache.coyote.http11.AbstractOutputBuffer.flush(AbstractOutputBuffer.java:306)
at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:763)
at org.apache.coyote.Response.action(Response.java:177)
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:345)
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:313)
at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:110)
at net.bull.javamelody.FilterServletOutputStream.flush(FilterServletOutputStream.java:52)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:297)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at java.io.PrintWriter.flush(PrintWriter.java:320)
at com.iyd.commons.bigpipe.Pagelet.writeValue(Pagelet.java:86)
at com.iyd.commons.bigpipe.Pagelet.call(Pagelet.java:78)
at com.iyd.commons.bigpipe.Pagelet.call(Pagelet.java:1)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I don't know whether this is related to the version upgrade. Could anyone help take a look? Many thanks!
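First of all, you need to know what CLOSE_WAIT is. A socket enters this state when the remote peer has closed its side of the connection and the local application has not yet closed its own. Like TIME_WAIT, it means the connection lingers after the exchange is over, and a socket in CLOSE_WAIT keeps holding a file handle until the application closes it. Almost all operating systems limit the number of handles (and therefore connections) a single process may hold. On most Linux systems, for example, the default limit is 1024.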
Example
One day you get lucky and 1,000 people suddenly visit your website, so Tomcat uses 1,000 connections to handle these requests.
When the responses have been returned, these 1,000 connections are not destroyed immediately. They all sit in a wait state.
Then another 1,000 people arrive (your luck holds), but the previous 1,000 connections cannot accept new requests because they are all still waiting. With a limit of 1024 you have only 24 connections available, so the other 976 people are stuck waiting.
After 5 minutes, the first 1,000 connections are gradually released and you have 1,000 connections available again.
Unfortunately, almost all of the 976 people who were stuck waiting have given up by then, and only 3 are left. The good news is that, with connections available again, those 3 can happily connect to your website and enjoy it.
By this point in the story you probably see what causes the problem: too many connections sitting in a wait state use up the connections that would otherwise be available.
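To check whether lingering connections really are what is eating your handles, you can count sockets per TCP state. The following is a minimal sketch, assuming a Linux host where /proc/net/tcp and /proc/net/tcp6 are readable; the class name TcpStateCounter and its method are made up for illustration and are not part of Tomcat.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts sockets per TCP state from /proc/net/tcp style input.
class TcpStateCounter {
    // Index = hex code in the "st" column of /proc/net/tcp (Linux only).
    private static final String[] STATES = {
        "UNKNOWN", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
        "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING"
    };

    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            // Data rows look like "0: <local> <remote> <st> ..."; skip the header and blanks.
            if (fields.length < 4 || !fields[0].endsWith(":")) {
                continue;
            }
            int code;
            try {
                code = Integer.parseInt(fields[3], 16);
            } catch (NumberFormatException e) {
                continue;
            }
            String name = (code > 0 && code < STATES.length) ? STATES[code] : "UNKNOWN";
            Integer old = counts.get(name);
            counts.put(name, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<String>();
        for (String file : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path p = Paths.get(file);
            if (Files.isReadable(p)) {
                lines.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
            }
        }
        // Prints a map of state name to socket count for the whole host.
        System.out.println(countStates(lines));
    }
}
```

If the CLOSE_WAIT count keeps growing between runs and approaches the handle limit, the server is in the situation the story describes.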
So how do you fix it? The answer is simple: disable the wait, so that a connection is closed as soon as the response has been returned and its handle becomes available again.
Take Tomcat as an example.
Note that keepAliveTimeout="0" is the key setting. In Tomcat it is an attribute of the <Connector> element in conf/server.xml. Other servers have equivalent settings; if you are interested, check their documentation.
You can also raise the system handle limit and adjust the system TCP timeout.
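Before and after changing the handle limit, it helps to see how close the JVM actually is to it. Here is a rough sketch, assuming a HotSpot/OpenJDK JVM on a Unix-like system, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; the class name FdHeadroom is invented for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports how many file descriptors this JVM holds versus its limit.
class FdHeadroom {
    /** Returns {open, max} descriptor counts, or null when the JVM does not expose them. */
    static long[] descriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Assumption: only Unix-like JVMs expose these counters through this interface.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = descriptorUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this JVM.");
        } else {
            System.out.println("open=" + usage[0] + " max=" + usage[1]);
        }
    }
}
```

The max value reflects the limit the process was started with, so it should change only after the new limit is applied and Tomcat is restarted.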