nginx acts as a reverse proxy in front of two socket.io servers. socket.io starts in polling mode and then upgrades to websocket.
Phenomena
When requests go through nginx, a large number of 400 errors appear. Sometimes the connection manages to upgrade to websocket; sometimes it keeps failing. When the servers are accessed directly by ip:port, however, the connection succeeds 100% of the time.
Analysis
sid
sid is the key to our problem. When the connection is first created (polling mode simulates a long-lived connection), the client sends a request like this:
https://***/?EIO=3&transport=polling&t=1540820717277-0
On receiving it, the server creates an object, binds it to the connection, and returns a sid (session id) that identifies the session. What is a session? A session is a series of related interactions. In our scenario, when the next http request arrives, the server needs to find the long-lived connection that was bound earlier (a notional one, since we have not reached websocket yet). http requests are stateless and each one is independent, so socket.io introduces the sid to tie them together. The server generates the sid when it receives the first request. Look at the response:
{"sid":"eogal3frqlptoalp5est","upgrades":["websocket"],"pingInterval":8000,"pingTimeout":10000}
Every subsequent request must carry this sid, and the request that establishes the websocket connection is no exception. The sid is therefore the key both to polling and to upgrading from polling to websocket. Subsequent requests look like:
https://***/?EIO=3&transport=polling&t=1540820717314-1&sid=eogal3frqlptoalp5est
or
wss://***/?EIO=3&transport=websocket&t=1540820717314-1&sid=eogal3frqlptoalp5est
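As a rough illustration of how the sid from the handshake is carried into later requests, here is a minimal Python sketch. It assumes the JSON body of the handshake has already been extracted from the polling response; the host example.com and the helper follow_up_url are hypothetical names used only for this example.

```python
import json
from urllib.parse import urlencode

# Handshake body as shown above (assumed already extracted from the response).
handshake = ('{"sid":"eogal3frqlptoalp5est","upgrades":["websocket"],'
             '"pingInterval":8000,"pingTimeout":10000}')
info = json.loads(handshake)

def follow_up_url(host, transport, sid, t):
    """Build a follow-up request URL that carries the session id.

    `host` is a placeholder; the real host is elided in the article.
    """
    scheme = "wss" if transport == "websocket" else "https"
    query = urlencode({"EIO": 3, "transport": transport, "t": t, "sid": sid})
    return f"{scheme}://{host}/?{query}"

polling_url = follow_up_url("example.com", "polling", info["sid"], "1540820717314-1")
upgrade_url = follow_up_url("example.com", "websocket", info["sid"], "1540820717314-1")
print(polling_url)
print(upgrade_url)
```

Both URLs carry the same sid, which is why both must reach the server that issued it.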
So what happens if the sid in a request was not generated by the server that receives it? That server does not recognize it and returns a 400, telling you:
invalid sid
This is the problem we ran into. nginx's default load-balancing strategy is round-robin, so a request may not reach the machine that generated its sid, and in that case we get a 400. With some luck the request lands on the original machine; with more luck, it can even hold out until the websocket connection is established.
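The failure can be reproduced with a toy model. The sketch below is not socket.io or nginx code: Backend and run_client are hypothetical stand-ins, with each backend remembering only the sids it issued and a round-robin picker alternating between the two backends for a single client.

```python
import itertools
import secrets

class Backend:
    """Toy stand-in for one socket.io server: it only knows the sids it issued."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def handle(self, sid=None):
        if sid is None:                  # handshake: create and remember a session
            sid = secrets.token_hex(10)
            self.sessions.add(sid)
            return 200, sid
        if sid in self.sessions:         # follow-up request on a known session
            return 200, sid
        return 400, "invalid sid"        # sid was issued by a different server

def run_client(pick_backend, requests=4):
    """Handshake once, then send follow-up requests that carry the sid."""
    _, sid = pick_backend().handle()
    return [pick_backend().handle(sid)[0] for _ in range(requests)]

# Through a round-robin proxy: requests alternate between the two backends.
backends = [Backend("a"), Backend("b")]
cycle = itertools.cycle(backends)
statuses = run_client(lambda: next(cycle))
print(statuses)  # [400, 200, 400, 200]

# Direct access to one backend (the ip:port case): every request succeeds.
direct = run_client(lambda: backends[0])
print(direct)    # [200, 200, 200, 200]
```

With several real clients interleaving their requests, the pattern is less regular than this strict alternation, which matches the "sometimes it works, sometimes it does not" behavior described above.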
Solution
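Two solutions are proposed here:
Use ip_hash for nginx's load balancing (the ip_hash directive in the upstream block), which guarantees that all requests from one client go to the same server
Do not use polling mode; use websocket only (for example, by setting the socket.io client's transports option to ["websocket"])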
Both options have pros and cons. The drawback of the second is obvious: older browsers and clients that do not support websocket will not work. The problem with the first is hidden deeper. Imagine what happens when you add or remove machines: the modulus used by the ip_hash policy changes, clients are mapped to different servers, and their previous connections become invalid. For microservices, scaling out and in is a very frequent operation (especially while the product is still under development), so this kind of lossy scaling is most likely unacceptable.
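To get a feel for how disruptive a change in the number of servers is under hash-based placement, here is a small Python sketch. It uses a plain CRC32 hash modulo the server count, which is a simplification and not nginx's actual ip_hash algorithm; the pick function and the client addresses are made up for illustration.

```python
import zlib

def pick(client_ip: str, n_servers: int) -> int:
    """Simplified hash-mod-N placement; NOT nginx's real ip_hash algorithm."""
    return zlib.crc32(client_ip.encode()) % n_servers

# 1000 made-up client addresses.
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]

before = {ip: pick(ip, 2) for ip in clients}   # two servers
after = {ip: pick(ip, 3) for ip in clients}    # a third server is added

moved = sum(before[ip] != after[ip] for ip in clients)
print(f"{moved} of {len(clients)} clients now map to a different server")
```

Any client counted in moved would send its next request, still carrying the old sid, to a server that never issued it, and would get the same 400 as before.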