libmemcached的MEMCACHED_MAX_BUFFER问题
最近给服务增加了一个cache_put_latency指标,加了之后,吓了一跳。发现往memcached put一个10KB左右的数据,latency居然有7ms左右,难于理解,于是花了一些精力找原因。我分别写了一个shell和C++的测试程序。 1、shell脚本使用nc发送set命令。 #/bin/env ba
最近给服务增加了一个cache_put_latency指标,加了之后,吓了一跳。发现往memcached put一个10KB左右的数据,latency居然有7ms左右,难于理解,于是花了一些精力找原因。我分别写了一个shell和C++的测试程序。
1、shell脚本使用nc发送set命令。
#/bin/env bash let s=1 let i=0 let len=8*1024 while true do if (( i >= $len )) then break fi str=${str}1 let i++ done let i=0 begin_time=`date +%s` while true do if (( i >= 1000 )) then break fi printf "set $i 0 0 $len\r\n${str}\r\n" | nc 10.234.4.24 11211 if [[ $? -eq 0 ]] then echo "echo key: $i" fi let i++ done end_time=`date +%s` let use_time=end_time-begin_time echo "set time consumed: $use_time" let i=0 begin_time=`date +%s` while true do if (( i >= 1000 )) then break fi printf "get $i\r\n" | nc 10.234.4.22 11211 > /dev/null 2>&1 let i++ done end_time=`date +%s` let use_time=end_time-begin_time echo "get time consumed: $use_time"
2、C++程序则通过libmemcached set。
#include <iostream> #include <map> #include <string> #include <sys> #include <time.h> #include <stdlib.h> #include "libmemcached/memcached.h" using namespace std; uint32_t item_size = 0; uint32_t loop_num = 0; bool single_server = false; std::string local_ip; std::map<:string uint32_t> servers; int64_t getCurrentTime() { struct timeval tval; gettimeofday(&tval, NULL); return (tval.tv_sec * 1000000LL + tval.tv_usec); } memcached_st* mc_init() { memcached_st * mc = memcached_create(NULL); if (mc == NULL) { cout ::iterator iter; for (iter = servers.begin(); iter != servers.end(); ++iter) { if (single_server && iter->first != local_ip) { continue; } memcached_return rc = memcached_server_add(mc, iter->first.c_str(), iter->second); if(rc != MEMCACHED_SUCCESS) { cout first first <p>测试发现二者的结果是相背的。shell脚本set 1000次8KB的item,只要3s左右,平均需要3ms。而C++版本则需要39s左右,平均耗时39ms。照理说shell脚本需要不断连接服务器和启动nc进程,应该更慢才对。我用ltrace跟踪了一下,发现8KB的数据需要发送两次,两次write都是非常快的,但是等memcached返回时用了很多时间,主要的时间就耗费在这个地方。</p> <pre class="brush:php;toolbar:false"> 23:32:37.069922 [0x401609] memcached_set(0x19076200, 0x7fffdad68560, 32, 0x1907a570, 8192 <unfinished ...> 23:32:37.070034 [0x3f280c5f80] SYS_write(3, "set 29 0 600 8192\r\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8196) = 8196 23:32:37.071657 [0x3f280c5f80] SYS_write(3, "aaaaaaaaaaaaaaa\r\n", 17) = 17 23:32:37.071741 [0x3f280c5f00] SYS_read(3, "STORED\r\n", 8196) = 8 (39ms) </unfinished>
和剑豪讨论下之后,剑豪马上去grep了一把代码,发现原来libmemcached居然有MEMCACHED_MAX_BUFFER这样一个常量,其值为8196。并且它还没有对应的memcached_behavior_set函数。在memcached_constants.h中将其直接改成81960,然后就欣喜地发现cache_put_latency从7ms降低到1ms左右。
问题完美虽然地解决了,但是意犹未尽,于是想搞明白为什么会出现这种奇怪的现象。瓶颈貌似在服务器端,于是对memcached做了一些修改。在状态切换的时候加上一个精确到微秒的时间。
static int64_t getCurrentTime() { struct timeval tval; gettimeofday(&tval, NULL); return (tval.tv_sec * 1000000LL + tval.tv_usec); } static void conn_set_state(conn *c, enum conn_states state) { assert(c != NULL); assert(state >= conn_listening && state state) { if (settings.verbose > 2) { fprintf(stderr, "%d: going from %s to %s, time: %lu\n", c->sfd, state_text(c->state), state_text(state), getCurrentTime()); } c->state = state; if (state == conn_write || state == conn_mwrite) { MEMCACHED_PROCESS_COMMAND_END(c->sfd, c->wbuf, c->wbytes); } } }
从打印的时间戳可以看出来,时间主要花在conn_nread状态处理代码中。最后定位到第二次read花费的时间非常多。
15: going from conn_waiting to conn_read, time: 1348466584440118 15: going from conn_read to conn_parse_cmd, time: 1348466584440155 NOT FOUND 98 >15 STORED 15: going from conn_nread to conn_write, time: 1348466584480099(36ms) 15: going from conn_write to conn_new_cmd, time: 1348466584480145 15: going from conn_new_cmd to conn_waiting, time: 1348466584480152
value的数据可能在conn_read中读完了,这个时候只需要memmove一下就好了。如果没有在conn_read状态中读完,那么就需要conn_nread自己来一次read了(因为套接字被设置成了异步,所以还可能需要多次read),关键就是这个read太慢了。
case conn_nread: if (c->rlbytes == 0) { complete_nread(c); break; } /* first check if we have leftovers in the conn_read buffer */ if (c->rbytes > 0) { int tocopy = c->rbytes > c->rlbytes ? c->rlbytes : c->rbytes; if (c->ritem != c->rcurr) { memmove(c->ritem, c->rcurr, tocopy); } c->ritem += tocopy; c->rlbytes -= tocopy; c->rcurr += tocopy; c->rbytes -= tocopy; if (c->rlbytes == 0) { break; } } /* now try reading from the socket */ res = read(c->sfd, c->ritem, c->rlbytes); if (res > 0) { pthread_mutex_lock(&c->thread->stats.mutex); c->thread->stats.bytes_read += res; pthread_mutex_unlock(&c->thread->stats.mutex); if (c->rcurr == c->ritem) { c->rcurr += res; } c->ritem += res; c->rlbytes -= res; break; }
折腾了好久,在libmemcached的io_flush函数前后也打了不少时间戳,发现libmemcached发送数据是非常快的。突然灵感闪现,我想起来了TCP_NODELAY这个参数,于是在libmemcached memcached_connect.c文件中的set_socket_options函数中增加了这个参数(事实上set_socket_options函数里面可以设置TCP_NODELAY,没有仔细看)。
int flag = 1; int error = setsockopt(ptr->fd, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, sizeof(flag) ); if (error == -1) { printf("Couldn't setsockopt(TCP_NODELAY)\n"); exit(-1); }else { printf("set setsockopt(TCP_NODELAY)\n"); }
在不改MEMCACHED_MAX_BUFFER的情况下,现在set 100KB的item也是一瞬间的事情了。不过新的困惑又出现了,Nagle算法什么情况会起作用呢?为什么第一个包没被缓存,第二个包一定会被缓存呢?
libmemcached发送一个set命令是分成三部分的,首先是header(set 0 0 600 8192\r\n,共18个字节),然后是value(8192个字节),最后是’\r\n’(两个字节),一共是8212个字节。memcached在conn_read状态一共能读取2048+2048+4096+8196=16KB的数据,因此对于8KB的数据是完全可以在conn_read状态读完的。通过在conn_read状态处理的代码中增加下面的打印语句可以发现有些情况下,conn_read最后一次只读取了4个字节(正常情况应该是2048+2048+4096+20),剩下的16个字节放到conn_nread中读了。
res = read(c->sfd, c->rbuf + c->rbytes, avail); if (res > 0) { char buf[10240] = {0}; sprintf(buf, "%.*s", res, c->rbuf + c->rbytes); printf("avail=%d, read=%d, str=%s\n", avail, res, buf);
未设置TCP_NODELAY选项时,使用netstat可以看到客户端socket的Send-Q一直会维持在8214和8215之间。
tcp 0 8215 10.232.42.91:59836 10.232.42.91:11211 ESTABLISHED 25800/t
设置TCP_NODELAY选项时,客户端socket的Send-Q就一直为0了。
tcp 0 0 10.232.42.91:59890 10.232.42.91:11211 ESTABLISHED 26554/t.quick
原文地址:libmemcached的MEMCACHED_MAX_BUFFER问题, 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



iPhone 15 Pro vs. iPhone 14 Pro: Specs Comparison Here is a spec comparison between iPhone 15 Pro Max and iPhone 14 Pro Max: iPhone 15 Pro Max iPhone 14 Pro Max Display size 6.7 inches 6.7 inches Display technology Super Retina 2,000 nits Dimensions 6.29x3.02x0.32 inches 6.33x3.06x0.31 inches Weight 221 grams 240 grams

Memcached is a commonly used caching technology that can greatly improve the performance of web applications. In PHP, the commonly used Session processing method is to store the Session file on the server's hard disk. However, this method is not optimal because the server's hard disk will become one of the performance bottlenecks. The use of Memcached caching technology can optimize Session processing in PHP and improve the performance of Web applications. Session in PHP

The latest iPhone Pro series is equipped with a powerful 48MP sensor that ensures highly detailed and crystal clear photos to capture every precious moment. However, one potential drawback is the size of full-resolution images, especially those in ProRAW format. Although the maximum storage space offered by the iPhone is 512GB, capturing a lot of ProRAW images (about 75MP each) and video (440MB per minute, 60FPS) can quickly eat up your storage space. This can cause problems if you plan to use your iPhone as your main camera for large projects or travel. But wouldn't it be great if you could take those high-resolution 48MP photos without worrying about storage limitations? it's quick

Although Apple will launch the iPhone's video playback time to let users know that the iPhone battery is almost available. But normal users don't use their iPhone to view videos all day long. 7 iPhones tested for endurance in daily applications. Includes 7 models including iPhone15ProMax, iPhone15Pro, iPhone15Plus, iPhone15, iPhone14ProMax, iPhone14 and iPhone13ProMax. Running through some daily applications, such as Spotify, Zoom, Tiktok, Headspace, think about Apps, games, etc., we can see the battery life of different iPhones. this

Caching library in PHP8.0: Memcached With the rapid development of the Internet, modern applications require efficient and reliable caching technology to improve performance and handle large amounts of data. Due to PHP's popularity and open source nature, the PHP caching library has become an essential tool in the web development community. Memcached is a widely used open source high-speed memory caching system that can handle millions of simultaneous connected cache requests and can be used in many different types of applications, such as social networks, online

java8's stream takes maxpublicstaticvoidmain(String[]args){Listlist=Arrays.asList(1,2,3,4,5,6);Integermax=list.stream().max((a,b)->{if (a>b){return1;}elsereturn-1;}).get();System.out.println(max);}Note: The size is determined here through positive and negative numbers and 0 values. Instead of writing it directly if(a>b){returna;}elseretur

According to news on August 22, as the release of Samsung’s new generation flagship mobile phone S25 Ultra approaches, more and more details are beginning to emerge. The well-known blogger @ibinguniverse revealed more specifications of the S25 Ultra on Weibo today. The most eye-catching one is that its body width is the same as the Apple iPhone 16 Pro Max, both 77.6mm. 1. Thanks to Samsung’s further optimization of the frame design, the screen size of the S25 Ultra has been increased to 6.86 inches while maintaining the same width as the iPhone 16 Pro Max, providing users with a more immersive visual experience. The blogger further pointed out in the comment area that the black edges of the S25 Ultra are better than those of the iPhone 16 Pro Max&

With the development of the Internet, PHP applications have become more and more common in the field of Internet applications. However, high concurrent access by PHP applications can lead to high CPU usage on the server, thus affecting the performance of the application. In order to optimize the performance of PHP applications, Memcached caching technology has become a good choice. This article will introduce how to use Memcached caching technology to optimize the CPU usage of PHP applications. Introduction to Memcached caching technology Memcached is a
