MySQL OOM Series, Part 2: OOM Killer


php中文网

Published: 2016-08-20 08:48:12

Original article


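This raises a question: which process should be killed? Readers who know a little about the Linux kernel will probably answer that whoever uses the most memory gets killed. That is indeed one of the main factors the kernel considers, but it is not the whole story. The kernel documentation tells us that the victim is chosen according to /proc/<pid>/oom_score. Every process has its own value, computed by the kernel's oom_badness() function (called badness() in the older kernel source quoted here). Let's read through badness() carefully.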
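To look at the score the kernel has assigned to a process, it is enough to read that one file. The following is a minimal sketch, assuming a Linux system with /proc mounted; `read_oom_score` is a hypothetical helper name chosen for this example, not a kernel or libc API.

```c
#include <stdio.h>

/*
 * Read the single integer stored in a /proc/<pid>/oom_score style file.
 * Returns 0 on success and stores the value in *score; returns -1 if the
 * file cannot be opened or does not start with an integer.
 * (read_oom_score is a hypothetical helper, not a kernel or libc function.)
 */
int read_oom_score(const char *path, long *score)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int ok = (fscanf(fp, "%ld", score) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

Calling `read_oom_score("/proc/self/oom_score", &score)` would report the score of the calling process itself; substituting a numeric pid for `self` inspects another process, for example mysqld.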

The comment at the top of badness() spells out the goals the function tries to meet:

         1) we lose the minimum amount of work done
         2) we recover a large amount of memory
         3) we don't kill anything innocent of eating tons of memory
         4) we want to kill the minimum amount of processes (one)
         5) we try to kill the process the user expects us to kill, this algorithm has been meticulously tuned to meet the principle of least surprise ... (be careful when you change it)

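In short, the goal is to kill as few processes as possible while reclaiming as much memory as possible, which matches the intuition of killing the process that uses the most memory.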

        /*
         * The memory size of the process is the basis for the badness.
         */

        points = p->mm->total_vm;

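The score starts from total_vm, the size of the process's address space in pages. Note that this is the mapped virtual size, not the resident physical memory: the formula makes no distinction between pages that are in RAM, pages that have been swapped out, and pages that were allocated but never touched. The larger the process's address space, the higher its score, and the higher the score, the more likely the process is to be sacrificed.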

        /*
         * Processes which fork a lot of child processes are likely
         * a good choice. We add the vmsize of the childs if they
         * have an own mm. This prevents forking servers to flood the
         * machine with an endless amount of childs
         */
          ...
                  if (chld->mm != p->mm && chld->mm)
                        points += chld->mm->total_vm;

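This means that the memory of child processes is charged to the parent, as long as the child has its own mm (that is, it is not a thread sharing the parent's address space).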
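The first two steps (base score plus children) can be replayed in user space. This is only a sketch: `mm_sketch`, `task_sketch`, and `base_points` are simplified stand-ins invented for this example, and the child list is modeled as a plain array rather than the kernel's linked list.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; only the fields used here. */
struct mm_sketch {
    unsigned long total_vm;          /* address-space size in pages */
};

struct task_sketch {
    struct mm_sketch *mm;            /* NULL for tasks without an address space */
    struct task_sketch **children;   /* hypothetical flat array of children */
    size_t nr_children;
};

/* Base score: own total_vm plus the total_vm of every child with its own mm. */
unsigned long base_points(const struct task_sketch *p)
{
    if (p->mm == NULL)
        return 0;                    /* nothing to reclaim from this task */

    unsigned long points = p->mm->total_vm;

    for (size_t i = 0; i < p->nr_children; i++) {
        const struct task_sketch *chld = p->children[i];
        /* Skip children that share the parent's mm or have none at all. */
        if (chld->mm != p->mm && chld->mm)
            points += chld->mm->total_vm;
    }
    return points;
}
```

With a parent of 1000 pages, one child with its own 500-page mm, and one child sharing the parent's mm, the sketch yields 1500: only the independent child is added.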

        s = int_sqrt(cpu_time);
        if (s)
                points /= s;
        s = int_sqrt(int_sqrt(run_time));
        if (s)
                points /= s;

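This shows that the more CPU time a process has consumed, or the longer it has been running, the lower its score and the less likely it is to be killed. Because the divisors are the square root of the CPU time and the fourth root of the run time, CPU time weighs more heavily than uptime.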
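A small worked example makes the effect of the two divisions concrete. The sketch below assumes the time values are already in whatever units the kernel passes in; `int_sqrt_sketch` is a user-space stand-in for the kernel's int_sqrt(), and `scale_by_time` is a hypothetical name for the step quoted above.

```c
/* Integer square root (rounded down), a stand-in for the kernel's int_sqrt(). */
unsigned long int_sqrt_sketch(unsigned long x)
{
    unsigned long op = x, res = 0;
    /* Start from the highest power of four that fits in an unsigned long. */
    unsigned long one = 1UL << (sizeof(unsigned long) * 8 - 2);

    while (one > op)
        one >>= 2;

    while (one != 0) {
        if (op >= res + one) {
            op -= res + one;
            res = (res >> 1) + one;
        } else {
            res >>= 1;
        }
        one >>= 2;
    }
    return res;
}

/* Divide the score by sqrt(cpu_time) and by the fourth root of run_time. */
unsigned long scale_by_time(unsigned long points,
                            unsigned long cpu_time,
                            unsigned long run_time)
{
    unsigned long s = int_sqrt_sketch(cpu_time);
    if (s)
        points /= s;

    s = int_sqrt_sketch(int_sqrt_sketch(run_time));
    if (s)
        points /= s;

    return points;
}
```

For instance, `scale_by_time(1000000, 100, 10000)` divides first by 10 (the square root of 100) and then by 10 again (the fourth root of 10000), leaving 10000. A brand-new process with both times at zero keeps its full score.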

       /*
        * Niced processes are most likely less important, so double
        * their badness points.
        */
        if (task_nice(p) > 0)
                points *= 2;

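If the process has a low priority (a positive nice value means low priority, a negative value means high priority), its points are doubled.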

       /*
        * Superuser processes are usually more important, so we make it
        * less likely that we kill those.
        */
        if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) ||
                                p->uid == 0 || p->euid == 0)
                points /= 4;

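Processes that run as the superuser, or that hold CAP_SYS_ADMIN, have their score divided by 4, so they are less likely to be killed.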

        /*
         * We don't want to kill a process with direct hardware access.
         * Not only could that mess up the hardware, but usually users
         * tend to only have this flag set on applications they think
         * of as important.
         */
        if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
                points /= 4;

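Processes that can access raw devices directly (CAP_SYS_RAWIO) get the same treatment: their score is divided by 4 again, which protects them further.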
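The three adjustments for nice value, superuser status, and raw I/O capability can be combined in one small function. This is a sketch under the assumption that the capability and uid checks have already been reduced to booleans; `adjust_for_privilege` and its parameters are hypothetical names, not kernel identifiers.

```c
#include <stdbool.h>

/*
 * Apply the nice, superuser and raw-I/O adjustments in the same order
 * as the quoted kernel code. The boolean flags stand in for the
 * capability and uid checks done by the kernel.
 */
unsigned long adjust_for_privilege(unsigned long points, int nice,
                                   bool is_superuser, bool has_rawio)
{
    if (nice > 0)          /* niced processes: double the badness */
        points *= 2;
    if (is_superuser)      /* root or CAP_SYS_ADMIN: quarter the badness */
        points /= 4;
    if (has_rawio)         /* CAP_SYS_RAWIO: quarter it again */
        points /= 4;
    return points;
}
```

Starting from 1600 points, a niced unprivileged process ends up at 3200, while a root process with raw I/O access drops to 100, a 32-fold difference from flags alone.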

        /*
         * Adjust the score by oomkilladj.
         */
        if (p->oomkilladj) {
                if (p->oomkilladj > 0)
                        points <<= p->oomkilladj;
                else
                        points >>= -(p->oomkilladj);
        }

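Each process has an oomkilladj value that sets its priority for being killed. It ranges from -17 to +15, and the larger the value, the more likely the process is to be killed (-17 is the special OOM_DISABLE value, which exempts the process from OOM killing altogether). Because the value is applied as a bit shift, every step doubles or halves the score, so its influence on the final points is large.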
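To see how strong the shift is, the adjustment can be isolated in a few lines. This is a minimal sketch; `apply_oomkilladj` is a hypothetical helper that mirrors only the shift step and does not model the rest of the kernel's selection logic.

```c
/*
 * Shift the score left for a positive adjustment and right for a
 * negative one; an adjustment of 0 leaves the score unchanged.
 */
unsigned long apply_oomkilladj(unsigned long points, int oomkilladj)
{
    if (oomkilladj > 0)
        points <<= oomkilladj;
    else if (oomkilladj < 0)
        points >>= -oomkilladj;
    return points;
}
```

With 1000 points, an adjustment of +3 gives 8000 and -3 gives 125. At the extremes, +15 multiplies the score by 32768, and a right shift by 17 bits reduces 1000 points to 0.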

Here is a small program to experiment with:

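    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>   /* sleep() */

    #define GIGABYTE (1024UL * 1024 * 1024)
    #define CHUNK    (100UL * 1024 * 1024)   /* 100 MB */

    int main(int argc, char *argv[])
    {
        /* Request 1 GB up front; the pages are only backed once touched. */
        char *myblock = malloc(GIGABYTE);
        if (myblock == NULL) {
            perror("malloc");
            return 1;
        }
        printf("Currently allocating 1GB\n");
        sleep(1);

        /* Fill the block 100 MB at a time, pausing between steps. */
        for (int count = 1; count <= 10; count++) {
            memset(myblock, 1, CHUNK);
            myblock += CHUNK;
            printf("Currently allocating %d00 MB\n", count);
            sleep(10);
        }
        return 0;
    }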


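The program above first requests 1 GB of memory and then fills it in 100 MB steps. We run three instances of it on a machine with 2 GB of RAM and 400 MB of swap. Let's look at the results: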