How to find the definition of a function
To start, let’s try to find the definition of the strpos function.
The first step to try is to go to the PHP 5.4 root directory and enter strpos in the search box at the top of the page. The result of the search is a large list showing where strpos appears in the PHP source code.
Since this result is not very helpful to us, we use a little trick: we search for "PHP_FUNCTION strpos" (don't miss the double quotes, they are important), instead of strpos.
Now we get two entry links:
/PHP_5_4/ext/standard/
php_string.h 48 PHP_FUNCTION(strpos);
string.c 1789 PHP_FUNCTION( strpos)
The first thing to note is that both locations are in the ext/standard folder. This is what we expect to find, because the strpos function (like most string, array, and file functions) is part of the standard extension.
Now open both links in new tabs and see what code is hidden behind them.
You will see that the first link takes you to the php_string.h file, which contains the following code:
// ...
PHP_FUNCTION(strpos);
PHP_FUNCTION(stripos);
PHP_FUNCTION(strrpos);
PHP_FUNCTION(strripos);
PHP_FUNCTION(strrchr);
PHP_FUNCTION(substr);
// ...
This is what a typical header file (a file with the .h suffix) looks like: a plain list of function declarations whose definitions live elsewhere. We are not really interested in it, because we already know what we are looking for.
The second link is more interesting: it takes us to the string.c file, which contains the actual source code of the function.
Before I take you through this function step by step, I recommend that you try to understand it yourself. It is a very simple function, and even if you don't know the details, most of the code should look clear.
All PHP functions use the same basic structure. Each variable is defined at the top of the function, then the zend_parse_parameters function is called, and then the main logic comes, including the calls to RETURN_*** and php_error_docref.
So, let us start with the variable declarations at the top of the function:
zval *needle;
char *haystack;
char *found = NULL;
char needle_char[2];
long offset = 0;
int haystack_len;
The first line declares needle as a pointer to a zval. A zval is the structure PHP uses internally to represent any PHP variable. What it really looks like will be discussed in the next article.
The second line declares haystack, a pointer to a single character. At this point you need to remember that in C an array is handled through a pointer to its first element. Here, haystack points to the first character of the $haystack string you passed; haystack + 1 points to the second character, haystack + 2 to the third, and so on. So by incrementing the pointer one step at a time, the entire string can be read.
This raises a problem: PHP needs to know where the string ends, otherwise it would keep incrementing the pointer forever. To solve this, PHP also stores an explicit length, which is the haystack_len variable.
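To make the pointer-plus-length idea concrete, here is a minimal standalone sketch in plain C. It is not taken from the PHP source; the function name count_byte is made up for illustration. It walks a buffer from its first byte to one past its last byte, using only the explicit length to decide where to stop.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PHP): counts how often the byte c
 * occurs in a buffer of len bytes. The buffer may contain NUL bytes,
 * so the explicit length, not a terminator, decides where to stop. */
size_t count_byte(const char *haystack, size_t len, char c)
{
    size_t count = 0;
    const char *end = haystack + len;   /* one past the last byte */

    for (const char *p = haystack; p < end; p++) {
        if (*p == c) {
            count++;
        }
    }
    return count;
}
```

Because the loop never looks for a terminator, a call such as count_byte("a\0ba", 4, 'a') still sees the bytes that follow the embedded NUL.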
The other declaration of interest is the offset variable, which holds the third parameter of the function: the offset at which to start the search. It is declared as long, which, like int, is an integer data type. The difference between the two is not important right now; what you need to know is that in PHP integer values are stored as long and string lengths are stored as int.
Now take a look at the following three lines:
if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sz|l", &haystack, &haystack_len, &needle, &offset) == FAILURE) {
    return;
}
These three lines fetch the arguments passed to the function and store them in the variables declared above.
The first argument to zend_parse_parameters is the number of arguments that were passed. This number is provided by the ZEND_NUM_ARGS() macro.
Next comes the TSRMLS_CC macro, which is a peculiarity of PHP. You will find this strange macro scattered all over the PHP code base. It is part of the Thread-Safe Resource Manager (TSRM), which ensures that PHP does not mix up variables between multiple threads. This is not very important to us, so just ignore TSRMLS_CC (or TSRMLS_DC) whenever you see it in the code. (One odd thing to note is that there is no comma before this "argument". This is because, depending on whether PHP is built with thread safety, the macro expands either to nothing or to ", tsrm_ls". The comma is therefore part of the macro.)
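The comma trick is easier to see in isolation. The following sketch imitates it with made-up macros: MY_DC, MY_CC, MY_THREAD_SAFE and the int context are assumptions for illustration, not PHP's real definitions. When MY_THREAD_SAFE is not defined, both macros expand to nothing and the functions take two parameters; when it is defined, the macros bring their own comma and add a third.

```c
/* Hypothetical stand-ins for TSRMLS_DC / TSRMLS_CC. The names and the
 * int context are invented for this sketch. */
#ifdef MY_THREAD_SAFE
#  define MY_DC , int *ctx   /* declaration form: adds a parameter */
#  define MY_CC , ctx        /* call form: passes the parameter on */
#else
#  define MY_DC              /* expands to nothing */
#  define MY_CC              /* expands to nothing */
#endif

/* Note the missing comma before MY_DC: the macro supplies it. */
static int add_impl(int a, int b MY_DC)
{
    return a + b;
}

int add(int a, int b MY_DC)
{
    /* Same here: no comma before MY_CC. */
    return add_impl(a, b MY_CC);
}
```

Compiled without MY_THREAD_SAFE, add(2, 3) is an ordinary two-argument call. With -DMY_THREAD_SAFE, every caller has to pass a context pointer as well.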
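Now we come to the important part: the "sz|l" string describes the parameters the function accepts:
s // the first parameter is a string
z // the second parameter is a zval, i.e. an arbitrary variable
| // the parameters that follow are optional
l // the third parameter is a long (integer)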
Besides s, z, and l there are more type specifiers, and for most of them the meaning is clear from the character. For example, b is a boolean, d is a double (floating-point number), a is an array, f is a callback (function), and o is an object.
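As a rough illustration of how such a specifier string splits into required and optional parameters, here is a toy reader in plain C. It is a simplified sketch, not how zend_parse_parameters works internally: it treats every character other than '|' as exactly one parameter and ignores any modifier characters a real spec may contain. The function names are hypothetical.

```c
/* Toy reader for a type spec such as "sz|l" (hypothetical helpers).
 * Simplification: every character except '|' counts as one parameter. */

/* Number of parameters before the '|' marker. */
int count_required(const char *spec)
{
    int n = 0;

    while (*spec != '\0' && *spec != '|') {
        n++;
        spec++;
    }
    return n;
}

/* Number of parameters after the '|' marker (0 if there is none). */
int count_optional(const char *spec)
{
    int n = 0;

    while (*spec != '\0' && *spec != '|') {
        spec++;                 /* skip the required part */
    }
    if (*spec == '|') {
        spec++;                 /* skip the marker itself */
    }
    while (*spec != '\0') {
        n++;
        spec++;
    }
    return n;
}
```

For "sz|l" this gives two required parameters and one optional parameter, which matches the strpos signature: haystack and needle are mandatory, offset is optional.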
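The remaining arguments &haystack, &haystack_len, &needle, and &offset specify the variables that the parsed values are assigned to. As you can see, they are all passed by reference (&), which means that pointers to the variables are passed rather than the variables themselves.
After this call, haystack contains the haystack string, haystack_len is the length of that string, needle is the needle value, and offset is the starting offset.
In addition, the result is checked against FAILURE, which occurs when you pass invalid arguments to the function, for example an array where a string is expected. In that case zend_parse_parameters emits a warning and the function returns immediately (PHP userland code receives null).
Once the parameters have been parsed, the main function body begins: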
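The pattern of filling caller variables through pointers and reporting success or failure through the return value is common in C. Below is a small sketch of it, unrelated to PHP's own API: parse_pair, MY_SUCCESS and MY_FAILURE are hypothetical names chosen for this example.

```c
#include <stdio.h>

#define MY_SUCCESS 0
#define MY_FAILURE (-1)

/* Hypothetical parser (not a PHP function): reads "<int> <int>" from
 * input and stores the two numbers through the pointers it receives.
 * Returns MY_FAILURE if the input does not contain two integers. */
int parse_pair(const char *input, long *first, long *second)
{
    if (sscanf(input, "%ld %ld", first, second) != 2) {
        return MY_FAILURE;
    }
    return MY_SUCCESS;
}
```

A caller declares long a, b; and passes &a and &b, then bails out early if the return value is MY_FAILURE, in the same way the strpos code returns as soon as parameter parsing fails.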
if (offset < 0 || offset > haystack_len) {
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "Offset not contained in string");
    RETURN_FALSE;
}
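What this code does is fairly obvious: if offset is out of bounds, an E_WARNING-level error is raised through php_error_docref and the function returns false using the RETURN_FALSE macro.
php_error_docref is an error function that you will find in extensions (that is, in the ext folder). It is named for the documentation reference it puts into the error message (those links that never seem to work). There is also zend_error, which is mainly used by the Zend Engine but also often appears in extension code.
Both functions use sprintf-style formatting, so the error message can contain placeholders that are filled in by the arguments that follow. Here is an example: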
php_error_docref(NULL TSRMLS_CC, E_WARNING, "Failed to write %d bytes to %s", Z_STRLEN_PP(tmp), filename);
// %d is filled with Z_STRLEN_PP(tmp)
// %s is filled with filename
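For readers who have not written a printf-style function before, this is roughly what accepting a format string plus a variable number of arguments looks like in standard C. The helper format_warning is hypothetical and is not how php_error_docref is implemented; it only shows the placeholder mechanism using vsnprintf from the C standard library.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a PHP API): formats a message into buf with
 * printf-style placeholders. Returns the value vsnprintf returns, i.e.
 * the length the full message would have, or a negative value on error. */
int format_warning(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);                  /* start reading after fmt */
    n = vsnprintf(buf, size, fmt, args);  /* fill in %d, %s, ... */
    va_end(args);
    return n;
}
```

Called with the format string from the example above, an int for %d and a C string for %s, the buffer ends up holding the finished message with both placeholders replaced.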
Let us continue with the code:
if (Z_TYPE_P(needle) == IS_STRING) {
    if (!Z_STRLEN_P(needle)) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Empty delimiter");
        RETURN_FALSE;
    }
    found = php_memnstr(haystack + offset, Z_STRVAL_P(needle), Z_STRLEN_P(needle), haystack + haystack_len);
}
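The first five lines are straightforward: this branch only runs if needle is a string, and an error is raised if the string is empty. Then comes the more interesting part: php_memnstr is called, and this function does the main work. As usual, you can click on the function name to view its source.
php_memnstr returns a pointer to the first occurrence of needle in haystack (which is why the found variable is declared as char *, that is, a pointer to a character). From this pointer the offset can be computed with a simple subtraction, as you can see at the end of the function: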
RETURN_LONG(found - haystack);
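The search-then-subtract step can be reproduced in standalone C. The sketch below uses a deliberately naive byte search as a stand-in for php_memnstr; it is not PHP's implementation, and the names find_bytes and find_offset are made up. It returns -1 where the PHP function would return false.

```c
#include <stddef.h>
#include <string.h>

/* Naive stand-in for php_memnstr (not PHP's implementation): returns a
 * pointer to the first occurrence of needle in [haystack, end), or NULL. */
static const char *find_bytes(const char *haystack, const char *needle,
                              size_t needle_len, const char *end)
{
    if (needle_len == 0 || (size_t)(end - haystack) < needle_len) {
        return NULL;
    }
    for (const char *p = haystack; p <= end - needle_len; p++) {
        if (memcmp(p, needle, needle_len) == 0) {
            return p;
        }
    }
    return NULL;
}

/* Zero-based offset of needle in haystack, searching from `offset`,
 * or -1 if needle does not occur. */
long find_offset(const char *haystack, size_t haystack_len,
                 const char *needle, size_t needle_len, size_t offset)
{
    const char *found;

    if (offset > haystack_len) {
        return -1;
    }
    found = find_bytes(haystack + offset, needle, needle_len,
                       haystack + haystack_len);
    /* Subtracting the start pointer turns the match into an offset. */
    return found ? (long)(found - haystack) : -1;
}
```

Note that the subtraction uses the original haystack pointer, not haystack + offset, so the result is always relative to the start of the string regardless of where the search began.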
Finally, let us look at the branch taken when needle is not a string:
else {
    if (php_needle_char(needle, needle_char TSRMLS_CC) != SUCCESS) {
        RETURN_FALSE;
    }
    needle_char[1] = 0;
    found = php_memnstr(haystack + offset, needle_char, 1, haystack + haystack_len);
}
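I will simply quote the manual: "If needle is not a string, it is converted to an integer and applied as the ordinal value of a character." This basically means that instead of strpos($str, 'A') you can also write strpos($str, 65), because the character A has the code 65.
If you look at the variable declarations again, you will see that needle_char is declared as char needle_char[2], a string of two characters. php_needle_char writes the actual character (here 'A') to needle_char[0], and strpos then sets needle_char[1] to 0. The reason is that C strings end with '\0', meaning the last character is set to NUL (the character with code 0). In PHP itself this convention is not needed, because PHP stores the length of every string (so it does not need a 0 to find the end of a string), but it is still followed internally to stay compatible with C functions.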
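The two-element buffer can be tried out in plain C. In this sketch the helper names code_to_cstring and find_code are hypothetical, and the standard strstr is used instead of PHP's own search, which is exactly the kind of C function that relies on the terminating NUL.

```c
#include <string.h>

/* Hypothetical helper: turns an integer character code into a
 * one-character C string. buf needs room for two chars, the character
 * itself and the terminator. */
void code_to_cstring(int code, char buf[2])
{
    buf[0] = (char)code;   /* e.g. 65 becomes 'A' in ASCII */
    buf[1] = '\0';         /* terminator that C string functions expect */
}

/* Offset of the character with the given code in a NUL-terminated
 * haystack, or -1 if it does not occur. */
long find_code(const char *haystack, int code)
{
    char needle[2];
    const char *found;

    code_to_cstring(code, needle);
    found = strstr(haystack, needle);   /* relies on the terminator */
    return found ? (long)(found - haystack) : -1;
}
```

Without the assignment to buf[1], strstr would read past the end of the buffer looking for a terminator, which is undefined behavior in C.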
Zend functions
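I have had enough of strpos, so let us look for another function: strlen. We use the same approach as before:
Start at the PHP 5.4 source root and search for strlen.
You will see a pile of unrelated uses of the function, so search for "PHP_FUNCTION strlen" instead. When you do, something strange happens: there are no results at all.
The reason is that strlen is one of the few functions defined by the Zend Engine rather than by a PHP extension. In such cases the function is not defined with PHP_FUNCTION(strlen) but with ZEND_FUNCTION(strlen). So we search for "ZEND_FUNCTION strlen" as well.
As we already know, we need to click the link without the trailing semicolon to get to the definition in the source. That link takes us to the following function definition: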
ZEND_FUNCTION(strlen)
{
    char *s1;
    int s1_len;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &s1, &s1_len) == FAILURE) {
        return;
    }

    RETVAL_LONG(s1_len);
}
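This implementation is so simple that I don't think it needs further explanation.
Methods
We will cover how classes and objects work in more detail in other articles, but as a small spoiler: you can find object methods by entering ClassName::methodName in the search box. For example, try searching for SplFixedArray::getSize.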