Whether in development, during interviews or during technical discussions, security requires in-depth understanding and mastery.
Goal

The goal of this tutorial is to show you how to secure the web applications you build. It explains how to protect against the most common security threats: SQL injection, manipulation of GET and POST variables, buffer overflow attacks, cross-site scripting attacks, in-browser data manipulation, and remote form submission.

Quick Introduction to Security

What is the most important part of a web application? The answer depends on who you ask. Business people need reliability and scalability. IT support teams need robust, maintainable code. End users want attractive user interfaces and high performance. If the answer is "security", however, everyone agrees that it matters for web applications.

Most discussions stop there. Security is on the project checklist, but it is often not addressed until just before the project is delivered. A staggering number of web application projects take this approach: developers work for months and add security features only at the end, just before the application is opened to the public.

The result is often a mess, or even rework, because the code has already been inspected, unit tested, and integrated into a larger framework before the security features were added. After security is added, major components may stop working, and the security measures add an extra burden or step to an otherwise smooth (but unsafe) process.

This tutorial provides a way to integrate security into your PHP web application. It discusses several general security topics and then works through the major security vulnerabilities and how to plug them. After completing this tutorial, you will have a better understanding of security.
Topics include:

SQL injection attacks
Manipulating GET strings
Buffer overflow attacks
Cross-site scripting attacks (XSS)
In-browser data manipulation
Remote form submission

Web Security 101

Before discussing the details of implementing security, it is best to look at web application security from a high-level perspective. This section introduces some basic tenets of security philosophy that you should keep in mind no matter what kind of web application you are creating. Some of these ideas come from Chris Shiflett (whose book on PHP security is an invaluable resource), some from Simson Garfinkel (see Related topics), and some from years of accumulated knowledge.

Rule 1: Never trust external data or input

The first thing you must realize about web application security is that external data should not be trusted. External data is any data that the programmer did not type directly into the PHP code. Data from any other source (such as GET variables, form POST data, databases, configuration files, session variables, or cookies) cannot be trusted until steps have been taken to make it safe.

For example, the following data elements can be considered safe because they are set in PHP.

Listing 1. Safe and flawless code
[php]
$myUsername = 'tmyer';
$arrayUsers = array('tmyer', 'tom', 'tommy');
define("GREETING", 'hello there ' . $myUsername);
[/php]

The following data elements, however, are all tainted.

Listing 2. Unsafe, defective code
[php]
$myUsername = $_POST['username']; //tainted!
$arrayUsers = array($myUsername, 'tom', 'tommy'); //tainted!
define("GREETING", 'hello there ' . $myUsername); //tainted!
[/php]

Why is the first variable, $myUsername, tainted? Because it comes directly from the form POST. Users can enter any string into this input field, including malicious commands to delete files or run previously uploaded files. You might ask, "Can't you avoid this danger with a client-side (JavaScript) form validation script that accepts only the letters A-Z?" That is always a worthwhile step, but as you will see later, anyone can download a form to their own machine, modify it, and resubmit whatever they like.

The solution is simple: sanitization code must be run on $_POST['username']. If you skip this step, you taint every other value that uses $myUsername (such as the array and the constant).

A simple way to sanitize user input is to process it with a regular expression. In this example, only letters are accepted. It may also be a good idea to limit the string to a specific number of characters or to force all letters to lowercase.

Listing 3. Making user input safe
[php]
$myUsername = cleanInput($_POST['username']); //clean!
$arrayUsers = array($myUsername, 'tom', 'tommy'); //clean!
define("GREETING", 'hello there ' . $myUsername); //clean!

function cleanInput($input){
    $clean = strtolower($input);
    //remove everything that is not a lowercase letter
    $clean = preg_replace("/[^a-z]/", "", $clean);
    //keep at most 12 characters
    $clean = substr($clean,0,12);
    return $clean;
}
[/php]

Rule 2: Disable PHP settings that make security difficult to implement

Now that you know you can't trust user input, you should also know that you shouldn't trust the way PHP is configured on the machine. For example, make sure register_globals is disabled (the setting was removed entirely in PHP 5.4, so this applies to older installations). If register_globals is enabled, it is possible to do careless things such as using $variable in place of the GET or POST value of the same name. With the setting disabled, PHP forces you to reference the correct variable in the correct namespace. To use a variable from a form POST, you reference $_POST['variable']. That way you won't mistake this particular variable for a cookie, session, or GET variable.

The second setting to check is the error reporting level. During development, you want as many error reports as possible, but when you deliver the project, you want errors written to a log file rather than displayed on the screen. Why? Because malicious hackers can use error output (such as SQL errors) to guess what the application is doing. This kind of reconnaissance can help them breach the application. To close this hole, edit php.ini to give error_log a suitable destination and set display_errors to Off.

Rule 3: If you can't understand it, you can't protect it

Some developers use strange syntax, or pack statements very tightly, producing short but ambiguous code. This approach may be efficient, but if you don't understand what the code is doing, you can't decide how to protect it. For example, which of the two pieces of code below do you prefer?

Listing 4. Make the code easy to protect
[php]
//obfuscated code
$input = (isset($_POST['username']) ? $_POST['username'] : '');

//unobfuscated code
$input = '';
if (isset($_POST['username'])){
    $input = $_POST['username'];
}else{
    $input = '';
}
[/php]

In the second, clearer snippet, it is easy to see that $input is tainted and needs to be cleaned before it can be safely processed.

Rule 4: "Defense in depth" is the new mantra

This tutorial uses examples to illustrate how to protect online forms while also taking the necessary steps in the PHP code that handles the form. Likewise, even if you use a PHP regular expression to ensure that a GET variable is entirely numeric, you can still take steps to ensure that SQL queries use escaped user input. Defense in depth is not just a good idea; it ensures that you don't get into serious trouble.

Now that the basic rules have been discussed, let's look at the first threat: SQL injection attacks.

Preventing SQL injection attacks

In a SQL injection attack, the user adds information to a database query by manipulating a form or the GET query string. For example, consider a simple login database in which each record has a username field and a password field, and a login form that lets users log in.

Listing 5. Simple login form
[php]
<html>
<head>
<title>Login</title>
</head>
<body>
<form action="verify.php" method="post">
<p><label for="user">Username</label>
<input type="text" name="user" id="user"/></p>
<p><label for="pw">Password</label>
<input type="password" name="pw" id="pw"/></p>
<p><input type="submit" value="login"/></p>
</form>
</body>
</html>
[/php]

This form accepts a username and password entered by the user and submits them to a file named verify.php. In this file, PHP handles the data from the login form, as shown below:

Listing 6. Insecure PHP form handling code

[php]
<?php
session_start();
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];

$sql = "select count(*) as ctr from users where username='".$username."' and password='".$pw."' limit 1";

$result = mysql_query($sql);

while ($data = mysql_fetch_object($result)){
    if ($data->ctr == 1){
        //they're okay to enter the application!
        $okay = 1;
    }
}

if ($okay){
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
}else{
    header("Location: login.php");
}
exit;
?>
[/php]

This code looks fine, right? Code like this is used by hundreds (if not thousands) of PHP/MySQL sites around the world. What's wrong with it? Remember that user input cannot be trusted. Nothing from the user is escaped here, which leaves the application vulnerable to SQL injection.

For example, if the user enters foo as the username and ' or '1'='1 as the password, PHP builds the following string and passes it to MySQL:

$sql = "select count(*) as ctr from users where username='foo' and password='' or '1'='1' limit 1";

Because '1'='1' is always true, the where clause no longer depends on the password, so the password check can be satisfied without knowing the password. By injecting some malicious SQL at the end of the string, the hacker can pose as a legitimate user.

The solution to this problem is to use PHP's built-in mysql_real_escape_string() function as a wrapper around any user input. This function escapes characters in the string, so that special characters such as apostrophes cannot be passed through for MySQL to act on. (The mysql_* functions shown here were removed in PHP 7.0; on current PHP versions, mysqli_real_escape_string() or prepared statements fill the same role.)

Listing 7. Safe PHP form processing code

[php]
<?php
session_start();
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];

$sql = "select count(*) as ctr from users where username='".mysql_real_escape_string($username)."' and password='".mysql_real_escape_string($pw)."' limit 1";

$result = mysql_query($sql);

while ($data = mysql_fetch_object($result)){
    if ($data->ctr == 1){
        //they're okay to enter the application!
        $okay = 1;
    }
}

if ($okay){
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
}else{
    header("Location: login.php");
}
exit;
?>
[/php]

By using mysql_real_escape_string() as a wrapper around user input, you keep malicious SQL in that input from being injected. If a user attempts to pass a malformed password via SQL injection, the following query is passed to the database:

select count(*) as ctr from users where username='foo' and password='\' or \'1\'=\'1' limit 1

Nothing in the database matches such a password. With one simple step, you have plugged a big hole in your web application. The rule of thumb is that user input for SQL queries should always be escaped.

However, there are still several security holes to plug. The next item is manipulation of GET variables.

Prevent users from manipulating variables

In the previous section, users were prevented from logging in with malformed passwords. If you're smart, you will apply what you learned to ensure that all user input to your SQL statements is escaped.

However, the user is now safely logged in. Just because a user has a valid password doesn't mean he will play by the rules; he has many opportunities to do damage. For example, an application might allow users to view special content. All links point to locations like template.php?pid=33 or template.php?pid=321. The part of the URL after the question mark is called the query string. Because the query string is placed directly in the URL, it is also called a GET query string.

In PHP, if register_globals is disabled, this value can be accessed with $_GET['pid']. In the template.php page, you might do something similar to Listing 8.

Listing 8. Example template.php

[php]
<?php
$pid = $_GET['pid'];

//we create an object of a fictional class Page
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
//...
//...
?>
[/php]

What is wrong here?
First, the GET variable pid from the browser is implicitly trusted to be safe. What can happen? Most users are not sophisticated enough to construct semantic attacks. However, if they notice pid=33 in the browser's URL field, they may start causing trouble. If they enter another number, that is probably fine; but what happens if they enter something else, such as a SQL command, the name of a file (like /etc/passwd), or some other mischief such as a value 3,000 characters long?

In this case, remember the basic rule: don't trust user input. Application developers know that the page identifiers (PIDs) accepted by template.php should be numeric, so they can use PHP's is_numeric() function to ensure that non-numeric PIDs are not accepted, as shown below:

Listing 9. Using is_numeric() to limit GET variables

[php]
<?php
$pid = $_GET['pid'];

if (is_numeric($pid)){
    //we create an object of a fictional class Page
    $obj = new Page;
    $content = $obj->fetchPage($pid);
    //and now we have a bunch of PHP that displays the page
    //...
    //...
}else{
    //didn't pass the is_numeric() test, do something else!
}
?>
[/php]

This method seems to work, but the following inputs all pass the is_numeric() check:

100 (valid)
100.1 (shouldn't have decimal places)
+0123.45e6 (scientific notation: bad)
0xff33669f (hexadecimal: dangerous; accepted by is_numeric() before PHP 7)

So what should a security-conscious PHP developer do? Years of experience show that the best approach is to use a regular expression to ensure that the entire GET variable consists of digits, as shown below:

Listing 10. Using a regular expression to limit GET variables

[php]
<?php
$pid = $_GET['pid'];

if (strlen($pid)){
    //preg_match() is used here because ereg() was removed in PHP 7
    if (!preg_match('/^[0-9]+$/D', $pid)){
        //do something appropriate, like maybe logging them out or sending them back to home page
    }
}else{
    //empty $pid, so send them back to the home page
}

//we create an object of a fictional class Page, which is now
//moderately protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
//...
//...
?>
[/php]

All you need to do is use strlen() to check whether the length of the variable is non-zero; if it is, use an all-digit regular expression to ensure that the data element is valid. If the PID contains letters, slashes, periods, or anything resembling hexadecimal, this routine catches it and blocks the page from user activity. If you look behind the scenes of the Page class, you'll see that a security-conscious PHP developer has also escaped the user input $pid, thereby protecting the fetchPage() method, as shown below:

Listing 11. Escaping in the fetchPage() method
[php]
<?php
class Page{
    function fetchPage($pid){
        //desc is a reserved word in MySQL, so the column name is quoted with backticks
        $sql = "select pid,title,`desc`,kw,content,status from page where pid='".mysql_real_escape_string($pid)."'";
        //etc, etc....
    }
}
?>
[/php]

You may ask, "Since we have already ensured that the PID is a number, why escape it as well?" Because you don't know in how many different contexts and situations fetchPage() will be used. Protection must be in place wherever this method is called, and escaping inside the method is what defense in depth means.

What happens if the user tries to enter a very long value, say 1,000 characters, in an attempt to launch a buffer overflow attack? The next section discusses this in more detail, but for now you can add another check to ensure that the input PID has the correct length. You know that the maximum length of the database's pid field is 5 digits, so you can add the following check.

Listing 12. Using regular expressions and length checks to limit GET variables
[php]
<?php
$pid = $_GET['pid'];

if (strlen($pid)){
    //reject the value if it is not all digits OR if it is longer than 5 characters
    if (!preg_match('/^[0-9]+$/D', $pid) || strlen($pid) > 5){
        //do something appropriate, like maybe logging them out or sending them back to home page
    }
}else{
    //empty $pid, so send them back to the home page
}

//we create an object of a fictional class Page, which is now
//even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
//...
//...
?>
[/php]

Now no one can cram a 5,000-digit value into your database application, at least not where GET strings are involved. Just imagine hackers gnashing their teeth in frustration as they try to break into your application! And because error reporting is turned off, reconnaissance is harder for them.

Buffer overflow attacks

A buffer overflow attack attempts to overflow a memory allocation buffer in a PHP application (or, more precisely, in Apache or the underlying operating system). Keep in mind that you may be writing your web application in a high-level language like PHP, but ultimately you're calling C (in the case of Apache). Like most low-level languages, C has strict rules for memory allocation.

A buffer overflow attack sends a large amount of data to a buffer so that part of the data spills into adjacent memory, corrupting that memory or rewriting logic. This can cause a denial of service, corrupt data, or execute malicious code on the remote server.

At the application level, the way to guard against buffer overflow attacks is to check the length of all user input. For example, if a form element asks for the user's name, add a maxlength attribute with a value of 40 to the field and check the value with substr() on the backend. Listing 13 gives a brief example of the form and the PHP code.

Listing 13. Checking the length of user input

[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    $name = substr($_POST['name'],0,40);
    //continue processing....
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]

Why provide both the maxlength attribute and the substr() check on the backend? Because defense in depth is always good. The browser prevents users from entering very long strings that PHP or MySQL cannot safely handle (imagine someone trying to enter a name 1,000 characters long), while the backend PHP check ensures that no one is manipulating the form data remotely or in the browser.

As you can see, this approach is similar to using strlen() in the previous section to check the length of the GET variable pid. In that example, any input value longer than 5 digits was rejected, but the value can just as easily be truncated to the appropriate length, as shown below:

Listing 14. Changing the length of the input GET variable
[php]
<?php
$pid = $_GET['pid'];

if (strlen($pid)){
    if (!preg_match('/^[0-9]+$/D', $pid)){
        //if non numeric $pid, send them back to home page
    }
}else{
    //empty $pid, so send them back to the home page
}

//we have a numeric pid, but it may be too long, so let's check
if (strlen($pid) > 5){
    $pid = substr($pid,0,5);
}

//we create an object of a fictional class Page, which is now
//even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
//...
//...
?>
[/php]

Note that buffer overflow attacks are not limited to long strings of numbers or letters. You may also see long hexadecimal strings (often looking like \xA3 or \xFF). Remember, the goal of any buffer overflow attack is to flood a specific buffer and place malicious code or instructions into the next buffer, thereby corrupting data or executing malicious code. The simplest way to deal with hex buffer overflows is, again, to not allow input to exceed a certain length.

If you are dealing with a form text area that allows longer entries in the database, there is no easy way to limit the length of the data on the client side. After the data reaches PHP, you can use a regular expression to clear out any hex-like strings.

Listing 15. Preventing hexadecimal strings

[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    $name = substr($_POST['name'],0,40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing....
}

function cleanHex($input){
    //remove a backslash followed by x and one to three hex digits
    $clean = preg_replace("![\\\\][xX]([A-Fa-f0-9]{1,3})!", "", $input);
    return $clean;
}
?>
[/php]

You may find this series of operations a bit too strict. After all, hexadecimal strings have legitimate uses, such as printing characters in a foreign language. How you deploy the hex regular expression is up to you. A better strategy is to remove hex strings only if there are too many of them on a line, or if the string exceeds a certain number of characters (such as 128 or 255).
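The "better strategy" described above can be sketched roughly as follows. This is only an illustration, not part of the original listings: the function name cleanHexIfSuspicious() and both thresholds (128 characters, 3 escape sequences) are assumptions chosen for the example, and a real application would tune them to its own data.

```php
<?php
// Hypothetical helper (not from the original tutorial): strip hex-like
// escape sequences only when the input looks suspicious.
function cleanHexIfSuspicious($input, $maxLength = 128, $maxEscapes = 3){
    // A backslash, then x or X, then one to three hex digits.
    $pattern = '/\\\\[xX][A-Fa-f0-9]{1,3}/';

    // preg_match_all() returns the number of matches it found.
    $count = preg_match_all($pattern, $input);

    // Clean only if the string is long or contains many escape sequences.
    if (strlen($input) > $maxLength || $count > $maxEscapes){
        return preg_replace($pattern, '', $input);
    }

    // Short input with few escapes is returned unchanged.
    return $input;
}

// A short value with a single escape sequence passes through untouched.
echo cleanHexIfSuspicious('caf\\xE9'), "\n";
```

The trade-off in this sketch is that occasional legitimate escape sequences survive, while long or escape-heavy input is cleaned; the length limits discussed earlier still apply on top of it.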
Cross-site scripting attacks

In a cross-site scripting (XSS) attack, a malicious user typically enters information into a form (or through some other input method) that inserts malicious client-side markup into a process or database. For example, suppose you have a simple guest book program on your site that lets visitors leave their name, email address, and a brief message. A malicious user could take advantage of this to insert something other than a brief message, such as an image that is inappropriate for other users, JavaScript that redirects the user to another site, or code that steals cookie information.

Fortunately, PHP provides the strip_tags() function, which removes HTML and PHP tags from a string. The strip_tags() function also lets you provide a list of allowed tags, such as <b> or <i>.

Listing 16 gives an example that builds on the previous example.

Listing 16. Clearing HTML tags from user input

[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    //strip_tags
    $name = strip_tags($_POST['name']);
    $name = substr($name,0,40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing....
}

function cleanHex($input){
    //remove a backslash followed by x and one to three hex digits
    $clean = preg_replace("![\\\\][xX]([A-Fa-f0-9]{1,3})!", "", $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]

From a security perspective, using strip_tags() on public user input is necessary. If the form is in a protected area (such as a content management system) and you trust users to perform their tasks correctly (such as creating HTML content for a Web site), then strip_tags() may be unnecessary and may hurt productivity.

One more point: if you accept user input, such as comments on a post or a guest book entry, and need to display that input to other users, be sure to pass it through PHP's htmlspecialchars() function when you output it. This function converts the ampersand and the < and > symbols into HTML entities. For example, the ampersand (&) becomes &amp;. That way, even if malicious content slips past strip_tags() on the front end, it is neutralized by htmlspecialchars() on the back end.

In-browser data manipulation

There is a type of browser plug-in that lets users tamper with header elements and form elements on the page. Using Tamper Data, a Mozilla plug-in, it is easy to manipulate simple forms with many hidden text fields and so send instructions to PHP and MySQL.

Before clicking Submit on the form, the user can start Tamper Data. When he submits the form, he sees a list of form data fields. Tamper Data lets the user change this data before the browser completes the form submission.

Let's go back to the example built earlier. String length has been checked, HTML tags have been cleaned, and hexadecimal characters have been removed. However, some hidden text fields have been added, as shown below:

Listing 17. Hidden variables
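[php]
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="table" value="users"/>
<input type="hidden" name="action" value="create"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]

Note that one of the hidden variables exposes the table name: users. You'll also see an action field with a value of create. Anyone with basic SQL experience can tell that these commands probably control a SQL engine in the middleware. Someone who wants to wreak havoc can simply change the table name or supply another action, such as delete.

Figure 1 illustrates the scope of the damage Tamper Data makes possible. Note that Tamper Data gives users access not only to form data elements, but also to HTTP headers and cookies.

Figure 1. Tamper Data window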
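To make the server-side consequence concrete, here is a minimal sketch of a handler that refuses hidden-field values it does not expect. Everything in it is an assumption made for illustration: the field name table, the function name hiddenFieldsAreValid(), and the contents of the two allow-lists are not taken from the original example.

```php
<?php
// Hypothetical allow-lists; the real tables and actions depend on the application.
const ALLOWED_TABLES  = ['users'];
const ALLOWED_ACTIONS = ['create'];

// Returns true only when both hidden fields carry values the server expects.
function hiddenFieldsAreValid(array $post){
    $table  = isset($post['table'])  ? $post['table']  : '';
    $action = isset($post['action']) ? $post['action'] : '';

    // Strict comparison also rejects arrays and other non-string input.
    return in_array($table, ALLOWED_TABLES, true)
        && in_array($action, ALLOWED_ACTIONS, true);
}

// A tampered action such as "delete" is refused before any SQL is built.
var_dump(hiddenFieldsAreValid(array('table' => 'users', 'action' => 'delete')));
```

A check like this does not hide the data model from the user, but it does mean that changing a hidden value in the browser cannot steer the form handler toward a table or action the page was never meant to use.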
The easiest way to defend against this tool is to assume that any user could be using Tamper Data (or a similar tool). Provide only the minimum amount of information the system needs to process the form, and submit the form to dedicated logic. For example, the registration form should be submitted only to the registration logic.

What if you have built a common form processing function and many pages use this common logic? What if you use hidden variables to control flow? For example, you might specify in a hidden form variable which database table to write to or which file repository to use. There are 4 options:

Don't change anything and secretly pray that there aren't any malicious users on the system.
Rewrite the functionality to use safer, dedicated form processing functions and avoid hidden form variables.
Use md5() or another hashing or encryption mechanism to obscure table names or other sensitive information in hidden form variables. Note that md5() is a one-way hash: on the PHP side, compare the submitted value against hashes of the values you allow rather than trying to decrypt it.
Make the meaning of values ambiguous by using abbreviations or nicknames, and then convert these values in the PHP form processing functions. For example, if you want to reference the users table, you can refer to it as u or as an arbitrary string (such as u8y90x0jkL).

The latter two options are not perfect, but they are much better than letting the user easily guess the middleware logic or data model.

What problems are left? Remote form submission.

Remote form submission

The benefit of the Web is that you can share information and services. The downside is also that you can share information and services, because some people have no scruples.

Take a form as an example. Anyone can visit a Web site and create a local copy of the form using File > Save As in the browser.
He can then modify the action parameter to point to a fully qualified URL (not formHandler.php, but http://www.yoursite.com/formHandler.php, since the form lives on that site), make any changes he wants, and click Submit. The server receives this form data as legitimate traffic.

At first, you might consider checking $_SERVER['HTTP_REFERER'] to determine whether the request comes from your own server. This blocks most malicious users, but not the most sophisticated hackers. They are smart enough to tamper with the referrer information in the header so that the remote copy of the form looks as if it was submitted from your server.

A better way to handle remote form submission is to generate a token based on a unique string or timestamp and put this token in both a session variable and the form. After the form is submitted, check whether the two tokens match. If they don't, you know someone is trying to send data from a remote copy of the form.

To create a random token, you can use PHP's built-in md5(), uniqid(), and rand() functions, as shown below:

Listing 18. Defending against remote form submission

[php]
<?php
session_start();

if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    //check token; both values must exist, otherwise two missing tokens would compare as equal
    if (isset($_POST['token'], $_SESSION['token']) && $_POST['token'] === $_SESSION['token']){
        //strip_tags
        $name = strip_tags($_POST['name']);
        $name = substr($name,0,40);
        //clean out any potential hexadecimal characters
        $name = cleanHex($name);
        //continue processing....
    }else{
        //stop all processing! remote form posting attempt!
    }
}

$token = md5(uniqid(rand(), true));
$_SESSION['token'] = $token;

function cleanHex($input){
    //remove a backslash followed by x and one to three hex digits
    $clean = preg_replace("![\\\\][xX]([A-Fa-f0-9]{1,3})!", "", $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="token" value="<?php echo $token;?>"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]

This technique works because the session data lives on your server and cannot be carried over from another server. Even if someone obtains your PHP source code, moves it to their own server, and submits information to your server, all your server sees is an empty or malformed session token alongside the form token that was originally issued. The two don't match, and the remote form submission fails.

Conclusion

This tutorial discussed a number of issues:

Use mysql_real_escape_string() to prevent SQL injection.
Use regular expressions and strlen() to ensure that GET data has not been tampered with.
Use regular expressions and strlen() to ensure that user-submitted data does not overflow memory buffers.
Use strip_tags() and htmlspecialchars() to prevent users from submitting potentially harmful HTML tags.
Prevent the system from being breached by tools like Tamper Data.
Use a unique token to prevent users from submitting forms to the server remotely.

This tutorial does not cover more advanced topics such as file injection, HTTP header spoofing, and other vulnerabilities. However, what you have learned can help you add enough security right away to make your current project safer.
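As a closing sketch for the token check in the last point above: on PHP 7 and later, the token can also be generated with random_bytes() and compared with hash_equals(). This swaps in different functions from the md5(), uniqid(), and rand() combination the tutorial uses, and the helper names issueFormToken() and formTokenIsValid() are hypothetical, introduced only for this example.

```php
<?php
// Hypothetical helpers, not part of the original listing.

// Store a fresh random token in the given session array and return it.
function issueFormToken(array &$session){
    // random_bytes() (PHP 7+) draws from a cryptographically secure source.
    $session['token'] = bin2hex(random_bytes(32));
    return $session['token'];
}

// Check the submitted token against the one stored in the session.
function formTokenIsValid(array $session, array $post){
    // Both tokens must be present and must be strings.
    if (!isset($session['token'], $post['token'])
        || !is_string($session['token'])
        || !is_string($post['token'])){
        return false;
    }
    // hash_equals() compares the two strings in constant time.
    return hash_equals($session['token'], $post['token']);
}
```

In a page like the one in the token listing, these would be called as issueFormToken($_SESSION) when rendering the form and formTokenIsValid($_SESSION, $_POST) when handling the submission, after session_start().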