PHP security - filter input-PHP Tutorial-php.cn

Filter input

Filtering is the foundation of web application security. It is the process by which you verify the validity of your data. By ensuring that all data is filtered on input, you eliminate the risk that tainted (unfiltered) data is mistakenly trusted or misused in your program. Most vulnerabilities in popular PHP applications can ultimately be traced to a failure to filter input.

When I refer to filtering input, I mean three distinct steps:

1. Identifying the input
2. Filtering the input
3. Distinguishing between filtered and tainted data

Identifying input comes first because you cannot filter data correctly if you do not know what it is. Input is any data that originates from outside your software. Everything sent by the client is input, but the client is not the only external source: databases and RSS feeds, for example, are external data sources too.

Data entered by the user is easy to identify: PHP stores it in the two superglobal arrays $_GET and $_POST. Other input is much harder to identify. For example, many elements of the $_SERVER array can be manipulated by the client. Because it is often difficult to determine which elements of $_SERVER constitute input, the best approach is to treat the entire array as input.
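
As a rough illustration of treating $_SERVER as input, the sketch below checks the client-supplied Host header against a list of expected names before it is used. The function name filter_host and the host names are hypothetical, chosen only for this example.

```php
<?php
/**
 * Return the host name only if it is one we expect; otherwise null.
 * $server stands in for $_SERVER so the function can be exercised in isolation.
 */
function filter_host(array $server, array $allowedHosts): ?string
{
    // HTTP_HOST is taken from the request's Host header, so the client controls it.
    if (!isset($server['HTTP_HOST']) || !is_string($server['HTTP_HOST'])) {
        return null;
    }

    // Inspect only: the value is accepted unchanged or rejected outright.
    return in_array($server['HTTP_HOST'], $allowedHosts, true)
        ? $server['HTTP_HOST']
        : null;
}

// Hypothetical usage: a null result means the header did not pass inspection.
$host = filter_host($_SERVER, array('www.example.org', 'example.org'));
```

The same pattern applies to any other $_SERVER element the software relies on.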

In some cases, what you consider input depends on your point of view. For example, session data is stored on the server, so you may not think of it as an external data source. If you take this view, you are treating the session data store as an integral part of your software. Be aware that this ties the security of your software to the security of the session data store. The same reasoning extends to the database, which you can also consider part of your software.

Generally speaking, it is safer to treat the session data store and the database as input, and this is the approach I recommend for any important PHP application.

Once the input has been identified, you can filter it. Filtering is a somewhat formal term with many everyday synonyms, such as validating, sanitizing, and cleaning. Although these terms differ slightly, they all refer to the same process: preventing invalid data from entering your application.

There are many ways to filter data, some more secure than others. The best approach is to treat filtering as an inspection process. Resist the well-intentioned urge to correct invalid data; make your users follow your rules instead. History has shown that attempts to correct invalid data often create security vulnerabilities. For example, consider the following attempt to prevent directory traversal (moving up the directory tree):

CODE:

  <?php

  $filename = str_replace('..', '.', $_POST['filename']);

  ?>

Can you think of a value for $_POST['filename'] that makes $filename become ../../etc/passwd, a path to the user password file on a Linux system?

The answer is simple:

  .../.../etc/passwd
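
The bypass can be confirmed with a short script. This is only a sketch: the variable $input is a hypothetical stand-in for $_POST['filename'].

```php
<?php
// Hypothetical attacker-supplied value, standing in for $_POST['filename'].
$input = '.../.../etc/passwd';

// The single-pass "correction" from the text: each '...' loses one dot
// and becomes '..', which is exactly what the code tried to remove.
$filename = str_replace('..', '.', $input);

echo $filename, PHP_EOL; // prints ../../etc/passwd
```

str_replace() makes one left-to-right pass over non-overlapping matches, so the dot left over from each '...' is never examined again.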

This particular error can be fixed by repeating the replacement until no occurrence of .. remains:

CODE:

  <?php
  $filename = $_POST['filename'];
  // Test the working copy, not $_POST['filename']: the original value never
  // changes, so testing it would loop forever once '..' is present.
  while (strpos($filename, '..') !== FALSE)
  {
    $filename = str_replace('..', '.', $filename);
  }
  ?>

Of course, the basename() function can replace all of the logic above and achieves the goal more safely. The important point, however, is that any attempt to correct invalid data can contain an error that lets invalid data pass through. Inspection alone is the safer option.
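
To make the difference between correcting and inspecting concrete, here is a minimal sketch that uses basename() only to check the value: a name is accepted when it already equals its own basename and rejected otherwise. The function name is_plain_filename is hypothetical.

```php
<?php
/**
 * Return true only if $name is a bare file name with no directory part.
 * Nothing is rewritten: the value either passes inspection or is rejected.
 */
function is_plain_filename($name): bool
{
    // Reject non-strings, empty values, and the special directory entries.
    if (!is_string($name) || $name === '' || $name === '.' || $name === '..') {
        return false;
    }

    // Any value containing a directory separator differs from its basename.
    return basename($name) === $name;
}

var_dump(is_plain_filename('report.txt'));         // bool(true)
var_dump(is_plain_filename('.../.../etc/passwd')); // bool(false)
```

A rejected value is simply refused, so there is no corrected result for an attacker to steer.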

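Translator's note: I know this from experience. In a real project we were modifying a user registration and login system, and the client wanted usernames with leading or trailing spaces to be refused at login. The change was made in the login code, which used the trim() function to strip leading and trailing spaces from the submitted username (a classic case of good intentions gone wrong), while registration still accepted names with leading or trailing spaces. You can imagine the result.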

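Besides treating filtering as an inspection process, use a whitelist approach whenever possible. This means assuming that the data you are inspecting is invalid unless you can prove that it is valid. In other words, you err on the side of caution. With this approach, a mistake only causes you to treat valid data as invalid. No mistake is desirable, but this one is far safer than treating invalid data as valid. By limiting the damage a mistake can cause, you improve the security of your application. The idea sounds theoretical, but history has proven it to be a very valuable approach.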

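If you can correctly and reliably identify and filter input, your job is almost done. The last step is to use a naming convention, or some other practice, that helps you correctly and reliably distinguish filtered data from tainted data. I recommend a simple naming convention, because it works in both procedural and object-oriented code. The convention I use is to store all filtered data in an array called $clean. Two important steps then help prevent the injection of tainted data: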

- Always initialize $clean to an empty array.

- Add checks that prevent variables from external data sources from being named clean.

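In practice only the initialization is crucial, but it is a good habit to treat any variable named clean as one thing: your array of filtered data. These steps give reasonable assurance that $clean contains only data you deliberately stored in it. Your remaining responsibility is to make sure that you never store tainted data in $clean.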
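
The two steps can be sketched as follows. The helper has_clean_collision is a hypothetical name, and the check is an assumption about how the second step might be implemented; it matters mainly where request variables can be imported into the local scope, for example through extract().

```php
<?php
/**
 * Return true if any external source tries to supply a variable named "clean".
 * Each element of $sources stands in for an array such as $_GET or $_POST.
 */
function has_clean_collision(array $sources): bool
{
    foreach ($sources as $source) {
        if (is_array($source) && array_key_exists('clean', $source)) {
            return true;
        }
    }
    return false;
}

// Step 1: always start from an empty array, so nothing tainted is already inside.
$clean = array();

// Step 2: refuse any request that names a variable "clean".
if (has_clean_collision(array($_GET, $_POST, $_COOKIE))) {
    exit('Invalid request.');
}
```

From this point on, values should enter $clean only after passing an explicit check.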

To reinforce these concepts, consider the following form, which lets the user choose one of three colors:

CODE:

  <form action="process.php" method="POST">
  Please select a color:
  <select name="color">
    <option value="red">red</option>
    <option value="green">green</option>
    <option value="blue">blue</option>
  </select>
  <input type="submit" />
  </form>

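In the logic that processes this form, it is very easy to make the mistake of assuming that only one of the three choices can be submitted. As you will learn in Chapter 2, the client can submit any data as the value of $_POST['color']. To filter this data correctly, you can use a switch statement: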

CODE:

  <?php

  $clean = array();
  switch ($_POST['color'])
  {
    case 'red':
    case 'green':
    case 'blue':
      $clean['color'] = $_POST['color'];
      break;
  }

  ?>

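This example first initializes $clean to an empty array so that it cannot contain tainted data. Once $_POST['color'] is proven to be one of red, green, or blue, it is stored in $clean['color']. The rest of the code can therefore use $clean['color'] with confidence that it is valid. Of course, you can add a default case to the switch statement to handle invalid data. One possibility is to display the form again with an error message. Just be careful not to output the tainted data in an attempt to be friendly.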
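
The default case mentioned above might look like the sketch below. The function name filter_color, its return shape, and the error message are assumptions made for this example. Note that the message is fixed text and never includes the rejected value.

```php
<?php
/**
 * Filter the color field. $post stands in for $_POST.
 * Returns array($clean, $errors).
 */
function filter_color(array $post): array
{
    $clean = array();
    $errors = array();

    // A missing or non-string value is treated like any other invalid value.
    $color = (isset($post['color']) && is_string($post['color']))
        ? $post['color']
        : '';

    switch ($color) {
        case 'red':
        case 'green':
        case 'blue':
            $clean['color'] = $color;
            break;
        default:
            // Fixed message: the tainted value itself is never echoed back.
            $errors['color'] = 'Please select red, green, or blue.';
            break;
    }

    return array($clean, $errors);
}

list($clean, $errors) = filter_color(array('color' => 'purple'));
// $clean is empty here, and $errors['color'] holds the fixed message.
```

The calling code can redisplay the form whenever $errors is not empty.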

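The approach above works well for data that must match a known set of valid values, but it does not help with data that must consist of a known set of valid characters. For example, you might require that a username contain only letters and digits: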

CODE:

  <?php

  $clean = array();

  if (ctype_alnum($_POST['username']))
  {
    $clean['username'] = $_POST['username'];
  }

  ?>

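Although a regular expression could be used here, a native PHP function is the better choice. These functions are far less likely to contain errors than code you write yourself, and an error in your filtering logic almost always means a security vulnerability.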
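
One way to see the risk is a subtle mistake in a hand-written pattern. In the sketch below, the pattern and the sample value are hypothetical; the behavior shown is that $ in a PCRE pattern also matches just before a final newline unless the D modifier is used.

```php
<?php
// Hypothetical submitted value with a trailing newline.
$input = "admin\n";

// A plausible hand-written check: "$" matches before the final newline,
// so the value is accepted even though it is not purely alphanumeric.
$regexAccepts = preg_match('/^[A-Za-z0-9]+$/', $input) === 1;

// The built-in check inspects every character and rejects the newline.
$ctypeAccepts = ctype_alnum($input);

var_dump($regexAccepts); // bool(true)
var_dump($ctypeAccepts); // bool(false)
```

The pattern can be repaired (for example with the D modifier), but the built-in function leaves no such detail to get wrong.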
