Key Points
- The hash_file() function can be used to create a profile of a file structure for monitoring. The hash of each file can be stored for later comparison to detect any changes.
- In the database table, file_path stores the path of the file on the server, and file_hash stores the file's hash value.
- The RecursiveDirectoryIterator class can be used to traverse the file tree and collect hashes for comparison. The integrity_hashes table can then be updated with these hashes.
- PHP's array_diff_assoc() function can be used to check for differences, which helps identify files that have been added, deleted, or changed.
Consider how you would handle situations such as files being unexpectedly added, changed, or deleted on a website you manage.
More importantly, would you even know if one of these happened? If not, read on. In this guide, I will demonstrate how to create a profile of your file structure that can be used to monitor file integrity.
The best way to determine whether a file has been changed is to hash its contents. PHP provides several hash functions, but for this project I decided to use hash_file(). It supports a variety of hashing algorithms, which makes my code easy to modify if I decide to switch algorithms later. Hashing is used in a wide range of applications, from password protection to DNA sequencing. A hashing algorithm converts data into a fixed-size, repeatable string (the hash, or digest), and is designed so that even a slight change to the data produces a very different result. When two or more different inputs produce the same hash, this is called a collision. Hashing algorithms can be compared by their speed and by their probability of collision. In my example, I will use SHA-1 because it is fast, widely used, and thoroughly tested. Be aware, though, that practical SHA-1 collisions have since been demonstrated, so a stronger algorithm such as SHA-256 may be a better fit if deliberate tampering is a concern. You are welcome to research other algorithms and use whichever you prefer. Once you have a file's hash, it can be stored for later comparison: if hashing the file later does not return the same string as before, we know the file has been changed.
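A quick way to see these properties is to hash two nearly identical strings and then a file with the same contents. This is a minimal sketch using PHP's built-in hash(), hash_file() and tempnam(). The sample strings and the temporary file are arbitrary choices for illustration:

```php
<?php
// Two inputs that differ by a single character.
$a = hash("sha1", "Hello, world");
$b = hash("sha1", "Hello, world!");

echo $a, PHP_EOL;
echo $b, PHP_EOL;           // a completely different digest
echo strlen($a), PHP_EOL;   // SHA-1 hex digests are always 40 characters

// hash_file() runs the same algorithm over a file's contents.
$path = tempnam(sys_get_temp_dir(), "fim");
file_put_contents($path, "Hello, world");
$fromFile = hash_file("sha1", $path);
unlink($path);

// Same contents, same hash: this is what makes later comparison possible.
var_dump($fromFile === $a);

// Switching algorithms only means changing the first argument,
// but the digest length changes too (SHA-256 gives 64 hex characters).
echo strlen(hash("sha256", "Hello, world")), PHP_EOL;
```

Because the digest length depends on the algorithm, moving away from SHA-1 would also mean resizing the file_hash column of the table below.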
Database
First, we need a basic table to store the file hashes. I will use the following schema:
CREATE TABLE integrity_hashes (
    file_path VARCHAR(200) NOT NULL,
    file_hash CHAR(40) NOT NULL,
    PRIMARY KEY (file_path)
);
file_path
The path to the file on the server. Because the value is always unique (two files cannot occupy the same location in the file system), it serves as our primary key. I set its maximum length to 200 characters, which should accommodate reasonably long file paths.
file_hash
Stores the file's hash, which will be a 40-character hexadecimal SHA-1 string.
Collect files
The next step is to build the profile of the file structure. We define the path where collection should start and iterate over each directory recursively until we have covered that entire branch of the file system, optionally excluding certain directories or file extensions. We collect the required hashes while traversing the file tree and then either store them in the database or compare them against it. PHP provides several ways to traverse the file tree; for simplicity, I will use the RecursiveDirectoryIterator class.
<?php
define("PATH", "/var/www/");
$files = array();

// Extensions to collect; an empty array collects every extension
$ext = array("php");

// Directories to skip; an empty array checks every directory
$skip = array("logs", "logs/traffic");

// Build the profile
$dir = new RecursiveDirectoryIterator(PATH);
$iter = new RecursiveIteratorIterator($dir);
$iter->rewind();
while ($iter->valid()) {
    // Skip "." and ".." entries and unwanted directories
    if (!$iter->isDot() && !in_array($iter->getSubPath(), $skip)) {
        if (!empty($ext)) {
            // Collect only the listed extensions
            // PHP 5.3.6+: if (in_array($iter->getExtension(), $ext)) {
            if (in_array(pathinfo($iter->key(), PATHINFO_EXTENSION), $ext)) {
                $files[$iter->key()] = hash_file("sha1", $iter->key());
            }
        } else {
            // No extension filter: collect every file
            $files[$iter->key()] = hash_file("sha1", $iter->key());
        }
    }
    $iter->next();
}
Note that I referenced the logs folder twice in the $skip array. Ignoring a specific directory does not make the iterator ignore all of its subdirectories as well. Depending on your needs, this can be useful or annoying.
Together, the RecursiveDirectoryIterator and RecursiveIteratorIterator classes give us access to several methods:
- valid() checks whether the iterator still points at a valid entry.
- isDot() checks whether the current entry is "." or "..".
- getSubPath() returns the current entry's directory path relative to the starting directory (for example, logs).
- key() returns the current entry's full path (by default).
- next() advances to the next entry.
The getExtension() method, added in PHP 5.3.6, returns a file's extension. If your PHP version supports it, you can use it to filter out unwanted entries instead of calling pathinfo() as I did. After execution, the code should fill the $files array with results similar to:
<code>Array
(
    [/var/www/test.php] => b6b7c28e513dac784925665b54088045cf9cbcd3
    [/var/www/sub/hello.php] => a5d5b61aa8a61b7d9d765e1daf971a9a578f1cfa
    [/var/www/sub/world.php] => da39a3ee5e6b4b0d3255bfef95601890afd80709
)</code>
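You may prefer that a skipped directory also excludes everything beneath it. In that case, the tree can be pruned before the iterator descends into it. This is a minimal sketch that assumes PHP 5.4 or later for RecursiveCallbackFilterIterator. build_profile() is a hypothetical helper, not the article's code. The demo builds a throwaway directory tree under the system temp directory and removes it afterwards:

```php
<?php
// build_profile() is a hypothetical helper: it returns path => SHA-1 hash,
// pruning each skipped directory together with all of its subdirectories.
function build_profile($root, array $ext = array(), array $skip = array())
{
    $root = rtrim($root, "/");
    $dir = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);

    // Returning false for a directory stops iteration from descending into it.
    $filter = new RecursiveCallbackFilterIterator($dir,
        function ($current, $key, $iterator) use ($root, $skip) {
            if ($iterator->hasChildren()) {
                // Directory path relative to $root, e.g. "logs" or "sub"
                $relative = substr($current->getPathname(), strlen($root) + 1);
                return !in_array($relative, $skip);
            }
            return true;
        });

    $files = array();
    foreach (new RecursiveIteratorIterator($filter) as $path => $info) {
        if (empty($ext) || in_array($info->getExtension(), $ext)) {
            $files[$path] = hash_file("sha1", $path);
        }
    }
    ksort($files);
    return $files;
}

// Demo: a throwaway tree with a nested logs/traffic directory.
$root = sys_get_temp_dir() . "/fim_" . uniqid();
mkdir("$root/logs/traffic", 0777, true);
mkdir("$root/sub", 0777, true);
file_put_contents("$root/index.php", "index");
file_put_contents("$root/logs/error.php", "error");
file_put_contents("$root/logs/traffic/hits.php", "hits");
file_put_contents("$root/sub/hello.php", "");
file_put_contents("$root/sub/readme.txt", "readme");

// Skipping "logs" alone is enough; logs/traffic is pruned along with it.
$profile = build_profile($root, array("php"), array("logs"));
print_r($profile);

// Clean up the demo tree (children before their parent directories).
$cleanup = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $item) {
    $item->isDir() ? rmdir($item->getPathname()) : unlink($item->getPathname());
}
rmdir($root);
```

Pruning also avoids walking large skipped trees such as log directories, whereas the while loop above still visits their files and only discards them.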
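Once $files has been populated, the integrity_hashes table can be refreshed with the new hashes:
<?php
// DB_HOST, DB_NAME, DB_USER and DB_PASSWORD are assumed to be defined elsewhere
$db = new PDO("mysql:host=" . DB_HOST . ";dbname=" . DB_NAME, DB_USER, DB_PASSWORD);

// Clear the old records
$db->query("TRUNCATE integrity_hashes");

// Insert the updated records
$sql = "INSERT INTO integrity_hashes (file_path, file_hash) VALUES (:path, :hash)";
$sth = $db->prepare($sql);
$sth->bindParam(":path", $path);
$sth->bindParam(":hash", $hash);
foreach ($files as $path => $hash) {
    // bindParam() binds by reference, so each execute() uses the current $path and $hash
    $sth->execute();
}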
Check for differences
You now know how to build a new profile of the directory structure and how to update the records in the database. The next step is to combine these into a real application, such as a cron job with email notifications, an admin interface, or anything else you like. If you just want a list of changed files without caring how they changed, the easiest way is to extract the data from the database into an array similar to $files and use PHP's array_diff_assoc() function to filter out the entries that still match.
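Here is a minimal sketch of this comparison. It assumes the stored hashes have already been read into an array keyed by path; the PDO::FETCH_KEY_PAIR fetch mode shown in the comment is one way to do that. A single array_diff_assoc() call only reports entries from its first argument. This version therefore compares in both directions, so that deleted files are reported too. The sample paths and contents are made up:

```php
<?php
// $hashes: the stored profile. With PDO it could be loaded with, e.g.:
//   $hashes = $db->query("SELECT file_path, file_hash FROM integrity_hashes")
//                ->fetchAll(PDO::FETCH_KEY_PAIR);
// $files: the freshly built profile (path => hash).
// Sample values are generated here so the sketch runs on its own.
$hashes = array(
    "/var/www/index.php"     => sha1("index v1"),
    "/var/www/sub/hello.php" => sha1("hello v1"),
    "/var/www/sub/old.php"   => sha1("old"),
);
$files = array(
    "/var/www/index.php"     => sha1("index v1"),   // unchanged
    "/var/www/sub/hello.php" => sha1("hello v2"),   // changed
    "/var/www/sub/new.php"   => sha1("new"),        // added
);

// Left side: new or changed entries from $files.
// Right side: entries missing from $files (deleted); for keys present on
// both sides, + keeps the new hash from the left.
$diffs = array_diff_assoc($files, $hashes) + array_diff_assoc($hashes, $files);

print_r(array_keys($diffs));
```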
If the result of array_diff_assoc() is stored in $diffs, it will contain any differences found, or be an empty array if the file structure is intact. Unlike array_diff(), array_diff_assoc() also compares the keys. This matters when there is a collision, for example when two empty files have the same hash. If you want to go a step further, you can add some simple logic to determine exactly how each file is affected: whether it was deleted, changed, or added.
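Before adding that logic, note that the collision case mentioned above takes only a few lines to reproduce. In this sketch, one file is emptied so that its new hash matches another empty file's hash. The sample paths are made up:

```php
<?php
// SHA-1 of empty contents: every empty file produces this same digest.
$empty = sha1("");

$hashes = array(
    "/var/www/a.php" => $empty,
    "/var/www/b.php" => sha1("<?php echo 'b';"),
);
// b.php has been emptied, so it now shares a.php's hash.
$files = array(
    "/var/www/a.php" => $empty,
    "/var/www/b.php" => $empty,
);

// array_diff() compares values only: $empty still occurs in $hashes,
// so the change to b.php goes unnoticed.
$byValue = array_diff($files, $hashes);

// array_diff_assoc() compares each key together with its value, so b.php is reported.
$byKeyAndValue = array_diff_assoc($files, $hashes);

var_dump(count($byValue));              // int(0)
var_dump(array_keys($byKeyAndValue));   // only "/var/www/b.php"
```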
When we loop over the records from the database, we perform several checks. First, we use array_key_exists() to check whether each file path in the database also appears in $files; if it does not, the file must have been deleted. Second, if the file exists but its hash does not match, the file must have been changed. We record each checked path in a temporary array called $tmp. Finally, if $files contains more entries than the database did, we know that the remaining unchecked files have been added. Once done, $diffs is either an empty array or contains the differences found in the form of a multidimensional array.
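Here is a minimal sketch of these checks. It is written as a function over two plain arrays so that it runs without a database. The function name classify_changes() and the deleted/changed/added keys are hypothetical. One deliberate difference from the count comparison described above is how added files are found. They are taken directly as the keys of $files that were never checked. This also catches the case where one file is deleted and another added, leaving the totals equal:

```php
<?php
// $stored: hashes read from integrity_hashes (path => hash).
// $files:  the freshly built profile (path => hash).
// classify_changes() and its result keys are hypothetical names.
function classify_changes(array $stored, array $files)
{
    $diffs = array();
    $tmp = array(); // paths from the database that have been checked

    foreach ($stored as $path => $hash) {
        if (!array_key_exists($path, $files)) {
            // In the database but no longer on disk
            $diffs["deleted"][$path] = $hash;
        } elseif ($files[$path] !== $hash) {
            // Still on disk, but the contents no longer hash the same
            $diffs["changed"][$path] = $files[$path];
        }
        $tmp[$path] = true;
    }

    // Paths in $files that were never checked against the database are new
    $added = array_diff_key($files, $tmp);
    if (!empty($added)) {
        $diffs["added"] = $added;
    }

    return $diffs;
}

$stored = array(
    "/var/www/test.php"      => sha1("test v1"),
    "/var/www/sub/hello.php" => sha1("hello v1"),
);
$files = array(
    "/var/www/sub/hello.php" => sha1("hello v2"),
    "/var/www/sub/world.php" => sha1(""),
);

print_r(classify_changes($stored, $files));
```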
To display the results in a more user-friendly format, such as in an admin interface, you can iterate over them and output them as a bulleted list.
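Here is a minimal sketch of such a list. It assumes $diffs groups paths by change type, as in the classification described above. render_diffs() and that array shape are illustrative assumptions, not the article's code. Paths are escaped because file names can contain HTML special characters:

```php
<?php
// Render a $diffs array grouped by change type as nested HTML bullet lists.
// Assumed shape: array("changed" => array(path => hash, ...), "added" => ...).
function render_diffs(array $diffs)
{
    if (empty($diffs)) {
        return "<p>No changes detected.</p>\n";
    }
    $html = "<ul>\n";
    foreach ($diffs as $type => $paths) {
        $html .= "  <li>" . htmlspecialchars(ucfirst($type)) . "\n    <ul>\n";
        foreach (array_keys($paths) as $path) {
            $html .= "      <li>" . htmlspecialchars($path) . "</li>\n";
        }
        $html .= "    </ul>\n  </li>\n";
    }
    return $html . "</ul>\n";
}

echo render_diffs(array(
    "changed" => array("/var/www/sub/hello.php" => sha1("hello v2")),
    "added"   => array("/var/www/sub/<new>.php" => sha1("")),
));
```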
At this point, you could provide a link that updates the database with the new file structure profile (in which case you might choose to store $files in a session variable). If you do not approve of the differences, you can handle them however you see fit.
Summary
I hope this guide helps you better understand file integrity monitoring. Adding a monitor like this to your website is a valuable security measure that helps you confirm your files stay exactly as you intended. Of course, don't forget to make regular backups, just in case.
The above is the detailed content of PHP Master | Monitoring File Integrity.