


Match the overlap between the csdn user database and official users and filter out the overlapping users_PHP tutorial
Process:
1. Obtain the user database of csdn and import it locally
When trying to open editplus, it prompts that there is insufficient memory and cannot find a way. My colleague checked it under Linux and the basic format is as follows:
Username # Password # Email
Username# Password# Email
Corresponding data structure:
CREATE TABLE IF NOT EXISTS `csdn_userdb` (
`id` int(10) NOT NULL auto_increment,
`username` varchar(50) character set gbk NOT NULL,
`password` varchar(50) character set gbk NOT NULL,
`email` varchar(50) character set gbk NOT NULL,
PRIMARY KEY (`id`),
KEY `username` (`username`),
KEY `email` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=gbk AUTO_INCREMENT=1;
I have always suspected that fopen opens files into the cache, but practice has proved that it is very fast, so there should be no Write to the cache, the following is the code for importing data
$ link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle = fopen("C:UserszhudongDesktopwww.csdn.net.sql ", "r");
while (!feof($handle)){
$i++;
$buffer = fgets($handle);
list($u,$p,$ e) = explode(" # ",$buffer);
mysql_query("INSERT INTO csdn_userdb(username,password,email) VALUES ('$u','$p','$e')",$link );
if ($i%1000 == 0) echo $i."n";
}
fclose($handle);
?>
The efficiency of the above code is very poor, so the modified code is as follows:
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle = fopen("C:UserszhudongDesktopwww.csdn .net.sql", "r");
$perpage = 50;
while (!feof($handle)){
$i++;
$buffer = fgets($handle);
list($u,$p,$e) = explode(" # ",$buffer);
$insertValue[] = "('$u','$p','$e') ";
if ($i% $perpage == 0){ $perpage == 0){
$insrtValueString = implode(',',$insertValue);
mysql_query("INSERT INTO csdn_userdb( username,password,email) VALUES $instrtValueString",$link);
echo $i."n";
$insertValue = array();
}
}
fclose($ handle);
In order to find out what factors affect the efficiency of imported data, we conducted tests based on different settings
Total CSDN user data 6428600
When $perpage=500 ;Data after import: 5,902,000; data loss 526600 Loss rate: 8%; Data table engine: MyISAM; Index: Yes; Total time taken: 15 minutes
When $perpage=200, total data after import: 6,210,200; data loss :218400; Loss rate: 3.3%; Data table engine: MYISAM; Index: Yes; Total time taken: 30 minutes
When $perpage=200, the total number of data after import: 6,210,200; Data loss: 218400; Loss rate: 3.3 %; Data table engine: INNODB; Index: Yes; Total time taken: 65 minutes
When $perpage=200, the total number of data after import: 6,210,200; Data loss: 218400; Loss rate: 3.3%; Data table engine: MYISAM ; Index: None; Total time taken: 14 minutes (rebuild the index separately after the data is imported)
When $perpage=50, the total number of data after import: 6,371,200; data loss: 57400, loss rate: 0.8%; data table engine : MYISAM; Index: None: Total time taken: 20 minutes
Based on the above situation, the summary is as follows:
1. Importing data first and then adding index is twice as efficient as adding index first and then importing data
2. The efficiency of InnoDB in single-process data insertion is much lower than that of MYISAM
3. When perpage=50, the data loss rate is less than 1%
Because there will be a timeout problem when executing through the browser, and the efficiency is low, it is run through the command line. I encountered a little trouble during the process and was delayed. It took less time
At first I executed the following code:
php.exe E:usrwwwimportcsdndb.php
But it kept reporting an error: call to undefined function mysql_connect
After much trouble, I found that php.ini was not loaded
Correct code For:
php.exe -c E:/usr/local/apache2/php.ini importcsdndb.php
2. Import the user data that needs to be matched to the local
command line Enter msyql (you don’t know how to use Baidu)
Then execute: mysql>source C:/Users/zhudong/Desktop/userdb.sql
3. Compare and filter users
After the comparison program is written, remember to add it in the command Run under the line:
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle_username = fopen("E:/records_username.txt","a");
//$handle_email = fopen("E:/records_email.txt","a");
$username_num = $email_num = $uid = 0;
while ($uid<2181106) {
$nextuid=$uid+10000;
$query = mysql_query("SELECT * FROM pw_members WHERE uid>'$uid' AND uid<'$nextuid'");
while ($rt = mysql_fetch_array($query,MYSQL_ASSOC)) {
$username = $rt['username'];
$email = $rt['email'];
$query2 = mysql_query("SELECT * FROM scdn_userdb WHERE username='$username' OR email='$email'");
while ($rt2 = mysql_fetch_array($query2,MYSQL_ASSOC)) {
if ($rt['password'] = md5($rt2['password'])) {
if ($rt2['username'] == $username) {
$username_num++;
fwrite($handle_username,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt2['username'].'|'.$rt2['password'].'|'.$rt2['email']."rn");
echo 'username_num='.$username_num."rn";
continue;
}
/*
if ($rt2['email'] == $email) {
$email_num++;
fwrite($handle_email,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt2['username'].'|'.$rt2['password'].'|'.$rt2['email']."rn");
echo 'email_num='.$email_num."rn";
}
*/
}
}
mysql_free_result($query2);
}
$uid = $nextuid;
}
?>
您看到的以上的代码是非常蹩脚的,因为其效率特别低 ,几百万的数据,要跑10多个小时,怎么能忘记连表查询这么基本的东西呢,以下为修正后的方法
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle_username = fopen("E:/records_username.txt","a");
while($uid<2181106) {//此处的数字为要对比用户库的最大ID
$nextuid= $uid+10000;
$query = mysql_query("SELECT m.uid,m.username,m.password,m.email,u.password as csdn_password,u.email as csdn_email FROM own_members m LEFT JOIN csdn_userdb u USING(username) WHERE m.uid>'$uid' AND m.uid<='$nextuid' AND u.username!=''");
while ($rt = mysql_fetch_array($query,MYSQL_ASSOC)) {
if ($rt['password'] == md5($rt['csdn_password'])) {
$username_num++;
fwrite($handle_username,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt['username'].'|'.$rt['csdn_password'].'|'.$rt['csdn_email']."rn");
echo 'username_num='.$username_num."rn";
}
}
$uid = $nextuid;
echo 'uid='.$uid;
}
?>
总对比时间25分钟,相比较之前10多个小时的执行真是大有提升
总重名用户:34175
占总会员比例:1.7%
1.7%的重名用户还是挺严重的,希望本文对各位站长对比出本站的用户有所帮助

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Google Authenticator is a tool used to protect the security of user accounts, and its key is important information used to generate dynamic verification codes. If you forget the key of Google Authenticator and can only verify it through the security code, then the editor of this website will bring you a detailed introduction on where to get the Google security code. I hope it can help you. If you want to know more Users please continue reading below! First open the phone settings and enter the settings page. Scroll down the page and find Google. Go to the Google page and click on Google Account. Enter the account page and click View under the verification code. Enter your password or use your fingerprint to verify your identity. Obtain a Google security code and use the security code to verify your Google identity.

Go language is an efficient, concise and easy-to-learn programming language. It is favored by developers because of its advantages in concurrent programming and network programming. In actual development, database operations are an indispensable part. This article will introduce how to use Go language to implement database addition, deletion, modification and query operations. In Go language, we usually use third-party libraries to operate databases, such as commonly used sql packages, gorm, etc. Here we take the sql package as an example to introduce how to implement the addition, deletion, modification and query operations of the database. Assume we are using a MySQL database.

Hibernate polymorphic mapping can map inherited classes to the database and provides the following mapping types: joined-subclass: Create a separate table for the subclass, including all columns of the parent class. table-per-class: Create a separate table for subclasses, containing only subclass-specific columns. union-subclass: similar to joined-subclass, but the parent class table unions all subclass columns.

Apple's latest releases of iOS18, iPadOS18 and macOS Sequoia systems have added an important feature to the Photos application, designed to help users easily recover photos and videos lost or damaged due to various reasons. The new feature introduces an album called "Recovered" in the Tools section of the Photos app that will automatically appear when a user has pictures or videos on their device that are not part of their photo library. The emergence of the "Recovered" album provides a solution for photos and videos lost due to database corruption, the camera application not saving to the photo library correctly, or a third-party application managing the photo library. Users only need a few simple steps

HTML cannot read the database directly, but it can be achieved through JavaScript and AJAX. The steps include establishing a database connection, sending a query, processing the response, and updating the page. This article provides a practical example of using JavaScript, AJAX and PHP to read data from a MySQL database, showing how to dynamically display query results in an HTML page. This example uses XMLHttpRequest to establish a database connection, send a query and process the response, thereby filling data into page elements and realizing the function of HTML reading the database.

How to use MySQLi to establish a database connection in PHP: Include MySQLi extension (require_once) Create connection function (functionconnect_to_db) Call connection function ($conn=connect_to_db()) Execute query ($result=$conn->query()) Close connection ( $conn->close())

To handle database connection errors in PHP, you can use the following steps: Use mysqli_connect_errno() to obtain the error code. Use mysqli_connect_error() to get the error message. By capturing and logging these error messages, database connection issues can be easily identified and resolved, ensuring the smooth running of your application.

PHP is a back-end programming language widely used in website development. It has powerful database operation functions and is often used to interact with databases such as MySQL. However, due to the complexity of Chinese character encoding, problems often arise when dealing with Chinese garbled characters in the database. This article will introduce the skills and practices of PHP in handling Chinese garbled characters in databases, including common causes of garbled characters, solutions and specific code examples. Common reasons for garbled characters are incorrect database character set settings: the correct character set needs to be selected when creating the database, such as utf8 or u
