
Sphinx + MySQL + PHP: Querying 1.2 Billion DNS Records in Seconds

Jun 20, 2016 pm 12:57 PM

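    I recently got hold of a dataset of nearly 1.2 billion NS records from around the world. My original plan was to use it to look up domains by DNS and then collect and scan websites across the whole country. It turned out that the number of websites in the data was not very accurate, and one person's time and money are nowhere near enough for a task that large, so I dropped the project. Only these setup notes remain.
    The data is plain text, and a simple text search is far too slow: a single search takes roughly 5-10 minutes. I decided to import the data into a database, which should speed things up considerably. My machine only has an AMP (Apache, MySQL, PHP) environment, so the data went into MySQL.
At first I used Navicat for the import, since the data happened to be in `ip,ns` format. After nearly 5 hours it had not even imported 1% of the rows. This is, after all, a 54 GB plain-text data file!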
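The plain-text search that took 5-10 minutes is essentially a linear scan over every `ip,ns` line. A minimal Python sketch of such a scan, assuming one `ip,ns` pair per line; the function name `scan_for_suffix` and the three sample rows are illustrative stand-ins, not the tool actually used here:

```python
import io

def scan_for_suffix(lines, suffix):
    """Yield (ip, ns) pairs whose hostname ends with `suffix`.

    Every line is read and split, so the cost grows linearly with file size.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first comma only: "ip,ns"
        ip, _, ns = line.partition(",")
        if ns.endswith(suffix):
            yield ip, ns

# Tiny in-memory stand-in for the 54 GB text file (illustrative rows)
sample = io.StringIO(
    "123.125.104.176,w-176.service.weibo.com\n"
    "202.106.169.235,staff.weibo.com\n"
    "210.242.10.56,weibo.com.tw\n"
)
matches = list(scan_for_suffix(sample, "weibo.com"))
print(matches)
```

Because nothing is indexed, every search has to read the whole file again, which is why the time per query stays in the minutes no matter what is searched for.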
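    Later I found that MySQL's built-in `load data local infile` needed only about 30 minutes. On the first import I forgot to create a key, so I had to import everything a second time.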

mysql> load data local infile 'E:\\dns\\rite\\20141217-rdns.txt' into table dns fields terminated by ',';
Query OK, 1194674130 rows affected, 1700 warnings (29 min 26.65 sec)
Records: 1194674130  Deleted: 0  Skipped: 0  Warnings: 1700

Because I added an id column, the second import was noticeably slower, but it still took only about an hour and a half to load the 55 GB of data.
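To put the import figures side by side, the following Python sketch converts them into rows per second. The row count and the 29 min 26.65 sec come from the MySQL output; the 90 minutes and the "under 1% in about 5 hours" for Navicat are the approximate figures given in the text, so the last two results are rough estimates only:

```python
ROWS = 1_194_674_130  # rows reported by LOAD DATA

def rows_per_second(rows, seconds):
    """Average throughput over the whole import."""
    return rows / seconds

# First import: 29 min 26.65 sec, as reported by MySQL
load_data_secs = 29 * 60 + 26.65
# Second import with the id column: "about an hour and a half" (approximate)
load_data_with_id_secs = 90 * 60
# Navicat: under 1% of the rows after roughly 5 hours, so this is an upper bound
navicat_rows_upper = ROWS * 0.01
navicat_secs = 5 * 60 * 60

fast = rows_per_second(ROWS, load_data_secs)
with_id = rows_per_second(ROWS, load_data_with_id_secs)
navicat_max = rows_per_second(navicat_rows_upper, navicat_secs)

print(f"LOAD DATA:              ~{fast:,.0f} rows/s")
print(f"LOAD DATA with id:      ~{with_id:,.0f} rows/s")
print(f"Navicat (upper bound):  <{navicat_max:,.0f} rows/s")
```

On these numbers the first `load data` run averaged several hundred thousand rows per second, while the Navicat import stayed below a thousand.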
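Next came the indexes. Because I need fuzzy matching, I created both a FULLTEXT and a BTREE index. Building them took about 3 days. At one point I accidentally closed the window running mysql and thought everything was lost, but it turned out that MySQL was quietly continuing to build the index in the background.
Once the indexes were in place, queries were only slightly faster than without them. I ran: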

select * from ns where ns like '%weibo.com'

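It took 210 seconds, which is still far too slow. A `LIKE` pattern that starts with `%` cannot make use of a B-tree index, so MySQL still has to examine every row.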
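One way to see why the index barely helped: a B-tree keeps values in sorted order, which makes it easy to jump to everything starting with a prefix, but gives no shortcut for values ending with a suffix such as `%weibo.com`. The sketch below uses a sorted Python list and `bisect` as a stand-in for a B-tree; it is not how MySQL is implemented internally, and the hostnames and function names are illustrative:

```python
import bisect

hosts = sorted([
    "staff.weibo.com",
    "w-176.service.weibo.com",
    "weibo.com.tw",
    "pr1.cn-weibo.com",
    "example.org",
])

def prefix_search(sorted_values, prefix):
    """Use the sort order: all values with this prefix form one contiguous run."""
    lo = bisect.bisect_left(sorted_values, prefix)
    hi = bisect.bisect_left(sorted_values, prefix + "\uffff")
    return sorted_values[lo:hi]

def suffix_search(values, suffix):
    """Sort order does not help here, so every value must be checked."""
    return [v for v in values if v.endswith(suffix)]

print(prefix_search(hosts, "weibo."))     # two binary searches
print(suffix_search(hosts, "weibo.com"))  # full pass over the list
```

The prefix lookup touches only a handful of entries regardless of list size, while the suffix lookup visits all of them, which matches the behaviour seen with the `like '%weibo.com'` query.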
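So I turned to Sphinx to build the index and speed things up.
I downloaded the 64-bit Sphinx package with MySQL support from the official site.
Next comes the configuration file. In the source section, set the MySQL account and password: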

source src1
{
    sql_host        = localhost
    sql_user        = root
    sql_pass        = root
    sql_db          = ns
    sql_port        = 3306
    # put your own query here
    sql_query       = \
        SELECT id,ip,ns from ns
    sql_attr_uint   = id
}

The searchd section needs a few settings as well: the listening ports and the paths of the log files and the pid file are enough.

searchd
{
    listen          = 9312
    listen          = 9306:mysql41
    log             = E:/phpStudy/splinx/file/log.log
    query_log       = E:/phpStudy/splinx/file/query.log
    pid_file        = E:/phpStudy/splinx/file/searchd.pid
}

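Then switch to Sphinx's bin directory and build the index. The index is built with the indexer tool; searchd is the daemon that is started afterwards to serve queries:

indexer test1 # test1 is the name of your index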

Building the index took a little under 2 hours.
Then switch to the api directory and run:

E:\phpStudy\splinx\api>test.py asd
DEPRECATED: Do not call this method or, even better, use SphinxQL instead of an API
Query 'asd ' retrieved 1000 of 209273 matches in 0.007 sec
Query stats:
        'asd' found 209291 times in 209273 documents
Matches:
1. doc_id=20830, weight=1
2. doc_id=63547, weight=1
3. doc_id=96147, weight=1
4. doc_id=1717000, weight=1
5. doc_id=2213385, weight=1
6. doc_id=3916825, weight=1
7. doc_id=3981791, weight=1
8. doc_id=5489598, weight=1
9. doc_id=9348383, weight=1
10. doc_id=18194414, weight=1
11. doc_id=18194415, weight=1
12. doc_id=18195126, weight=1
13. doc_id=18195517, weight=1
14. doc_id=18195518, weight=1
15. doc_id=18195519, weight=1
16. doc_id=18195520, weight=1
17. doc_id=18195781, weight=1
18. doc_id=18195782, weight=1
19. doc_id=18200301, weight=1
20. doc_id=18200303, weight=1

The test showed that it really is fast, so I wrote a PHP script to call it:

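<?php
include 'sphinxapi.php';

// Connect to the MySQL database that holds the original rows
// (mysqli is used because the old mysql_* functions were removed in PHP 7)
$conn = mysqli_connect('127.0.0.1', 'root', 'root', 'ns');

$sphinx = new SphinxClient();
$now = time();
$sphinx->SetServer('127.0.0.1', 9312);

// Search the index "test1"; matches are keyed by document id
$result = $sphinx->Query('weibo.com', 'test1');
if ($result !== false && !empty($result['matches'])) {
    foreach ($result['matches'] as $key => $val) {
        // Fetch the full row for each matching id
        $sql = 'select * from ns where id=' . (int)$key;
        $res = mysqli_fetch_array(mysqli_query($conn, $sql));
        echo "{$res['ip']}:{$res['ns']}\n";
    }
}
echo time() - $now; // elapsed time in whole seconds
?>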
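The script above runs one SELECT per match. A common alternative is to fetch all matched rows with a single `IN (...)` query. The following Python sketch shows the idea using the standard-library `sqlite3` module as an in-memory stand-in for the MySQL `ns` table; the helper name `fetch_rows_by_ids`, the ids 1 to 3 and the table contents are hypothetical and only serve the illustration:

```python
import sqlite3

def fetch_rows_by_ids(conn, ids):
    """Fetch (id, ip, ns) rows for all ids in a single query, keeping the id order."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, ip, ns FROM ns WHERE id IN ({placeholders})"
    by_id = {row[0]: row for row in conn.execute(sql, list(ids))}
    # IN (...) does not guarantee order, so restore the order of the search results
    return [by_id[i] for i in ids if i in by_id]

# In-memory stand-in for the real table (made-up ids, illustrative rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ns (id INTEGER PRIMARY KEY, ip TEXT, ns TEXT)")
conn.executemany(
    "INSERT INTO ns VALUES (?, ?, ?)",
    [
        (1, "123.125.104.176", "w-176.service.weibo.com"),
        (2, "202.106.169.235", "staff.weibo.com"),
        (3, "210.242.10.56", "weibo.com.tw"),
    ],
)

matched_ids = [3, 1]  # e.g. the document ids returned by the search engine
for _id, ip, ns in fetch_rows_by_ids(conn, matched_ids):
    print(f"{ip}:{ns}")
```

With many matches per search this replaces hundreds of round trips to the database with one, at the cost of having to restore the ranking order by hand.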

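The PHP script basically delivers instant lookups: the elapsed time printed at the end is 0! (`time()` only has one-second resolution, so 0 means the whole run finished in under a second.)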

123.125.104.176:w-176.service.weibo.com
123.125.104.178:w-178.service.weibo.com
123.125.104.179:w-179.service.weibo.com
123.125.104.207:w-207.service.weibo.com
123.125.104.208:w-208.service.weibo.com
123.125.104.209:w-209.service.weibo.com
123.125.104.210:w-210.service.weibo.com
202.106.169.235:staff.weibo.com
210.242.10.56:weibo.com.tw
218.30.114.174:w114-174.service.weibo.com
219.142.118.228:staff.weibo.com
60.28.2.221:w-221.hao.weibo.com
60.28.2.222:w-222.hao.weibo.com
60.28.2.250:w-222.hao.weibo.com
61.135.152.194:sina152-194.staff.weibo.com
61.135.152.212:sina152-212.staff.weibo.com
65.111.180.3:pr1.cn-weibo.com
160.34.0.155:srm-weibo.us2.cloud.oracle.com
202.126.57.40:w1.weibo.vip.hk3.tvb.com
202.126.57.41:w1.weibo.hk3.tvb.com
0
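The speed-up comes from the kind of structure a full-text engine builds: an inverted index that maps each token to the ids of the documents containing it, so a query only touches the entries for its own tokens instead of every row. The toy Python sketch below illustrates the idea. Splitting on every non-alphanumeric character is an assumption made for the example and is not claimed to match Sphinx's actual tokenization; word order and ranking are ignored, and the document ids are made up:

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(t, set()) for t in tokens]
    return set.intersection(*postings)

# Illustrative documents with made-up ids
docs = {
    1: "w-176.service.weibo.com",
    2: "staff.weibo.com",
    3: "weibo.com.tw",
    4: "srm-weibo.us2.cloud.oracle.com",
    5: "example.org",
}
index = build_index(docs)
print(sorted(search(index, "weibo.com")))
```

Token-based matching of this kind may also explain why hostnames such as weibo.com.tw and srm-weibo.us2.cloud.oracle.com show up in the results above: they contain both tokens even though they do not end in weibo.com.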