


Use PHPdig to build your own Google (illustrated tutorial)
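I. What is PHPdig?
PHPdig is a vertical search engine product that is very popular abroad (it is less a product than a search technology distinct from traditional search engines). It is written in PHP and relies on PHP's runtime efficiency to keep search response times short. Like Google, Baidu, and other search engines, it can search the internet; besides ordinary web pages, it also covers files in formats such as txt, doc, xls, and pdf, so it offers strong content search and file parsing. Like a traditional search engine, PHPdig rests on three basic techniques:
1. Spidering (crawling)
2. Structured information extraction from web pages, or metadata collection
3. Word segmentation and indexing
Unlike traditional search engines, PHPdig suits more specialized, deeper, customized search, which makes it an excellent choice for building a vertical search engine for a particular field.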
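To make the third technique concrete, here is a minimal Python sketch of word segmentation and indexing. It is not PHPdig's code: the function names, the simple alphanumeric tokenizer, and the example.com pages are assumptions chosen for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


# Hypothetical page texts, standing in for what a spider would fetch.
pages = {
    "http://example.com/a": "PHP search engine tutorial",
    "http://example.com/b": "Vertical search for software downloads",
}
index = build_index(pages)
print(sorted(index["search"]))
```

A lookup in such an inverted index is a single dictionary access per query word, which is why search engines build the index at spider time rather than scanning pages at query time.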
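II. How to get PHPdig
PHPdig is free (the copyright notice must be retained). The latest version is phpdig-1.8.9, but to avoid compatibility problems with your Apache and MySQL versions, a slightly older release is recommended. The project site is http://www.phpdig.net and the download page is http://www.phpdig.net/navigation.php?action=download . In my own trial, phpdig-1.8.9 caused many problems, while PHPdig-1.8.8 caused far fewer.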
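III. Step-by-step procedure
1. Get the software
Download PHPdig-1.8.8 from http://www.phpdig.net/navigation.php?action=download to your desktop and unzip it into the Apache server's html directory, typically D:\usr\www\html\ . (If Apache is not installed, install it first. Mappm-Server v1.1.9 Final is recommended: its one-step installer makes it easy to debug and run PHP/CGI and MySQL programs.)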
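2. Run PHPdig and configure its database
Open a browser, enter http://localhost/phpdig/ and press Enter. The page lists all of PHPdig's files and folders; there is no default home page (default or index). Clicking search.php produces the error: Unable to connect to database : Check the connection script. The connection fails because the PHPdig database has not been configured yet. Go back, enter the admin directory, and run install.php. The interface is entirely in English (no current version of PHPdig supports a Chinese interface). If you have localization experience you can translate it yourself; I also provide my own translated cn-language.php for download (copy it into the locales directory). You also need to edit config.php (to change the language) and style.css (to change fonts and styles) in the includes directory.
On entering install.php you are asked for the PHPdig administrator user name and password; both default to admin. After logging in, the following screen appears (shown here after translation):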
(Figure 1)
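The required information is:
For local testing, enter the default server name localhost (the default server name under Mappm-Server, which bundles MySQL, and therefore also the default MySQL host). The database server port can be left blank to use the default (MySQL's standard port is 3306), and the database socket is empty by default. The user name defaults to root (the Mappm-Server default), and the password is the one you entered when installing Mappm-Server. The PHPdig database name defaults to phpdig and can be changed freely. You can also add a prefix to the table names; it is empty by default.
If you are deploying to a web server connected to the Internet, ask your hosting provider for the MySQL server name or IP address, the database port, the socket, the user name, and the password. Set the database name and table prefix as described above.
As for the four radio buttons on the right, choose according to your situation; for a first installation, keep the default, "Create database".
After confirming the information, click the install button. If the connection fails, the error "Cannot connect to database" is shown; if it succeeds, you are taken directly to the admin page, shown below: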
(Figure 2)
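The connection settings requested by the installer can be summarized as a small record. The Python sketch below is only an illustration of those fields and of how a table prefix changes table names; the class name DbSettings, the field names, and the table name "sites" are hypothetical and are not taken from PHPdig's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbSettings:
    """Illustrative container for the values entered on the install form."""
    host: str = "localhost"
    port: Optional[int] = None   # None: leave blank and use the server default
    socket: str = ""             # empty by default
    user: str = "root"
    password: str = ""
    database: str = "phpdig"
    table_prefix: str = ""       # empty by default

    def table(self, name):
        """Return the table name with the configured prefix applied."""
        return f"{self.table_prefix}{name}"


local = DbSettings(password="secret")        # hypothetical local test setup
hosted = DbSettings(host="db.example.com",   # hypothetical hosted setup
                    user="webuser", password="secret",
                    table_prefix="pd_")
print(local.table("sites"), hosted.table("sites"))
```

A prefix is mainly useful on shared hosting, where several applications may have to live in the same database.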
3. Overview of the interface areas
Area 1 is a text input box. Its default text is three lines that each begin with http, which makes clear that this is where you enter the address of the website to spider (spidering only one website at a time is recommended).
Area 2 holds the spider options. Search depth is the number of directory levels the spider descends into the site. Links per page is the maximum number of linked pages followed from any one page. Both default to 0, which means the entire site is spidered.
Area 3 shows database status: the sites already spidered, keywords, index entries, and the sites currently being spidered.
Area 4 is a drop-down list of the URLs of spidered sites. Select a site there, then clear or update it in Area 5.
Area 5 offers the clear and update operations for the site selected in Area 4, along with links to statistics and spider controls.
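The two spider options in Area 2 can be pictured with a small breadth-first crawl over an in-memory link graph. This Python sketch is an assumption-laden illustration, not PHPdig's algorithm: the function crawl, its parameters, and the sample site are hypothetical, and a value of 0 is treated as "no limit" only because that is how the option is described above.

```python
from collections import deque


def crawl(links, start, max_depth=0, max_links_per_page=0):
    """Breadth-first traversal of a link graph.

    links maps a URL to the list of URLs it links to.
    A limit of 0 means "no limit". Returns URLs in visiting order.
    """
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if max_depth and depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        outgoing = links.get(url, [])
        if max_links_per_page:
            outgoing = outgoing[:max_links_per_page]
        for target in outgoing:
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical site structure used only for this example.
site = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a/1"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}
print(crawl(site, "/"))
print(crawl(site, "/", max_depth=1))
print(crawl(site, "/", max_depth=2, max_links_per_page=2))
```

With both limits at 0 every reachable page is visited; raising either limit trades coverage for a shorter run.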
4. Run the spider on a specific site
Suppose you are very interested in the content of the Tianji software channel. You can build a dedicated search engine for that content, one that can cover it more completely and deeply than Google does. The steps below use the Tianji software channel as the example site.
1) In Area 1 of Figure 2, enter http://soft.yesky.com and leave the search depth and links per page at the default of 0.
2) Click the spider button. The page jumps to the spider information page, and the program begins spidering http://soft.yesky.com automatically.
Note: Spidering a website is slow. For a site with a lot of content it can take from a few hours to a day, but there is no need to worry about a script timeout, because the system timeout is set to a maximum of 48 hours. You can interrupt the spider during the run and restart it later to finish the remaining pages. Be aware that accidentally closing the spider page does not stop the spider: it keeps running and consuming system resources. Reopen the spider page and click the Stop spider link to release them.
(Figure 3)
5. Search with PHPdig
After the spider has run for a while, the information from http://soft.yesky.com has been captured into the server database, mainly the title, keywords, page address, and similar details of each page. At this point you can search by opening search.php.
(Figure 4)
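The kind of record the spider stores (title, keywords, page address) can be illustrated with Python's standard html.parser module. This is a minimal sketch under stated assumptions, not PHPdig's extraction code: the class PageSummary, the function summarize, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and the meta keywords of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(url, html):
    """Return the title, keywords, and address of a page as one record."""
    parser = PageSummary()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip(), "keywords": parser.keywords}


# Hypothetical page content used only for this example.
page = ('<html><head><title>QQ2006 download</title>'
        '<meta name="keywords" content="QQ, chat, download"></head>'
        '<body><p>Details</p></body></html>')
record = summarize("http://example.com/qq2006.html", page)
print(record)
```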
You can choose how many results to display and whether to use fuzzy or exact matching. You can also restrict the search to one site; by default, all spidered sites are searched.
(Figure 5)
The figure above shows the results page for the query "QQ2006".
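The search options described above (result count, fuzzy or exact matching, optional site restriction) can be sketched in Python as follows. The article does not define how PHPdig distinguishes fuzzy from exact matching, so this sketch assumes that exact means a whole-word match and fuzzy means a substring match; the function search, its parameters, and the sample records are hypothetical.

```python
def search(records, query, mode="exact", site=None, limit=10):
    """Return up to `limit` URLs whose title matches the query.

    mode="exact" requires a whole-word match (assumption);
    mode="fuzzy" accepts the query as a substring of any word (assumption).
    site, if given, restricts results to URLs starting with that prefix.
    """
    query = query.lower()
    hits = []
    for rec in records:
        if site and not rec["url"].startswith(site):
            continue
        words = rec["title"].lower().split()
        if mode == "exact":
            matched = query in words
        else:
            matched = any(query in word for word in words)
        if matched:
            hits.append(rec["url"])
    return hits[:limit]


# Hypothetical indexed records used only for this example.
records = [
    {"url": "http://soft.yesky.com/1", "title": "QQ2006 beta released"},
    {"url": "http://soft.yesky.com/2", "title": "Download QQ2006beta3 now"},
    {"url": "http://example.com/3", "title": "QQ2006 review"},
]
print(search(records, "QQ2006"))
print(search(records, "QQ2006", mode="fuzzy", site="http://soft.yesky.com"))
```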
6. Problems
Because of PHPdig's language settings, its word segmentation, and character handling in the MySQL database, searching for Chinese terms in PHPdig is still unreliable. These issues need further work. Comments are welcome, and interested readers can discuss them in the Taoba-PHPdig theme community.
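The word segmentation problem is easy to reproduce: Chinese text has no spaces between words, so a splitter written for space-separated languages sees a whole phrase as one token. The Python sketch below shows this and one common workaround, overlapping character bigrams. It is an illustration only; the function names are hypothetical, and nothing here describes how PHPdig itself segments text.

```python
def whitespace_tokens(text):
    """Split on whitespace, as a tokenizer for space-separated languages would."""
    return text.split()


def char_bigrams(text):
    """Return overlapping two-character tokens, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


phrase = "垂直搜索引擎"  # "vertical search engine"
print(whitespace_tokens(phrase))  # the whole phrase comes back as one token
print(char_bigrams(phrase))       # includes "搜索" (search) and "引擎" (engine)
```

Bigrams also produce tokens that are not real words, so they increase index size and can return loosely related matches; a dictionary-based segmenter is the usual alternative.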

