
How to use dedecms collection

Jul 16, 2019 pm 03:04 PM
dedecms

Taking the official DedeCMS website as an example, we collect the PHP tutorial column under the Webmaster Academy. Open the list address http://www.dedecms.com/web-art/PHP_jiaocheng.


Log in to the backend, open "Collection Node Management", create a new node, and select "Normal Article" as the content model.

1. Set the basic information of the node

First, enter a node name that is easy to remember and set the target page encoding to GB2312. The anti-hotlinking mode can be left unchanged because the target site has no restrictions. The system default timeout is 10 seconds.
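To see why the target page encoding matters, here is a minimal Python sketch (not DedeCMS code). It assumes the page bytes arrive GB2312-encoded, as in the node settings above; `decode_page` is a hypothetical helper name, and the sample bytes are built locally rather than fetched.

```python
def decode_page(raw: bytes, encoding: str = "gb2312") -> str:
    """Decode fetched page bytes using the node's target page encoding."""
    # errors="replace" keeps going if a byte sequence falls outside the charset.
    return raw.decode(encoding, errors="replace")


# Bytes as a GB2312 server would send them (built locally for this demo).
raw_title = "正则表达式".encode("gb2312")

title = decode_page(raw_title)                        # correct encoding
wrong = raw_title.decode("utf-8", errors="replace")   # wrong encoding: garbled text
print(title)
print(wrong)
```

Choosing the wrong encoding does not usually raise an error; it silently produces garbled text, which is why the setting should match the target site.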

2. Set the list URL acquisition rules

This step defines where the article list pages come from. Go back to the list page on the target site and compare the addresses of its pages: only the number after "14_" changes, increasing by one per page.

First page: http://www.dedecms.com/web-art/PHP_jiaocheng/list_14_1.html

Middle pages: http://www.dedecms.com/web-art/PHP_jiaocheng/list_14_(*).html

Last page: http://www.dedecms.com/web-art/PHP_jiaocheng/list_14_172.html

Copy one of the paging addresses and return to the "Add Collection Node" page. Set "Source Attribute" to "Batch Generate List URL", paste the address into "Matching URL", and replace the changing number with (*). In "Batch Generate Address Settings", set (*) to run from 1 to 172, which generates every list address from the first page to the last, page 172.

Test the rule. The pop-up box should list 172 generated addresses, which confirms that the setting works. If a list has addresses that follow no regular pattern, copy those addresses into the "Manually specified list URL" text box and collect from them instead.
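The batch generation described above amounts to substituting each page number for the (*) placeholder. The following Python sketch illustrates the idea only; `expand_list_urls` is a hypothetical name and this is not how DedeCMS implements it internally.

```python
def expand_list_urls(pattern: str, start: int, end: int) -> list[str]:
    """Replace the (*) placeholder with every page number from start to end."""
    if "(*)" not in pattern:
        raise ValueError("pattern must contain the (*) placeholder")
    return [pattern.replace("(*)", str(n)) for n in range(start, end + 1)]


PATTERN = "http://www.dedecms.com/web-art/PHP_jiaocheng/list_14_(*).html"

# Pages 1 to 172 inclusive, matching the values entered in the node settings.
list_urls = expand_list_urls(PATTERN, 1, 172)
print(len(list_urls))
print(list_urls[0])
print(list_urls[-1])
```

Counting the generated addresses is the same check the test pop-up performs: the list should contain one address per page.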

3. Set article URL matching rules

The previous step specified the source pages for article addresses. This step picks out, from those pages, the article addresses that meet our requirements. Open a list page: the box in the left column contains all the addresses we need. When the region is this clearly delimited, it can be isolated with the "HTML at the beginning of the region" and "HTML at the end of the region" settings.

Another method also works. Hover over the links and watch the full address shown in the lower-left corner of the browser. Every address we need contains "PHP_jiaocheng/20", so enter that in "Must Contain".

Either method filters out the unwanted addresses, and on complex pages the two can be combined. With regular expression rules added, almost any address can be filtered. Compare your settings with the figure below, then confirm and go to the next step, "Web page content acquisition rules".
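The two filtering methods can be sketched in Python as follows. This is an illustration under assumptions, not DedeCMS code: `article_urls` is a hypothetical function, and the sample markup (including the `<!-- /list -->` end marker) is invented for the demo rather than taken from the real list page.

```python
import re


def article_urls(html: str, region_start: str, region_end: str,
                 must_contain: str) -> list[str]:
    """Keep links inside the marked region that contain the required text."""
    begin = html.find(region_start)
    if begin == -1:
        return []
    begin += len(region_start)
    end = html.find(region_end, begin)
    region = html[begin:end] if end != -1 else html[begin:]

    kept: list[str] = []
    for href in re.findall(r'href="([^"]+)"', region):
        # "Must Contain" filter, plus de-duplication in page order.
        if must_contain in href and href not in kept:
            kept.append(href)
    return kept


# Invented sample page: only the "list" box holds the links we want.
SAMPLE = """
<div class="nav"><a href="/web-art/PHP_jiaocheng/">PHP</a></div>
<div class="list">
  <a href="/web-art/PHP_jiaocheng/20070420/38633.html">Regular Expression</a>
  <a href="/web-art/ASP_jiaocheng/20070101/100.html">Other column</a>
</div><!-- /list -->
<div class="side"><a href="/web-art/PHP_jiaocheng/20070101/1.html">Side</a></div>
"""

urls = article_urls(SAMPLE, '<div class="list">', "<!-- /list -->",
                    "PHP_jiaocheng/20")
print(urls)
```

The region markers cut the page down to the list box first, and the "Must Contain" text then discards any remaining links that belong to other columns.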

[Figure: How to use dedecms collection]

4. Web page content acquisition rules

The steps above covered the list settings. Next come the content acquisition rules. If collection were a meal, steps one to three would be the appetizer that leads into the main course. This step explains how to collect article content from the target site, and it is the core of the whole collection process.

Return to the PHP tutorial list on the DedeCMS site and open an article from the list. Here we use the article "Regular Expression" as an example: http://www.dedecms.com/web-art/PHP_jiaocheng/20070420/38633.html. Copy this address into "Preview URL". Because none of the articles on the DedeCMS site are paginated, no pagination settings are needed, and you can go straight to the "Fixed Collection Items" page.

(Note: If the collected content is paginated, you only need to set the matching rule for the pagination navigation section. Depending on the content, choose either the fully listed pagination list or the previous/next page form or incomplete pagination list.)

The following is the quoted content:

Fully listed pagination list: the pagination section lists the links to all pages, as shown in the figure below

Previous/next page form or incomplete pagination list: each page shows only the current page's content, and the pagination list is displayed incompletely

5. Fixed collection items

Here we begin by analyzing the page source code. Collection is nothing more than analyzing the structure of an HTML page to obtain the content we need, so you need some understanding of HTML and must be able to find the required content by viewing the page source. It is best to open several pages for analysis and find what they have in common.

Dreamweaver is recommended for the analysis. Using its search function often makes analyzing the page code much more convenient. In particular, after finding a tag, search for it to check whether it occurs more than once, which reduces analysis errors.

1) Article title: The title of this page is "Regular Expression". Copy it and press Ctrl+F in Dreamweaver to search the whole source; there are 30 matches. Because it is unique, we select the tag pair surrounding "Regular Expression" on line 105, copy it into the article title matching rule under "Fixed Collection Items", and replace the title text with the keyword "[content]". The final rule is the same tag pair wrapped around [content].

2) Author: Continue searching with the author as the keyword. It occurs only once, on line 110. Copy it together with the tags before and after it into the matching rule, and replace the part to be collected with [content].

3) Source: Same as above. Find the tag on line 109, copy it, and replace the part to be collected with [content]. If the source contains hyperlink tags that you want to remove, enter the following rules in the filter rule box to filter them out:

<a([^>]*)>
</a>

4) Publication time: Do the same copy, paste, and modify operation as above for line 111.

5) Article content: Search for the beginning of the article content, for example "Part One", which is found on line 118. Click the enclosing tag shown in the status bar; it turns out that this does not select the whole article content. Move up to the previous enclosing tag: the blue selection now covers all of the content, which shows that this element is the real container of the article content. Copy the tags before and after the content into the matching rule.

At this point, the content filtering settings have been completed.
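A matching rule of the form "tags before, [content], tags after" can be read as a pattern that captures whatever sits between the two fixed pieces. The Python sketch below illustrates that reading together with the link-stripping filter from the Source step. It is not DedeCMS code: `rule_to_regex` and `collect_field` are hypothetical names, and the sample markup is invented, since the real tags on the target page are not reproduced here.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Turn 'prefix[content]suffix' into a regex capturing the [content] part."""
    prefix, marker, suffix = rule.partition("[content]")
    if not marker:
        raise ValueError("rule must contain [content]")
    # Non-greedy capture stops at the first suffix; DOTALL lets it span lines.
    return re.compile(re.escape(prefix) + r"(.*?)" + re.escape(suffix), re.DOTALL)


def collect_field(html: str, rule: str, filters: tuple[str, ...] = ()) -> str:
    """Extract one field, then delete whatever each filter pattern matches."""
    match = rule_to_regex(rule).search(html)
    if match is None:
        return ""
    value = match.group(1)
    for pattern in filters:
        value = re.sub(pattern, "", value, flags=re.IGNORECASE)
    return value.strip()


# Invented sample markup standing in for the real article page.
PAGE = (
    "<h2>Regular Expression</h2>\n"
    '<span class="src">Source: <a href="http://example.com/">Example Site</a></span>'
)

title = collect_field(PAGE, "<h2>[content]</h2>")
source = collect_field(
    PAGE,
    '<span class="src">Source: [content]</span>',
    filters=(r"<a([^>]*)>", r"</a>"),  # strip link tags, keep the link text
)
print(title)
print(source)
```

This also shows why uniqueness matters when choosing tags: the rule takes the first place where the prefix and suffix match, so a prefix that occurs earlier in the page would capture the wrong text.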

6. Node collection

If you created the node in one go and the test succeeded, click the button as prompted to start collecting directly. If the node was written earlier, go to the "Node Management" page, tick the nodes to collect, and press the "Collect" button. To collect new content from all nodes, use the monitoring collection page.

For each collection run you can set the number of records collected per page. Do not set it too high, or the system may fail to process them all and some items will not be collected. A value of no more than 15 is recommended.

The number of threads is how many collection threads run at the same time. More threads speed up collection but also use more server resources, so increase the number with caution. If the target site has an anti-refresh limit, set the interval here to match that limit; otherwise leave it at the default of 0 seconds.

Additional options: the three settings here are self-explanatory, so choose them according to your actual needs.
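How the three settings interact can be sketched in Python. This is a rough model under assumptions, not DedeCMS's actual scheduler: it assumes the records are processed in batches of the per-page size, each batch is shared among the threads, and the interval is a pause between batches. `collect_in_batches` and `fake_fetch` are hypothetical names, and the fetch is simulated so that no network access is needed.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def collect_in_batches(urls: list[str], fetch: Callable[[str], str],
                       per_page: int = 15, threads: int = 1,
                       interval: float = 0.0) -> list[str]:
    """Fetch urls in batches of per_page, using `threads` workers per batch."""
    results: list[str] = []
    for start in range(0, len(urls), per_page):
        batch = urls[start:start + per_page]
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # map() returns results in the same order as the input batch.
            results.extend(pool.map(fetch, batch))
        # Pause between batches to respect an anti-refresh limit (0 = no pause).
        if interval > 0 and start + per_page < len(urls):
            time.sleep(interval)
    return results


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request."""
    return f"content of {url}"


demo_urls = [f"http://example.com/article/{n}.html" for n in range(1, 41)]
pages = collect_in_batches(demo_urls, fake_fetch, per_page=15, threads=3)
print(len(pages))
```

In this model a larger batch or more threads means more work in flight at once, which is the resource cost the text warns about.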

Collection completed.
