
How to Batch-Crawl and Save Toutiao Videos with Node (Code Implementation)

Sep 19, 2018 pm 05:02 PM
node.js download web crawler video

This article shows how to use Node to crawl Toutiao videos in batches and save them locally, with a code implementation. It is intended as a reference for readers facing a similar task.

Introduction

The usual routine for crawling videos or pictures in batches is to use a crawler to collect the file links and then save the files one by one with a method such as writeFile. However, Toutiao's video links cannot be found in the HTML returned by the server (the server-side rendered output). When the page is rendered on the client, code in certain JS files computes the link from the video's known key or hash value, using some algorithm or decryption step, and adds it to the video tag. This is one of the site's anti-crawling measures.

When browsing these pages, we can see the computed file address with the browser's Inspect Element tool. For batch downloads, however, obtaining the video links manually one by one is clearly impractical. Fortunately, puppeteer can drive a real Chrome instance, which lets us crawl the final page as rendered by the browser.

Project Start

Command
npm i
npm start

Note: Installing puppeteer is slow because it downloads a bundled Chromium build, so please be patient.

Configuration file
// Configuration
module.exports = {
  originPath: 'https://www.ixigua.com', // URL of the page to request
  savePath: 'D:/videoZZ' // directory where videos are saved
}
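
The download code writes files into savePath, so that directory has to exist before the first save. The following is a minimal sketch of one way to prepare it, using only Node's built-in modules. The helper names ensureSavePath and videoFilePath are hypothetical and not part of the original project.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: create the save directory (and any missing parents)
// if it does not exist yet, then return it unchanged.
function ensureSavePath (savePath) {
  fs.mkdirSync(savePath, { recursive: true })
  return savePath
}

// Hypothetical helper: build the full .mp4 path for a given video title.
function videoFilePath (savePath, title) {
  return path.join(savePath, `${title}.mp4`)
}
```

Calling ensureSavePath(config.savePath) once at startup should be enough, since the recursive option makes the call a no-op when the directory already exists.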

Technical points

puppeteer

Official API

puppeteer provides a high-level API to control Chrome or Chromium.

Main features of puppeteer:

  • Generate PDFs and screenshots of web pages

  • Crawl SPAs and generate pre-rendered content (i.e. server-side rendering, "SSR")

  • Capture content from websites

  • Automate form submission, UI testing, keyboard input, etc.

API used:

  • puppeteer.launch() Launches a browser instance

  • browser.newPage() Creates a new page

  • page.goto() Navigates to the specified URL

  • page.screenshot() Takes a screenshot

  • page.waitFor() Waits for a timeout, a selector, or a function, depending on the argument (deprecated in later Puppeteer releases in favor of more specific methods such as page.waitForSelector() and page.waitForFunction())

  • page.$eval() Runs document.querySelector in the page and passes the matched element to a callback

  • page.$$eval() Runs document.querySelectorAll in the page and passes the matched elements to a callback

  • page.$('#id .className') Gets the first element matching the selector, similar to document.querySelector or a jQuery selector
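
A minimal sketch of how the calls listed above might be combined to read video links from a rendered page. The function name collectVideoSources and the default selector 'video' are assumptions made for illustration; the real markup of the target site may need a different selector. The function takes the page as a parameter, so it does not launch a browser itself.

```javascript
// Sketch only: `page` is expected to be a Puppeteer Page created with
// browser.newPage(). The selector is a placeholder, not the site's real markup.
async function collectVideoSources (page, url, selector = 'video') {
  await page.goto(url)
  // Wait until client-side scripts have inserted at least one matching element
  await page.waitForSelector(selector)
  // The callback runs in the browser context and reads each element's src
  const sources = await page.$$eval(selector, els => els.map(el => el.src))
  // Drop elements whose src has not been filled in
  return sources.filter(Boolean)
}
```

Because the page object is passed in, the same function can be exercised with a stub in tests and with a real Puppeteer page when crawling.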

Code example

const puppeteer = require('puppeteer');
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
 
  await browser.close();
})();
Video file download method

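  • Main video download method

const fs = require('fs')
const config = require('./config') // assumed path to the configuration file shown earlier

const downloadVideo = async video => {
  // Skip videos that have already been downloaded
  if (!fs.existsSync(`${config.savePath}/${video.title}.mp4`)) {
    console.log('Downloading video:', video.title)
    const fileData = await getVideoData(video.src, 'binary')
    // Wait for the file to be written so callers know when the download is done
    const res = await savefileToPath(video.title, fileData)
    console.log(`${res}: ${video.title}`)
  } else {
    console.log(`Video file already exists: ${video.title}`)
  }
}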
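
The download method uses video.title directly as the file name, which can fail when a title contains characters such as / or ? that file systems reject. A minimal sketch of a sanitizing step follows. The name toSafeFileName and the 'untitled' fallback are hypothetical and not part of the original project.

```javascript
// Hypothetical helper: make a video title safe to use as a file name by
// replacing characters that Windows and most other file systems reject.
function toSafeFileName (title) {
  const cleaned = title.replace(/[\\\/:*?"<>|\x00-\x1f]/g, '_').trim()
  // Fall back to a fixed name if nothing usable is left
  return cleaned || 'untitled'
}
```

Titles that contain no reserved characters, including Chinese titles, pass through unchanged.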
  • Get video data

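const http = require('http')

// Fetch `url` and resolve with the response body as a string in `encoding`.
// Note: http.get only handles http:// URLs; use the https module for https:// links.
function getVideoData (url, encoding) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, function (res) {
      let result = ''
      encoding && res.setEncoding(encoding)
      res.on('data', function (d) {
        result += d
      })
      res.on('end', function () {
        resolve(result)
      })
      res.on('error', reject)
    })
    // Connection-level failures are emitted on the request, not the response.
    // http.get() ends the request automatically, so req.end() is not needed.
    req.on('error', reject)
  })
}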
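
Collecting the whole response in a string keeps the entire video in memory before anything is written. As an alternative to the approach used in the article, the response could be streamed straight to disk. The following is a sketch under that assumption, using only Node's built-in modules; downloadToFile is a hypothetical name and not part of the original project.

```javascript
const fs = require('fs')
const http = require('http')
const https = require('https')
const { pipeline } = require('stream')

// Sketch: stream the response body to `filePath` instead of buffering it.
function downloadToFile (url, filePath) {
  // Choose the client module that matches the URL scheme
  const client = url.startsWith('https:') ? https : http
  return new Promise((resolve, reject) => {
    client.get(url, res => {
      if (res.statusCode !== 200) {
        res.resume() // discard the body so the socket is released
        reject(new Error(`Unexpected status code: ${res.statusCode}`))
        return
      }
      // pipeline forwards errors from either stream and cleans up both
      pipeline(res, fs.createWriteStream(filePath), err => {
        if (err) reject(err)
        else resolve(filePath)
      })
    }).on('error', reject)
  })
}
```

The sketch does not follow redirects; a server that answers with a 3xx status would be reported as an error.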
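  • Save video data locally

// Write the video data to `${config.savePath}/${fileName}.mp4`.
// Relies on Node's fs module and the config object.
function savefileToPath (fileName, fileData) {
  const fileFullName = `${config.savePath}/${fileName}.mp4`
  return new Promise((resolve, reject) => {
    fs.writeFile(fileFullName, fileData, 'binary', function (err) {
      if (err) {
        console.log('savefileToPath error:', err)
        // Reject so a failed write is not reported as a successful download
        return reject(err)
      }
      resolve('Downloaded')
    })
  })
}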
Target website: Xigua Video (西瓜视频)
Project function: Download the latest 20 videos from the Toutiao account [Weichen Finance]
Project address: GitHub address
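
The "latest 20 videos" behavior described above could be sketched as a sequential loop over the crawled list. This is an illustration, not the project's actual code: downloadLatest is a hypothetical name, and the list is assumed to be ordered newest first with a title and src on each item.

```javascript
// Hypothetical helper: download at most `limit` videos one after another.
// `downloadOne` is any async function that accepts a { title, src } object.
async function downloadLatest (videos, downloadOne, limit = 20) {
  const failed = []
  for (const video of videos.slice(0, limit)) {
    try {
      // Wait for each file to finish before starting the next one
      await downloadOne(video)
    } catch (err) {
      // Record the failure and continue with the remaining videos
      failed.push({ title: video.title, error: err })
    }
  }
  return failed
}
```

Downloading one file at a time keeps memory use and request rate low. The returned array lists the videos that could not be saved, so they can be retried.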
