Detailed example of using Ruby and Nokogiri to build a simple crawler that exports an RSS feed

May 02, 2017 am 09:42 AM

# encoding: utf-8
require 'thread'
require 'nokogiri'
require 'open-uri'
require 'rss/maker'
 
$result=Queue.new
# Fetch a project's frameset page, follow its second frame to the README,
# and queue the first paragraphs unless they match the rails/activ_ filter.
# URI.open is used because Kernel#open no longer accepts URLs on Ruby 3.x.
def extract_readme_header(no,name,url)
  frame = Nokogiri::HTML(URI.open(url))
  return unless frame
  readme=$url+frame.css('frame')[1]['src']
  return unless readme
  URI.open(readme) do |f|
    doc = Nokogiri::HTML(f.read)
    text=doc.css("div#content div#filecontents p")[0..4].map { |c| c.content }.join(" ").strip
    return if text.length==0
    if text !~ /(rails)|(activ_)/i
      puts "========= #{no} #{name} : #{text[0..50]}"
      date = f.last_modified
      $result << [no,name,readme,date,text]
    end
  end
rescue
  puts $!.to_s
end
 
def make_rss(items)
  RSS::Maker.make("2.0") do |m|
    m.channel.title = "GitHub recently updated projects"
    m.channel.link = "http://localhost"
    m.channel.description = "GitHub recently updated projects"
    m.items.do_sort = true
    items.each do |no,name,url,date,descr|
      i = m.items.new_item
      i.title = name
      i.link = url
      i.description=descr
      i.date = date
    end
  end
end
 
############################## M A I N ########################
 
############# Scan the list of recent projects
 
lth=[]
$url="http://rdoc.info"
puts "get url #{$url}..."
doc = Nokogiri::HTML(URI.open($url))
doc.css('ul.libraries')[1].css('li').each_with_index do |li,i|
  aname =li.css('a').first
  name=aname.content
  purl=$url+aname['href']
  lth << Thread.new(i,name,purl) { |j,n,u| extract_readme_header(j,n,u)  }
end
 
################ wait until all READMEs have been read
 
lth.each { |th| th.join() }
 
################ dequeue results and sort them by their position in the list
 
result=[]
result << $result.shift while $result.size>0
result.sort!  { |a,b| a[0] <=> b[0] }
 
 
################ format results in rss
 
File.open("RubyFeeds.rss","w") do |file|
  file.write make_rss(result)
end
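
The concurrency pattern the script relies on (one thread per project, results pushed onto a shared Queue, then joined and drained into a sorted array) can be isolated as a minimal sketch. The names `fetch_summary` and `collect_in_parallel` are hypothetical and not part of the original script; `fetch_summary` is a stand-in for the network fetch so that the example runs offline.

```ruby
# Hypothetical stand-in for the README download in the script above.
def fetch_summary(index, name)
  "#{index}:#{name.upcase}"
end

# Start one thread per name, collect [index, summary] pairs through a
# thread-safe Queue, and return them ordered by index.
def collect_in_parallel(names)
  results = Queue.new
  threads = names.each_with_index.map do |name, i|
    # Pass i and name as thread arguments so each thread gets its own copy.
    Thread.new(i, name) { |j, n| results << [j, fetch_summary(j, n)] }
  end
  threads.each(&:join)                 # wait until every worker has finished
  collected = []
  collected << results.shift until results.empty?
  collected.sort_by(&:first)           # completion order is not guaranteed
end
```

Because threads may finish in any order, the final sort is what restores the original list order, which is the same role `result.sort!` plays in the script.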