


Detailed example: using Ruby and Nokogiri to build a simple crawler that exports an RSS feed
# encoding: utf-8
require 'thread'
require 'nokogiri'
require 'open-uri'
require 'rss/maker'

# Thread-safe queue shared by all worker threads.
$result = Queue.new

# Load a project's frameset page, follow its second frame to the README,
# and queue the first paragraphs unless the text matches /rails/ or /activ_/.
def extract_readme_header(no, name, url)
  # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
  frame = Nokogiri::HTML(URI.open(url))
  return unless frame
  readme = $url + frame.css('frame')[1]['src']
  return unless readme
  URI.open(readme) do |f|
    doc = Nokogiri::HTML(f.read)
    text = doc.css("div#content div#filecontents p")[0..4].map { |c| c.content }.join(" ").strip
    return if text.length == 0
    if text !~ /(rails)|(activ_)/i
      puts "========= #{no} #{name} : #{text[0..50]}"
      date = f.last_modified
      $result << [no, name, readme, date, text]
    end
  end
rescue
  # Report the error and carry on; one failed project must not stop the crawl.
  puts $!.to_s
end

# Build an RSS 2.0 feed from [no, name, url, date, description] rows.
def make_rss(items)
  RSS::Maker.make("2.0") do |m|
    m.channel.title = "GitHub recently updated projects"
    m.channel.link = "http://localhost"
    m.channel.description = "GitHub recently updated projects"
    m.items.do_sort = true   # let the maker order the items by date
    items.each do |no, name, url, date, descr|
      i = m.items.new_item
      i.title = name
      i.link = url
      i.description = descr
      i.date = date
    end
  end
end

############################## M A I N ########################

############# Scan list of recent projects
lth = []
$url = "http://rdoc.info"
puts "get url #{$url}..."
doc = Nokogiri::HTML(URI.open($url))
doc.css('ul.libraries')[1].css('li').each_with_index do |li, i|
  aname = li.css('a').first
  name = aname.content
  purl = $url + aname['href']
  # One thread per project; arguments are passed in to avoid sharing loop variables.
  lth << Thread.new(i, name, purl) { |j, n, u| extract_readme_header(j, n, u) }
end

################ wait until all READMEs have been read
lth.each { |th| th.join() }

################ dequeue results and sort them by their position in the list
result = []
result << $result.shift while $result.size > 0
result.sort! { |a, b| a[0] <=> b[0] }

################ format results as RSS
File.open("RubyFeeds.rss", "w") do |file|
  file.write make_rss(result)
end
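The script's concurrency pattern (one thread per project, results pushed onto a shared Queue, then joined, drained and sorted) can be tried on its own without any network access. The following is only a minimal sketch of that pattern: `fetch_summary`, `collect_in_parallel` and the sample project names are hypothetical stand-ins that do not appear in the original script, and the real work done by `extract_readme_header` is replaced by a string.

```ruby
# Hypothetical stand-in for extract_readme_header: no network access,
# it simply builds a fake summary for the given project name.
def fetch_summary(name)
  "README summary for #{name}"
end

# Start one thread per name, collect results through a thread-safe Queue,
# then drain the queue and restore the original list order.
def collect_in_parallel(names)
  queue = Queue.new
  threads = names.each_with_index.map do |name, index|
    # Pass values as thread arguments so each thread gets its own copies.
    Thread.new(index, name) do |i, n|
      queue << [i, n, fetch_summary(n)]
    end
  end
  threads.each(&:join) # wait until every worker has finished

  results = []
  results << queue.shift until queue.empty?
  # Threads may finish in any order, so sort by the original index.
  results.sort_by { |i, _name, _text| i }
end

rows = collect_in_parallel(%w[alpha beta gamma])
rows.each { |i, name, text| puts "#{i} #{name}: #{text}" }
```

Because the queue is drained only after every thread has been joined, a non-blocking check such as `queue.empty?` is safe here. Draining while workers are still running would need a different termination signal.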



