
Ubuntu 12.04 + Nutch 2.2.1 + MySQL Configuration Notes

Jun 07, 2016, 03:24 PM


Date: 2013/10/13

System: Ubuntu 12.04 LTS

JDK: 1.7.0_21

Nutch: 2.2.1

MySQL: 5.5.32

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

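Pre1: Install and configure the Oracle JDK

Pre2: Install and configure MySQL      sudo apt-get install mysql-server mysql-client

Pre3: Install and configure Apache Ant  sudo apt-get install ant

Start: Set up Nutch 2.2.1 on Ubuntu with MySQL as the data store and UTF-8 as the default encoding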

     

Step1: MySQL configuration

First, edit /etc/mysql/my.cnf and add the following under [mysqld]:

innodb_file_format=barracuda
innodb_file_per_table=true
innodb_large_prefix=true
character-set-server=utf8
# the collation must belong to the character set declared above
collation-server=utf8_unicode_ci
max_allowed_packet=500M
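MySQL refuses a server collation that does not belong to the configured character set, so it can help to sanity-check the [mysqld] fragment before restarting the server. The following is a minimal Python sketch and not part of the original notes: the fragment text and the helper name check_mysqld_section are illustrative assumptions, and the check relies on the convention that MySQL collation names begin with the name of their character set.

```python
import configparser

# Hypothetical fragment mirroring the [mysqld] settings discussed above.
MY_CNF_FRAGMENT = """[mysqld]
innodb_file_format=barracuda
innodb_file_per_table=true
innodb_large_prefix=true
character-set-server=utf8
collation-server=utf8_unicode_ci
max_allowed_packet=500M
"""


def check_mysqld_section(text):
    """Return a list of problems found in the [mysqld] section of `text`."""
    # allow_no_value lets bare flags (keys without "=value") parse cleanly.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    mysqld = parser["mysqld"]

    problems = []
    charset = mysqld.get("character-set-server")
    collation = mysqld.get("collation-server")
    # Collation names start with "<charset>_", e.g. utf8_unicode_ci for utf8.
    if charset and collation and not collation.startswith(charset + "_"):
        problems.append(
            f"collation {collation} does not belong to character set {charset}"
        )
    large_prefix = (mysqld.get("innodb_large_prefix") or "").lower()
    if large_prefix not in ("true", "1", "on"):
        problems.append("innodb_large_prefix is not enabled")
    return problems


if __name__ == "__main__":
    for problem in check_mysqld_section(MY_CNF_FRAGMENT):
        print("WARNING:", problem)
```

The sketch only inspects a text fragment. A real my.cnf may contain directives such as !includedir that configparser cannot read, so those lines would need to be filtered out first.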

Then create the database and the table:

CREATE DATABASE nutch DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;
USE nutch;
CREATE TABLE `webpage` (
`id` varchar(767) NOT NULL,
`headers` blob,
`text` mediumtext DEFAULT NULL,
`status` int(11) DEFAULT NULL,
`markers` blob,
`parseStatus` blob,
`modifiedTime` bigint(20) DEFAULT NULL,
`score` float DEFAULT NULL,
`typ` varchar(32) CHARACTER SET latin1 DEFAULT NULL,
`batchId` varchar(32) CHARACTER SET latin1 DEFAULT NULL, 
`baseUrl` varchar(767) DEFAULT NULL,
`content` longblob,
`title` varchar(2048) DEFAULT NULL,
`reprUrl` varchar(767) DEFAULT NULL,
`fetchInterval` int(11) DEFAULT NULL,
`prevFetchTime` bigint(20) DEFAULT NULL,
`inlinks` mediumblob,
`prevSignature` blob,
`outlinks` mediumblob,
`fetchTime` bigint(20) DEFAULT NULL,
`retriesSinceFetch` int(11) DEFAULT NULL,
`protocolStatus` blob,
`signature` blob,
`metadata` blob,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
ROW_FORMAT=COMPRESSED
DEFAULT CHARSET=utf8;
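
Note: the columns in this table follow Nutch's conf file gora-sql-mapping.xml. The database and table can also be generated automatically: once gora-sql-mapping.xml, gora.properties and the other files are configured, the first run of "bin/nutch inject urls" creates them. Automatic generation may fail. If it does, check hadoop.log promptly; many of the errors (the original post showed a screenshot of one) relate to the data types and data lengths that MySQL supports. Adjust and debug according to the log messages (a GUI tool such as Navicat makes the database as convenient to work with as SQL Server), then repeat the automatic generation until it succeeds.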


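Step2: Nutch configuration

Download Nutch 2.2.1 from the official site, http://www.apache.org/dyn/closer.cgi/nutch/, and extract it to a local installation directory, referred to below as ${APACHE_NUTCH_HOME}.

Enable MySQL support in Nutch by editing ${APACHE_NUTCH_HOME}/ivy/ivy.xml.

Uncomment the following lines: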

<dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default"/>
<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating" conf="*->default"/>

Change the following line:

<dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>

to:

<dependency org="org.apache.gora" name="gora-core" rev="0.2.1" conf="*->default"/>

Step3: Database connection configuration

Edit ${APACHE_NUTCH_HOME}/conf/gora.properties: comment out the default data store connection settings and add the following:

###############################
#  MySQL configure   #
###############################
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
# replace xxxx with your MySQL user name and password
gora.sqlstore.jdbc.user=xxxx
gora.sqlstore.jdbc.password=xxxx
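To make the parts of the connection settings explicit, here is a small Python sketch (an illustration, not something the original notes include) that reads the properties text and splits the JDBC URL into host, port, database name and driver options. The helper names parse_properties and describe_jdbc_url are hypothetical, and the xxxx credentials are the same placeholders used above.

```python
from urllib.parse import parse_qs, urlsplit

# Sample text mirroring the gora.properties entries shown above.
GORA_PROPERTIES = """
# MySQL configure
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=xxxx
gora.sqlstore.jdbc.password=xxxx
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only; the URL value contains another one.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def describe_jdbc_url(jdbc_url):
    """Break a jdbc:mysql:// URL into its components."""
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


if __name__ == "__main__":
    props = parse_properties(GORA_PROPERTIES)
    print(describe_jdbc_url(props["gora.sqlstore.jdbc.url"]))
```

In this URL, createDatabaseIfNotExist=true is the Connector/J option that lets the driver create the nutch database on first connection if it does not exist yet.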

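Step4: Table mapping configuration

Edit ${APACHE_NUTCH_HOME}/conf/gora-sql-mapping.xml. I recommend making the changes here by following the automatic table generation approach described earlier. Guides online say to change the primarykey length from 512 to 767, that is:

Change: <primarykey column="id" length="512"/>  To: <primarykey column="id" length="767"/>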
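The key length interacts with InnoDB's index size limits. In MySQL 5.5 an index key prefix is limited to 767 bytes by default, and to 3072 bytes when innodb_large_prefix is enabled and the table uses the DYNAMIC or COMPRESSED row format. The Python sketch below works through the arithmetic for a varchar(767) key under different character sets. It illustrates the limits only and does not claim to explain why the guides chose 767; the function names are hypothetical.

```python
# Maximum bytes per character for the character sets used in this setup.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

# InnoDB index key prefix limits in bytes (MySQL 5.5).
DEFAULT_PREFIX_LIMIT = 767   # without innodb_large_prefix
LARGE_PREFIX_LIMIT = 3072    # innodb_large_prefix + DYNAMIC/COMPRESSED rows


def index_key_bytes(length, charset):
    """Worst-case byte size of an indexed varchar(length) column."""
    return length * MAX_BYTES_PER_CHAR[charset]


def fits(length, charset, large_prefix):
    """True if the whole column can be indexed under the applicable limit."""
    limit = LARGE_PREFIX_LIMIT if large_prefix else DEFAULT_PREFIX_LIMIT
    return index_key_bytes(length, charset) <= limit


if __name__ == "__main__":
    for charset in ("latin1", "utf8", "utf8mb4"):
        size = index_key_bytes(767, charset)
        print(
            charset,
            size,
            "default:", fits(767, charset, large_prefix=False),
            "large_prefix:", fits(767, charset, large_prefix=True),
        )
```

Under these assumptions, a varchar(767) primary key in utf8 needs up to 2301 bytes, which exceeds the default limit but fits once innodb_large_prefix and ROW_FORMAT=COMPRESSED are in effect, as in the table definition from Step1.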

Step5: nutch-site.xml configuration

Add the following properties:

<property>
	<name>http.agent.name</name>
	<value>Your Nutch Spider</value>
</property>
<property>
	<name>http.accept.language</name>
	<value>zh-cn, en-us,en-gb,en;q=0.7,*;q=0.3</value>
	<description>*</description>
</property>
<property>
	<name>parser.character.encoding.default</name>
	<value>utf-8</value>
	<description>*</description>
</property>
<property>
	<name>storage.data.store.class</name>
	<value>org.apache.gora.sql.store.SqlStore</value>
	<description>*</description>
</property>

Pay special attention here. During configuration I also ran into the following exception:
java.lang.NullPointerException 
at org.apache.avro.util.Utf8.<init>(Utf8.java:37) 
at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100) 
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174) 
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) 
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418) 
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

The fix is to add one more property to the same file (nutch-site.xml):
<property>

    <name>generate.batch.id</name>

    <value>*</value>

</property>
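Because a missing generate.batch.id only shows up later as a NullPointerException, it can be useful to check nutch-site.xml for the expected properties before building. The following Python sketch is an illustration under stated assumptions: the sample XML mirrors the properties listed in this step, the list of required names is my own selection, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample content mirroring the properties added in this step.
NUTCH_SITE_XML = """<configuration>
  <property>
    <name>http.agent.name</name>
    <value>Your Nutch Spider</value>
  </property>
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.sql.store.SqlStore</value>
  </property>
  <property>
    <name>generate.batch.id</name>
    <value>*</value>
  </property>
</configuration>
"""

# Assumed minimum set of properties for this MySQL-backed setup.
REQUIRED = ("http.agent.name", "storage.data.store.class", "generate.batch.id")


def read_properties(xml_text):
    """Map each <property> name to its value."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def missing_properties(xml_text, required=REQUIRED):
    """Return the required property names that are absent or empty."""
    props = read_properties(xml_text)
    return [name for name in required if not props.get(name)]


if __name__ == "__main__":
    missing = missing_properties(NUTCH_SITE_XML)
    print("missing:", missing if missing else "none")
```

To check a real installation, the file content could be read from ${APACHE_NUTCH_HOME}/conf/nutch-site.xml and passed to missing_properties instead of the sample string.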
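
Step6: Build Nutch with Ant

(Ant commands are not explained here.) Change to ${APACHE_NUTCH_HOME} and run ant clean, then ant. When the build finishes, a runtime folder is created under ${APACHE_NUTCH_HOME}.

Step7: Seed configuration and crawling

Create the seed file: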

cd ${APACHE_NUTCH_HOME}/runtime/local
mkdir -p urls
echo 'http://www.sina.com.cn' > urls/seed.txt
# use >> so the second seed is appended instead of overwriting the first
echo 'http://www.ifeng.com' >> urls/seed.txt

Run the crawl:

bin/nutch crawl urls -depth 5 -topN 10

This completes the basic configuration.





