Alex的Hadoop菜鸟教程:第9课Sqoop1从Hbase或者Hive导出mysql_MySQL
今天讲讲怎么用sqoop将Hbase或者Hive的东西导出到mysql。不过事先要告诉大家
目前sqoop没有办法把数据直接从Hbase导出到mysql。必须要通过Hive建立2个表,一个外部表是基于这个Hbase表的,另一个是单纯的基于hdfs的hive原生表,然后把外部表的数据导入到原生表(临时),然后通过hive将临时表里面的数据导出到mysql
数据准备
mysql建立空表
CREATE TABLE `employee` ( `rowkey` int(11) NOT NULL, `id` int(11) NOT NULL, `name` varchar(20) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
注意:因为大家习惯性的把hive表用于映射Hbase的rowkey的字段命名为key,所以在建立mysql的table的时候有可能也建立对应的key字段,但是key是mysql的保留字,会导致insert语句无法插入的问题
Hbase建立employee表
建立employee表,并插入数据hbase(main):005:0> create 'employee','info' 0 row(s) in 0.4740 seconds => Hbase::Table - employee hbase(main):006:0> put 'employee',1,'info:id',1 0 row(s) in 0.2080 seconds hbase(main):008:0> scan 'employee' ROW COLUMN+CELL 1 column=info:id, timestamp=1417591291730, value=1 1 row(s) in 0.0610 seconds hbase(main):009:0> put 'employee',1,'info:name','peter' 0 row(s) in 0.0220 seconds hbase(main):010:0> scan 'employee' ROW COLUMN+CELL 1 column=info:id, timestamp=1417591291730, value=1 1 column=info:name, timestamp=1417591321072, value=peter 1 row(s) in 0.0450 seconds hbase(main):011:0> put 'employee',2,'info:id',2 0 row(s) in 0.0370 seconds hbase(main):012:0> put 'employee',2,'info:name','paul' 0 row(s) in 0.0180 seconds hbase(main):013:0> scan 'employee' ROW COLUMN+CELL 1 column=info:id, timestamp=1417591291730, value=1 1 column=info:name, timestamp=1417591321072, value=peter 2 column=info:id, timestamp=1417591500179, value=2 2 column=info:name, timestamp=1417591512075, value=paul 2 row(s) in 0.0440 seconds
建立Hive外部表
hive 有分为原生表和外部表,原生表是以简单文件方式存储在hdfs里面,外部表依赖别的框架,比如Hbase,我们现在建立一个依赖于我们刚刚建立的employee hbase表的hive 外部表hive> CREATE EXTERNAL TABLE h_employee(key int, id int, name string) > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, info:id,info:name") > TBLPROPERTIES ("hbase.table.name" = "employee"); OK Time taken: 0.324 seconds hive> select * from h_employee; OK 1 1 peter 2 2 paul Time taken: 1.129 seconds, Fetched: 2 row(s)
建立Hive原生表
这个hive原生表只是用于导出的时候临时使用的,所以取名叫 h_employee_export,字段之间的分隔符用逗号CREATE TABLE h_employee_export(key INT, id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054';
我们去看下实际存储的文本文件是什么样子的
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_0 1,1,peter 2,2,paul
源Hive表导入数据到临时表
第一步先将数据从 h_employee(基于Hbase的外部表)导入到 h_employee_export(原生Hive表)
hive> insert overwrite table h_employee_export select * from h_employee;
hive> select * from h_employee_export; OK 1 1 peter 2 2 paul Time taken: 0.359 seconds, Fetched: 2 row(s)
我们去看下实际存储的文本文件长什么样子
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_0 1,1,peter 2,2,paul
从Hive导出数据到mysql
$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/hive/warehouse/h_employee_export/ Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 14/12/05 08:49:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1 14/12/05 08:49:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 14/12/05 08:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 14/12/05 08:49:35 INFO tool.CodeGenTool: Beginning code generation 14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1 14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1 14/12/05 08:49:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce Note: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 14/12/05 08:49:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.jar 14/12/05 08:49:39 INFO mapreduce.ExportJobBase: Beginning export of employee 14/12/05 08:49:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 14/12/05 08:49:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 14/12/05 08:49:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.111.78.111:8032 14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1 14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1 14/12/05 08:49:45 INFO mapreduce.JobSubmitter: number of splits:1 14/12/05 08:49:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406097234796_0037 14/12/05 08:49:46 INFO impl.YarnClientImpl: Submitted application application_1406097234796_0037 14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/ 14/12/05 08:49:46 INFO mapreduce.Job: Running job: job_1406097234796_0037 14/12/05 08:49:59 INFO mapreduce.Job: Job job_1406097234796_0037 running in uber mode : false 14/12/05 08:49:59 INFO mapreduce.Job: map 0% reduce 0% 14/12/05 08:50:10 INFO mapreduce.Job: map 100% reduce 0% 14/12/05 08:50:10 INFO mapreduce.Job: Job job_1406097234796_0037 completed successfully 14/12/05 08:50:10 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=99761 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=166 HDFS: Number of bytes written=0 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=0 Job Counters Launched map tasks=1 Data-local map tasks=1 Total time spent by all maps in occupied slots (ms)=8805 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=8805 Total vcore-seconds taken by all map tasks=8805 Total megabyte-seconds taken by all map tasks=9016320 Map-Reduce Framework Map input records=2 Map output records=2 Input split bytes=144 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=97 CPU time spent (ms)=1360 Physical memory (bytes) snapshot=167555072 Virtual memory (bytes) snapshot=684212224 Total committed heap usage (bytes)=148897792 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=0 14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Transferred 166 bytes in 27.0676 seconds (6.1328 bytes/sec) 14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Exported 2 records.
注意
在这段日志中有这样一句话
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
意思是你可以用浏览器访问这个地址去看下任务的执行情况,如果你的任务长时间卡主没结束就是出错了,可以去这个地址查看详细的错误日志
查看结果
mysql> select * from employee; +--------+----+-------+ | rowkey | id | name | +--------+----+-------+ | 1 | 1 | peter | | 2 | 2 | paul | +--------+----+-------+ 2 rows in set (0.00 sec) mysql>
导入成功

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics





Full table scanning may be faster in MySQL than using indexes. Specific cases include: 1) the data volume is small; 2) when the query returns a large amount of data; 3) when the index column is not highly selective; 4) when the complex query. By analyzing query plans, optimizing indexes, avoiding over-index and regularly maintaining tables, you can make the best choices in practical applications.

InnoDB's full-text search capabilities are very powerful, which can significantly improve database query efficiency and ability to process large amounts of text data. 1) InnoDB implements full-text search through inverted indexing, supporting basic and advanced search queries. 2) Use MATCH and AGAINST keywords to search, support Boolean mode and phrase search. 3) Optimization methods include using word segmentation technology, periodic rebuilding of indexes and adjusting cache size to improve performance and accuracy.

Yes, MySQL can be installed on Windows 7, and although Microsoft has stopped supporting Windows 7, MySQL is still compatible with it. However, the following points should be noted during the installation process: Download the MySQL installer for Windows. Select the appropriate version of MySQL (community or enterprise). Select the appropriate installation directory and character set during the installation process. Set the root user password and keep it properly. Connect to the database for testing. Note the compatibility and security issues on Windows 7, and it is recommended to upgrade to a supported operating system.

MySQL is an open source relational database management system. 1) Create database and tables: Use the CREATEDATABASE and CREATETABLE commands. 2) Basic operations: INSERT, UPDATE, DELETE and SELECT. 3) Advanced operations: JOIN, subquery and transaction processing. 4) Debugging skills: Check syntax, data type and permissions. 5) Optimization suggestions: Use indexes, avoid SELECT* and use transactions.

The difference between clustered index and non-clustered index is: 1. Clustered index stores data rows in the index structure, which is suitable for querying by primary key and range. 2. The non-clustered index stores index key values and pointers to data rows, and is suitable for non-primary key column queries.

MySQL and MariaDB can coexist, but need to be configured with caution. The key is to allocate different port numbers and data directories to each database, and adjust parameters such as memory allocation and cache size. Connection pooling, application configuration, and version differences also need to be considered and need to be carefully tested and planned to avoid pitfalls. Running two databases simultaneously can cause performance problems in situations where resources are limited.

In MySQL database, the relationship between the user and the database is defined by permissions and tables. The user has a username and password to access the database. Permissions are granted through the GRANT command, while the table is created by the CREATE TABLE command. To establish a relationship between a user and a database, you need to create a database, create a user, and then grant permissions.

MySQL supports four index types: B-Tree, Hash, Full-text, and Spatial. 1.B-Tree index is suitable for equal value search, range query and sorting. 2. Hash index is suitable for equal value searches, but does not support range query and sorting. 3. Full-text index is used for full-text search and is suitable for processing large amounts of text data. 4. Spatial index is used for geospatial data query and is suitable for GIS applications.
