--table: name of the target table to export into
--call: name of the stored procedure to invoke when exporting data
--update-key: column(s) used as the update key; separate multiple columns with commas
--update-mode: update strategy, either updateonly (the default) or allowinsert
--input-null-string: the string in the input data to be interpreted as NULL for string columns
--input-null-non-string: the string in the input data to be interpreted as NULL for non-string columns
--staging-table: name of the table in which data is staged before being pushed into the target table
--clear-staging-table: clear any data left over in the staging table
--batch: export in batch mode
A short sketch of how the update and staging options are typically combined is shown below.
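The following is a minimal sketch only, not part of the original walkthrough: the connection URL, credentials, table names, and export directory are placeholders. Note that Sqoop does not allow --staging-table to be combined with --update-key, so the two usages are shown separately.

# Sketch 1: update rows whose id matches an existing row and insert the rest
# (allowinsert requires connector support, e.g. the MySQL connector).
bin/sqoop export \
  --connect jdbc:mysql://dbhost:3306/some_db \
  --username someuser -P \
  --table target_table \
  --export-dir /hive/target_table \
  --update-key id \
  --update-mode allowinsert \
  --input-fields-terminated-by '\001'

# Sketch 2: stage rows in a helper table first so a failed export leaves target_table
# untouched; the staging table must already exist with the same schema as target_table.
bin/sqoop export \
  --connect jdbc:mysql://dbhost:3306/some_db \
  --username someuser -P \
  --table target_table \
  --export-dir /hive/target_table \
  --staging-table target_table_staging \
  --clear-staging-table \
  --input-fields-terminated-by '\001'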
Below, we use a concrete example to show how these options are used in practice. Specifically, we walk through loading data from Hive into a MySQL database. First, prepare a few tables: the MySQL database is tag_db and contains the two tables defined as follows:
CREATE TABLE tag_db.users (
  id INT(11) NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE tag_db.tags (
  id INT(11) NOT NULL AUTO_INCREMENT,
  user_id INT NOT NULL,
  tag VARCHAR(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
These two tables hold the base data, and they correspond to the following two tables in Hive:
CREATE TABLE users (
  id INT,
  name STRING
);

CREATE TABLE tags (
  id INT,
  user_id INT,
  tag STRING
);
First, insert some test data into the two MySQL tables:
INSERT INTO tag_db.users(name) VALUES('jeffery');
INSERT INTO tag_db.users(name) VALUES('shirdrn');
INSERT INTO tag_db.users(name) VALUES('sulee');

INSERT INTO tag_db.tags(user_id, tag) VALUES(1, 'Music');
INSERT INTO tag_db.tags(user_id, tag) VALUES(1, 'Programming');
INSERT INTO tag_db.tags(user_id, tag) VALUES(2, 'Travel');
INSERT INTO tag_db.tags(user_id, tag) VALUES(3, 'Sport');
Then use Sqoop's import tool to import the data from the two MySQL tables into Hive, by running the following commands:
bin/sqoop import --connect jdbc:mysql://10.95.3.49:3306/tag_db --table users --username shirdrn -P --hive-import -- --default-character-set=utf-8

bin/sqoop import --connect jdbc:mysql://10.95.3.49:3306/tag_db --table tags --username shirdrn -P --hive-import -- --default-character-set=utf-8
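As a quick sanity check (an added suggestion, not part of the original walkthrough), the imported row counts in Hive can be compared with the MySQL source tables, assuming the hive CLI is on the PATH and the tables landed in the default database:

# Both counts should match the corresponding MySQL tables (3 users, 4 tags here).
hive -e "SELECT COUNT(*) FROM users; SELECT COUNT(*) FROM tags;"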
After the import succeeds, create a table in Hive to hold the joined users and tags data:
CREATE TABLE user_tags (
  id STRING,
  name STRING,
  tag STRING
);
Run the following HQL statement to insert the joined data into the user_tags table:
FROM users u JOIN tags t ON u.id=t.user_id INSERT INTO TABLE user_tags SELECT CONCAT(CAST(u.id AS STRING), CAST(t.id AS STRING)), u.name, t.tag;
The string obtained by concatenating users.id and tags.id serves as the unique id column of the new table; name is the user name and tag is the tag name.
Next, create a corresponding user_tags table in MySQL:
CREATE TABLE tag_db.user_tags (
  id varchar(200) NOT NULL,
  name varchar(100) NOT NULL,
  tag varchar(100) NOT NULL
);
Use Sqoop's export tool to synchronize the data in the Hive table user_tags into the MySQL table tag_db.user_tags, by running the following command:
bin/sqoop export --connect jdbc:mysql://10.95.3.49:3306/tag_db --username shirdrn -P --table user_tags --export-dir /hive/user_tags --input-fields-terminated-by '\001' -- --default-character-set=utf-8
After the export completes successfully, the corresponding data can be seen in the MySQL table tag_db.user_tags.
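For example, a quick spot check from the MySQL client (the host and user are the same as in the commands above; the client prompts for the password):

# List a few exported rows to confirm the data arrived intact.
mysql -h 10.95.3.49 -u shirdrn -p -e "SELECT * FROM tag_db.user_tags LIMIT 10;"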
If an error like the following appears during the export:
14/02/27 17:59:06 INFO mapred.JobClient: Task Id : attempt_201402260008_0057_m_000001_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:7)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:3)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:350)
    at user_tags.__loadFromFields(user_tags.java:225)
    at user_tags.parse(user_tags.java:174)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
    ... 10 more
Specify the field delimiter used between the columns of the Hive table with the --input-fields-terminated-by option, so that Sqoop can read and parse the records; the error then goes away.
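If it is unclear which delimiter a Hive table actually uses, it can be checked before running the export; a small sketch (the exact output depends on how the table was created; a table built with the default SerDe reports serialization.format 1, i.e. ^A / \001):

# Print the table's storage parameters and pick out the delimiter-related entries.
hive -e "DESCRIBE FORMATTED user_tags;" | grep -iE "field.delim|serialization.format"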
Appendix: reference import and export commands found on the web:
Sqoop Import Examples:
Sqoop Import: import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS) or its subprojects (Hive, HBase).
Import the data (MySQL table) to HBase:
Case 1: The table has a primary key; import all the columns of the MySQL table into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-create-table
Case 2: The table has a primary key; import only a few columns of the MySQL table into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table
Note: the column names specified in the --columns attribute must include the primary key column.
Case 3: The table doesn't have a primary key, so choose one column as the hbase-row-key; import all the columns of the MySQL table into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 --hbase-create-table
Case 4: The table doesn't have a primary key, so choose one column as the hbase-row-key; import only a few columns of the MySQL table into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table
Note: the column name specified in the --hbase-row-key attribute must be in the --columns list; otherwise the command executes successfully but no records are inserted into HBase.
Note: the value of the primary key column, or of the column specified in the --hbase-row-key attribute, becomes the HBase row key. If the MySQL table doesn't have a primary key, or the column specified in --hbase-row-key doesn't have unique values, then a few records are lost.
Example: consider a MySQL table test_table with two columns, name and address. test_table has no primary key or unique key column.
Records of test_table:
name   address
-----  -------
abc    123
sqw    345
abc    125
sdf    1234
aql    23dw
Run the following command to import the test_table data into HBase:
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table test_table --hbase-table hbase_test_table --column-family test_table_col1 --hbase-row-key name --hbase-create-table
Only 4 records are visible in the HBase table instead of 5. In the example above, two rows have the same value 'abc' in the name column, and the value of this column is used as the HBase row key. When the first record with name 'abc' arrives it is inserted into the HBase table; when another record with the same name 'abc' arrives later, it overwrites the previously inserted row.
The same problem occurs if the table has a composite primary key, because only one column from the composite key is used as the HBase row key.
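One way to work around the lost-record problem is to build a unique row key yourself with a free-form query; the sketch below is an illustrative assumption rather than part of the original examples (the row_key alias, the CONCAT expression, and -m 1 are choices made here, and duplicates can still be lost if the combined columns are not unique).
# Build a synthetic row key from several columns; $CONDITIONS must appear in a
# free-form query, and -m 1 avoids the need for --split-by on a keyless table.
$ bin/sqoop import \
    --connect jdbc:mysql://localhost/db1 --username root --password root \
    --query 'SELECT CONCAT(name, "_", address) AS row_key, name, address FROM test_table WHERE $CONDITIONS' \
    --hbase-table hbase_test_table --column-family test_table_col1 \
    --hbase-row-key row_key --hbase-create-table -m 1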
Import the data (MySQL table) to Hive
Case 1: Import a MySQL table into Hive when the table has a primary key.
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home
Case 2: Import a MySQL table into Hive when the table doesn't have a primary key.
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home --split-by column_name
or
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home -m 1
Import the data (MySQL table) to HDFS
Case 1: Import a MySQL table into HDFS when the table has a primary key.
$ bin/sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /user/ankit/tableName
Case 2: Import a MySQL table into HDFS when the table doesn't have a primary key.
$ bin/sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /user/ankit/tableName -m 1
Sqoop Export Examples:
Sqoop Export: export HDFS data (and data from its subprojects, Hive and HBase) back into an RDBMS.
Export Hive table back to an RDBMS:
By default, Hive stores data using ^A as the field delimiter and \n as the row delimiter.
$ bin/sqoop export --connect jdbc:mysql://localhost/test_db --table tableName --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'
where '\001' is the octal representation of ^A.
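To see the delimiter in the raw files before exporting, the bytes can be inspected directly; a small sketch, assuming the default warehouse path used in the command above:
# In the od -c output the field delimiter shows up as 001 (^A) and each row ends with \n.
$ hdfs dfs -cat /user/hive/warehouse/tableName/* | head -n 2 | od -c | head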