Kettle连接Hive 中文乱码问题解决方案
刚开始接触Pentaho的 kettle desktop版本。我们这里主要应用其与hadoop及 hive 的关联进行数据处理。 kettle 的版本是4.4,使用的过程还是挺顺利的,顺利的建立好了一个转换任务,将 hive 中的数据提取到了本地文件。但是打开一看,所以 utf8 的 中文 全都是
刚开始接触Pentaho的kettle desktop版本。我们这里主要应用其与hadoop及hive的关联进行数据处理。kettle的版本是4.4,使用的过程还是挺顺利的,顺利的建立好了一个转换任务,将hive中的数据提取到了本地文件。但是打开一看,所以utf8的中文全都是乱码。而且kettle现在只支持到了hive0.7版本,还没支持到0.8,所以无法正确提取hive的meta信息,但是不影响HQL的正常运行。
只能先看看kettle是如何使用hive的jdbc连接的。我先将hive-jdbc.0.8.1.ar拷贝到{kettlehome}/libext/JDBC下,直接造成无法正常连接hive。
在该目录下存在jar文件hive-jdbc-0.7.0-pentaho-1.0.2.jar,这个类是一个适配类,不真正实现hive的jdbc连接。
而是通过反射的方式,找到classpath下的hivejdbc类,即存在于{kettlehome}\plugins\pentaho-big-data-plugin\hadoop-configurations\hadoop-20\lib\hive-jdbc-0.7.0-pentaho-1.0.2.jar这个jar文件,该文件用于真实的调用hive。
我们就来看一下这个jar中的实现。可以先从以下url中获取source文件。http://repo.pentaho.org/artifactory/repo/org/apache/hive/hive-jdbc/0.7.0-pentaho-1.0.2/hive-jdbc-0.7.0-pentaho-1.0.2-sources.jar 下载解压后,倒入到你自己的一个新建java工程中,并引入相关的类库,可以使之正常编译。
StructObjectInspector soi = (StructObjectInspector) serde.getObjectInspector();List fieldRefs = soi.getAllStructFieldRefs();//Object data = serde.deserialize(new BytesWritable(rowStr.getBytes()));//我们将该行屏蔽Object data = serde.deserialize(new BytesWritable(rowStr.getBytes("UTF-8")));//使用本行
然后将编译后的class文件加入到hive-jdbc-0.7.0-pentaho-1.0.2.jar
重新启动kettle。
然后再跑一下流程,正常了。当然,如果你的系统环境本身编码就是utf8的,应该不会出现这样的问题。
原文地址:Kettle连接Hive 中文乱码问题解决方案, 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Call of Duty Warzone is a newly launched mobile game. Many players are very curious about how to set the language of this game to Chinese. In fact, it is very simple. Players only need to download the Chinese language pack, and then You can modify it after using it. The detailed content can be learned in this Chinese setting method introduction. Let us take a look together. How to set the Chinese language for the mobile game Call of Duty: Warzone 1. First enter the game and click the settings icon in the upper right corner of the interface. 2. In the menu bar that appears, find the [Download] option and click it. 3. Select [SIMPLIFIEDCHINESE] (Simplified Chinese) on this page to download the Simplified Chinese installation package. 4. Return to the settings

VSCode Setup in Chinese: A Complete Guide In software development, Visual Studio Code (VSCode for short) is a commonly used integrated development environment. For developers who use Chinese, setting VSCode to the Chinese interface can improve work efficiency. This article will provide you with a complete guide, detailing how to set VSCode to a Chinese interface and providing specific code examples. Step 1: Download and install the language pack. After opening VSCode, click on the left

Common challenges faced by machine learning algorithms in C++ include memory management, multi-threading, performance optimization, and maintainability. Solutions include using smart pointers, modern threading libraries, SIMD instructions and third-party libraries, as well as following coding style guidelines and using automation tools. Practical cases show how to use the Eigen library to implement linear regression algorithms, effectively manage memory and use high-performance matrix operations.

Tips for solving Chinese garbled characters written by PHP into txt files. With the rapid development of the Internet, PHP, as a widely used programming language, is used by more and more developers. In PHP development, it is often necessary to read and write text files, including txt files that write Chinese content. However, due to encoding format problems, sometimes the written Chinese will appear garbled. This article will introduce some techniques to solve the problem of Chinese garbled characters written into txt files by PHP, and provide specific code examples. Problem analysis in PHP, text

1. Place the earphones in the earphone box and keep the lid open. Press and hold the button on the box to enter the pairing state of the earphones. 2. Turn on the watch music function and select Bluetooth headphones, or select Bluetooth headphones in the watch settings function. 3. Select the headset on the watch to pair successfully.

How to connect Bluetooth controller to Gohan Arcade? Gohan Game Center is a game box used by many mobile game players. It contains a large number of popular game resources and rich game-related functions. Below, the editor will introduce the game controller connection method. Players, please take a look. 1. First go to the homepage of Gohan Game Center APP, and then click the "My" option in the lower right corner of the homepage; 2. Find the [Controller] function in the My page, the location is shown in the picture below, and click to go to settings; 3. Select to turn it on For the Bluetooth function of the mobile phone, confirm that the power of the controller is on; 4. Finally, follow the instructions of the controller to make a matching connection. If the connection is successful, you can use the mobile game to experience various games.

How to deal with the problem of garbled characters in the Linux terminal. When using the Linux system, sometimes the text displayed in the terminal will be garbled. This brings inconvenience to us when using the terminal and needs to be dealt with in time. This article will introduce how to deal with some common Linux terminal garbled problems, and provide specific code examples. Problem 1: Garbled Chinese characters on the terminal. Garbled Chinese characters on the terminal are usually caused by incorrect character encoding settings on the terminal. We can solve this problem by modifying the terminal's character encoding settings. #View the current terminal

Analysis of Java framework security vulnerabilities shows that XSS, SQL injection and SSRF are common vulnerabilities. Solutions include: using security framework versions, input validation, output encoding, preventing SQL injection, using CSRF protection, disabling unnecessary features, setting security headers. In actual cases, the ApacheStruts2OGNL injection vulnerability can be solved by updating the framework version and using the OGNL expression checking tool.
