配置apache HIVE元数据DB为PostgreSQL
本文出处:http://amutu.com/blog/2013/06/hive-metastore-db-postgresql/ HIVE 的元数据默认使用 derby 作为存储 DB , derby 作为轻量级的 DB ,在开发、测试过程中使用比较方便,但是在实际的生产环境中,还需要考虑易用性、容灾、稳定性以及各种监控、运
本文出处:http://amutu.com/blog/2013/06/hive-metastore-db-postgresql/
HIVE的元数据默认使用derby作为存储DB,derby作为轻量级的DB,在开发、测试过程中使用比较方便,但是在实际的生产环境中,还需要考虑易用性、容灾、稳定性以及各种监控、运维工具等,这些都是derby缺乏的。MySQL和PostgreSQL是两个比较常用的开源数据库系统,在生产环境中比较多的用来替换derby。配置MySQL在网上的文章比较多,这里不再赘述,本文主要描述配置HIVE元数据DB为PostgreSQL的方法。
HIVE版本:HIVE 0.7-snapshot,HIVE 0.8-snapshot
步骤1:在PG中为元数据增加用户的DB
首先在PostgreSQL中为HIVE的元数据建立帐号和DB。
--以管理员身份登入PG:
psql postgres -U postgres
--创建用户hive_user:
Create user hive_user;
--创建DB metastore_db,owner为hive_user:
Create database metastore_db with owner=hive_user;
--设置hive_user的密码:
/password hive_user
完成以上步骤以后,还要确保PostgreSQL的pg_hba.conf中的配置允许HIVE所在的机器ip可以访问PG。
步骤2:下载PG的JDBC驱动
在HIVE_HOME目录下创建auxlib目录:
mkdir auxlib
此时HIVE_HOME目录中应该有bin,lib,auxlib,conf等目录。
下载PG的JDBC驱动
Wget http://jdbc.postgresql.org/download/postgresql-9.0-801.jdbc4.jar
将下载到的postgresql-9.0-801.jdbc4.jar放到auxlib中。
步骤3:修改HIVE配置文件
在HIVE_HOME中新建hive-site.xml 文件,内容如下,蓝色字体按照PG server的相关信息进行修改。
步骤4:初始化元数据表
元数据库metastore中默认没有表,当HIVE第一次使用某个表的时候,如果发现该表不存在就会自动创建。对derby和mysql,这个过程没有问题,因此derby和mysql作为元数据库不需要这一步。
PostgreSQL在初始化的时候,会遇到一些问题,导致PG数据库死锁。例如执行以下HIVE语句:
>Create table kv (key,int,value string) partitioned by (ds string);
OK
>Alter table kv add partition (ds = '20110101');
执行这一句的时候,HIVE会一直停在这。
查看PG数据库,发现有两个连接在进行事务操作,其中一个是:
此时处于事务中空闲,另外一个是:
ALTER TABLE "PARTITIONS" ADD CONSTRAINT "PARTITIONS_FK1" FOREIGN KEY ("SD_ID") REFERENCES "SDS" ("SD_ID") INITIALLY DEFERRED
处于等待状态。
进一步查看日志,发现大致的过程是这样的:
HIVE发起Alter table kv add partition (ds = '20110101')语句,此时DataNucleus接口发起第一个isolation为SERIALIZABLE的事务,锁定了TBLS等元数据表。在这个的事务进行过程中,DataNucleu发现PARTITIONS等表没有,则要自动创建。于是又发起了另外一个isolation为SERIALIZABLE的事务,第一个事务变为
类似的情况出现在:
>create test(key int);
OK
>drop table test;
当drop table时会去drop它的index,而此时没有index元数据表,它去键,然后产生死锁。
有三种方法可以解决这个死锁问题:
第一种方法:
使用PG的pg_terminate_backend()将第一个事务结束掉,这样可以保证第二个事务完成下去,将元数据表键成功。
第二种方法:
使HIVE将创建元数据表的过程和向元数据表中添加数据的过程分离:
>Create table kv (key,int,value string) partitioned by (ds string);
OK
>show partitions kv;
OK
>Alter table kv add partition (ds = '20110101');
OK
执行以上语句时就不会发生死锁,因为在执行show partitions kv语句时,它是只读语句,不会加锁。当这个语句发现PARTITIONS等表不在时,创建这些表不会发生死锁。
同样对于index表,使用
>Show index on kv;
可以将IDXS表建好。
第三种方法:
使用DataNucleu提供的SchemaTool,将HIVE的metastore/src/model/package.jdo文件作为输入,这个工具可以自动创建元数据中的表。具体的使用方法见:
http://www.datanucleus.org/products/accessplatform_2_0/rdbms/schematool.html
小结
本文给出了使用PostgreSQL作为HIVE元数据DB的配置方法,以及遇到的死锁问题的解决办法,希望对使用HIVE和PostgreSQL的朋友有帮助。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Facing lag, slow mobile data connection on iPhone? Typically, the strength of cellular internet on your phone depends on several factors such as region, cellular network type, roaming type, etc. There are some things you can do to get a faster, more reliable cellular Internet connection. Fix 1 – Force Restart iPhone Sometimes, force restarting your device just resets a lot of things, including the cellular connection. Step 1 – Just press the volume up key once and release. Next, press the Volume Down key and release it again. Step 2 – The next part of the process is to hold the button on the right side. Let the iPhone finish restarting. Enable cellular data and check network speed. Check again Fix 2 – Change data mode While 5G offers better network speeds, it works better when the signal is weaker

Recently, the military circle has been overwhelmed by the news: US military fighter jets can now complete fully automatic air combat using AI. Yes, just recently, the US military’s AI fighter jet was made public for the first time and the mystery was unveiled. The full name of this fighter is the Variable Stability Simulator Test Aircraft (VISTA). It was personally flown by the Secretary of the US Air Force to simulate a one-on-one air battle. On May 2, U.S. Air Force Secretary Frank Kendall took off in an X-62AVISTA at Edwards Air Force Base. Note that during the one-hour flight, all flight actions were completed autonomously by AI! Kendall said - "For the past few decades, we have been thinking about the unlimited potential of autonomous air-to-air combat, but it has always seemed out of reach." However now,

The latest video of Tesla's robot Optimus is released, and it can already work in the factory. At normal speed, it sorts batteries (Tesla's 4680 batteries) like this: The official also released what it looks like at 20x speed - on a small "workstation", picking and picking and picking: This time it is released One of the highlights of the video is that Optimus completes this work in the factory, completely autonomously, without human intervention throughout the process. And from the perspective of Optimus, it can also pick up and place the crooked battery, focusing on automatic error correction: Regarding Optimus's hand, NVIDIA scientist Jim Fan gave a high evaluation: Optimus's hand is the world's five-fingered robot. One of the most dexterous. Its hands are not only tactile

70B model, 1000 tokens can be generated in seconds, which translates into nearly 4000 characters! The researchers fine-tuned Llama3 and introduced an acceleration algorithm. Compared with the native version, the speed is 13 times faster! Not only is it fast, its performance on code rewriting tasks even surpasses GPT-4o. This achievement comes from anysphere, the team behind the popular AI programming artifact Cursor, and OpenAI also participated in the investment. You must know that on Groq, a well-known fast inference acceleration framework, the inference speed of 70BLlama3 is only more than 300 tokens per second. With the speed of Cursor, it can be said that it achieves near-instant complete code file editing. Some people call it a good guy, if you put Curs

Last week, amid the internal wave of resignations and external criticism, OpenAI was plagued by internal and external troubles: - The infringement of the widow sister sparked global heated discussions - Employees signing "overlord clauses" were exposed one after another - Netizens listed Ultraman's "seven deadly sins" Rumors refuting: According to leaked information and documents obtained by Vox, OpenAI’s senior leadership, including Altman, was well aware of these equity recovery provisions and signed off on them. In addition, there is a serious and urgent issue facing OpenAI - AI safety. The recent departures of five security-related employees, including two of its most prominent employees, and the dissolution of the "Super Alignment" team have once again put OpenAI's security issues in the spotlight. Fortune magazine reported that OpenA

To add a server to Eclipse, follow these steps: Create a server runtime environment Configure the server Create a server instance Select the server runtime environment Configure the server instance Start the server deployment project

Concurrency testing and debugging Concurrency testing and debugging in Java concurrent programming are crucial and the following techniques are available: Concurrency testing: Unit testing: Isolate and test a single concurrent task. Integration testing: testing the interaction between multiple concurrent tasks. Load testing: Evaluate an application's performance and scalability under heavy load. Concurrency Debugging: Breakpoints: Pause thread execution and inspect variables or execute code. Logging: Record thread events and status. Stack trace: Identify the source of the exception. Visualization tools: Monitor thread activity and resource usage.

1. Background of the Construction of 58 Portraits Platform First of all, I would like to share with you the background of the construction of the 58 Portrait Platform. 1. The traditional thinking of the traditional profiling platform is no longer enough. Building a user profiling platform relies on data warehouse modeling capabilities to integrate data from multiple business lines to build accurate user portraits; it also requires data mining to understand user behavior, interests and needs, and provide algorithms. side capabilities; finally, it also needs to have data platform capabilities to efficiently store, query and share user profile data and provide profile services. The main difference between a self-built business profiling platform and a middle-office profiling platform is that the self-built profiling platform serves a single business line and can be customized on demand; the mid-office platform serves multiple business lines, has complex modeling, and provides more general capabilities. 2.58 User portraits of the background of Zhongtai portrait construction
