How to Efficiently Update Large Hive Tables Incrementally?-javaTutorial-php.cn

How to Efficiently Update Large Hive Tables Incrementally?

DDD

Release： 2024-11-17 03:41:03

Original

1138 people have browsed it

How to Efficiently Update Large Hive Tables Incrementally?

Hive: Efficient Incremental Updates for Main Table

Problem Overview

Maintaining large main tables in Hive requires a strategy for efficiently handling incremental data updates. The challenge lies in balancing speed and accuracy when managing both new and updated data.

Approaches

Approach 1: Delete and Insert

Find updated entries and remove them from the main table.
Insert the new incremental data.

Pros: Fast inserts
Cons: Slow deletes

Approach 2: Update Statement

Use the UPDATE statement to match key values and update specific fields.

Pros: Precise updates
Cons: Very slow due to逐条更新。

Optimized Solution

If ACID mode is unavailable, a combination of FULL OUTER JOIN or UNION ALL and row_number() provides an efficient solution:

Query 1 (FULL OUTER JOIN):

insert overwrite target_data [partition()]
SELECT
  --select new if exists, old if not exists
  case when i.PK is not null then i.PK   else t.PK   end as PK,
  case when i.PK is not null then i.PK   else t.PK   end as PK,
  ...
  case when i.PK is not null then i.COL_n else t.COL_n end as COL_n
FROM
    target_data t
    FULL JOIN increment_data i on (t.PK=i.PK);

Copy after login

Query 2 (UNION ALL):

INSERT OVERWRITE TABLE target_data
SELECT * FROM incremental_data
UNION ALL
SELECT * FROM target_data
WHERE
    NOT (PK IN (SELECT PK FROM incremental_data));

Copy after login

Tips

Restrict partitions in JOIN/UNION operations for faster execution.
Consider using UNION ALL if all columns need to be updated with new data.

Benefits of Optimized Solution

Fast and efficient updates
Handles both new and updated data accurately
Scalable for large datasets

The above is the detailed content of How to Efficiently Update Large Hive Tables Incrementally?. For more information, please follow other related articles on the PHP Chinese website!