Optimizing Postgres Updates with CSV Data
Efficiently updating Postgres tables with data from CSV files is crucial for data management. Let's say you have a table (id, banana, apple) and a CSV file with updated banana
values. The challenge is to update only the banana
column without modifying the apple
column.
Leveraging COPY
and UPDATE
for Efficient Updates
The optimal approach involves using COPY
to import the CSV into a temporary table, then performing an UPDATE
based on the id
column. Here's the process:
<code class="language-sql">CREATE TEMP TABLE tmp_x (id int, apple text, banana text); -- Or see alternative below COPY tmp_x FROM '/absolute/path/to/file' (FORMAT csv); UPDATE tbl SET banana = tmp_x.banana FROM tmp_x WHERE tbl.id = tmp_x.id; DROP TABLE tmp_x; -- Alternatively, it's automatically dropped at the end of the session</code>
For a more streamlined approach when the temporary table structure mirrors the target table:
<code class="language-sql">CREATE TEMP TABLE tmp_x AS SELECT * FROM tbl LIMIT 0;</code>
This creates an empty temporary table with the same schema as tbl
, omitting constraints.
Security and Performance Considerations
Prior to Postgres 11, COPY
required superuser privileges. However, Postgres 11 and later versions offer predefined roles (like pg_read_server_files
and pg_write_server_files
) for improved security.
The psql
meta-command copy
offers another solution, executing the COPY
command locally, thus bypassing superuser privilege requirements.
For large CSV files, optimizing performance is critical. Consider temporarily increasing the temp_buffers
parameter. Creating an index on the temporary table's id
column and running ANALYZE
can further enhance query speed.
The above is the detailed content of How Can I Efficiently Update Rows in a Postgres Table from a CSV File?. For more information, please follow other related articles on the PHP Chinese website!