Efficient Batch Writing to DynamoDB with Python: A Step-by-Step Guide

Barbara Streisand

Published: 2025-01-08 06:49:41



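This guide shows how to insert data into AWS DynamoDB efficiently with Python, with a focus on large data sets. It covers creating the table (if needed), generating random data, and writing in batches for better performance and lower cost. The boto3 library is required; install it with pip install boto3.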

1. Setting up the DynamoDB table:

First, create a DynamoDB resource for the table's region and define the table name.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table_name = 'My_DynamoDB_Table_Name'


The create_table_if_not_exists() function checks whether the table exists and, if it does not, creates it with id as the primary (partition) key.

def create_table_if_not_exists():
    try:
        table = dynamodb.Table(table_name)
        table.load()
        print(f"Table '{table_name}' exists.")
        return table
    except ClientError as e:
        if e.response['Error']['Code'] == 'ResourceNotFoundException':
            print(f"Creating table '{table_name}'...")
            table = dynamodb.create_table(
                TableName=table_name,
                KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
                AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
                ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
            )
            table.meta.client.get_waiter('table_exists').wait(TableName=table_name)
            print(f"Table '{table_name}' created.")
            return table
        else:
            print(f"Error: {e}")
            raise

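The table above is created with a provisioned throughput of 5 write capacity units, which can throttle a large bulk load. As a minimal sketch, assuming on-demand billing suits the workload, the keyword arguments for create_table could be assembled by a small helper. The name build_table_definition and its on_demand flag are hypothetical and introduced only for illustration; BillingMode='PAY_PER_REQUEST' is the create_table parameter that selects on-demand capacity.

```python
def build_table_definition(table_name, on_demand=False, read_units=5, write_units=5):
    """Return keyword arguments for dynamodb.create_table(**kwargs).

    Hypothetical helper (not part of boto3): it only assembles a dict.
    """
    definition = {
        'TableName': table_name,
        'KeySchema': [{'AttributeName': 'id', 'KeyType': 'HASH'}],
        'AttributeDefinitions': [{'AttributeName': 'id', 'AttributeType': 'S'}],
    }
    if on_demand:
        # On-demand tables must not specify ProvisionedThroughput.
        definition['BillingMode'] = 'PAY_PER_REQUEST'
    else:
        definition['ProvisionedThroughput'] = {
            'ReadCapacityUnits': read_units,
            'WriteCapacityUnits': write_units,
        }
    return definition


print(build_table_definition('My_DynamoDB_Table_Name', on_demand=True))
```

The result could then be passed along as dynamodb.create_table(**build_table_definition(table_name, on_demand=True)). Whether on-demand or provisioned capacity is cheaper depends on the traffic pattern, so this is a choice to evaluate rather than a default.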

2. Generating random data:

Generate sample records with the fields id, name, timestamp, and value.

import random
import string
from datetime import datetime, timezone

def generate_random_string(length=10):
    # Random alphanumeric string of the given length
    return ''.join(random.choices(string.ascii_letters + string.digits, k=length))

def generate_record():
    return {
        'id': generate_random_string(16),
        'name': generate_random_string(8),
        # datetime.utcnow() is deprecated since Python 3.12; use a timezone-aware UTC timestamp
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'value': random.randint(1, 1000)
    }

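The sample record holds only strings and an integer. If real records also carried float values, the boto3 resource API would reject them, since it expects decimal.Decimal for non-integer numbers. A minimal sketch of a conversion step follows; the helper name to_dynamodb_item and the sample fields are hypothetical, and the helper handles only top-level values, not nested lists or maps.

```python
from decimal import Decimal


def to_dynamodb_item(record):
    """Return a copy of record with top-level float values converted to Decimal."""
    item = {}
    for key, value in record.items():
        if isinstance(value, float):
            # Go through str() so 0.1 becomes Decimal('0.1'),
            # not the long binary expansion of the float.
            item[key] = Decimal(str(value))
        else:
            item[key] = value
    return item


sample = {'id': 'abc123', 'name': 'sensor', 'value': 42, 'ratio': 0.1}
print(to_dynamodb_item(sample))
```

Such a step could be applied to each record just before it is handed to the batch writer.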

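3. Writing data in batches:

The batch_write() function uses boto3's batch_writer() for efficient bulk inserts. The batch writer buffers items, sends them in BatchWriteItem requests of up to 25 items each, and automatically resends any unprocessed items.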

def batch_write(table, records):
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=record)

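One detail worth noting: a single BatchWriteItem request is rejected if it contains two items with the same primary key. With 16-character random ids a collision is very unlikely, but real data sets often contain repeats. Below is a minimal sketch of de-duplicating records before writing, assuming the last record for a given id should win. The helper dedupe_by_key is hypothetical; boto3's batch_writer() also accepts an overwrite_by_pkeys argument (for example overwrite_by_pkeys=['id']) that de-duplicates its own buffer.

```python
def dedupe_by_key(records, key='id'):
    """Keep one record per key value; a later record replaces an earlier one.

    The output order follows the first appearance of each key.
    """
    latest = {}
    for record in records:
        latest[record[key]] = record
    return list(latest.values())


records = [
    {'id': 'a', 'value': 1},
    {'id': 'b', 'value': 2},
    {'id': 'a', 'value': 3},  # same id as the first record
]
print(dedupe_by_key(records))
```

In the script above, this could be called as batch_write(table, dedupe_by_key(records_batch)).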

4. Main workflow:

The main function coordinates table creation, data generation, and batch writing.

def main():
    table = create_table_if_not_exists()
    records_batch = []
    for i in range(1, 1001):
        record = generate_record()
        records_batch.append(record)
        if len(records_batch) == 25:
            batch_write(table, records_batch)
            records_batch = []
            print(f"Wrote {i} records")
    if records_batch:
        batch_write(table, records_batch)
        print(f"Wrote remaining {len(records_batch)} records")

if __name__ == '__main__':
    main()

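For a sense of how long such a load might take on a provisioned table, a rough estimate can be derived from the write capacity: a standard write consumes one write capacity unit per 1 KB of item size (rounded up), and each unit allows one such write per second. The sketch below is a back-of-the-envelope calculation only; the function name and the assumed item size of 200 bytes are hypothetical, and it ignores burst capacity, retries, and network latency.

```python
import math


def estimate_write_seconds(item_count, item_size_bytes, write_capacity_units):
    """Rough load time in seconds at a steady provisioned write rate."""
    # Each item costs at least 1 WCU, plus 1 more for every additional 1 KB.
    wcu_per_item = max(1, math.ceil(item_size_bytes / 1024))
    total_wcu = item_count * wcu_per_item
    return total_wcu / write_capacity_units


# 1000 small items against the 5 WCU configured in create_table_if_not_exists()
print(estimate_write_seconds(1000, 200, 5))  # 200.0
```

Under these assumptions, raising the write capacity or switching to on-demand billing would shorten the load more than changing the batch size would.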

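5. Conclusion:

This script uses batch writes to reduce the number of requests when loading large volumes of data into DynamoDB. Remember to adjust the parameters (batch size, number of records, and so on) to your requirements. For further performance gains, explore the more advanced DynamoDB features.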
