Best Practices for Using Pydantic in Python - Python Tutorial - PHP中文網

PHPz

Jul 19, 2024, 04:28 AM

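Pydantic is a Python library that simplifies data validation using type hints. It ensures data integrity and offers an easy way to create data models with automatic type checking and validation.

In software applications, reliable data validation is crucial for preventing errors, security issues, and unpredictable behavior.

This guide provides best practices for using Pydantic in Python projects, covering model definition, data validation, error handling, and performance optimization.

Installing Pydantic

To install Pydantic, use pip, the Python package installer, with the following command: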

pip install pydantic

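This command installs Pydantic along with its dependencies.

Basic Usage

Create Pydantic models by defining classes that inherit from BaseModel. Use Python type annotations to specify the type of each field: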

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

Pydantic supports a variety of field types, including int, str, float, bool, list, and dict. You can also define nested models and custom types:

from typing import List, Optional
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    zip_code: Optional[str] = None

class User(BaseModel):
    id: int
    name: str
    email: str
    age: Optional[int] = None
    addresses: List[Address]

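Once a Pydantic model is defined, create instances by providing the required data. Pydantic validates the data and raises an error if any field does not meet the specified requirements: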

user = User(
    id=1,
    name="John Doe",
    email="john.doe@example.com",
    addresses=[{"street": "123 Main St", "city": "Anytown", "zip_code": "12345"}]
)

print(user)

# Output:
# id=1 name='John Doe' email='john.doe@example.com' age=None addresses=[Address(street='123 Main St', city='Anytown', zip_code='12345')]

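Defining Pydantic Models

Pydantic models use Python type annotations to define the types of their data fields.

They support various built-in types, including:

Primitive types: str, int, float, bool
Collection types: list, tuple, set, dict
Optional types: Optional from the typing module, for fields that can be None
Union types: Union from the typing module, for fields that can be one of several types

Example: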

from typing import List, Dict, Optional, Union
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float
    tags: List[str]
    metadata: Dict[str, Union[str, int, float]]

class Order(BaseModel):
    order_id: int
    items: List[Item]
    discount: Optional[float] = None

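Custom Types

In addition to the built-in types, you can define custom types using Pydantic's conint, constr, and other constraint functions.

These let you add extra validation rules, such as length limits for strings or value ranges for integers.

Example: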

from pydantic import BaseModel, conint, constr

class Product(BaseModel):
    name: constr(min_length=2, max_length=50)
    quantity: conint(gt=0, le=1000)
    price: float

product = Product(name="Laptop", quantity=5, price=999.99)
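
The same constraints can also be written with typing.Annotated and pydantic.Field, which Pydantic v2 supports alongside conint and constr. The sketch below assumes Pydantic v2 and Python 3.9 or later; the alias names ProductName and Quantity are hypothetical and are not part of the original example.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

# Hypothetical reusable aliases carrying the same rules as constr/conint above
ProductName = Annotated[str, Field(min_length=2, max_length=50)]
Quantity = Annotated[int, Field(gt=0, le=1000)]


class Product(BaseModel):
    name: ProductName
    quantity: Quantity
    price: float


product = Product(name="Laptop", quantity=5, price=999.99)

try:
    # Both the name (too short) and the quantity (not > 0) break a rule
    Product(name="L", quantity=0, price=999.99)
except ValidationError as e:
    failed_fields = sorted(err["loc"][0] for err in e.errors())
    print(failed_fields)  # ['name', 'quantity']
```

Defining the constraint once as an alias keeps the rule in one place when several models share it.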

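Required vs. Optional Fields

By default, fields in a Pydantic model are required unless they are given a default value.

If a required field is missing during model instantiation, Pydantic raises a ValidationError.

Example: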

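from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str

try:
    user = User(id=1, name="John Doe")  # email is missing
except ValidationError as e:
    print(e)

# Output (abridged):
# 1 validation error for User
# email
#   Field required [type=missing, input_value={'id': 1, 'name': 'John Doe'}, input_type=dict]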

Optional Fields with Default Values

Fields can be made optional by using Optional from the typing module and providing a default value.

Example:

from pydantic import BaseModel
from typing import Optional

class User(BaseModel):
    id: int
    name: str
    email: Optional[str] = None

user = User(id=1, name="John Doe")

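In this example, email is optional and defaults to None if not provided.

Nested Models

Pydantic allows models to be nested within each other, enabling complex data structures.

Nested models are defined as fields of other models, which ensures data integrity and validation at multiple levels.

Example: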

from pydantic import BaseModel
from typing import Optional, List


class Address(BaseModel):
    street: str
    city: str
    zip_code: Optional[str] = None

class User(BaseModel):
    id: int
    name: str
    email: str
    addresses: List[Address]

user = User(
    id=1,
    name="John Doe",
    email="john.doe@example.com",
    addresses=[{"street": "123 Main St", "city": "Anytown"}]
)

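Best Practices for Managing Nested Data

When working with nested models, it is important to:

Validate data at each level: Ensure each nested model has its own validation rules and constraints.
Use clear and consistent naming conventions: This makes your data structure more readable and maintainable.
Keep models simple: Avoid overly complex nested structures. If a model becomes too complex, consider breaking it down into smaller, more manageable components.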
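
To illustrate validation at each level, the sketch below gives the nested Address model its own rule and shows how Pydantic v2 reports the location of a failure inside a list of nested models. The five-digit zip_code pattern is a hypothetical constraint chosen only for this illustration.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    street: str
    city: str
    # Hypothetical rule for illustration: exactly five digits
    zip_code: str = Field(pattern=r"^\d{5}$")


class User(BaseModel):
    id: int
    name: str
    addresses: List[Address]


try:
    User(
        id=1,
        name="John Doe",
        addresses=[
            {"street": "123 Main St", "city": "Anytown", "zip_code": "12345"},
            {"street": "9 Side Rd", "city": "Anytown", "zip_code": "ABC"},
        ],
    )
except ValidationError as e:
    # Each error carries the full path to the offending value
    error_locations = [err["loc"] for err in e.errors()]
    print(error_locations)  # [('addresses', 1, 'zip_code')]
```

The loc tuple names the outer field, the list index, and the nested field, which makes it straightforward to point a caller at the exact value that failed.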

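Data Validation

Pydantic includes a set of built-in validators that handle common data validation tasks automatically.

These validators include:

Type validation: Ensures that fields conform to the specified type annotations (e.g., int, str, list).
Range validation: Enforces value ranges and lengths using constraints such as conint, constr, and confloat.
Format validation: Checks specific formats, such as EmailStr for validating email addresses.
Collection validation: Ensures that elements within collections (e.g., list, dict) conform to the specified types and constraints.

These validators simplify the process of ensuring data integrity and consistency within your models.

Here is an example demonstrating the built-in validators:

from pydantic import BaseModel, EmailStr, conint, constr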

class User(BaseModel):
    id: conint(gt=0)  # id must be greater than 0
    name: constr(min_length=2, max_length=50)  # name must be between 2 and 50 characters
    email: EmailStr  # email must be a valid email address
    age: conint(ge=18)  # age must be 18 or older

user = User(id=1, name="John Doe", email="john.doe@example.com", age=25)

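In this example, the User model uses built-in validators to ensure that id is greater than 0, name is between 2 and 50 characters, email is a valid email address, and age is 18 or older.

To use the email validator, install Pydantic's email extra:

pip install "pydantic[email]"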

Custom Validators

Pydantic allows you to define custom validators for more complex validation logic.

Custom validators are defined using the @field_validator decorator within your model class.

Example of a custom validator:

from pydantic import BaseModel, field_validator


class Product(BaseModel):
    name: str
    price: float

    @field_validator('price')
    @classmethod
    def price_must_be_positive(cls, value: float) -> float:
        if value <= 0:
            raise ValueError('Price must be positive')
        return value

product = Product(name="Laptop", price=999.99)

Here, the price_must_be_positive validator ensures that the price field is a positive number.

Custom validators are registered automatically when you define them within a model using the @field_validator decorator. Validators can be applied to individual fields or across multiple fields.

Example of registering a validator for multiple fields:

from pydantic import BaseModel, field_validator


class Person(BaseModel):
    first_name: str
    last_name: str

    @field_validator('first_name', 'last_name')
    @classmethod
    def names_cannot_be_empty(cls, value: str) -> str:
        if not value:
            raise ValueError('Name fields cannot be empty')
        return value

person = Person(first_name="John", last_name="Doe")

In this example, the names_cannot_be_empty validator ensures that both the first_name and last_name fields are not empty.
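
A field validator sees one field at a time. When a rule depends on several fields together, Pydantic v2 also provides the @model_validator decorator, which runs against the whole model. The following is a minimal sketch under that assumption; the DateRange model and its field names are hypothetical and do not come from the examples above.

```python
from pydantic import BaseModel, ValidationError, model_validator


class DateRange(BaseModel):
    start_day: int
    end_day: int

    @model_validator(mode="after")
    def end_not_before_start(self):
        # Runs after both fields have been validated individually
        if self.end_day < self.start_day:
            raise ValueError("end_day must not be before start_day")
        return self


valid_range = DateRange(start_day=1, end_day=5)

try:
    DateRange(start_day=5, end_day=1)
except ValidationError as e:
    cross_field_errors = e.errors()
    print(cross_field_errors[0]["msg"])
```

Because the check needs both values, placing it in a model-level validator avoids depending on the order in which individual fields are validated.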

Configuring Models

Pydantic models can be customized through model-level configuration. In Pydantic v2, which the examples in this guide use, configuration is set with the model_config attribute (a ConfigDict). The inner Config class from Pydantic v1 is still accepted but deprecated.

Configuration options affect the model's behavior, such as validation rules and serialization.

Example of model configuration:

from pydantic import BaseModel, ConfigDict

class User(BaseModel):
    model_config = ConfigDict(
        str_strip_whitespace=True,  # Strip leading and trailing whitespace from strings
        str_min_length=1,  # Minimum length for any string field
    )

    id: int
    name: str
    email: str

user = User(id=1, name="  John Doe  ", email="john.doe@example.com")

print(user)

# Output:
# id=1 name='John Doe' email='john.doe@example.com'

In this example, the configuration strips whitespace from string fields and enforces a minimum length of 1 for any string field.

Some common configuration options include:

str_strip_whitespace: Automatically strip leading and trailing whitespace from string fields.
str_min_length: Set a minimum length for any string field.
validate_default: Validate default values as well; by default they are not validated.
validate_assignment: Enable validation on assignment to model attributes.
use_enum_values: Populate fields with the values of enums instead of the enum instances.
json_encoders: Define custom JSON encoders for specific types (deprecated in Pydantic v2 in favor of custom serializers).
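
As an illustration of one of these options, the sketch below shows the effect of use_enum_values. It assumes Pydantic v2's ConfigDict; the Status enum and Account model are hypothetical names introduced only for this example.

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict


class Status(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


class Account(BaseModel):
    model_config = ConfigDict(use_enum_values=True)

    owner: str
    status: Status


# The field accepts either an enum member or its raw value...
from_member = Account(owner="John Doe", status=Status.ACTIVE)
from_value = Account(owner="Jane Doe", status="inactive")

# ...and in both cases stores the plain value, not the enum instance
print(from_member.status)  # active
print(from_value.status)   # inactive
```

Storing plain values can simplify serialization, at the cost of losing the enum type on the model instance.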

Error Handling

When Pydantic finds data that doesn't conform to the model's schema, it raises a ValidationError.

This error provides detailed information about the issue, including the field name, the incorrect value, and a description of the problem.

Here's an example of how default error messages are structured:

from pydantic import BaseModel, ValidationError, EmailStr

class User(BaseModel):
    id: int
    name: str
    email: EmailStr

try:
    user = User(id='one', name='John Doe', email='invalid-email')
except ValidationError as e:
    print(e.json())

# Output:
# [{"type":"int_parsing","loc":["id"],"msg":"Input should be a valid integer, unable to parse string as an integer","input":"one","url":"https://errors.pydantic.dev/2.8/v/int_parsing"},{"type":"value_error","loc":["email"],"msg":"value is not a valid email address: An email address must have an @-sign.","input":"invalid-email","ctx":{"reason":"An email address must have an @-sign."},"url":"https://errors.pydantic.dev/2.8/v/value_error"}]

In this example, the error message will indicate that id must be an integer and email must be a valid email address.

Customizing Error Messages

Pydantic allows you to customize error messages for specific fields by raising exceptions with custom messages in validators or by setting custom configurations.

Here’s an example of customizing error messages:

from pydantic import BaseModel, ValidationError, field_validator

class Product(BaseModel):
    name: str
    price: float

    @field_validator('price')
    @classmethod
    def price_must_be_positive(cls, value: float) -> float:
        if value <= 0:
            raise ValueError('Price must be a positive number')
        return value

try:
    product = Product(name='Laptop', price=-1000)
except ValidationError as e:
    print(e.json())

# Output:
# [{"type":"value_error","loc":["price"],"msg":"Value error, Price must be a positive number","input":-1000,"ctx":{"error":"Price must be a positive number"},"url":"https://errors.pydantic.dev/2.8/v/value_error"}]

In this example, the error message for price is customized to indicate that it must be a positive number.

Best Practices for Error Reporting

Effective error reporting involves providing clear, concise, and actionable feedback to users or developers.

Here are some best practices:

Log errors: Use logging mechanisms to record validation errors for debugging and monitoring purposes.
Return user-friendly messages: When exposing errors to end-users, avoid technical jargon. Instead, provide clear instructions on how to correct the data.
Aggregate errors: When multiple fields are invalid, aggregate the errors into a single response to help users correct all issues at once.
Use consistent formats: Ensure that error messages follow a consistent format across the application for easier processing and understanding.

Examples of best practices in error reporting:

from pydantic import BaseModel, ValidationError, EmailStr
import logging

logging.basicConfig(level=logging.INFO)

class User(BaseModel):
    id: int
    name: str
    email: EmailStr

def create_user(data):
    try:
        user = User(**data)
        return user
    except ValidationError as e:
        logging.error("Validation error: %s", e.json())
        return {"error": "Invalid data provided", "details": e.errors()}

user_data = {'id': 'one', 'name': 'John Doe', 'email': 'invalid-email'}
response = create_user(user_data)
print(response)

# Output:
# ERROR:root:Validation error: [{"type":"int_parsing","loc":["id"],"msg":"Input should be a valid integer, unable to parse string as an integer","input":"one","url":"https://errors.pydantic.dev/2.8/v/int_parsing"},{"type":"value_error","loc":["email"],"msg":"value is not a valid email address: An email address must have an @-sign.","input":"invalid-email","ctx":{"reason":"An email address must have an @-sign."},"url":"https://errors.pydantic.dev/2.8/v/value_error"}]
# {'error': 'Invalid data provided', 'details': [{'type': 'int_parsing', 'loc': ('id',), 'msg': 'Input should be a valid integer, unable to parse string as an integer', 'input': 'one', 'url': 'https://errors.pydantic.dev/2.8/v/int_parsing'}, {'type': 'value_error', 'loc': ('email',), 'msg': 'value is not a valid email address: An email address must have an @-sign.', 'input': 'invalid-email', 'ctx': {'reason': 'An email address must have an @-sign.'}}]}

In this example, validation errors are logged, and a user-friendly error message is returned, helping maintain application stability and providing useful feedback to the user.
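
To return something friendlier than the raw error list, one option is to group the messages by field. The sketch below is one possible approach, not the only one; the SignupForm model and the summarize_errors helper are hypothetical names introduced for illustration.

```python
from typing import Dict, List

from pydantic import BaseModel, ValidationError


class SignupForm(BaseModel):  # hypothetical model
    username: str
    age: int


def summarize_errors(exc: ValidationError) -> Dict[str, List[str]]:
    """Group validation messages by dotted field path."""
    summary: Dict[str, List[str]] = {}
    for err in exc.errors():
        # loc is a tuple such as ('addresses', 0, 'city'); join it for display
        field = ".".join(str(part) for part in err["loc"]) or "(model)"
        summary.setdefault(field, []).append(err["msg"])
    return summary


try:
    SignupForm.model_validate({"age": "abc"})  # username missing, age invalid
except ValidationError as e:
    summary = summarize_errors(e)
    print(summary)
```

All failing fields are reported in one response, so the caller can correct every problem in a single pass.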

Performance Considerations

Lazy initialization is a technique that postpones the creation of an object until it is needed.

In Pydantic, this can be useful for models with fields that are costly to compute or fetch. By delaying the initialization of these fields, you can reduce the initial load time and improve performance.

Example of lazy initialization:

from functools import cached_property
from pydantic import BaseModel

class DataModel(BaseModel):
    name: str

    @cached_property
    def expensive_computation(self) -> str:
        # Simulate an expensive computation; runs once, on first access
        result = "Computed Value"
        return result

data_model = DataModel(name="Test")
print(data_model.expensive_computation)

In this example, expensive_computation is a cached property rather than a model field: it is computed only when accessed for the first time and then reused, which avoids unnecessary work during model initialization.

Redundant Validation

Pydantic models automatically validate data during initialization.

However, if you know that certain data has already been validated or if validation is not necessary in some contexts, you can disable validation to improve performance.

This can be done using the model_construct method, which bypasses validation:

Example of avoiding redundant validation:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# Constructing a User instance without validation
data = {'id': 1, 'name': 'John Doe', 'email': 'john.doe@example.com'}
user = User.model_construct(**data)

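In this example, User.model_construct is used to create a User instance without triggering validation, which can be useful in performance-critical sections of your code. Because model_construct neither validates nor coerces its input, use it only for data that is already known to be valid.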
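
The trade-off is easy to see by passing the same bad value through both paths. This is a small sketch assuming Pydantic v2; the variable names are illustrative.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str


# model_construct stores the value as given: no validation, no coercion
unchecked = User.model_construct(id="not-a-number", name="John Doe")
print(type(unchecked.id).__name__)  # str

# The normal constructor rejects the same input
try:
    User(id="not-a-number", name="John Doe")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```

An instance built this way can silently violate its own type annotations, so it is best reserved for data that has just come out of a validated source.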

Efficient Data Parsing

When dealing with large datasets or high-throughput systems, efficiently parsing raw data becomes critical.

Pydantic provides the model_validate_json method, which parses a JSON string (or bytes) and validates the result into a Pydantic model in a single step.

Example of efficient data parsing:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

json_data = '{"id": 1, "name": "John Doe", "email": "john.doe@example.com"}'
user = User.model_validate_json(json_data)
print(user)

In this example, model_validate_json parses JSON data directly into a User model, avoiding a separate json.loads step before validation.
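
For a JSON payload that holds many records rather than one, Pydantic v2's TypeAdapter can validate the whole list in one call. The sketch below assumes Pydantic v2; the payload contents and the name user_list_adapter are made up for illustration.

```python
from typing import List

from pydantic import BaseModel, TypeAdapter


class User(BaseModel):
    id: int
    name: str
    email: str


# Build the adapter once and reuse it; constructing it has a setup cost
user_list_adapter = TypeAdapter(List[User])

payload = (
    '[{"id": 1, "name": "John Doe", "email": "john.doe@example.com"},'
    ' {"id": 2, "name": "Jane Doe", "email": "jane.doe@example.com"}]'
)

users = user_list_adapter.validate_json(payload)
print(len(users), users[1].name)  # 2 Jane Doe
```

Creating the adapter at module level, rather than inside a hot loop, keeps the per-call work limited to parsing and validation.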

Controlling Validation

Pydantic models can be configured to validate data only when necessary.

The validate_default and validate_assignment configuration options control when validation occurs, which can help improve performance:

validate_default: When False (the default), default values are not validated; only values supplied during initialization are.
validate_assignment: When set to True, validation is performed on field assignment after the model is created. It is off by default, so enable it only where the extra checks are worth the cost.

Example configuration:

from pydantic import BaseModel, ConfigDict

class User(BaseModel):
    model_config = ConfigDict(
        validate_default=False,  # Do not validate default values (the default behavior)
        validate_assignment=True,  # Validate fields on assignment
    )

    id: int
    name: str
    email: str

user = User(id=1, name="John Doe", email="john.doe@example.com")
user.email = "new.email@example.com"  # This assignment will trigger validation

In this example, validate_default is left at False so that default values are not validated during initialization, and validate_assignment is set to True to ensure that fields are validated when they are updated.
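
The effect of validate_assignment is easiest to see with an invalid update. A minimal sketch, assuming Pydantic v2's ConfigDict:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class User(BaseModel):
    model_config = ConfigDict(validate_assignment=True)

    id: int
    name: str


user = User(id=1, name="John Doe")

try:
    user.id = "not-a-number"  # rejected because assignments are validated
except ValidationError as e:
    assignment_error_type = e.errors()[0]["type"]
    print(assignment_error_type)  # int_parsing

print(user.id)  # 1 (the failed assignment left the old value in place)
```

Without validate_assignment, the same assignment would succeed and leave a string in a field annotated as int.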

Settings Management

Pydantic's BaseSettings class is designed for managing application settings, supporting environment variable loading and type validation.

This helps in configuring applications for different environments (e.g., development, testing, production).

Consider this .env file:

database_url=db
secret_key=sk
debug=False

Example of using BaseSettings:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")  # Also read values from .env

    database_url: str
    secret_key: str
    debug: bool = False

settings = Settings()
print(settings.model_dump())

# Output:
# {'database_url': 'db', 'secret_key': 'sk', 'debug': False}

In this example, settings are loaded from environment variables, and the configuration specifies that values can also be loaded from a .env file.

To use BaseSettings, install the separate pydantic-settings package:

pip install pydantic-settings

Managing settings effectively involves a few best practices:

Use environment variables: Store configuration values in environment variables to keep sensitive data out of your codebase.
Provide defaults: Define sensible default values for configuration settings to ensure the application runs with minimal configuration.
Separate environments: Use different configuration files or environment variables for different environments (e.g., .env.development, .env.production).
Validate settings: Use Pydantic's validation features to ensure all settings are correctly typed and within acceptable ranges.

Common Pitfalls and How to Avoid Them

One common mistake when using Pydantic is misapplying type annotations, which can lead to validation errors or unexpected behavior.

Here are a few typical mistakes and their solutions:

Misusing Union Types: Using Union incorrectly can complicate type validation and handling.
Optional Fields without Default Values: In Pydantic v2, a field annotated as Optional[...] without a default is still required; the annotation only allows None as a value. Provide an explicit default (such as None) when the field may be omitted.
Incorrect Type Annotations: Assigning incorrect types to fields can cause validation to fail. For example, using str for a field that should be an int.
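
The Optional pitfall can be demonstrated in a few lines. This sketch assumes Pydantic v2 behavior; the Profile model is a hypothetical example.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Profile(BaseModel):
    nickname: Optional[str]    # required, but None is an accepted value
    bio: Optional[str] = None  # may be omitted entirely


# Passing None explicitly satisfies the required field
profile = Profile(nickname=None)

try:
    Profile()  # nickname was not supplied at all
except ValidationError as e:
    missing = [err["loc"][0] for err in e.errors() if err["type"] == "missing"]
    print(missing)  # ['nickname']
```

Only nickname is reported as missing; bio falls back to its default.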

Ignoring Performance Implications

Ignoring performance implications when using Pydantic can lead to slow applications, especially when dealing with large datasets or frequent model instantiations.

Here are some strategies to avoid performance bottlenecks:

Leverage Configuration Options: Use Pydantic's configuration options like validate_default and validate_assignment to control when validation occurs.
Optimize Nested Models: When working with nested models, ensure that you are not over-validating or duplicating validation logic.
Use Efficient Parsing Methods: Utilize model_validate_json and model_validate for efficient data parsing.
Avoid Unnecessary Validation: Use the model_construct method to create models without validation when the data is already known to be valid.