In this article, I’ll walk you through how Django’s JSONField (a JSON & JSONB wrapper) can be used to model semi-structured data and how you can enforce a schema on that data using Pydantic—an approach that should feel natural for a Python web developer.
Consider a system that processes payments, and take its Transaction table as an example. It might look like this:
from django.db import models


class Transaction(models.Model):
    # Other relevant fields...
    payment_method = models.JSONField(default=dict, null=True, blank=True)
Our focus is on the payment_method field. In a real-world system, we would need to support several methods for processing payments:
Credit card
PayPal
Buy Now, Pay Later
Cryptocurrency
Our system must be adaptable to store the specific data required by each payment method while maintaining a consistent and validatable structure.
We'll use Pydantic to define precise schemas for different payment methods:
from typing import Optional

# EmailStr requires the optional email-validator dependency.
from pydantic import BaseModel, EmailStr


class CreditCardSchema(BaseModel):
    last_four: str
    expiry_month: int
    expiry_year: int
    cvv: str


class PayPalSchema(BaseModel):
    email: EmailStr
    account_id: str


class CryptoSchema(BaseModel):
    wallet_address: str
    network: Optional[str] = None


class BillingAddressSchema(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str
    state: Optional[str] = None


class PaymentMethodSchema(BaseModel):
    credit_card: Optional[CreditCardSchema] = None
    paypal: Optional[PayPalSchema] = None
    crypto: Optional[CryptoSchema] = None
    billing_address: Optional[BillingAddressSchema] = None
This approach offers several significant benefits:
Each payment method lives under its own optional key, so a record is expected to populate only one method at a time.
It’s easy to extend or modify without complex database migrations.
It ensures data integrity at the model level.
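Because every method in the schema is optional, nothing in it stops a record from carrying two methods at once, or none at all. If that exclusivity matters, one way to enforce it is a model-level validator. The following is a minimal sketch, assuming Pydantic v2; ExclusivePaymentMethodSchema and is_valid are hypothetical names, the schemas are trimmed to two methods, and email is a plain str here to avoid the extra email-validator dependency.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class CreditCardSchema(BaseModel):
    last_four: str
    expiry_month: int
    expiry_year: int


class PayPalSchema(BaseModel):
    email: str  # simplified for this sketch; the article uses EmailStr
    account_id: str


class ExclusivePaymentMethodSchema(BaseModel):
    # Hypothetical variant of PaymentMethodSchema, trimmed to two methods.
    credit_card: Optional[CreditCardSchema] = None
    paypal: Optional[PayPalSchema] = None

    @model_validator(mode="after")
    def exactly_one_method(self):
        # Collect the method keys that were actually populated.
        chosen = [
            name
            for name in ("credit_card", "paypal")
            if getattr(self, name) is not None
        ]
        if len(chosen) != 1:
            raise ValueError(
                f"expected exactly one payment method, got {len(chosen)}"
            )
        return self


def is_valid(data: dict) -> bool:
    """Return True when the data satisfies the exclusive schema."""
    try:
        ExclusivePaymentMethodSchema(**data)
    except ValidationError:
        return False
    return True
```

With this in place, a record holding only a PayPal entry passes, while an empty record or one holding both a card and a PayPal entry is rejected.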
To enforce a schema on our payment_method field, we leverage the Pydantic model to ensure that any data passed to the field aligns with the schema we've defined.
from typing import Mapping, Optional

from django.core.exceptions import ValidationError
from pydantic import ValidationError as PydanticValidationError


def payment_method_validator(value: Optional[dict]) -> None:
    # The field is nullable, so there is nothing to validate for None.
    if value is None:
        return

    if not isinstance(value, Mapping):
        raise TypeError("Payment method must be a dictionary")

    try:
        # PaymentMethodSchema is the Pydantic model defined earlier.
        PaymentMethodSchema(**value)
    except (TypeError, PydanticValidationError) as e:
        # Re-raise as a Django ValidationError so Django can report it.
        raise ValidationError(f"Invalid payment method: {e}") from e
Here, we perform a few checks to make sure the data entering our validator is of the correct type so that Pydantic can validate it. We do nothing for None values, and we raise a TypeError if the value passed in is not an instance of a Mapping type, such as a dict or an OrderedDict.
We then create an instance of the Pydantic model from the value passed to the validator. If the structure of the value doesn't fit the schema defined by PaymentMethodSchema, Pydantic raises a validation error. For example, if we pass an invalid email value for the email field in PayPalSchema, Pydantic raises a validation error like this:
ValidationError: 1 validation error for PaymentMethodSchema
paypal.email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='Check me out on LinkedIn: https://linkedin.com/in/daniel-c-olah', input_type=str]
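The dotted path in an error like this (paypal.email) comes from the location that Pydantic records for each failure. As a rough illustration, assuming Pydantic v2, the same information can be read programmatically from the exception's errors() list. The describe_errors helper is hypothetical, and the schema is trimmed to a single method so the sketch stays self-contained.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class CreditCardSchema(BaseModel):
    last_four: str
    expiry_month: int
    expiry_year: int


class PaymentMethodSchema(BaseModel):
    # Trimmed copy of the article's schema, for illustration only.
    credit_card: Optional[CreditCardSchema] = None


def describe_errors(data: dict) -> list[str]:
    """Return one 'path: message' string per validation failure."""
    try:
        PaymentMethodSchema(**data)
    except ValidationError as exc:
        return [
            ".".join(str(part) for part in err["loc"]) + ": " + err["msg"]
            for err in exc.errors()
        ]
    return []


# "June" cannot be parsed as an integer, so expiry_month fails.
bad = {
    "credit_card": {
        "last_four": "4242",
        "expiry_month": "June",
        "expiry_year": 2030,
    }
}
problems = describe_errors(bad)
```

Each entry in problems starts with the nested field path, which can be handy when the message has to be shown next to a specific form input.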
We can enforce this validation in two ways:
Custom Validation Method
During the save process, we call the validation function to ensure the payment method matches the expected schema.
from django.db import models


class Transaction(models.Model):
    # ... other fields ...
    payment_method = models.JSONField(null=True, blank=True)

    def save(self, *args, **kwargs):
        # Override save method to include custom validation
        payment_method_validator(self.payment_method)
        super().save(*args, **kwargs)
While effective, this approach can become cumbersome and less idiomatic in Django. We could even replace the function with a class method that does the same thing to make the code cleaner.
Using Field Validators
This method leverages Django's built-in field validation mechanism:
from django.db import models


class Transaction(models.Model):
    # Other relevant fields...
    payment_method = models.JSONField(
        default=dict,
        null=True,
        blank=True,
        # Attach the validator defined earlier to the field.
        validators=[payment_method_validator],
    )
This approach balances flexibility and control over the values stored in the payment_method field. It allows us to adapt to future changes in requirements without compromising the integrity of existing data in that field. For example, we could include a Paystack ID field in our Paystack schema. This change would be seamless, as we wouldn't have to deal with complex database migrations.
We could even add a pay_later method in the future without any hassle. The types of fields could also change, and we wouldn't face database field migration constraints, like those encountered when migrating from integer primary keys to UUID primary keys. You can check out the complete code here to understand the concept completely.
Denormalization involves the deliberate duplication of data across multiple documents or collections to optimize for performance and scalability. This approach contrasts with the strict normalization used in traditional relational databases, and NoSQL databases have been instrumental in popularizing denormalization by introducing flexible, document-oriented storage paradigms.
Consider an e-commerce scenario with separate tables for products and orders. When a customer places an order, it’s essential to capture a snapshot of the product details included in the cart. Rather than referencing the current product records, which could change over time due to updates or deletions, we store the product information directly within the order. This ensures that the order retains its original context and integrity, reflecting the exact state of the products at the time of purchase. Denormalization plays a crucial role in achieving this consistency.
One possible approach might involve duplicating some product fields in the orders table. However, this method can introduce scalability challenges and compromise the cohesion of the order schema. A more effective solution is to serialize the relevant product fields into a JSON structure stored on the order, allowing the order to maintain a self-contained record of the products without relying on external queries.
Since we’ve covered most of the concepts in the previous section, you should begin to appreciate Pydantic’s role in all of this. Here, Pydantic validates the list of products linked to an order. By defining a schema for the product structure, it ensures that every product added to the order meets the expected requirements. If the data provided does not conform to the schema, Pydantic raises a validation error.
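A product snapshot of this kind could be sketched as follows. This is an illustration under assumptions rather than the article's own code: ProductSnapshotSchema, OrderProductsSchema, snapshot_products and all field names are hypothetical, and Pydantic v2 is assumed.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class ProductSnapshotSchema(BaseModel):
    # Hypothetical fields: the product state frozen at purchase time.
    product_id: int
    name: str
    unit_price_cents: int
    quantity: int


class OrderProductsSchema(BaseModel):
    products: List[ProductSnapshotSchema]


def snapshot_products(raw_products: list) -> list:
    """Validate raw product dicts and return plain dicts for a JSONField."""
    validated = OrderProductsSchema(products=raw_products)
    return validated.model_dump()["products"]


snapshot = snapshot_products(
    [{"product_id": 1, "name": "Mug", "unit_price_cents": 1999, "quantity": 2}]
)
```

The returned list contains only plain dicts, so it can be assigned to a JSONField on the order. Later changes to the product table do not affect it, because nothing in the snapshot references the live product rows.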
We can query JSONField keys the same way we perform lookups on regular Django fields, by chaining key names with double underscores.
You can check out the documentation to learn more about filtering JSON fields.
Using JSON and JSONB in PostgreSQL provides great flexibility for working with semi-structured data in relational databases. Tools like Pydantic and Django’s JSONField help enforce rules for data structure, making it easier to maintain accuracy and adapt to changes. However, this flexibility needs to be used carefully. Without proper planning, it can lead to slower performance or unnecessary complexity as your data changes over time.
In Django, field validators are only triggered when full_clean() is explicitly called—this typically occurs when using Django Forms or calling is_valid() on DRF serializers. For more details, you can refer to the Django validator documentation.
A more advanced approach to address this would be implementing a custom Django field that integrates Pydantic to handle both serialization and validation of JSON data internally. While this warrants a dedicated article, for now you can explore libraries that offer ready-made solutions to this problem, for example django-pydantic-jsonfield.