Originally posted on bcd.dev
Working with large datasets stored in JSON columns presents significant performance issues, especially when filtering and sorting. In my experience, these challenges became evident while monitoring PHP processes and managing large volumes of records, leading to execution time limits being hit.
As part of my regular monitoring duties, I encountered max execution times of 30 seconds while querying JSON columns in a 580k record dataset. JSON columns, though flexible, are prone to performance bottlenecks, particularly without proper indexing.
The first major issue appeared when working on a Filament list record page, which had default sorting applied to a JSON attribute. The absence of indexing on this attribute resulted in a significant slowdown, especially when processing over 10,000 records. Without an index, querying and sorting through nested JSON attributes can cause execution delays and inefficiencies in retrieving results, pushing PHP processes beyond acceptable limits.
When faced with performance issues from sorting and filtering large JSON columns, I revisited an old solution: virtual columns from my friend Rob Fonseca. Virtual columns in MySQL allow me to create an indexed, computed column from JSON data, making queries more efficient without duplicating data.
Unlike standard JSON columns, virtual columns are calculated automatically from existing data but can be indexed, making them faster for querying. This improves sorting and filtering performance significantly, especially in large datasets where execution time is critical.
I implemented virtual columns by adding a migration that created a new indexed column for filtering and sorting. This virtual column extracted and indexed specific JSON attributes, drastically improving query performance. Here's an example migration:
$table->string('approved_at') ->nullable() ->virtualAs("json_unquote(json_extract(data, '$.latest_approval_date'))"); $table->index('approved_at');
By indexing this virtual column, I was able to reduce query times and improve overall efficiency, especially when filtering and sorting large datasets.
Once I implemented the virtual columns, I needed to ensure the performance gains were real. Benchmarking provided concrete data, comparing the execution times of filtering, sorting, and paginating large datasets using both the original nested JSON column and the new virtual column with indexing.
With over 580k records, queries on the nested JSON column were slow:
Benchmark::dd([ 'count' => fn () => Document::count(), 'paginate' => fn () => Document::paginate(100), 'filter + paginate' => fn () => Document::where('data->latest_approval_date', '>', '2024-09-05')->paginate(100), 'sort + paginate' => fn () => Document::orderBy('data->latest_approval_date')->paginate(100), 'filter + sort + paginate' => fn () => Document::where('data->latest_approval_date', '>', '2024-09-05')->orderBy('data->latest_approval_date')->paginate(100), ], iterations: 100);
After indexing the virtual column, the improvements were substantial:
These benchmarks confirmed the effectiveness of virtual columns in optimizing query performance.
Benchmark::dd([ 'count' => fn () => Document::count(), 'paginate' => fn () => Document::paginate(100), 'filter + paginate' => fn () => Document::where('approved_at', '>', '2024-09-05')->paginate(100), 'sort + paginate' => fn () => Document::orderBy('approved_at')->paginate(100), 'filter + sort + paginate' => fn () => Document::where('approved_at', '>', '2024-09-05')->orderBy('approved_at')->paginate(100), ], iterations: 100);
To improve performance, we’ll start by adding a virtual column for the approved_at field. This column extracts and indexes the JSON attribute for better query performance.
use Illuminate\Database\Migrations\Migration; use Illuminate\Database\Schema\Blueprint; use Illuminate\Support\Facades\Schema; return new class extends Migration { public function up(): void { Schema::table('documents', function (Blueprint $table) { $table->string('approved_at') ->nullable() ->virtualAs("json_unquote(json_extract(data, '$.latest_approval_date'))"); $table->index('approved_at'); }); } public function down(): void { Schema::table('documents', function (Blueprint $table) { $table->dropColumn('approved_at'); }); } };
We’ll create a HasVirtualFields trait to ensure that virtual fields are not mistakenly saved.
namespace App\Models\Concerns; trait HasVirtualFields { public function save(array $options = []) { if (isset($this->virtualFields)) { $this->attributes = array_diff_key($this->attributes, array_flip($this->virtualFields)); } return parent::save($options); } }
In the model, include the trait and define the virtual fields. This ensures that any virtual columns are properly managed.
use App\Models\Concerns\HasVirtualFields; class Document extends Model { use HasVirtualFields; protected array $virtualFields = [ 'approved_at', ]; }
To test the performance improvements, we’ll generate fake data and benchmark the queries before and after using virtual columns. Use the following provisioning script:
$count = 500 * 1000; for ($i = 0; $i < 250; $i++) { Document::factory()->count(1000)->create(); }
Write tests to verify that the virtual column works as expected. Here’s an example test suite:
namespace Tests\Feature\Models; use Tests\TestCase; use App\Models\Document; class DocumentTest extends TestCase { public function testApprovedAt() { $date = fake()->dateTimeBetween()->format(DATE_ATOM); $document = Document::factory()->create([ 'data' => [ 'latest_approval_date' => $date, ], ]); $document->refresh(); $this->assertEquals($date, $document->approved_at); } }
This complete solution ensures that your JSON columns can be optimized for performance, particularly for large datasets.
Using virtual columns with indexing can dramatically improve performance when working with large datasets and JSON columns. By transitioning from nested JSON queries to indexed virtual columns, I was able to reduce query times by up to 36x.
Best Practices:
Originally posted on bcd.dev
The above is the detailed content of Optimizing Sorting and Filtering on JSON Columns in Laravel with Indexed Virtual Columns. For more information, please follow other related articles on the PHP Chinese website!