Hello there!
Since 2019, I have been developing an open-source tool called pg-index-health, which analyzes database structures and identifies potential issues. In one of my previous articles, I shared the story of how this tool came to life.
Over the years, pg-index-health has evolved and improved. In 2024, with the support of several contributors, I managed to address most of its remaining "growing pains" and bring the project to a state where it is ready for large-scale expansion.
I've been working with PostgreSQL since 2015, and this fascinating journey began at the Yaroslavl-based company Tensor.
Back in 2015, it was still the era of monoliths with massive databases and a large number of tables. Typically, any changes to the structure of such databases required mandatory approval from an architect or a development lead, who served as the key knowledge holder. While this safeguarded against most errors, it slowed down the process of making changes and was entirely unscalable.
Gradually, people began transitioning to microservices.
The number of databases grew significantly, but the number of tables within each database, conversely, decreased. Now, each team started managing the structure of its own database independently. The centralized source of expertise disappeared, and database design errors began to multiply and propagate from one service to another.
Most of you have probably heard of the testing pyramid. For monoliths, it has a fairly characteristic shape with a broad base of unit tests. For more details, I recommend Martin Fowler's article.
Microservices have changed not only the approach to development but also the appearance of the testing pyramid. This shift was largely driven by the rise of containerization technologies (Docker, Testcontainers). Today the testing pyramid is no longer a pyramid at all. It can have a very bizarre shape. The most well-known examples are the Honeycomb and the Testing Trophy.
The modern trend is to minimize unit tests, which tend to focus on implementation details, and to prioritize component and integration tests that validate the actual functionality the service provides.
My personal favorite is the Testing Trophy. At its base lies static code analysis, designed to guard against common errors.
Static analysis for Java and Kotlin code is now common practice. For Kotlin services, the tool of choice is typically detekt. For Java applications, the range of available tools (often referred to as linters) is broader. The main tools include Checkstyle, PMD, SpotBugs, and Error Prone. You can read more about them in my previous article.
Notably, both detekt and Checkstyle also handle code formatting, effectively functioning as formatters.
Modern microservices often include database migrations for creating and updating the database structure alongside application code.
In the Java ecosystem, the main tools for managing migrations are Liquibase and Flyway. Any changes to the database structure must always be documented in migrations. Even if changes are made manually during an incident in production, a migration must later be created to apply those changes across all environments.
Writing migrations in plain SQL is the best practice as it provides maximum flexibility and saves time compared to learning the XML dialects of tools like Liquibase. I touched on this in my article, "Six Tips for Using PostgreSQL in Functional Tests".
To verify the SQL code in migrations, I recommend using SQLFluff, which is essentially a Checkstyle equivalent for SQL. This linter supports multiple databases and dialects (including PostgreSQL) and can be integrated into your CI pipeline. It offers over 60 customizable rules, enabling you to manage table and column aliases, SQL command casing, indentation, column ordering in queries, and much more.
Compare the query with and without formatting:
```sql
-- well-formatted SQL
select pc.oid::regclass::text as table_name,
    pg_table_size(pc.oid) as table_size
from pg_catalog.pg_class pc
inner join pg_catalog.pg_namespace nsp on nsp.oid = pc.relnamespace
where pc.relkind = 'r' and
    pc.oid not in (
        select c.conrelid as table_oid
        from pg_catalog.pg_constraint c
        where c.contype = 'p'
    ) and
    nsp.nspname = :schema_name_param::text
order by table_name;
```
```sql
-- poorly formatted SQL
SELECT pc.oid::regclass::text AS table_name,
pg_table_size(pc.oid) AS table_size
FROM pg_catalog.pg_class pc
JOIN pg_catalog.pg_namespace AS nsp ON nsp.oid = pc.relnamespace
WHERE pc.relkind = 'r' and pc.oid NOT in (select c.conrelid as table_oid
from pg_catalog.pg_constraint c where c.contype = 'p')
and nsp.nspname = :schema_name_param::text ORDER BY table_name;
```
Well-formatted SQL code is much easier to read and understand. Most importantly, code reviews will no longer be bogged down by discussions about formatting preferences. SQLFluff enforces a consistent style, saving time.
This is what it looks like in a real pull request:
Here SQLFluff flagged three issues. First, the formatting of the returned value in the select statement: when only one column is returned, we do not put it on a separate line. Second, the column order in the result set: simple columns should come first, followed by computed values. Third, the casing of the and keyword in the join condition: I prefer to write all queries in lowercase.
For more examples of SQLFluff in use, check out my open-source projects: one, two.
The structure of the database itself can also be checked. However, working with migrations is extremely inconvenient: there can be many of them; a new migration may fix errors in a previous migration, and so on. As a rule, we are more interested in the final structure of the database than its intermediate state.
PostgreSQL (like many other relational databases) stores metadata about all objects and relationships between them and provides it externally in the form of information_schema. We can use queries to information_schema to identify any deviations, problems, or common errors (this is exactly what SchemaCrawler does).
Since we work only with PostgreSQL, instead of information_schema we can use system catalogs (the pg_catalog schema), which provide much more information about the internal structure of a specific database.
In addition to metadata, PostgreSQL collects information about the operation of each database: what queries are executed, how they are executed, what access methods are used, etc. The Cumulative Statistics System is responsible for collecting this data.
By querying these statistics through system views and combining them with data from the system catalogs, we can, for example, find indexes that are never used or tables that are mostly read through sequential scans.
Statistics can be reset manually. The date and time of the last reset are recorded in the system. It is important to consider this to understand whether the statistics can be trusted or not. For example, if you have some business logic that is executed once a month/quarter/half-year, the statistics need to be collected for a period of at least the interval mentioned above.
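For example, you can check when the statistics for the current database were last reset (a small sketch using the standard pg_stat_database view):

```sql
-- date and time of the last statistics reset for the current database
select datname, stats_reset
from pg_catalog.pg_stat_database
where datname = current_database();
```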
If a database cluster is used, then statistics are collected independently on each host and are not replicated within the cluster.
I have implemented the idea described above, analyzing the database structure based on the metadata stored within the database itself, in a tool called pg-index-health.
My solution includes the following components: a set of SQL queries that detect the issues (released as a separate artifact), a Java library on top of them that talks to the database via JDBC, and a Spring Boot starter for easy integration into tests.
All checks (also referred to as diagnostics) are divided into two groups: runtime checks and static checks.
Runtime checks are meaningful only when executed on a live database instance in production. These checks require accumulated statistics and aggregate this data from all hosts in the cluster.
Let’s consider a database cluster consisting of three hosts: primary, secondary, and async replica. Some services use clusters with similar topologies and execute heavy read queries only on the asynchronous replica to balance the load. Such queries are usually not executed on the primary host as they create additional load and negatively impact the latency of other queries.
As previously mentioned, in PostgreSQL, statistics are collected separately on each host and are not replicated within the cluster. Thus, you can easily encounter a situation where certain indexes are used and necessary only on the asynchronous replica. To make a reliable determination about whether an index is needed or not, it is necessary to run the check on each host in the cluster and aggregate the results.
Static checks do not require accumulated statistics and can be executed on the primary host immediately after applying migrations. Of course, they can also be used on a production database to obtain data in real time. However, most checks are static, and they are particularly useful in tests, as they help catch and prevent common errors during the development phase.
The primary use case for pg-index-health is adding tests to verify the database structure in your testing pipeline.
For Spring Boot applications, you need to add the starter to your test dependencies:
```groovy
dependencies {
    testImplementation("io.github.mfvanek:pg-index-health-test-starter:0.14.4")
}
```
Then add a standard test:
```java
import io.github.mfvanek.pg.core.checks.common.DatabaseCheckOnHost;
import io.github.mfvanek.pg.core.checks.common.Diagnostic;
import io.github.mfvanek.pg.model.dbobject.DbObject;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.ActiveProfiles;

import java.util.List;

import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest
@ActiveProfiles("test")
class DatabaseStructureStaticAnalysisTest {

    @Autowired
    private List<DatabaseCheckOnHost<? extends DbObject>> checks;

    @Test
    void checksShouldWork() {
        assertThat(checks)
            .hasSameSizeAs(Diagnostic.values());

        checks.stream()
            .filter(DatabaseCheckOnHost::isStatic)
            .forEach(c -> assertThat(c.check())
                .as(c.getDiagnostic().name())
                .isEmpty());
    }
}
```
In this test, all available checks are injected as a list. Then, only static checks are filtered and executed on the real database deployed in a container with applied migrations.
Ideally, each check should return an empty list. If a new migration introduces any deviations, the test will fail, forcing the developer to pay attention and resolve the problem one way or another: either fix it in their migration or explicitly ignore it.
It is important to understand that pg-index-health, like any other static analyzer, may generate false positives. Additionally, some checks may not be relevant to your project. For example, it is considered good practice to document the database structure. PostgreSQL allows adding comments to almost all database objects. In a migration, this might look like the following:
```sql
create table if not exists demo.warehouse
(
    id bigint primary key generated always as identity,
    name varchar(255) not null
);

comment on table demo.warehouse is 'Information about the warehouses';
comment on column demo.warehouse.id is 'Unique identifier of the warehouse';
comment on column demo.warehouse.name is 'Human readable name of the warehouse';
```
Within your team, you might agree not to do this. In that case, the results of corresponding checks (TABLES_WITHOUT_DESCRIPTION, COLUMNS_WITHOUT_DESCRIPTION, FUNCTIONS_WITHOUT_DESCRIPTION) become irrelevant for you.
You can either completely exclude these checks:
```java
@Test
void checksShouldWork() {
    assertThat(checks)
        .hasSameSizeAs(Diagnostic.values());

    checks.stream()
        .filter(DatabaseCheckOnHost::isStatic)
        .filter(c -> c.getDiagnostic() != Diagnostic.TABLES_WITHOUT_DESCRIPTION &&
            c.getDiagnostic() != Diagnostic.COLUMNS_WITHOUT_DESCRIPTION)
        .forEach(c -> assertThat(c.check())
            .as(c.getDiagnostic().name())
            .isEmpty());
}
```
Or simply ignore their results:
```java
@Test
void checksShouldWork() {
    assertThat(checks)
        .hasSameSizeAs(Diagnostic.values());

    checks.stream()
        .filter(DatabaseCheckOnHost::isStatic)
        .forEach(c -> {
            final ListAssert<? extends DbObject> listAssert = assertThat(c.check())
                .as(c.getDiagnostic().name());
            switch (c.getDiagnostic()) {
                case TABLES_WITHOUT_DESCRIPTION, COLUMNS_WITHOUT_DESCRIPTION ->
                    listAssert.hasSizeGreaterThanOrEqualTo(0); // ignored
                default -> listAssert.isEmpty();
            }
        });
}
```
When introducing pg-index-health, you may often encounter situations where the database structure already has some deviations, but you do not want to address them immediately. At the same time, the check is relevant, and disabling it is not an option. In such cases, it is best to fix all deviations in the code:
```java
@Test
void checksShouldWorkForAdditionalSchema() {
    final PgContext ctx = PgContext.of("additional_schema");

    checks.stream()
        .filter(DatabaseCheckOnHost::isStatic)
        .forEach(c -> {
            final ListAssert<? extends DbObject> listAssert = assertThat(c.check(ctx))
                .as(c.getDiagnostic().name());
            switch (c.getDiagnostic()) {
                case TABLES_WITHOUT_DESCRIPTION, TABLES_NOT_LINKED_TO_OTHERS ->
                    listAssert.hasSize(1)
                        .asInstanceOf(list(Table.class))
                        .containsExactly(
                            Table.of(ctx, "additional_table")
                        );
                default -> listAssert.isEmpty();
            }
        });
}
```
Now, I would like to focus in more detail on the most frequently encountered issues.
Due to the specifics of the MVCC mechanism in PostgreSQL, situations like bloat can occur, where the size of your table (or index) rapidly grows due to a large number of dead tuples. This can happen, for example, as a result of long-running transactions or a one-time update of a large number of rows.
Garbage collection within the database is handled by the autovacuum process, but it does not free up the physical disk space occupied. The only way to effectively reduce the physical size of a table is by using the VACUUM FULL command, which requires an exclusive lock for the duration of the operation. For large tables, this can take several hours, making full vacuuming impractical for most modern services.
To address the issue of table bloat without downtime, third-party extensions like pg_repack are often used. One of the mandatory requirements of pg_repack is the presence of a primary key or some other uniqueness constraint on the target table. The TABLES_WITHOUT_PRIMARY_KEY diagnostic helps detect tables without primary keys and prevents maintenance problems in the future.
Below is an example of a table without a primary key. If bloat occurs in this table, pg_repack will be unable to process it and will return an error.
```sql
create table if not exists demo.payment
(
    id bigint not null, -- column is not marked as primary key
    order_id bigint references demo.orders (id),
    status int not null,
    created_at timestamp not null,
    payment_total decimal(22, 2) not null
);
```
Our databases operate on hosts with limited resources, and disk space is one of them. When using Database-as-a-Service solutions, there is often a physical limitation on the maximum database size that cannot be changed.
Each index in a table is a separate entity on the disk. It occupies space and requires resources for maintenance, which slows down data insertion and updates. We create indexes to speed up searches or ensure the uniqueness of certain values. However, improper use of indexes can lead to a situation where their combined size exceeds the size of the useful data in the table itself. Thus, the number of indexes in a table should be minimal yet sufficient for its functionality.
I have encountered numerous cases where unnecessary indexes are created in migrations. For example, PostgreSQL automatically creates an index to back every primary key. It is technically possible to additionally index the id column by hand, but doing so is completely pointless, as in the sketch below.
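A minimal sketch of this anti-pattern, reusing the demo.warehouse table from the earlier comment example (the index name here is hypothetical):

```sql
-- demo.warehouse.id is already covered by the index backing the primary key
create index if not exists idx_warehouse_id on demo.warehouse (id); -- pointless
```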
A similar situation arises with unique constraints. When you mark a column (or a group of columns) with the unique keyword, PostgreSQL automatically creates a unique index for that column (or group of columns). Creating an additional index manually is unnecessary and results in duplicated indexes. Such redundant indexes can and should be removed, and the DUPLICATED_INDEXES diagnostic helps identify them.
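A sketch with a hypothetical demo.client table (all names here are illustrative):

```sql
create table if not exists demo.client
(
    id bigint primary key generated always as identity,
    last_name varchar(255) not null,
    first_name varchar(255) not null,
    middle_name varchar(255),
    email varchar(255) not null unique -- implicitly creates a unique index
);

-- duplicates the index already created by the unique constraint above
create index if not exists idx_client_email on demo.client (email);
```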
Most indexes are created for a single column. When query optimization begins, more complex indexes may be added that span multiple columns. This leads to indexes on column sets such as (A), (A, B), and (A, B, C). The first two indexes in this series can often be discarded, since they are prefixes of the third (I recommend watching this video). Removing such redundant indexes can save significant disk space, and the INTERSECTED_INDEXES diagnostic is designed to detect these cases.
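Continuing the hypothetical demo.client table, the intersection might look like this:

```sql
create index if not exists idx_client_last_name
    on demo.client (last_name);

-- covers every query that idx_client_last_name covers
create index if not exists idx_client_last_name_first_name
    on demo.client (last_name, first_name);

-- covers both indexes above
create index if not exists idx_client_last_name_first_name_middle_name
    on demo.client (last_name, first_name, middle_name);
```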
PostgreSQL allows the creation of foreign key constraints without specifying a supporting index, meaning that a table referencing another does not require and will not automatically create an index. In some cases, this might not be a problem and may not manifest at all. However, sometimes it can lead to incidents in production.
Let's look at a small example (I’m using PostgreSQL 16.6):
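A sketch of the schema (the exact column sets are illustrative; only the id and order_id relationship matters here):

```sql
create table if not exists demo.orders
(
    id bigint primary key generated always as identity,
    user_id bigint not null,
    status int not null,
    created_at timestamp not null default current_timestamp
);

create table if not exists demo.order_item
(
    id bigint primary key generated always as identity,
    order_id bigint not null references demo.orders (id), -- no index!
    sku varchar(255) not null,
    price decimal(22, 2) not null,
    amount int not null
);
```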
We have an orders table and an order_item table. They are linked via a foreign key on the order_id column. The foreign key should always reference either the primary key or some unique constraint, which is satisfied in our case.
Let's populate the tables with data and gather statistics. We'll add 100,000 orders, with half having two items and the others having one.
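A sketch of the population script under the schema above (the exact values are arbitrary):

```sql
insert into demo.orders (user_id, status)
select gs.id, 1
from generate_series(1, 100000) as gs (id);

-- every order gets one item
insert into demo.order_item (order_id, sku, price, amount)
select o.id, 'sku-' || o.id::text, 100.00, 1
from demo.orders o;

-- every second order gets a second item
insert into demo.order_item (order_id, sku, price, amount)
select o.id, 'sku-extra-' || o.id::text, 50.00, 1
from demo.orders o
where o.id % 2 = 0;

-- refresh planner statistics
vacuum analyze demo.orders, demo.order_item;
```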
If we attempt to retrieve items for an order with ID=100, we should successfully return 2 rows. Since there is an index on the id column in the orders table, it might seem that this query should be fast.
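The lookup itself might be as simple as:

```sql
select oi.id, oi.sku, oi.amount
from demo.order_item oi
where oi.order_id = 100;
```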
However, if we try to profile this query, we will see sequential scanning of the table in the execution plan. We should also be concerned about the large number of pages that need to be read (Buffers parameter).
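You can reproduce this with explain; the analyze and buffers options show the actual execution and the number of pages touched:

```sql
explain (analyze, buffers)
select oi.id, oi.sku, oi.amount
from demo.order_item oi
where oi.order_id = 100;
```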
If we create an index for a column with a foreign key, the situation will return to normal:
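For example (the index name is an assumption of this sketch):

```sql
create index if not exists idx_order_item_order_id
    on demo.order_item (order_id);
```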
The sequential scan will disappear from the query plan, and the number of pages read will drop significantly.
The FOREIGN_KEYS_WITHOUT_INDEX diagnostic will allow you to catch such cases early during development, preventing performance issues.
It’s important to remember the issue of false positives: not all foreign key columns need to be indexed. Try to estimate the approximate table size in production; check your code for filtering, searching, or joining on the foreign key column. If you are 100% sure that you don’t need the index, you can simply add it to the exclusions. If you're unsure, it's better to create the index (it can always be removed later). I’ve often encountered incidents where the database "slowed down" due to the absence of an index on a foreign key, but I haven’t seen any incidents where the database "slowed down" because of the presence of such indexes. Therefore, I disagree with the point made in the Percona blog article that indexes on foreign keys should not be created from the start. That is a DBA's approach. Do you have a dedicated DBA on your team?
By default, PostgreSQL includes null values in btree indexes, but they are usually not needed there. All null values are unique, and you cannot simply fetch a record where the column value is null. Most of the time, it is better to exclude nulls from the index by creating a partial index on the nullable column with a where <column> is not null clause. The INDEXES_WITH_NULL_VALUES diagnostic helps detect such cases.
Let's consider an example with orders and order_items. The order_item table has a nullable column warehouse_id, which represents the warehouse ID.
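A sketch of such a migration, assuming the demo.warehouse table from earlier:

```sql
alter table demo.order_item
    add column warehouse_id bigint references demo.warehouse (id);
```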
Suppose we have several warehouses. After the order is paid, we begin to assemble it. We will update the status of some orders and mark them as paid.
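For instance (the status code 2 meaning "paid" is an assumption of this sketch):

```sql
update demo.orders
set status = 2 -- paid
where id % 3 = 0; -- an arbitrary subset of orders
```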
Individual items in an order may be shipped from different warehouses according to an internal algorithm, considering logistics, stock, warehouse load, etc. After assigning the warehouse and updating the stock, we update the warehouse_id field for each item in the order (which was initially null).
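A sketch, assuming two warehouses with ids 1 and 2:

```sql
update demo.order_item oi
set warehouse_id = case when oi.id % 2 = 0 then 1 else 2 end
from demo.orders o
where o.id = oi.order_id
    and o.status = 2; -- only paid orders
```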
We will need to search by a specific warehouse ID to know which items need to be completed and shipped. We take only paid orders for a certain time range.
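The search query might look like this (the status code and time window are assumptions):

```sql
select oi.id, oi.order_id, oi.sku, oi.amount
from demo.order_item oi
inner join demo.orders o on o.id = oi.order_id
where oi.warehouse_id = 1
    and o.status = 2
    and o.created_at >= current_timestamp - interval '7 days';
```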
The first solution would likely be a regular index on the warehouse_id column:
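Its name here matches the size comparison at the end of this section:

```sql
create index if not exists idx_order_item_warehouse_id
    on demo.order_item (warehouse_id);
```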
If we create such an index, it will be used without issues when searching for items for a specific warehouse. It might seem that this index should allow efficiently finding all items where the warehouse is not yet assigned, filtering records with the condition warehouse_id is null.
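Something along these lines:

```sql
select oi.id, oi.order_id, oi.sku
from demo.order_item oi
where oi.warehouse_id is null;
```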
However, if we look at the query execution plan, we will see sequential access there — the index is not used.
Of course, this is related to the specific distribution of data in the test database. The warehouse_id column has low cardinality, meaning the number of unique values in it is low, so an index on this column has low selectivity. Index selectivity is the ratio of the number of distinct indexed values (i.e., the cardinality) to the total number of rows in the table: selectivity = distinct / count(*). For example, a unique index has a selectivity of one.
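As a quick sketch, the selectivity of a column can be estimated directly:

```sql
-- selectivity = distinct values / total rows
select count(distinct warehouse_id)::numeric / count(*) as selectivity
from demo.order_item;
```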
We can increase the selectivity of the index by removing null values and creating a partial index on the warehouse_id column.
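Again, the index name matches the size comparison below:

```sql
create index if not exists idx_order_item_warehouse_id_without_nulls
    on demo.order_item (warehouse_id)
    where warehouse_id is not null;
```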
If we profile the search query again, we will immediately see this index in the plan instead of the sequential scan.
If we compare the sizes of the indexes, we will see a significant difference. The partial index is much smaller and will be updated less frequently. Using this index, we save disk space and improve performance.
Query to get the size of the indexes
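A sketch using the pg_stat_user_indexes system view; the column aliases match the output below:

```sql
select ui.relid::regclass::text as table_name,
    ui.indexrelid::regclass::text as index_name,
    pg_relation_size(ui.indexrelid) as index_size_bytes
from pg_catalog.pg_stat_user_indexes ui
where ui.schemaname = 'demo'
order by table_name, index_name;
```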
| table_name | index_name | index_size_bytes |
|---|---|---|
| demo.order_item | demo.idx_order_item_warehouse_id | 1056768 |
| demo.order_item | demo.idx_order_item_warehouse_id_without_nulls | 16384 |
These are far from all the issues that pg-index-health can detect. The full list of diagnostics is available in the README of the project on GitHub, and it is regularly expanded.
Integrating pg-index-health into a Spring Boot application is quite simple. The overhead for running the checks is minimal. As a result, you will gain protection from common errors and issues. I encourage you to try implementing it!
In the near future, I plan to add full support for partitioned tables in all checks. Currently, this is implemented only for 11 out of 25 checks. I also want to expand the number of checks: there are already tickets for implementing at least 5 new checks. Additionally, in 2025, I plan to switch to Java 17 and Spring Boot 3.