
Why database indexes can do more harm than good

Database indexes accelerate queries, but they can slow down write operations significantly, consume memory, and push the optimizer toward unfavorable plans. I will show exactly where indexes fail, how typical MySQL indexing pitfalls arise, and how I keep database performance and hosting tuning in balance.

Key points

The following bullet points summarize the most important risks and countermeasures.

  • Write load: Each additional index increases the cost of INSERT/UPDATE/DELETE operations.
  • Over-indexing: Too many indexes bloat memory and complicate optimizer decisions.
  • Cardinality: Indexes on low-cardinality columns offer little benefit and a lot of overhead.
  • Column order: Composite indexes only help when queries match the column order.
  • Monitoring: Measure, evaluate, and remove unused indexes continuously.

Why indexes slow things down instead of speeding them up

I consider indexes a trade-off: they save read time, but they cost work on every data mutation. Under write-intensive workloads this overhead adds up quickly, because the engine has to maintain the index trees on every change. Many developers underestimate this until latencies climb and timeouts appear. Too many options also tempt the optimizer into suboptimal plans, a classic source of MySQL indexing pitfalls. If you really want to control database performance, you have to weigh the benefit and cost of each index soberly.

Write operations: the real bottleneck

Each index adds overhead to INSERT, UPDATE, and DELETE. I have seen bulk loads that finish in 10–15 seconds without indexes but take almost two minutes with several secondary indexes in place. This difference eats into throughput in log and event systems, e-commerce checkouts, and mass imports. For nightly loads I therefore often disable secondary indexes, import, and then rebuild them selectively, as sketched below. This practice only saves time if I know exactly which indexes are actually needed afterwards.
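A minimal sketch of this pattern, assuming a hypothetical events table with two secondary indexes (all table, index, and file names are placeholders):

    -- Drop secondary indexes before the bulk load ...
    ALTER TABLE events
      DROP INDEX idx_user_id,
      DROP INDEX idx_event_type;

    -- ... import the data ...
    LOAD DATA INFILE '/tmp/events.csv' INTO TABLE events
      FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

    -- ... and rebuild only the indexes that are demonstrably needed.
    ALTER TABLE events
      ADD INDEX idx_user_id (user_id),
      ADD INDEX idx_event_type (event_type);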

Over-indexing and memory load

Memory requirements often remain invisible until the buffer pool becomes too small and IOPS skyrocket. String columns inflate index size in particular, because keys and length information must be stored. The result: more page reads, more cache pressure, and ultimately more latency. I therefore regularly check which indexes queries actually use and which only look useful in theory. If you want to dig deeper, see my guide "Optimize SQL database: Practical steps for lean structures."
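One way to find removal candidates is the sys schema that ships with MySQL 5.7+, which aggregates performance_schema usage data. The schema name here is a placeholder:

    -- Indexes that have not been used since server start.
    SELECT * FROM sys.schema_unused_indexes
    WHERE object_schema = 'mydb';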

Incorrect indexes: low cardinality and rare filters

An index on a column with a cardinality of 2, such as status = {active, inactive}, is of little use. The engine still ends up reading many pages, updates become more expensive, and there are no real gains. The same applies to columns that never appear in WHERE, JOIN, or ORDER BY. I often see attributes indexed "just to be on the safe side" that never speed up a single query. Better: only index where filters are real and frequent.
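Before adding an index, I check selectivity: values near 1 favor an index, values near 0 argue against it. A quick check, assuming a hypothetical orders table:

    SELECT COUNT(DISTINCT status) AS distinct_values,
           COUNT(DISTINCT status) / COUNT(*) AS selectivity
    FROM orders;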

Composite indexes: order matters

For multi-column indexes, the column order determines effectiveness. An index on (col1, col2) only helps if queries filter on col1; filters on col2 alone ignore it. This leftmost-prefix rule creates false expectations, even when the index looks logical on paper. In addition, a single index on A often survives next to a composite index on (A, B), which is redundant because the composite covers the single index. I consistently remove such duplication to cut costs.
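The leftmost-prefix rule is easy to verify with EXPLAIN; t, col1, and col2 are placeholders:

    CREATE INDEX idx_col1_col2 ON t (col1, col2);

    EXPLAIN SELECT * FROM t WHERE col1 = 10 AND col2 = 20;  -- can use the index
    EXPLAIN SELECT * FROM t WHERE col2 = 20;                -- cannot: col1 is not constrained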

Clustered index and primary key: width, locality, cost

InnoDB physically stores table data ordered by the primary key (the clustered index). This choice affects several cost factors at once: write locality, fragmentation, and the size of all secondary indexes, because every secondary index entry stores the primary key as its row reference. A wide, text-heavy, or composite primary key is therefore duplicated in every secondary index, and the wasted memory costs performance. I prefer a narrow, monotonically growing surrogate key (BIGINT) over natural, wide keys: secondary indexes stay compact, page splits decrease, and cache hit rates improve.
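A sketch of that layout with hypothetical column names; the UNIQUE KEY keeps the natural business key enforceable while all secondary indexes reference only the 8-byte surrogate:

    CREATE TABLE orders (
      id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      order_no    CHAR(12)        NOT NULL,  -- natural business key
      customer_id BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (id),                      -- narrow clustered index
      UNIQUE KEY uq_order_no (order_no),
      KEY idx_customer (customer_id)         -- stores (customer_id, id) per entry
    ) ENGINE=InnoDB;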

UUID vs. AUTO_INCREMENT: insert locality under control

Random keys such as classic UUIDv4 scatter insertions across the entire B-tree. The result is frequent page splits, fewer contiguous writes, and higher latency jitter; at high write rates this quickly becomes a problem. If you need UUIDs, prefer time-sortable variants (e.g., monotonic sequences, UUIDv7, or ULID) and store them compactly as BINARY(16). In many cases an AUTO_INCREMENT key plus an additional unique business key is the more robust choice: inserts land at the end of the tree, change buffer hits increase, and replication stays stable.
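MySQL 8.0 can at least restore locality for its own version-1 UUIDs: UUID_TO_BIN() with the swap flag moves the time components to the front before storing. A sketch with a hypothetical table:

    CREATE TABLE sessions (
      id         BINARY(16) NOT NULL PRIMARY KEY,
      created_at TIMESTAMP  NOT NULL DEFAULT CURRENT_TIMESTAMP
    ) ENGINE=InnoDB;

    -- The swap flag (1) reorders the v1 time bits so inserts stay roughly sequential.
    INSERT INTO sessions (id) VALUES (UUID_TO_BIN(UUID(), 1));
    SELECT BIN_TO_UUID(id, 1) AS id FROM sessions;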

Query Optimizer: why too many options are harmful

Too many indexes enlarge the optimizer's search space. For every query it must decide whether an index or a full table scan is cheaper, and stale statistics can tip the plan toward an expensive strategy. I therefore keep the index set small and the statistics fresh so that the cost model stays accurate. Less freedom of choice often means more stable runtimes.

ORDER BY, LIMIT, and filesort: making sorting indexable

Many queries stumble over sorting: ORDER BY with LIMIT looks harmless but can trigger expensive filesorts. I build indexes so that filter and sort order match: (user_id, created_at DESC) answers "last N events per user" without an extra sort step. MySQL 8.0 supports descending indexes, which matters for predominantly descending timestamps. The more of the sort the index covers, the less work remains for the executor.
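The pattern as a sketch against a hypothetical events table; once the index exists, EXPLAIN should no longer report "Using filesort":

    CREATE INDEX idx_user_created ON events (user_id, created_at DESC);

    EXPLAIN SELECT id, created_at
    FROM events
    WHERE user_id = 42
    ORDER BY created_at DESC
    LIMIT 10;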

Functional and prefix indexes: used correctly

Functions applied to columns render ordinary indexes ineffective. In MySQL 8.0 I therefore use functional indexes or generated columns: instead of filtering with WHERE LOWER(email) = ?, I index the normalized form, which is stable and predictable. For very long VARCHARs, prefix indexes help, e.g., (hash, title(32)), but only if the prefix length provides sufficient selectivity. I check random samples for collisions before relying on prefixes.
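Both variants as a sketch against a hypothetical users table; the functional index form requires MySQL 8.0.13+:

    -- Variant 1: functional index directly on the expression.
    CREATE INDEX idx_email_lower ON users ((LOWER(email)));

    -- Variant 2: stored generated column, indexable since MySQL 5.7.
    ALTER TABLE users
      ADD COLUMN email_norm VARCHAR(255)
        GENERATED ALWAYS AS (LOWER(email)) STORED,
      ADD INDEX idx_email_norm (email_norm);

    -- The query must repeat the indexed expression (or use the generated column).
    SELECT id FROM users WHERE LOWER(email) = 'a@example.com';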

JOINs, functions, and unused indexes

JOINs need indexes on the join keys on both sides, but too many indexes on the same columns slow updates dramatically. Functions such as UPPER(col) or CAST on indexed columns disable the index and force scans; I replace such constructs with normalized or additional persistent columns that I index appropriately. Low-cardinality joins also drag, because too many rows share the same key values. I verify actual index usage with EXPLAIN.
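A quick check with hypothetical tables; the key column of the EXPLAIN output shows whether each side of the join actually uses its index:

    EXPLAIN
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at >= '2024-01-01';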

Partitioning: pruning yes, overhead no

Partitioning can reduce scans if the partitioning column matches the most common filters. Each partition carries its own indexes, so too many small partitions increase administrative overhead and metadata costs. I make sure that partition pruning actually takes effect and that no more partitions than necessary are touched. For time series, periodic partitions that can be dropped in rotation have proven useful; even then, I keep the per-partition index landscape lean.
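A rotation-friendly layout as a sketch with hypothetical names. Note that InnoDB requires the partitioning column in every unique key, hence the composite primary key:

    CREATE TABLE metrics (
      id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      recorded_at DATETIME        NOT NULL,
      value       DOUBLE          NOT NULL,
      PRIMARY KEY (id, recorded_at)
    )
    PARTITION BY RANGE (TO_DAYS(recorded_at)) (
      PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
      PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
      PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- Pruning check: EXPLAIN's "partitions" column should list only p2024_02.
    EXPLAIN SELECT AVG(value) FROM metrics
    WHERE recorded_at >= '2024-02-01' AND recorded_at < '2024-03-01';

    -- Rotation: dropping a whole partition is far cheaper than a DELETE.
    ALTER TABLE metrics DROP PARTITION p2024_01;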

Locking, deadlocks, and index selection

Under REPEATABLE READ, InnoDB takes next-key locks. Broad range filters without a matching index enlarge the locked ranges, raise the likelihood of conflicts, and provoke deadlocks. A precise index that matches the WHERE clause exactly shortens the locked ranges and stabilizes transactions. The order of write accesses and the consistency of query plans across competing transactions also matter; fewer, better-fitting indexes help because they make the access pattern more deterministic.

Fragmentation, maintenance, and hosting tuning

Many indexes make maintenance noticeably heavier: ANALYZE/OPTIMIZE run longer, and rebuilds tie up resources. On shared or multi-tenant hosts this directly hits CPU and I/O. I deliberately plan maintenance windows and reduce the number of indexes before major operations. Measure first, then act: this keeps maintenance itself from becoming a burden. I describe further tuning ideas in "Optimize MySQL performance," with a focus on cache- and memory-side knobs.

Online DDL and rollout strategies

Index changes in production need clean deployments. Where possible, I use ALGORITHM=INSTANT/INPLACE to minimize locking; older versions fall back to COPY. Index rebuilds are I/O-intensive and inflate redo/undo traffic, so I throttle the operation, schedule it outside peak hours, or build the index on a replica first and then switch over. Important: make schema changes in small steps, watch latencies, and keep a clear rollback path.
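The in-place variant as a sketch; because ALGORITHM and LOCK are stated explicitly, MySQL rejects the statement if it would have to fall back to a blocking table copy:

    ALTER TABLE orders
      ADD INDEX idx_status_created (status, created_at),
      ALGORITHM=INPLACE, LOCK=NONE;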

Replication and index costs

Each additional index raises costs not only on the primary but also on the replicas: the SQL thread applies the same writes and pays the same price. During extensive backfills or index builds, replicas can fall significantly behind. I therefore plan index work replica-first, watch the lag, and keep buffer capacity (IOPS, CPU) in reserve. If you run binlog-based backfills, pay attention to ordering: change the data first and add indexes afterwards, or the other way around, depending on the workload.
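To keep an eye on the lag, I check the replica status before, during, and after such work; the command was renamed in MySQL 8.0.22:

    SHOW REPLICA STATUS\G   -- before 8.0.22: SHOW SLAVE STATUS\G
    -- Relevant field: Seconds_Behind_Source (formerly Seconds_Behind_Master).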

Statistics, histograms, and plan stability

The optimizer stands or falls with its statistics. I refresh stats regularly (ANALYZE) and use histograms for skewed distributions to make selectivity estimates more realistic, especially on columns that are filtered but not indexed. I reduce plan flutter by removing redundant options and deliberately increasing cardinality (e.g., through finer normalization instead of catch-all fields). The goal is a robust, reproducible cost model.
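Both steps in MySQL 8.0 syntax, with hypothetical table and column names; histograms live in the data dictionary and do not require an index on the column:

    ANALYZE TABLE orders;  -- refresh index statistics

    ANALYZE TABLE orders
      UPDATE HISTOGRAM ON status, country WITH 64 BUCKETS;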

Test figures and table: what really happens

Concrete measurements illustrate the trade-off. A bulk insert of one million rows completes in about 10–15 seconds without indexes; with many secondary indexes it takes almost two minutes. SELECT queries benefit from smart indexes but quickly reach a plateau where additional indexes add little. The net effect: read latency shrinks only marginally while write throughput drops sharply. The following table summarizes typical observations.

Scenario                      SELECT p95   INSERT throughput   Index memory   Maintenance time/day
Without secondary indexes     ~250 ms      ~60,000 rows/s      ~0 GB          ~1–2 min
5 targeted indexes            ~15 ms       ~25,000 rows/s      ~1.5 GB        ~6–8 min
12 indexes (over-indexing)    ~12 ms       ~8,000 rows/s       ~5.2 GB        ~25–30 min

These figures vary with data distribution, hardware, and query profile. The trend, however, is stable: more indexes sharply reduce insert throughput while the read gain flattens out. I therefore decide based on data and remove anything without a clear, measurable effect. That keeps latency under control and both mind and budget clear.

Targeted use of covering indexes

A covering index that contains all required columns saves table page accesses and reduces I/O. Example: SELECT first_name, last_name ... WHERE customer_id = ? benefits from an index on (customer_id, first_name, last_name); the index then acts like a column-level data cache. At the same time, I drop the single index on customer_id once it has become redundant. Fewer structures, same speed: less maintenance and less storage.
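The example as a sketch, assuming a hypothetical contacts table in which customer_id is not the primary key; the Extra column of EXPLAIN should show "Using index":

    CREATE INDEX idx_customer_names ON contacts (customer_id, first_name, last_name);

    EXPLAIN SELECT first_name, last_name
    FROM contacts
    WHERE customer_id = 123;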

Monitoring and configuration: pragmatic steps

I start with EXPLAIN and EXPLAIN ANALYZE (MySQL 8.0.18+) and watch the slow query log. SHOW INDEX FROM table_name reveals unused or redundant structures. I then adjust innodb_buffer_pool_size, log file sizes, and the flush strategy so that hot indexes stay in memory. Time-series metric tools help keep CPU, IOPS, and latencies in view. For high loads, this guide is worth a look: Database optimization under high load.
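A starting set of commands; the table name and threshold are placeholders, and SET GLOBAL changes should be persisted in my.cnf for production:

    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;  -- actual vs. estimated cost
    SHOW INDEX FROM orders;                                        -- per-index cardinality estimates

    SET GLOBAL slow_query_log = ON;    -- capture slow statements
    SET GLOBAL long_query_time = 0.5;  -- threshold in seconds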

Briefly summarized

I use indexes deliberately and sparingly, because balance is what counts: read speed, yes, but not at any price. I remove indexes on low-cardinality columns, rarely used filters, and wrongly ordered composite indexes. Every structure must prove a clear benefit, otherwise it goes. Measurements before and after each change prevent gut decisions and bad investments. If you prioritize database performance and hosting tuning properly, you avoid MySQL indexing pitfalls and keep latency, throughput, and costs in balance.
