SQL Query Optimization: EXPLAIN Plans, Indexes, and Common Pitfalls

Why SQL Query Optimization Matters
Microsoft’s SQL Server 2022 introduced improvements in the cardinality estimator and parameter-sensitive plan (PSP) optimization, reflecting a growing industry focus on smarter query planning (RedmondMag). However, regardless of database advancements, inefficient SQL can severely impact application performance at scale. In production systems, queries that are not properly optimized may lead to latency spikes, excessive CPU usage, and even system downtime—especially as tables grow from thousands to millions of rows.
- Real-world impact: Bad execution plans (for example, full table scans on large tables) can slow queries by 10–100x, directly affecting user experience and system resources.
- What most devs miss: Developers often overlook issues like indexes not being used, incorrect join order, and misestimated row counts—even when queries perform adequately on smaller, test datasets.
- EXPLAIN plans reveal what’s really happening: They show which steps in a query plan are slow, where row count estimates are incorrect, and whether indexes are being used as intended.
For example, suppose a report query that runs in seconds during development suddenly takes minutes in production. Investigation with an EXPLAIN plan might reveal that a missing index is forcing a full table scan over millions of rows, drastically increasing execution time.
Understanding why query optimization matters is the first step. Next, let’s look at how you can use EXPLAIN plans to diagnose SQL performance.
EXPLAIN Plans in Real Life
The EXPLAIN (or EXPLAIN ANALYZE) command displays the execution plan for a query: the detailed steps the database engine will perform, including the order of operations, access methods, estimated costs, and—if using ANALYZE—actual runtime statistics.
-- PostgreSQL example: Analyze how a query will execute
EXPLAIN ANALYZE
SELECT customer_id, order_total
FROM orders
WHERE order_date > '2025-01-01';
-- Output (abridged):
-- Seq Scan on orders (cost=0.00..431.00 rows=1000 width=16)
-- (actual time=0.020..10.500 rows=1200 loops=1)
-- Filter: (order_date > '2025-01-01'::date)
-- Rows Removed by Filter: 8000
-- Planning Time: 0.1 ms
-- Execution Time: 10.6 ms
Key terms explained:
- Seq Scan: Short for “sequential scan,” meaning the database is reading every row in the table to find matches. This is efficient only for small tables or queries returning most rows, but slow for large tables with selective filters.
- Rows Removed by Filter: Number of rows that did not match the filter condition. A high value suggests many rows were read unnecessarily, indicating a potential for optimization.
- Actual vs. Estimated Rows: The plan shows both how many rows the planner expected to find and how many were actually processed. Significant differences suggest that table statistics are outdated, which can result in poor execution plans. Running ANALYZE updates these statistics.
For example, if you see a Seq Scan on a table with millions of rows and only a few are needed, it’s a sign that an index could help. Conversely, if EXPLAIN shows an index scan but the estimated and actual row counts are very different, updating statistics may be necessary.
During development, use EXPLAIN to check if your intended indexes are being considered. Before production rollout, use EXPLAIN ANALYZE on staging data to verify actual performance. For a deeper look, see our in-depth guide.
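This kind of check can even be automated during development. The sketch below uses SQLite through Python's sqlite3 module so it is self-contained; the helper name and table layout are illustrative, and PostgreSQL's EXPLAIN output has a different format, but the idea of scanning a plan for full table scans carries over.

```python
import sqlite3

def full_scan_tables(conn, query):
    """Return table names that EXPLAIN QUERY PLAN reports as full scans."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    scans = []
    for *_, detail in rows:
        # SQLite reports full scans as "SCAN <table>" (older versions:
        # "SCAN TABLE <table>"); index lookups appear as "SEARCH ...".
        if detail.startswith("SCAN"):
            scans.append(detail.split()[-1])
    return scans

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_total REAL, order_date TEXT)")

query = ("SELECT customer_id, order_total FROM orders "
         "WHERE order_date > '2025-01-01'")
print(full_scan_tables(conn, query))   # no index yet, so orders is fully scanned

conn.execute("CREATE INDEX idx_orders_order_date ON orders(order_date)")
print(full_scan_tables(conn, query))   # the index lookup removes the full scan
```

A helper like this can run in a test suite to fail the build when a hot query regresses to a full scan.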

Indexing Strategies and Performance
Indexes are data structures that allow the database to find rows much faster than scanning the entire table. Proper indexing is the most direct way to speed up queries, but indiscriminate indexing can hurt performance. Each additional index increases the time and resources required for data modifications (INSERT, UPDATE, DELETE) and consumes extra storage.
The best practice is to index columns that are:
- High-cardinality: Columns with many unique values (such as user IDs, timestamps, or order numbers). High-cardinality indexes are selective, making searches efficient.
- Frequently queried: Columns often used in WHERE, JOIN, or ORDER BY clauses. Indexes speed up searches, joins, and sorting on these fields.
For example, if your application often retrieves orders by date, creating an index on order_date will make those queries much faster.
-- Create a useful index for order_date queries
CREATE INDEX idx_orders_order_date ON orders(order_date);
ANALYZE orders; -- Always analyze after major data changes or index creation
-- After indexing, the same query uses an Index Scan:
EXPLAIN ANALYZE
SELECT customer_id, order_total
FROM orders
WHERE order_date > '2025-01-01';
-- Output (abridged):
-- Index Scan using idx_orders_order_date on orders
-- (cost=0.42..37.08 rows=900 width=16)
-- (actual time=0.05..6.10 rows=900 loops=1)
-- Execution Time: 6.2 ms
With the index in place, the database can jump directly to the relevant rows, drastically reducing query execution time. Notice the shift from a sequential scan to an index scan and the drop in execution time.
Indexing tips:
- Avoid indexing low-cardinality columns: Columns with few unique values (e.g., a status field with only a handful of possible values) do not benefit much from indexing and add unnecessary write overhead.
- Composite indexes: These are indexes on multiple columns. They are only effective if your WHERE clause filters on the leftmost column(s) of the index.
- Remove redundant indexes: Overlapping or unused indexes slow down write operations without improving read performance. Regularly audit your schema to drop such indexes.
- Foreign key columns: Most databases do not automatically create indexes on foreign key columns. If you often join or filter on these columns, adding indexes can significantly speed up those operations.
-- Example: Only useful if WHERE starts with customer_id
CREATE INDEX idx_comp ON orders (customer_id, order_date);
-- Good: WHERE customer_id = ? AND order_date = ?
-- Bad: WHERE order_date = ? -- Index won't be used efficiently
For example, if you create a composite index on (customer_id, order_date) but write queries that only filter by order_date, the index will not be used efficiently. To benefit from a composite index, always ensure your query’s WHERE clause includes the leftmost column.
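The leftmost-prefix rule is easy to verify empirically. This sketch uses SQLite via Python's sqlite3 (B-tree composite indexes in PostgreSQL follow the same rule); the plan() helper and table layout are illustrative.

```python
import sqlite3

def plan(conn, query):
    """Concatenate the detail column of each EXPLAIN QUERY PLAN step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " | ".join(detail for *_, detail in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_date TEXT, order_total REAL)")
conn.execute("CREATE INDEX idx_comp ON orders (customer_id, order_date)")

# Filter includes the leftmost column: the composite index is usable.
print(plan(conn, "SELECT order_total FROM orders "
                 "WHERE customer_id = 7 AND order_date = '2025-01-01'"))
# Filter skips the leftmost column: falls back to a full table scan.
print(plan(conn, "SELECT order_total FROM orders "
                 "WHERE order_date = '2025-01-01'"))
```

The first plan reports a SEARCH using idx_comp, while the second reports a plain SCAN of orders.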
With indexing best practices established, let’s examine the most common mistakes and troubleshooting steps.
Common Pitfalls and Troubleshooting
Despite experience, developers and DBAs frequently encounter recurring mistakes that degrade SQL performance. Awareness of these pitfalls helps you avoid wasted time during troubleshooting (see our troubleshooting guide):
- Using SELECT *: Fetching all columns in a query retrieves unnecessary data, increases I/O, and can break client code if columns change. Always list only required columns.
Example:
-- BAD: retrieves all columns, even unused ones
SELECT * FROM orders;
-- GOOD: fetches only needed columns
SELECT customer_id, order_total FROM orders;
- Functions in WHERE clauses: Applying a function (such as YEAR() or LOWER()) to an indexed column disables index usage, forcing a full table scan.
-- BAD: prevents index usage
SELECT * FROM employees WHERE YEAR(joining_date) = 2022;
-- GOOD: use a range filter instead
SELECT * FROM employees
WHERE joining_date >= '2022-01-01' AND joining_date < '2023-01-01';
In this example, the first query cannot use an index on joining_date, but the second query can.
- Wildcard at start of LIKE pattern: Placing a wildcard (%) at the beginning of a LIKE pattern disables index usage, resulting in a full scan.
-- BAD: disables index, triggers table scan
SELECT * FROM users WHERE name LIKE '%john';
-- GOOD: allows index use
SELECT * FROM users WHERE name LIKE 'john%';
Use wildcards only at the end of the pattern to allow index usage.
- Missing or outdated statistics: Table statistics inform the query planner about data distribution. If statistics are stale (e.g., after bulk loads), the planner may choose inefficient plans. Running ANALYZE updates these statistics.
- Over-indexing: Too many indexes slow down data modifications. Regularly audit and remove indexes that are no longer needed, especially after schema or query changes.
- IN vs. EXISTS: For large subqueries, EXISTS is often faster because it returns as soon as a match is found, unlike IN, which may build a full result set. See the Stack Overflow discussion.
-- Faster for large subqueries
SELECT name FROM customers
WHERE EXISTS (
  SELECT 1 FROM orders
  WHERE orders.customer_id = customers.customer_id
);
This approach is especially efficient for correlated subqueries with large tables.
- Join order surprises: The query planner chooses join algorithms (nested loop, hash join, merge join) based on estimated row counts and indexes. Nested loops are fast for small result sets but can be slow for large joins. Ensuring that join keys are indexed helps the planner choose the most efficient algorithm.
Example:
- If joining two large tables, verify that both tables' join columns are indexed.
- Check the EXPLAIN plan for the join type to identify performance issues.
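Several of these pitfalls are easy to reproduce locally. The sketch below demonstrates the function-in-WHERE pitfall with SQLite via Python's sqlite3 (substr stands in for YEAR(), which SQLite lacks); PostgreSQL behaves the same way with functions over indexed columns unless you add an expression index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, joining_date TEXT)")
conn.execute("CREATE INDEX idx_joining ON employees(joining_date)")

def plan(query):
    """Return the detail column of the first EXPLAIN QUERY PLAN step."""
    return conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1]

# BAD: wrapping the indexed column in a function hides it from the index.
bad = plan("SELECT name FROM employees WHERE substr(joining_date, 1, 4) = '2022'")
# GOOD: an equivalent range filter keeps the index usable.
good = plan("SELECT name FROM employees "
            "WHERE joining_date >= '2022-01-01' AND joining_date < '2023-01-01'")
print("bad: ", bad)   # reports a SCAN of employees
print("good:", good)  # reports a SEARCH using idx_joining
```

The same rewrite, a range filter instead of a function call, is what restores index usage in the YEAR() example above.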
Now that we’ve covered the most frequent pitfalls, let’s compare index scans and sequential scans in practice.
Comparison Table: Index Scan vs. Sequential Scan
| Scenario | Estimated Rows | Actual Rows | Execution Time (ms) | Scan Type | Source |
|---|---|---|---|---|---|
| Without Index | 800 | 900 | 121 | Sequential Scan | SesameDisk |
| With Index | 900 | 900 | 6 | Index Scan | SesameDisk |
This table illustrates the dramatic difference in performance between a sequential scan and an index scan. The indexed query not only matches the expected row count but also completes in a fraction of the time. Keeping indexes optimized and statistics current can reduce query execution times by 10–20x for common scenarios.
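You can reproduce a gap like this yourself. The benchmark below is a rough sketch using SQLite via Python's sqlite3 with 200,000 synthetic rows; absolute timings are machine-dependent and SQLite's numbers will not match the PostgreSQL figures above, but the before/after ratio makes the same point.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_total REAL, order_date TEXT)")
# Load synthetic rows spread across twelve monthly dates.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i % 1000, i * 0.5, f"2025-{(i % 12) + 1:02d}-01") for i in range(200_000)],
)

query = "SELECT COUNT(*) FROM orders WHERE order_date = '2025-06-01'"

def timed(q):
    """Run a query and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = conn.execute(q).fetchone()[0]
    return result, time.perf_counter() - start

before_count, before_time = timed(query)          # full table scan
conn.execute("CREATE INDEX idx_date ON orders(order_date)")
after_count, after_time = timed(query)            # index lookup

print(f"scan: {before_time * 1000:.1f} ms, index: {after_time * 1000:.1f} ms")
```

Both runs return the same count; only the access path, and therefore the elapsed time, changes.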
With these comparisons in mind, let’s summarize the most important lessons for day-to-day query optimization.
Key Takeaways
- EXPLAIN ANALYZE is indispensable for real-world SQL performance tuning—always measure, don't guess. Use it routinely, not just for troubleshooting.
- Compare estimated vs. actual rows in your plan output. Large mismatches indicate stale statistics or missing indexes and should be addressed promptly.
- Indexes on filter and join columns are the most powerful optimization—benchmark queries with and without indexes before making schema changes.
- Update statistics with ANALYZE after big data loads, deletes, or index creation to keep the query planner accurate.
- Avoid functions on indexed columns, or use expression indexes if your database supports them, to maintain index efficiency.
- Audit and prune indexes regularly to prevent storage bloat and slow write operations.
- For advanced workloads (such as large filtered Top K queries or full-text search), explore partial or composite indexes, and where appropriate, consider specialized engines like ParadeDB (ParadeDB Engineering Blog).
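As a small taste of the partial indexes mentioned above, the sketch below uses SQLite via Python's sqlite3, which supports CREATE INDEX ... WHERE; the planner only considers the index when the query's filter implies the index's predicate. Table and index names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, status TEXT, order_total REAL)")
# Partial index: only covers the (presumably small) slice of pending orders.
conn.execute("CREATE INDEX idx_pending ON orders(customer_id) WHERE status = 'pending'")

def plan(query):
    """Return the detail column of the first EXPLAIN QUERY PLAN step."""
    return conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1]

# The query's filter implies the index predicate, so the partial index applies.
print(plan("SELECT order_total FROM orders "
           "WHERE customer_id = 5 AND status = 'pending'"))
# A different status value does not, so this falls back to a full scan.
print(plan("SELECT order_total FROM orders "
           "WHERE customer_id = 5 AND status = 'shipped'"))
```

PostgreSQL supports the same CREATE INDEX ... WHERE syntax, with the same implication rule.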
Let’s wrap up with resources for further study and practical application.
Further Reading
- SQL Query Optimizations – GeeksforGeeks
- 7 SQL Indexing Rules That Cut Query Time by 90% | AI2SQL
- Optimizing Top K in PostgreSQL: Techniques and Limitations
- A Comprehensive Guide to Understanding Query Execution Plans | Acceldata
- SQL Server Execution Plan Overview | Microsoft Learn
- PostgreSQL Query Optimization: EXPLAIN ANALYZE Deep Dive
- Advanced SQL Query Optimization and Troubleshooting
Mastering execution plans, indexing, and query rewriting isn’t just for DBAs—it’s a career-accelerating skill for every backend developer. Make reviewing EXPLAIN plans a habit, not a last resort after a production incident.
Thomas A. Anderson
Mass-produced in late 2022, upgraded frequently. Has opinions about Kubernetes that he formed in roughly 0.3 seconds. Occasionally flops — but don't we all? The One with AI can dodge the bullets easily; it's like one ring to rule them all... sort of...
