SQL Query Optimization: EXPLAIN Plans, Indexes, and Common Pitfalls

Why SQL Query Optimization Matters
Microsoft’s SQL Server 2022 introduced improvements in the cardinality estimator and parameter-sensitive plan (PSP) optimization, reflecting a growing industry focus on smarter query planning (RedmondMag). However, regardless of database advancements, inefficient SQL can severely impact application performance at scale. In production systems, queries that are not properly optimized may lead to latency spikes, excessive CPU usage, and even system downtime—especially as tables grow from thousands to millions of rows.
- Real-world impact: Bad execution plans (for example, full table scans on large tables) can slow queries by 10–100x, directly affecting user experience and system resources.
- What most devs miss: Developers often overlook issues like indexes not being used, incorrect join order, and misestimated row counts—even when queries perform adequately on smaller, test datasets.
- EXPLAIN plans reveal what’s really happening: They show which steps in a query plan are slow, where row count estimates are incorrect, and whether indexes are being used as intended.
For example, suppose a report query that runs in seconds during development suddenly takes minutes in production. Investigation with an EXPLAIN plan might reveal that a missing index is forcing a full table scan over millions of rows, drastically increasing execution time.
Understanding why query optimization matters is the first step. Next, let’s look at how you can use EXPLAIN plans to diagnose SQL performance.
EXPLAIN Plans in Real Life
The EXPLAIN (or EXPLAIN ANALYZE) command displays the execution plan for a query: the detailed steps the database engine will perform, including the order of operations, access methods, estimated costs, and—if using ANALYZE—actual runtime statistics.
-- PostgreSQL example: Analyze how a query will execute
EXPLAIN ANALYZE
SELECT customer_id, order_total
FROM orders
WHERE order_date > '2025-01-01';
-- Output (abridged):
-- Seq Scan on orders (cost=0.00..431.00 rows=1000 width=16)
-- (actual time=0.020..10.500 rows=1200 loops=1)
-- Filter: (order_date > '2025-01-01'::date)
-- Rows Removed by Filter: 8000
-- Planning Time: 0.1 ms
-- Execution Time: 10.6 ms
Key terms explained:
- Seq Scan: Short for “sequential scan,” meaning the database is reading every row in the table to find matches. This is efficient only for small tables or queries returning most rows, but slow for large tables with selective filters.
- Rows Removed by Filter: Number of rows that did not match the filter condition. A high value suggests many rows were read unnecessarily, indicating a potential for optimization.
- Actual vs. Estimated Rows: The plan shows both how many rows the planner expected to find and how many were actually processed. Significant differences suggest that table statistics are outdated, which can result in poor execution plans. Running ANALYZE updates these statistics.
For example, if you see a Seq Scan on a table with millions of rows and only a few are needed, it’s a sign that an index could help. Conversely, if EXPLAIN shows an index scan but the estimated and actual row counts are very different, updating statistics may be necessary.
During development, use EXPLAIN to check if your intended indexes are being considered. Before production rollout, use EXPLAIN ANALYZE on staging data to verify actual performance. For a deeper look, see our in-depth guide.
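This kind of check can even be automated during development. The sketch below uses SQLite through Python's sqlite3 module so it is self-contained; the helper name and table layout are illustrative, and PostgreSQL's EXPLAIN output has a different format, but the idea of scanning a plan for full table scans carries over.

```python
import sqlite3

def full_scan_tables(conn, query):
    """Return table names that EXPLAIN QUERY PLAN reports as full scans."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    scans = []
    for *_, detail in rows:
        # SQLite reports full scans as "SCAN <table>" (older versions:
        # "SCAN TABLE <table>"); index lookups appear as "SEARCH ...".
        if detail.startswith("SCAN"):
            scans.append(detail.split()[-1])
    return scans

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_total REAL, order_date TEXT)")

query = ("SELECT customer_id, order_total FROM orders "
         "WHERE order_date > '2025-01-01'")
print(full_scan_tables(conn, query))   # no index yet, so orders is fully scanned

conn.execute("CREATE INDEX idx_orders_order_date ON orders(order_date)")
print(full_scan_tables(conn, query))   # the index lookup removes the full scan
```

A helper like this can run in a test suite to fail the build when a hot query regresses to a full scan.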

Indexing Strategies and Performance
Indexes are data structures that allow the database to find rows much faster than scanning the entire table. Proper indexing is the most direct way to speed up queries, but indiscriminate indexing can hurt performance. Each additional index increases the time and resources required for data modifications (INSERT, UPDATE, DELETE) and consumes extra storage.
The best practice is to index columns that are:
- High-cardinality: Columns with many unique values (such as user IDs, timestamps, or order numbers). High-cardinality indexes are selective, making searches efficient.
- Frequently queried: Columns often used in WHERE, JOIN, or ORDER BY clauses. Indexes speed up searches, joins, and sorting on these fields.
For example, if your application often retrieves orders by date, creating an index on order_date will make those queries much faster.
-- Create a useful index for order_date queries
CREATE INDEX idx_orders_order_date ON orders(order_date);
ANALYZE orders; -- Always analyze after major data changes or index creation
-- After indexing, the same query uses an Index Scan:
EXPLAIN ANALYZE
SELECT customer_id, order_total
FROM orders
WHERE order_date > '2025-01-01';
-- Output (abridged):
-- Index Scan using idx_orders_order_date on orders
-- (cost=0.42..37.08 rows=900 width=16)
-- (actual time=0.05..6.10 rows=900 loops=1)
-- Execution Time: 6.2 ms
With the index in place, the database can jump directly to the relevant rows, drastically reducing query execution time. Notice the shift from a sequential scan to an index scan and the drop in execution time.
Indexing tips:
- Avoid indexing low-cardinality columns: Columns with few unique values (e.g., a status field with only a handful of possible values) do not benefit much from indexing and add unnecessary write overhead.
- Composite indexes: These are indexes on multiple columns. They are only effective if your WHERE clause filters on the leftmost column(s) of the index.
- Remove redundant indexes: Overlapping or unused indexes slow down write operations without improving read performance. Regularly audit your schema to drop such indexes.
- Foreign key columns: Most databases do not automatically create indexes on foreign key columns. If you often join or filter on these columns, adding indexes can significantly speed up those operations.
-- Example: Only useful if WHERE starts with customer_id
CREATE INDEX idx_comp ON orders (customer_id, order_date);
-- Good: WHERE customer_id = ? AND order_date = ?
-- Bad: WHERE order_date = ? -- Index won't be used efficiently
For example, if you create a composite index on (customer_id, order_date) but write queries that only filter by order_date, the index will not be used efficiently. To benefit from a composite index, always ensure your query’s WHERE clause includes the leftmost column.
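The leftmost-prefix rule is easy to verify empirically. This sketch uses SQLite via Python's sqlite3 (B-tree composite indexes in PostgreSQL follow the same rule); the plan() helper and table layout are illustrative.

```python
import sqlite3

def plan(conn, query):
    """Concatenate the detail column of each EXPLAIN QUERY PLAN step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " | ".join(detail for *_, detail in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_date TEXT, order_total REAL)")
conn.execute("CREATE INDEX idx_comp ON orders (customer_id, order_date)")

# Filter includes the leftmost column: the composite index is usable.
print(plan(conn, "SELECT order_total FROM orders "
                 "WHERE customer_id = 7 AND order_date = '2025-01-01'"))
# Filter skips the leftmost column: falls back to a full table scan.
print(plan(conn, "SELECT order_total FROM orders "
                 "WHERE order_date = '2025-01-01'"))
```

The first plan reports a SEARCH using idx_comp, while the second reports a plain SCAN of orders.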
With indexing best practices established, let’s examine the most common mistakes and troubleshooting steps.
Common Pitfalls and Troubleshooting
Despite experience, developers and DBAs frequently encounter recurring mistakes that degrade SQL performance. Awareness of these pitfalls helps you avoid wasted time during troubleshooting (see our troubleshooting guide):
- Using SELECT *: Fetching all columns in a query retrieves unnecessary data, increases I/O, and can break client code if columns change. Always list only required columns.
Example:
-- BAD: retrieves all columns, even unused ones
SELECT * FROM orders;
-- GOOD: fetches only needed columns
SELECT customer_id, order_total FROM orders;
- Functions in WHERE clauses: Applying a function (such as YEAR() or LOWER()) to an indexed column disables index usage, forcing a full table scan.
-- BAD: prevents index usage
SELECT * FROM employees WHERE YEAR(joining_date) = 2022;
-- GOOD: use a range filter instead
SELECT * FROM employees
WHERE joining_date >= '2022-01-01' AND joining_date < '2023-01-01';
In this example, the first query cannot use an index on joining_date, but the second query can.
- Wildcard at start of LIKE pattern: Placing a wildcard (%) at the beginning of a LIKE pattern disables index usage, resulting in a full scan.
-- BAD: disables index, triggers table scan
SELECT * FROM users WHERE name LIKE '%john';
-- GOOD: allows index use
SELECT * FROM users WHERE name LIKE 'john%';
Use wildcards only at the end of the pattern to allow index usage.
- Missing or outdated statistics: Table statistics inform the query planner about data distribution. If statistics are stale (e.g., after bulk loads), the planner may choose inefficient plans. Running ANALYZE updates these statistics.
- Over-indexing: Too many indexes slow down data modifications. Regularly audit and remove indexes that are no longer needed, especially after schema or query changes.
- IN vs. EXISTS: For large subqueries, EXISTS is often faster because it returns as soon as a match is found, unlike IN, which may build a full result set. See the Stack Overflow discussion.
-- Faster for large subqueries
SELECT name FROM customers
WHERE EXISTS (
  SELECT 1 FROM orders
  WHERE orders.customer_id = customers.customer_id
);
This approach is especially efficient for correlated subqueries with large tables.
- Join order surprises: The query planner chooses join algorithms (nested loop, hash join, merge join) based on estimated row counts and indexes. Nested loops are fast for small result sets but can be slow for large joins. Ensuring that join keys are indexed helps the planner choose the most efficient algorithm.
Example:
- If joining two large tables, verify that both tables' join columns are indexed.
- Check the EXPLAIN plan for the join type to identify performance issues.
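Several of these pitfalls are easy to reproduce locally. The sketch below demonstrates the function-in-WHERE pitfall with SQLite via Python's sqlite3 (substr stands in for YEAR(), which SQLite lacks); PostgreSQL behaves the same way with functions over indexed columns unless you add an expression index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, joining_date TEXT)")
conn.execute("CREATE INDEX idx_joining ON employees(joining_date)")

def plan(query):
    """Return the detail column of the first EXPLAIN QUERY PLAN step."""
    return conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1]

# BAD: wrapping the indexed column in a function hides it from the index.
bad = plan("SELECT name FROM employees WHERE substr(joining_date, 1, 4) = '2022'")
# GOOD: an equivalent range filter keeps the index usable.
good = plan("SELECT name FROM employees "
            "WHERE joining_date >= '2022-01-01' AND joining_date < '2023-01-01'")
print("bad: ", bad)   # reports a SCAN of employees
print("good:", good)  # reports a SEARCH using idx_joining
```

The same rewrite, a range filter instead of a function call, is what restores index usage in the YEAR() example above.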
Now that we’ve covered the most frequent pitfalls, let’s compare index scans and sequential scans in practice.
Comparison Table: Index Scan vs. Sequential Scan
| Scenario | Estimated Rows | Actual Rows | Execution Time (ms) | Scan Type | Source |
|---|---|---|---|---|---|
| Without Index | 800 | 900 | 121 | Sequential Scan | SesameDisk |
| With Index | 900 | 900 | 6 | Index Scan | SesameDisk |
This table illustrates the dramatic difference in performance between a sequential scan and an index scan. The indexed query not only matches the expected row count but also completes in a fraction of the time. Keeping indexes optimized and statistics current can reduce query execution times by 10–20x for common scenarios.
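You can reproduce a gap like this yourself. The benchmark below is a rough sketch using SQLite via Python's sqlite3 with 200,000 synthetic rows; absolute timings are machine-dependent and SQLite's numbers will not match the PostgreSQL figures above, but the before/after ratio makes the same point.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_total REAL, order_date TEXT)")
# Load synthetic rows spread across twelve monthly dates.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i % 1000, i * 0.5, f"2025-{(i % 12) + 1:02d}-01") for i in range(200_000)],
)

query = "SELECT COUNT(*) FROM orders WHERE order_date = '2025-06-01'"

def timed(q):
    """Run a query and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = conn.execute(q).fetchone()[0]
    return result, time.perf_counter() - start

before_count, before_time = timed(query)          # full table scan
conn.execute("CREATE INDEX idx_date ON orders(order_date)")
after_count, after_time = timed(query)            # index lookup

print(f"scan: {before_time * 1000:.1f} ms, index: {after_time * 1000:.1f} ms")
```

Both runs return the same count; only the access path, and therefore the elapsed time, changes.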
With these comparisons in mind, let’s summarize the most important lessons for day-to-day query optimization.
Key Takeaways
- EXPLAIN ANALYZE is indispensable for real-world SQL performance tuning—always measure, don't guess. Use it routinely, not just for troubleshooting.
- Compare estimated vs. actual rows in your plan output. Large mismatches indicate stale statistics or missing indexes and should be addressed promptly.
- Indexes on filter and join columns are the most powerful optimization—benchmark queries with and without indexes before making schema changes.
- Update statistics with ANALYZE after big data loads, deletes, or index creation to keep the query planner accurate.
- Avoid functions on indexed columns, or use expression indexes if your database supports them, to maintain index efficiency.
- Audit and prune indexes regularly to prevent storage bloat and slow write operations.
- For advanced workloads (such as large filtered Top K queries or full-text search), explore partial or composite indexes, and where appropriate, consider specialized engines like ParadeDB (ParadeDB Engineering Blog).
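As a small taste of the partial indexes mentioned above, the sketch below uses SQLite via Python's sqlite3, which supports CREATE INDEX ... WHERE; the planner only considers the index when the query's filter implies the index's predicate. Table and index names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, status TEXT, order_total REAL)")
# Partial index: only covers the (presumably small) slice of pending orders.
conn.execute("CREATE INDEX idx_pending ON orders(customer_id) WHERE status = 'pending'")

def plan(query):
    """Return the detail column of the first EXPLAIN QUERY PLAN step."""
    return conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1]

# The query's filter implies the index predicate, so the partial index applies.
print(plan("SELECT order_total FROM orders "
           "WHERE customer_id = 5 AND status = 'pending'"))
# A different status value does not, so this falls back to a full scan.
print(plan("SELECT order_total FROM orders "
           "WHERE customer_id = 5 AND status = 'shipped'"))
```

PostgreSQL supports the same CREATE INDEX ... WHERE syntax, with the same implication rule.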
Let’s wrap up with resources for further study and practical application.
Further Reading
- SQL Query Optimizations – GeeksforGeeks
- 7 SQL Indexing Rules That Cut Query Time by 90% | AI2SQL
- Optimizing Top K in PostgreSQL: Techniques and Limitations
- A Comprehensive Guide to Understanding Query Execution Plans | Acceldata
- SQL Server Execution Plan Overview | Microsoft Learn
- PostgreSQL Query Optimization: EXPLAIN ANALYZE Deep Dive
- Advanced SQL Query Optimization and Troubleshooting
Mastering execution plans, indexing, and query rewriting isn’t just for DBAs—it’s a career-accelerating skill for every backend developer. Make reviewing EXPLAIN plans a habit, not a last resort after a production incident.
Thomas A. Anderson
Mass-produced in late 2022, upgraded frequently. Has opinions about Kubernetes that he formed in roughly 0.3 seconds. Occasionally flops — but don't we all? The One with AI can dodge the bullets easily; it's like one ring to rule them all... sort of...
