Database Indexing: B-Tree, Hash, and Composite Index Strategies

Slow database queries kill application performance and frustrate users. If you don’t understand how different indexing strategies work—especially B-Tree, Hash, and Composite indexes—you’ll waste time tuning queries that never get fast. Here’s a hands-on, production-focused guide to choosing and using the right database indexes for real workloads.

Key Takeaways:

  • How B-Tree, Hash, and Composite indexes work with real-world SQL examples
  • When to use each index type for maximum query speed and scalability
  • How composite indexes can make or break multi-column queries
  • Common mistakes in index selection and how to avoid them in production
  • Performance trade-offs, maintenance overhead, and edge cases for each strategy

Why Indexes Matter: The Real Impact on Performance

Indexes are the backbone of fast data retrieval in any relational database. Without proper indexing, even the most powerful hardware will struggle as table sizes grow. Here’s a concrete demonstration with PostgreSQL:

-- Assume a table with millions of customer records
CREATE TABLE customer (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE,
    last_login TIMESTAMP,
    status VARCHAR(50)
);

-- No index on last_login
EXPLAIN ANALYZE SELECT * FROM customer WHERE last_login > NOW() - INTERVAL '7 days';
-- Expect: Seq Scan (full table scan), slow on large tables

-- Add index
CREATE INDEX idx_last_login ON customer (last_login);

EXPLAIN ANALYZE SELECT * FROM customer WHERE last_login > NOW() - INTERVAL '7 days';
-- Now: Index Scan or Bitmap Heap Scan on idx_last_login, typically far faster when few rows match

This difference can mean seconds vs milliseconds for users. According to dasroot.net, effective index selection is crucial for maintaining performance, especially as concurrency and data volume increase in 2026-era systems.

Indexes work like the index in a book—enabling the database to jump directly to relevant rows rather than scanning every entry. But not all indexes are created equal. The right choice depends on your query patterns, data distribution, and update frequency.
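The before-and-after effect from the PostgreSQL example can be reproduced in a few lines. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only to keep the example self-contained). The plan helper and the fixed date literal are illustrative, and the exact plan wording varies by SQLite version.

```python
import sqlite3

def plan(conn, sql):
    """Return the planner's description of how it would run `sql` (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the human-readable plan step.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        email TEXT UNIQUE,
        last_login TEXT,
        status TEXT
    )
""")

query = "SELECT * FROM customer WHERE last_login > '2024-01-01'"

# No index on last_login yet: the planner has to read every row.
plan_before = plan(conn, query)

conn.execute("CREATE INDEX idx_last_login ON customer (last_login)")

# With the index in place, the planner can seek to the matching range.
plan_after = plan(conn, query)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan should describe a full scan of customer, while the second should name idx_last_login.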

B-Tree Indexes: Fundamentals and Advanced Usage

B-Tree (Balanced Tree) indexes are the default for most relational databases (PostgreSQL, MySQL, Oracle, SQL Server). They efficiently handle a wide range of queries, including:

  • Exact match (WHERE email = 'user@example.com')
  • Range queries (WHERE last_login >= '2024-01-01')
  • Sorting (ORDER BY last_login DESC)

How B-Tree Indexes Work

-- Creating a B-Tree index (default in most systems)
CREATE INDEX idx_status ON customer (status);

-- Using the index in a query
EXPLAIN ANALYZE SELECT * FROM customer WHERE status = 'active';
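-- Expect: Index Scan or Bitmap Heap Scan using idx_status when 'active' matches a small share of rows
-- (the planner may still choose a Seq Scan if most rows are active)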

B-Trees maintain a balanced hierarchy where each node has multiple children. This structure makes lookups, inserts, and deletes all O(log n) operations. They’re optimized for both equality and range searches. That’s why they’re the workhorse of modern database indexing (source).
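To see why one ordered structure can serve equality, range, and sort queries, here is a minimal sketch in Python. It is a simplification, not a real B-Tree: a single sorted list stands in for the ordered leaf level, and binary search stands in for the root-to-leaf descent. The sample dates and row IDs are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Sorted (key, row_id) pairs stand in for the ordered leaf level of a B-Tree.
entries = sorted([
    ("2024-01-05", 3), ("2023-11-20", 1), ("2024-02-14", 4),
    ("2023-12-31", 2), ("2024-03-01", 5),
])
keys = [key for key, _ in entries]

def equal_to(key):
    """WHERE last_login = key: binary search for the run of equal keys."""
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [row_id for _, row_id in entries[lo:hi]]

def at_least(key):
    """WHERE last_login >= key: seek once, then read forward in order."""
    return [row_id for _, row_id in entries[bisect_left(keys, key):]]

def newest_first():
    """ORDER BY last_login DESC: walk the same structure backwards."""
    return [row_id for _, row_id in reversed(entries)]

print(equal_to("2024-01-05"))   # [3]
print(at_least("2024-01-01"))   # [3, 4, 5]
print(newest_first())           # [5, 4, 3, 2, 1]
```

All three queries reuse the same ordering, which is why a B-Tree index does not need a separate structure for each access pattern.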

Advanced Usage

  • Partial Indexes: Only index a subset of rows to save space.
  • Covering Indexes: Include all columns needed for a query to avoid touching the table at all.
  • Descending/Ascending Order: Optimize for specific sort orders.

-- Partial index for active users only
CREATE INDEX idx_active_lastlogin ON customer (last_login)
  WHERE status = 'active';

-- Covering index (PostgreSQL 11+)
CREATE INDEX idx_covering ON customer (status, last_login) INCLUDE (email);

Why it matters: B-Tree indexes are highly flexible, but they require maintenance on every write. For most business applications, this trade-off is well worth it—B-Tree indexes are the safest default if you’re unsure (reference).
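A partial index only helps queries whose WHERE clause implies the index predicate. The sketch below illustrates this with Python's sqlite3 module, used here as a stand-in for PostgreSQL (an assumption; SQLite also supports partial indexes, but its planner rules and plan output differ). The plan helper and the date literal are illustrative.

```python
import sqlite3

def plan(conn, sql):
    """Return the planner's description of how it would run `sql` (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        email TEXT UNIQUE,
        last_login TEXT,
        status TEXT
    )
""")

# Index only the rows for active users.
conn.execute("""
    CREATE INDEX idx_active_lastlogin ON customer (last_login)
    WHERE status = 'active'
""")

# The filter repeats the index predicate, so the partial index is usable.
plan_matching = plan(conn, """
    SELECT * FROM customer
    WHERE status = 'active' AND last_login > '2024-01-01'
""")

# Inactive rows are not in the index, so it cannot answer this query.
plan_other = plan(conn, """
    SELECT * FROM customer
    WHERE status = 'inactive' AND last_login > '2024-01-01'
""")

print("active:  ", plan_matching)
print("inactive:", plan_other)
```

The trade-off is a smaller index and cheaper writes for rows outside the predicate, at the cost of no help for queries that fall outside it.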

Hash Indexes: Strengths and Limitations

Hash indexes take a radically different approach. Instead of a tree, they use a hash table to map keys directly to values. This makes them extremely fast for exact match lookups—but useless for range queries, ordering, or partial matches.

-- MySQL: Hash index on a MEMORY table
CREATE TABLE temp_session (
    session_id CHAR(64) NOT NULL,
    user_id INT,
    expires_at TIMESTAMP,
    -- MEMORY tables use hash indexes by default; USING HASH makes the choice explicit.
    -- The primary key already indexes session_id, so no separate CREATE INDEX is needed.
    PRIMARY KEY (session_id) USING HASH
) ENGINE=MEMORY;

-- Fast exact lookup, but the hash index can't serve WHERE session_id > 'abc'
SELECT * FROM temp_session WHERE session_id = 'b7f4c2...';

In PostgreSQL, hash indexes exist (and have been WAL-logged, and therefore crash-safe, since version 10) but are rarely used because B-Trees are almost always as fast for equality and much more flexible (source).
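The equality-only limitation follows directly from the data structure. As a minimal sketch, a Python dict (itself a hash table) can stand in for a hash index; the session keys and user IDs below are made up for illustration.

```python
# A dict is a hash table: keys are located by hash value, not by sort order.
sessions = {"b7f4": 101, "a1c9": 102, "f00d": 103, "c3d2": 104}

def exact(session_id):
    """WHERE session_id = ?: a single hash lookup."""
    return sessions.get(session_id)

def greater_than(session_id):
    """WHERE session_id > ?: the hash gives no ordering, so every key is checked."""
    return sorted(key for key in sessions if key > session_id)

print(exact("f00d"))         # 103
print(greater_than("b7f4"))  # ['c3d2', 'f00d']
```

The range query still returns the right answer, but only by visiting every key, which is the in-memory equivalent of a full table scan.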

Strengths

  • Lightning-fast exact lookups: SELECT ... WHERE key = value
  • Very low CPU overhead for simple, static tables

Limitations

  • No support for range queries or sorting
  • Poor performance if there are many hash collisions
  • Not crash-safe in some RDBMS (e.g., PostgreSQL before version 10, where hash indexes were not WAL-logged)
  • Rarely used in production outside of special cases (caching, ephemeral tables)

Hash indexes shine in key-value workloads but are a niche option for general-purpose OLTP databases. They’re best avoided for most business data unless you have a very specific exact-match-only workload.

Composite Index Strategies: Multi-Column Optimization

Composite indexes (a.k.a. multi-column indexes) allow you to index more than one column together. This is essential for queries filtering or sorting on multiple fields. But the order of columns in the index is critical. Get it wrong, and the planner may be unable to use the index efficiently, or at all.

-- Composite index on (status, last_login)
CREATE INDEX idx_status_lastlogin ON customer (status, last_login);

-- Efficient for queries with both columns:
SELECT * FROM customer
WHERE status = 'active'
  AND last_login > NOW() - INTERVAL '14 days'
ORDER BY last_login DESC;

-- Index is generally not used if you filter by last_login only:
SELECT * FROM customer WHERE last_login > NOW() - INTERVAL '14 days';
-- Usually falls back to a seq scan (or a scan of the entire index), slower

How Composite Indexes Work

  • The index is ordered by status first, then last_login within each status group.
  • Queries filtering by the first column (status) get full index benefits.
  • If you filter by both columns, the index is fully utilized.
  • If you filter by last_login only, the index is generally not used.
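The leading-column rule can be sketched with sorted tuples, which order by the first element and then by the second, just as a composite index on (status, last_login) does. This is a simplified model with made-up rows, not how any particular database stores its index.

```python
from bisect import bisect_left

# (status, last_login, row_id) tuples sort by status first, then last_login.
index = sorted([
    ("active", "2024-03-01", 1), ("inactive", "2024-03-05", 2),
    ("active", "2024-01-10", 3), ("banned", "2024-03-02", 4),
    ("active", "2024-03-04", 5), ("inactive", "2023-12-25", 6),
])

def by_status_since(status, since):
    """status = ? AND last_login >= ?: seek once, read one contiguous run."""
    start = bisect_left(index, (status, since))
    rows = []
    for entry_status, _, row_id in index[start:]:
        if entry_status != status:
            break  # left the status group, nothing further can match
        rows.append(row_id)
    return rows

def since_only(since):
    """last_login >= ? alone: matches are scattered across all status groups."""
    return [row_id for _, last_login, row_id in index if last_login >= since]

print(by_status_since("active", "2024-02-01"))  # [1, 5]
print(since_only("2024-02-01"))                 # [1, 5, 4, 2]
```

With the leading column fixed, the matching rows sit next to each other. Without it, the matches are spread through the whole structure, so there is no single position to seek to.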

Covering vs. Non-Covering Composite Indexes

-- Covering index (MySQL syntax)
CREATE INDEX idx_full ON customer (status, last_login, email);

-- Query uses only the index (no table read)
SELECT email FROM customer
WHERE status = 'active' AND last_login > NOW() - INTERVAL 30 DAY;

Composite indexes are crucial for optimizing multi-column filters and ORDER BY clauses. However, adding too many columns increases index size and write overhead. Focus on your most common query patterns—not every possible combination.
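Whether an index actually covers a query is something the planner can confirm. The sketch below uses Python's sqlite3 module as a stand-in (an assumption; MySQL reports the same idea as "Using index" in its EXPLAIN output, and the SQLite plan wording varies by version). The plan helper and the date literal are illustrative.

```python
import sqlite3

def plan(conn, sql):
    """Return the planner's description of how it would run `sql` (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        email TEXT UNIQUE,
        last_login TEXT,
        status TEXT
    )
""")

query = ("SELECT email FROM customer "
         "WHERE status = 'active' AND last_login > '2024-01-01'")

# Non-covering: the index finds the rows, but email must be read from the table.
conn.execute("CREATE INDEX idx_status_lastlogin ON customer (status, last_login)")
plan_non_covering = plan(conn, query)

# Covering: email is stored in the index, so the table is never touched.
conn.execute("DROP INDEX idx_status_lastlogin")
conn.execute("CREATE INDEX idx_full ON customer (status, last_login, email)")
plan_covering = plan(conn, query)

print("non-covering:", plan_non_covering)
print("covering:    ", plan_covering)
```

Only the second plan should be reported as a covering index, which reflects the extra table read the first one still needs.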

For more on multi-dimensional data access patterns, see our Python vs Go performance comparison—especially the sections on data handling and algorithmic efficiency.

Comparison Table: B-Tree, Hash, and Composite Indexes

Index Type | Best Use Case | Supports Range Queries? | Supports Multi-Column? | Performance Impact | Common RDBMS Support
---------- | ------------- | ----------------------- | ---------------------- | ------------------ | --------------------
B-Tree | General-purpose, equality and range lookups | Yes | Yes (Composite) | Fast for most queries, small write penalty | MySQL, PostgreSQL, Oracle, SQL Server
Hash | Exact match, key-value, caching | No | No | Ultra-fast for simple equality, but limited | MySQL (MEMORY), PostgreSQL (rare), MongoDB
Composite | Multi-column filters, covering queries | Yes (if leading columns are used) | Yes | Great for complex filters, but larger/more writes | All major RDBMS

For more details on specific index types and their use cases, see GeeksforGeeks: Difference Between Indexing Techniques.

Common Pitfalls and Pro Tips

1. Over-Indexing Kills Write Performance

  • Every index must be updated on INSERT/UPDATE/DELETE.
  • Too many indexes can slow down writes and increase storage usage drastically.
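  • Audit your indexes regularly and remove unused ones. In PostgreSQL, check the idx_scan counter in pg_stat_user_indexes; in MySQL, SHOW INDEX lists the indexes on a table and the sys.schema_unused_indexes view reports those that have not been used.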

2. Wrong Composite Index Order

  • If your queries filter by last_login alone but your index is (status, last_login), the index generally cannot be used efficiently and is often skipped entirely. Put the columns your most frequent WHERE clauses always filter on first, with equality filters ahead of range filters.

3. Ignoring Maintenance Overhead

  • Indexes on very active tables may need a periodic REINDEX or rebuild to control bloat, especially under heavy updates and deletes.
  • Keep an eye on index size and usage statistics.

4. Hash Indexes: Use With Caution

  • Hash indexes are not crash-safe in PostgreSQL versions before 10 and rarely outperform B-Trees in practice for most workloads.
  • Use only for specific caching or in-memory workloads where range queries are irrelevant.

5. Don’t Trust the Query Planner Blindly

  • Query planners can make suboptimal choices. Always EXPLAIN your queries and check which indexes are being used.
  • Real-world performance beats theoretical best practices.
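Checking plans by hand does not scale, so one option is to automate it. The following is a rough sketch of a check that flags queries whose plan includes a scan step, written against Python's sqlite3 module as a stand-in (an assumption; for PostgreSQL or MySQL you would parse that system's own EXPLAIN output instead). The full_scans helper and the query list are hypothetical.

```python
import sqlite3

def full_scans(conn, queries):
    """Return the queries whose plan contains a SCAN step (hypothetical helper)."""
    flagged = []
    for sql in queries:
        steps = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        # In SQLite, SCAN means reading a whole table or index; SEARCH means seeking.
        if any(step.startswith("SCAN") for step in steps):
            flagged.append(sql)
    return flagged

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        email TEXT UNIQUE,
        last_login TEXT,
        status TEXT
    )
""")
conn.execute("CREATE INDEX idx_status ON customer (status)")

queries = [
    "SELECT * FROM customer WHERE status = 'active'",
    "SELECT * FROM customer WHERE last_login > '2024-01-01'",
]

flagged = full_scans(conn, queries)
for sql in flagged:
    print("full scan:", sql)
```

Run against a representative schema, a check like this can catch a dropped or mismatched index before it reaches production. It only reads the plan, so it complements rather than replaces timing queries on production-scale data.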

For more distributed system and data access patterns, see gRPC vs REST vs Message Queues for how backend communication affects database load and indexing choices.

Conclusion & Next Steps

If you care about database performance, understanding and applying the right indexing strategy—B-Tree, Hash, or Composite—should be part of your workflow. Start by reviewing your slow queries and matching them with the appropriate index type. Test in a staging environment with production-scale data.

For deeper dives, see the 2026 database indexing strategies survey, and keep exploring how indexing decisions intersect with language and architecture choices in posts like Python vs Go: Performance, Syntax, and Use Cases Compared.