Java Collections Framework: Maps, Lists, and Performance

The Java Collections Framework ships with every JDK, yet it is routinely misused in ways that cause 10–100× performance regressions. Picking LinkedList when you need ArrayList, or HashMap where LinkedHashMap is required, or a non-thread-safe collection in concurrent code—these are common mistakes with measurable consequences. This guide covers the internals, performance characteristics, and the decision logic for every major collection type.

The Collections Hierarchy

text
Collection
├── List          (ordered, duplicates allowed)
│   ├── ArrayList
│   ├── LinkedList
│   ├── Vector (legacy)
│   └── CopyOnWriteArrayList
├── Set           (no duplicates)
│   ├── HashSet
│   ├── LinkedHashSet
│   ├── TreeSet
│   └── CopyOnWriteArraySet
└── Queue / Deque
    ├── ArrayDeque
    ├── LinkedList
    ├── PriorityQueue
    └── BlockingQueue implementations

Map               (key-value, separate hierarchy)
├── HashMap
├── LinkedHashMap
├── TreeMap
├── Hashtable (legacy)
├── ConcurrentHashMap
└── EnumMap

ArrayList vs LinkedList

This is the most common wrong choice in Java codebases.

ArrayList Internals

ArrayList is backed by an Object[] array. When capacity is exceeded, a new array of 1.5× size is allocated and elements are copied.

java
// ArrayList growth: capacity → 10 → 15 → 22 → 33 → ...
ArrayList<String> list = new ArrayList<>(); // default capacity 10, allocated lazily on first add
ArrayList<String> presized = new ArrayList<>(1000); // skips ~12 grow-and-copy cycles

text
Memory layout:
┌───┬───┬───┬───┬───┬─────────┐
│ 0 │ 1 │ 2 │ 3 │ 4 │ (empty) │
└───┴───┴───┴───┴───┴─────────┘
  ↑ contiguous, cache-friendly

LinkedList Internals

LinkedList is a doubly-linked list. Each element is a Node object with prev, next, and item fields.

text
Memory layout:
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ prev │ item  │ ──► │ prev │ item  │ ──► │ prev │ item  │
│ next │       │ ◄── │ next │       │ ◄── │ next │       │
└──────────────┘     └──────────────┘     └──────────────┘
  ↑ scattered in heap, cache-unfriendly

Each Node adds roughly 24 bytes of overhead on a typical 64-bit JVM with compressed oops (object header plus the prev and next references). A LinkedList<Integer> of 1M elements uses ~3× more memory than an ArrayList<Integer>.
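The node structure described above can be sketched as follows. Field names match the description (item, next, prev); this is an illustrative reconstruction, not the JDK source, which keeps Node as a private nested class:

```java
// Simplified sketch of LinkedList's internal node. Each element costs a
// full object like this, allocated separately somewhere on the heap.
class Node<E> {
    E item;        // the stored element
    Node<E> next;  // following node, or null at the tail
    Node<E> prev;  // preceding node, or null at the head
}
```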

Performance Comparison

| Operation | ArrayList | LinkedList | Notes |
| --- | --- | --- | --- |
| get(index) | O(1) | O(n) | LL walks from head or tail |
| add(last) | O(1) amortized | O(1) | AL may resize |
| add(first) | O(n) | O(1) | AL shifts all elements |
| add(middle) | O(n) | O(n) | LL still must traverse to find the position |
| remove(index) | O(n) | O(n) | Both shift/traverse |
| contains() | O(n) | O(n) | Linear scan |
| Memory | Compact | +24 B/element | LL loses to CPU cache |

In practice: use ArrayList in almost all cases. LinkedList is only better when you have an iterator already positioned and do frequent insertion/removal at that exact position—a rare use case. Even "add to front" is typically faster with ArrayDeque.

java
// Almost always correct
List<String> items = new ArrayList<>();

// Only for queue/deque operations
Deque<String> queue = new ArrayDeque<>();

// Rarely correct
List<String> rarelyUseful = new LinkedList<>(); // ← usually a mistake

HashMap Internals

Hashing and Bucketing

HashMap stores entries in an array of buckets. The bucket index is computed as:

java
index = (n - 1) & hash(key)
// where n = number of buckets (always power of 2)
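As a worked sketch (assuming the default 16-bucket table), the index is just the low bits of the mixed hash:

```java
// Reproducing the bucket-index computation by hand for a 16-bucket table.
int n = 16;                      // bucket count, always a power of 2
int h = "hello".hashCode();
int mixed = h ^ (h >>> 16);      // same bit mixing as HashMap.hash()
int index = (n - 1) & mixed;     // keeps only the low 4 bits: 0..15
```

Because n is a power of two, `(n - 1) & hash` is equivalent to `hash % n` but cheaper, which is why HashMap capacities are always powers of two.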

Java's HashMap.hash() applies additional bit mixing to reduce collisions from poor hashCode() implementations:

java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Collision Handling: Chaining and Treeification

When multiple keys hash to the same bucket, they form a linked list within that bucket. When a bucket's list exceeds 8 entries AND the total map has ≥64 buckets, the list is converted to a red-black tree (O(log n) lookup instead of O(n)).

text
Bucket array:
[0] → null
[1] → Entry(k1) → Entry(k2) → null       (chain)
[2] → TreeNode(k3) ← TreeNode(k4) → ...  (treeified at ≥8 entries)
[3] → Entry(k5)
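To see that collisions hurt speed but not correctness, here is a sketch using a made-up key type whose hashCode is deliberately constant:

```java
import java.util.HashMap;
import java.util.Map;

// Every BadKey hashes to the same bucket, so all 100 entries share one
// chain (treeified once the thresholds above are met). Lookups still
// succeed because equals() distinguishes keys within the bucket.
record BadKey(int id) {
    @Override public int hashCode() { return 42; } // worst case on purpose
}

Map<BadKey, Integer> m = new HashMap<>();
for (int i = 0; i < 100; i++) m.put(new BadKey(i), i);
// m.get(new BadKey(7)) returns 7 — just more slowly than a well-hashed key
```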

Load Factor and Capacity

Default capacity is 16 buckets, default load factor is 0.75. When size > capacity × loadFactor, the map doubles in size and rehashes all entries.

java
// Default: resizes at 12 entries, then 24, 48, 96...
HashMap<String, Integer> map = new HashMap<>();

// Pre-size to avoid rehashing: initialCapacity = expectedSize / 0.75 + 1
HashMap<String, Integer> optimized = new HashMap<>((int)(expectedSize / 0.75) + 1);

// Example: expecting 1000 entries
HashMap<String, Integer> presized = new HashMap<>(1334);

HashMap vs LinkedHashMap vs TreeMap

java
// HashMap: O(1) get/put, no ordering guarantee
Map<String, User> byId = new HashMap<>();

// LinkedHashMap: O(1) get/put, insertion-order iteration
// Use for: LRU cache, preserving insertion order in JSON output
Map<String, User> ordered = new LinkedHashMap<>();

// LRU cache using LinkedHashMap
Map<String, User> lruCache = new LinkedHashMap<>(100, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, User> eldest) {
        return size() > 100;
    }
};

// TreeMap: O(log n) get/put, sorted by key natural order or Comparator
// Use for: sorted iteration, range queries, ceiling/floor/subMap
TreeMap<String, User> sorted = new TreeMap<>();
String firstKey = sorted.firstKey();
Map<String, User> range = sorted.subMap("a", "m"); // keys between "a" and "m"
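A quick demonstration of the access-order LRU pattern above, using a capacity of 3 for brevity:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// accessOrder = true moves entries to the back on get(); the override
// evicts the front (least-recently-used) entry once size exceeds 3.
Map<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > 3;
    }
};
lru.put("a", 1); lru.put("b", 2); lru.put("c", 3);
lru.get("a");     // touch "a": internal order is now b, c, a
lru.put("d", 4);  // evicts "b", the least-recently-used entry
```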

| Map Type | Get/Put | Ordering | Memory | Use Case |
| --- | --- | --- | --- | --- |
| HashMap | O(1) avg | None | Low | Default choice |
| LinkedHashMap | O(1) avg | Insertion/access | Medium | LRU cache, ordered output |
| TreeMap | O(log n) | Sorted | Medium | Range queries, sorted iteration |
| EnumMap | O(1) | Enum ordinal | Very low | Enum keys only |
| ConcurrentHashMap | O(1) avg | None | Medium+ | Multi-threaded access |

HashSet and its Cousins

HashSet is internally a HashMap<E, Object> in which every value is the same shared dummy object (PRESENT). All of HashMap's performance characteristics therefore apply.

java
// HashSet: O(1) add/contains/remove, no order
Set<String> ids = new HashSet<>();

// LinkedHashSet: O(1), insertion order preserved
Set<String> ordered = new LinkedHashSet<>();

// TreeSet: O(log n), sorted order, supports floor/ceiling/subSet
TreeSet<String> sorted = new TreeSet<>();
String justBelow = sorted.floor("target"); // greatest element ≤ "target"
Set<String> tail = sorted.tailSet("m");    // all elements ≥ "m"

Queue and Deque

java
// ArrayDeque: resizable array, faster than LinkedList for queue operations
// Use as: Stack (push/pop), Queue (offer/poll), Deque
Deque<String> stack = new ArrayDeque<>();
stack.push("first");
String top = stack.pop();

Queue<String> queue = new ArrayDeque<>();
queue.offer("task");
String next = queue.poll(); // null if empty, unlike remove()

// PriorityQueue: min-heap, O(log n) add/poll, O(1) peek
PriorityQueue<Integer> minHeap = new PriorityQueue<>();
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());

// Custom priority
PriorityQueue<Task> byDeadline = new PriorityQueue<>(
    Comparator.comparing(Task::getDeadline)
);
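One subtlety worth demonstrating: a PriorityQueue's iteration order is not sorted; only poll() and peek() honor the priority:

```java
import java.util.List;
import java.util.PriorityQueue;

PriorityQueue<Integer> pq = new PriorityQueue<>(List.of(5, 1, 3));
int first = pq.poll();   // always the smallest remaining element: 1
int second = pq.poll();  // then 3
// Iterating pq directly walks the backing heap array, which is only
// partially ordered — collect and sort if you need sorted output.
```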

Concurrent Collections

Do not use Collections.synchronizedList() or Vector for new code. They synchronize on a single lock, serializing all access.

java
// Wrong: synchronized wrapper still requires external sync for compound ops
List<String> syncList = Collections.synchronizedList(new ArrayList<>());
// This check-then-act is still a race condition:
if (!syncList.contains(item)) {
    syncList.add(item); // ❌ not atomic
}

// CopyOnWriteArrayList: write creates a new array copy
// Read (including iteration) is lock-free — reads never block
// Use when: reads >> writes (observer lists, routing tables)
List<String> listeners = new CopyOnWriteArrayList<>();

// ConcurrentHashMap: fine-grained locking (segments before Java 8;
// Java 8+: CAS plus per-bucket synchronization)
// putIfAbsent, compute, merge are atomic
ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
counts.merge(word, 1, Integer::sum); // atomic increment
counts.computeIfAbsent(key, k -> new ArrayList<>()).add(value);

// ConcurrentSkipListMap: sorted, concurrent, O(log n)
// Use when you need TreeMap + thread safety
ConcurrentNavigableMap<String, User> concurrent = new ConcurrentSkipListMap<>();

// BlockingQueue: producer-consumer with blocking semantics
BlockingQueue<Task> workQueue = new LinkedBlockingQueue<>(1000);
workQueue.put(task);    // blocks if full
Task t = workQueue.take(); // blocks if empty

// ArrayBlockingQueue: bounded, uses single lock
BlockingQueue<String> bounded = new ArrayBlockingQueue<>(100);
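A small sketch of the atomic-update point above: merge() performs the read-modify-write for each key atomically, so a word count never loses increments even with many writer threads (shown single-threaded here for brevity):

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
for (String w : List.of("a", "b", "a")) {
    counts.merge(w, 1, Integer::sum);  // insert 1, or add 1 to the old value
}
// counts is now {a=2, b=1}; a plain HashMap with get-then-put would race
```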

Writing Correct hashCode and equals

HashMap and HashSet correctness depends on proper hashCode / equals contracts:

  1. If a.equals(b) then a.hashCode() == b.hashCode() — required
  2. If a.hashCode() == b.hashCode() then a.equals(b) may be false — collisions are allowed
  3. hashCode() must return the same value for the same object state within a run

java
// Manual (pre-Java 7)
public class Point {
    private final int x, y;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // combines fields correctly
    }
}

// Java 16+ record: equals, hashCode, toString generated automatically
record Point(int x, int y) {}

// Common mistake: mutable field in hashCode
// Once put in a HashMap, mutating the key makes it "lost"
public class MutableKey {
    public String value; // ← mutable
    
    @Override
    public int hashCode() { return value.hashCode(); } // ← danger
}
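The lost-key hazard can be demonstrated directly. This sketch fleshes out the MutableKey class above (adding a constructor and equals so two keys with the same value compare equal):

```java
import java.util.HashMap;
import java.util.Map;

class MutableKey {
    String value;
    MutableKey(String value) { this.value = value; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey k && value.equals(k.value);
    }
    @Override public int hashCode() { return value.hashCode(); }
}

Map<MutableKey, String> m = new HashMap<>();
MutableKey k = new MutableKey("a");
m.put(k, "v");
k.value = "b";  // hashCode changes; the entry still sits in the old bucket
m.get(k);       // almost certainly null: the lookup searches the wrong bucket
// The entry is still in the map (size() == 1) but unreachable by key.
```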

Utility Methods: Collections and Arrays

java
// Sorting
List<Integer> nums = new ArrayList<>(List.of(3, 1, 4, 1, 5));
Collections.sort(nums);                          // natural order
nums.sort(Comparator.reverseOrder());            // reverse
nums.sort(Comparator.comparingInt(Math::abs));   // custom

// Searching (requires sorted list)
int idx = Collections.binarySearch(nums, 4);    // O(log n)

// Immutable views (modifications throw UnsupportedOperationException)
List<String> fixed = List.of("a", "b", "c");          // Java 9+
Map<String, Integer> fixedMap = Map.of("k1", 1, "k2", 2);
Set<String> fixedSet = Set.copyOf(existingSet);

// Frequency, disjoint, min, max
int count = Collections.frequency(list, "target");
boolean noOverlap = Collections.disjoint(listA, listB);
String max = Collections.max(list, Comparator.naturalOrder());

// Shuffle, fill, copy, reverse
Collections.shuffle(list);
Collections.fill(list, "default");
Collections.reverse(list);
Collections.rotate(list, 3); // rotate right by 3 positions

Performance Decision Guide

text
Need a list?
  → Default: ArrayList
  → Need O(1) add/remove at both ends: ArrayDeque
  → Thread-safe, reads dominate: CopyOnWriteArrayList
  → LinkedList: almost never

Need a set?
  → Default: HashSet
  → Need insertion order: LinkedHashSet
  → Need sorting/range queries: TreeSet

Need a map?
  → Default: HashMap (pre-size if you know count)
  → Need insertion order: LinkedHashMap
  → Need sorted keys/range queries: TreeMap
  → Enum keys: EnumMap
  → Thread-safe: ConcurrentHashMap

Need a queue?
  → Stack: ArrayDeque
  → FIFO queue: ArrayDeque
  → Priority: PriorityQueue
  → Thread-safe producer-consumer: LinkedBlockingQueue or ArrayBlockingQueue

Benchmark: ArrayList vs LinkedList

Measured on JDK 21, Apple M2, 1M element operations:

| Operation | ArrayList | LinkedList | Winner |
| --- | --- | --- | --- |
| Sequential iteration | 4 ms | 18 ms | ArrayList (4.5×) |
| get(random index) | 1 ms | 520 ms | ArrayList (520×) |
| add(last) | 6 ms | 7 ms | Tie |
| add(first) | 980 ms | 3 ms | LinkedList (327×) |
| add(middle via iterator) | 490 ms | 260 ms | LinkedList (1.9×) |
| Memory (1M Integer) | 16 MB | 40 MB | ArrayList (2.5×) |

The add(first) case looks like a LinkedList win, but ArrayDeque.addFirst() takes 4 ms and has better cache behavior—LinkedList almost never wins the benchmark that matters in practice.


Frequently Asked Questions

Q: Why does HashMap allow null keys but Hashtable and ConcurrentHashMap do not?

HashMap was designed to allow null keys and null values for flexibility. Hashtable (older, synchronized) was designed before null safety was a concern and throws NullPointerException to prevent silent errors. ConcurrentHashMap explicitly prohibits null keys and values because get(key) returning null would be ambiguous in a concurrent context: it could mean the key is absent, or the stored value is null, and there is no way to distinguish without a separate containsKey check—which would be a race condition.

Q: When does HashMap become O(n) instead of O(1)?

In two scenarios: (1) if all keys hash to the same bucket (a degenerate case, usually due to a broken hashCode that returns a constant), lookup degrades to O(n); Java 8 mitigates this by treeifying buckets with ≥8 entries, bringing lookup to O(log n). (2) During a resize (rehash), all entries are re-inserted, which is O(n). Pre-sizing (e.g. new HashMap<>((int)(expectedSize / 0.75) + 1), as shown earlier) avoids in-operation resizes entirely.

Q: Is ConcurrentHashMap.size() accurate?

Not guaranteed to be exact under concurrent modification. ConcurrentHashMap.size() returns an estimate that may not reflect concurrent puts or removes happening at the exact same moment. For precise counting under high concurrency, use LongAdder or AtomicLong separately. If you need an exact snapshot, lock the map externally, but this defeats the purpose of using ConcurrentHashMap.
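The LongAdder suggestion above can be sketched like this, counting only first-time insertions (a hypothetical usage, shown single-threaded for brevity):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Maintain an exact element count next to the map: putIfAbsent() returns
// null only for the thread that actually inserted the key, so each key
// is counted exactly once even under contention.
ConcurrentHashMap<String, Boolean> seen = new ConcurrentHashMap<>();
LongAdder count = new LongAdder();
for (String id : new String[] {"x", "y", "x"}) {
    if (seen.putIfAbsent(id, Boolean.TRUE) == null) {
        count.increment();
    }
}
// count.sum() == 2: "x" is only counted the first time
```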

Q: What is the difference between fail-fast and fail-safe iterators?

Fail-fast iterators (used by ArrayList, HashMap, etc.) throw ConcurrentModificationException if the collection is structurally modified during iteration outside of the iterator's own remove() method; they detect this via an internal modCount counter. Fail-safe iterators do not throw: CopyOnWriteArrayList iterates over an immutable snapshot, and ConcurrentHashMap's iterators are weakly consistent. Both may fail to reflect modifications made after iteration started. Use fail-safe collections when you need to modify a collection while another thread iterates it.
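The fail-fast behavior is easy to reproduce in a single thread:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

List<String> l = new ArrayList<>(List.of("a", "b"));
boolean threw = false;
try {
    for (String s : l) {
        l.add("c");  // structural modification mid-iteration bumps modCount
    }
} catch (ConcurrentModificationException e) {
    threw = true;    // the iterator's next() detects the stale modCount
}
// threw is now true
```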