Hey everyone 👋
I've been experimenting with high-performance hash table implementations on the JVM and ended up creating HashSmith:
What it is:
- A collection of open-addressing hash tables for Java
- Implementations based on Robin Hood probing and SwissTable-style layouts
- Focused on predictable performance and memory efficiency
Highlights:
- JMH benchmarks comparing HashSmith vs JDK HashMap
- JOL-based memory footprint analysis
- Java 21, with vector-friendly layouts in mind (where applicable)
I'd love feedback on:
- API design (does it feel “Java-ish”?)
- Benchmark methodology (anything obviously missing?)
- Edge cases/workloads you'd like to see tested
- How this could be improved or extended – features, variants, or trade-offs worth exploring next
Thanks!
Your hash spreader is too weak, and it looks like it comes from a misreading of `HashMap`. `HashMap` deliberately uses a weak function that just folds the upper bits into the lower ones, because it can rely on red-black tree bins to resolve hash collisions. In your case a collision is much more problematic, so the clustering effect could cause real problems. You could use a two-round function from hash-prospector. I don't have a good explanation for your specific case, but a related write-up showed the impact when this is misused.

Guava's testlib and Apache Commons' collections4 have test suites that others can reuse for their own collections, which provides a pretty good baseline for compliance. You can crib from Caffeine, which has these set up in Gradle.
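For reference, here is the gap in concrete terms. The first function mirrors `HashMap`'s spreader; the second is `lowbias32`, one of the published two-round functions from hash-prospector (constants taken from that project):

```java
final class Mixers {
    // HashMap's spreader: XOR the upper 16 bits into the lower 16.
    // Cheap, and good enough when tree bins absorb the collisions.
    static int jdkSpread(int h) {
        return h ^ (h >>> 16);
    }

    // lowbias32 from hash-prospector: two multiply/xorshift rounds.
    // Much stronger avalanche, which matters in an open-addressing table
    // where clustered hashes turn directly into long probe sequences.
    static int lowbias32(int h) {
        h ^= h >>> 16;
        h *= 0x7feb352d;
        h ^= h >>> 15;
        h *= 0x846ca68b;
        h ^= h >>> 16;
        return h;
    }
}
```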
Got it, thank you so much for the pointers; this was exactly the blind spot I had. I definitely misunderstood how `HashMap`'s hash spreader works and why it can get away with weaker mixing. In my case collisions hurt a lot more, so your note about clustering really clicked.

I'm going to swap in a stronger hash function and hook up the Guava / Commons collections4 test suites like you suggested.
Really appreciate you taking the time to spell this out.
You were right.

When I tried to reproduce this issue, I discovered that my naive smearing made the code take an extremely long time even in the insert phase alone. After changing the hash smearing to the same approach Guava uses, the problem disappeared.
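For anyone following along, the mixing step I switched to is essentially Guava's `Hashing.smear`, which reuses the Murmur3 scramble constants (quoted from memory, so double-check against Guava's source):

```java
// Essentially Guava's Hashing.smear: one Murmur3 scramble round.
// The multiply-rotate-multiply sequence spreads entropy across all bits,
// unlike the plain h ^ (h >>> 16) I was using before.
static int smear(int hashCode) {
    return 0x1b873593 * Integer.rotateLeft(hashCode * 0xcc9e2d51, 15);
}
```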
It's all thanks to you. Thank you!
Every time I see a naive implementation of `putAll` I am reminded of this bug. Basically, iterating over a hash map gives you keys in a hash-related order, and in the middle of inserting them into your own map you can end up with a very unusual hash distribution, one that violates all the expectations. I am not qualified to say whether your implementation suffers from this, but resizing to the new expected size before `putAll`-ing might be a good idea anyway, just to avoid multiple resizings (see the sketch at the end of this comment).

Do you think an optimisation to combine the `keys` and `values` arrays into a single array, in which keys go to the left and values to the right, might help a tiny bit with the number of objects being allocated? Or is it just not worth it?
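To make the presize idea concrete, a minimal sketch; `ensureCapacity` is a hypothetical helper standing in for whatever resize hook HashSmith actually exposes:

```java
// Sketch only: grow once up front instead of rehashing repeatedly mid-putAll.
public void putAll(java.util.Map<? extends K, ? extends V> source) {
    // ensureCapacity is hypothetical: a single resize sized for
    // size() + source.size() entries before any insertion happens.
    ensureCapacity(size() + source.size());
    for (java.util.Map.Entry<? extends K, ? extends V> e : source.entrySet()) {
        put(e.getKey(), e.getValue());
    }
}
```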
Thanks a lot for pointing this out! I actually wasn't aware of that kind of issue with `putAll` and hash map iteration order. I'll definitely read up on the bug you mentioned and look into adjusting my implementation (including resizing up front).

Really appreciate you taking the time to explain it!
I think you're talking about something like this layout, right?

`[k, k, k, ... empty ..., v, v, v, ... empty]`

or maybe

`[k, k, k, ... empty ..., v, v, v]`

My first thought is that it would indeed save one array allocation (and one object header) per map instance, so in workloads with lots of very small maps that could be a real, if small, win in terms of memory/allocations.

For iteration over just keys or just values, locality should still be pretty good because each side is contiguous, so I'd expect it to behave similarly to separate `keys[]` / `values[]` arrays in those cases.

I've also seen implementations that store key and value together per slot, like `[[k, v], [k, v], ...]`, which is a different kind of layout again. I'm not entirely sure how your "keys on the left / values on the right" idea would compare in practice without benchmarking it, but it's an interesting approach and probably something I'd like to experiment with; the slot arithmetic I have in mind is sketched below.
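If I do experiment with it, the indexing would presumably look something like this (a hypothetical sketch, assuming power-of-two capacity and a single backing `Object[]` with keys in the low half):

```java
// Hypothetical single-array layout: slots[0..capacity) holds keys,
// slots[capacity..2*capacity) holds the matching values.
final class SplitSlots {
    private final Object[] slots;
    private final int capacity; // power of two

    SplitSlots(int capacity) {
        this.capacity = capacity;
        this.slots = new Object[capacity * 2]; // one allocation, one header
    }

    Object keyAt(int slot) {
        return slots[slot];
    }

    Object valueAt(int slot) {
        return slots[slot + capacity]; // value sits capacity slots to the right
    }

    void store(int slot, Object key, Object value) {
        slots[slot] = key;
        slots[slot + capacity] = value;
    }
}
```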
I tried running a test similar to the one in the link you shared, and I was able to reproduce the bug you mentioned!
```
HashMap      insert 3,000,000 entries: 138.06 ms, reinsert:    177.46 ms (load factor = 0.9)
SwissMap     insert 3,000,000 entries: 677.10 ms, reinsert:    253.16 ms (load factor = 0.5)
SwissMap     insert 3,000,000 entries: 426.91 ms, reinsert:   1189.93 ms (load factor = 0.75)
SwissMap     insert 3,000,000 entries: 418.98 ms, reinsert:  45081.89 ms (load factor = 0.875)
RobinHoodMap insert 3,000,000 entries: 623.41 ms, reinsert:    309.41 ms (load factor = 0.5)
RobinHoodMap insert 3,000,000 entries: 515.65 ms, reinsert:  62736.08 ms (load factor = 0.75)
RobinHoodMap insert 3,000,000 entries: 418.70 ms, reinsert: 493540.47 ms (load factor = 0.875)
```
I worked around this issue by randomizing iteration order without allocating per-iterator buffers. Since my tables use power-of-two capacities, I pick a random start and a random odd step, then visit slots as `(start + i * step) & (capacity - 1)`. Because an odd step is coprime with a power-of-two capacity, this produces a full-cycle permutation over all slots: every slot is visited exactly once, in a scrambled order.

Example (capacity = 8, step = 3): the indices visited are 0, 3, 6, 1, 4, 7, 2, 5, a permutation of 0..7. This works for both my RobinHoodMap and SwissMap because their capacities are powers of two; a sketch of the scheme is below.
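A minimal sketch of that iteration scheme (the names are illustrative, not the actual HashSmith iterator):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntConsumer;

final class ScrambledSlotOrder {
    // Visits every slot exactly once: an odd step is coprime with the
    // power-of-two capacity, so (start + i * step) mod capacity is a
    // full-cycle permutation of 0..capacity-1.
    static void forEachSlot(int capacity, IntConsumer visit) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int mask = capacity - 1;               // capacity must be a power of two
        int start = rnd.nextInt(capacity);     // random starting slot
        int step = rnd.nextInt(capacity) | 1;  // force the step to be odd
        for (int i = 0; i < capacity; i++) {
            visit.accept((start + i * step) & mask);
        }
    }
}
```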
Thank you so much for your feedback!