Hey everyone 👋
I've been experimenting with high-performance hash table implementations on the JVM and ended up creating HashSmith:
What it is:
- A collection of open-addressing hash tables for Java
- Implementations based on Robin Hood probing and SwissTable-style layouts
- Focused on predictable performance and memory efficiency
Highlights:
- JMH benchmarks comparing HashSmith vs JDK HashMap
- JOL-based memory footprint analysis
- Java 21, with vector-friendly layouts in mind (where applicable)
I'd love feedback on:
- API design (does it feel “Java-ish”?)
- Benchmark methodology (anything obviously missing?)
- Edge cases/workloads you'd like to see tested
- How this could be improved or extended – features, variants, or trade-offs worth exploring next
Thanks!
Your hash spreader is too weak, and it looks like it comes from a misreading of `HashMap`. `HashMap` deliberately uses a weak function that just folds the upper bits into the lower ones, because it can rely on red-black tree bins to resolve hash collisions. In your case a collision is much more problematic, so the clustering effect could cause real problems. You could use a two-round function from hash-prospector. I don't have a good explanation for your specific case, but a related write-up showed the impact when this is misused.

Guava's testlib and Apache Commons' collections4 have test suites that others can reuse for their own collections, which provides a pretty good baseline for compliance. You can crib from Caffeine, which has these set up in Gradle.
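For reference, here is the gap in concrete terms. The first function mirrors `HashMap`'s spreader; the second is `lowbias32`, one of the published two-round functions from hash-prospector (constants taken from that project):

```java
final class Mixers {
    // HashMap's spreader: XOR the upper 16 bits into the lower 16.
    // Cheap, and good enough when tree bins absorb the collisions.
    static int jdkSpread(int h) {
        return h ^ (h >>> 16);
    }

    // lowbias32 from hash-prospector: two multiply/xorshift rounds.
    // Much stronger avalanche, which matters in an open-addressing table
    // where clustered hashes turn directly into long probe sequences.
    static int lowbias32(int h) {
        h ^= h >>> 16;
        h *= 0x7feb352d;
        h ^= h >>> 15;
        h *= 0x846ca68b;
        h ^= h >>> 16;
        return h;
    }
}
```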
Got it, thank you so much for the pointers; this was exactly the blind spot I had. I definitely misunderstood how `HashMap`'s hash spreader works and why it can get away with weaker mixing. In my case collisions hurt a lot more, so your note about clustering really clicked.

I'm going to swap in a stronger hash function and hook up the Guava / Commons collections4 test suites like you suggested.
Really appreciate you taking the time to spell this out.
You were right.

When I tried to reproduce this issue, I discovered that my naive smearing made the code take an extremely long time even in the insert phase alone. After changing the hash smearing to the same approach Guava uses, the problem disappeared.
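For anyone following along, the mixing step I switched to is essentially Guava's `Hashing.smear`, which reuses the Murmur3 scramble constants (quoted from memory, so double-check against Guava's source):

```java
// Essentially Guava's Hashing.smear: one Murmur3 scramble round.
// The multiply-rotate-multiply sequence spreads entropy across all bits,
// unlike the plain h ^ (h >>> 16) I was using before.
static int smear(int hashCode) {
    return 0x1b873593 * Integer.rotateLeft(hashCode * 0xcc9e2d51, 15);
}
```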
It's all thanks to you. Thank you!
Every time I see a naive implementation of `putAll` I am reminded of this bug. Basically, iterating over a hash map gives you keys in a hash-related order, and in the middle of inserting them into your own map you can end up with a very unusual hash distribution, one that violates all the expectations. I am not qualified to say whether your implementation suffers from this, but resizing to the new expected size before `putAll`-ing might be a good idea anyway, just to avoid multiple resizings (see the sketch at the end of this comment).

Do you think an optimisation to combine the `keys` and `values` arrays into a single array, in which keys go to the left and values to the right, might help a tiny bit with the number of objects being allocated? Or is it just not worth it?
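To make the presize idea concrete, a minimal sketch; `ensureCapacity` is a hypothetical helper standing in for whatever resize hook HashSmith actually exposes:

```java
// Sketch only: grow once up front instead of rehashing repeatedly mid-putAll.
public void putAll(java.util.Map<? extends K, ? extends V> source) {
    // ensureCapacity is hypothetical: a single resize sized for
    // size() + source.size() entries before any insertion happens.
    ensureCapacity(size() + source.size());
    for (java.util.Map.Entry<? extends K, ? extends V> e : source.entrySet()) {
        put(e.getKey(), e.getValue());
    }
}
```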
Thanks a lot for pointing this out! I actually wasn't aware of that kind of issue with `putAll` and hash map iteration order. I'll definitely read up on the bug you mentioned and look into adjusting my implementation (including resizing up front).

Really appreciate you taking the time to explain it!
I think you're talking about something like this layout, right?

`[k, k, k, ... empty ..., v, v, v, ... empty]`

or maybe

`[k, k, k, ... empty ..., v, v, v]`

My first thought is that it would indeed save one array allocation (and one object header) per map instance, so in workloads with lots of very small maps that could be a real, if small, win in terms of memory/allocations.

For iteration over just keys or just values, locality should still be pretty good because each side is contiguous, so I'd expect it to behave similarly to separate `keys[]` / `values[]` arrays in those cases.

I've also seen implementations that store key and value together per slot, like `[[k, v], [k, v], ...]`, which is a different kind of layout again. I'm not entirely sure how your "keys on the left / values on the right" idea would compare in practice without benchmarking it, but it's an interesting approach and probably something I'd like to experiment with; the slot arithmetic I have in mind is sketched below.
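If I do experiment with it, the indexing would presumably look something like this (a hypothetical sketch, assuming power-of-two capacity and a single backing `Object[]` with keys in the low half):

```java
// Hypothetical single-array layout: slots[0..capacity) holds keys,
// slots[capacity..2*capacity) holds the matching values.
final class SplitSlots {
    private final Object[] slots;
    private final int capacity; // power of two

    SplitSlots(int capacity) {
        this.capacity = capacity;
        this.slots = new Object[capacity * 2]; // one allocation, one header
    }

    Object keyAt(int slot) {
        return slots[slot];
    }

    Object valueAt(int slot) {
        return slots[slot + capacity]; // value sits capacity slots to the right
    }

    void store(int slot, Object key, Object value) {
        slots[slot] = key;
        slots[slot + capacity] = value;
    }
}
```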
I tried running a test similar to the one in the link you shared, and I was able to reproduce the bug you mentioned!
```
HashMap      insert 3,000,000 entries: 138.06 ms, reinsert:    177.46 ms (load factor = 0.9)
SwissMap     insert 3,000,000 entries: 677.10 ms, reinsert:    253.16 ms (load factor = 0.5)
SwissMap     insert 3,000,000 entries: 426.91 ms, reinsert:   1189.93 ms (load factor = 0.75)
SwissMap     insert 3,000,000 entries: 418.98 ms, reinsert:  45081.89 ms (load factor = 0.875)
RobinHoodMap insert 3,000,000 entries: 623.41 ms, reinsert:    309.41 ms (load factor = 0.5)
RobinHoodMap insert 3,000,000 entries: 515.65 ms, reinsert:  62736.08 ms (load factor = 0.75)
RobinHoodMap insert 3,000,000 entries: 418.70 ms, reinsert: 493540.47 ms (load factor = 0.875)
```
I worked around this issue by randomizing iteration order without allocating per-iterator buffers. Since my tables use power-of-two capacities, I pick a random start and a random odd step, then visit slots as `(start + i * step) & (capacity - 1)`. Because an odd step is coprime with a power-of-two capacity, this produces a full-cycle permutation over all slots: every slot is visited exactly once, in a scrambled order.

Example (capacity = 8, step = 3): the indices visited are 0, 3, 6, 1, 4, 7, 2, 5, a permutation of 0..7. This works for both my RobinHoodMap and SwissMap because their capacities are powers of two; a sketch of the scheme is below.
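A minimal sketch of that iteration scheme (the names are illustrative, not the actual HashSmith iterator):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntConsumer;

final class ScrambledSlotOrder {
    // Visits every slot exactly once: an odd step is coprime with the
    // power-of-two capacity, so (start + i * step) mod capacity is a
    // full-cycle permutation of 0..capacity-1.
    static void forEachSlot(int capacity, IntConsumer visit) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int mask = capacity - 1;               // capacity must be a power of two
        int start = rnd.nextInt(capacity);     // random starting slot
        int step = rnd.nextInt(capacity) | 1;  // force the step to be odd
        for (int i = 0; i < capacity; i++) {
            visit.accept((start + i * step) & mask);
        }
    }
}
```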
Thank you so much for your feedback!