Ruby - Strange, unexpected behavior (disappearing/changing values) when using Hash default value, e.g. Hash.new([])
Consider this code (each comment shows the hash's contents after the line runs):
    h = Hash.new(0)  # new hash pairs will by default have 0 as values
    h[1] += 1        #=> {1=>1}
    h[2] += 2        #=> {1=>1, 2=>2}
That's all fine, but:
    h = Hash.new([])  # empty array as default value
    h[1] <<= 1  #=> {1=>[1]}                  ← ok
    h[2] <<= 2  #=> {1=>[1,2], 2=>[1,2]}      ← why did `1` change?
    h[3] << 3   #=> {1=>[1,2,3], 2=>[1,2,3]}  ← where is `3`?
At this point I expect the hash to be:
    {1=>[1], 2=>[2], 3=>[3]}
but it's far from that. What is happening, and how can I get the behavior I expect?
First, note that this behavior applies to any default value that is subsequently mutated (e.g. hashes and strings), not just arrays.
TL;DR: use Hash.new { |h, k| h[k] = [] } if you want the simplest, most idiomatic solution.
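To illustrate the point that arrays are not special here, the following is a minimal sketch of the same pitfall with a string default. The method name `shared_string_default_demo` is made up for this example.

```ruby
# Hypothetical demo: the same pitfall with a String default instead of an Array.
def shared_string_default_demo
  h = Hash.new(+"")      # one mutable String object is the default for every key
  h[:a] << "x"           # mutates the shared default; nothing is stored in h
  h[:b] << "y"           # mutates the same object again
  [h[:c], h.keys]        # an unrelated key sees both changes; h is still empty
end

p shared_string_default_demo  # prints ["xy", []]
```

The unary `+` is only there so the string is mutable even when string literals are frozen.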
What doesn't work
Why Hash.new([]) doesn't work
Let's look more in-depth at why Hash.new([]) doesn't work:
    h = Hash.new([])
    h[0] << 'a'  #=> ["a"]
    h[1] << 'b'  #=> ["a", "b"]
    h[1]         #=> ["a", "b"]
    h[0].object_id == h[1].object_id  #=> true
    h  #=> {}
We can see that our default object is being reused and mutated (this is because it is passed as the one and only default value, so the hash has no way of getting a fresh, new default value), but why are there no keys or values in the hash, despite h[1] still giving us a value? Here's a hint:
    h[42]  #=> ["a", "b"]
The array returned by each [] call is just the default value, which we've been mutating all this time, so it now contains our new values. Since << doesn't assign to the hash (there can never be assignment in Ruby without an = present†), we've never put anything into our actual hash. Instead we have to use <<= (which is to << as += is to +):
    h[2] <<= 'c'  #=> ["a", "b", "c"]
    h             #=> {2=>["a", "b", "c"]}
This is the same as:
    h[2] = (h[2] << 'c')
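This expansion also explains the changing values in the question: what gets stored under each key is the default array itself, not a copy. A small sketch that replays the question's three lines and checks object identity (the method name `aliasing_demo` is only illustrative):

```ruby
# Hypothetical demo replaying the question's code with Hash.new([]).
def aliasing_demo
  h = Hash.new([])
  h[1] <<= 1   # h[1] = (default << 1): stores the default array itself under key 1
  h[2] <<= 2   # mutates that same array, then stores it under key 2 as well
  h[3] << 3    # mutates it once more, but stores nothing under key 3
  h
end

h = aliasing_demo
p h.keys                   # prints [1, 2]
p h[1].equal?(h[2])        # prints true: both keys point at one object
p h[1].equal?(h.default)   # prints true: and that object is the default
p h[1]                     # prints [1, 2, 3]
```

`equal?` compares object identity, which is the same check as comparing `object_id` values.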
Why Hash.new { [] } doesn't work
Using Hash.new { [] } solves the problem of reusing and mutating the original default value (as the block given is called each time, returning a new array), but not the assignment problem:
    h = Hash.new { [] }
    h[0] << 'a'   #=> ["a"]
    h[1] <<= 'b'  #=> ["b"]
    h             #=> {1=>["b"]}
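To see why the `h[0] << 'a'` above is lost, here is a short sketch showing that every lookup of a missing key runs the block again and hands back a brand-new array (`fresh_array_demo` is a made-up name):

```ruby
# Hypothetical demo: Hash.new { [] } builds a new array on every missing-key lookup.
def fresh_array_demo
  h = Hash.new { [] }
  first  = h[:missing]   # block runs, returns a new array
  second = h[:missing]   # block runs again, returns another new array
  first << "a"           # only changes the first throwaway array
  [first.equal?(second), second, h.key?(:missing)]
end

p fresh_array_demo  # prints [false, [], false]
```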
What does work
The assignment way
If we remember to always use <<=, then Hash.new { [] } is a viable solution, but it's a bit odd and non-idiomatic (I've never seen <<= used in the wild). It's also prone to subtle bugs if << is inadvertently used.
The mutable way
The documentation for Hash.new states (emphasis my own):
"If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block's responsibility to store the value in the hash if required."
So we must store the default value in the hash from within the block if we wish to use << instead of <<=:
    h = Hash.new { |h, k| h[k] = [] }
    h[0] << 'a'  #=> ["a"]
    h[1] << 'b'  #=> ["b"]
    h            #=> {0=>["a"], 1=>["b"]}
This effectively moves the assignment from our individual calls (which would use <<=) to the block passed to Hash.new, removing the burden of unexpected behavior when using <<.
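As a usage sketch of the block form with a plain <<, here is a small grouping helper. The method name `group_by_length` and the sample words are invented for illustration.

```ruby
# Hypothetical helper: groups words by length using a storing default block.
def group_by_length(words)
  groups = Hash.new { |hash, key| hash[key] = [] }  # each new key gets its own array
  words.each { |word| groups[word.length] << word } # plain << is safe here
  groups
end

groups = group_by_length(%w[a bb cc d])
p groups[1]  # prints ["a", "d"]
p groups[2]  # prints ["bb", "cc"]
```

For this exact task Enumerable#group_by would be the usual choice; the helper is only meant to show the pattern.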
Note that there is one functional difference between this method and the others: this way assigns the default value upon reading (as the assignment always happens inside the block). For example:
    h1 = Hash.new { |h, k| h[k] = [] }
    h1[:x]
    h1  #=> {:x=>[]}
    h2 = Hash.new { [] }
    h2[:x]
    h2  #=> {}
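If assigning on read is a concern, one option is to read through methods that do not invoke the default block. A minimal sketch, assuming you only want to peek at a key (`read_without_storing` is a made-up name):

```ruby
# Hypothetical demo: key? and fetch do not trigger the default block.
def read_without_storing
  h = Hash.new { |hash, key| hash[key] = [] }
  h[:seen] << 1                    # [] runs the block and stores an array under :seen
  present  = h.key?(:other)        # membership test; the block is not called
  fallback = h.fetch(:other, [])   # returns the given fallback; the block is not called
  [present, fallback, h.keys]
end

p read_without_storing  # prints [false, [], [:seen]]
```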
The immutable way
You may be wondering why Hash.new([]) doesn't work while Hash.new(0) works just fine. The key is that numerics in Ruby are immutable, so we naturally never end up mutating them in-place. If we treated our default value as immutable, we could use Hash.new([]) just fine too:
    h = Hash.new([].freeze)
    h[0] += ['a']  #=> ["a"]
    h[1] += ['b']  #=> ["b"]
    h[2]           #=> []
    h              #=> {0=>["a"], 1=>["b"]}
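A short sketch of what the freeze buys us: an accidental << on the default fails loudly instead of silently corrupting it. This assumes Ruby 2.5 or later, where FrozenError exists; `frozen_default_demo` is an illustrative name.

```ruby
# Hypothetical demo: a frozen default rejects in-place mutation.
def frozen_default_demo
  h = Hash.new([].freeze)
  raised =
    begin
      h[:a] << 1     # tries to mutate the frozen default array
      false
    rescue FrozenError
      true
    end
  h[:a] += [1]       # builds a new array and assigns it under :a
  [raised, h[:a], h[:a].frozen?, h.default]
end

p frozen_default_demo  # prints [true, [1], false, []]
```

Note that the array produced by + is not itself frozen, so values already stored in the hash can still be mutated in place unless they are frozen again.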
Of all the ways, I prefer the immutable way: immutability makes reasoning about things simpler (this is, after all, the only method that has no possibility of hidden or subtle unexpected behavior).
† This isn't strictly true: methods like instance_variable_set bypass this, but they must exist for metaprogramming since the l-value in = cannot be dynamic.