Ruby - Strange, unexpected behavior (disappearing/changing values) when using Hash default value, e.g. Hash.new([])
Consider this code:

    h = Hash.new(0)  # New hash pairs will by default have 0 as values
    h[1] += 1        # h is now {1=>1}
    h[2] += 2        # h is now {1=>1, 2=>2}

That's fine, but:

    h = Hash.new([])  # Empty array as default value
    h[1] <<= 1        # h is now {1=>[1]}                  ← Ok
    h[2] <<= 2        # h is now {1=>[1,2], 2=>[1,2]}      ← Why did `1` change?
    h[3] << 3         # h is now {1=>[1,2,3], 2=>[1,2,3]}  ← Where is `3`?

At this point I expect the hash to be:

    {1=>[1], 2=>[2], 3=>[3]}

but it's far from that. What is happening, and how can I get the behavior I expect?
First, note that this behavior applies to any default value that is subsequently mutated (e.g. hashes and strings), not just arrays.
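As a minimal sketch of that point, here is the same pitfall with a String default instead of an Array. The key names are arbitrary, and `''.dup` is used only to make sure the default string is mutable regardless of frozen-string-literal settings:

```ruby
# One mutable string is passed as the default, so every missing key
# returns that very same object.
h = Hash.new(''.dup)

h[:a] << 'x'   # mutates the shared default string
h[:b] << 'y'   # mutates the same string again

default_value = h[:anything]          # the shared default, now "xy"
stored_keys   = h.keys                # still empty: nothing was ever assigned
same_object   = h[:a].equal?(h[:b])   # both lookups return one object
```

Nothing is ever stored in the hash here, yet every lookup appears to "remember" earlier appends, which is exactly the behavior described for arrays.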
TL;DR: Use Hash.new { |h, k| h[k] = [] } if you want the simplest, most idiomatic solution.
What doesn't work

Why Hash.new([]) doesn't work

Let's look more in-depth at why Hash.new([]) doesn't work:

    h = Hash.new([])
    h[0] << 'a'  #=> ["a"]
    h[1] << 'b'  #=> ["a", "b"]
    h[1]         #=> ["a", "b"]

    h[0].object_id == h[1].object_id  #=> true
    h  #=> {}

We can see that our default object is being reused and mutated (this is because it is passed as the one and only default value, so the hash has no way of getting a fresh, new default value), but why are there no keys or values in the hash, despite h[1] still giving us a value? Here's a hint:

    h[42]  #=> ["a", "b"]

The array returned by each [] call is just the default value, which we've been mutating all this time, so it now contains our new values. Since << doesn't assign to the hash (there can never be assignment in Ruby without an = present†), we've never put anything into our actual hash. Instead we have to use <<= (which is to << as += is to +):

    h[2] <<= 'c'  #=> ["a", "b", "c"]
    h             #=> {2=>["a", "b", "c"]}

This is the same as:

    h[2] = (h[2] << 'c')

Why Hash.new { [] } doesn't work

Using Hash.new { [] } solves the problem of reusing and mutating the original default value (as the block given is called each time, returning a new array), but not the assignment problem:

    h = Hash.new { [] }
    h[0] << 'a'   #=> ["a"]
    h[1] <<= 'b'  #=> ["b"]
    h             #=> {1=>["b"]}

What does work

The assignment way

If we remember to always use <<=, then Hash.new { [] } is a viable solution, but it's a bit odd and non-idiomatic (I've never seen <<= used in the wild). It's also prone to subtle bugs if << is inadvertently used.
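A small sketch of both halves of that claim, using arbitrary keys :a and :b: <<= stores the result, while a stray << silently appends to a throwaway array.

```ruby
h = Hash.new { [] }

h[:a] <<= 1   # same as h[:a] = (h[:a] << 1), so the array is stored
h[:a] <<= 2   # appends to the stored array and reassigns it

h[:b] << 3    # inadvertent <<: appends to a fresh array that is then discarded

stored   = h            # only :a made it into the hash
b_lookup = h[:b]        # a brand-new empty array; the 3 is gone
```

No error is raised for the h[:b] line, which is what makes this kind of bug easy to miss.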

The mutable way

The documentation for Hash.new states (note the last sentence in particular):

If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block's responsibility to store the value in the hash if required.

So we must store the default value in the hash from within the block if we wish to use << instead of <<=:

    h = Hash.new { |h, k| h[k] = [] }
    h[0] << 'a'  #=> ["a"]
    h[1] << 'b'  #=> ["b"]
    h            #=> {0=>["a"], 1=>["b"]}

This effectively moves the assignment from our individual calls (which would use <<=) to the block passed to Hash.new, removing the burden of unexpected behavior when using <<.
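As an illustration of how this idiom tends to be used in practice, here is a hedged sketch that groups words by their first letter. The word list and the variable names are made up for the example:

```ruby
words = %w[apple avocado banana blueberry cherry]

# Each missing key gets its own new array, stored in the hash by the block.
by_letter = Hash.new { |hash, key| hash[key] = [] }

words.each { |word| by_letter[word[0]] << word }

# Every key owns a distinct array, so the groups never bleed into each other.
distinct = !by_letter['a'].equal?(by_letter['b'])
```

Because the block assigns on first access, plain << is enough at the call site.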

Note that there is one functional difference between this method and the others: this way assigns the default value upon reading (as the assignment always happens inside the block). For example:

    h1 = Hash.new { |h, k| h[k] = [] }
    h1[:x]
    h1  #=> {:x=>[]}

    h2 = Hash.new { [] }
    h2[:x]
    h2  #=> {}

The immutable way

You may be wondering why Hash.new([]) doesn't work while Hash.new(0) works just fine. The key is that Numerics in Ruby are immutable, so we naturally never end up mutating them in-place. If we treated our default value as immutable, we could use Hash.new([]) just fine too:

    h = Hash.new([].freeze)
    h[0] += ['a']  #=> ["a"]
    h[1] += ['b']  #=> ["b"]
    h[2]           #=> []
    h              #=> {0=>["a"], 1=>["b"]}

Of all the ways, I personally prefer this way: immutability generally makes reasoning about things much simpler (this is, after all, the only method that has no possibility of hidden or subtle unexpected behavior).
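To make the safety argument concrete, here is a minimal sketch showing that an accidental << on the frozen default fails loudly instead of silently corrupting it. It assumes Ruby 2.5 or later, where FrozenError exists; the keys are arbitrary:

```ruby
h = Hash.new([].freeze)

# An accidental in-place mutation of the frozen default raises immediately.
error_class = begin
  h[:a] << 1
  nil
rescue FrozenError => e
  e.class
end

h[:b] += [2]              # builds a new array and assigns it to the key

default_untouched = h[:missing]    # the default is still an empty array
stored_is_frozen  = h[:b].frozen?  # Array#+ returns a new, unfrozen array
```

Note that the stored arrays themselves are not frozen, since Array#+ returns a new unfrozen array; if immutability should hold throughout, the result would need to be frozen again before assignment.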

† This isn't strictly true: methods like instance_variable_set bypass this, but they must exist for metaprogramming since the l-value in = cannot be dynamic.
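
For the curious, a short sketch of what the footnote means. Point is a hypothetical class invented for this example; the variable name is only known at runtime, so it cannot appear on the left of a literal =:

```ruby
# Point is a hypothetical class used only for illustration.
class Point
  attr_reader :x, :y

  def initialize
    @x = 0
    @y = 0
  end
end

point = Point.new

# The instance variable name is built dynamically from each key.
{ x: 3, y: 4 }.each do |name, value|
  point.instance_variable_set("@#{name}", value)
end
```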