Some time ago someone asked on Stackoverflow how Ruby’s
Enumerators do their job. Consider there is an
implementation of an arbitrary Object that features an each method, then it is possible to get an
Enumerator for that object via Kernel#to_enum:
o = Object.new
def o.each
yield 1
yield 2
yield 3
end
enum = o.to_enum
Enumerators come in very handy, especially in situations where you want the basic functionality of
each, but with a twist - let’s say you would like to iterate through the characters of a String
but at the same time you want access to the index of the current character, without having to
maintain that index manually. That’s what each_with_index does, something you probably use a dozen
times a day. The importance here is that the each_with_index functionality is not something that
is provided by the String class, but something that comes with Enumerator. In fact,
String#each_char returns an instance of Enumerator.
alphabet = ('a'...'z').to_a.join("")
enum = alphabet.each_char
puts enum.class # => Enumerator
enum.each_with_index do |c, i|
puts "#{i}: #{c}"
end
Besides String#each_char, each_byte, each_line etc. there are other classes typically
returning Enumerators as well when each is called with no arguments, or they have special
each_xxx methods similar to String#each_char.
puts [1, 2, 3].each.class # => Enumerator
puts ({ a: "b" }).each_key.class # => Enumerator
...
So it turns out that we use Enumerators in quite a few places. It would be a real bummer if these
weren’t implemented efficiently, right? Now with those Enumerators that are built-in we can
certainly trust that they are implemented efficiently because we are in the context of a specific
class and probably know how to do things clean and fast in that context.
But an interesting question is how to proceed in the case of arbitrary Objects that implement an
each method, as was the case in our first example. A simple, straight-forward implementation of an
Enumerator in that case immediately jumps to mind: iterate through all the elements in the object,
store them in an Array and then hand them out one by one. This certainly works, an implementation
could look like the following. Let’s assume that the object implements nothing but a bare each
method that takes the usual block.
class EagerEnumerator
def initialize(enumerable)
@ary = [].tap do |a|
enumerable.each { |item| a << item }
end
@i = 0
end
def next
raise StopIteration.new("iteration reached an end") if @i == @ary.size
val = @ary[@i]
@i += 1
val
end
end
class MyEnumerable
def each
yield 1
yield 2
yield 3
end
end
e = EagerEnumerator.new(MyEnumerable.new)
puts e.next # => 1
puts e.next # => 2
puts e.next # => 3
puts e.next # => StopIteration is raised
So it works, but it has one major drawback: All the references of the values in the enumerable are
duplicated into an Array immediately when the EagerEnumerator is created. That’s not a biggie if
the enumerable is small, but it’s certainly a waste of memory and could become a problem if the
enumerable is quite large or the time to retrieve a single object in each is significant. That’s a
problem that is probably well known to Pythonistas, and they solved this by introducing
Generators. The goal is evident. Rather than doing the
whole work at once, we would like to do the work only on request, only when it is necessary. As it
turns out, Ruby has a very nice feature that allows us to do exactly that, and this is also how an
Enumerator is implemented for Object#to_enum.
Continuation is the tool that fulfills our requirements
perfectly, and Ruby offers it in the form of Fibers. Let’s have a look at how Object#to_enum is
implemented
in CRuby:
static VALUE
obj_to_enum(int argc, VALUE *argv, VALUE obj)
{
VALUE meth = sym_each;
if (argc > 0) {
--argc;
meth = *argv++;
}
return rb_enumeratorize(obj, meth, argc, argv);
}
As we can see, to_enum only works for Objects that have an each method (the sym_each). So
what rb_enumeratorize now does is it creates an Enumerator instance, and the really interesting
part is what happens in calls to next:
static VALUE
enumerator_next(VALUE obj)
{
VALUE vs = enumerator_next_values(obj);
return ary2sv(vs, 0);
}
static VALUE
enumerator_next_values(VALUE obj)
{
struct enumerator *e = enumerator_ptr(obj);
VALUE vs;
if (e->lookahead != Qundef) {
vs = e->lookahead;
e->lookahead = Qundef;
return vs;
}
return get_next_values(obj, e);
}
static VALUE
get_next_values(VALUE obj, struct enumerator *e)
{
VALUE curr, vs;
if (e->stop_exc)
rb_exc_raise(e->stop_exc);
curr = rb_fiber_current();
if (!e->fib || !rb_fiber_alive_p(e->fib)) {
/* sets up the fiber initially */
next_init(obj, e);
}
/* resume the Fiber if next value is needed */
vs = rb_fiber_resume(e->fib, 1, &curr);
if (e->stop_exc) {
e->fib = 0;
e->dst = Qnil;
e->lookahead = Qundef;
e->feedvalue = Qundef;
rb_exc_raise(e->stop_exc);
}
return vs;
}
The real interesting part is what happens in get_next_values. There, initially a Fiber is set up
that pulls out the values from the each method one by one. It’s here where the beauty of Fibers
and Continuations really shines. Without them, we would be stuck here, because calls to each are
of a all-or-nothing nature, we either process all of the values immediately or we don’t. But with
the additional functionality of a Fiber we can elegantly “pause” its execution immediately after
we retrieved a single value from each, and return that to the caller. Later, if another value is
requested, we can simply pick up in the Fiberfrom where we left off, get the next value, pause
again, return the value to the caller and so on.
The benefit of this approach is that we don’t have to do the housekeeping for any excess values, we
simply pass on the current value. Unlike before, where each has practically “flooded” us with all
its values at once, we now have more control over the process and with this finer-grained control we
are now able to tell each: “Stop. One is enough. I’ll ask for more later!”.
This pattern can be really useful in everyday tasks, and saving some memory or time is probably
never a bad idea. Fortunately for us, it’s really easy to implement in Ruby itself. Let’s have a
look at a lazy version of our Enumerator:
class LazyEnumerator
def initialize(enumerable)
@fiber = Fiber.new do
enumerable.each { |item| Fiber.yield item }
end
end
def next
@fiber.resume || raise(StopIteration.new("iteration reached an end"))
end
end
class MyEnumerable
def each
yield 1
yield 2
yield 3
end
end
e = LazyEnumerator.new(MyEnumerable.new)
puts e.next # => 1
puts e.next # => 2
puts e.next # => 3
puts e.next # => StopIteration is raised
This is pretty much the equivalent Ruby code for how Enumerators are implemented internally in
CRuby.