A Puzzle About Ruby Constants
Ruby’s algorithm for finding the definition of a constant is more complex than you might think.
The other day, I was doing some refactoring in the Rails CMS that serves thriveglobal.com. We allow our editors to toggle several boolean attributes on stories (starring them, flagging them, etc.), and we’d DRYed up the controllers for these attributes by having them subclass an abstract controller, StoryBooleansController:
class StoryBooleansController < BaseController
  def create
    update_story(story_boolean => true)
  end

  def destroy
    update_story(story_boolean => false)
  end

  private

  def update_story(attributes)
    story = Story.find(params[:story_id])
    story.update!(attributes)
  end
end
The child controllers implemented the #story_boolean method to return the appropriate attribute, like this:
class StarsController < StoryBooleansController
  private

  def story_boolean
    :starred
  end
end
Noticing that this method always returned a static value, I wondered if it should be a constant instead of a method. So I refactored like this:
class StoryBooleansController < BaseController
  def create
    update_story(STORY_BOOLEAN => true)
  end

  def destroy
    update_story(STORY_BOOLEAN => false)
  end

  # [SNIP]
end

class StarsController < StoryBooleansController
  STORY_BOOLEAN = :starred
end
I ran the tests, and to my surprise I hit an error: NameError: uninitialized constant STORY_BOOLEAN. The code that referenced the constant couldn’t find it, even though it had been able to find a method in the same context. Huh! What is going on here?
The Puzzle, Distilled
The method I changed into a constant is defined in one class, but is called from within its superclass. Here’s an example that distills the structure of the code I was working with:
class MyClass
  def foo_via_method
    foo_method
  end

  def foo_via_constant
    FOO_CONSTANT
  end
end

class SubClass < MyClass
  FOO_CONSTANT = "foo"

  def foo_method
    FOO_CONSTANT
  end
end
The original code was analogous to calling SubClass.new.foo_via_method. After refactoring, the new code was analogous to calling SubClass.new.foo_via_constant. We can recreate my failed refactor by noticing that the method version works but the constant version fails:
sub_class_instance = SubClass.new

### THIS WORKS ###
sub_class_instance.foo_via_method
# => "foo"

### THIS DOESN'T ###
sub_class_instance.foo_via_constant
# NameError: uninitialized constant MyClass::FOO_CONSTANT
The version that refers to a method in the subclass returns the desired value, but the version that refers to a constant in the subclass raises an error. So the puzzle is this: Why does the version that uses a method work but the version that uses the constant fail?
Method Lookup
Let’s look more closely at what’s going on when we call the successful #foo_via_method. Here’s a crude, high-level description of how Ruby evaluates that method call:
1. Look for a definition of #foo_via_method in the receiver’s class, SubClass. Not there!
2. Look for a definition of #foo_via_method in SubClass's superclass, MyClass. Found it!
3. Pass the receiver the message foo_method.
4. Look up the value of FOO_CONSTANT: "foo". (I’ll go into much, much more detail about how this constant lookup works in what follows.)
Steps (1) and (2) exemplify Ruby’s method lookup algorithm. Ruby looks in the receiver’s class for the method definition, and if it can’t find it, Ruby iterates up the superclass chain (also known as the “ancestor chain”) until it finds a class that implements the method. (If it gets all the way to BasicObject and strikes out, it invokes #method_missing.)
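A minimal sketch of those lookup steps, using a trimmed-down copy of the puzzle classes. Module#ancestors and UnboundMethod#owner are standard Ruby; the name no_such_method is made up here purely to trigger the fallback to #method_missing.

```ruby
class MyClass
  def foo_via_method
    foo_method
  end
end

class SubClass < MyClass
  def foo_method
    "foo"
  end
end

# The start of the chain Ruby walks, beginning with the receiver's class.
SubClass.ancestors.take(2)
# => [SubClass, MyClass]

# Which class in the chain actually supplies each method?
SubClass.instance_method(:foo_via_method).owner # => MyClass
SubClass.instance_method(:foo_method).owner     # => SubClass

# When nothing in the chain defines the method, Ruby falls back to
# #method_missing, whose default implementation raises NoMethodError.
# (no_such_method is a hypothetical name used only for this demonstration.)
begin
  SubClass.new.no_such_method
rescue NoMethodError => e
  puts "Fell through to method_missing: #{e.class}"
end
```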
So that’s the deal with method lookup. This is all pretty ho-hum for those with experience working in Ruby, or really any language that supports class inheritance. But why don’t constants behave the same way?
Constant Lookup
If you suspect that constant lookup does work similarly, you are largely correct. As with methods, Ruby is also able to look through the superclass chain to find constants. Here’s a demonstration:
class MyOtherClass
  NAME = "Michael"
end

class OtherSubClass < MyOtherClass
end

OtherSubClass::NAME
# => "Michael"
Ruby is able to resolve OtherSubClass::NAME even though that constant is not defined in OtherSubClass, but rather in its superclass, MyOtherClass. This is the same lookup behavior that caused #foo_via_method to succeed.
So why does #foo_via_constant fail?
The answer is that while both lookup algorithms look for definitions in the ancestors of the current class, they differ in how they determine which class counts as “the current class”. For method lookup, the current class (the first class in the ancestor chain Ruby will traverse in search of a definition) is the receiver’s class. When we call #foo_via_method on sub_class_instance, the value of self within the body of this method is our receiver, sub_class_instance. So the current class for the purposes of looking up #foo_method is SubClass.
Constant lookup determines the “current class” differently. Rather than relying on the receiver to determine the current class, constant lookup starts with the class containing the method. To be more precise, constant lookup begins its superclass chain search using the class containing the current lexical scope. (If no class is open in the current scope, Ruby starts with the Object class.)
Lexical scope is the context defined by where you are in the code. It’s what allows local variables to be defined within a block without affecting variables outside of the block, for example. So, while method lookup is relative to the receiver on which the method was called, constant lookup is relative only to the place in the code where Ruby encounters the constant.
(Useful tools: You can inspect the lexical scope hierarchy by referencing Module.nesting in any context. You can inspect a superclass chain by calling #ancestors on a class or module. For a good time, try calling .ancestors.count on an ActiveRecord model class in a mature Rails app.)
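To make the two hierarchies concrete, here is a small sketch that records both from the same spot in the code. The names Outer, Base, and Child are invented for this illustration and do not come from the CMS.

```ruby
module Outer
  class Base
  end

  class Child < Base
    # Lexical scope chain: determined by where this line sits in the file.
    NESTING = Module.nesting
    # => [Outer::Child, Outer]

    # Superclass chain: determined by inheritance (and included modules).
    ANCESTORS = ancestors.take(3)
    # => [Outer::Child, Outer::Base, Object]
  end
end

p Outer::Child::NESTING
p Outer::Child::ANCESTORS
```

Note that Outer shows up only in the lexical chain and Outer::Base only in the superclass chain: the two hierarchies overlap at the innermost class but are otherwise independent.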
Going to the Source
To substantiate my claims about how Ruby finds constant definitions, let’s dive into the Ruby source code that actually implements this algorithm. (Note: For this discussion, I will be focusing only on the YARV implementation of Ruby, version 2.4.1.)
Let’s investigate what Ruby does when we call sub_class_instance.foo_via_constant and it comes across the constant FOO_CONSTANT in the body of that method. When we access a constant, the Ruby virtual machine, YARV, calls the getconstant instruction, defined here. Let’s take a look at the comment on this function:
Get constant variable id. If klass is Qnil, constants are searched in the current scope. If klass is Qfalse, constants are searched as top level constants. Otherwise, get constant under klass class or module.
Qnil and Qfalse are YARV’s way of referring to Ruby nil and false. The parameter klass refers to the explicit scope we apply to a constant when we call it. E.g. if we called SubClass::FOO_CONSTANT, klass would be SubClass. In our puzzle we have a “naked” invocation of FOO_CONSTANT, for which klass is Qnil. So what the comment is telling us is that when we encounter a naked constant, it will be “searched in the current scope”. That sounds promising! Let’s travel down the call stack to see what this really means in practice.
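If you want to see the instruction for yourself, MRI can print the bytecode it compiles for a snippet. This is only a sketch: RubyVM::InstructionSequence is specific to MRI/YARV, the helper name disasm_for is made up for this example, and the exact instruction names and operands differ between Ruby versions (newer releases wrap naked constant references in optimized instructions), so no expected output is shown.

```ruby
# disasm_for is a hypothetical helper: it compiles a source string
# (without running it) and returns YARV's human-readable disassembly.
def disasm_for(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

# A "naked" constant reference, as in the puzzle.
puts disasm_for("FOO_CONSTANT")

# A constant reference with an explicit scope in front of the `::`.
puts disasm_for("self.class::FOO_CONSTANT")
```

In both listings, look for an instruction whose name contains getconstant; the scoped version first pushes the result of self.class to serve as klass.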
getconstant calls vm_get_ev_const, a friendly 75-line function that actually implements constant lookup. This function gets passed four parameters: a thread; the explicit class context, orig_klass; an identifier for the constant; and a caching parameter, is_defined, with the value 0. We are interested in a naked constant call, so orig_klass will be Qnil. Looking within the if (orig_klass == Qnil) block, then, the first thing the function does is initialize a local variable, cref (short for “code reference”). This variable holds the root of the lexical scope chain, which represents the place in the code at which the constant was encountered.
Next, the long while block iterates up the lexical scope chain, checking at each step along the way to see if the constant is defined in that context. The line cref = CREF_NEXT(cref) is where we take a step up the chain. The routine keeps climbing the chain until it finds a scope in which the constant is defined or until finally there is no next cref, in which case we exit the while block. It’s the latter that will occur in our puzzle when we call SubClass.new.foo_via_constant; the constant FOO_CONSTANT is not defined in the root lexical scope, that within MyClass, and so the lexical scope search will come up empty.
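For contrast, here is a sketch of a case where the lexical scope search does succeed. Publishing, Story, and DEFAULT_FLAG are hypothetical names chosen for this example.

```ruby
module Publishing
  DEFAULT_FLAG = :starred

  class Story
    # Story does not inherit from Publishing, so DEFAULT_FLAG is not reachable
    # through Story.ancestors. The reference still resolves, because the
    # enclosing `module Publishing` is part of the lexical scope chain.
    def default_flag
      DEFAULT_FLAG
    end
  end
end

# Reopening the class with the compact `Publishing::Story` syntax leaves
# Publishing out of the lexical scope chain, so the same reference fails.
class Publishing::Story
  def default_flag_compact
    DEFAULT_FLAG
  end
end

Publishing::Story.new.default_flag
# => :starred

begin
  Publishing::Story.new.default_flag_compact
rescue NameError => e
  puts e.message
end
```

The two methods live in the same class and are called on the same kind of receiver; only the surrounding code differs, which is exactly what “lexical” means here.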
But YARV doesn’t stop its search there. It’s the next bit of code that is most crucial to our puzzle:
/*********************************************
From vm_insnhelper.c in definition of `vm_get_ev_const`
*********************************************/

/* search self */
if (root_cref && !NIL_P(CREF_CLASS(root_cref))) {
    klass = vm_get_iclass(th->cfp, CREF_CLASS(root_cref));
}
else {
    klass = CLASS_OF(th->cfp->self);
}

if (is_defined) {
    return rb_const_defined(klass, id);
}
else {
    return rb_const_get(klass, id);
}
YARV will now attempt to resolve the constant by looking through the superclass hierarchy. YARV’s first order of business is to identify the class it will use as the root of this hierarchy. How it determines that class is exactly what we were hoping to learn.
The root lexical scope is within the context of a class (MyClass), so the condition root_cref && !NIL_P(CREF_CLASS(root_cref)) is satisfied, and klass is initialized as follows:
klass = vm_get_iclass(th->cfp, CREF_CLASS(root_cref));
The function vm_get_iclass just returns the class it’s passed as a second argument (see definition here), so klass gets assigned MyClass. is_defined was passed into vm_get_ev_const as 0, which is falsey, so the return value for the constant lookup we care about will be rb_const_get(klass, id) where klass is MyClass.
Ok, we’re almost home! rb_const_get calls rb_const_get_0, which in turn tries to look up the constant via a call to rb_const_search. This search function looks through the superclass hierarchy starting with the class it gets passed (in our case, MyClass). To see that this is what rb_const_search is doing, note that just before the retry block, tmp is set to klass, and we move up the hierarchy every time we hit tmp = RCLASS_SUPER(tmp) here. The aptly named RCLASS_SUPER takes a Ruby class and returns its superclass.
So there we have it! When we call sub_class_instance.foo_via_constant, Ruby searches for FOO_CONSTANT in MyClass and its superclasses. It never looks in SubClass, and so it can’t find a definition for the constant.
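The same conclusion can be checked from Ruby itself. Module#const_defined? and Module#const_get search from an explicitly given class upward, which makes them a rough Ruby-level analogue of calling rb_const_get(klass, id) with a chosen klass. This is a sketch of the behavior, not a claim that they share the exact code path.

```ruby
class MyClass
  def foo_via_constant
    FOO_CONSTANT
  end
end

class SubClass < MyClass
  FOO_CONSTANT = "foo"
end

# Searching from MyClass upward never visits SubClass.
MyClass.ancestors.include?(SubClass)   # => false
MyClass.const_defined?(:FOO_CONSTANT)  # => false

# Searching from SubClass upward finds the constant immediately.
SubClass.const_defined?(:FOO_CONSTANT) # => true
SubClass.const_get(:FOO_CONSTANT)      # => "foo"
```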
Summing Up: Method vs. Constant Lookup
In the last section, we established that Ruby’s constant lookup algorithm works as follows:
1. Check if the constant is defined in the current lexical scope.
2. If not, move up the lexical scope hierarchy and go back to (1).
3. If you run out of scopes and still haven’t resolved the constant, move on.
4. Check if the constant is defined in the class that’s open in the current lexical scope.
5. If not, move up the superclass hierarchy and go back to (4).
6. If you still strike out again: Error!
Step (4) illustrates the difference between Ruby’s superclass chain search algorithm for constants and its similar algorithm for methods. For methods, the search starts with the receiver’s class. For constants, the search starts with the class you are in at the code location where the constant is called, also known as the lexical scope.
(Note: we are ignoring a further complication to this algorithm caused by Ruby’s ability to lazily initialize or “autoload” constants. That complication is not relevant to our puzzle.)
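The algorithm summarized above can be approximated in plain Ruby. This is a simplified model, not Ruby’s actual implementation: the function name resolve_constant is invented here, the lexical scope chain has to be passed in by hand (as Module.nesting would report it), and autoloading and the extra fallback to Object that Ruby applies when the innermost scope is a module are both left out.

```ruby
# nesting: the lexical scope chain, innermost scope first, as Module.nesting
# would return it at the point where the constant appears.
def resolve_constant(name, nesting)
  # Lexical phase: walk the scope chain, checking only the constants defined
  # directly in each scope (inherit = false).
  nesting.each do |scope|
    return scope.const_get(name, false) if scope.const_defined?(name, false)
  end

  # Superclass phase: search the ancestors of the class that is open in the
  # innermost lexical scope, defaulting to Object at the top level.
  start = nesting.first || Object
  start.ancestors.each do |ancestor|
    return ancestor.const_get(name, false) if ancestor.const_defined?(name, false)
  end

  raise NameError, "uninitialized constant #{name}"
end

class MyClass
end

class SubClass < MyClass
  FOO_CONSTANT = "foo"
end

# Inside SubClass, the lexical scope chain is [SubClass].
resolve_constant(:FOO_CONSTANT, [SubClass])
# => "foo"

# Inside MyClass, the chain is [MyClass], and SubClass is never consulted.
begin
  resolve_constant(:FOO_CONSTANT, [MyClass])
rescue NameError => e
  puts e.message
end
```

Notice that the receiver never appears as an argument: the only input besides the constant’s name is the lexical scope chain.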
Why does Ruby work this way?
Ruby is simple in appearance, but is very complex inside…
Why doesn’t Ruby use the same superclass chain lookup algorithm for constants that it uses for methods? And was I unreasonable to expect that it might?
When we invoke a Ruby method, there is always an object we are calling it on. Method calls are messages that are passed to an object, the “receiver” of the message. This object might be explicitly mentioned in the code (some_receiver.some_method), or it might be left implicit (some_method), in which case the receiver defaults to the current value of self. In either case, anywhere it finds a method call, Ruby has a unique object to work with. All Ruby objects have a class, and Ruby can, and does, look for the method’s definition in this class and its ancestors.
Constants, on the other hand, are not messages passed to objects. When we access a constant, that action is not relative to any particular object. So unless we provide Ruby with an explicit class to look in by prefixing our constant with SomeClass::, the only scope available to search in is the lexical scope. So that’s where Ruby searches.
Why did I ever expect otherwise? In the case of my puzzle, it just so happened that my constant invocation occurred within the context of a method call. So in my puzzle there was a reference object, sub_class_instance, that in theory could be used to determine a class scope. But this isn’t generally true for constant calls, because constants can be accessed outside of the context of method calls. Nevertheless, the apparent existence of a reference object tempted me into thinking that Ruby would use it to resolve FOO_CONSTANT.
Matz, Ruby’s creator, has said he is “trying to make Ruby natural, not simple”. He is willing to make Ruby’s implementation more complex in order to make its interface organic and frictionless. Ruby users embrace this design principle and have come to expect their language to “just work”. That is what I was doing when I refactored my controller. Sure, it would be more complex for Ruby to make an exception for constants encountered in the context of method calls. But if it did, we’d get to do things like sub_class_instance.foo_via_constant, which feel natural. Given that Ruby goes out of its way to be natural, I don’t think expecting Ruby to behave this way was unreasonable.
And what about that controller?
This all started because I found it surprising that code like the following doesn’t work:
class StoryBooleansController < BaseController
  def create
    update_story(STORY_BOOLEAN => true)
  end

  def destroy
    update_story(STORY_BOOLEAN => false)
  end

  private

  def update_story(attributes)
    story = Story.find(params[:story_id])
    story.update!(attributes)
  end
end

class StarsController < StoryBooleansController
  STORY_BOOLEAN = :starred
end
If I really wanted to use a constant here, and if I wanted to continue to support many subclasses of StoryBooleansController, what could I do?
One solution would be to provide an explicit class scope when we call the constant, like so:
class StoryBooleansController < BaseController
  def create
    update_story(self.class::STORY_BOOLEAN => true)
  end

  def destroy
    update_story(self.class::STORY_BOOLEAN => false)
  end

  # [SNIP]
end
With this change, getconstant will be invoked with klass set to the subclassed controller (StarsController in our example) and will immediately find the constant defined there. Whether or not this code is better than what I started with I leave to the reader’s judgement, but at least now we understand how and why it works.
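Here is a Rails-free sketch of the same pattern that you can run in a plain Ruby session. BooleansBase, Stars, Flags, and attributes_for_create are stand-in names invented for this example; they only mimic the shape of the controllers above.

```ruby
# Plain-Ruby stand-ins for the controllers; no Rails required.
class BooleansBase
  def attributes_for_create
    # Explicit scope: the constant search starts from the receiver's class
    # rather than from the class that lexically contains this method.
    { self.class::STORY_BOOLEAN => true }
  end
end

class Stars < BooleansBase
  STORY_BOOLEAN = :starred
end

class Flags < BooleansBase
  STORY_BOOLEAN = :flagged
end

Stars.new.attributes_for_create == { starred: true } # => true
Flags.new.attributes_for_create == { flagged: true } # => true

# A subclass that forgets to define the constant still fails, but only when
# the method runs, and with that subclass named in the error message.
begin
  Class.new(BooleansBase).new.attributes_for_create
rescue NameError => e
  puts e.message
end
```

One trade-off to keep in mind: with a method, a missing #story_boolean and a missing constant both surface only at call time, so neither approach catches a forgotten definition when the subclass is loaded.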
Thanks!
I would never have considered going down this rabbit hole if it weren’t for Pat Shaughnessy’s excellent book about Ruby internals, Ruby Under a Microscope. The book served as my main reference text for this post. Conveniently, you can read the chapter on method and constant lookup online for free (PDF). Then go buy it! Pat also generously answered my questions on Twitter and helped me figure out where to look in the Ruby codebase. Thanks, Pat!
Thanks also to friends and colleagues who discussed or read drafts of this post, including: Becca Liss, Bryan Mytko, Greg Bednarek, Karl Rosaen, and Mae Capozzi.