Motivation
This post is written for those who, like me, have tried and failed to understand the Y Combinator despite the help of Wikipedia, blog posts constructing the Y Combinator in multiple languages, stories on Hacker News, comments on Hacker News… I think I mostly get it now (this comes courtesy of a night in Helsinki where I was up at a strange time due to the fallout from a Helsinki-Stockholm-Helsinki booze cruise)—and so perhaps these notes will help someone else. If nothing else, writing them up helped me! (Maybe the only way to understand it is to write a blog post about it.)
I think a large part of the problem understanding the Y Combinator is that the various bits of information available online are written from several different perspectives, which means that they emphasise different aspects of the Y Combinator story, to wit:
- The properties of the Y Combinator function itself. Technically, the Y Combinator is one example of a fixed point combinator, meaning that it’s a higher order function with a particular property (to be described later). Sadly it seems this property is very hard to grok unless you’ve recently spent some time with lambda calculus (or at least it was for me), so discussion is usually motivated by one particular application of the Y Combinator, which is that…
- The Y Combinator allows you to write “recursive” functions without
explicit recursion. (“Without explicit recursion” means
“recursion” without explicitly calling a function named
foo
in the body of a function that itself is calledfoo
.) Thus, discussion of the Y Combinator is tied up with anonymous (i.e. unnamed) functions. (Note that the Y Combinator-powered version of a function will almost certainly not be any faster than the recursive version, and it’s going to need just as much stack space.) Finally, Y Combinator posts and articles often have an extended section on… - The implementation of the Y Combinator in some language. The Y Combinator is an important part of (theoretical) computer science, and its implementation typically stretches a language’s support for first class functions—it’s often instructive to compare how elegantly (or not) it can be implemented in various languages. (A few examples: JavaScript, Perl, PHP.)
Deriving the Y Combinator
Step 1
As I mentioned above, discussion of the Y Combinator is typically motivated by a particular problem: how to write “recursive” functions without explicit recursion. I don’t know any better way of going about it, so let’s first consider that standard exemplar of recursion, the factorial function, and transform it into a version that uses the Y Combinator. The standard implementation:
var fact = function (n) {
if (n < 2) return 1;
return n * fact(n - 1);
};
(I’m using JavaScript in these examples because its widely known and has good support for first class functions.)
Step 2
How might you go about removing the self-referential call to fact()
within the function body? (The exact reasons for wanting to do this are
not important, but for example suppose that you want to inline the
entire call.) One obvious way is to use a keyword provided by the
language that refers to the function itself such as JavaScript’s
arguments.callee,
but, well, this is cheating and we can do better!
Step 3
Our first attempt is to transform the fact()
function slightly, so
that fact
becomes an argument to a wrapper function, instead of a
free variable referring to the factorial function. We can then use this
argument in the “recursive” call:
var fact_wrapper1 = function (fact) {
return function (n) {
if (n < 2) return 1;
return n * fact(n - 1);
};
};
At this point, the function contains no references to variables declared
outside the scope of the function itself—though overall we haven’t much
progress because we also don’t have a working fact()
function… If only
there were some way to call fact_wrapper1()
passing it its own return
value as an argument! If we could get this to work, the return value of
fact_wrapper1()
would be the factorial function we want.
Step 4
Okay, so that attempt didn’t quite work out; let’s try a slightly
different approach. Perhaps it’s possible to modify fact_wrapper1()
so
that it can be passed as an argument to itself. (This currently isn’t
possible because fact_wrapper1()
’s argument (fact
) is a simple
function that takes an integer argument and returns an integer; by
contrast fact_wrapper1()
itself is a higher-order
function that
receives a function as an argument, returning a simple function that
takes an integer argument and returning an integer.) If we can pass the
function to itself, we eliminate the need to figure out how to extract
just its return value. fact_wrapper1()
can be transformed into such a
function as follows:
var fact_wrapper2 = function (f) {
return function (n) {
if (n < 2) return 1;
return n * f(f)(n - 1);
};
};
Note that the only real change is that f(f)(n - 1)
replaces
fact(n - 1)
. This is because f(f)
returns a function, which we then
apply to n - 1
.
Step 5
Let’s just check to see if fact_wrapper2
can really be used as an
argument to itself:
> var fact = fact_wrapper2(fact_wrapper2);
> fact(5);
120
Okay, that works: so far, so good! (You can copy-and-paste all these examples into a NodeJS session.)
We can also make fact_wrapper2()
look a bit more like
fact_wrapper1()
by moving the weird f(f)
call into a helper
function:
var fact_wrapper2 = function (f) {
var g = function (n) {
return f(f)(n);
};
return function (n) {
if (n < 2) return 1;
return n * g(n - 1);
};
};
Step 6
By adding another helper function, we can avoid the need for
fact_wrapper2
to appear twice, and if we inline its definition, we can
avoid the need to give the wrapper function a name at all:
var recur = function (f) {
return f(f);
};
var fact = recur(function (f) {
var g = function (n) {
return f(f)(n);
};
return function (n) {
if (n < 2) return 1;
return n * g(n - 1);
};
});
We still get the same result:
> fact(5);
120
Step 7
Next, let’s transform Step 6 by eliminating the local variable g
,
instead passing it in as an argument of a new inlined function:
var recur = function (f) {
return f(f);
};
var fact = recur(function (f) {
return (function (g) {
return function (n) { // unchanged from above
if (n < 2) return 1;
return n * g(n - 1);
}
})(function (n) {
return f(f)(n);
});
});
Step 8
This lets us move the factorial function itself out of the helper code:
var fact_core = function (g) {
return function(n) {
if (n < 2) return 1;
return n * g(n - 1);
}
};
var recur = function (f) {
return f(f);
};
var fact = recur(function (f) {
return fact_core(function (n) {
return f(f)(n);
});
});
Step 9
Almost there! At this point we can also inline recur()
itself:
var fact_core = function (g) { // same as above
return function(n) {
if (n < 2) return 1;
return n * g(n - 1);
}
};
var fact = (function (g) { // the inlined recur() function
return g(g);
})(function (f) { // same as above; now an argument to the inlined recur()
return fact_core(function (n) {
return f(f)(n);
});
});
Step 10
Finally, we can turn the helper code into a standalone function (i.e.
eliminating the reference to fact_core
in the body of Step 9’s fact
by converting it into an argument), which we’ll call Y()
:
var Y = function (h) {
return (function (f) {
return f(f);
})(function (f) {
return h(function (n) {
return f(f)(n);
});
});
};
var fact_core = function (g) { // same as above
return function(n) {
if (n < 2) return 1;
return n * g(n - 1);
}
};
var fact = Y(fact_core);
Hurrah! We have finally achieved Y Combinator:
> fact(5)
120
Notes and Observations
Neither the function Y()
nor fact_core()
are self-referential—these
names do not appear anywhere in the code. A lot of other functions are
involved, but since they’re all anonymous, they can’t refer to
themselves either. Hence, the Y Combinator can be used to implement
self-referential (i.e. recursive) programs in languages that don’t
otherwise support it.
The function fact()
is technically the “fixed
point” of the
function fact_core()
and the Y Combinator is a function for finding
such fixed points. The fixed point p of a function f is a function
that satisfies the condition f(p) = p. What this actually means is
somewhat hard to get one’s head around (see the Wikipedia
page for more),
but it’s easy to check that fact_core()
and fact()
do have these
properties:
> fact_core(fact)(5);
120
> fact(5)
120
This means that the following expressions are all (mathematically)
equivalent to the function fact
:
Y(fact_core)
fact_core(Y(fact_core))
fact_core(fact)
fact_core(fact_core(fact_core(Y(fact_core))))
References
- Wikipedia’s page on the fixed point combinator
- Deriving the Y Combinator in 7 Easy Steps – JavaScript
- Deriving the Y Combinator – JavaScript
- The Why of Y (PDF) – Richard P. Gabriel; Scheme
- Fixed-point combinators in JavaScript: Memoizing recursive functions – JavaScript
- Y-Combinator in PHP – PHP
- The PHP 5.3 Y-Combinator – PHP