HAYCORN — 1 January 2011

Understanding the Y Combinator by writing about it

Mo­ti­va­tion

This post is written for those who, like me, have tried and failed to un­der­stand the Y Com­bi­na­tor despite the help of Wikipedia, blog posts con­struct­ing the Y Com­bi­na­tor in mul­ti­ple languages, stories on Hacker News, com­ments on Hacker News… I think I mostly get it now (this comes cour­tesy of a night in Helsinki where I was up at a strange time due to the fallout from a Helsinki-Stockholm-Helsinki booze cruise)—and so perhaps these notes will help someone else. If nothing else, writing them up helped me! (Maybe the only way to un­der­stand it is to write a blog post about it.)

I think a large part of the problem un­der­stand­ing the Y Com­bi­na­tor is that the various bits of in­for­ma­tion avail­able online are written from several dif­fer­ent perspectives, which means that they em­pha­sise dif­fer­ent aspects of the Y Com­bi­na­tor story, to wit:

  • The prop­er­ties of the Y Com­bi­na­tor func­tion itself. Technically, the Y Com­bi­na­tor is one example of a fixed point combinator, meaning that it’s a higher order func­tion with a par­tic­u­lar prop­erty (to be de­scribed later). Sadly it seems this prop­erty is very hard to grok unless you’ve re­cently spent some time with lambda calculus (or at least it was for me), so dis­cus­sion is usually mo­ti­vated by one par­tic­u­lar ap­pli­ca­tion of the Y Combinator, which is that…
  • The Y Com­bi­na­tor allows you to write “recursive” func­tions without ex­plicit recursion. (“Without ex­plicit recursion” means “recursion” without ex­plic­itly calling a func­tion named foo in the body of a func­tion that itself is called foo.) Thus, dis­cus­sion of the Y Com­bi­na­tor is tied up with anony­mous (i.e. unnamed) functions. (Note that the Y Combinator-powered version of a func­tion will almost cer­tainly not be any faster than the re­cur­sive version, and it’s going to need just as much stack space.) Finally, Y Com­bi­na­tor posts and ar­ti­cles often have an ex­tended section on…
  • The im­ple­men­ta­tion of the Y Com­bi­na­tor in some language. The Y Com­bi­na­tor is an im­por­tant part of (theoretical) com­puter science, and its im­ple­men­ta­tion typ­i­cally stretches a language’s support for first class functions—it’s often in­struc­tive to compare how el­e­gantly (or not) it can be im­ple­mented in various languages. (A few examples: JavaScript, Perl, PHP.)

De­riv­ing the Y Com­bi­na­tor

Step 1

As I men­tioned above, dis­cus­sion of the Y Com­bi­na­tor is typ­i­cally mo­ti­vated by a par­tic­u­lar problem: how to write “recursive” func­tions without ex­plicit recursion. I don’t know any better way of going about it, so let’s first con­sider that stan­dard ex­em­plar of recursion, the fac­to­r­ial function, and trans­form it into a version that uses the Y Combinator. The stan­dard implementation:

var fact = func­tion (n) {
 if (n < 2) return 1;
 return n * fact(n - 1);
};

(I’m using JavaScript in these ex­am­ples because its widely known and has good support for first class functions.)

Step 2

How might you go about re­mov­ing the self-referential call to fact() within the func­tion body? (The exact reasons for wanting to do this are not important, but for example suppose that you want to inline the entire call.) One obvious way is to use a keyword pro­vided by the lan­guage that refers to the func­tion itself such as JavaScript’s arguments.callee, but, well, this is cheating and we can do better!

Step 3

Our first attempt is to trans­form the fact() func­tion slightly, so that fact becomes an argument to a wrapper function, instead of a free vari­able re­fer­ring to the fac­to­r­ial function. We can then use this ar­gu­ment in the “recursive” call:

var fac­t_wrap­per1 = func­tion (fact) {
 return func­tion (n) {
 if (n < 2) return 1;
 return n * fact(n - 1);
 };
};

At this point, the func­tion con­tains no ref­er­ences to vari­ables de­clared outside the scope of the func­tion itself—though overall we haven’t much progress because we also don’t have a working fact() function… If only there were some way to call fact_wrapper1() passing it its own return value as an argument! If we could get this to work, the return value of fact_wrapper1() would be the fac­to­r­ial func­tion we want.

Step 4

Okay, so that attempt didn’t quite work out; let’s try a slightly dif­fer­ent approach. Perhaps it’s pos­si­ble to modify fact_wrapper1() so that it can be passed as an ar­gu­ment to itself. (This cur­rently isn’t pos­si­ble because fact_wrapper1()’s ar­gu­ment (fact) is a simple func­tion that takes an integer ar­gu­ment and returns an integer; by con­trast fact_wrapper1() itself is a higher-order function that re­ceives a function as an argument, re­turn­ing a simple func­tion that takes an integer ar­gu­ment and re­turn­ing an integer.) If we can pass the func­tion to itself, we elim­i­nate the need to figure out how to extract just its return value. fact_wrapper1() can be trans­formed into such a func­tion as follows:

var fac­t_wrap­per2 = func­tion (f) {
 return func­tion (n) {
 if (n < 2) return 1;
 return n * f(f)(n - 1);
 };
};

Note that the only real change is that f(f)(n - 1) re­places fact(n - 1). This is because f(f) returns a function, which we then apply to n - 1.

Step 5

Let’s just check to see if fact_wrapper2 can really be used as an ar­gu­ment to itself:

> var fact = fact_wrapper2(fact_wrapper2);
> fact(5);
120

Okay, that works: so far, so good! (You can copy-and-paste all these ex­am­ples into a NodeJS session.)

We can also make fact_wrapper2() look a bit more like fact_wrapper1() by moving the weird f(f) call into a helper function:

var fac­t_wrap­per2 = func­tion (f) {
 var g = func­tion (n) {
 return f(f)(n);
 };
 return func­tion (n) {
 if (n < 2) return 1;
 return n * g(n - 1);
 };
};

Step 6

By adding another helper function, we can avoid the need for fact_wrapper2 to appear twice, and if we inline its definition, we can avoid the need to give the wrapper func­tion a name at all:

var recur = func­tion (f) {
 return f(f);
};

var fact = recur(function (f) {
 var g = func­tion (n) {
 return f(f)(n);
 };
 return func­tion (n) {
 if (n < 2) return 1;
 return n * g(n - 1);
 };
});

We still get the same result:

> fact(5);
120

Step 7

Next, let’s trans­form Step 6 by elim­i­nat­ing the local vari­able g, instead passing it in as an ar­gu­ment of a new inlined function:

var recur = func­tion (f) {
 return f(f);
};

var fact = recur(function (f) {
 return (function (g) {
 return func­tion (n) { // un­changed from above
 if (n < 2) return 1;
 return n * g(n - 1);
 }
 })(function (n) {
 return f(f)(n);
 });
});

Step 8

This lets us move the fac­to­r­ial func­tion itself out of the helper code:

var fac­t_­core = func­tion (g) {
 return function(n) {
 if (n < 2) return 1;
 return n * g(n - 1);
 }
};

var recur = func­tion (f) {
 return f(f);
};

var fact = recur(function (f) {
 return fact_core(function (n) {
 return f(f)(n);
 });
});

Step 9

Almost there! At this point we can also inline recur() itself:

var fac­t_­core = func­tion (g) { // same as above
 return function(n) {
 if (n < 2) return 1;
 return n * g(n - 1);
 }
};

var fact = (function (g) { // the inlined recur() func­tion
 return g(g);
})(function (f) { // same as above; now an ar­gu­ment to the inlined recur()
 return fact_core(function (n) {
 return f(f)(n);
 });
});

Step 10

Finally, we can turn the helper code into a stand­alone func­tion (i.e. elim­i­nat­ing the ref­er­ence to fact_core in the body of Step 9’s fact by con­vert­ing it into an argument), which we’ll call Y():

var Y = func­tion (h) {
 return (function (f) {
 return f(f);
 })(function (f) {
 return h(function (n) {
 return f(f)(n);
 });
 });
};

var fac­t_­core = func­tion (g) { // same as above
 return function(n) {
 if (n < 2) return 1;
 return n * g(n - 1);
 }
};

var fact = Y(fact_core);

Hurrah! We have finally achieved Y Combinator:

> fact(5)
120

Notes and Ob­ser­va­tions

Neither the func­tion Y() nor fact_core() are self-referential—these names do not appear any­where in the code. A lot of other func­tions are involved, but since they’re all anonymous, they can’t refer to them­selves either. Hence, the Y Com­bi­na­tor can be used to im­ple­ment self-referential (i.e. recursive) pro­grams in lan­guages that don’t oth­er­wise support it.

The func­tion fact() is tech­ni­cally the “fixed point” of the func­tion fact_core() and the Y Com­bi­na­tor is a func­tion for finding such fixed points. The fixed point p of a func­tion f is a func­tion that sat­is­fies the con­di­tion f(p) = p. What this ac­tu­ally means is some­what hard to get one’s head around (see the Wikipedia page for more), but it’s easy to check that fact_core() and fact() do have these properties:

> fact_core(fact)(5);
120
> fact(5)
120

This means that the fol­low­ing ex­pres­sions are all (mathematically) equiv­a­lent to the func­tion fact:

  • Y(fact_core)
  • fact_core(Y(fact_core))
  • fact_core(fact)
  • fact_core(fact_core(fact_core(Y(fact_core))))

Ref­er­ences