Understanding the Y Combinator by writing about it

Motivation

This post is written for those who, like me, have tried and failed to understand the Y Combinator despite the help of Wikipedia, blog posts constructing the Y Combinator in multiple languages, stories on Hacker News, comments on Hacker News… I think I mostly get it now (this comes courtesy of a night in Helsinki where I was up at a strange time due to the fallout from a Helsinki-Stockholm-Helsinki booze cruise)—and so perhaps these notes will help someone else. If nothing else, writing them up helped me! (Maybe the only way to understand it is to write a blog post about it.)

I think a large part of the problem understanding the Y Combinator is that the various bits of information available online are written from several different perspectives, which means that they emphasise different aspects of the Y Combinator story, to wit:

  • The properties of the Y Combinator function itself. Technically, the Y Combinator is one example of a fixed point combinator, meaning that it’s a higher order function with a particular property (to be described later). Sadly it seems this property is very hard to grok unless you’ve recently spent some time with lambda calculus (or at least it was for me), so discussion is usually motivated by one particular application of the Y Combinator, which is that…
  • The Y Combinator allows you to write “recursive” functions without explicit recursion. (“Without explicit recursion” means “recursion” without explicitly calling a function named foo in the body of a function that itself is called foo.) Thus, discussion of the Y Combinator is tied up with anonymous (i.e. unnamed) functions. (Note that the Y Combinator-powered version of a function will almost certainly not be any faster than the recursive version, and it’s going to need just as much stack space.) Finally, Y Combinator posts and articles often have an extended section on…
  • The implementation of the Y Combinator in some language. The Y Combinator is an important part of (theoretical) computer science, and its implementation typically stretches a language’s support for first class functions—it’s often instructive to compare how elegantly (or not) it can be implemented in various languages. (A few examples: JavaScript, Perl, PHP.)

Deriving the Y Combinator

Step 1

As I mentioned above, discussion of the Y Combinator is typically motivated by a particular problem: how to write “recursive” functions without explicit recursion. I don’t know any better way of going about it, so let’s first consider that standard exemplar of recursion, the factorial function, and transform it into a version that uses the Y Combinator. The standard implementation:

var fact = function (n) {
  if (n < 2) return 1;
  return n * fact(n - 1);
};

(I’m using JavaScript in these examples because its widely known and has good support for first class functions.)

Step 2

How might you go about removing the self-referential call to fact() within the function body? (The exact reasons for wanting to do this are not important, but for example suppose that you want to inline the entire call.) One obvious way is to use a keyword provided by the language that refers to the function itself such as JavaScript’s arguments.callee, but, well, this is cheating and we can do better!

Step 3

Our first attempt is to transform the fact() function slightly, so that fact becomes an argument to a wrapper function, instead of a free variable referring to the factorial function. We can then use this argument in the “recursive” call:

var fact_wrapper1 = function (fact) {
  return function (n) {
      if (n < 2) return 1;
      return n * fact(n - 1);
  };
};

At this point, the function contains no references to variables declared outside the scope of the function itself—though overall we haven’t much progress because we also don’t have a working fact() function… If only there were some way to call fact_wrapper1() passing it its own return value as an argument! If we could get this to work, the return value of fact_wrapper1() would be the factorial function we want.

Step 4

Okay, so that attempt didn’t quite work out; let’s try a slightly different approach. Perhaps it’s possible to modify fact_wrapper1() so that it can be passed as an argument to itself. (This currently isn’t possible because fact_wrapper1()’s argument (fact) is a simple function that takes an integer argument and returns an integer; by contrast fact_wrapper1() itself is a higher-order function that receives a function as an argument, returning a simple function that takes an integer argument and returning an integer.) If we can pass the function to itself, we eliminate the need to figure out how to extract just its return value. fact_wrapper1() can be transformed into such a function as follows:

var fact_wrapper2 = function (f) {
  return function (n) {
      if (n < 2) return 1;
      return n * f(f)(n - 1);
  };
};

Note that the only real change is that f(f)(n - 1) replaces fact(n - 1). This is because f(f) returns a function, which we then apply to n - 1.

Step 5

Let’s just check to see if fact_wrapper2 can really be used as an argument to itself:

> var fact = fact_wrapper2(fact_wrapper2);
> fact(5);
120

Okay, that works: so far, so good! (You can copy-and-paste all these examples into a NodeJS session.)

We can also make fact_wrapper2() look a bit more like fact_wrapper1() by moving the weird f(f) call into a helper function:

var fact_wrapper2 = function (f) {
  var g = function (n) {
      return f(f)(n);
  };
  return function (n) {
      if (n < 2) return 1;
      return n * g(n - 1);
  };
};

Step 6

By adding another helper function, we can avoid the need for fact_wrapper2 to appear twice, and if we inline its definition, we can avoid the need to give the wrapper function a name at all:

var recur = function (f) {
  return f(f);
};

var fact = recur(function (f) {
  var g = function (n) {
      return f(f)(n);
  };
  return function (n) {
      if (n < 2) return 1;
      return n * g(n - 1);
  };
});

We still get the same result:

> fact(5);
120

Step 7

Next, let’s transform Step 6 by eliminating the local variable g, instead passing it in as an argument of a new inlined function:

var recur = function (f) {
  return f(f);
};

var fact = recur(function (f) {
  return (function (g) {
      return function (n) { // unchanged from above
          if (n < 2) return 1;
          return n * g(n - 1);
      }
  })(function (n) {
      return f(f)(n);
  });
});

Step 8

This lets us move the factorial function itself out of the helper code:

var fact_core = function (g) {
  return function(n) {
      if (n < 2) return 1;
      return n * g(n - 1);
  }
};

var recur = function (f) {
  return f(f);
};

var fact = recur(function (f) {
  return fact_core(function (n) {
      return f(f)(n);
  });
});

Step 9

Almost there! At this point we can also inline recur() itself:

var fact_core = function (g) { // same as above
  return function(n) {
      if (n < 2) return 1;
      return n * g(n - 1);
  }
};

var fact = (function (g) { // the inlined recur() function
  return g(g);
})(function (f) { // same as above; now an argument to the inlined recur()
  return fact_core(function (n) {
      return f(f)(n);
  });
});

Step 10

Finally, we can turn the helper code into a standalone function (i.e. eliminating the reference to fact_core in the body of Step 9’s fact by converting it into an argument), which we’ll call Y():

var Y = function (h) {
  return (function (f) {
      return f(f);
  })(function (f) {
      return h(function (n) {
          return f(f)(n);
      });
  });
};

var fact_core = function (g) { // same as above
  return function(n) {
      if (n < 2) return 1;
      return n * g(n - 1);
  }
};

var fact = Y(fact_core);

Hurrah! We have finally achieved Y Combinator:

> fact(5)
120

Notes and Observations

Neither the function Y() nor fact_core() are self-referential—these names do not appear anywhere in the code. A lot of other functions are involved, but since they’re all anonymous, they can’t refer to themselves either. Hence, the Y Combinator can be used to implement self-referential (i.e. recursive) programs in languages that don’t otherwise support it.

The function fact() is technically the “fixed point” of the function fact_core() and the Y Combinator is a function for finding such fixed points. The fixed point p of a function f is a function that satisfies the condition f§ = p. What this actually means is somewhat hard to get one’s head around (see the Wikipedia page for more), but it’s easy to check that fact_core() and fact() do have these properties:

> fact_core(fact)(5);
120
> fact(5)
120

This means that the following expressions are all (mathematically) equivalent to the function fact:

  • Y(fact_core)
  • fact_core(Y(fact_core))
  • fact_core(fact)
  • fact_core(fact_core(fact_core(Y(fact_core))))

References