Discovering Ada’s Bernoulli Numbers, Part 1
Have you ever wondered what the first computer program, designed by Ada Lovelace, was supposed to do? Hint: it didn’t print out “Hello, World!”. Ada was too hardcore for that. It printed out Bernoulli numbers. What the hell are Bernoulli numbers and why did Ada choose them as the subject for the first algorithm? I asked myself the same question:
A friend bought me the book (hence this blogpost) and I was surprised to find it was a blank notebook! I thought it would be a story about the gory details of the first algorithm. So looks like I’m going to have to write it myself. Starting with:
What the hell are Bernoulli numbers?
If you ask Wikipedia, it says it is this sequence of numbers:
What the hell. That’s the weirdest looking sequence of numbers I’ve ever seen. How can it jump to 0 and negative and positive and weirdo fractions. And why is it + or - 1/2 at the beginning?
Whenever I’m confused by something, I look up the history until it’s simple enough that I can comprehend it. Lo and behold, Jakob Bernoulli discovered these numbers when he was studying sums of powers.
Sums of powers? What are they? It’s these guys (where i goes from 1 to n):
The first line (of first powers) is the famous arithmetic progression, for which Gauss found the formula n(n+1)/2 when he was in primary school. There are formulas for all these sums:
But is there a formula for the general case, you ask? There are many — I even came up with one myself:
It’s pretty complicated and requires a fair bit of algebraic manipulation. It’s recursive (it relies on knowing the formula for the powers below it). Bernoulli came up with a non-recursive formula, which is pretty much the one in use today. From the article by the MAA:
Bernoulli derived symbolic formulas for the sums of positive integer powers using the method conjectured by Fermat, then noted a pattern that would make computation of these formulas much simpler
The method conjectured by Fermat makes use of the fact that every number in Pascal’s triangle is the sum of numbers diagonally up from it (the Hockey Stick rule), eg. 20 is the sum of the pink numbers diagonally above it.
Numbers in Pascal’s triangles have a combinatorial formula eg. 20 = (6 choose 3) = 6*5*4/3*2*1. When we combine these two facts we get the following rule for the aqua numbers:
Which can be rearranged to get a formula for the sum of 2nd powers:
Fermat’s method is simpler than my one, and as Bernoulli said “Thus we can step by step reach higher and higher powers”.
But what are the Bernoulli numbers?
I had been studying sums of powers, so as an exercise I tried to discover the Bernoulli numbers myself. I held off reading the last post in the MAA series (which doesn’t say exactly how he discovered them anyway), and only allowed myself the hint that he found them whilst applying Fermat’s method for sums of powers.
I spent a long time manipulating Fermat’s formulas, and whilst I discovered Stirling numbers (and came up with a nice formula for them), I didn’t see any Bernoulli numbers. I almost gave up, but I didn’t. I decided to try generating the formulas for each of the powers to see if I could recognise a pattern. Instead of using Fermat’s method, which involves a fair bit of algebra, I used the method on brilliant.org which is similar to Pascal’s method:
Take the difference of binomial expressions in the power above the one you’re interested in getting the formula for:
Apply the summation operator to both sides:
The left-hand side collapses down to n³ and the sum of 1 n times is n so
That’s a simpler formula to expand given we know the sum(i) thing is n(n+1)/2. And it’s easy to keep going for higher powers.
And when I started expanding these out, I noticed a recursive pattern. Let’s do it for the sum of 4th powers, given we have worked out the formulas for the sums of powers below it
We get this complicated looking thing, but it has a nice recursive pattern
If k is the power we want a formula for, the coefficient of the first term with the highest power is always 1/k+1. Let’s look at the last example for k=4. The first term is 1/5*n⁵. The next highest term only appears once, in the first bracket, so it’s coefficient is (5 choose 2)*(its coefficient in the previous sum, which is 1/4). The third highest term appears twice, so it’s (5 choose 2)*1/3 minus (5 choose 2)*(1/3 — which is the coefficient it had in the previous-previous sum). And the last so on until you get to the last power, which is always n. I probably lost you with all this text, but basically this recursive definition of previous sums and alternating combinatoric numbers means I could generate a table of these coefficients mechanically which represents the formulas of sums of powers. I started with 1 and calculated each row from the previous rows: each number was the result of an alternating sum involving the numbers in the diagonal, starting with the one nearest to it. The first number is always 1/k+1. Eg in the fourth row, the second number is 1/4*(6*1/3) = 1/2. The third number is 1/4(6*1/2) minus 1/4*(4*1/2) = 1/4. And the last number is 1/4*(6*1/6) minus 1/4*(4*1/2) plus 1/4*(1*1) = 0.
The Bernoulli numbers!! Yaaaaaas. I was pretty excited. Turns out the Bernoulli numbers are the coefficients of the last terms in the sums of powers formulas (which are polynomials). So the last line stands for the coefficients of the polynomial which is the formula for the sum of 6th powers:
Now we have a totally mechanical way to generate these formulas. I noticed some interesting properties. For example, the alternating sum of each row equals zero.
At some point, as I was calculating out by hand the 6th or 7th row, I noticed something else peculiar — that the Bernoulli numbers were showing up in the complicated sums I was doing to generate the numbers of each row. That’s when I realised I could just use the Bernoulli numbers (or the numbers in the outermost diagonal) as a shortcut to generate each row. Here’s that method for generating the 7th row:
Calculate the first 7 Bernoulli numbers by doing this elaborate alternating sum involving the combinatorial numbers and the previous Bernoulli numbers and dividing by 7:
Take the 7th row of Pascal’s Triangle:
and multiply them together and divide by 7:
E prego…these are the coefficients of the terms that you can add to get the sums of 6th powers. The way to get each coefficient [ n k ] where n is the power of interest and k is the position of the term (starting from 0) is
and you can sum these to get the sums of powers formula. Here is that formula from the Wikipedia page:
where the B numbers are the 0th, 1st, 2nd etc Bernoulli numbers. My formula for the nth Bernoulli number (which isn’t on the Wikipedia page) is
The shortcut I found using the Bernoulli numbers for generating the coefficients was just a coincidence unless I could prove they were equivalent. So I proved it.
So that was a huge friggin’ rabbit hole I went on to understand the subject matter of the first computer program. It has opened up so many more rabbit holes. There are so many more facts about Bernoulli numbers, and ultimately, I want to understand how Ada computed them and why she chose them for the first program. This quote, from a great write-up by Stephen Wolphram celebrating her 200th birthday, gives some hints
What’s become the most famous part of what Ada wrote is the computation of Bernoulli numbers, in Note G. This seems to have come out of a letter she wrote to Babbage, in July 1843. She begins the letter with “I am working very hard for you; like the Devil in fact; (which perhaps I am)”. Then she asks for some specific references, and finally ends with, “I want to put in something about Bernoulli’s Numbers, in one of my Notes, as an example of how an implicit function may be worked out by the engine, without having been worked out by human head & hands first…. Give me the necessary data & formulae.”
My guess is that the Bernoulli numbers were appealing because it’s not an explicit y = f(x) type calculation, and it’s more complicated than Fibonacci numbers because you need all previous Bernoulli numbers to generate the next one, that is, you need to hold a fair bit of state.
The next part of this series will explain what the weird formula with e in it on the front of my notebook means. I know it is a generating function but I don’t know what generating functions are yet.
If you’re curious about Bernoulli numbers, you’re in good company. I was happy to find that alongside Lovelace, both Conway and Knuth —also super famous programmers — have concerned themselves with this subject.
Here’s a youtube series Conway did on this last year
Here’s a textbook Knuth wrote where he explains how to derive Bernoulli numbers. (I haven’t read it yet — I’m excited! The pdf is free on a university site — google it).
Here’s a more advanced paper Knuth wrote about sums of powers (also a free pdf, I really can’t understand this one yet.)
And Laura Elizabeth S. Coen wrote her whole thesis on the history and use of Bernoulli numbers. I also plan to read that.
Meet you on the other side, Ada.
Oh also don’t forget that Europe is not the centre of truth and knowledge— Seki Kōwa in Japan independently discovered said numbers a year earlier than Bernoulli.
Happy Ada Lovelace Day from Melbourne, Australia, and thanks to @kaffeecoder for commissioning this post!