Wednesday, January 29, 2020

The Case Against PowerPoint Presentations

This post has been migrated to a new blog! Read this and more here.

I am going to try digital blackboard presentations for a bit to see how that goes.


Excerpts from:
Information retention from PowerPoint and traditional lectures

"...use in university lectures has influenced investigations of PowerPoint’s effects on student performance (e.g., overall quiz/exam scores) in comparison to lectures based on overhead projectors, traditional lectures (e.g., “chalk-and-talk”)..."
"Students retained 15% less information delivered verbally by the lecturer during PowerPoint presentations, but they preferred PowerPoint presentations over traditional presentations."
Does a High Tech (Computerized, Animated, PowerPoint) Presentation Increase Retention of Material Compared to a Low Tech (Black on Clear Overheads) Presentation?

"The purpose was to determine if differences in (a) subjective evaluation; (b) short-term retention of material; and (c) long-term retention of material occurred with the use of static overheads versus computerized, animated PowerPoint for a presentation to medical students."
"There were no significant differences between the groups on any parameter. Conclusions: In this study, students rated both types of presentation equally and displayed no differences in short- or long-term retention of material."

Monday, January 27, 2020

Deep Blue to AlphaGo - What are the key changes?


20 years after Deep Blue defeated the World Champion at Chess, AlphaGo did the same to the World Champion at Go. What are the key changes since then?

Deep Blue

"The system derived its playing strength mainly from brute force computing power. It was a massively parallel, RS/6000 SP Thin P2SC-based system with 30 nodes, with each node containing a 120 MHz P2SC microprocessor, enhanced with 480 special purpose VLSI chess chips."

To be fair, it's not just "brute force computing": it runs alpha-beta pruning with some neat heuristics programmed by the team - "Deep Blue employed custom VLSI chips to execute the alpha-beta search algorithm in parallel, an example of GOFAI (Good Old-Fashioned Artificial Intelligence) rather than of deep learning which would come a decade later. It was a brute force approach, and one of its developers even denied that it was artificial intelligence at all"
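To get a feel for the algorithm itself, here is a minimal sketch of alpha-beta pruning over a toy game tree. The tree, evaluation, and depth here are made up for illustration; Deep Blue's real search ran on custom VLSI chips with a far richer evaluation.

```python
def alpha_beta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Minimax with alpha-beta pruning over a generic game tree.

    `evaluate` and `children` are caller-supplied functions; this is just
    the core algorithm, not Deep Blue's implementation.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                          False, evaluate, children))
            alpha = max(alpha, value)
            if alpha >= beta:   # opponent will never allow this branch: prune
                break
        return value
    else:
        value = float("inf")
        for child in kids:
            value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                          True, evaluate, children))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Toy tree: leaves are scores, internal nodes are lists of children.
tree = [[3, 5], [6, [9, 1]], [1, 2]]
children = lambda n: n if isinstance(n, list) else []
evaluate = lambda n: n
best = alpha_beta(tree, 4, float("-inf"), float("inf"), True, evaluate, children)
```

The pruning never changes the answer - it only skips branches that a perfect opponent would never let the game reach, which is what made deep search affordable.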

Here's the ground-breaking (back in the day) paper for Deep Blue -

Excerpts from:

"Humans have been studying chess openings for centuries and developed their own favorite [moves]. The grand masters helped us choose a bunch of those to program into Deep Blue."

"Deep Blue's evaluation function was initially written in a generalized form, with many to-be-determined parameters (e.g. how important is a safe king position compared to a space advantage in the center, etc.). The optimal values for these parameters were then determined by the system itself, by analyzing thousands of master games. The evaluation function had been split into 8,000 parts, many of them designed for special positions. In the opening book there were over 4,000 positions and 700,000 grandmaster games. The endgame database contained many six-piece endgames and five or fewer piece positions. Before the second match, the chess knowledge of the program was fine-tuned by grandmaster Joel Benjamin. The opening library was provided by grandmasters Miguel Illescas, John Fedorowicz, and Nick de Firmian." -
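The "generalized form with to-be-determined parameters" can be pictured as a weighted sum of hand-designed features. The feature names and weights below are entirely hypothetical stand-ins; Deep Blue's real evaluation function had around 8,000 parts.

```python
# Toy illustration of a parameterized evaluation function.
# Feature names and weights are hypothetical, for illustration only.
FEATURES = ["material", "king_safety", "center_space"]

def evaluate(position, weights):
    """Linear combination of hand-designed features.

    `position` maps feature name -> raw score from the side to move;
    the weights are the "to-be-determined parameters" that the team
    tuned by analyzing thousands of master games.
    """
    return sum(weights[f] * position[f] for f in FEATURES)

weights = {"material": 1.0, "king_safety": 0.6, "center_space": 0.3}
position = {"material": 2, "king_safety": -1, "center_space": 3}
score = evaluate(position, weights)  # 2*1.0 - 1*0.6 + 3*0.3
```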

Answer to: How did Deep Blue advance from 1996 to 1997 in order to beat Kasparov? - "We did a couple of things. We more or less doubled the speed of the system by creating a new generation of hardware. And then we increased the chess knowledge of the system by adding features to the chess chip that enabled it to recognize different positions and made it more aware of chess concepts. Those chips could then search through a tree of possibilities to figure out the best move in a position. Part of the improvement between ‘96 and ‘97 is we detected more patterns in a chess position and could put values on them and therefore evaluate chess positions more accurately. The 1997 version of Deep Blue searched between 100 million and 200 million positions per second, depending on the type of position. The system could search to a depth of between six and eight pairs of moves—one white, one black—to a maximum of 20 or even more pairs in some situations."

AlphaGo Zero

The paper that the cheat sheet is based on was published in Nature and is available here

Some key assertions of the paper:

"Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules."

"Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo."

"Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) =fθ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass)"
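The shape of f_theta can be sketched with a tiny two-headed network: one shared trunk, a policy head producing move probabilities (including pass), and a value head. The plain NumPy MLP and layer sizes below are stand-ins; the real network is a deep residual convolutional net over stacked 19x19 board planes.

```python
import numpy as np

# Sketch of AlphaGo Zero's two-headed network f_theta(s) -> (p, v).
rng = np.random.default_rng(0)
BOARD_FEATURES, HIDDEN, MOVES = 17 * 19 * 19, 64, 19 * 19 + 1  # +1 for pass

W1 = rng.normal(0, 0.01, (HIDDEN, BOARD_FEATURES))  # shared trunk
Wp = rng.normal(0, 0.01, (MOVES, HIDDEN))           # policy head
Wv = rng.normal(0, 0.01, (1, HIDDEN))               # value head

def f_theta(s):
    h = np.tanh(W1 @ s)
    logits = Wp @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                        # move probabilities (incl. pass)
    v = float(np.tanh(Wv @ h)[0])       # predicted outcome in [-1, 1]
    return p, v

s = rng.normal(size=BOARD_FEATURES)     # fake raw board representation
p, v = f_theta(s)
```

The key point is the single set of parameters theta serving both heads: the same representation that suggests moves also judges positions.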

"Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte Carlo rollouts. To achieve these results, we introduce a new reinforcement learning algorithm that incorporates lookahead search inside the training loop, resulting in rapid improve­ment and precise and stable learning."
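The "simpler tree search" picks moves with a PUCT-style rule: exploit moves with good mean value Q, but give an exploration bonus driven by the network's prior P and the visit counts. The numbers and c_puct value below are illustrative, not the paper's tuned settings.

```python
import math

# Sketch of PUCT selection in AlphaGo Zero's search: no rollouts, just
# statistics (N visits, Q mean value, P network prior) per child move.
def puct_select(children, c_puct=1.5):
    total_n = sum(ch["N"] for ch in children)

    def score(ch):
        u = c_puct * ch["P"] * math.sqrt(total_n) / (1 + ch["N"])
        return ch["Q"] + u   # exploitation + exploration bonus

    return max(range(len(children)), key=lambda i: score(children[i]))

children = [
    {"N": 10, "Q": 0.2, "P": 0.5},  # well explored, decent value
    {"N": 0,  "Q": 0.0, "P": 0.4},  # unvisited, high prior: big bonus
    {"N": 5,  "Q": 0.1, "P": 0.1},
]
best = puct_select(children)
```

Because unvisited moves with a high prior get a large bonus, the search is steered by the network first and corrected by actual visit statistics as they accumulate.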


"DeepMind's AlphaZero replaces the simulation step with an evaluation based on a neural network." -

Effectively, rather than scoring positions with hand-crafted heuristics (i.e. distilled human gameplay experience), AlphaGo encapsulates its playing "experience" in the neural network. In other words, AlphaGo learns its own evaluation function.

The neural network:
  • Intuitively predicts the next best move based on the state of the game board.
  • Learns that intuition by playing many games against itself, without human intervention.
  • Reduces the search from ~200 million positions per second for an average of 170 seconds per move (roughly 34 billion positions per move) to about 1,600 evaluations in ~0.4 seconds.
AlphaGo Zero took only a few days to learn its "heuristic" function tabula rasa, in contrast to Deep Blue, which relied on a database of grandmaster chess games accumulated over centuries of human play.


Deep Blue versus Garry Kasparov - Game 6 Log as released by IBM: 

Additional Reads

Tuesday, December 18, 2018

OpenAI Lunar Lander - Solving with Vanilla DQN (aka Reinforcement Learning with Experience Replay)


It's alive!!!

The artificial brain figures out how to land consistently on target after the equivalent of 52 human hours of trying:

So what's the big deal?

I know it looks like a simple game, but trust me, it's not that simple.

Here's why:
  • It simulates real-ish physics - meaning there is gravity, momentum, friction, and the landing legs even have spring in them!
  • The engines do not fire in exactly the same direction every time; look closely and you'll notice the particles shoot out at randomly varying angles.
  • The artificial brain knows nothing about gravity or what the ground means. In fact, it does not even know that it's trying to land something!
  • All we give the brain is a score to work with. If it gets closer to the landing target, the score goes up; if it drifts further away, the score goes down.
  • The score also drops whenever an engine is fired - apparently we want the brain to be eco-friendly and not waste fuel.
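The scoring rules above can be sketched as a reward function. This is a simplified stand-in, not Gym's actual LunarLander code (the real shaping also accounts for velocity, tilt angle, and leg contact), though the fuel penalties shown match the environment's per-frame values.

```python
# Simplified reward in the spirit of the description above (hypothetical
# distance-based shaping; real LunarLander-v2 shaping is more detailed).
def reward(dist_before, dist_after, main_fired, side_fired, crashed, landed):
    r = 100.0 * (dist_before - dist_after)  # closer to the pad -> positive
    if main_fired:
        r -= 0.3                            # fuel penalty, main engine
    if side_fired:
        r -= 0.03                           # smaller penalty, side engines
    if crashed:
        r -= 100.0                          # hit the ground on anything but legs
    if landed:
        r += 100.0
    return r

# Moved 0.1 closer while burning the main engine for one frame.
r = reward(dist_before=0.5, dist_after=0.4, main_fired=True,
           side_fired=False, crashed=False, landed=False)
```

Notice the brain never sees "land here" as an instruction - only this single number, frame after frame.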

In the beginning...

It starts by just randomly choosing between:
  1. Fire Main Engine (the one below the lander)
  2. Fire Left Engine (to rotate clockwise and nudge it a little in the opposite direction)
  3. Fire Right Engine (does the opposite of Left Engine)
  4. Do nothing
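The two ingredients behind "Vanilla DQN with Experience Replay" can be sketched as follows: epsilon-greedy exploration over those four actions (epsilon starts near 1.0, hence the random flailing), and a replay buffer that stores transitions so training can sample them in decorrelated batches. The action names and buffer size here are illustrative.

```python
import random
from collections import deque

ACTIONS = ["noop", "fire_left", "fire_main", "fire_right"]  # illustrative names

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: early on epsilon ~ 1.0, so the agent acts at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_values[a])

buffer = deque(maxlen=100_000)  # replay memory of past transitions

def remember(state, action, reward, next_state, done):
    buffer.append((state, action, reward, next_state, done))

def sample_batch(batch_size=64):
    """Random minibatch: breaks the correlation between consecutive frames."""
    return random.sample(buffer, min(batch_size, len(buffer)))

# With epsilon=0 the choice is purely greedy: highest Q-value wins.
a = choose_action([0.1, -0.2, 0.9, 0.0], epsilon=0.0)
```

Decaying epsilon over training is what turns the early random flailing into the deliberate piloting seen later in the post.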
Unsurprisingly the brain pilots the lander like a pro at this stage:

It copes with diverse situations

Every time the game starts, the lander gets tossed in a random direction with varying force. As a result, it learns to cope with less-than-ideal situations - like getting tossed hard to the left at the start:

Learns to stay alive

When it moves out of the screen or hits the ground on anything but its legs, the score goes down by a hundred. Hence, very quickly (after just over 1 human hour of trying) it learns to stay alive by simply hovering and keeping away from the edges:

Finding Terra Firma

At around 11 human hours it starts to figure out that the ground is kind of nice, but it drifts off to the right in the process (notice also that its piloting skills are a little shaky at this point):