How Reversi AI Works: Minimax, Alpha-Beta, and Evaluation Functions

Understand how Reversi AI works — from minimax search and alpha-beta pruning to evaluation functions, endgame solvers, and modern neural network approaches.

Reversi AI works by searching future move sequences using a minimax algorithm, skipping branches that cannot affect the result with alpha-beta pruning, and scoring positions with an evaluation function that weights corners, mobility, stable discs, and frontier discs. In the endgame (last 20–25 moves), strong AI switches to a perfect solver that calculates the exact disc outcome for every possible line of play. For how this relates to solving the game, see Is Reversi solved?

The Core Algorithm: Minimax

What Minimax Does

Every position in Reversi has a set of legal moves. Each of those moves leads to a new position, which has its own set of moves, and so on. This creates a game tree — a branching structure of all possible futures.

Minimax navigates this tree with one assumption: both players play optimally. From any position:

  • The current player (the maximiser) picks the move leading to the best outcome for them
  • The opponent (the minimiser) picks the move leading to the worst outcome for the current player

Working backward from evaluated leaf nodes (positions at the bottom of the search), minimax assigns values to every node in the tree. The root node (the current position) ends up with a value — and the AI plays the move that leads to that value.
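The backward induction described above can be sketched on a toy game tree. The tree, leaf scores, and names below are invented purely for illustration, not taken from any real engine:

```python
# Toy game tree: internal nodes list their children; leaves hold
# evaluation scores from the maximiser's (Black's) point of view.
TREE = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
LEAF_SCORES = {"A1": 6, "A2": 9, "B1": 4, "B2": 8}

def minimax(node, maximising):
    if node in LEAF_SCORES:                # leaf: already evaluated
        return LEAF_SCORES[node]
    values = [minimax(child, not maximising) for child in TREE[node]]
    return max(values) if maximising else min(values)

# Black (maximiser) moves at the root; White (minimiser) replies.
best = minimax("root", True)
print(best)  # 6: move A guarantees min(6, 9) = 6, move B only min(4, 8) = 4
```

The value at the root is what the maximiser can guarantee, not merely hope for: the minimiser's optimal replies are already baked into it.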

A Simple Example

Imagine Black has two legal moves: A and B.

  • Move A leads to a position where Black’s best achievable score is +6 discs (a 35–29 finish)
  • Move B leads to a position where Black’s best achievable score is +4 discs (a 34–30 finish)

Minimax selects Move A. But both values were computed by also simulating White’s optimal responses — so they represent guaranteed outcomes, not just wishful thinking.

Search Depth

The AI doesn’t search the entire game tree — that would take too long. Instead, it searches to a fixed depth (number of moves ahead) and evaluates the positions at the bottom of that search using an evaluation function.

  • Easy AI: Searches 2–3 moves ahead
  • Medium AI: Searches 4–6 moves ahead
  • Hard AI: Searches 8–12+ moves ahead (plus perfect endgame solving)

The deeper the search, the more accurately the AI can predict the consequences of its moves — and the stronger it plays.

Alpha-Beta Pruning: Searching Smarter

A naive minimax implementation searches every possible branch of the game tree. Alpha-beta pruning makes the same algorithm much more efficient by skipping branches that can’t change the outcome.

How It Works

The algorithm maintains two values:

  • Alpha: The best (highest) score the maximiser (current AI) can already guarantee
  • Beta: The best (lowest) score the minimiser (opponent) can already guarantee

When the algorithm finds that a branch will result in a score worse than what’s already been found elsewhere — it stops searching that branch. The branch is “pruned.”

The result: with good move ordering, alpha-beta pruning reduces the number of nodes evaluated to roughly the square root of what naive minimax would visit. In practice, this means the AI can search roughly twice as deep in the same time.
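On a toy tree (values invented for illustration), the cutoff test adds only a few lines to plain minimax. The `seen` list records visited nodes so the pruned branch is observable:

```python
# Toy game tree: internal nodes list their children; leaves hold scores.
TREE = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
LEAF_SCORES = {"A1": 6, "A2": 9, "B1": 4, "B2": 8}

def alphabeta(node, maximising, alpha, beta, seen):
    seen.append(node)
    if node in LEAF_SCORES:
        return LEAF_SCORES[node]
    value = float("-inf") if maximising else float("inf")
    for child in TREE[node]:
        child_value = alphabeta(child, not maximising, alpha, beta, seen)
        if maximising:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:      # this branch can no longer affect the root
            break              # prune the remaining children
    return value

seen = []
best = alphabeta("root", True, float("-inf"), float("inf"), seen)
print(best, "B2" in seen)  # 6 False: same answer, one leaf never searched
```

After branch A establishes alpha = 6, the first reply under B (worth 4) proves B can never beat A, so its remaining leaf is skipped entirely.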

Move Ordering

Alpha-beta pruning is most effective when the best moves are searched first. If the AI examines promising moves early, it establishes tight alpha and beta bounds quickly, allowing more pruning later.

Strong Reversi AI uses move ordering heuristics — rules that estimate which moves are likely to be best and search them first. Common heuristics:

  • Search corner captures first
  • Search moves that avoid X-squares next
  • Order remaining moves by estimated position value

Good move ordering can make alpha-beta pruning cut 80–90% of the search tree, enabling much deeper searches.
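As a minimal sketch, the heuristics above can be expressed as a sort key. The coordinates and priority values here are assumptions for an 8×8 board, not taken from any particular engine:

```python
# Hypothetical move-ordering key: corners first, X-squares last,
# everything else in between. Moves are (row, col), 0-indexed.
CORNERS = {(0, 0), (0, 7), (7, 0), (7, 7)}
X_SQUARES = {(1, 1), (1, 6), (6, 1), (6, 6)}

def ordering_key(move):
    if move in CORNERS:
        return 0          # search corner captures first
    if move in X_SQUARES:
        return 2          # search X-square moves last
    return 1              # other moves in between

moves = [(1, 1), (3, 2), (0, 0), (5, 4)]
print(sorted(moves, key=ordering_key))
# [(0, 0), (3, 2), (5, 4), (1, 1)]
```

A real engine would refine the middle tier with static square weights or shallow-search scores, but even this coarse ordering tightens the alpha and beta bounds early.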

The Evaluation Function

When the AI reaches the bottom of its search tree (the depth limit), it must score the position without searching any further. This is the evaluation function — the AI’s model of what a position is worth.

Key Components

1. Corner control (highest weight) Corners receive extreme positive weight when taken, and extreme negative weight when given to the opponent. A single corner can outweigh almost any other positional factor. See the board values guide for how human players use the same square-weighting logic.

2. Mobility score The difference between the number of moves available to each player. High relative mobility is strongly positive — it means the opponent has fewer options.

3. Stable disc count Discs that can never be flipped (corners, full edge rows, etc.) receive strong positive weight. Stable discs guarantee those points in the final count.

4. Frontier disc count Discs adjacent to empty squares are vulnerable targets. Fewer frontier discs is better — the evaluation function penalises high frontier counts.

5. Square position weights Each square has a static weight (corners ~+20, X-squares ~−6, edges ~+2, interior ~0). These weights are added based on which squares each player occupies.

6. Disc count parity (endgame only) Disc count is weighted near zero in the early game and becomes the primary factor in the endgame.
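A minimal linear evaluation combining four of these components might look like the sketch below. The weight values are illustrative assumptions, not tuned numbers from any real program:

```python
# Illustrative weights; real engines tune these by hand or self-play.
WEIGHTS = {
    "corners": 25.0,     # corner differential (mine minus opponent's)
    "mobility": 5.0,     # legal-move differential
    "stability": 10.0,   # stable-disc differential
    "frontier": -7.0,    # frontier-disc differential (fewer is better)
}

def evaluate(features):
    """Linear evaluation: a weighted sum of feature differentials,
    from the side to move's point of view."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

# One corner up, two moves behind on mobility, otherwise even:
score = evaluate({"corners": 1, "mobility": -2, "stability": 0, "frontier": 0})
print(score)  # 15.0
```

Note how the corner term dominates: being a corner up outweighs a noticeable mobility deficit, matching the weighting described above.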

Weight Tuning

Different Reversi AI programs use different evaluation function weights. Weights can be:

  • Hand-tuned: Human experts estimate which features matter most
  • Self-play trained: The AI plays millions of games against itself, adjusting weights based on outcomes
  • Learned from databases: Weights derived from analysing thousands of high-level human games

The famous Logistello program (Michael Buro, 1990s) pioneered pattern-based evaluation features with weights fitted automatically from millions of game positions, a major advance over hand-tuned functions.

Endgame Perfect Solving

The most important distinction between strong Reversi AI and weak AI is endgame solving: the ability to calculate the exact final disc count for every possible move sequence.

When Endgame Solving Activates

When a certain number of empty squares remain (typically 20–25, depending on the program’s speed), the AI switches from heuristic evaluation to exact computation. It searches every possible sequence of remaining moves and finds the one that maximises its final disc count.

This is possible because:

  • With 20 empty squares, the full move tree is enormous (up to roughly 20! ≈ 2×10^18 sequences), but pruning, move ordering, and transposition tables cut the positions actually examined to a tractable number
  • The evaluation is exact (count the discs) rather than approximate
  • Move ordering and pruning techniques work especially well in the endgame because strong moves are more predictable

The result: strong Reversi AI plays the endgame perfectly. If you’re losing in the final 20 moves to hard AI, it is not making mistakes — every response is optimal.
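An exact endgame search can be sketched in negamax form, where every value is expressed from the side to move's perspective and negated as it passes up the tree. The toy tree and final scores below are invented for illustration:

```python
# Toy endgame tree: leaves hold exact final disc differentials, each
# from the perspective of the side whose turn it is at that node.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
FINAL = {"a1": 6, "a2": -4, "b1": 2, "b2": 8}

def negamax(node):
    """Exact endgame search: no evaluation function, just the real
    disc count at the end of every line. Each child's value is from
    the opponent's perspective, so it is negated on the way up."""
    if node in FINAL:
        return FINAL[node]
    return max(-negamax(child) for child in TREE[node])

print(negamax("root"))  # 2: move b guarantees a win by two discs
```

Note the trap this exposes: move a looks tempting because of the +6 leaf, but the opponent steers to −4; move b guarantees +2 against best play.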

Endgame Solving Techniques

  • Negamax: A simplified formulation of minimax used for endgame calculation
  • Fastest-first heuristic: Search first the moves that leave the opponent the fewest replies — this reduces the branching factor dramatically
  • NWS (Null Window Search): Search with narrow windows (alpha = X, beta = X+1) to quickly determine if a move is better or worse than a threshold
  • Transposition tables: Cache previously computed positions to avoid redundant calculation
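As a sketch of the transposition-table idea, the toy graph below reaches the same position `z` by two different move orders; caching means `z` is solved only once. All names and values are invented for illustration:

```python
# Toy position graph: node "z" is reachable by two different move
# orders (a transposition). Leaves hold exact scores for the side to move.
TREE = {"r": ["x", "y"], "x": ["z"], "y": ["z"], "z": []}
SCORE = {"z": 3}

TABLE = {}   # transposition table: position -> exact negamax value
calls = []   # records which positions were actually searched

def solve(pos):
    if pos in TABLE:
        return TABLE[pos]          # transposition: reuse the cached value
    calls.append(pos)
    kids = TREE[pos]
    value = SCORE[pos] if not kids else max(-solve(k) for k in kids)
    TABLE[pos] = value
    return value

print(solve("r"), calls.count("z"))  # 3 1: z is searched only once
```

Real engines index the table by a Zobrist hash of the board rather than a name, but the principle is the same: identical positions reached by different move orders share one computation.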

Modern Approaches: Neural Networks

More recent Reversi AI implementations borrow from the success of DeepMind’s AlphaZero (originally applied to chess, shogi, and Go) and apply neural network-based evaluation.

How it works:

  1. A neural network is trained by self-play — the AI plays millions of games against itself
  2. The network learns to evaluate positions without hand-crafted features
  3. It combines with Monte Carlo Tree Search (MCTS) for move selection, rather than pure minimax
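At the heart of AlphaZero-style MCTS is the PUCT selection rule, which balances the network's value estimates against its move priors when choosing which branch to explore. A minimal sketch of the scoring formula (the constant and argument names are assumptions):

```python
import math

def puct_score(total_value, visits, parent_visits, prior, c_puct=1.5):
    """AlphaZero-style PUCT: mean value (exploitation) plus a
    prior-weighted exploration bonus that shrinks as a child is visited."""
    q = total_value / visits if visits else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u

# An unvisited move with prior 1.0 under a once-visited parent:
print(puct_score(0.0, 0, 1, 1.0))  # 1.5 (pure exploration term)
```

At each tree node, the search descends into the child with the highest PUCT score, so high-prior and high-value moves get explored first while neglected moves still receive occasional visits.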

Advantages over traditional AI:

  • Can capture complex positional patterns that hand-tuned evaluation functions miss
  • Improves automatically through continued self-play
  • Doesn’t require human expertise to design features

Current status: Neural-network Reversi AI is an active research area. Traditional alpha-beta programs with strong evaluation functions still perform very competitively, particularly because Reversi’s perfect endgame solving is so effective.

What This Means for Human Players

Understanding how Reversi AI works has practical implications:

The AI is not “cheating” — it plays the same game you do, with the same rules. It just searches deeper and evaluates more accurately.

Endgame is a lost cause against strong AI — once the endgame solver activates, you will not out-calculate it. Focus on entering the endgame with a positional advantage instead.

The AI’s weaknesses are in evaluation, not calculation — hard AI doesn’t make tactical mistakes. But its positional evaluation can occasionally produce counterintuitive moves in complex midgame positions. This is where human creativity sometimes finds moves the AI underestimates.

Studying AI games teaches you the same principles the AI uses — corner control, mobility, stable discs, frontier minimisation. The AI’s evaluation function is essentially a formalised version of what every Reversi expert already knows.

Frequently Asked Questions

How does Reversi AI work?

Reversi AI uses a minimax algorithm to search possible future moves, evaluating positions based on corner control, mobility, stable disc count, and frontier discs. Alpha-beta pruning eliminates irrelevant branches early to search deeper within the same time. Stronger AI settings search more moves ahead and use more sophisticated evaluation functions.

What is minimax in Reversi?

Minimax is the core algorithm most Reversi AI uses. It builds a tree of all possible moves and responses, assuming both players play optimally. The AI maximises its own score while assuming the opponent will minimise it. The move at the root of the tree with the best guaranteed outcome is chosen.

What is alpha-beta pruning in Reversi AI?

Alpha-beta pruning is an optimisation of minimax that eliminates branches of the search tree that cannot affect the final decision. When the AI finds a branch that is provably worse than one already found, it stops searching that branch. This allows the AI to search roughly twice as deep in the same time.

How does Reversi AI choose its moves?

Reversi AI generates all legal moves for the current position, searches several moves ahead using minimax with alpha-beta pruning, scores each resulting position using an evaluation function (weighting corners, mobility, stable discs, etc.), and chooses the move leading to the highest-scoring position.

Why does the Reversi AI always take corners?

Corners receive the highest weight in the AI’s evaluation function because they can never be flipped — they are permanent advantages. The AI has this knowledge built into its scoring system, so any move that secures a corner scores extremely high, while any move that gives the opponent corner access scores very low.