Heuristics involve the study of methods and rules for discovery and invention[efn_note]G, Polya. (1945). How to Solve It. Princeton, NJ: Princeton University Press.[/efn_note]. In AI, heuristics involve defining rules for choosing branches in state space search that are likely to lead to acceptable solutions. This can be particularly useful in two situations. These are problems that don’t require an exact solutions due to the ambiguities in the problem statement or available data, and problems that do have an exact solution but the state space is exceptionally large and therefore computationally expensive.
Have no fear of perfection – you’ll never reach it.
Salvador Dali
There are well developed heuristics I’d like to talk about today, along with some examples of each.
Hill Climbing Heuristics
One of the simplest heuristics we can use is the hill climbing heuristic. This strategy involves picking the best child and searching that, without retaining any information on parents or siblings. This heuristic tends to search deep into the state space.
Some obvious problems with this strategy is the tendency to get stuck at a local maxima (or minima). This can often result in failing to find an optimal solution.

This method is often called greedy, as it simply takes the best option at time ignoring possible future gains. However this technique can be optimal in some situations. These situations typically tend to be when moves in state space can’t be undone. Think tic-tac-toe; not chess. We can add a heuristic to a tic-tac-toe game that picks the child that leads to to most possible wins. For example, the first move X would make is in the center. The center leads to 4 possible wins (2 diagonals, middle row, middle column), and no other move leads to 4 or more wins.

Dynamic Programming
Another creative way to reduce state space is to break the problem down into subproblems and reuse the subproblems that have already been solved. This technique is often called memoization (as in memo, not memorize!)
Dynamic programming is used to solve a large number of problems. Let’s think of a problem where we need to find the longest common substring (LCS). For example we have the arrays (1,0,0,1,0,1,0,1)
and (0,1,0,1,1,0,1,1,0)
, and we want to find the longest string of common bits. An example answer would be (1,0,1,0)
from (1,0,0,1,0,1,0,1)
and (0,1,0,1,1,0,1,1,0)
. But is it the longest?
We will first build a table.

This table was calculated row-wise from top to bottom. Row i = 0
, and j = 0
are trivial and filled with 0 (if either array is empty the LCS is obviously 0).
The first comparison is between X1 = 1
, and Y1 = 0
, and these don’t match. Both the cell above and to the left are 0, so the symbols (←↑) are inserted indicated either path can be taken, and the current LCS remains the same.

The second comparison is between X1 = 1
, and Y2 = 1
, and these do match. We add the (↖) symbol, to indicate we transition both row and column. The cell the arrow points to is 0, so we add 1 to that and our new LCS is length 1, with the element (1).
We continue this to fill out the table. Once the table is filled we can trace back to find the LCS. In the table we found the LCS to be length 6, with the elements (1,0,0,1,1,0)
. Note this is just one example of a possible sequence (1,0,1,0,1,1)
is another valid longest common subsequence, but the length always remains 6 for any LCS.

Best-first search heuristics
Best-first search allows for recovery from dead-ends and local maxima and minima through the use of a priority queue. This priority queue is an ordered list that places the most promising paths at the top of the list. Dijkstra’s algorithm, or the SPF (Shortest Path First) algorithm is the textbook example.
The algorithm works by maintaining a list of explored nodes. The starting node’s children are examined, and the algorithm simply chooses to go to the “closest” child and explore that node’s children. As it does this a list is maintained, and the algorithm always chooses to the shortest path so far. It continues iteratively until the goal node is found. Because this algorithm always chooses the best option first, it is also considered a greedy algorithm.
A* Heuristics
This approach is guaranteed to find the shortest path. However, some problems in this domain are NP hard, and a heuristic would help. Imagine you want to develop an algorithm that finds the best route from one location in a city to another. In a large city with multiple gridded roads, the algorithm would easily get lost in the multiple paths. The A* heuristic tries to inform the algorithm by including information like Euclidean, or Manhattan distances to the goal node. By including this information the algorithm can prioritize paths that are shorter, and paths that decrease the distance to the goal node.
A Common Lisp implementation of this can be seen here.
Thus ends my discussions specific to heuristics and search algorithms. I truly hope that you consider the abstract representations of these problems when solving real-world examples. Abstraction is the core of computer science.