ESP Schachzwerg chess computer

Schachzwerg translates to "chess dwarf".

@author Andre Adrian
@date 2019-08-20


The chess game was once the "Drosophila melanogaster" of the artificial intelligence (AI) research. The creator of an AI-program is intelligent, but is the creation intelligent, too? This is a statement from L. Torres y Quevedo, the creator of the first chess playing machine, to a Scientific America reporter in 1915:

There is, of course, no claim that it will think or accomplish things where thought is necessary, but its inventor claims that the limits within which thought is really necessary need to be better defined, and that the automaton can do many things that are popularly classed with thought. It will do certain things which depend upon certain conditions, and these according to arbitrary rules selected in advance.

One school of thought simply says: If it passes the Turing test, it is intelligent. The other school of thought says: We have no good definition of intelligence, how can we have a definition of artificial intelligence?
What is the purpose of this endeavour? First to have fun and learn something new. Second to verify or reject the "small is enough" idea. Third to create the philosophical question: How can a creator with a chess performance rating of 1000 Elo create a chess computer with a chess performance of 1500 Elo (yes, this is the design goal).
The predecessor of ESP Schachzwerg and Super ESP Schachzwerg is my AVR MAX Schachzwerg chess computer from 2009. The magazine Elektor provided a kit. A PCB, an ATMega88V SoC (System-on-a-Chip), some push-buttons, four 7-segment LEDs and two AA batteries . Performance is 1290 Elo out of 8MHz 8-bit CPU, 8KByte ROM and 1KByte RAM. Because of RAM limitations a non-recursive version of the search algorithm was used. The program is based on Micro-Max from H.G. Muller.

Small is enough

The program of ESP Schachzwerg should work like an insect brain or a hummingbird brain: a little animal can only afford a little brain. But because the little animal has to live in the same complex environment as bigger animals with larger brains it can not afford to be stupid. Scientific American MIND July/August 2017 magazine wrote: "How could such a tiny brain perform as well as a bigger one? Through vicious, cutthroat evolutionary efficiency". The same "small is enough" principle is used for ESP Schachzwerg. The program uses primitive evaluation for a deep search within the limitations of the hardware.

Computer chess program Tell, 1974

The typical computer for a chess program in 1974 was an IBM 360/168 mainframe. But Johann Joss used a HP2115A mini computer. This 16-bit computer was on the market since 1967. The maximum core memory amount was 8KWord, that is 16KByte.

Picture: The HP2115A cabinet. Another cabinet the same size housed the power supply.

With strong competition in Stockholm 1974, Tell got rank 11 out of 13. In Dortmund, 1975, competition was weak and Tell got first rank! In Toronto 1977 Tell only got last rank The strong competition at this time was Chess from U.S.A. and Kaissa from UdSSR. This was the time of cold war at the computer chessboard.

1974 Stockholm
1975 Dortmund
1977 Toronto
Chess 4.6
Chess 4.0
Schach MV 5,6
Orwell III
Black Knight

The chess computer Belle was special. It was a PDP-10 together with special hardware. Belle terminated the reign of the mainframe chess dinosaurs. Orwell III was written by Nitsche. Nitsche and Henne and the Mephisto chess computer made Hegener & Glaser an important chess computer company.

The ESP Schachzwerg hardware

The ESP8266 SoC from Espressif was designed as WLAN-bridge, an IO-chip that gives another CPU WLAN (WiFi) capabilities. That is, it was the chinese low-cost answer to the Nordic nRF24 chips. The big difference is that the ESP8266 CPU can execute user programs next to the protocol stack firmware. The ESP8266 uses an 80 MHz 32-bit RISC core. The program execution is somewhat special. The program is read from an external FLASH memory into the 32KByte execution cache/32 KByte execution memory. A jump outside the 32KByte segment triggers a load from FLASH. This is slower than having "real" memory for program memory, but cheap. The RAM is 80KByte "fast memory" plus 16KByte non-volatile "slow memory".
The user interface is done very modern with a touchscreen LCD. A stylus is used as in the old Palm PDA handheld computers. The LCD is of good quality TN technology with 6-bit color depth per primary color, LED backlight and 240x320 pixel resolution.
The ESP8266 and the touchscreen LCD are available as ready-made breadboard modules. The breadboard has 2.54mm (1/10 inch) contacts. After hardware development on breadboard, a simple PCB can replace the breadboard. The author choose a nodeMCU clone ESP8266 module that provides USB port, USB-to-UART bridge and low-dropout 3.3V voltage regulator. The touchscreen LCD has SPI interface for LCD-controller and touch-controller. The controller chips ILI9341 for LCD and XPT2046 for touch are fully supported by the IDE.
The brightness of the LCD backlight is selected with a resistor. A 33Ω resistor is used. The ESP8266 and other components are 3.3V devices. Therefore a 3.6V lithium battery in AA form factor is used. A 5V USB power supply works as well. An on/off switch, battery holder and insulated wire completes the hardware BOM (bill of materials).

Top picture: nodeMCU ESP8266 clone
Second picture: 2.8" touchscreen LCD ILI9341, XPT2046
Bottom picture: 400 pins breadboard

Hardware connection

The SPI bus Clock-Out, MISO and MOSI signals are connected to ESP8266 pins D5, D6 and D7. These signals go the the LCD and to the touchscreen in parallel, that is ESP8266 Clock-Out is connected to LCD Clock-In and Touch Clock-In. To separate the two devices LCD and touch, every device has a Chip Select (CS) pin. The LCD CS is connected to ESP8266 pin D3, the touch CS is connected to D8. The LCD can receive data or command codes. The code type is transmitted via the LCD Data Command (DC) signal which is connected to ESP8266 D4. The Reset pin of the ESP8266 is connected to the LCD reset pin. And last but not least, the LCD backlight pin is connented via a 33Ω resistor to 3.3V. The power supply pins 3.3V and GND are routed between ESP8266 and touchscreen LCD module, too. The on/off switch is traditionally placed in the 3.3V wire between battery and circuit. But you can also place it in the GND wire. As an old telephone technician I could tell you why this is the correct way to do things, but our little system has no earth connection and therefore this information is not needed.

D8 Touch CS
Touch CS

The SPI bus uses serial data transfer. The ESP8266 is the bus master. Only the master provides the clock signal. The "Master Out Slave In" MOSI pin is an output at the ESP8266 and an input at the LCD or touch. The "Master In Slave Out" operates in the other direction. The SPI bus is really primitive, and best of all does not need royalty payment.

The Super ESP Schachzwerg hardware

The nodeMCU ESP8266 can be replaced by an ESP32 module. The ESP32 is the slightly more expensive "bigger and better" brother. It has a 160MHz CPU and 520KByte RAM. This big RAM allows a powerful transposition table. The 2338 Elo monster "Mephisto Genius 68030" from Richard Lang has 512kByte transposition table RAM. Having the same amount of RAM will not give automatically the same Elo rating. But a Super ESP should have 1700 Elo. The Spracklens did 1755 Elo in 1983 with a 3.2MHz 8-bit 6502, 24KByte ROM and 3KByte RAM.

The ESP Schachzwerg IDE

The integrated development environment is as important as the hardware for the success of any project. I choose the Arduino IDE with the necessary enhancements for ESP8266, ILI9341 and XPT2046. Being a C programmer since 1987 on Atari ST and a C++ programmer since 1990 on IBM clones, using C/C++ was a no-brainer for me.

The ESP8266 protocol stack firmware

The protocol stack firmware runs always in the ESP8266. If you compile your user-program in Arduino-IDE, the firmware is linked in. At least every 10ms the firmware has to be called. If not, a watchdog timer interrupt happens. Placing a yield() call at a convinient position in the search algorithm is all that is needed to keep the firmware watchdog happy. The watchdog is active even after the WLAN stack is powered down to save battery power. The following C source code shows the WiFi switch off:


King Rook - King endgame algorithm

In 1915, the Scientific American magazine reported about the mechanical/electromechanical chess automaton from L. Torres y Quevedo. The best part of the article is the description of the algorithm. We can easily implement the algorithm as a software program instead of a hardware automaton. Here is the algorithm:

The algorithm is a simple state machine. The state is position of white_king, white_rook and black_king. The algorithm transforms the state into six different non-overlapping cases. These six cases have some sub-cases which get executed:

1 carries castle to column A,
l' carries the castle to column H,
2 carries the castle downward one square,
3 carries the king downward one square,
4 carries the king one square to the righ t,
4' carries the king one square to the left,
5 carries the castle one square to the right,
5' carries the castle one square to the left,
6 carries the castle downward one square.

This is really a simple algorithm and I can not imagine a more simple algorithm. No recursive search, no Megabytes of hash memory and not Gigahertz of CPU speed needed. For me this is strong indication for "small is enough". The today available 4-pieces and 5-pieces endgame databases work the same. A simple look-up in these large data files gives perfect chess play. No thinking needed, just recall. An opening book allows the best play according to current knowledge. But new chess knowledge can invalidate an old opening book. If we ignore this detail, we can say: playing an opening is recall, too.
The case 1 has a problem. If white_king and white_rock have the same row, the rock move may be impossible because the king can block the movement to A column or H column. Therfore we change the cases into:

1 carries castle to column A if king is not blocking, else next to the king,
l' carries the castle to column H if king is not blocking, else next to the king,

And here is the original hardware implementation:

The chessboard is a small square part below the weights in the left picture.

The inverse move generator (IMG)

The normal move generator starts with the from-field and advances to the to-field. The inverse move generator starts with the to-field and searches the to-field. Because the move-pattern of the piece on the from-field is not known, a super-piece move-pattern is used: the pattern of pawn, knight, bishop and rook combined. The king-pattern is a one-step bishop/rook pattern and the queen-pattern is a long-range bishop/rook pattern. The strongest capture move is the capture of the king. Therefore the IMG starts with the to-field of the king and goes down the material ranking (queen, rook, bishop, knight, pawn). The IMG can search the pieces on the chessboard. The better solution is to have a pieces-list. The pieces-list has 32 entries for the 32 chess-men. Every piece has its own location. The white knight that started the game on c1 will remain white_knight1 for the whole game in the pieces-list. As the game progresses, entries in the pieces-list will become nil, that is the piece is gone to chess-heaven. Here is the pieces-list:

wPn2 wPn3 wPn4 wPn5 wPn6 wPn7 wPn8
bPn2 bPn3 bPn4 bPn5 bPn6 bPn7 bPn8

This pieces-list can not handle two queens on the chessboard. If the opponent gets two queens, the program resigns.


A chess move consists of white piece movement and black piece movement. A ply is a half move. I use ply and bit for singular and plural. I am an engineer. I say 3 Volt and not 3 Volts, because I write 3V and not 3Vs. Because 3Vs means 3*Volt*second for me. But you can do as you like!

The move generator (MG)

The move generator is the oldest part. The Claude Shannon "random mover" was a move generator and a select a move by chance. Shannon wrote in 1949: "The writer played a few games against this random strategy and was able to checkmate generally in four or five moves (by fool's mate, etc.)". The move generator has the problem of illegal moves: your king is not allowed to move thru a field that is under enemy control while doing the castling. Even more simple: the king is not allowed to move himself into check. The solution of these problems are: we ignore the castling case. The king moves himself into check is handled by the search-tree. In the next ply the tree-search will recognize win, threrfore it will avoid this part of the search-tree. The move generator produces pseudo-legal moves.
The most simple move generator starts with field a1 and walks column-first thru the chessboard: a1, a2, a3, ... h8. First variant is to use a1 ... h8 for white and h8 ... a1 for black. Second variation is to start in the middle of the chessboard and go d4, d5, d6, ... d3. use the pieces-list. Third variant is to go in a spiral: d4, e4, e5, d5, c5, c4, ... a8.
Our MG does not generate capture moves. This is the job of the IMG.
The MG uses a chessboard representation. The most space efficient is the 8x8 board. A better space/time trade-off has the 10x12 array (mailbox) or the 128 fields 0x88 chessboard. The mirco-Max program used a 129 fields 0x88 chessboard. The last field is to "dump" garbage, thst is to always have a destination field. I think I stay with the 129 fields 0x88 chessboard.

Do move, undo move

There is one chessboard and there is one pieces-list. And there is a search-tree that is 20ply deep. How does this fit together? We need a move-stack. We have a "do move" operation that executes the move and pushes the undo-information on the move-stack. And we have a "undo move" operation that brings the chessboard and pieces-list back to the previous condition.
The do/undo-move part of the chess program is a nasty piece of software. It is executed of every search-node, that is hundred thousand times or even a million times for every move the computer calculated. Therefore execution shall be quick. But there are nasty details like undo a castling or undo an en passant capture or undo a promotion of a pawn to a queen (queening). Or even the simple question can the king still do a castling or did he move already e.g. e1-d1-e1? The good thing is that we can test the do/undo-move very easy. Even if we need hundreds of test-cases, we should not progress to advanced topics of our chess program before the do/undo-move is not rock-solid and fast.

Material balance

Normally the do/undo-move maintains the material-balance (MB). Depending of how much chess-knowledge is in the program, the material-value (MV) of a piece is a constant or a variable. The traditional values are Pn=1, Kt=Bp=3, Rk=5, Qn=9 and Kg very high, e.g. Kg=100.
A pawn on seventh row can have the value of a rook. An end game king and two knights is a draw, an end game with king, knight, bishop is at least in theory a win. If we have two knights and one bishop it makes sense to increase the MV of the bishop.
The MB is calculated incrementally. If you loose a pawn, you subtract the actual pawn value. If the pawn moves to seventh row and gets the value of a rook, you add the difference of new_pawn_value and old_pawn_value to the MB.
Like the do/undo-move calculation, the material balance calculation should be rock-solid and fast.

Piece-Square Tables

There is a piece-square table for every kind of chessmen. The piece-square table "nudges" the balance to good piece positions on the chessboard. A pawn piece-square table can keep the king shield intact and suggest a center pawn development before the development of other pawns. David Kittinger started with "pre scan heuristics", that is use a chessboard evaluation, the oracle, to calculate the piece-square tables. In mid game the king should hide behind the king shield, but in end game the king should go to the center of the chessboard.
Because the piece-square tables do not change with search depth, a deep search can cross the mid game to end game "border" and will have the wrong piece-square table for the deeper search levels. One possibility is to have two different piece-square tables for every type of chessmen and "slide" between the tables, that is give a value of a*table1+(1-a)*table2 with a the "sliding" factor between 0 or 1.
ESP Schachzwerg will use another algorithm. First the "ESP oracle" defines the piece-square tables. Then it uses for the first M searches in the iterative deepening the material balance plus piece-square tables balance and for the deeper searches only the material balance. Real material gain is best for ESP Schachzwerg and can be calculated without piece-square tables balance. The danger of wrong piece-square tables balance is small for a shallow search.
The score for depth M search with piece-square tables balence and for depth N search without piece-square tables balance are stored. ESP Schachzwerg plays the best move of all. If there is a sure deep search material gain, e.g. a pawn fork, it will play this, else it will play the shallow search development move.

Material and piece-square tables balance

The piece-square tables have "nudge" values in centi-pawns. The chessmen values in centi-pawns are Pn=100, Kt=320, Bp=330, Rk=500, Q=900, Kg=20000. A 16-bit signed integer is used for the material balance. Typical piece-pawn tables "nudge" values are -50, ..., -5, 0, 5, ... 50, that is between a 1/20 of a pawn up to 1/2 of a pawn.

Position hash

Usually the do/undo-move maintains the position hash. See transposition table heuristic for details. The current plan is to use Zobrist hashing and no assembler programming.


The recursive NegaMax or NegaScout algorithm is for many programmers conplicated stuff. The good news is: We do not need to understand it! We set the piece values, start the search and get the best move as answer. Okay, for space-saving we convert the recursive algorithm into an iterative algorithm and maintain the local program stack-variables in an indexed struct. And we want to show the principal variation to the user, that is the chain of ply the computer is most afraid of.
The simple fact that if we have a search depth of 20ply, but we have a check-mate after 5ply is a special case for the search. But don't worry. I have seen other programmer's chess program crash, so why not allow our program to crash, too?
Okay, this is the ugly truth: our search-tree explodes. Even if we have a branching factor of only 6 because of NegaMax or NegaScout search, the result is 6^ply. 6^10 is 60 millions, 6^20 is 3656 trillions for the "short scale" trillion.

Transposition table heuristic

We get the same position if we start the game with 1. e4 e5, 2. d4 d5 or if we start with 1. d4 d5 2. e4 e5. This different order of moves but same position happens over and over in the search-tree. The transposition table (TT) helps to avoid duplicated work in the search. But there are ugly details. The NegaScout algorithm has a start-window that is not -infinite/+infinite. The TT results are only valid for "more narrow" start-windows. One solution to this TT problem is to use NegaMax together with TT.
The "good old" chess computers like the "Mephisto Risc 1MB" from 1992 had 1MByte of TT memory. Our ESP8266 has 80KByte, and we need 2KByte for the do/undo-move and search-tree and XKByte for the user interface. You want to see nice graphics, yes? The ESP32 is much better for TT.
The TT is not filled depth-first, but it is filled wide-first. We want to have all transpositions on depth-1, depth-2, depth-3 until the TT memory is exhausted. One possibility is that more shallow transpositions will overwrite deeper transpositions. Another method is to allow fixed sizes of TT memory for depth-1, depth-2 and so one. In this case there is no need for the individual transposition to carry the depth information.
The indentification of a position is done with a "hash". Hashing uniformly distributed data is usualiy done with division remainder or mod. The chess position is interpreted as a long number. This long number is divided by an 32-bit prime number. The remainder is a good hash.
For me this long division should be done in assembler. Normally a N bit CPU has a 2*N bit divide by N bit operation with a N bit quotient and N bit remainder. Such an operation is normally not available in C language. It was avaiable in the Forth language.
Another, less powerful, hash is the Zobrist hashing. The good thing about this hash is that you can calculate it incrementally. Like the material-balance you only have to add or subtract the change on the chessboard.
The disadvantage of hashing are hash collisions. Because the hash is much smaller than the original data, several different chess positions have the same hash. Konrad Zuse did a little calculation in 1941. You need approximate 4 bit (exact 3.7005 bit) to express the 13 separate inhabitants of a field: wKg, wQn, wRk, wBp, wKt, wPn, bKg, bQn, bRk, bBp, bKt, bPn and empty. You have 64 fields, that is the information of a chess position is 64*4bit=256bit. A typical mod-hash is 32-bit long, a typical Zobrist-hash is 64-bit long. You can imagine how many hash-collisions are possible! Because of this I call it TT heuristic. Thank goodness, the chessboard is only sparsely populated. At least half of the fields are empty. The end game profits most of TT.

Picture: Konrad Zuse's calculation of the information in a chess position.

The TT memory helps in detection threefold repetition. If all played-out positions are "hashed" and we have two hash collisions for the same hash, the same position has occured three times.
If a piece moves away from "its" field and later moves at the same field, and the adversary does the same, we have the same position in two different depths of the search tree. The back-and-forth movements are don't care in most cases. But moving the king back-and-forth before castling makes the castling impossible. Just directly after the two-square first move of the pawn is an en passant capture legal. Again we can have a legal en passant capture or an illegal en passant capture, if we insert back-and-forth movements. Maybe I ignore these nasty details - every chess computer needs an "identifying" bug. Okay, thinking time is over. The pawn that can be captured in the next ply by en passant is not a wPn or bPn but a wPnep or bPnep. There are eight wPnep positions and eight bPnep positions. And the chess rules tell the location the adversary pawn has to hit to make the en passant capture. The Kg "inhabitant" gets differentiated into KgC, wKg, bKg. The KgC is a king before his first move. It has no color information, but the position e1 or e8 tells the color. After the first move the KgC becomes a wKg or bKg. With these additions we need now exactly 256bit or 32Byte to describe the chess position. Konrad Zuse was right!

Iterative deepening

Chess is a game against your opponent and against time. You have this nasty chess-clock. The iterative deepening in the chess program helps the computer to stay in its time budget. If after the ply_N search is done there is time left, the ply_N+1 search starts. But the iterative deepening does more. It allows to start the ply_N+1 search with the best ply_N move. This gives better cut-off or a smaller branching factor. And there is another advantage. If we have a maximum search of 20 ply, but we have a mate-in-5, the iterative deepening stops after depth 5. The game is over - at least for the computer. Therefore iterative deepening makes it unnecessary to report win-in-N or loose-in-N. The terminate condition for the search is "if there is no more time or if there is a win". Last but not least does iterative deepening tell us the size for the transposition table (TT). After depth_1 search we have used this amount of memory, after depth_2 we have used that amount of memory and depth_3 TT can only use the free transposition table memory.

Maximum search depth

What is the maximum search depth? A chess game can last 80 moves or 160 ply. But the combinatorial explosion kills us long time before. We can learn a little from David Kittinger. Here are some results of some of his chess computers:

max. Ply
Novag VIP
Novag Super VIP
Novag Jade II
Novag Turquoise

Poor David had a handicap. All presented chess computers use the H8 SoC. The H8 SoC has only 1KByte RAM. He could use the RAM for search-depth or for chess-knowledge. In his career he changed first from search-depth to chess-knowledge and later back. Our conclusion: A maximum search-depth between 16 and 20 ply should be fine for ESP Schachzwerg. Lets give the ESP8266 18 ply and the ESP32 20 ply. In 1990 Ed Schröder gave the "Mephisto Polgar 6502 10 MHz" chess computer a maximum search depth of 30 ply. I think this was overkill. The Ed Schröder computer of the following year, "Mephisto Milano 6502 5 MHz", has 22 ply maximum search depth.

Permanent brain

Permanent brain describes the continous working of the chess computer. While the chess-clock ticks away your time, the chess computer uses the principal variation, that is the move the chess computer expects from you, to "think ahead". If you play the expected move, the chess computer has already perform a ply_N search and can continue with the ply_N+1 search, that little bastard! If you make another move than the "feared", the chess computer starts a new search at ply_0. But this is no problem. You did not play the strongest move, but a weaker one - at least in the opinion of the computer. Permanant brain is the holy grail for ESP Schachzwerg. Hopefully it is not too complicated for my little 1000 Elo brain!

Null move and other forward pruning heuristics

Forward pruning like null move does reduce the branching factor, but for a price! Even the forward pruning fans agree on this: "To reduce the chance that Zugzwang will cause a problem, null move forward pruning is not done in the end game." A scholar's mate is a checkmate in the opening. I do not want to forward prune away something like this. The old chess computer game comments are full of "why had the program not seen the checkmate-in-N?". Because the program had forward pruned away the relevant sub-tree!
I know that debugging do/undo-move, material-balance and Zobrist hash is hard. I assume that debugging forward pruning is impossible.


The old mainframe chess programs used 64-bit bitboards. I do not understand bitboards and I have not heard that the professional 8-bit chess computer programmers like the Spracklens, David Kittinger, Ed Schröder, Thomas Nitsche, Elmar Henne, Kaare Danielson, Ricard Lang or Frans Morsch have used bitboards. The ESP8266 and ESP32 are 32-bit RISC CPUs. But I treat them as powerful 8-bit CPUs. One important reason for mainframe bitboard use was that the old-and-ugly beasts could not address space in chunks of 8-bit. Bitboards are a work-around for mainframe deficiencies.
For the same odd reason we have "packed char" in good old Wirth Pascal. On the Control Data Corporation CDC6000 60-bit word size mainframe, one word could hold one floating point number or 10 packed 6-bit char. This was the architecture of Wirth's first Pascal computer and the reason why Pascal is all capital letters. Or on a 16-bit word machine an unpacked char used a word (16-bit) and a packed char only 8-bit. By the way: knowing this makes me really feel old. There is no Wikipedia entry about "packed char", but the cited Pascal reference book is a newbie from 1983! I have seen core-memory computers in action like the Siemens 330 or Raytheon RDS-500 in the system DERD-MC. And the Telefunken TR-86 was even older. It had the same small scale integration technology as the Apollo guidance computer.

About the author

I work as Referent for the german air traffic control. We do our operational business with Linux operating systems and mostly C/C++ programming. The ops business uses redundant real-time systems with measured availability of 99.999%, running 24/7 every day of the year. This is 5 minutes not-planned down-time per year. I have a hardware background as Fernmeldeelektroniker and a software background as Diplom Informatiker(FH). I am a ham, too. Callsign is DL1ADR. This qualifies me as triple nerd™. But I paint with acrylic colors and I have a big interest in dancing and Kamasutra. These are my non-nerdy facets. I have three children, two are already adults. My e-mail address is: