Intelligent Chess Opening

By Cyrus Ferdosi

Email: cyrusf@cogs.susx.ac.uk

Intelligent Chess Opening

ABSTRACT

INTRODUCTION

ANALYSIS OF TREE REPRESENTATION

ANALYSIS OF GRAPH REPRESENTATION

ROBUSTNESS

ACCESS EFFICIENCY

MEMORY REQUIREMENT

ALGORITHM

CONCLUSION

Computer Chess reading

Some Chess Opening

ABSTRACT

In the opening, each player tries to control the centre, set up a flexible pawn structure, develop the pieces rapidly and harmoniously, sometimes even go for direct attack. But there are so many complicated variations.

Contemporary computer chess programs use a preprocessed database known as the database of opening moves in order to take advantage of the initial moves which are already known to be sound and well thought of. Therefore providing both players with a good and solid foundation on which to start a game of chess. The opening database of moves that computers use is organised in a tree-like structure. The propos of this paper is to show that the same data can be represented in a graph structure which has all the advantages of the tree structure. Further more the graph representation of the data has a much greater superiority over the tree representation of the same data.

Back to top

INTRODUCTION

The faith of a chess program depends largely on the first 15 to 20 moves. These opening moves have been tried and tested for the past 100 years by the grand masters and proven to be good for both black and white. Almost every computer chess program uses a selection of these opening moves to aid the program with both saving time and avoiding potential danger in the initial phases of a game. If a program plays well in the begining of a game, it is most likely to maintain a good game and hence increase its chances of wining. As stated in the abstract, contemporay databases of opening moves are represented in a tree like fashion. In fact their representation is not truly tree like, to be more precise the opening lines of moves in these databases are represented as a set of paths in the tree of moves (Figure 2 is a list of paths representation of figure 1). Therefore the trees implemented in these chess programs and the trees used in other aspects of computer science, differ in the number of nodes in them. There are more nodes in a set of paths of a tree then there are in the same tree which has been represented as a set of paths. The number of lines of opening moves differ from database to database. Average programs have about 10 to 15 thousand lines of opening moves, in some programs these can increase up to 100000 lines.

The way these databases are used by most programs is as follows:

Suppose the computer plays black and its human opponent plays white. The opponent plays its first move, say K4 (it means the pawn in front of the king advances by two squares). For the computer to respond, it will have to check all the list of paths (starting from the first one) in its database which starts with the move K4. Next it chooses one of these paths at random and then extracts the move noted immediately after K4 in that path, say Nf6 (meaning black knight moves to square f6) and plays this move. Now, when white plays its next move, say Bc4, the computer will have to find a path in its database which starts with the sequence of the first three moves. i.e. All the paths which start with K4, Nf6, Bc4 and at random pick one of these paths, choose the move noted immediately after Bc4, and play that and so on ...

A demonstratation: Suppose the set of paths in figure 2 is the database representation of the tree in figure 1 in some computer program. The human player chooses to play move "a" of the figure 1. In the set of paths of the figure 2 all the paths start with move "a", so the computer can choose any of them at random. Suppose the computer chooses to play move "b", at this stage the human player plays move "f". Now, the computer will have to choose a path which starts with a sequence of moves "a","b","f", there are 3 of them. So, the computer chooses one at random and plays it. And so on ... Some programs might use a more intelligent method of choosing their move, but in general the above procedure is what most computer program use.

Back to top

ANALYSIS OF TREE REPRESENTATION

Conventional databases suffer from three major drawbacks.

Duplication: The bigger the tree of moves is, implies there are more duplicated nodes in the representing set of paths. As an example consider the tree in the figure 1. The same tree is also represented as a set of paths in figure 2. It can be seen that the root node (node a) is recorded once in figure 1, but the same node is repeated 13 times in figure 2. Through a short study of figure 1 and 2, one can conclude that the number of times each node of the tree in figure 1 is duplicated, is equal to the number of leaf nodes of the subtree for which the repeated node is the root node. Hence, if a database holds at least 10000 opening lines, it implies that the root node alone must be repeated 10000 imes, and so each subsequent node is repeated according to the number of terminating nodes in their subtrees.

Robustness: The program can prematurely exit the database of the opening moves. For example, suppose the human player reverses a move while still in the opening stages. To those who are familiar with chess, it is clear that such a line of play which reverses a move, can not be found in the book of the opening moves. Therefore it can't be found in the database of opening moves, (since a computer database of opening moves is only a copy of what human players have established as opening moves in the chess books) so the computer is bound to be forced out of its database, which is a disadvantage for a computer. The other possibility is that, in many occasions a player can temporarily come out of one opening line of play and after a few moves find himself in an other opening line, this happens quite often. This too would force a computer out of the opening phase, since such a sequence of moves can not be found in its database. Therefore, one can conclude that the computer's lack of robustness lies in the fact that the contemporary databases of opening moves assume such a sequences of moves. For technical reasons, as long as the databases of opening moves are represented as list of paths in the tree, this assumption can't be avoided.

Intelligence: The database does not have any means of recording of what has happened in previously encountered, similar board positions. In other words it does not possess any form of memory, so it can not learn from experience. The best they can be expected to do, is that their random selection method chooses a different line of play when they come across a similar board position in which they have lost previously.

Back to top

ANALYSIS OF GRAPH REPRESENTATION

Eliminating duplication: A graph representation of the database of moves contains no duplicates. In order to understand this in better detail it is necessary to high light one more drawbacks of the tree presentation. In chess, quite often one starts with an opening and all of a sudden finds himself in an other opening . This implies, that a lot of lines of play share the same sequence of moves for their latter parts. In other words there must be a lot of duplication of parts of the paths in the databases of opening moves. As an example, suppose that the sequence of moves "a","d","j" in figure 1. has taken place in a game and that the board position "j" is similar to board position "f", now the subtree of "j" must be a copy of the subtree of "f". In the figure 1., both subtrees are shown separately, but in a database you can imagine the line joining the node "d" to "j" would be joining the node "d" to the node "f" in other words, the subtree "j" would not exist. This would make the lines j-w, j-v, j-u in figure 2. equal to the lines f-o, f-n, f-m and their subtrees. The higher up this happens in the tree, the more duplicates are bound to be in the list of paths of the opening moves. However, a database organised in a graph structure, never suffers from duplication, there is only one node for a board position.

Learning: Because the graph representation of the database records each board position in a node it is possible for each board position to be accompanied with some extra information. A part of this information could be the history of that particular board position. For example, the program can know how many times a particular board position has been a member of a opening line which resulted in the loss of the games played previously. Or whether the board position was a member of a losing line the very last time it was played. If the computer loses a game, it can easily identify the board positions which were involved in the opening moves of the last game and therefore avoid choosing the same move when it come across the same board position next time. Or simply, it can learn from its mistakes. Alternatively, a number of previously played games can be fed to the computer to play through them with the aim of updating the relative counters. The point here is, as the program plays more, it learns to choose the strongest of opening lines. By removing the choice of board positions which lost their lost game the program provides the oppertunity for other moves to be given the chance. Hence one can expect some sort of evolution in the way the program chooses its openings. The more the program plays the more it finds out which opening lines are the strongest and play those more often. more over, the program is far less predictive, because it could use the information in the nodes of the database to decide not to play the same line of opening in the next immedaite game, in other words depriving the human player from repetitive and identical wins.

Back to top

ROBUSTNESS

Robustness: A graph database does not assume a sequence of moves. As long as a board position exist in the database it does not care how this board position has been arrived at. Therefore, a program that uses a graph database has no problem recognizing a board position. For example, in our analysis of robustness of a tree like database we said that if the human player reverses its move it can derail the program out of its opening, or if the program comes out of an opening it will not be able to recognize if it happens to find itself in an other opening where as these don't pose problems for a graph database. If a player reversed its move, by definition it must have reverses to a move which was previously in the database, therefore, it must still be in the database. As long as the same board position can be achieved with a single move, the program can always get back to it if it wishes.

Back to top

ACCESS EFFICIENCY

Access efficiency: In our analysis we didn't criticise the conventional databases for their access time. Conventional databases use some hashing functions which are reasonably fast. However, it can be shown that a graph database has a much better access time without even having any form of hashing function, it can learn how to access itself. This is how it works. Suppose it is now the computer's turn to play. The computer chooses and plays its move. Next it maintains a pointer to a section of the node which represents the current board position. Now, when the human player plays its move, there are two possibilities. Either the move chosen by the human player is in the database or not. If it is not in the database we can ignore it. However, if it is in the database, the program can now register the address of the move on the disk in the location where the pointer is pointing. Next time, when the same board position turns up the computer knows the precise address of the responding move on the disk. There are usually more than one responding move to each one of the opponent moves. So the program must update the registered address if the next response to the same move happens to be a bit higher up in the file. This should be done for both players. Due to the design of the graph database (which I shall explain later) the responding moves to an opponent move ought to be adjacent to each other in the same section. With this approach, even if the last registered address is not the exact address of the responding move, it must be either slightly higher up of the responding move or the address of the head of the section of the responding moves. In either case the result is far more efficient than any hash function one could think of.

Back to top

MEMORY REQUIREMENT

Memory requirement: It can be shown that we only need 35 bytes to indicate a move in a graph-structured database. For a node to exist, there must at least be one respond move in the database. Therefore, to register a move and a response move we need 70 bytes. Since each square could at maximum hold 8 different bits of information to indicate, whether there is a piece in a square or not, either a piece is black or white, and differentiate between the 6 different pieces. 4 bits can easily represent 8 binary information therefore 1 byte can represent two squares. We need 32 bytes to represent the board position for one move. Therefore, for two moves we need 64 bytes. Of the remaining 11 bytes, 1 byte is used to indicate the begining of each section (node), and five bytes are used as new lines. A further three bytes are used to register the address of the responding moves. The 16 remaining bits are used to hold the history of the responding move in this node. Since a graph-structured database uses 35 bytes to represent a move where as the tree representation of the same database uses only 4 bytes to represent the same move, at first it seems that the graph-structured database has just over 8 times greater disk space requirement. However, this is not the case. I converted 10000 lines of opening at 12 moves deep each line, from the GNU chess database. The result was two files each of which were about 7 megabytes. The reason that the final result was less than half of what was expected is because there were a lot of duplicated data in original data. The amount of duplicated data in 10000 lines of opening at 12 moves deep is considerably less than the amount of duplicated data one would expect in a much larger databases. So, as the database gets larger the difference between the two version of representation would reduce.

Back to top

ALGORITHM

The algorithm for building graph structured database as follows: First you need to have a program which can take a conventional database, plays each move of the database and writes the resulting board representation of the moves in two different files (say White.dat and Black.dat). Assume the first move which is being played through is a white one. The board presentaion of this move must be written in file White.dat. The board representation of the next move which is black must be written in file Black.dat and also in file White.dat as a response move to the first move of white. With the exception of the first move of white, all the board representations must be written in both files in the same manner as explained for the black second move. The last move which is a black move must only be written in the file White.dat. Next you need to write a program which takes either of these two files (White.dat or Black.dat) as its input, and it writes the first board position together with its response move into a new file as output. Then the program must find all the board positions which are similar to the first board position found and write their response move just below the response move to the first move in the new file (so that all the response moves to the same move are put to gether in the same section (this is where the duplicates are eliminated). All the moves which were copied over to new file should be marked so that they are not checked in the next round. Repeat the process until all the first moves (the move to which the response moves are found) and its response moves are placed in their own sections in the new file. Do the process for both White.dat and Black.dat and create the New-White.dat and New-Blsck,dat. Now, these new files (New-White.dat and New-Black.dat) are in the graph form.

CONCLUSION

Conclusion: The advantages of graph structured database over the tree-like database is that it is intelligent in the sense that it learns how to access itself. It finds which opening lines are strongest and evolves them. the more it plays the more it learns further more, the rate of learning is very efficient. It is robust. It can not be fooled out of opening. The drawbacks are, that in order to create the graph database you must first have the conventional tree-like database and also there is considerable amount of work to do in order to convert the one database to another, but this has to be done once only. The second drawback is that the graph database requires more disk space. However, at the present time hard disk space is not a major problem. Hence the extra cost is well worth the final result.?

Back to top