How to write an AI for a card game

Tuesday, July 2, 2024

Since the advent of ChatGPT in November 2022, AI seems to be all over the place. I sometimes get asked how the AI for the cards games on Whisthub works, so I thought that it was a good time to write a blog post on it. Beware, this post is long and can get a bit technical! Let's go.

Now, first of all, the way the Whisthub AI works is not per se the way to build an AI for a card game. I am no expert in AI whatsoever, but I found that the approach worked really well, and after all it's just for card games, it's not like I'm building self-driving cars or something.

Before we dive in, it's useful to point out how AI models like ChatGPT work. They are giant neural networks with billions of parameters that have been trained on enormous amounts of data, in the case of ChatGPT pretty much the entire public internet. While it's perfectly possible to train such a model for card games too, the problem is that you need training data. If you're writing an AI for some niche card game, then chances are pretty low that you will find a dataset that contains millions of games played by humans.

If you have read the history of Whisthub, then you know that I was very well aware of this problem during the initial development of Whisthub. In theory you can also train an AI model by letting it play random cards that are allowed according to the rules and then have it learn from the outcome, but I absolutely had no clue how you would even start such a daunting task. That's why I decided to use a simpler approach with an AI that just follows a set of basic, deterministic rules.

Fundamentally the AI for all card games on Whisthub consists out of two parts: every AI player keeps track of the probability that a certain card is held by a certain player. This is called the assumptions matrix and it is continually updated throughout the game based on what the players do. Note that this means that the AI plays with the same information as a human would. The AI does not know the cards of their opponents, it just estimates the probabilities of their cards.

With the assumptions matrix we can then do more complex calculations, such as "what is the probability that this player still has hearts ♥ left?" These derived probabilities are then used by the second part of the AI, which is a giant decision tree that basically contains a set of deterministic rules, for example "if the probability that my partner is void in this suit is higher than x, play this card". This decision tree is then manually tuned with hundreds of tests to ensure that the outcome is somewhat human-like.

The assumptions matrix

As mentioned, the assumptions matrix holds the probabilities that a certain card is held by a certain player. Imagine that we play a card game with 3 players and every player has two cards in their hand, then the assumptions matrix would initially look like

	♥A	♥K	♥Q	♥J
Ross	0.5	0.5	0.5	0.5
Rachel	0.5	0.5	0.5	0.5

We don't have any information about the cards yet, but we know that every card must be held by either Ross or Rachel, so the vertical sum must be 1, or

\sum\limits_i p_{ij} = 1

As there is no reason why a certain card would rather be held by Ross than by Rachel, the probabilities are initially uniform, which means that they are all equal to $\frac{1}{2}$ . For a card game played with 4 players - as all games on Whisthub - this would mean they are all equal to $\frac{1}{3}$ initially, but let's keep it simple in the example with just two opponents.

Note that the assumptions matrix has another interesting property: the horizontal sum must always be equal to the amount of cards $n$ held by each player. Theoretically this is equal to saying that the expected value of the amount of cards held by a player $E\left(n_i\right)$ is the sum of the individual probabilities $p_{ij}$ , or

E(n_i) = \sum\limits_j p_{ij} = n

In our example we know that every player has two cards, so the horizontal sum should always be equal to 2, which you can verify is indeed the case. The fact that we know all vertical and horizontal sums, form the boundary conditions for the assumptions matrix: no matter how the values in the matrix change, they must always obey the horizontal and vertical sum constraints.

While this is all fun and stuff, uniform probabilities aren't really helpful. That's why the assumptions matrix is constantly updated based on what happens in the game. The most obvious example of this, is a card being played. Imagine that in the example above, Ross plays ♥A. We could then update our matrix like

	♥A	♥K	♥Q	♥J
Ross	1	⅓	⅓	⅓
Rachel	0	⅔	⅔	⅔

See what happened? Ross played ♥A, which reveals he indeed held that card, so $p_{\text{Ross},\text{A}} = 1$ . Obviously this also means that Rachel can no longer have ♥A, so this probability becomes 0. We can then also update the rest of the matrix based on our boundary conditions. We learned nothing special about Ross's other cards, so we assume that the probabilities have remained uniform, which can only mean that the probabilities are now $\frac{1}{3}$ instead of $\frac{1}{2}$ because the horizontal sum must still be 2. We can then easily find the remaining probabilities of Rachels cards as $\frac{2}{3}$ with our vertical sums that must still add up to 1 too.

Note that updating the matrix is easy in the example above, but it actually becomes a lot harder for games with 4 players (and hence three opponents). The problem is here that our boundary conditions are not linearly independent, meaning that if we create a set of equations with them, the determinant becomes 0 and we need additional boundary conditions to solve for the unknown probabilities. This was actually one of the hardest problems to solve with the AI, and I ended up solving the assumptions matrix with an iterative approach. Basically this means that we iteratively redistribute the remaining probabilities until the boundary conditions are sufficiently met again.

It's obvious that playing a certain card reveals certain information, but there are other ways too that information can be revealed. In card games, not being able to follow suit often indicates that this player is void in that suit. In that case we can set the probabilities in the matrix for that player to 0 for all cards in that suit, and then run the update algorithm again. You can look at it this way: if the play reveals that a player is void in the suit, then there's a higher probability he will hold cards in other suits, and vice versa the probability is higher that the other players will hold the cards in the voided suit. The update of the assumptions matrix will nicely reflect this.

Shape functions

Another interesting way of revealing information about the cards happens in games with a bidding process, such as Colour Whist. In this game, a player can propose a certain suit, and this often indicates that they have a lot of cards in that suit, and probably high cards too. We can reflect this in the assumptions matrix as well.

To do this, the Whisthub AI uses the concept of shape functions. A shape function basically contains the probability that a certain card of a suit is held by a player based on what they proposed. The shape function of a bid like "Abondance ♥9" can for example look like

This means that if a player bids Abondance ♥9, the AI assumes that the probability of that player having ♥A is about 89%, for ♥K it is 86% and so on. The sum of all these probabilities $\sum{p_{i}}$ is hence the expected value $E\left(n\right)$ of the amount of cards the player holds in that suit.

Wait a minute. The shape function has values for all 13 cards, but what if I hold some of them myself? In that case I know that the opponent cannot have those cards!

Well, the shape functions have to be considered as general shapes which then get fit to what a player can actually have in their hand based on the information we have of our own hand. We've developed a specific algorithm for this, which has the property that it maintains the expected value of the shape function. For example, if the sum of all values of the shape function is 6, then this means we expect this player to hold 6 cards in the given suit, and the fitting algorithm redistributes the probabilities so that this expected value is maintained.

For example, consider that our own hand looks like

and our opponent bids Abondance ♥9, then the shape function gets transformed to

If you look closely, you see that the probabilities for ♥Q and ♥6 get "redistributed" over the other cards so that the expected amount of cards held by the abondance player remains the same. Consequently the probabilities of the other cards all become slightly higher. Note that the redistribution algorithm does not "blindly" redistribute the probabilities by the way: it ensures that any probability can never exceed 1.

The various shape functions that are used in the AI were initially just built up intuitively by myself. Just something I made up which looked reasonable, but not based on anything. However, once I had a few million logs of actual games that were played, I reworked the shape functions to actually be based on what happens in real games. For example, I logged how often someone holds the ♥A when they propose a suit, how often the ♥K and so on. This then becomes the shape function.

In a way you could consider this as training the AI. It isn't really how training works in AI models like ChatGPT, but at least there was now a feedback loop. The best part of this was that once I updated the shape functions to be based on the real life situations - which honestly differed a lot from my initial guesses - the AI started playing noticeably better and more human-like. It's hard to describe what made me notice this, but it was one of the most satisfying feelings I ever had during the development of Whisthub!

The card spreads

While it's all fun and stuff to know the individual probabilities of a card being held by a certain player, this in itself isn't very useful. The real power here comes from the ability to do derived calculations with them. For example, using the assumptions matrix we can calculate the probability that a certain player is void in a suit, or the probability that if you play a certain card, no-one will be able to overtake it anymore.

A lot of these derived probabilities are calculated by looping all possible ways that the remaining cards can be distributed over the other players, which we call the spreads. For example, imagine we know that there are 2 cards of clubs ♣ left. We can model these spreads as 2 balls that have to be distributed over 3 bins

For each of these spreads, we can easily determine if it satisfies the condition we're looking for, for example when we're calculating the probability that all players behind us are void in a suit. Using the probabilities from our assumptions matrix, we can calculate the probability of each spread, and by summing those we get the probability of the condition we're looking for.

Note that in the spreads above, we've made abstraction of the values of the cards. In reality, ♣A| |♣2 could result in a fundamentally different situation than ♣2| |♣A! However, the amount of possible spreads becomes so extreme in those cases that it's simply not feasible anymore. Often we also don't really need it. For example, if we need the probability that the player behind us will be able to overtake our ♦Q with ♦A or ♦K, we don't need the spreads for this, we can just calculate it as the probability that this player does not lack both ♦A and ♦K, or in mathematical terms

p = 1 - \prod\limits_{i} (1 - p_i)

The approach of investigating all spreads and assigning probabilities to them can be considered as a kind of Monte Carlo simulations of limited depth. This is for example a common technique in chess AI programs, but in those cases the simulations look multiple moves ahead, while with the spreads we just look at the current possible situations.

The decision tree

The fact that we now have a way to calculate probabilities for specific situations, allows us to simulate how a human might reason when playing cards. Consider for example the case of drawing trumps in a game of Colour Whist, which means that you lead a trump card a few times to make sure the opponents won't be able to trump future tricks in other suits. Usually you always draw trumps at least twice, but whether you should draw a third time or not could depend on the probability that the opponents still have trump cards.

While easy on paper, the question here is of course what limit you use for the probabilities. At what probability of the opponents still having trumps will you draw a third time? 50%? 75%? 90%? The truth here is that there's no single answer to this - just as no two humans have the same playing style - and this is the responsibility of what I call the decision tree. This decision tree identifies certain game situations and then makes a decision based on a probability and an arbitrary threshold for that probability. At its very core it is indeed just a giant if-else structure.

But still, how do we determine the probability thresholds used by the decision tree? Well, basically just a gut feeling and constantly asking yourself

As a human, what probability makes sense in this situation?

However, remember that the exact values of the thresholds actually don't matter that much. What matters is that the AI plays reasonable and human-like on the outside. What goes on behind the scenes is actually irrelevant.

In order to ensure this, we have implemented hundreds of test cases where it is tested what the AI does in specific situations. Such test cases typically look like

it('leads the highest trump card', function() {

  this.log = `
    1: ♥Q64 ♦AJ106 ♣642 ♠AJ7
    2: ♥K973 ♦K5 ♣1085 ♠Q432
    3: ♥J85 ♦Q7432 ♣AQJ ♠K6
    0: ♥A102 ♦98 ♣K973 ♠10985
    1: Propose ♦
    2: Propose ♠
    3: Accept ♦8
    0: Propose ♣
    2: Pass
    0: Pass
  `;
  let card = this.decide();
  expect(card).to.equal('♦A');

});

The test cases are typically found by manually playing the AI until we notice that it does something strange. If this happens, we create a test for that specific game situation, which then allows us to dive into the decision tree and see where the AI actually makes its decision. Often this means that we have to add a specific condition to the decision tree for that situation, but it also happens that it turns out that the thresholds that we mentioned above need to be finetuned.

Once we've fixed the AI's decision in a specific test case, it's also essential to run all existing test cases again to ensure that the existing behavior did not change due to our fix. In software engineering this is called a regression: we want to avoid that by fixing something in a certain location, nothing breaks in other locations.

Note that this is one of the main benefits of a decision tree: it's super easy to follow the reasoning of the AI and figure out why it makes a certain decision. This is way harder with neural networks, like ChatGPT for example, which essentially act like a black box where it's impossible to figure out why the AI has made a certain decision. This would also make it hard to improve the AI, where the answer is often just to get more - or better - training data. I hence consider the fact that the Whisthub AI is not a black box as a huge advantage.

Another way of finding relevant test cases is by running simulations where the AI plays itself. This is especially relevant for game situations that are rare and hence hard to encounter when playing the AI manually. In that case we temporarily program a condition into the AI that logs the game if we reach a specific situation, and then we can use that log to ensure the AI plays correctly.

Last but not least, simulations are also important to ensure that there are no bugs in the AI, more specifically bugs that would crash the AI. Due to the deterministic nature of the AI, I can easily simulate more than 200 games per second on my own PC, so if you leave it running for a few minutes and it didn't crash after 100,000 games, then we can be pretty sure that the AI is bug-free.

Ambiguous game situations

While the concept of a decision tree is straightforward, the problem is that you will always have game situations where it's not very clear what is the right play. Adding test cases and tuning thresholds doesn't help us either because by definition ambiguous situations mean that not all humans would play the same card.

As for testing these ambiguous situations, it means that we don't test what the next move should be, but rather ensure that the next move is not a blunder. That way it allows that changes to the AI may alter its decision in ambiguous situations, but at least we make sure that the AI won't suddenly decide to make a move that would actually be considered a blunder!

However, in games like Barbu King and Hearts we've developed a specific algorithm that is used in those ambiquous situations. Both Barbu King and Hearts are negative games, meaning that you have to avoid taking penalty points. If we're not sure what card to play, the AI will calculate the expected penalties for every card that can be played and then try to minimize this amount.

For example, consider the following situation in Hearts:

Imagine we have both ♥3 and ♥K, and ♠Q is still left, then we have two options in this situation:

We can throw ♥3 now and avoid the penalties already in the trick, but risk to get hit with the Queen of Spades ♠Q and her 13 corresponding penalties later on due to keeping ♥K.
We can throw ♥K now and take the 4 penalties, but keep our safe ♥3 for later, which means we can't be forced to take ♠Q on a trick in hearts ♥.

The AI then estimates the expected penalties for both cases - taking into account the information from the assumptions matrix - and then picks the option that results in the least amount of expected penalties. Note that this is actually the same reasoning that a human would do as well: try to assess what is the best option in the long run.

While the approach sounds simple, implementing an algorithm for the expected penalties was far from easy. It makes heavy use of the assumptions matrix and the spreads, and it also makes a lot of simplifications to not end up having to examine thousands of possibilities. Testing this is also often done by checking that the AI does not make a blunder, rather than ensuring the AI plays a specific card.

The AI's memory

While the AI is definitely not able to play at a human level, it is still better than any human in one aspect: the AI has a perfect memory. In card games it is often beneficial to know at all times what cards are still left in the game. For example, you often don't want to waste a trump card if your partner has played the highest card left in a suit. The problem is that, as a human, it's natural to forget what cards have already been played so you end up trumping unnecessarily.

The AI will never make this kind of mistakes because it knows exactly what cards are still left in the game - though remember that it does not cheat and can only see its own hand! You could argue that we could make the AI play more like a human by giving it an imperfect memory instead.

However, we explicitly decided to not do this. Not only does it needlessly complicate the logic of the AI, users typically don't like playing with AI players, so if the AI players would make mistakes by forgetting which cards have been played, this can be extremely frustrating if you get teamed up with an AI, especially in a tournament. The AI is far from perfect, so it makes no sense to make it intentionally play even worse. Computers have perfect memory, that's their strong point, and they're allowed to use it.

Conclusion

I don't know if this is how you expected the AI to work or not. The AI on Whisthub is fundamentally different from how modern AI systems like ChatGPT work, and in a way can be considered not to be an AI at all as it's based on a deterministic set of rules.

It is tempting to think that the AI could be improved if it uses a neural network instead of a deterministic decision tree. However, one must not forget that due to being deterministic, the AI on Whisthub is lean and extremely fast. If we were to train a neural network for all the available card games, the size of the model would easily be a few megabytes - or maybe even gigabytes - while the size of the AI is currently just a few kilobytes.

Running neural networks also uses a lot of computational resources, so it would require us to run a separate server - probably multiple ones - just for the AI alone. This is not worth it, especially because the focus of Whisthub lies on multiplayer games against other humans, even though it's sometimes impossible to avoid AI players, for example in tournaments.

In the meantime I have collected a few million game logs of various card games, so training a neural network is definitely possible, but just for fun or educational purposes. I'm no expert in the field, so for the moment I don't feel like investing my - limited - time in it. Still, if you're doing a PhD in AI or something, I'm open for a collaboration, it could be a fun project!