Monday, September 12, 2016

Game AI (part 1): To exploit or not

Rock, Paper, Scissors.  Imagine we were building the AI for that game, how would we do it?  It seems simple at first: randomise the decision and we can't be beaten, right?  That's true, statistically.  Any single game against a computer playing this way could go either way, but over thousands of games we will win exactly the same percentage as the computer; it's essentially a no-win decision.  It is also an unexploitable AI decision.  No matter how clever we are, we can never find a strategy that does better than break even in the long run.  This is not fun for humans though, we like to be able to beat the AI, so how do we dumb down this decision just a little bit?
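A minimal sketch of that unexploitable AI in Python (nothing but the standard library):

import random

MOVES = ["rock", "paper", "scissors"]

def unexploitable_ai():
    # Ignore everything the player has ever done; a uniformly random
    # choice can never be exploited, but in the long run it wins,
    # loses and draws equally often no matter what we try against it.
    return random.choice(MOVES)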

Let's look first at making the decision a whole lot smarter.  Think about this.  After playing this rock, paper, scissors game for 10 minutes, we come to the conclusion it's completely random, so we start choosing paper every single time, because we just don't care, or perhaps we are trying to work out if it really is just random.  This is no better or worse for the player, because the AI is indeed random.  Our unexploitable AI decision now seems very dumb though, because it should be choosing scissors every single time and winning.  If you were playing against a human, there is no way the human would still randomly choose rock after we had chosen paper 5 times in a row.  So the AI in this case is unexploitable, and completely non-human like.  That can't be good, can it?  So let's add a bit of logic: if the human player chooses paper 4 times in a row, choose scissors, else choose random.  For argument's sake, let's say it's coded generally: any choice made 4 times in a row, and on the 5th attempt we play the move that beats it.  The AI just got a little bit more human, and at the exact same time, it is now exploitable.  Once the human works out the AI decision making process, they can build a strategy that exploits the computer.  Once I'd worked it out, I would choose rock 4 times, then on the 5th choose scissors (the AI plays paper expecting rock, and scissors beats paper).  Then scissors 3 more times, then paper, and so on, until we are splitting 4/5ths of the time and winning 100% of every 5th turn.  In this game, that's a pretty decent level of success.
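Something along these lines (a rough sketch, assuming the AI can see the full history of the player's choices):

import random

MOVES = ["rock", "paper", "scissors"]
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def streak_ai(player_history):
    # If the player has made the same choice 4 times in a row, assume
    # they'll do it again and play the move that beats it; otherwise
    # stay random.  This one rule is exactly what makes us exploitable.
    if len(player_history) >= 4 and len(set(player_history[-4:])) == 1:
        return COUNTER[player_history[-1]]
    return random.choice(MOVES)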

Again, suppose the human decides instead to just alternate in the exact same order: rock, paper, scissors, rock, paper, etc.  Another human player would pretty quickly work this out and start exploiting it, but how does a computer do that?  Most likely it has to recognise it as a pattern, which we effectively have to program in.  So we program it in: if they chose rock, then paper, then scissors, we'll choose paper next, to thwart their next decision, which is probably rock.  Next the human instead keeps cycling scissors, paper, rock (i.e. the reverse).  Now the computer should be able to exploit that too.  We could end up with a lot of code, just trying to be more human.  And every bit of logic we add in creates another part we can exploit.  To go much further down this route would mean recognising a human pattern, exploiting it, then realising that very exploit can be exploited right back.  The code gets really hairy here.
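Hard-coding just those two cycles already looks something like this (a sketch; every new pattern means another entry in the table, and another thing the player can turn against us):

import random

MOVES = ["rock", "paper", "scissors"]
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

# What we predict the player will throw next, given their last three moves.
KNOWN_CYCLES = {
    ("rock", "paper", "scissors"): "rock",      # forward cycle comes back to rock
    ("scissors", "paper", "rock"): "scissors",  # reverse cycle comes back to scissors
}

def hardcoded_pattern_ai(player_history):
    last_three = tuple(player_history[-3:])
    if last_three in KNOWN_CYCLES:
        return COUNTER[KNOWN_CYCLES[last_three]]
    return random.choice(MOVES)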

To go to the nth degree on this, look at this web page:
http://www.nytimes.com/interactive/science/rock-paper-scissors.html?_r=0

In theory this is the machine learning approach to beating us.  When a player does X, and we do Y, our database, sampled over thousands of users, says their next move is most often Z.  There could probably be a whole book written on using this data to make a great machine-learning rock, paper, scissors player, and the machine learning itself could be extremely simple or complex depending on who is driving it.  My read on the one used here is that it basically looks at the last 5 decisions made by both players, matches that against the patterns in its database of all the other users it has played before, and works out what your next decision will be by weight of the most common next human decision.  This strategy is creepily good, because humans are actually pretty predictable when playing this "random" game.  Yes, we are trying to exploit it, but then so are most humans, and we get caught trying the same exploits everyone else tried.  To be honest, with a decent number of games and opponents behind it, this AI is very human like, and a very smart human at that... however, it's still exploitable, because it goes into every game with a set strategy.
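My guess at the core of it, as a sketch (the history length of 5 and the idea of keying on both players' recent moves are my assumptions, not a description of the NYT code; the real thing would be seeded from thousands of earlier users, whereas this "database" only fills up as we play):

import random
from collections import defaultdict

MOVES = ["rock", "paper", "scissors"]
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class PatternDatabaseAI:
    # Count how often each recent history of (player, ai) moves was
    # followed by each player move, then counter the most likely one.
    def __init__(self, history_length=5):
        self.history_length = history_length
        self.counts = defaultdict(lambda: defaultdict(int))
        self.recent = []  # (player_move, ai_move) pairs from this game

    def choose(self):
        key = tuple(self.recent[-self.history_length:])
        seen = self.counts.get(key)
        if seen:
            predicted = max(seen, key=seen.get)  # most common next human move
            return COUNTER[predicted]
        return random.choice(MOVES)

    def record(self, player_move, ai_move):
        key = tuple(self.recent[-self.history_length:])
        self.counts[key][player_move] += 1
        self.recent.append((player_move, ai_move))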

Finally, this approach, though very awesome, still says nothing about me personally.  Forget rock, paper, scissors for one second, and think about a generic FPS game.  The programmers have gone to great lengths and machine-learned everything, right down to how fast humans usually respond.  Let's say that's 0.3 seconds, so the AI makes its decision at 0.35s, giving the average human a slight edge.  This is great, but if a particular human happens to be more like 0.4s, they will get slaughtered and likely stop playing.  Likewise, the better players out there are well under 0.2s and they think this game is far too easy.  This seems an ideal time to think about either editable difficulty settings, or perhaps going down the rubber band route: allow the AI to adjust its settings based on previous history.  Of course, the major issue with rubber banding is that the rubber band can be manipulated.  For example, if I play 3 games in a row and deliberately react 2s after every shot, the AI will have a lot of evidence that I am on the extreme side of slow, and will treat me accordingly, how could it not?  So when I go to set a high score or show my friend how I crush this game, I go back to my real 0.3s and suddenly I'm winning with amazing ease.  Yes, this theory is exploitable too.  Adjusting the rubber band much more quickly on evidence of suddenly improved times would make it a lot more human.
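As a sketch of that idea (the handicap and the adjustment rates here are made-up illustrative numbers, not tuned values):

class RubberBandAI:
    # Drift the AI's reaction time toward the player's observed reaction
    # time plus a small handicap, so the player keeps a slight edge.
    def __init__(self, reaction_time=0.35, handicap=0.05):
        self.reaction_time = reaction_time
        self.handicap = handicap

    def observe(self, player_reaction_time):
        target = player_reaction_time + self.handicap
        # React quickly when the player speeds up, slowly when they slow
        # down, so sandbagging a few games pays off less.
        rate = 0.5 if target < self.reaction_time else 0.1
        self.reaction_time += rate * (target - self.reaction_time)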

I guess it comes down to this: the first thing you have to admit with game AI is that it will in some way be exploitable (or it will become your life's work).  Once you accept that fact, you can make sure you control exactly how that exploit works and how someone gets to beat your AI.  I'm planning to have a little fun with exploiting in my game.
