How to create a self taught vehicle

In this post, I want to talk about the AI created for Fuel Renegades, a game developed by a group of students at ESAT.


The beginning

The basic idea of the AI is to simulate the behaviour of a real player: movement, overtakes, fast laps, circuit interactions, etc. So I started simulating the movement of the vehicle calculating the inputs that the vehicle received.

The inputs are:

  • Throttle
  • Brake / Going backwards
  • Turn
  • Drift
  • Interact with elements of the circuits (e.g. boost pads)
  • Vehicle movement

The circuits are based on a spline that will be the centre of the road and is made with the condition that it is not necessary to drift to finish a clean lap, so drifting will not be a priority. Also, going backwards will not be necessary and unless we need to avoid an obstacle we will always speed up as much as possible. Therefore, our biggest problem is the steering.

To start I put an invisible ball a little forward of the position of the vehicle in the spline and have the vehicle follow it. We will talk more in depth about this later.

As simple as that, now we have two points with which to calculate the necessary turning input, the ball and the vehicle. With just a scalar product we get the angle between our position and the point where the vehicle wants to go (the invisible ball) and that angle will be converted to an input value with the turning range of the vehicle. How is calculated that range? I would like to say that I used a mathematical way to calculate it but it was just trying and fail until it feels realistic.

The road


Like Formula 1 races split their circuits into different sectors to time which one is faster, we divided our road (I mean spline really) but in much smaller sectors. How small? Small enough to keep the track of all the necessary steering changes due to curves but big enough to avoid advance more than one sector in the same frame.


So, the question now is, what will be a good distance? Well, I made some tests measuring the distance of the vehicle at maximum speed and turning it the maximum it can but I realize that the maximum speed could change in the process (and it does) so all the measures will be invalid and at the end, all my measures were tried and failed. So basically, I made some tests in a circuit with closed curves until the vehicle could be able to update its trace correctly and feel realistic.

Although all this work was good, was just the beginning. We only had a bunch of vehicles following the centre of the road, not trying to take advantage of faster traces or trying to overtake each other. Therefore, I decided to split the width of the road equidistantly. This way I was able to find better traces and have an alternative route in case of overtaking.


To store all this data I used a hash table, the key was the relationship between the position of the sector in the spline with the total length (if the sector is at 100 units and the spline is 1000 units, 100/1000 = 0.1). The value was a struct that stored an id of the checkpoint, its position, and another two tables storing the time of the trace and the number of checkpoints that the trace has passed (in total) with one row for each checkpoint of the next section.


Why is this really interesting? Well, you have to know that the divisions are symmetrical and if there are no divisions, there is only one checkpoint, the spline, so its id will be 0.5. With 1 division (per side) you have 3 checkpoints with ids 0, 0.5 and 1. Next division, 5 checkpoints, 0, 0.25, 0.5, 0.75, 1, and so on… So, in each division, you add new checkpoints but without losing the previous information, therefore you can start with a naive spline that the vehicle will “learn” in one lap and add new divisions until you have covered all the road. This way you have faster results and it will be easier to penalize wrong traces when training.


At this point, we have all the necessities to start learning where are the best paths. The process is simple, the vehicle has to go from checkpoint to checkpoint of different sectors evaluating all the possible combinations and penalizing those paths that take the vehicle off the road. The key here is to penalize them instead of discarding them right away. Why? Because the vehicle almost never goes by the exact position of the checkpoints, they are just reference points for the turning calculation. So, the position of the vehicle when that checkpoint has to be evaluated is highly dependent on the previous checkpoints. For example:


As you can see in the picture, the previous path of the vehicle makes it go away because it doesn’t have enough space to turn. To penalize the path I used the time passed as a weight value, increasing the current time stored in if the trace is not valid but storing the time passed since the beginning of the lap until that checkpoint if the lap is completed. Also, the number of checkpoints passed is stored in each checkpoint of the trace (only it’s updated if the number is higher than the previous value).

Let’s look a similar case:


The final combination of checkpoints is the same but not all the previous trace. Now the vehicle has enough space to turn and continue the trace. That is why we should evaluate each combination more than once.

Of course, there is a limit, as much for start evaluating a checkpoint or avoid it, I have to configure some thresholds: one to consider that the checkpoint has enough information to tag it as “learned” and another to tag that checkpoint is invalid. Therefore, we keep trying the same checkpoint in the division until we have learned it and continue the next checkpoint in the same division.

What happens when all the checkpoints in the division are learned? I keep the evaluation of choosing one random checkpoint. If all are good paths they will keep between right values. And what happens if all the checkpoints in the division are invalid paths? Well, this is highly probable, especially in the first divisions of the lap, so I reset its values to the half of the range between learned and invalid checkpoint. If the checkpoint is really invalid, it will be above the threshold right away, if not it will correct its value slowly. As you will notice, the difference between the thresholds has to be high enough to not be evaluating the same division all the time.


The major problems that I had to face were the table lookups, so I try to optimize those ones that were hot spots mainly but of course, there are other parts that can be check and be improved. The game was running at 60 fps so I left the AI optimization and move on to more necessary tasks.


This was my first approach to machine learning and I know there are still some things that can be improved and do better but in general, I am very satisfied with the result. Any questions, advice and ideas, I would be very glad to hear them.

Thanks for reading.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Create a website or blog at

%d bloggers like this: