Creating an AI Football Scout
AI is taking over sports.
Analytics, fan experiences, health, refereeing, and more are having AI implemented into them and are replacing jobs, improving processes, and more.
Scouting is one of the key parts of running a successful club and it can result in the club getting great players that will positively impact the team.
Or it can result in a massive misplacement of funds on players that either weren’t a good fit for the team or had been overhyped by a recent performance of some sort.
It doesn’t just have a sporting impact but can have a huge financial impact as well so getting picks right as a scout is critical.
To see if this AI scout idea I had would be plausible, I needed to figure out
how do I build it
can it give me good enough results that I would actually use it if I were working for a club
How I built it
With all of these large language models, you can essentially take two routes
The first option is an open-source model like Meta’s llama.
With this route you have:
more customization
can be cheaper
more complex
The other option is you use prompt engineering on prebuilt models such as GPT-4, which is what powers the all-mighty ChatGPT.
doesn’t offer as much customization
more expensive
the time to build and test is much faster
So for time’s sake and because I didn’t see myself spending a lot of money to just test and validate this, I decided to just spend a little bit of money and use OpenAI’s GPT4-Model
I headed over to OpenAI’s site, purchased my credits, and I was ready to go.
But wait!
One of the big issues of GPT models is that they only have information up to a specific date.
As of right now that cutoff is December 2023.
Which means that it really doesn’t have any data to go off of and the data it does go off of will probably be outdated.
I’m looking to create a scout that allows us to be up-to-date and see if we can really replace scouting with AI.
We need current data to give it!
I researched a couple of different ways to achieve this and the solution I came up with is to
Scrape up-to-date data
Inject that data into my prompt
The downside is this will cost me a bit more to inject the data into it, but it will be more accurate so it’s a trade-off I was happy to make.
To get the data, we’ll use fbref.com which has summary statistics that we can use to evaluate a player.
So at this point I now have
My model picked out
My data source
And the fun part now comes with the prompt engineering and using these models to get a scouting report.
Prompt engineering is a very iterative process to try and fine-tune your prompt to give you the expected results.
These prompts can be very sensitive so it’s a lot of back and forth with evaluating a prompt to make sure you aren’t telling it one thing and it’s doing another.
I essentially needed to set it up in a way that would take in the data that I would scrape and then return in the same format every time.
So the full flow now looks like:
scrape data
craft the prompt and include the data we scraped
plug that prompt into our model
then take the output of that response to create a markdown file which would be considered our “deliverable”
Evaluating the results
now that we have that flow set up let’s look at some of the results for 2 different players
Lamine Yamal
Nathan Broadhead (https://fbref.com/en/players/43309491/Nathan-Broadhead)
AI in the future of football scouting
AI taking over the sport is a sure thing to come, but I like to think that a lot of jobs and things in sports won’t be replaced by AI but rather by a person who uses AI to their advantage.
The model did better than I thought it was going to do.
It was able to take simple data and create a somewhat reasonable report with good insight into what type of system a player would fit.
Though it wasn’t perfect.
To improve this model we would want to build in more variables such as
market value
physical data
normalize stats for age
add in event data such as shot locations
and other data points that would help us account for these different things
Where I see this being beneficial to current scouting is helping to sift through a lot of data.
Say every three months you use an AI bot to help me sift through hundreds of players that you are looking for as a scout.
This will help narrow down results much faster than doing it manually, as well as save a lot of costs and money on travel, seeing players you’re not interested in or wouldn’t be a good fit, etc.
That’s it for this week!
Let me know what you think about AI in scouting. Does it seem like something that will start to be used by scouts? Or is it all just hype?
McKay