Machine learning – challenging the algorithm of the lottery

Machine learning in action

Predicting the winning lottery results would be pretty super. You’d make yourself rich countless times and could well surpass Jeff Bezos as the richest person in the world. Currently, over £17 billion has been handed out in jackpot winnings alone.

That figure would be much higher if we calculated the total amount of money won in all the prize brackets. I’m going to attempt the impossible and use machine learning to predict this week’s winning lottery numbers. Wish me luck, but first…


What exactly is machine learning?

The terms machine learning and AI have appeared in the headlines a lot recently, and those in the SEO world have been pontificating about what AI's impact on search could be. Machine learning is just one subcategory of Artificial Intelligence. Today I'm going to give a brief overview of some of the methods involved in machine learning and the types of algorithms used to make predictions.

If you're new to the field, I'd recommend this book, which you can get from Amazon.

Deep learning

Deep Learning is a subcategory of Machine Learning that has expanded rapidly in recent years. It uses artificial neural networks, which loosely mimic the way neurons in the human brain process information. It's usually coupled with Big Data and requires huge amounts of computational power to train properly.

We're still a very long way from matching the brain's ability to process lots of information at the same time, but Deep Learning is already very good at doing a single task very, very well, such as predicting outcomes from a dataset.
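To make the "network of neurons" idea a little more concrete, here's a minimal sketch of a forward pass through a tiny neural network in plain Python. The weights are made up rather than learned, and the function names are hypothetical; real deep learning stacks many more layers and learns the weights from data via backpropagation.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1), like a neuron's 'firing' strength."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """One layer of artificial neurons: weighted sum of the inputs plus a bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def tiny_network(inputs):
    """2 inputs -> 2 hidden neurons -> 1 output, using illustrative (untrained) weights."""
    hidden = dense_layer(inputs, weights=[[0.5, -0.4], [0.3, 0.8]], biases=[0.0, -0.1])
    return dense_layer(hidden, weights=[[1.2, -0.7]], biases=[0.05])[0]


print(tiny_network([1.0, 0.5]))  # a single score between 0 and 1
```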

IBM has created IBM Watson, a cloud-based supercomputer used by institutions such as global banks (70% of them actually use IBM Watson).

Reinforcement learning

Reinforcement learning is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
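As an illustration of an agent learning to maximise cumulative reward, here's a minimal sketch of a classic toy problem, the multi-armed bandit, solved with an epsilon-greedy agent. The payout probabilities, the epsilon value and the function name are illustrative assumptions, not anything taken from the lottery.

```python
import random


def epsilon_greedy_bandit(true_payouts, steps=1000, epsilon=0.1, seed=0):
    """Learn which 'arm' pays best by trial and error, maximising cumulative reward."""
    rng = random.Random(seed)
    n_arms = len(true_payouts)
    counts = [0] * n_arms        # how many times each arm has been pulled
    estimates = [0.0] * n_arms   # running average reward observed for each arm
    total_reward = 0.0
    for _ in range(steps):
        # Explore a random arm with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # The environment pays 1 with the arm's hidden probability, otherwise 0.
        reward = 1.0 if rng.random() < true_payouts[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.1, 0.8, 0.3], steps=5000)
print(counts, total)
```

The epsilon parameter is the trade-off at the heart of reinforcement learning: too little exploration and the agent can lock onto a poor arm, too much and it wastes pulls it could have spent on the best one.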

Unsupervised learning

Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data. As a result, unsupervised learning algorithms must first self-discover any naturally occurring patterns in that training data set.
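To show what "self-discovering patterns" can look like, here's a minimal sketch of k-means clustering, one common unsupervised algorithm (my choice of example, not something used in this project), on a plain list of numbers. The function name and the rule for picking starting centroids are assumptions made for the sketch.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group unlabelled numbers into k clusters using Lloyd's k-means algorithm."""
    # Start the centroids at evenly spaced positions in the sorted data.
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters


print(kmeans_1d([1, 2, 3, 10, 11, 12], k=2))  # ([2.0, 11.0], [[1, 2, 3], [10, 11, 12]])
```

No labels are given anywhere; the two groups emerge purely from how the numbers are spread out.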

Supervised learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.
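A very small example of "learning a function from labelled input-output pairs" is a nearest-neighbour classifier. The sketch below is illustrative only: the function name and the toy labels are hypothetical.

```python
def nearest_neighbour_classifier(training_pairs):
    """Return a function mapping an input to the label of the closest labelled example."""
    def predict(x):
        # Find the training example whose input is nearest to x and reuse its label.
        _, closest_label = min(training_pairs, key=lambda pair: abs(pair[0] - x))
        return closest_label
    return predict


predict = nearest_neighbour_classifier([(1, "low"), (2, "low"), (9, "high"), (10, "high")])
print(predict(3), predict(8))  # low high
```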


In this example, I'm going to use Linear Regression, which is a type of Supervised learning, because I'm presenting labelled data in the form of a DataFrame.

Getting the data

Getting and cleaning the data was the hardest part of this whole project. Unfortunately, there wasn't one nice, tidy CSV file that I could download. The National Lottery archives its results by year, with one page for every year since it started in 1994. That's 27 pages of line after line of cute numbers in balls. Copying and pasting wasn't a practical option, especially not 27 times.

I opened up my Jupyter Notebook and wrote a really simple loop that changed the URL for each year, grabbed that page's HTML tables and added them to one nice long list. A second loop then turned each element of the list into a handy data structure called a DataFrame, and I merged them all together.
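Here's a minimal sketch of that two-loop approach using pandas. The archive's real URL isn't given in this post, so `URL_TEMPLATE` is a placeholder, and `fetch_year` and `combine_years` are hypothetical helper names. `pd.read_html` also needs `lxml` (or BeautifulSoup with html5lib) installed.

```python
import pandas as pd

# Hypothetical URL pattern: the real archive address isn't given in this post.
URL_TEMPLATE = "https://example.com/lotto/results/{year}"


def fetch_year(year):
    """Download every HTML table on one year's archive page."""
    # pd.read_html returns a list of DataFrames, one per <table> found on the page.
    return pd.read_html(URL_TEMPLATE.format(year=year))


def combine_years(tables_by_year):
    """Stack each year's results table into one long DataFrame, tagging rows with the year."""
    frames = []
    for year, table in tables_by_year.items():
        frame = table.copy()
        frame["Year"] = year
        frames.append(frame)
    # ignore_index gives the merged DataFrame a clean 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Usage (not run here): one request per year from 1994 to 2020, assuming the
# results are the first table on each page.
# tables = {year: fetch_year(year)[0] for year in range(1994, 2021)}
# results = combine_years(tables)
```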

After removing the £ signs, I converted everything into numbers and split each ball into its own column. Then I was ready to start some Machine Learning.
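A sketch of that cleaning step, assuming the scraped table has a `Jackpot` column formatted like `£1,234,567` and a `Balls` column holding the drawn numbers separated by spaces. Both column names, the ball format and the `clean_results` helper are assumptions, since the scraped layout isn't shown here.

```python
import pandas as pd


def clean_results(raw, balls_column="Balls", jackpot_column="Jackpot"):
    """Turn '£'-formatted jackpots into numbers and split the drawn balls into columns."""
    df = raw.copy()
    # Strip the pound sign and thousands separators, then convert to a number.
    df[jackpot_column] = pd.to_numeric(
        df[jackpot_column]
        .astype(str)
        .str.replace("£", "", regex=False)
        .str.replace(",", "", regex=False)
    )
    # Split e.g. "3 14 27 38 45 52 7" into one integer column per ball, kept in draw order.
    balls = df[balls_column].str.split(expand=True).astype(int)
    balls.columns = [f"Ball {i + 1}" for i in range(balls.shape[1])]
    return pd.concat([df.drop(columns=balls_column), balls], axis=1)
```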

Exploratory data analysis (EDA)

Checking frequency is always a good idea, so I took a look at how often each ball had been drawn, and I noticed a few things straight away:

  • 50-59 have a much lower frequency count because Camelot only introduced them in 2015, to make winning more difficult and to increase the prize money with multiple rollover prize pots.
  • Unlucky number 13 has been the least drawn ball, with 298 appearances. That's about 27% less often than the most frequently drawn ball, number 38, which has been drawn 408 times.
  • Otherwise, there isn't much deviation in frequency across the numbers.
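Counting frequencies like this takes only a couple of lines once each ball has its own column. The sketch below assumes a DataFrame with one column per ball (the column names are hypothetical) and also shows how the "percent less frequent" figure is worked out.

```python
import pandas as pd


def ball_frequencies(results, ball_columns):
    """Count how many times each ball number appears across all the ball columns."""
    # melt() flattens the ball columns into one long 'value' column of numbers.
    return results[ball_columns].melt()["value"].value_counts().sort_index()


def percent_less_frequent(rare_count, common_count):
    """How much less often the rarer ball was drawn, as a percentage of the commoner one."""
    return (common_count - rare_count) / common_count * 100


print(round(percent_less_frequent(298, 408), 1))  # 27.0
```

With 298 draws for ball 13 and 408 for ball 38, the gap is 110 draws, or just under 27% of 408.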

Linear regression

I used a basic Linear Regression model to predict each ball drawn, using the jackpot as my x value so I could predict the number based on the amount of money at stake.

As I mentioned in the Getting the data section, I created a separate column for each ball and kept the values in the order they were drawn rather than sorting them. I then wrote another loop to go through the jackpots and the ball drawn in each position on each date. After 7 loops (one per ball position), I had a predicted number for each position. Check out my neat infographic below for a few more details on each ball.
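The per-position loop described above can be sketched roughly as follows. I'm assuming a cleaned DataFrame with a numeric `Jackpot` column and one column per ball position; the column names, the function names and the use of `numpy.polyfit` (rather than, say, scikit-learn's `LinearRegression`) are assumptions for the sketch, not necessarily what the notebook used.

```python
import numpy as np
import pandas as pd


def fit_ball_models(results, ball_columns, x_column="Jackpot"):
    """Fit one straight line per ball position: ball = slope * jackpot + intercept."""
    models = {}
    for column in ball_columns:
        # np.polyfit with degree 1 is an ordinary least-squares linear regression.
        slope, intercept = np.polyfit(results[x_column], results[column], 1)
        models[column] = (slope, intercept)
    return models


def predict_balls(models, jackpot):
    """Predict each ball for a given jackpot, rounded and clipped to the valid 1-59 range."""
    return {
        column: int(np.clip(round(slope * jackpot + intercept), 1, 59))
        for column, (slope, intercept) in models.items()
    }
```

If jackpot size and ball number are unrelated, as you'd expect from a fair draw, the fitted slopes come out close to zero and each prediction is essentially the average ball for that position.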

Putting my money where my mouth is

Overall, I don't think there's enough data to use machine learning to accurately predict the winning lottery numbers. There are over 45 million possible combinations of six numbers from 1 to 59, and we'd need about 20 million draws to get something that would be 80% correct. But just in case I'm wrong, or these numbers come up by chance, I've decided to put my money where my mouth is and try it out.
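The "over 45 million" figure is just the number of ways to choose 6 balls from 59 when order doesn't matter, which Python's standard library can compute directly. The helper name below is hypothetical.

```python
from math import comb


def jackpot_odds(pool=59, picks=6):
    """Number of distinct ways to pick `picks` balls from `pool`, ignoring order."""
    return comb(pool, picks)


combinations = jackpot_odds()
print(f"{combinations:,}")  # 45,057,474
print(f"Chance of matching all six with one ticket: 1 in {combinations:,}")
```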

EDIT: Unfortunately, an emergency came up before I could post the finished blog, and I didn't win anything. A few numbers were close, but no dice.


Check out one of my other posts. The one on Benford's Law is particularly interesting, if I do say so myself.
