Let’s Discuss GoogleBot and ‘Crawl Budgets’…
If you’re clued up on search engine optimisation and on search engine marketing in general, you’ve probably heard of GoogleBot.
If you’re not, though, let me give you the rundown.
What is GoogleBot?
Essentially, GoogleBot is the name of Google’s web crawler. It is the tool that simulates a user on both desktop and mobile devices and ‘crawls’ the pages that make up the web. Websites are crawled by both ‘simulations’, Googlebot Desktop and Googlebot Smartphone; however, both obey the same user agent token in robots.txt files, so a robots.txt rule cannot target one without the other.
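To make the robots.txt point concrete, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt content, the example.com URLs and the `googlebot_allowed` helper are all made up for illustration, and Python’s parser only approximates how Google itself interprets robots.txt. The idea is simply that rules are written against the single `Googlebot` token, which covers the desktop and smartphone crawlers alike.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() accepts the file's lines, so no network request is needed here.
parser.parse(ROBOTS_TXT.splitlines())


def googlebot_allowed(url):
    """Return True if the 'Googlebot' token may fetch the URL.

    Googlebot Desktop and Googlebot Smartphone share this token,
    so one answer applies to both crawlers.
    """
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post",
                "https://example.com/private/page"):
        print(url, googlebot_allowed(url))
```

In this made-up file, the blog URL is allowed and the /private/ URL is blocked for GoogleBot, while every other crawler falls under the catch-all group and is blocked everywhere.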
And practically ever since GoogleBot and Google’s website crawling have been a thing, search engine marketers have discussed ‘crawl budgets’, meaning a limit on the number of pages Google will or can crawl on your website.
Luckily, Google’s John Mueller has recently commented on this subject.
John Mueller explains GoogleBot crawls
During a recent Google ‘SEO Office Hours’ session, John Mueller was asked:
‘Why does Google not crawl enough web pages?’
In short, because the web is a big place (and that is an understatement), Google understandably aims to index only the higher-quality web pages it would be happy to send its users to, and declines to index pages it deems too low in quality.
And, according to Google’s developer page for huge websites:
‘The amount of time and resources that Google devotes to crawling a site is commonly called the site’s crawl budget.
Note that not everything crawled on your site will necessarily be indexed; each page must be evaluated, consolidated, and assessed to determine whether it will be indexed after it has been crawled.
Crawl budget is determined by two main elements: crawl capacity limit and crawl demand.’
A follow-up question was then posed:
‘Do you have any other advice for getting insight into the current crawling budget?
Just because I feel like we’ve really been trying to make improvements but haven’t seen a jump in pages per day crawled.’
After establishing how large the site in question was (hundreds of thousands of pages, with only around 2,000 pages per day being crawled), John Mueller answered:
‘So in practice, I see two main reasons why that happens.
On the one hand if the server is significantly slow, which is… the response time, I think you see that in the crawl stats report as well.
That’s one area where if… like if I had to give you a number, I’d say aim for something below 300, 400 milliseconds, something like that on average.
Because that allows us to crawl pretty much as much as we need.
It’s not the same as the page speed kind of thing.
So that’s… one thing to watch out for.’
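To put those figures in context, here is a rough sketch in Python. The response-time samples and the 300,000-page site size are hypothetical, as are the function names. Only the 400 millisecond ceiling and the 2,000 pages per day come from the conversation above. In practice, the average response time would come from the crawl stats report that Mueller mentions, or from your own server logs.

```python
import math
from statistics import mean

# Upper end of the "300, 400 milliseconds" average quoted above.
TARGET_AVG_MS = 400


def average_response_ms(samples_ms):
    """Average server response time, in milliseconds."""
    return mean(samples_ms)


def is_within_target(samples_ms, target_ms=TARGET_AVG_MS):
    """True if the average response time is below the target."""
    return average_response_ms(samples_ms) < target_ms


def days_for_full_crawl(total_pages, pages_per_day):
    """Days needed to crawl every page once at a steady daily rate."""
    return math.ceil(total_pages / pages_per_day)


if __name__ == "__main__":
    samples = [220, 310, 450, 280, 390]  # hypothetical measurements (ms)
    print("average ms:", average_response_ms(samples))
    print("within target:", is_within_target(samples))
    # 300,000 pages is an assumed size; the source only says
    # "hundreds of thousands". 2,000 per day is the quoted crawl rate.
    print("days for one full pass:", days_for_full_crawl(300_000, 2_000))
```

With these assumed numbers, a single pass over the whole site would take about 150 days, which helps explain why the site owner was hoping to see a jump in pages crawled per day.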
Alongside this answer, John Mueller stated that site quality can significantly affect the number of web pages that get crawled by GoogleBot:
‘The other big reason why we don’t crawl a lot from websites is because we’re not convinced about the quality overall.’
So, in conclusion, there is indeed such a thing as a crawl budget, and Google has given its reasons for limiting how much it crawls: chiefly slow server response times and doubts about a site’s overall quality.