Everything we know about the massive Google algorithm leak

On May 27th, a leaked document containing 2,500 pages of API documentation from Google’s internal team sent the SEO world into an utter frenzy. The Google algorithm leak provides some fascinating insights into the data points the tech giant may use to calculate its rankings, while also disproving bold claims from prominent Googlers that certain signals, such as clicks and domain authority, play no part in a site’s position in the SERPs.

Before we take a closer look, I have to make a quick disclaimer that the leaked Google document does not include the weightings of the different elements of its algorithm. It also isn’t entirely certain which data attributes are actually used in ranking calculations. The API documentation is simply a repository of all the data points at a Googler’s disposal for use in a project, so some may only have been used in testing, or perhaps never used at all. However, considering there are notes in the document flagging attributes that should be ignored, there’s a good chance that the remaining attributes were in use when the documentation became public on March 27th.

Either way, knowing all the tricks that Google has at its disposal makes it a lot easier for us SEOs to understand and prepare for any changes in the algorithm.

In this blog, we’ll take you through the key takeaways, dramas, and consequences of the Google algorithm leak. If by the end of this post, you still have questions or any concerns, feel free to get in touch with a member of our expert team.

Contact Us

How did the Google leak happen?

Let’s start at the beginning.

On May 5th, Rand Fishkin (co-founder of SparkToro and former CEO of Moz) received an email from a then-anonymous source, stating that they had access to a monumental leak of API documentation from Google’s internal team. As of May 28th, we know that the orchestrator of the leak was Erfan Azimi, an SEO consultant who released this public statement on YouTube. He claimed that the document had already been reviewed by ex-Googlers who had verified its authenticity, and that it provided fascinating insights into Google’s search API.

Fishkin jumped on a video call with Azimi on May 24th and saw the full leaked document for the first time – a 2,500-page leviathan with 14,014 attributes from Google’s API warehouse.

Fascinated, but not yet quite convinced, Fishkin reached out to some ex-Googlers to find out if the document was legit. One of them refused to comment on the leak, but two others verified that the document matched others they had seen, with the same notation style and formatting. He also brought the discovery to Mike King (founder of iPullRank and a leading technical SEO), who agreed that the document appeared legitimate and put together his own initial, but thoroughly detailed, analysis of the leak on May 27th.

There is now no doubt about the authenticity of the Google algorithm leak, as the company confirmed its legitimacy to tech magazine The Verge on May 29th, although Google spokesperson Davis Thompson stated that: “We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information”.

As for how exactly the leak came to be in Azimi’s possession, it seems most likely that the API documentation was uploaded to GitHub on March 27th and made public completely by accident, as indicated by the numerous links to private repositories and internal Google pages for employees. The leak remained public until it was removed on May 7th, leaving plenty of time for it to circulate across the web.

Google lied to us?

Now we’ve covered the background, let’s get to the most exciting part of the story – the lies!

The Google API leak makes it clear that several data points which Googlers have stated carry no weight in rankings are, in fact, used in the search API. Let’s go through each of these and get the facts straight.

1. Clicks DO matter

This shouldn’t be a major shock, as the US vs. Google Antitrust Trial already confirmed the existence of Navboost – a system that gives a ranking boost to pages that generate more clicks than expected. However, prominent Googlers like Gary Illyes have made multiple statements in the past dismissing the possibility of clicks being used for rankings. The leaked API document leaves nothing up for debate, with clear attributes for badClicks, goodClicks, lastLongestClicks, unsquashedClicks, and unsquashedLastLongestClicks.

clicks attributes in google's leaked api documents

Source: iPullRank
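Purely as a thought experiment, here is what a Navboost-style adjustment built on these click types might look like. The attribute names come from the leaked documentation, but the function, weights, and formula below are entirely my own invention – the leak tells us nothing about how (or whether) these signals are combined:

```python
# Hypothetical sketch only: the attribute names (goodClicks, badClicks,
# lastLongestClicks) are from the leak, but the scoring logic and
# weights here are invented purely for illustration.

def navboost_sketch(good_clicks: int, bad_clicks: int,
                    last_longest_clicks: int, expected_clicks: float) -> float:
    """Return a hypothetical ranking multiplier (>1 boosts, <1 demotes)."""
    # In this sketch, "last longest" clicks (the final click of a session,
    # where the user stayed on the page) count as the strongest signal
    # of satisfaction, while bad clicks subtract from the total.
    observed = good_clicks + 2 * last_longest_clicks - bad_clicks
    if expected_clicks <= 0:
        return 1.0  # nothing to compare against, so no adjustment
    # Pages that out-perform their expected click count get a boost,
    # clamped here to a sensible range.
    return max(0.5, min(2.0, observed / expected_clicks))

print(navboost_sketch(good_clicks=120, bad_clicks=10,
                      last_longest_clicks=40, expected_clicks=100))
```

The key idea – rewarding pages that attract more satisfied clicks than expected for their position – matches what came out of the antitrust trial, even if the real maths is unknown.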

2. Google has a domain authority score

Google has stated numerous times that they do not factor domain authority into rankings and that they do not calculate an authority score for websites. And yet, here it is, a siteAuthority attribute clearly marked in the document:

siteAuthority attribute in the google algorithm leak

Source: iPullRank

3. Your site may well be in the sandbox

In the past, Googlers have responded to people asking how long new sites are kept in a ‘sandbox’ by saying that no such thing exists. However, one of the modules in the document has an attribute called ‘hostAge’. The notes for this attribute state that it is used ‘to sandbox fresh spam in serving time.’ SEOs have long suspected that a sandbox exists, but now we have strong evidence that new sites are at a disadvantage in the SERPs.

attribute for domain age in the google leak

Source: SERanking

4. Chrome data has an impact on organic search

Google spokespeople have also claimed that Chrome data is not used in their ranking model. However, an attribute in the Google algorithm leak named ‘chromeInTotal’ shows that Google tracks Chrome views to measure user behaviour.

chrome attribute in the google algorithm leak

Source: iPullRank

There is also a reference in the document to the attribute ‘chrome_trans_clicks’, which likely uses the frequency of clicks on pages in Chrome browsers to decide which pages on a website are the most popular.

chrome clicks attribute in the google algorithm leak

Source: AIOSEO

Other insights from the Google leak

With the most news-breaking insights out of the way, there are a few other interesting data points to be found in the Google leak. Keep in mind that this is a repository with over 14,000 attributes, and I am one person on a deadline, but these are a few highlights that are most relevant to fellow SEOs and business owners looking to strengthen their organic search performance.

Brand identity is key

Something several other SEOs writing about this topic have pointed out is that branding now seems more important than ever. There are several attributes in the document that are used to identify, sort, rank, and filter different “entities” (brands). This means that growing a recognisable brand, both in and out of the search results, is key to getting better rankings.

I will note that the Google leak contains very little information about indicators of expertise (or E-E-A-T) for use in rankings, aside from Google Maps and author attributes. However, this doesn’t mean it’s not important. E-E-A-T signals, including things like author bios, awards & qualifications, and original photos, are crucial for good user experience, and their importance has been highlighted by Google’s Search Quality Rater Guidelines. Every digital marketer should remember that we are making content for people, not robots, and ensuring good user experience is the best way to get your brand ranking in Google.

Whitelists

It is clear from the API document that Google has whitelists for three topics: travel, elections, and Covid-19, meaning that sites discussing these topics have to be approved before they can appear in the rankings. It is not clear whether the travel whitelist applies only to Google’s ‘Travel’ tab or to the entire SERPs. Nina Clapperton has suggested that, since this is the only non-YMYL (‘Your Money or Your Life’) niche with a whitelist, it may be a leftover from the tight travel restrictions imposed during lockdown and may no longer be in use. Still, this emphasises the importance of doing your due diligence when writing about topics of global significance.

Fresh content

While we’ve established that new sites may face some difficulties breaking into the SERPs, new content is a non-negotiable if you want your site to rank. Attributes such as ‘bylineDate’, ‘syntacticDate’ (a date found in the URL) and ‘semanticDate’ (a date derived from on-page content) are all used by Google to calculate the freshness of your content and factor this into rankings.

To make sure you stay ahead of your search competitors, you should review and repurpose the content on your site regularly. A blog is not finished the moment you hit the publish button. Content can always be re-optimised to be more accurate, more relevant to users, better aimed at target keywords, and so on. Search trends and volumes are constantly changing, and if you want to keep your top positions, you need to keep your content fresh.

Further tidbits

  • YMYL topics, including things like finance, medical science, and health & safety, have a stricter scoring system.
  • Google logs domain registration information and can track when a domain is due to expire. This is likely used to prevent expired domain abuse.
  • An attribute named ‘titlematchScore’ seems to be a way for Google to calculate how well-matched a page title is to a query.
  • There is an attribute called ‘smallPersonalSite’ but it is not clear what this is used for and whether it promotes or demotes a site.
  • Google tracks font sizes for links and font weight for body text on a site. This is likely used to determine the accessibility of a page.
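We have no idea how ‘titlematchScore’ is actually computed, but a naive token-overlap version, offered purely as a thought experiment, shows the general concept of scoring how well a page title matches a query:

```python
# Hypothetical recreation of a title/query match score. Google's real
# titlematchScore computation is unknown; this naive token overlap is
# invented solely to illustrate the concept.

def title_match_sketch(title: str, query: str) -> float:
    """Fraction of query terms that appear in the title (0.0 to 1.0)."""
    title_terms = set(title.lower().split())
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & title_terms) / len(query_terms)

print(title_match_sketch("Everything we know about the Google algorithm leak",
                         "google algorithm leak"))
```

Whatever the real formula looks like, the existence of the attribute is a reminder that titles closely aligned with target queries still matter.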

What now?

Google has made very little comment on the leak apart from verifying its authenticity. It is unclear whether the event will encourage more transparency from the company, or give them further reason to be secretive about their API.

In an interview with The Verge, Fishkin stated that “Journalists and publishers of information about SEO and Google Search need to stop uncritically repeating Google’s public statements, and take a much harsher, more adversarial view of the search giant’s representatives”. So, regardless of Google’s future actions, it seems the best course of action is to trust our instincts. Thus far, we organic search experts have mostly relied on our own observations and data exchanges to crack the code of the search results, and this is likely to remain our best method of working out the truth.

The Google algorithm leak has given us some valuable insights into what ranking factors we should be looking at when competing in the SERPs. As always, digital marketers should remember that high-quality content will always be the best policy when it comes to SEO. Clicks, impressions, authority, and engagement are only products of the effort that you put into your site.

At Embryo, we are committed to creating exceptional digital marketing strategies that focus on user experience. From Organic Search to PPC, DPR, Paid Social, and UX/UI Design, our services are tailored to match your own unique KPIs and business goals. To speak to a member of our team, reach out today!

Contact Us

