How Search Engines Rank Pages
As SEO professionals, we usually focus on the question: “How can I rank my page?”
An equally important question, if not a more important one, is: “How do search engines rank pages?”
Why Search Engines Rank Web Pages
Before we dive into how search engines rank web pages, let’s stop for a moment and think about why they rank them.
After all, it would be cheaper and easier for them to just display the pages randomly, by word count, by freshness, or by one of many easy sorting systems.
The reason they don’t is obvious: you wouldn’t use an engine that did.
So when we ask the question of rankings, what we always have to keep in mind is that the user we are trying to satisfy is not ours. They belong to the engine, and the engine is lending them to us.
If we misuse that user, they may not come back to the engine, and the engine can’t have that, because its advertising revenue would decline.
I like to think of the scenario as being like some of the resource pages on our own sites.
If we recommend a tool or service, it’s based on our experience with them and we believe they will serve our visitors as well. If we learn that they do not, we will remove them from our site.
That’s what engines do.
I don’t have any eavesdropping devices at Google or Bing.
Google has one on my desk and another that I carry with me when I’m away from it, but for some reason the message pickup doesn’t work the other way around.
I say this to clarify that the following outline is based on about 20 years of observing the evolution of search engines, reading patents (or, more often, Bill Slawski’s patent analyses), and starting each day for many years by reviewing what is happening in the industry, from SERP layout changes to acquisitions to algorithm updates.
Consider what I say an educated breakdown that is hopefully about 90% correct.
If you’re wondering why I’m thinking 90% – I learned from Frederic Dubut of Bing that 90% is a great number to use when estimating.
It’s just a simple 5 steps – Easy
The complete page ranking process has five steps.
I’m not including technical challenges like load balancing and I’m not talking about every different signal calculation.
I’m just talking about the basic process that every query must go through, starting its life as a request for information and ending it as a set of 10 blue links buried under a sea of advertisements.
Understand this process, understand who it is for, and you will be well on your way to thinking properly about how to rank your pages for their users.
I also think it’s necessary to note that the words used for these steps are mine and not some type of official name.
Feel free to use them, but don’t expect any of the engines to use the same terminology.
Step 1: Classify
The first step in the process is to classify the incoming request.
The request classification gives the engine the information it needs to perform all subsequent steps.
Before complex classification could take place (read: back when engines relied on keywords instead of entities), engines had to essentially apply the same signals to all queries.
As we will see later, this is no longer the case.
It is in this first stage that the engine applies labels (again, not a technical term, but an easy way to think about it) to a query.
I have no idea how many different classifications there are, but the first step the engine should take is to figure out which ones apply to a given query.
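To make the idea of labeling a query concrete, here is a minimal sketch. The label names (`question`, `local`, `unseen`) and the keyword rules are assumptions made up for illustration; a real engine would rely on entity understanding and far richer signals, not string matching.

```python
# Hypothetical labels and rules, for illustration only.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how"}
LOCAL_HINTS = ("near me", "nearby")


def classify_query(query, seen_queries):
    """Return the set of (hypothetical) labels that apply to a query."""
    text = query.lower().strip()
    words = text.split()
    labels = set()
    # A query counts as a question if it ends in "?" or starts with a question word.
    if text.endswith("?") or (words and words[0] in QUESTION_WORDS):
        labels.add("question")
    # A crude stand-in for local intent detection.
    if any(hint in text for hint in LOCAL_HINTS):
        labels.add("local")
    # Queries the engine has not processed before get their own label.
    if text not in seen_queries:
        labels.add("unseen")
    return labels


seen = {"civil war"}
print(classify_query("How do search engines rank pages?", seen))
print(classify_query("civil war", seen))
```

The point is only that a single query can carry several labels at once, and that every later step can branch on them.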
Step 2: Context
The second step in the ranking process is to assign a context.
As much as possible, the engine should take into account all the relevant information it has about the user entering the query.
We see this regularly, even for queries we don’t enter ourselves. You can see it here:
And here:
The latter, of course, being an example where I didn’t specifically type in the query.
Essentially, the second step in the process is for the engine to determine which environmental and historical factors come into play.
They already know which category the query falls into. Here they apply, determine, or pull the data for the elements deemed relevant to that category and type of query.
Here are some examples of environmental and historical information that would be considered:
- Whether the query is a question
- The device used for the query
- The format used for the query
- Whether the query is related to previous queries
- Whether the engine has seen this query before
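The factors in the list above can be pictured as a small record attached to each query. This is only a sketch: the `QueryContext` fields, the `build_context` helper, and the "shares a term with an earlier query" test are hypothetical simplifications, not anything an engine has documented.

```python
from dataclasses import dataclass


@dataclass
class QueryContext:
    """Hypothetical container for the context factors listed above."""
    is_question: bool
    device: str               # e.g. "mobile" or "desktop"
    input_format: str         # e.g. "typed" or "voice"
    related_to_previous: bool
    seen_before: bool


def build_context(query, device, input_format, session_history, seen_queries):
    text = query.lower().strip()
    terms = set(text.split())
    # Crude relatedness check: the query shares at least one term
    # with an earlier query from the same session.
    related = any(terms & set(previous.lower().split()) for previous in session_history)
    return QueryContext(
        is_question=text.endswith("?"),
        device=device,
        input_format=input_format,
        related_to_previous=related,
        seen_before=text in seen_queries,
    )


context = build_context(
    "civil war battles", "mobile", "voice",
    session_history=["civil war"], seen_queries={"civil war"},
)
print(context)
```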
Step 3: Weight
Before we dive in, let me ask you, how tired of hearing about RankBrain are you?
Well, buckle up because we’re about to talk about this again, but only as an example of this third step.
Before an engine can determine which pages should rank, it must first determine which signals are most important.
For a query like [civil war] we get a result that looks like:
Solid result. But what would happen if the freshness had played a strong role? We would end up with a result more like:
But we can’t rule out freshness. If the query had been [best shows on netflix], I would care less about authority and more about publication date.
I hardly want a heavily linked article from 2008 describing the best DVDs to order on their service.
Thus, with the query type in hand and the context elements extracted, the engine can now rely on its understanding of which signals apply, and with what weights, for the given combination.
Some of this can certainly be done manually by the many talented engineers and computer scientists the engines employ, and some will be handled by systems like RankBrain, which is (for the 100th time) a machine learning algorithm originally designed to adjust signal weights for new queries but later introduced into Google’s algorithms as a whole.
With the claim that around 90% of its ranking algorithms rely on machine learning, it’s safe to assume that Bing has similar systems.
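As a rough illustration of weighting, the sketch below picks signal weights based on a query's labels. The signal names, the numbers, and the `time_sensitive` label are all invented for the example; in practice such weights would be learned or tuned rather than hard-coded.

```python
# Hypothetical default weights: authority matters far more than freshness.
BASE_WEIGHTS = {"relevance": 0.5, "authority": 0.4, "freshness": 0.1}

# Hypothetical per-label overrides, e.g. for a query like [best shows on netflix].
OVERRIDES = {
    "time_sensitive": {"authority": 0.1, "freshness": 0.4},
}


def weights_for(labels):
    """Return normalized signal weights for a set of query labels."""
    weights = dict(BASE_WEIGHTS)
    for label in sorted(labels):
        weights.update(OVERRIDES.get(label, {}))
    # Normalize so the weights always sum to 1.
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}


print(weights_for(set()))               # a query like [civil war]
print(weights_for({"time_sensitive"}))  # a query like [best shows on netflix]
```

The same set of signals is used in both cases; only their relative importance changes with the query.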
Step 4: Layout
We have all seen it. In fact, you can see this in the Civil War example above. For different queries, the layout of the search results page changes.
The engines will determine which formats are possible given the query’s intent, the user performing the query, and the available resources.
The full SERP page for [civil war] looks like:
I put an educated guess on the base factor used to determine when each element is present.
The truth is that it is a moving target and it relies on knowledge of the entities, how they connect and how they are weighted.
This is a very complex subject, so we will not dwell on it here.
What is important to understand in the context of this article is that the various elements of a given search results page must be determined more or less on the fly.
That is, when a query is executed and the first three steps are completed, the engine will reference a database of the various possible elements to insert on the page and their possible locations, and then determine which will apply to the specific query.
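One way to picture that lookup is a table of page elements, each with a slot and the labels it requires. Everything here (the element names, slots, and labels) is a hypothetical simplification meant only to show the shape of the decision.

```python
# Hypothetical catalog of SERP elements: where each one goes and
# which query labels must be present for it to be shown.
ELEMENTS = [
    {"name": "knowledge_panel", "slot": "right_rail", "needs": {"entity"}},
    {"name": "top_stories", "slot": "top", "needs": {"time_sensitive"}},
    {"name": "local_pack", "slot": "top", "needs": {"local"}},
    {"name": "blue_links", "slot": "main", "needs": set()},
]


def choose_layout(labels):
    """Return (slot, element) pairs whose requirements the labels satisfy."""
    return [
        (element["slot"], element["name"])
        for element in ELEMENTS
        if element["needs"] <= labels  # subset test: all required labels present
    ]


print(choose_layout({"entity"}))
print(choose_layout({"local", "time_sensitive"}))
```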
An aside: I noted above that search results pages were generated more or less on the fly.
While this is true for infrequent queries, for common queries it’s much more likely that engines keep a database of what they’ve already calculated to match the user’s likely intent, so that they don’t have to process them every time.
I imagine there is a time limit after which an entry refreshes, and I suspect the full entry is refreshed at times of low usage.
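The caching behavior guessed at above can be sketched as a small time-limited cache. This is an assumption about how such a store might work, not a description of any engine's infrastructure; the `SerpCache` class and its parameters are hypothetical, and the "refresh at low-usage times" idea is left out for brevity.

```python
import time


class SerpCache:
    """Hypothetical cache that reuses a computed result until it expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}      # query -> (time stored, value)

    def get_or_compute(self, query, compute):
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]     # still fresh: skip the expensive work
        value = compute(query)  # missing or expired: recompute and store
        self._entries[query] = (now, value)
        return value


cache = SerpCache(ttl_seconds=3600)
print(cache.get_or_compute("civil war", lambda q: f"layout for {q}"))
```

A common query would hit the stored entry on almost every request, while a rare query would usually fall through to the full computation.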
But moving forward, the engine now knows the classification of a query, the context in which the information is requested, the signal weights that apply to such a query, and the layout most likely to meet the different possible intents for a request.
Finally, it’s time for ranking.
Step 5: Ranking
Interestingly, this is probably the easiest step in the process, although not as singular as one might think.
When we think of organic rankings, we think of the 10 blue links. So let’s start there and look at the process so far:
- The user enters a query.
- The engine considers the type of query and classifies it to understand which key criteria apply at a high level, based on similar or identical previous query interactions.
- The engine considers the user’s position in space and time to account for their probable intent.
- The engine takes query classifications and user-specific signals and uses them to determine which signals should have which weights.
- The engine uses the above data to also determine layouts, formats, and additional data that can satisfy or complement the user’s intent.
With all this in hand and with an algorithm already written, the engine only has to calculate the numbers.
It will select the sites that can be considered for ranking, apply the weights to its algorithms, and crunch the numbers to determine the order in which the sites should appear in the search results.
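Once the weights are known, the final calculation can be as plain as a weighted sum per candidate. The sketch below assumes three made-up signals scored between 0 and 1 and two invented pages; real ranking functions are far more involved, but the shape is the same.

```python
def score(page_signals, weights):
    """Weighted sum of a page's signal values (missing signals count as 0)."""
    return sum(weights[name] * page_signals.get(name, 0.0) for name in weights)


def rank(candidates, weights, limit=10):
    """Order candidate URLs by descending score and keep the top results."""
    ordered = sorted(
        candidates,
        key=lambda url: score(candidates[url], weights),
        reverse=True,
    )
    return ordered[:limit]


# Hypothetical candidates for [civil war] with made-up signal values.
candidates = {
    "encyclopedia.example/civil-war": {"relevance": 0.9, "authority": 0.9, "freshness": 0.1},
    "news.example/civil-war-film": {"relevance": 0.7, "authority": 0.4, "freshness": 0.95},
}

authority_heavy = {"relevance": 0.5, "authority": 0.4, "freshness": 0.1}
freshness_heavy = {"relevance": 0.5, "authority": 0.1, "freshness": 0.4}

print(rank(candidates, authority_heavy))
print(rank(candidates, freshness_heavy))
```

With the same two pages, the order flips depending on which weights the earlier steps selected, which is the effect the [civil war] and [best shows on netflix] examples describe.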
Of course, they have to do this for each page element in different ways.
Videos, stories, features, and information all change, so engines have to order not just the blue links, but everything else on the page.
Ranking the site is easy. It’s pulling everything together to do it that’s the real work.
You may wonder how understanding this can help you in your SEO efforts. It’s like understanding the essential functions of how your computer works.
I can’t build a CPU, but I know what CPUs do, which characteristics make one faster, and how cooling affects them.
Knowing this, I have a faster machine that I need to update and upgrade much less often.
The same goes for SEO.
If you understand the heart of how the engine works, you will understand your place in this ecosystem.
And this will result in strategies designed with the engine in mind and serving the real user – their user.
Featured Image Credit: Paulo Bobita