All you need to know
Crawling and indexing websites is the first step in a complex process of understanding what web pages are about in order to present them as answers to user queries.
Search engines are constantly improving the way they crawl and index websites.
Understanding how Google and Bing approach the task of crawling and indexing websites is helpful in developing strategies to improve search visibility.
How Search Engines Work Today: Indexing
Let’s look at the inner workings of search engines.
This article focuses on indexing. So, let’s dive in…
Indexing is where the ranking process begins after a website has been crawled.
Indexing refers to adding a webpage’s content to Google’s index so it can be considered for rankings.
When you create a new page on your site, there are several ways to get it indexed.
The easiest way to index a page is to do nothing.
Google has bots that follow links, so provided your site is already in the index and the new content is linked to from within your site, Google will eventually discover it and add it to its index. More on that later.
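As a rough illustration of link-following discovery, here is a toy sketch in Python. It is not how Googlebot is implemented; the `discover` function and the link graph are hypothetical, and the only point is that a page reachable through links from already-known pages gets found, while a page nothing links to does not.

```python
from collections import deque

def discover(link_graph, known_pages):
    """Breadth-first walk over links, starting from pages already in the index."""
    seen = set(known_pages)
    queue = deque(known_pages)
    newly_found = []
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                newly_found.append(target)
                queue.append(target)
    return newly_found

# Hypothetical site: the home page links to a new post,
# while "/orphan" has no inbound links from known pages.
LINKS = {
    "/": ["/about", "/blog/new-post"],
    "/about": ["/"],
    "/orphan": ["/"],
}
FOUND = discover(LINKS, ["/", "/about"])
print(FOUND)
```

In this sketch the new post is found because the home page links to it; the orphan page is never reached, which is why internal linking matters for the "do nothing" approach.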
How to get a page indexed faster
But what if you want Googlebot to get to your page faster?
This can be important if you have timely content or have made a significant change to a page that you want Google to know about.
I use faster methods when I’ve optimized a review page or adjusted the title and/or description to improve clicks. I want to know precisely when the changes were picked up and displayed in the SERPs so I know where the measurement of improvement begins.
In these cases, there are a few additional methods you can use.
1. XML Sitemaps
XML sitemaps are the oldest and a generally reliable way to call a search engine’s attention to content.
An XML sitemap gives search engines a list of all the pages on your site, along with additional details about each one, such as when it was last modified.
A sitemap can be submitted to Bing through Bing Webmaster Tools and it can also be submitted to Google through Search Console.
But when you need a page indexed immediately, it’s not particularly reliable.
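To make the sitemap format concrete, here is a minimal sketch in Python that builds one using the standard sitemaps.org namespace, with a `loc` and a `lastmod` entry per page. The `build_sitemap` helper, the example.com URLs, and the dates are placeholders for illustration, not part of any tool mentioned above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap XML for a list of (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. YYYY-MM-DD
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical pages on a placeholder domain.
SITEMAP = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/reviews/widget", "2024-02-01"),
])
print(SITEMAP)
```

The resulting text would typically be saved as sitemap.xml at the site root and then submitted through Search Console or Bing Webmaster Tools.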
2. Request indexing with Google Search Console
In Search Console, you can “Request Indexing”.
You start by clicking on the top search box, which reads by default, “Inspect any URL in domain.com”.
Enter the URL you want to index, then press Enter.
If the page is already known to Google, you will be presented with a bunch of information about it. We won’t cover that here, but I recommend you log on and see what’s there if you haven’t already.
The important button, for our purposes here, appears whether the page has been indexed or not, so it works both for content discovery and for asking Google to pick up a recent change.
Look for the “Request Indexing” button on the inspection results page.
Within a few seconds to a few minutes, you can search for the new content or URL in Google and find that the change or new content has been picked up.
3. Participate in Bing’s IndexNow
Bing has an open protocol based on a push method for alerting search engines to new or updated content.
This new search engine indexing protocol is called IndexNow.
It’s called a push protocol because the idea is to alert search engines that use IndexNow to new or updated content, which prompts them to come and index it.
An example of a pull protocol is the old XML sitemap method, which depends on a search engine crawler deciding to visit and index the content (or on the sitemap being fetched by Search Console).
The advantage of IndexNow is that it wastes fewer web hosting and data center resources, which is not only environmentally friendly but also saves bandwidth.
The biggest benefit, however, is faster content indexing.
IndexNow is currently only used by Bing and Yandex.
Implementing IndexNow is simple.
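As a rough sketch of what a push submission looks like, the Python below builds an IndexNow request without sending it. The endpoint and the JSON field names (`host`, `key`, `keyLocation`, `urlList`) follow the public IndexNow documentation as I understand it, so check them against the current spec before relying on this. The domain, key, and `build_indexnow_request` helper are hypothetical, and the sketch assumes you have already placed a text file containing the key at the `keyLocation` URL so that search engines can verify ownership.

```python
import json
import urllib.request

# Shared IndexNow endpoint (assumption: per the public IndexNow docs).
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_request(host, key, urls):
    """Build (but do not send) a POST request announcing new or updated URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# Hypothetical site and key, for illustration only.
REQUEST = build_indexnow_request(
    "www.example.com",
    "hypothetical-key-123",
    ["https://www.example.com/reviews/widget"],
)
# To actually submit: urllib.request.urlopen(REQUEST)
```

Many CMS plugins and CDNs can handle this step automatically, so a hand-rolled request like this is mostly useful for understanding what gets sent.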
4. Bing Webmaster Tools
In addition to participating in IndexNow, consider a Bing Webmaster Tools account.
If you don’t have a Bing Webmaster Tools account, I can’t recommend one highly enough.
The information provided there is substantial and will help you better assess issues and improve your rankings on Bing, Google and elsewhere, and likely provide a better user experience as well.
But to get your content indexed, you just need to click on: Configure My Site > Submit URLs.
From there, you enter the URL(s) you want indexed and click “Submit”.
So, that’s pretty much all you need to know about indexing and how search engines do it (with an eye on where things are going).
More details are available on the Bing Webmaster Tools URL Submission Tool help page.
There is also a Bing Webmaster Tools Indexing API that can get content to appear in Bing search results within hours. More information on the Bing Indexing API is available here.
You can’t really talk about indexing without talking about the crawl budget.
Basically, crawl budget is a term used to describe the amount of resources Google will spend to crawl a website.
The allocated budget is based on a combination of factors, the two main ones being:
- Your server speed (i.e. how much can Google crawl without degrading your user experience).
- How important your site is.
If you run a major news site with constantly updated content that search engine users will want to know about, your site will be crawled frequently (dare I say…constantly).
If you run a small barbershop, have a few dozen links, and are rightly not considered important in this context (you may be an important barber in your area, but you’re not important as far as crawl budget is concerned), then the budget will be low.
You can read more about crawl budgets and how to determine them in Google’s explanation here.
Google has two types of crawling
Indexing by Google starts with crawling, of which there are two types.
The first type of crawling is discovery, where Google discovers new web pages to add to the index.
The second type of crawling is refresh, where Google finds changes in web pages that are already indexed.
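The distinction between the two crawl types can be pictured with a small Python sketch. This is a toy model only: comparing content hashes against a stored index is an illustrative assumption, not a description of how Google detects changes, and `fingerprint`, `classify_crawl`, and the sample pages are hypothetical.

```python
import hashlib

def fingerprint(content):
    """Hash page content so two versions can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def classify_crawl(url, content, index):
    """index maps each known URL to the fingerprint of its last indexed version."""
    if url not in index:
        return "discovery"            # page is new to the index
    if index[url] != fingerprint(content):
        return "refresh: changed"     # known page, content differs
    return "refresh: unchanged"       # known page, nothing new

# Hypothetical index containing a single known page.
INDEX = {"/about": fingerprint("About us")}
RESULTS = [
    classify_crawl("/blog/new-post", "Hello", INDEX),
    classify_crawl("/about", "About us, updated", INDEX),
    classify_crawl("/about", "About us", INDEX),
]
print(RESULTS)
```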
Find out how search engines work
Optimizing websites for search engines starts with good content and ends with submitting it to be indexed.
Whether you do it with an XML sitemap, the Google Search Console URL Submission Tool, Bing Webmaster Tools or IndexNow, getting this content indexed is where your web page begins its journey to the top of the search results (if everything works!).
That’s why it’s important to understand how search indexing works.
“How Search Engines Work” discusses how search engines operate and the key factors that influence search engine results pages.
Download it here.