Welcome to this collection of posts exploring the basics of SEO! Read our introduction to the series here.
For the first post of the series, we’ll be looking at the first steps in technical SEO. Why start here? Well, although creating high-quality content that meets the needs of your users should be one of your main priorities, if your technical SEO is set up incorrectly your site may encounter one or more of these revenue-killing problems:
- Google might not be aware of your site – if Google can’t find it, needless to say, no-one using Google will be able to find it either!
- Google may be blocked from crawling (reading) the site or indexing the content. This will lead to the same problem as the above.
- The site may be very slow which will frustrate users, leading to lower conversion rates.
- Pages may be broken on the site which leads to a poor user experience and site abandonment.
Happily, by using free tools from Google, we can examine the above and more in order to create a healthy website in no time. This is what we’ll be going through in this post:
- Checking the robots.txt file
- How to get your website indexed in Google
- Checking that Google can find, render (paint) and index (store) your site’s content correctly
- How to check for problems with page loading speed
- Making sure your site is mobile-friendly
- How to find error pages (404 status code and similar pages)
Let’s get started!
Robots.txt
First, we’re going to check our website’s robots.txt file to make sure Google is allowed to crawl the website we’re working on. This is a key step: search engines will not crawl or index content from a site that they’ve explicitly been told to leave alone, and they will obey that instruction whether it was put in place deliberately or by accident. We’ll cover how to make sure your site can be visited by search engine bots in this section.
What is a robots.txt file?
A robots.txt file is a text file that sits at the root of a website, usually at domain.com/robots.txt. Its purpose is to tell search engines where they are allowed to go within the website, and it also serves as a place to point them to the website’s sitemap.xml file.
Do I need one?
If you have a small website (under 50 pages) and you’re happy for search engines to see all of your website’s content, then you can skip this part! It’s important to remember that if there is no robots.txt file in place, search engines will assume they can crawl all of a site’s content.
What a typical robots.txt file looks like
Usually, a robots.txt file for a simple, active website should look like the below:
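Something along these lines (the paths and sitemap URL here are purely illustrative):

User-agent: *
# Block any sections you don't want crawled (optional)
Disallow: /example-private-folder/
# Tell search engines where to find your sitemap
Sitemap: https://www.example.co.uk/sitemap.xml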
Let’s break down what the text in these files means first:
User-Agent
This specifies which user-agent (search engine bot) the instructions are meant for. An asterisk (*) indicates that the command is for all search engine spiders. Alternatively, it could make specific references to Googlebot, Bingbot and so on, depending on the setup of the file. The majority of the time, User-agent: * will be the version encountered.
Allow: and Disallow:
These rules specify where on the site search engines are allowed to go. By default, they will assume all pages can be visited, and will only skip a page or sub-folder (for example, a category on an e-commerce site) when explicitly told not to crawl it. For this reason Allow: is technically not needed, and Disallow: is the more common rule. To block a part of the site from being crawled, simply leave a space after Disallow: and then paste in the path of the URL without the domain part, like this:
Disallow: /exampleURL
You can do this as many times as you like. Bear in mind that you can simply block a whole folder or category rather than having to block URLs on an individual basis, if needed.
Sitemap:
The other use for a robots.txt file is to act as a place to put a link to your sitemap.xml file to inform search engines about your content, the usage of which we’ll cover below. The format, as in the example above, is simply:
Sitemap: (full URL of your sitemap)
Other Robots.txt configurations
What you don’t want to see, assuming you want your website to show up in Google, is the below:
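That is, a file containing something like the following:

User-agent: *
# The single slash below blocks the entire site from being crawled
Disallow: /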
What this means is that all search engines are forbidden from crawling the root folder (homepage) of your site, and by extension every page below it in the site’s hierarchy, i.e. search engines are being explicitly told not to visit any page on your site! This can happen when a staging or development site is blocked from being crawled during development and the disallow rule isn’t removed afterwards. Fortunately, this is very easy to fix – just access the file through your website’s back-end and delete the offending Disallow: / line!
Getting specific
In some cases, specific sub-folders or sub-sections of the site are blocked, as below:
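For example, something along these lines (the folder names are purely illustrative):

User-agent: *
# Keep crawlers out of internal search results and account pages
Disallow: /search/
Disallow: /account/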
This is usually fine, as there may be certain areas of a site that you don’t want Google crawling (e.g. sections of the site meant purely for users, or faceted navigation creating lots of duplicate pages). A BigCommerce robots.txt file is a good example of a well-configured file using more specific exclusions. Below, user-specific pages (checkout process pages, filters, search pages, wish-lists) are all blocked:
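As a rough, simplified sketch of the kind of exclusions involved (these paths are illustrative rather than a copy of the actual file), the rules block URLs such as:

User-agent: *
# User-specific pages that shouldn't appear in search results
Disallow: /cart.php
Disallow: /checkout.php
Disallow: /wishlist.php
# Internal search and filtered pages that create duplicate content
Disallow: /search.php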
It can be a good idea to examine the pages or folders that have been blocked and weigh up whether they should be. For example, on an e-commerce site with tens of thousands of filtering URLs, this is likely to be worth doing.
It’s also worth mentioning that Style Sheet or CSS (aesthetics) and JavaScript or JS (functionality) files should never be blocked in robots.txt files, as this will prevent Google from reading and rendering the page properly.
Now that we’ve got the robots.txt file checks out of the way, we can get our site indexed in Google!
How to get your website indexed in Google
Getting your website seen by the search engines is always the first step of any SEO strategy. The best way to do this is to tell them about the content yourself! To achieve this we’ll be using XML sitemaps in combination with Google Search Console. This is an easy way of informing the search engines of the content on your website.
What are XML Sitemaps?
An XML sitemap is a search-engine-readable list of a website’s pages which can be used to notify them of its content. Usually, a website will have a sitemap.xml file already live, and they tend to be found in URL locations like:
https://www.example.co.uk/sitemap.xml
Sitemaps sometimes have different naming conventions depending on the platform, such as:
/sitemap_index.xml
The path above is the format used for sitemap indexes generated by the popular Yoast WordPress plugin. Either way, your sitemap should be easy to find in the site’s back-end, or your web developer will know where it is.
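If you open a sitemap in your browser, the contents are essentially just a list of URLs wrapped in XML tags. A minimal, illustrative example (with placeholder URLs) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want search engines to know about -->
  <url>
    <loc>https://www.example.co.uk/</loc>
  </url>
  <url>
    <loc>https://www.example.co.uk/about-us/</loc>
  </url>
</urlset>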
What if I don’t have a sitemap file?
It should be straightforward for your web developer to generate a sitemap.xml file from the list of the site’s pages that they will have. Alternatively, you can generate one yourself (this is something we’ve written about before here).
If your website is very small (i.e. a dozen or so pages) then as long as all the site’s pages are linked to from the homepage, a sitemap isn’t essential. Once the homepage is indexed, the rest of the site’s pages should be as well.
Telling Google about your pages
First, you’ll need to set up Google Search Console if you haven’t already (official documentation here) and once that’s happened, navigate to the ‘Sitemaps’ section as below:
From here, simply paste in the URL of your sitemap (without the domain part) and press submit. Job done! We’ll be revisiting this shortly. It may take Google several days to find your sitemap file, so don’t worry if your site isn’t indexed immediately.
Can Google render and index your pages?
Although we won’t get too deep into boring technical details here, Google’s indexation process essentially has 3 stages:
- Crawling (reading the page’s code)
- Rendering (building up a snapshot of the page, essentially taking a picture of it)
- Indexing the page (storing it in their databases so people can find it)
We’ve already covered the robots.txt file, and as long as your web pages don’t have any elements on them that would stop indexing (for more details, see our post on having an indexation checklist), the only step we haven’t otherwise covered is rendering.
Why is rendering important? Well, simply put, if Google can’t build an accurate picture of your site’s pages, then your rankings and traffic can suffer. This is because Google subsequently can’t fully process the content on the page (and may assume, correctly or incorrectly, that the page doesn’t display properly for users).
The above scenario previously happened to one of our e-commerce clients: some custom JavaScript code used to create a fading-in effect when users arrived at a category page was causing Google to think that not all of the products were displaying on the page. Google couldn’t process the fade-in, which broke the rendering. This was fixed without too much of a problem, but it just goes to show that unexpected issues can arise from time to time!
To check Google can render your pages, all you need to do is pop over to Search Console and paste a URL into the top search bar. This will display the following screen (if the page has been indexed):
Now click on ‘view crawled page’ and then ‘screenshot’ on the right. Then click on ‘test live URL’.
This should then generate a screenshot that looks like the below:
You can then scroll down and see what Google thinks the page looks like. Click on ‘more info’ to get some additional details about the page:
What you’re looking for here is any indication that the page isn’t loading as expected which could include:
- Content not appearing on the page as expected
- Incorrect text, image and video layouts
- Large numbers of resources not loading on the page, or lots of errors in the JavaScript console log (both under ‘more info’)
We’d advise doing this process several times on different types of pages to make sure that each type of page (e.g. blog post, product page, category page, homepage) can be understood by Google.
Pagespeed
Measuring Pagespeed
An easy way to measure your site’s pagespeed in aggregate is to use Google Analytics (if you don’t have it set up, the documentation on how to do so is here). We’ll be using this initially, as other pagespeed tools, including Search Console, can have a steep learning curve. At this stage, we just want to run some quick checks to see if there are any major issues we should be aware of.
Once set up, go to Behaviour > Site Speed > Overview, as below, and set the date range to the last 30 days:
In terms of what the data means:
Avg. Page Load Time (sec)
Simply put, how long it takes for the page to fully load in the browser. Anything between 2 and 4 seconds is fine (some studies have shown that extremely fast loading times can actually reduce conversion rates by feeling ‘jarring’ to visitors; a site such as Google doesn’t suffer from this, partly due to immense brand strength and partly due to a very simple interface that minimises cognitive effort).
Avg. Redirection Time (sec)
The average time it takes for a site’s redirects to function. Although beyond the scope of this article, a high figure here (anything over 1-1.5 seconds) might mean that there are redirect chains that need to be dealt with.
Avg. Domain Lookup Time (sec)
How long it takes for the site’s DNS (domain name system) to be contacted. This should be very short (0.01-0.1 seconds).
Avg. Server Connection Time (sec) and Avg. Server Response Time (sec)
How long it takes to establish a connection to the site’s server and receive a response from it, respectively. If these are under a second, that should be fine. However, any figure much above this might indicate a slow or incorrectly configured server which may need looking into. This part of pagespeed is particularly important where search engines are concerned: they carefully monitor your site’s server when crawling and will ‘back off’ if response times are slow. This may mean that, as well as being slow, your content may not be indexed correctly.
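If you’d like a quick, rough spot-check of these timings for a single request outside of Google Analytics, curl can report them from the command line (a sketch only; the URL is a placeholder and the figures will vary from run to run):

# Prints DNS lookup, connection, first-byte and total times in seconds
curl -o /dev/null -s -w "DNS: %{time_namelookup}s Connect: %{time_connect}s First byte: %{time_starttransfer}s Total: %{time_total}s\n" https://www.example.co.uk/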
Avg. Page Download Time (sec)
Broadly speaking, how long it takes to download the page itself once the server has responded.

For new sites, there often won’t be much data to see initially; you may have to wait a few weeks or a month or so for it to build up. For older sites that don’t have a huge amount of traffic, you might need to set the date range further back, say 3 or 6 months. In the meantime, asking friends or family who haven’t visited the site before to open it on their computers or phones (so they’re effectively acting as real, new users) can be a useful way of obtaining some empirical information about how well the site works. If it does seem to be slow, your web developer should be able to help you. However, there is one helpful action that can be taken that requires minimal technical knowledge, yet may have a big impact…
Improving pagespeed, the easy way
Improving the pagespeed of a website can be complicated, but time and time again the most common (and most easily solved) issue we come across relates to images. We’ve written about this before in our image optimisation post; it’s an easy, relatively non-technical job that can boost pagespeed without getting developers involved (by reducing the overall ‘weight’ of the page, we help browsers load it quicker). It is worth saying, though, that doing this manually might not be an option on larger websites with thousands of products or landing pages!
It is worth ensuring that content creators for the site’s blog compress images before they upload them, so the blog pages load smoothly. High-definition images can look great, but several of them on one page can lead to serious speed issues if left unchecked! If you’re concerned about images looking poor after they’ve been compressed, don’t be. As the below image from TinyPNG shows, image compression technology is now good enough that the naked eye will find it very difficult to tell a compressed image from an uncompressed one:
Making sure your site is mobile-friendly
It’s essential, both now and in the future, for your site to be mobile-friendly. This is a key metric for search engines and, more importantly, if your site is difficult to use on a mobile, your users may become frustrated and leave without converting or purchasing. More users than ever are using their smartphones to engage with companies. Therefore, it’s vital to make sure they have a good experience whatever device they use to interact with your site. The majority of modern development frameworks are mobile friendly by default, but if your website is using an older code base or has been custom-built, it’s possible it may not adhere to mobile-friendly guidelines.
Search Console has a great feature called mobile usability which lets you see at a glance how the site is doing in this regard:
Here, pages are divided into errors and valid pages. In an ideal world, we obviously want to see that all our pages are valid, but there may be occasions where certain pages, or even the whole site, are not mobile friendly. These are broken down into sub-sections of which the below are the most common issues:
- Text too small to read
- Clickable elements too close together
- Content wider than screen
These are fairly self-explanatory. ‘Clickable elements too close together’ just means that buttons, links and so on are spaced too near to each other to be usable on a mobile. If these errors appear together across multiple pages, it’s an obvious sign that the site has not been developed with mobile-friendliness in mind. Other, less common issues are explained in Google’s Mobile Usability Report guide.
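On older or custom-built sites, one common culprit behind the ‘content wider than screen’ and ‘text too small to read’ errors is a missing viewport meta tag. Responsive sites normally include something like the below in the page’s <head> (most modern themes and frameworks add it for you):

<!-- Tells mobile browsers to scale the page to the device's width -->
<meta name="viewport" content="width=device-width, initial-scale=1">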
How to find error pages (404 status code and similar pages)
When users and search engines land on broken pages it can be a frustrating experience (although a well-designed, custom 404 page can soften this for users), so identifying and fixing them as appropriate helps keep your site well-maintained for visitors.
In Search Console, if we navigate to ‘Coverage’ and select both ‘Error’ and ‘Excluded’, as below, we can find instances of error pages if they appear:
If you’ve come back to Search Console since submitting your sitemap and notice that Google doesn’t appear to have indexed a number of your pages – don’t worry. This is quite common and usually easily resolved. Google’s index coverage report documentation details the kind of errors that can be encountered when they crawl your pages which may help to diagnose any issues your site may have on this front.
For now though, let’s stick to identifying pages that might be broken (showing ‘page not found’ errors, server errors and so on) using Search Console. The kind of items in the report we’re looking for are:
Server error (5xx)
This happens when the server generates an error when a page is accessed. This can have a number of causes such as:
- The usage of ‘illegal’ characters that aren’t allowed in URLs (such as spaces)
- Misconfigurations or bugs with the site’s server
- Access permissions not being set correctly
Unlike 404 errors, these are not easily fixable by non-development teams.
Submitted URL seems to be a soft 404
A ‘soft 404’ is where the page says it’s working (giving a 200 OK status code to the search engines) but is actually an error page. This is problematic as search engines generally don’t like encountering these when they expect to see a working page. From a user’s point of view, however, these are effectively the same as regular 404 pages. This is typically something for developers to fix, as it will entail changing how the site serves broken pages to users and search engines.
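A quick way to check what status code a page really returns is to request just its headers from the command line (a simple spot-check; the URL below is a placeholder):

# -I sends a HEAD request and prints the response headers, including the status code
curl -I https://www.example.co.uk/some-missing-page

A genuine error page should come back with a 404 (or 410) status rather than 200 OK.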
Submitted URL not found (404) and not found (404)
Google may have found URLs in your sitemap that are broken (404) pages and it may also find other error pages not included on the sitemap when crawling. These are generally the easiest form of error to fix. This usually involves:
- Making pages live again that have been put into draft or temporarily disabled (for example, WordPress pages default to a 404 if they were live but have since been reverted to draft status)
- If the page doesn’t exist any more, 301 redirecting it to the nearest equivalent page (preferably an equivalent product or landing page), as in the sketch below.
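As an illustration, on an Apache server a single redirect of this kind can be added to the site’s .htaccess file (the paths here are hypothetical, and most platforms and CMSs have their own redirect settings or plugins):

# Permanently redirect a retired page to its closest equivalent
Redirect 301 /old-product /new-product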
When not to fix 404 pages
Although it’s usually advisable to fix 404 errors when they crop up, in certain circumstances it may not be advisable to do so. For example:
- When there is no suitable page to redirect to. Search engines generally frown on redirects to non-relevant pages, particularly if error pages are redirected to the homepage. In cases such as this, they may simply ignore the redirects or, at worst, actually reduce traffic to the pages in question.
- Your e-commerce store has clearly-defined ‘out of stock’ content and CTAs (calls-to-action) for users, plus structured data to notify the search engines, but the ‘out of stock’ pages serve a 404 status as well. In this instance, although this kind of soft 404 is not ideal, the content on the page is suitable for users, so resolving the soft 404 issue wouldn’t hurt but isn’t essential.
Other possible errors
We might also encounter the following:
“Submitted URL blocked by robots.txt” and “Submitted URL marked noindex”
Both of these can occur when URLs in the sitemap have been blocked by the robots.txt file or set to noindex, either of which may prevent the pages in question from showing in search. This should be easy to fix, either through your developers or your site’s back-end. Robots.txt files, as mentioned above, are easy to edit, and most CMS systems have a page-by-page check box controlling whether a URL has a noindex tag applied, which is easy to untick.
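For reference, a noindex instruction is usually just a single meta tag in the page’s <head>, which those check boxes typically add or remove for you:

<!-- Tells search engines not to show this page in their results -->
<meta name="robots" content="noindex">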
That concludes our introduction to technical SEO. We hope you found it useful! Join us next week for part 2 where we’ll look at keyword targeting, onsite SEO and content. See you soon!