Of late, the SEO community has become very keen to learn the Python programming language to further their SEO skills. As good a goal as this is, there are some things to consider before learning the language or indeed blindly applying it to tasks. We thought it might be interesting to look at it from the perspective of why it might be an idea NOT to learn Python this year!
Reasons to NOT learn Python for SEO in 2020
Do you actually need it? Many of the functions can be achieved already with existing tools
In the hurry to be the latest cool kids on the block, many in the industry seem to have launched themselves into learning Python without actually stopping to think if they need it or not!
Whilst Python can be very useful for enterprise level practitioners or those working with large volumes of data, most of the time, tools already in the marketer’s arsenal can actually achieve the same functions. Let’s look at some common SEO uses of Python:
Crawling for URLs and broken links – This is very easy to do with Ryte and similar tools, which most SEOs will already have anyway.
Extracting metadata – see above! Most tools will also summarise metadata by issue (too long, too short, duplicated and so on). The same summaries can be produced and visualised in Python, but only with additional coding.
Image compression – it is possible to use Python for image compression, but it will only save 5% or so of an image’s size if standard lossless compression is used. In most cases this won’t be enough; a free online service like tinyPNG can work more effectively in terms of saving bytes on images when a small number of images are being optimised. For companies with many images to work on (large ecommerce sites for example) a cloud-based, subscription service may be more appropriate.
Internal link analysis – Many existing tools will allow you to export all of the internal links on a site, which can then be visualised in Gephi (free), with arguably less effort than using Python would involve.
Scraping – For extracting basic data from sites via browser, this can be done with the scraper extension for Chrome, Google Sheets, etc. Many tools come with custom HTML search/extraction capabilities which can fulfill this purpose well out of the box.
Google Search Console data analysis – thanks to this excellent sheet from Hannah Rampton, analyzing data from GSC has never been easier – no complicated coding needed!
URL mapping – in most cases, when doing a site migration or when changing URLs en masse, the content on the pages will probably stay the same. If this is the case, simply exporting crawls of the old and new site and matching the URLs and their respective H1 tags in Excel should do the job.
Writing ALT text at scale with machine learning – OK, fair enough, this could be useful for very large sites. However, for ecommerce sites, simply setting the product name as the ALT text at product template level is perfectly acceptable. This is because ALT text is only a small factor in on-page optimisation.
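For readers curious about what the URL mapping task above looks like in code, here is a minimal sketch of the same H1 matching that would otherwise be done in Excel. The URLs, H1 values and function names are all hypothetical, and a real crawl export would be loaded from a file rather than typed in.

```python
# Hypothetical crawl exports: (URL, H1) pairs for the old and new site.
old_pages = [
    ("https://example.com/old/red-shoes", "Red Shoes"),
    ("https://example.com/old/blue-hats", "Blue Hats"),
    ("https://example.com/old/discontinued", "Green Scarves"),
]
new_pages = [
    ("https://example.com/shop/red-shoes", "Red Shoes"),
    ("https://example.com/shop/blue-hats", "blue hats "),
]


def normalise(h1):
    """Lower-case and collapse whitespace so trivial differences still match."""
    return " ".join(h1.lower().split())


def map_urls(old, new):
    """Return {old_url: new_url or None}, matched on the normalised H1."""
    new_by_h1 = {normalise(h1): url for url, h1 in new}
    return {url: new_by_h1.get(normalise(h1)) for url, h1 in old}


redirect_map = map_urls(old_pages, new_pages)
for old_url, new_url in redirect_map.items():
    print(old_url, "->", new_url or "NO MATCH (review manually)")
```

This sketch assumes each H1 appears only once on the new site; where several pages share an H1, the last one silently wins, so duplicates would need checking by hand, exactly as they would in a spreadsheet.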
As far as SEO is concerned, understanding other programming languages can be more useful
A solid knowledge of HTML and Javascript (and to a lesser extent CSS) will, in most scenarios, have more day-to-day use than a knowledge of Python. Whilst Python is handy as a supporting language to help with research, workflows and so on, it doesn’t often have direct applications. By contrast, a knowledge of frontend languages can be very useful for a number of applications (as well as making it easier to communicate with development teams), for example:
HTML knowledge
Coding/editing – despite us all living in an age of freely available WYSIWYG editors that anyone can use, being able to actually work with and edit raw HTML can still come in useful. For example, when changes are needed to a page that the functionality of the CMS editor doesn’t cover.
Source code interpretation and error diagnosis – being able to understand the HTML behind a page, and the structure of it, can have a multitude of uses. For instance, identifying bloated code in document templates, identifying server-side code ending up in the HTML, issues arising from the <head> terminating early, etc.
Javascript knowledge
Rendering – understanding how pages are rendered with Javascript and the nuances of the process (for example, how Google treats onclick/onscroll events and client-side vs. server-side rendering) can be very helpful for sites built heavily or entirely in Javascript. Although search engines are getting better at the process, it’s not an exact science, and it’s still much more resource intensive and complicated than parsing raw HTML. When it does go wrong, it can lead to all sorts of issues with crawl budget, pages not being processed correctly and so on.
Tracking – A solid knowledge of Javascript can also be helpful in setting up and debugging tracking scripts and working with Google Tag Manager, which can otherwise cause considerable headaches on many non-standard platforms and site configurations when problems do arise.
The above is not to say that Python can’t be learned alongside other languages, but if you have to choose between initially learning, say, HTML or Python, it may make more sense to stick with the base languages at first.
Garbage in, garbage out
One of Python’s uses is in machine learning, and it is a good tool for the job. However, one disadvantage of the democratisation of machine learning and data processing is that many new users, without experience or formal training, will not understand the crucial significance of ‘garbage in, garbage out’.
When a machine learning model is set live, it will faithfully attempt to process the data provided and learn from it as instructed, regardless of the quality, type, or amount of data it is given. This can cause serious problems for self-learning algorithms when the data is not sufficiently ready for processing. ‘Bad data’ usually arises for the reasons below:
Not enough data
Machine-based models often require large amounts of data in order to function correctly. If not provided with enough, their conclusions may not be fully accurate. This is why Google’s automatic bidding on the Ads platform requires a certain number of conversions over the preceding period to be able to function.
For example, giving an algorithm 10 pages with which to correlate the presence or absence of an H1 tag with average page ranking position would be nowhere near a sufficient sample size, but 10,000 or 100,000 pages may be enough. Determining the ‘right’ amount of data to use can be tricky, but in general the more the better, as long as it is accurate and relevant to the task in question. Which brings us neatly along to…
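To make the sample size point concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that H1 presence has no effect on ranking position at all, then measures how strong a correlation can appear by chance with 10 pages versus 10,000. The data is randomly generated, and the function names, trial count and position range are our own assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable never varies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # e.g. every sampled page happens to have an H1
    return cov / (var_x * var_y) ** 0.5


def simulated_correlation(n_pages, rng):
    """Correlate H1 presence with position when the two are unrelated."""
    has_h1 = [rng.choice([0, 1]) for _ in range(n_pages)]
    position = [rng.uniform(1, 50) for _ in range(n_pages)]
    return pearson(has_h1, position)


def largest_false_signal(n_pages, trials, rng):
    """Strongest chance correlation seen across repeated samples."""
    return max(abs(simulated_correlation(n_pages, rng)) for _ in range(trials))


rng = random.Random(42)  # fixed seed so the run is repeatable
small = largest_false_signal(10, 20, rng)
large = largest_false_signal(10_000, 20, rng)
print(f"10 pages:     strongest chance correlation = {small:.2f}")
print(f"10,000 pages: strongest chance correlation = {large:.2f}")
```

With only 10 pages, some samples show what looks like a meaningful relationship even though none exists, whereas with 10,000 pages the chance correlations stay close to zero. More data only helps, though, if it is relevant to the question being asked.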
Irrelevant data
When working with machine learning models, it can be tempting to simply throw as many different variables/sets of data into the mix as possible. This can cause the model (in Python or any other program, for that matter) to pick up on noise and create ‘overfitted’ results. Let’s take the example above of correlating SEO factors with rankings. Suppose we included the following in our training data (none of which are ranking factors):
- Website traffic
- Meta description length
- Google Ads budget
- What page title separator is used (dash, pipe, etc)
This would almost certainly create false positives, where Python would, for example, find a correlation between Google Ads budget and ranking position which is unlikely to reflect any real causal link (this is also referred to as a spurious correlation). In this case, websites with higher Google Ads budgets are more likely to also have the following:
- Higher SEO and marketing/PR budgets
- Dedicated in-house teams
- They may be larger brands, who may be able to both actively and passively obtain more search real estate than smaller brands
- More data that they may be able to use (for example, conversion data on search terms which could then be used to optimise organic pages)
In this instance, the above factors are more likely to be the cause of superior search performance, rather than the more dubious conclusion that a higher Google Ads budget in itself leads to higher rankings.
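The Google Ads example can be sketched as a small simulation. Here we assume a single hidden factor, which we call brand size, that drives both Ads budget and organic visibility, while the budget itself has no effect on visibility. All of the data is randomly generated and the variable names and noise levels are our own assumptions, not measurements from real sites.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


rng = random.Random(7)  # fixed seed so the run is repeatable
n_sites = 5_000

# Hidden common cause: bigger brands spend more AND are more visible.
brand_size = [rng.gauss(0, 1) for _ in range(n_sites)]
ads_budget = [b + rng.gauss(0, 0.5) for b in brand_size]
visibility = [b + rng.gauss(0, 0.5) for b in brand_size]  # budget plays no part

raw = pearson(ads_budget, visibility)
controlled = partial_correlation(ads_budget, visibility, brand_size)
print(f"Ads budget vs visibility:           {raw:.2f}")
print(f"Same, controlling for brand size:   {controlled:.2f}")
```

The raw correlation comes out strong, yet it largely disappears once brand size is controlled for. A model fed only the budget and the rankings has no way of knowing this, which is why understanding where the data comes from matters more than the tooling.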
Conclusion: Python is worth it, but it’s not a magic bullet
We don’t wish to pour water on the merry blaze of Python adoption by the SEO community. It is being put to excellent use by a number of individuals. However, it may be worth taking a step back before taking the plunge of learning the language to see if learning it now would be time well spent. To recap, for those looking to do tasks with Python who don’t already, you could ask yourself:
- Can I achieve the task with what I already have?
- Could I improve my knowledge in more useful ways by studying something else first?
- Do I understand how the data works well enough to not generate bad results?