It is an original idea, a crazy one some would say, but one that has been steadily gaining ground for more than six months: how could a simple website help fight global warming?
To understand it, you have to immerse yourself in the world of the Web and search engines, and in how they work. Every day, new websites appear on the Internet and, whatever their content, enrich the information available online. Search engines crawl and index this content to make it accessible through simple queries, so that anyone, anywhere in the world, can reach the most varied content on a computer or cell phone. This is the beauty of search engines, of which Google is the most effective representative.
However, crawling a site is particularly energy-consuming, because it is carried out almost continuously so that published information, including the most recent, stays accessible to everyone. Crawlers (spiders) visit your website, going from link to link, to detect new pages, new content and changes, and to represent them as accurately as possible in search engine results pages (SERPs). Google, which in a few years has become the most widely used search engine in the world, excels at returning the most relevant information for each user query. It is a clever mix of technology, algorithms, servers and computing power that allows the latest article published on your site to be read, organized, indexed and made available, within a few days and in a few clicks, to the first visitor interested in your subject.
And this work is titanic. To give you an idea, Google handles more than 3.5 billion searches per day on behalf of its users, which makes it the main culprit, accounting for up to 40%, of the carbon footprint of the Web in general. In 2015, a study established that the CO2 produced by web activity (through the use of millions of servers, cooling systems, etc.) was equivalent to the CO2 produced by the aviation industry worldwide.
In fact, for your information, in the minute or so it took you to read these first lines, Google will have already emitted more than 40 tons of CO2.
That is 500 kg of CO2 produced per second.
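As a quick sanity check, taking the two figures above at face value: at 500 kg per second, 40 tons corresponds to about 80 seconds of emissions. A minimal Python sketch of that arithmetic (the function and constant names are ours, and the numbers are the article's figures, not measurements):

```python
# Figures quoted in the article (not independently verified).
RATE_KG_PER_SECOND = 500
TOTAL_TONNES = 40


def seconds_to_emit(tonnes, rate_kg_per_s):
    """Time in seconds needed to emit `tonnes` of CO2 at a constant rate."""
    return tonnes * 1000 / rate_kg_per_s


seconds = seconds_to_emit(TOTAL_TONNES, RATE_KG_PER_SECOND)
print(f"{TOTAL_TONNES} t at {RATE_KG_PER_SECOND} kg/s takes {seconds:.0f} s")
```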
Google is aware of its carbon footprint, and its founders adopted less energy-intensive processes early in the development of their data centers, including investments in clean energy and numerous carbon offset programs. Even so, Google’s infrastructure still emits a significant amount of CO2, and unfortunately that amount grows every year.
But what can you do about it? After all, you and your website cannot be held responsible for this. You are doing your part and, on the face of it, you have not asked Google for anything, even if in reality you depend on it considerably for traffic to your site.
In fact, there is something you could do that, if every site owner did the same, would have a global impact on the CO2 Google emits to read the Web, organize the information and give users access to it: simplify the work that Google’s indexing robots (crawlers) have to do when they visit your website.
You may not know it, but your website is not limited to the pages you create with your content, nor to what is visible in search results. Your site contains an astronomical number of internal links intended strictly for its operation: to generate interactions between pages, to filter results, to organize content, to give access to certain restricted information (useful for your developer but not for your visitors), etc. And once your site is available for crawling by search engines (concretely, once it is published online), crawlers systematically try to visit all the links it contains in order to identify information, index it and make it available. But, and this part is important, crawling your website (which, remember, is almost continuous) requires a lot of energy, both from the crawlers (the search engines’ servers) and from your own hosting server.
That’s why, very early on, as part of the continuous improvement of its crawling system, Google, for example, set a limit on the number of links a robot can explore in a session. The reason is simple: the power that indexing robots need to explore your site directly affects the performance of your website. In other words, if your hosting is limited (which is very common), your website is slower to load its pages and content for visitors while it is being crawled, because the processor and RAM are directly affected.
However, this limitation, called the “crawl budget”, is not a general rule applied by all (known) search engines, and certainly not by the thousands of web robots (“scrapers”) that continuously visit, copy and analyze the Web. Nowadays, more than 50% of the world’s Web traffic is generated by robots, not humans. So we are not alone.
Last but not least, it is very common, sometimes because of the colossal size some websites can reach (online stores, etc.), for crawlers to get “stuck” in certain parts of a site. In other words, indexing robots crawl a huge number of non-essential links (sometimes an infinite number, if there are loops), for example those of an online calendar feature where every day, month and year is a crawlable page. Because of the limit on the number of links they can explore, they are unable to “leave” that section, which directly affects the overall exploration of your site and, ultimately, the accessibility of your important pages (articles, services, products, etc.). This may explain why, even after several weeks, some recently published content is still not visible in the search results: without precise instructions, crawlers explore every link they find, even those of no interest to you whatsoever.
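To make the trap concrete, here is a toy simulation in Python. It is only a sketch: the site structure, the breadth-first strategy and the budget of 20 pages are all assumptions chosen for illustration, and real crawlers schedule their visits in far more sophisticated ways.

```python
from collections import deque


def crawl(start, links_of, budget, allowed=lambda url: True):
    """Breadth-first crawl that stops after visiting `budget` pages."""
    seen, queue, visited = {start}, deque([start]), []
    while queue and len(visited) < budget:
        page = queue.popleft()
        visited.append(page)
        for link in links_of(page):
            if link not in seen and allowed(link):
                seen.add(link)
                queue.append(link)
    return visited


def links_of(page):
    """Hypothetical site: a blog with three articles and an endless calendar."""
    if page == "/":
        return ["/calendar/0", "/blog"]
    if page == "/blog":
        return [f"/article/{i}" for i in range(3)]
    if page.startswith("/calendar/"):
        day = int(page.rsplit("/", 1)[1])
        # Each day links to the next 30 days, so the calendar never ends.
        return [f"/calendar/{day + offset}" for offset in range(1, 31)]
    return []


# Without instructions, the whole budget is spent before any article is reached.
trapped = crawl("/", links_of, budget=20)
print(sum(p.startswith("/article/") for p in trapped), "articles crawled")

# With the calendar excluded, the same budget covers every useful page.
guided = crawl("/", links_of, budget=20,
               allowed=lambda url: not url.startswith("/calendar/"))
print(guided)
```

In this model the unguided crawler visits the home page, the blog index and 18 calendar pages, and never reaches an article; the guided one finishes after five pages.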
So, if you could tell crawlers accurately, whoever they are (Google, Bing, Yahoo, Baidu, etc.), what they can explore and what is not necessary for your visibility, you could ensure better performance for your website and, ABOVE ALL, significantly reduce the energy (and therefore the electricity) consumed by your hosting server, by Google and by every other crawler on the Web.
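In practice, such instructions are written as plain-text rules that well-behaved crawlers read before exploring a site. The sketch below uses Python's standard urllib.robotparser module to show how a crawler could interpret two such rules. The paths and the bot name "ExampleBot" are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler out of the calendar and filter pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Disallow: /filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/articles/my-post/", "/calendar/2019/05/", "/filter/price/"):
    verdict = "crawl" if parser.can_fetch("ExampleBot", path) else "skip"
    print(f"{path}: {verdict}")
```

Only the article path is reported as crawlable; the two excluded sections are skipped without a single request being made to them.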
Admittedly, on an individual basis, this “optimization” represents only a small saving compared to the footprint of the Web giants. But if everyone, on a global scale, across the hundreds of millions of websites available, joined the movement, it would have a real impact on the intangible thing that is the Web, and thus reduce electricity consumption and, ultimately, CO2 emissions.
So what can we do?
This optimization mission is the one that PAGUP, a Canadian agency specialized in search engine optimization (SEO), has taken on by creating a plugin dedicated to WordPress that allows you, FOR FREE, very simply and in a few clicks, to optimize a file called robots.txt. (Nowadays, nearly 28% of the Web is built with WordPress, 39% of all online shops run on WooCommerce (WordPress), and WordPress represents almost 59% of the CMS market worldwide.)
The “robots.txt”. As incredible as it may seem, this whole crawling operation is governed by a small file that a website (no matter what it is) keeps in its root directory on the hosting server. This file has one simple role: communicating with search engines. It is so important that, when indexing robots explore your website, it is the first file they look for and read, in order to know exactly what to do with your site.
So this file, simple as it is, is of great use. So much so that a single line can completely exclude your website from exploration by ALL search engines.
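The line in question is “Disallow: /” placed under “User-agent: *”. The short Python sketch below, again based on the standard urllib.robotparser module, compares it with an empty Disallow rule, which blocks nothing. The helper name is_allowed is ours, and the check only reflects what a crawler that honors the file would do.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, path):
    """Return True if `user_agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"

print(is_allowed(BLOCK_EVERYTHING, "Googlebot", "/any/page"))
print(is_allowed(ALLOW_EVERYTHING, "Googlebot", "/any/page"))
```

The two files differ by a single character, which is why this file deserves careful handling.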
Would you like to see your own? Simply go to your website and, in the address bar, add “robots.txt” (in lowercase) after the “/” that follows your domain name, and there you go!
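The same address can be derived from any page of a site, since the file always sits at the root. A minimal Python sketch, where the function name robots_url and the example.com address are placeholders of ours:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/my-post?ref=home"))
# prints https://www.example.com/robots.txt
```

Fetching the file itself is deliberately left out so that the example runs offline.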
Everything plays out in this file.
It is by inserting precise “instructions” into this file that you can tell crawlers what they may or may not read and index. The plugin in question, called Better Robots.txt, which had more than 10k downloads in 6 months, makes it quite easy to produce a robots.txt file optimized specifically for any WordPress site, by refining the indexing work that crawlers have to perform for most search engines and a large number of other entities.
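To give an idea of what such a file can look like, here is a small Python sketch that assembles one from a list of paths. It does not reproduce what Better Robots.txt generates. The helper build_robots_txt, the chosen paths and the sitemap address are illustrative assumptions, although excluding /wp-admin/ while allowing admin-ajax.php is a common WordPress convention.

```python
def build_robots_txt(disallow, allow=(), sitemap=None, user_agent="*"):
    """Assemble a robots.txt document from lists of paths (illustrative helper)."""
    lines = [f"User-agent: {user_agent}"]
    # Allow lines come first so that parsers applying the first matching
    # rule also honor the exception.
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    disallow=["/wp-admin/", "/search/"],
    allow=["/wp-admin/admin-ajax.php"],
    sitemap="https://www.example.com/sitemap.xml",  # placeholder address
)
print(robots_txt)
```

The resulting text would be saved as robots.txt at the root of the site, or entered through whatever tool manages that file for you.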
And it is free…
You now have the opportunity to optimize how your website is crawled, reduce your site’s ecological footprint and, at your own level, reduce the greenhouse gas (CO2) emissions inherent to its existence on the Web.
Either way, we all benefit.