You have just acquired Better Robots.txt PRO and are probably wondering how to get the most out of it. You made a good choice, and this article is here to help!
The first thing to do is a proper installation.
Here is how to install it on your WordPress website:
An optimal configuration of Better Robots.txt takes 4 or 5 steps, depending on how familiar you are with the robots.txt file. Before you start configuring, however, be aware of one thing: Better Robots.txt will create a virtual robots.txt file on your site if you do not already have one. If you have any questions about this, please check our FAQ.
First step: Instructions for search engines
The first task is simple: choose which search engines you want to crawl your website. It may seem paradoxical to make this kind of request through a file hosted on your own site. However, if you have not yet registered your site with search engines (through Google Search Console, Bing, etc.), explicitly allowing them to crawl your content is a good thing to do.
In this case, by default, these search engines will follow this basic rule:
This is the most standard base instruction for a robots.txt file. By selecting the search engines that matter to you, you at least make sure that these search engines are explicitly allowed to crawl your site. Here are the instructions that will be added to your robots.txt according to your choices:
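The plugin's exact output is not reproduced here. As a rough sketch, assuming it writes one standard `User-agent` / `Allow` pair per selected engine (that layout and the example.com URL are assumptions), Python's standard `urllib.robotparser` module can show how a crawler would read such rules:

```python
from urllib.robotparser import RobotFileParser

# Assumed output for two selected engines. "Googlebot" and "Bingbot" are the
# crawlers' real user-agent tokens; the rule layout itself is an assumption.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser whether each crawler may fetch a sample page.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/blog/")
bingbot_allowed = parser.can_fetch("Bingbot", "https://example.com/blog/")
print(googlebot_allowed, bingbot_allowed)
```

The same check can be run against your own rules by pasting the content of your robots.txt into `ROBOTS_TXT`.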
Second step: your Sitemap index
At this stage, there are three possible scenarios.
- Either you are already using the Yoast SEO plugin and have activated its Sitemap feature. In that case, you have nothing to do. If you see exactly the message displayed in the image above (in green), Better Robots.txt has already done the job for you: it has detected your Sitemap index (containing all sitemaps) and added its URL as the first line of the robots.txt.
- Or you use the Yoast SEO plugin but have not enabled the Sitemap feature. In this case, you will see a message asking you to activate it by going to Yoast SEO > General settings > Features. Once done, go back to the “settings” page of Better Robots.txt and you will see the same message as in the image above. Better Robots.txt will have detected your Sitemap index and added it to the robots.txt (check here for Yoast).
- Or, for reasons of your own, you do not use the Yoast SEO plugin and rely on another plugin to generate your sitemaps. In this case, simply find the URL of your sitemap, then copy and paste it into the “sitemap” field of the Better Robots.txt plugin. After saving, it will be added directly to the robots.txt.
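In all three scenarios the result is a `Sitemap:` line in the robots.txt. The sketch below assumes a site at example.com with a sitemap index at `/sitemap_index.xml` (both are placeholders to replace with your own values) and uses Python's standard `urllib.robotparser` to read the declared sitemap back:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with the sitemap index declared on the first line.
# The domain and sitemap path are placeholders, not the plugin's real output.
ROBOTS_TXT = """\
Sitemap: https://example.com/sitemap_index.xml

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() (Python 3.8+) lists every Sitemap URL found in the file,
# or returns None when the file declares no sitemap.
sitemaps = parser.site_maps()
print(sitemaps)
```

If `sitemaps` comes back as `None` for your own file, the sitemap URL has not been saved in the plugin yet.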
Third step: block “bad” robots (scrapers)
Better Robots.txt has identified nearly thirty robots considered malicious because they copy and republish your content illegally. These robots are known as scrapers, and it is strongly advisable to protect yourself from them. Better Robots.txt draws its information directly from a company with a solid reputation in the industry for analyzing and identifying the robots active on the Web (Distil Networks). When you activate the “Bad bot blocker” button, Better Robots.txt injects precise instructions into the robots.txt file telling these robots not to read the site’s content.
Here are some of the lines that will be added to the robots.txt file:
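The actual list of blocked robots is not reproduced here. As an illustration only, the sketch below uses a made-up bot name (`ExampleScraperBot`, hypothetical) and assumes each bad bot gets its own `Disallow: /` block; Python's standard `urllib.robotparser` then shows the effect on that bot compared with a regular crawler:

```python
from urllib.robotparser import RobotFileParser

# "ExampleScraperBot" is a hypothetical name used for illustration; the
# plugin's real list of blocked robots is not shown in this article.
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://example.com/my-article/"
scraper_allowed = parser.can_fetch("ExampleScraperBot", page)  # blocked bot
regular_allowed = parser.can_fetch("Googlebot", page)          # any other bot
print(scraper_allowed, regular_allowed)
```

Keep in mind that robots.txt is advisory: these rules only affect robots that choose to honor the file.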
With time and future releases (sign up for updates), Better Robots.txt will provide an increasingly detailed list of these malicious robots to offer you the best possible protection against any form of scraping. If you want to know more about these malicious robots, go to the FAQ section of our plugin, where you will find a detailed list of each of them and links to explanatory pages.
Fourth step: set a crawl-delay
Some robots, such as Google’s, apply their own rules to decide how often they read the content of your website (crawl budget); Google’s indexing robots do not recognize or follow the Crawl-delay directive. However, this is not the case for the majority of robots. Defining a general “Crawl-delay” can therefore, in certain circumstances, keep abusive robots from overloading your server (especially a less powerful one). By default, we recommend setting the value to “5”, which is generally interpreted as a number of seconds to wait between requests.
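As a minimal sketch, assuming the plugin writes the delay under a general `User-agent: *` block (an assumption about its output), this is how a robot that honors the directive would read the recommended value with Python's standard `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: a general Crawl-delay that applies to every robot.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay that applies to the given user agent,
# or None when no Crawl-delay is set. "SomeBot" is a placeholder name.
delay = parser.crawl_delay("SomeBot")
print(delay)
```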
Fifth step: personalized rules, for pros
At this stage, if you are not very comfortable with code in general and with the content of the robots.txt file, do not go any further. This step is about customizing your robots.txt by adding more specific rules. Why is it risky? Simply because the robots.txt is the only file on your site that speaks directly to indexing robots. The slightest error here can have dramatic consequences and destroy your ranking on search engines.
However, if you are comfortable with the robots.txt, and given that each website is unique, you may want to add rules that keep certain parts of your site from being crawled by search engines. To do so, simply use the editor created for this purpose:
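Before saving custom rules, it can help to test them offline. The sketch below uses illustrative rules (blocking the WordPress admin area while keeping `admin-ajax.php` reachable; they are an example, not the plugin's defaults) and checks a few URLs with Python's standard `urllib.robotparser`. The `Allow` line is placed first because this parser applies the first matching rule, whereas some search engines apply the most specific one.

```python
from urllib.robotparser import RobotFileParser

# Illustrative custom rules only; adapt the paths to your own site.
CUSTOM_RULES = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(CUSTOM_RULES.splitlines())

# Sample URLs on a placeholder domain, checked for a placeholder robot.
urls = [
    "https://example.com/wp-admin/options.php",
    "https://example.com/wp-admin/admin-ajax.php",
    "https://example.com/blog/my-post/",
]
results = {url: parser.can_fetch("SomeBot", url) for url in urls}

for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```

If a page you want indexed shows up as blocked, fix the rule before publishing it.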
After you have finished configuring the Better Robots.txt plugin, we advise you to clear the cache of your website so that you can immediately see all the changes made to the content of the robots.txt. To view the file, go to your website (“https://mysite.com”) and add “/robots.txt” after the URL, like this: “https://mysite.com/robots.txt”. The content of the file will then be displayed.
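The same check can be scripted. This is a small sketch using only Python's standard library; `robots_url` and `fetch_robots` are hypothetical helper names, and `https://mysite.com` is the placeholder domain used above:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site_url):
    """Build the robots.txt address for the site that hosts site_url."""
    parts = urlsplit(site_url)
    # robots.txt always lives at the root of the host, whatever the page path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url, timeout=10):
    """Download the robots.txt of a site and return it as text."""
    with urlopen(robots_url(site_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_url("https://mysite.com"))
```

Calling `fetch_robots` with your real domain after clearing the cache returns the file as robots will see it.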