Better Robots.txt


Hello and welcome. Today I'm going to show you how to create a robots.txt file and a sitemap.xml file for SEO purposes. The best way for me to demonstrate these two files is with an example. I've got www.rankya.com.au open; if I simply append /robots.txt to the URL, you'll see it is just a text file containing some user-agent directives. We need to put this information in the file because Google honors these directives; without them, Googlebot simply crawls and indexes everything it finds. What the file is saying to Googlebot is: "When you come to visit my website, I disallow you from crawling these particular folders." You can also disallow individual files, HTML files, PHP files and so on, not just folders. Why would you want to do that? Say you have a privacy policy page, or a terms and conditions page, which most websites do. They're important for my customers, but I don't really want to rank for them; for SEO purposes they're not that important to Google. Since we're in the business of Google ranking, we need to be intelligent about what we want to rank for and what we want to keep Google away from; that's why we have this directive targeting Googlebot specifically. If we don't, Googlebot will simply index all our pages. The other important element of a robots.txt file is the URL of your sitemap, which is why we'll also talk a little about the sitemap file, why it's important, and how to create proper sitemap files. There are many online sitemap generators, but as usual I'll give you the important knowledge behind creating them.
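As a sketch of what's being described, a minimal robots.txt along these lines might look as follows; the file names here are placeholders, since the actual pages on rankya.com.au aren't shown in the transcript:

```text
# robots.txt — uploaded to the root folder of the website
User-agent: Googlebot
Disallow: /privacy-policy.html
Disallow: /terms-and-conditions.html
```

An empty `Disallow:` line (or no robots.txt at all) would instead let Googlebot crawl everything.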
So you need a sitemap; it is intelligent for your Google rankings to have one. If you haven't got one, simply go online, do a bit of research, generate your XML file, and upload it to the root folder of your website, OK? Let's go step by step. When Googlebot comes and visits your URL, the first thing it does is look for this robots.txt file, and when it finds it, it's going to say, "Has it got any information for me? Hmm, it has," and it honors those directives. But since it's still reading, it also says, "Hmm, there's a sitemap here," see? It's now aware of that URL, so it will then visit it. Let me quickly open that particular file to show you what this is all about. It contains information in the form of XML, and what I'm saying here is: "I have a URL, which is my main URL; it was last modified on this date; it gets changed weekly; and it has a priority of 1.0." That's the highest value I can set, and since this is my home page, it is intelligent to treat it as my most important page, so I say so through that priority setting. OK? Now that's kind of important, and also, a lot of people still make an honest mistake: they think that setting changefreq to daily (which you can) is somehow intelligent for them. The only time you should set changefreq to daily is if you actually change your content daily. If you change it weekly, set it to weekly; if you change it monthly, set it to monthly. It's that simple.
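A minimal sitemap entry of the kind being described, with the `loc`, `lastmod`, `changefreq` and `priority` fields; the URL and date here are placeholders, not the actual values from the video:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com.au/</loc>
    <lastmod>2014-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```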
Also, as you can see, there is another page. Sure, rankya.com.au at this stage is a small website, and you may have hundreds of pages; nonetheless, sitemaps are important for you to create. Once again, when we look at this url set, we need to be smart and think: "Hmm, OK, my home page is the most important, but the way I've structured rankya.com.au, as far as I'm concerned this particular page is the second most important." So I set that priority to 0.9: that's my most important resource, this is my second most important resource, and the sitemap is where I say so, OK? And this next URL, I'm saying, comes after that as far as priority settings are concerned. If you scroll down, you'll see the way I've structured it: descending from 1.0 downwards. That's kind of important. As you can see, I don't have my contact page in the url set at all. There's nothing wrong with having it there, but we're in the game of ranking, so we need to optimize these files in such a way that they serve their purpose, OK? This is a quick example of what these settings are all about; we need to be intelligent when setting them. A lot of sitemap generators you'll come across will simply set every priority to 0.5, which is neutral, the baseline, the middle line. It's up to you to go ahead and start thinking a little: "OK, I have this website." Surely all your pages are important to you, but once again you're in the ranking game, right? So you've got to sit down and think, "Hmm, OK, I know all my pages are important, but which ones are more important than the others?" When you start thinking like that, you'll end up with some pages you set to a higher priority than the rest. That's very important.
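The descending-priority structure described above can be sketched in a few lines of Python using the standard library's ElementTree. The `build_sitemap` helper and the page list are hypothetical, for illustration only; they are not from the video:

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """Build a minimal sitemap.xml string from (loc, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = priority
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")

# Pages ordered from most to least important, as the video recommends
sitemap = build_sitemap([
    ("https://www.example.com.au/", "weekly", "1.0"),
    ("https://www.example.com.au/services.html", "weekly", "0.9"),
    ("https://www.example.com.au/blog.html", "monthly", "0.8"),
])
print(sitemap)
```

The resulting string can then be written to a `sitemap.xml` file and uploaded to the site's root folder.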
And once again, you can simply Google "robots.txt Google webmasters" to find a little more information about this particular file, and you can also visit the sitemaps.org website and read more there, because these are important elements for Google ranking. Surely they're not the only part of SEO; nonetheless, having these simple procedures set up properly will only benefit your Google rankings, OK? You may also come across information telling you these files are not important. I promise you and I assure you: do not listen to misinformation like that, because they are important. I hope you have enjoyed watching this video, and if you would like to find out more about search engine optimisation methods, simply visit us at rankya.com.au. Thank you very much.

How to create a robots.txt file for Google. Hello and welcome back again. As we now know, it's important to make Google's crawling process easy, so it can find what we have on our site, and that's why we use the robots.txt file. I'm looking at the Google Webmaster Tools Help section, which actually tells you more details about the different user agents Google uses on the internet, so you can read up there. So let's go quickly: let's first copy this couple of lines of text here, and then all we have to do is find our root folder, as we've been doing, open up Notepad (with Windows it comes prepackaged, as we know), paste in what we've copied, and Save As "robots.txt", OK? Simple as that; now we've created one, and as you can see we have one now, right? So let's open it up and ask: why are we using this? Now we know why. What you're seeing on the screen basically tells Googlebot: "You know what, Googlebot, when you come to my website, I disallow you nothing," so it can crawl and index everything, OK? But what if, what if I put a forward slash
here, save it, and then upload this robots.txt file to my server? With just that one line I've basically told Google not to crawl my entire website. I've seen this happen, so don't make the honest mistake of adding that forward slash and uploading, because that's not how it works. If you want Google to index everything, simply leave it as you're seeing on the screen, and that's it, OK? But what if you want to block Google from crawling, let's say, a private folder? If you want to do that, then you put the forward slash and give the name of your private folder. If you then upload the file as you're seeing it right now, you're telling Google: "OK Google, you know what, you can index my entire website, but whatever is within this folder, don't index it." That's how you use the robots.txt file if you want to block certain things; in this example it's the entire folder, and whatever is within it will not be crawled or indexed. But you can also target a specific file: let's say within that private folder you have a private HTML file, and you don't want Google to crawl and index just that one; then you can use it that way as well. So that's how we create and use this file when we want to block certain things. But since in this case we shouldn't want to block anything, let's save it in the default form, and then it will crawl and index everything. Now, as we've seen in the previous video session, we can actually say: "OK Google, you know what, now that you've come to fetch the robots.txt file from my server, I'm going to make your job easy and actually tell you where my sitemap is located." So let's open up that file in Dreamweaver and save this. What you're saying now is: "OK Google, I allow you everything, and here's my sitemap; go and get it." Then you put this online; once it's uploaded
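The variations walked through above, each shown as a separate robots.txt (these are four alternatives, not one file; the folder and file names are placeholders):

```text
# 1) Allow everything — the default (note: nothing after "Disallow:"):
User-agent: *
Disallow:

# 2) Block the ENTIRE site — the one-character mistake the video warns about:
User-agent: *
Disallow: /

# 3) Block one folder, or one file inside it:
User-agent: *
Disallow: /private/
Disallow: /private/secret.html

# 4) Allow everything, and point crawlers at the sitemap:
User-agent: *
Disallow:
Sitemap: https://www.example.com.au/sitemap.xml
```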
in your hosting account: simply locate your file manager, and the file has to be inside the public_html folder, as you can see, for it to work. Now, one more thing before we close this video session: let's say you showcase photographs and your online business revolves around them. If that's the case, it's only intelligent for you to create an image sitemap, right? You can actually give two different sitemap locations to Google in your robots.txt file. And what if your website revolves around video tutorials and so on? Then why not create a sitemap for your videos, because Google can understand all that stuff, right? Making Google's job easy is the aim we should have when we want to rank for our keywords, and that's why we use the robots.txt file: we're making Google's job of understanding our website easy. Thank you very much; I'll talk to you in the next video session.

>> CUTTS: Okay. I wanted to talk to you today about robots.txt. One complaint that we often hear is, "I blocked Google from crawling this page in robots.txt, and you clearly violated that robots.txt by crawling that page, because it's showing up in Google search results." A very common complaint, and so here's how you can debug it. We've had the same robots.txt handling for years and years and years, and we haven't found any bugs in it for several years, and so most of the time what's happening is this. When someone says, "I blocked example.com/go in robots.txt," it turns out that the snippet we return in the search results looks like this. And you'll notice that, unlike most search results, there's no text here. Well, the reason is that we didn't really crawl this page. We did abide by robots.txt. You told us this page is blocked, so we did not fetch this page. Instead, this is an uncrawled URL. It's a URL reference. We saw a link to it, but we didn't fetch the page itself.
And so, because we didn't fetch the page itself, that's why you don't see a description or some sort of snippet right here. It's kind of interesting, because people often ask, "Well, why do you show uncrawled URLs? What's the possible use case for that?" And let me take you over here. At one point, the California Department of Motor Vehicles, which is www.dmv.ca.gov, had a robots.txt that blocked all search engines. Now, these days pretty much every site is savvy enough, you know, but at one point the New York Times and eBay and a whole bunch of different sites used robots.txt this way. So if someone comes to Google and they type in "California DMV", there's pretty much one answer, and this is what you want to be able to return. So even though they were using robots.txt to say, "You're not allowed to crawl this page," we still saw a lot of people linking to it, and they had the anchor text "California DMV". So if someone comes to Google and does the query California DMV, it makes sense that this is probably relevant to them, and we can return it even though we haven't crawled the page. That's the particular policy reason why we can sometimes show an uncrawled URL: even though we didn't fetch the URL itself, we still know, from the anchor text of all the people that point to it, that this is probably going to be a useful result. Now, the interesting thing is, suppose you have a site like Nissan. For a long time Nissan, and also Metallica, used robots.txt and blocked their sites from being crawled. This was years and years and years ago. Again, what we found is that we could go and find information in the Open Directory Project, where Nissan and metallica.com were both mentioned. And so sometimes you'll see a snippet that looks almost like it came from crawling, but the description does not really come from crawling the page; it comes from something like the Open Directory Project.
So we are able to return something that can be very helpful to users without violating robots.txt, by not crawling that page. Now, if you truly don't want a page to show up, one of the best things you can do is let us crawl it and then use a noindex meta tag at the top of the page. When we see a noindex tag, we'll drop the page from our search results completely. Another option you have is the URL removal tool: if you block a site completely in robots.txt, then you can use the URL removal tool and remove the entire site from Google's index, and it will never show up that way either. But it turns out that, for users, being able to return these uncrawled URLs can be very useful. That's the reason why we do it, and most of the time, probably 90% of the time, when someone says, "You're violating my robots.txt; you've clearly crawled these pages," what's really happening is that we're returning that uncrawled URL reference. And so that's what's going on; it's not that we've crawled those pages. So those are a couple of easy ways, if you don't want your site or your page to show up: you can block us in robots.txt and use the URL removal tool, or, on all the different pages, you can use a noindex tag, and then once we crawl a page and see the noindex tag, we'll drop that page from our index completely.
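The noindex meta tag described here is a one-line addition to the page's HTML head. Note the precondition from the talk: the page must remain crawlable (not blocked in robots.txt), or the crawler never sees the tag:

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- Tells crawlers to drop this page from their index entirely -->
    <meta name="robots" content="noindex">
    <title>Private page</title>
  </head>
  <body>...</body>
</html>
```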
How to fix "URLs blocked by robots.txt" errors. Hello and welcome back again. In this video session we're going to explore how we can easily fix the error messages we end up getting when we log in to Webmaster Tools. We have a tester here called the robots.txt Tester. This is a great tool where you can quickly see whether a particular URL is being blocked, and you can actually test against different user agents; for general web search it's Googlebot. Let's press Test. As you can see, this directive here is causing the block: it has stopped Google from crawling the page we're seeing in this example. So what can I do here? How can I fix this? Well, I can log in to my web hosting manager, go to the file manager, locate the robots.txt file, go to Edit, and simply delete that directive. Now, once I've deleted it and saved, the next time Google comes and crawls my website everything will be OK; this URL won't be blocked anymore. But what if, what if you don't want Google to access what's within that folder, yet you do want Google to access one particular page? If that's the case, you can use an Allow directive and say: "You know what, I disallow you everything within this folder, but I allow you this URL." Now, if I test it, as you can see, that URL won't be blocked anymore either. So right now you have two different options: you can either delete that directive altogether, or you can keep it and give the path using the Allow directive. You simply choose either one and then update your file: if you want to delete it altogether, delete it altogether; if you want to keep it but simply allow access to that URL, then update your robots.txt file accordingly. Both methods will fix the error. But what if, when you log in to your web hosting account, you end up not seeing that file? Well, if that's the case, then a plugin you're using is creating the robots.txt file dynamically, on the fly so to speak, so
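The fix described above (disallowing a folder while allowing one URL inside it) can be checked offline with Python's standard-library robots.txt parser. The folder and page names are made up for illustration, and note one assumption: Python's parser applies rules in file order (first match wins), so the Allow line is placed before the broader Disallow here, whereas Google's own parser uses longest-path matching:

```python
from urllib.robotparser import RobotFileParser

# The rules under test: block a folder but allow one page inside it.
rules = """\
User-agent: *
Allow: /private/allowed-page.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The folder stays blocked...
print(parser.can_fetch("Googlebot", "https://www.example.com.au/private/secret.html"))
# ...while the single allowed page is fetchable.
print(parser.can_fetch("Googlebot", "https://www.example.com.au/private/allowed-page.html"))
```

This makes a handy sanity check before uploading an edited robots.txt to the server.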
you can't really go in and modify it from your web hosting account. The next option you have, when your content management system is using a plugin, is to either go into the plugin's settings and tweak them so the URL is not blocked, or simply remove the plugin and create the file manually, because it's just a text file. Thank you very much for learning with me. If you benefited from this video session, please share it, like it, and give me a comment, so that other website owners can also benefit from it. I'll talk with you in the next video session.

Hello, this is Cindy Sponder with Niche Blogging for Profits, and what I'd like to show you today is how to use Google Webmaster Tools to improve your robots.txt file, in case you notice that the robots are searching things you don't want them to. So you would just go to Webmaster Tools (you can Google "Google Webmaster Tools"), put in your Gmail account, go ahead and sign in, and then you'll see any of your websites that you have actually set up, verified, and given a sitemap to. I'll just pick one of these; it doesn't really matter. And if you go in here to Tools, you'll see that you're allowed to generate a robots.txt file. What you'll want to do, if you have certain files that you do not want the robots to index, let's say a PDF file that you're selling and you don't want the robots to find or index, is put "block all robots", and let's just say you've got it in a folder called private PDF, OK? So you just put that in and click Add, and now it's saying: user-agent all robots, allow everything except private PDF. And you can continue down through there: if you see that they're looking at your images, and you really don't want them to waste your bandwidth on that, you would say block all robots, and let's say you do have your images in a folder called images; you could just add that, and you can continue to add things as you
like for disallowing, and that way the robots won't index them, and they won't be using up your bandwidth indexing things that nobody wants, things you don't want them to have access to in the first place. Then what you do is just copy this, once you've got everything the way you want it; simply copy it, put it into Notepad, paste it, and then save that file. Let's just put it here and call the file robots.txt. And once it has been saved as robots.txt, you would simply upload it using your FTP account, just upload it to your web space in the root directory; I've got another video that shows you how to do that if you don't know how. That way, if there are areas of your website that you do not want indexed, and do not want people to know are there, you're able to do that. Let's say you're developing a part of your website and you don't want it indexed just yet because it's not done: then you could disallow it for now, and go back in and allow it later. So there are lots of different things you can do there. There are lots of really cool tools at Google Webmaster Tools, and I would recommend that you go in there and kind of explore around and see what all the different possibilities are.

Hi everybody, it's Steven here, and today we're going to have a quick and easy tutorial on how to change the robots.txt file using the Yoast SEO plugin for WordPress. As most of you probably already know, Yoast is one of the best SEO plugins on the market for WordPress websites, if not the best one; it is the best one, in my opinion, and with Yoast you can basically button up pretty much every on-page and technical aspect of your website. I have a video as well where I show how to add a robots.txt file using cPanel, and how to edit the file using cPanel too, but if you're planning to use Yoast, or are already using it, it will be much easier for you to
create the robots.txt file using the plugin. So first of all we need to log in to our main dashboard. I've set up a test subdomain and installed WordPress on it as well, so let's get started. Once we've logged into the dashboard, we navigate to the SEO (Yoast) settings and click on Tools, then go to the File Editor, and we'll be given two options here: to edit the robots.txt file, and to edit the .htaccess file. The one we need is robots.txt, but what it says here is that we don't have a robots.txt file, so we can create one here. We just press Create, and that generates the default robots.txt file, which you should really change. What I usually advise, and what I always do for my clients, is to add the location of the XML sitemap; so in this case we add the sitemap.xml URL for the test subdomain and save it. That is pretty much it; it's pretty straightforward and takes a few minutes to do. Also, if you're unsure of how the robots.txt file works, and what to block and what not to block, I've recently published a beginner's guide to the robots.txt file on my main website; I'll put a link in the description below. Let's do a quick test to see whether we got the file installed correctly, and here we go: we've got the sitemap.xml line here, which is pretty good. We've also got a crawl-delay line; if you've watched my previous video about robots.txt, you shouldn't worry about the crawl delay, because this line is put there automatically by web hosts (the majority of them just add it by default to control and balance the bandwidth of the server), but it is disregarded by Google, so you shouldn't worry about it. And that is pretty much it for today's video. I hope you find it useful; it's a quick and easy tutorial. Thanks for that, and bye. >>Matt Cutts:
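After Yoast's edit plus the host-added line, the resulting file might look something like this (the domain is a placeholder); as the video notes, Google ignores Crawl-delay, so the line is harmless:

```text
User-agent: *
Crawl-delay: 10

Sitemap: https://test.example.co.uk/sitemap.xml
```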
Today's question comes from Blind Five Year Old in San Francisco, who wants to know, "Can I use robots.txt to optimize Googlebot's crawl? For example, can I disallow all but one section of a site, for one week, to ensure it is crawled, and then revert to a 'normal' robots.txt?" Oh, Blind Five Year Old, this is another one of those "Noooooo!" kind of videos. I swear I had completely brown hair until you asked this question, and then suddenly grey just popped in [fingers snapping] like that. That's where the grey came from, really. So, no, please don't use robots.txt in an attempt to say, "Shunt Googlebot all the way over to one section of a website, but only for a week." Although we try to fetch robots.txt on a daily basis, or once every few hundred fetches, to make sure we have an accurate copy, weird things can happen if you're trying to flail around and change your robots.txt really fast. The other thing is that it's really not the best mechanism to handle this; robots.txt is not the best way to do that. Suppose you want to make sure a section of, say, ten pages gets crawled well. It's much better to take those ten pages and link to them from your root page and say, "Hey, our featured category this week is red widgets instead of brown widgets, or blue widgets," and then just link to all ten red widget pages. That's because most of your PageRank typically comes into the root page of your site, since most people link to the root of your website. If you put the links to the pages you care about right up front and center on that root page, then PageRank flows more to those pages than to the rest of the pages on your site, which might be five or six or seven links away from your root page. So, what I would say is: you could try using robots.txt, but I really don't think it would work.
You would be much more likely to shoot yourself in the foot by trying to jump around and swap out a different robots.txt every week. What's much better is, instead, to work on your site architecture: rearchitect things so that the parts of your site where you would like more PageRank and more crawling are linked to more directly, or more closely, from your root page. That will lead Googlebot more into that part of your site. So please don't try to swap different robots.txt files in and out, saying, "OK, now you're going to crawl this part of the site this week, and this part of the site next week." You're much more likely just to confuse Googlebot, and Googlebot might say, "You know what? Maybe I just won't crawl any of these pages. This seems very strange to me." So that's the other way I'd recommend: change your site architecture and make your site more crawlable that way.

What is going on, ladies and gentlemen, welcome back. In this video I'm going to start talking to you guys about Google's Webmaster Guidelines. Now, Google's Webmaster Guidelines are essentially just guidelines, made by Google obviously, for making a quality website, and they have a couple of different ones; I'm going to talk to you guys about all of them, but for right now I just want to show you this really cool tool I found online, varvy.com (I guess that's how you pronounce it). What you can do is paste in the URL of your website and test it, and what this is going to do is look at your website and go through and test for every single guideline. So this one's good, this one's good; whenever you see an X, it's basically saying, "Hey, this website doesn't follow this guideline." So my website right now doesn't have a sitemap, and we'll talk a little bit about what that is, but essentially it's a really useful tool, and it's
free, so that's pretty cool. As we can see, there are a bunch of different guidelines: robots.txt, a sitemap, making sure you have alt tags in your images, yada yada. So let's go ahead and break it down, and we'll just start with the very first one for this video, which is robots.txt. Basically, any time you make a website you should have a robots.txt file. What this is, is basically a plain text file, and it goes in the root directory, right off the home directory of your website. So whenever you're making one, just go ahead and right-click, New File, and name it robots.txt. All right, so what the heck is this? It's basically a file that instructs bots on what they're allowed to crawl and what they're not allowed to crawl. Usually, by default, what's going to happen is you'll have some search engine bots, whether from Bing, Yahoo, whatever, and they'll just start crawling your website, every single page they can possibly find. However, chances are you have some resources or web pages that you don't want crawled: for example the admin page, or maybe the moderator panel, or maybe just some, I don't know, nude pictures that you have online (OK, those are actually for me), so can you leave these out of your search results? Basically, the way it works is really simple; you only need to remember two things. The first is the user-agent. What you can do is write a star, and that means these rules I'm going to give apply to all search engine bots. However, you can also do something like this: say you only want to make some rules for, like, Google's search engine bot; well, the name for that is Googlebot, so you can use that. But the majority of the time you just say "User-agent: *", which means "I'm making these rules for all search engine crawlers, or bots, whatever you want to call them." Now, after this line (let me pull this up again), you essentially have a bunch of these
Disallow lines. Now, Disallow basically says, "All right, you're allowed to crawl every single web page except the ones I'm telling you about." If you just want the search engine bots to ignore an entire directory or folder, what you can do is write it like this; whenever you write this, it means, "Hey, don't look at anything in my private directory; those are, like, my personal things, and I don't want them indexed in your search engine." Now, if you only want to say, "Hey, can you ignore one specific page," you can do something like "private/newpics.html", or you can say, like, "admin.html". And if you have multiple rules, say we wanted to disallow admin, and the entire private directory, and, I don't know, a personal passwords text file, then this says, "Hey, whenever you're on my site, leave these out." So again, it's just basically instructions on what the search engine bot is not allowed to crawl. Now, one other thing, and this is actually a really important thing I want to point out: keep in mind that this file is actually used a lot by hackers and malicious people. Why is that? Well, basically, whenever someone goes to your site, they say, "Hmm, I wonder if they've got any, you know, sensitive areas, any web pages on here that they really don't want the public to know about." What they do is just look for a robots.txt file, and again, the majority of the time it's just going to be used by search engines, but it's actually readable by anyone. So if they see a file or some directory in here like passwords, or database, or personal stuff, they're pretty much just going to go to it. So again, make sure that you never upload any sensitive information, and also just be aware that hackers may look for these pages; for a lot of people who don't know this, it's basically like giving hackers the blueprint of what to attack on your website. So holy moly, hot tamale, there you go: robots.txt. And since we have a
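Putting the rules above together (the file and folder names are the tutorial-style placeholders, not real paths), and keeping in mind the warning that this file is publicly readable, a multi-rule robots.txt looks like:

```text
User-agent: *
Disallow: /admin.html
Disallow: /private/
Disallow: /passwords.txt
# Remember: anyone can fetch this file, so it also advertises
# exactly where the "hidden" things live. Never rely on it for security.
```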
little bit of time, I'll show you guys this: I created a Patreon account (I think that's how you pronounce it). This is basically for donations, if you guys feel like donating; my goal for the end of the week is to raise $25, and by the end of next week I want to raise 7 billion dollars, so yeah, we'll see how that goes. Also, a cool little thing: I did an interview for a website called Human Fox, and they basically asked me how I got started programming, about my education, and what my dreams and passions are, some pretty cool stuff, so if you guys want to check that out, feel free: humanfox.com/capsule/bucky. So there you go, we learned a little bit about robots.txt files, and in the next videos we're going to be covering the rest of these guidelines. It's going to be shweet, so I'll see you guys next time.

Hey, Claude Millan here, Affiliate Starting Line, welcome. This is a continuation of the series on the Google Search Engine Optimization Starter Guide, and this short video is going to be about making effective use of robots.txt. Now what is that? Well, you're going to have certain areas on your site that you may want restricted from being crawled by the bots: by Googlebot, and obviously by some other, maybe nastier, spamming-type bots, robots. So how do you do that? Well, you do that by creating a file called robots.txt, and as it says here, a robots.txt file tells search engines whether they can access, and therefore crawl, parts of your site. So anywhere you put in a certain piece of code that tells a bot whether it can go in there or not, you can determine what you want seen on your site by the robot and what you don't. That's what the purpose of a robots.txt is. You may not want certain pages of your site crawled because they might not be useful to users if found, or they may have information you don't want seen by the bots. If you want to prevent search engines from crawling your pages,
Google Webmaster Tools has a robots.txt generator to help you create this file, and if you're in WordPress you can install a plugin; I'm going to show you how to do that shortly. So that's how you do it: you want a robots.txt file on your website so you can control what the robots see and what they don't see. Now, down here under best practices it does say to use more secure methods for sensitive content. Just because you put a disallow on a piece of your site and the robot doesn't crawl it doesn't mean search engines won't index that URL; indeed, they may index the URL without even showing the content. So if you have sensitive content and really want to make sure none of it can be seen, you want to find other ways of storing that content elsewhere, or other means of protecting it on your site. It doesn't tell you here how to do that, so let's see what we're talking about. Okay, I'm in my blog, WordPress Trainer; let's see how we set this up. What you want to do is come down to Plugins, click Add New, and just type in robots.txt, and here are all your selections. Now, I've looked at a few of these, but the best one right now, the five-star one that I use, is this one, PC Robots.txt. Click Install Now, activate it, and you're done. So we come down here to your plugin list, and there it is, PC Robots.txt. If you come under Settings you'll see it right here; click on it and the settings will show you what's going on. Now, here's how this works in the vernacular of the robots.txt file. User-agent is where you name the bot a rule applies to; here you're saying that you do not want LexiBot to be allowed to come into your site and scan anything. So Disallow means just that, disallow LexiBot from scanning, and, counterintuitively, the disallow has to be followed by a forward slash, so if you see the forward slash
that means the disallow is activated and it's not allowing this bot in, not that one, not this one, not this one. When you go and read the creator of this plugin, he'll tell you that he has added a list of what he thinks are nasty, spammy-type bots, bots that are looking for information just to create spam, and he's added them to the disallow list. So the fact that you see a forward slash after a disallow means none of these bots here are going to be allowed in. Now, as you come down here, and we'll go down to the bottom of the list, you can go check whether these are bots that you like or don't like, but this is the default. If you look at the bottom, here we go: User-agent: AppSpot-Google has a Disallow alone, with no forward slash, and the same thing for Googlebot. This means these bots are allowed in, and I suppose bots that are not on the list are also allowed in, so you can add certain ones yourself; I'm thinking of Yahoo and Bing. Here it's telling you which directories on your website are going to be part of the robots.txt, meaning the directories that you are not going to allow the bots to crawl, and these are kind of standard folders that you don't need to have crawled. So these are the default settings; you click Save Changes if you make any changes, otherwise these defaults stand, and your robots.txt code is placed everywhere it has to be placed in order to make this work. So that is it, that's how you set it up. If you want more information on how robots work, you can come here to robotstxt.org, and it'll explain to you how the system works and how the code works, and the official Google Webmaster Central blog explains how the robots work.
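The slash semantics just described can be summarized in one hypothetical file; the LexiBot rule mirrors the plugin's defaults, the rest is illustrative:

```
# "Disallow: /" keeps this bot out of the entire site
User-agent: LexiBot
Disallow: /

# "Disallow:" with nothing after it lets this bot in everywhere
User-agent: Googlebot
Disallow:
```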
It also covers how robots are blocked and how you remove pages using the robots.txt file; that again is at the webmasters site, Google Webmaster Central. So that's it, that's how the robots.txt file works. I hope this has been helpful. This is Claude Polana at Affiliate Starting Line; stay with it, stay well, and we'll talk to you soon.

URLs blocked by robots.txt file for WordPress websites: the error shown in Search Console. In this video session I'm going to show you the steps you should take to remedy URLs blocked by the robots.txt file. This morning, working on a valued client's site, they ended up seeing these URLs blocked by robots.txt in Search Console, under the Crawl section, in Sitemaps here. Now, these warnings can occur whether a site is well maintained or not, so if you're seeing things like that, let's go and find out what we need to do and identify methods to fix it. What I've done is logged out of my client's Search Console property for privacy reasons and logged in to mine. Now, the first thing to do when you're seeing that warning message is to actually go to the robots.txt Tester. Let's imagine you have a sample URL shown in Search Console, such as abc-page.html and so on, or just a forward slash. We'll be focusing on WordPress, and we'll take things one step at a time. You should test different user agents depending on the type of setup of your website, but for web search it's Googlebot. So let's imagine this scenario: the sample URL was something like this, and the robots.txt Tester shows you the directives within the robots.txt file, as an example. Then you can just check that URL pattern on your WordPress site to take a look at what you're seeing, go back to the tester in Search Console, and start testing and experimenting to see: do I have any directives within the robots.txt file that are actually blocking certain URLs? Okay, so that's the first step we should take. The
next part is Fetch as Google; that's another feature you may want to explore. In that example we can safely fetch and render, so let's go fetch and render that non-existent URL to see what happens. As you can see, Fetch and Render may also give you hints, as in, you know what, that's blocked, and then it's telling you to go back to the robots.txt Tester. So you need to utilize Search Console and the tools available to you to identify how those URLs are blocked. Now let's imagine we've got this sitemap page; because it's only partially rendered here, the Fetch as Google feature may also show certain things being blocked, and in this example it's actually an image. This is actually the most confusing part for most website owners. Let's imagine your WordPress site is using a popular SEO plugin like Yoast and you have the XML sitemaps functionality enabled. If you go to your sitemap file and look at the URLs in it, you think, you know what, I can visit all of these and Google can see them, so why is the Sitemaps feature complaining that URLs in the sitemap are blocked? You're looking at your sitemap and saying, you know what, these URLs are not blocked. Particularly if you're using an SEO plugin, what you should be doing is pressing Ctrl+U on your keyboard to view the source. Then you'll see that the Yoast XML sitemap functionality actually includes URLs for your images as well. So you need to double-check the location of your images, because you might not be blocking the normal posts or pages or products, but in your sitemap you may have URLs pointing to images, or perhaps videos, and they may be blocked by robots.txt directives. Makes sense? So this is the kind of hidden place that most website owners do not check. Don't be thinking that URLs blocked by robots.txt are specific to your normal permalinks, because they are not; they can also be for your images or any other sitemaps
that you may have. Okay, so keep that in mind. Now, what if you end up doing your tests and you end up seeing a directive that's blocking something? In this example, this particular directive was blocking Google from accessing the part that's been blocked. What you can do then, especially if you're using Yoast, is go to Tools, go to File Editor, locate that particular line as shown in the tester, just delete that line, and save the changes to the robots.txt file. So you can follow that option. If you are not using Yoast, you would need to log in to your web hosting, find the robots.txt file, and modify it accordingly so that Google's crawlers are not blocked, because it's very important for Google to access your web pages, and your images as well, because they play an important part of a normal web page. Okay, so when you see URLs blocked by robots.txt for sitemaps, don't just look at the front end, but look at the source code and ask, okay, do I have any images being blocked? The robots.txt Tester is your friend, so you can start experimenting, and you can also use Fetch and Render as Google to see what else is being blocked. Because, in this example, if you're not seeing this image, it's been blocked in the Fetch as Google feature; well, sometimes you can't really do much about that, because it's on an external site that you have no control over, and in this example it's coming from Facebook. If I request that URL now, it's been redirected somewhere, so let's go to the main root of that domain and fetch that particular URL to see what's there, and as you can see, that's Facebook. So then, if we look at the robots.txt file for Facebook, you can see they are blocking certain things which may be on your website, such as Facebook plugins, share buttons, all that stuff. Okay, so you really can't do much about that, because you don't control the robots.txt file for Facebook. Makes sense? So when you see
that, don't be alarmed, thinking, oh, that's blocked, what do I need to do? You can't do anything for that particular example if it's blocked on an external site; you can only control your own end, as in your own website, and what's blocked there in your sitemaps. Whether you're using Yoast or not, always double-check the source code to see which URLs in there Google is being blocked from, and then triple-check the robots.txt file to remedy it, perhaps even removing certain lines and directives as needed. And then remember, you want Google to access your website if you want Google to rank your web pages. I thank you very much for learning with me. If you benefited from this video session, please do give it a like and share it, and if you've got any questions and comments, use the commenting section of this video, and I'll talk with you in the next videos.
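The kind of checking done in Search Console's robots.txt Tester can be roughed out locally with Python's built-in urllib.robotparser; the rules and URLs below are hypothetical examples, not taken from any real site:

```python
# Sketch: test whether given URLs would be blocked by a set of robots.txt rules.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /wp-content/uploads/",  # e.g. images referenced by a sitemap
]

parser = RobotFileParser()
parser.parse(rules)

# A normal post is crawlable, but an image in the blocked folder is not:
print(parser.can_fetch("Googlebot", "https://example.com/sample-post/"))              # True
print(parser.can_fetch("Googlebot", "https://example.com/wp-content/uploads/a.png"))  # False
```

This is how a sitemap full of visitable page URLs can still trigger "blocked" warnings: the image URLs the sitemap also lists may fall under a disallowed path.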

hello and welcome back to the SEO crash course for WordPress users. This video is about robots.txt and how you can edit the file with WordPress. You need to use a robots.txt file to specify which sections of your site should, and which should not, be accessible to search engines. For example, you don't need the wp-admin directory to be crawled and indexed by search engines, because it's intended for internal use only. A robots.txt file is a plain text file, and it should be placed in the root directory on your server, which means you need to place it in the same folder where you have your website's files and folders on the server. You need to call it specifically robots.txt, otherwise it's not going to work. So if you go to webdesy.com and add /robots.txt, you're going to see the content of the file over here, and as you can see, it's right in the root directory on my server. Okay, the thing is, WordPress uses a virtual robots.txt file. That means you won't find this file on your FTP server for editing or whatnot, because it's created dynamically each time a user visits your site. Though it's visible if you add /robots.txt to your site URL, as you can see here, it's still not available on your server if you try to find it with the help of an FTP manager such as CuteFTP, FileZilla, or Cyberduck. In case you want the option to edit your robots.txt file manually, you should install the WP Robots Txt plugin, or any other plugin that works that way; it's going to allow you to edit your robots.txt file right in your WordPress dashboard. So let's install the plugin and see how exactly it works. While in your WordPress dashboard, go to Plugins and select Add New, now just type in the name of the plugin, which is WP Robots Txt, and hit the Search Plugins button. Having found the plugin, just click the Install Now link. You should get a pop-up window that double-checks whether you really want to install the plugin; just click OK, and now just click the Activate Plugin option. At this
point you can expand the Settings drop-down menu and select Reading. Now just find the robots.txt Content text field; what the field contains is the content of your actual robots.txt file. As a matter of fact, your code can be a bit different, and that depends on the way you installed your WordPress. For example, if you installed WordPress in a subdirectory on the server, it can look like this: as you can see, it's in the webdesy-wp directory, and then we have the wp-admin directory, so we're just specifying the path to this directory, and the same holds true here, the wp-includes directory is inside the webdesy-wp directory. I'm just going to delete the subdirectory part. Anyway, it should block the following directories from indexing by default: wp-admin and wp-includes. Though the default settings are workable, it's still best WordPress SEO practice to modify them just a bit, so that your robots.txt file looks as follows. The first line shows exactly which crawling robot you want to target, and an asterisk means that you want to target all of them; in other words, you're saying, hey, all of you search robots, act as follows. Alternatively, you can point to specific crawlers such as Googlebot, Rogerbot, etc., so instead of "User-agent: *" you would have "User-agent: Googlebot", and that's how you can target Google specifically, but I'm going to use the default setting for the first line. The rest of the code just disallows access to specified directories such as feed, trackback, and so on. Since you want the option to rank in search engines with the content that is in the uploads directory, actually wp-content/uploads, the second-to-last line allows access to that subdirectory, so that, again, you can rank with the content of the directory, such as images or whatnot. And the last line just points to the location of your sitemap.xml file, which Google and other search engines use for properly crawling your site.
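The exact on-screen code isn't captured in the transcript; a common variant matching the description (all agents targeted, feed/trackback/admin/includes disallowed, uploads allowed, sitemap declared, with a placeholder domain) would be:

```
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-content/uploads/

Sitemap: http://example.com/sitemap.xml
```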
In case you want to fine-tune your robots.txt settings, let me explain how to do it. Before we go any further, make sure that your robots.txt file does not say the following: "Disallow: /", because it means that you disallow access to your whole site, so search engines will index nothing. Just keep in mind that you don't want that line in your robots.txt file. Speaking of which, take a look at this nerdy SEO joke: there's a girl, and there's a guy, and some content from a robots.txt file. It says "User-agent: lame guy", and below this guy, "Disallow: /", which means everything; in other words, she disallows everything to the guy. Hopefully this joke can help you better understand how "Disallow: /" works. Okay, let's move on. To target a specific directory, just enclose its name in slashes, for example "Disallow: /wp-content/"; this way you disallow access to all the content in the wp-content directory. In order to target a specific file, you just need to define a path to the file along with its name; for example, just add your-file.php, so "Disallow: /wp-content/your-file.php" is how you can disallow access specifically to the your-file.php that is located in the wp-content directory. You can actually point like that to all sorts of files: it can be an HTML file, it can be an image file, for example a PNG, it can be a CSS file, pretty much all kinds of files. That said, I'd like to mention one really widespread issue: you may need to disable dynamic URL indexing. A dynamic URL is one that contains a question mark. Such URLs can cause all sorts of SEO issues, duplicate content, duplicate page titles, etc., and that's actually why you usually want to stop search engines from indexing pages with such URLs. You can easily do it with the help of the robots.txt file, just by adding the following line: "Disallow: /*?". Thus you disallow access to your dynamic pages, and search engines just will not index them.
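Putting the three patterns just described together in one hypothetical file:

```
User-agent: *
# Block everything inside a directory:
Disallow: /wp-content/
# Block one specific file (your-file.php is the video's placeholder name):
Disallow: /wp-content/your-file.php
# Block dynamic URLs, i.e. anything containing a question mark:
Disallow: /*?
```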
So before I wrap up, I'll just paste in the recommended code for the WordPress robots.txt file again; that's the copy that you're supposed to use. With that done, you need to save your changes by clicking the Save Changes button. So now you know why you need your robots.txt file and how to use it. In case you're watching this video on YouTube, feel free to subscribe to the channel, just click Subscribe. Also, you may want to share this video on Facebook, Twitter, Google+, and whatever else works for you. Other than that, feel free to leave your comments or ask questions; I'll be more than happy to assist you. In case you want to create a high-quality website, and you want to do it fast, you can just click on the link that you can see on your screen now; it's going to redirect you to webdesy.com. Now click the ThemeForest banner; say you have a WordPress site, just select WordPress, look through the awesome designs here, find the one you really like, click on it, see the features that it has, and if you really like what it offers and how it looks, you can just click Purchase, and within approximately ten minutes it's going to be in your inbox, and you'll be able to apply it to either your existing website or a brand new one with your amazing design. Thanks for your time, have a great day.

Googlebot can't access your site: error messages in Google Webmaster Tools. Hello and welcome. In this video session I'm going to show you a couple of different methods that you can try, to be able to fix the issues that Googlebot is having when it requests your website. Either way, this error message should not be happening often, meaning if you see this perhaps once a year, that's okay, because it could be that come tomorrow Googlebot will be able to access your site again; then it's just a temporary thing and you don't have to worry about it or do anything. But if you're seeing these error messages often, then let's go and see what we can do. Google has
recommendations, but let me try to simplify what all these things mean. Now, what if, when you try to fetch your robots.txt file, you end up seeing a robots.txt fetch error message in Google Webmaster Tools? If you end up seeing something like this, then you kind of get an idea as to what could be the culprit, so to speak; in this case the robots.txt file could be the culprit. So when you log in to your Search Console, simply grab a sample URL from your website and double-check it. As you can see, at this stage of the testing it says, you know what, Googlebot can access that URL, this sample URL. But this is just a tester, meaning I can type in anything; it doesn't matter, it's something I'm doing because the physical file is on my server, so let's go take a look at that as well. Simply log in to your web hosting manager, press File Manager, and then you'll end up seeing the file here. Yeah, if you don't see the file at this stage, then you know your content management system is creating it dynamically. Makes sense? So then you have to contact your web developer to find out how that file is generated. If you are managing your own content management system, then it is more than likely that a plugin is creating that file, so perhaps you can disable that plugin, and so on. So let's move on to see what we can do at this stage. What you should be doing is double-checking that the permissions are set correctly for Google to access your file; the default, 644, is okay. You can right-click, choose Change Permissions, and then simply tick the checkboxes here; look what happens each time I check these boxes, the numbers change, as you can see. So you can perhaps test it as 755, change the permissions, and look what happens here; now you know that at this stage Google will definitely be able to access it. I'm not going to try to teach you how permissions work, but just make sure they are at minimum 644, okay, because as you can see some others are not accessible.
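As an aside, those permission numbers are octal modes; this little Python sketch decodes what 644 and 755 mean:

```python
# Decode the octal permission modes mentioned above.
import stat

mode_644 = 0o644  # owner: read+write; group: read; others: read
mode_755 = 0o755  # owner: read+write+execute; group and others: read+execute

# stat.filemode() renders a mode string the way file managers display it:
print(stat.filemode(stat.S_IFREG | mode_644))  # -rw-r--r--
print(stat.filemode(stat.S_IFREG | mode_755))  # -rwxr-xr-x
```

So 644 already grants the world read access, which is all Googlebot needs to fetch robots.txt.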
So always check that. Then what you can do is view the file, select everything in that file with Ctrl+A on your keyboard, then Ctrl+C on your keyboard, so you've copied it. As you can see, there is actually a problem, so let's try to go and fix it here. You can create a text document and name it robots.txt; I've already done that here. The reason I'm showing you this is because even when you're looking at your file here, as in, if you end up looking at the robots.txt file on your computer, you can't see anything wrong within it, but as you can see there is a problem happening here. So what I can do is right-click, Edit, hit Backspace, and I can change the encoding, because it could be to do with encoding as well, since your website may be in a different language. Okay, so let me hit Save Changes, let me view the file, and look what happened: the error message is gone. Okay, so that was a BOM error, a byte order mark error. It's rather complex; I've looked into it and I don't a hundred percent understand it, but it's to do with how encoding works in the file save process. Okay, so just make sure that you don't end up seeing that error message that we saw early on. Yeah, as I've said, you can hit Backspace to try to fix it as such, and there are other methods: there's a program called Notepad++ which allows you to fix that BOM thing, the byte order mark error, so you can look into it; it's to do with encoding, okay. So I've shown you two different remedies: one is permissions, double-check that permissions are set correctly; the second is to make sure that error message is not in the file, and for that you can utilize Notepad++, or you can simply try it at the server level and hit Backspace to delete the offending character in that file, and if that doesn't work, then you need to use Notepad++ to make sure all is fine. Let me cover that approach: let's open the file with Edit with Notepad++, so now I'm editing the robots.txt file.
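The BOM cleanup that Notepad++ performs can be sketched in a few lines of Python; the file content here is a made-up example:

```python
# Strip a UTF-8 byte order mark (BOM) from robots.txt content read as bytes.
import codecs

raw = codecs.BOM_UTF8 + b"User-agent: *\nDisallow:\n"  # simulated file starting with a BOM

if raw.startswith(codecs.BOM_UTF8):
    raw = raw[len(codecs.BOM_UTF8):]  # drop the three BOM bytes

print(raw.decode("utf-8").splitlines()[0])  # User-agent: *
```

In a real script you would read the file with open(path, "rb"), strip the BOM the same way, and write the bytes back out.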
Here we've got the Encoding menu with character sets. As you can see, you can convert everything within the file to UTF-8 and then save it with Ctrl+S. Now I can upload this file knowing that I don't have that byte order mark error, okay? So you can follow that approach as well; Notepad++ is an open-source, free program for you to use. Anyway, what else can you do? What you can do is double-check your .htaccess file, because that could be the culprit as well, as in, there could be some redirection going on, perhaps because you updated your .htaccess file, and so on. While we're here, let me show you something: if you're using WordPress, at minimum you need this, so you can select everything, delete everything, and just have this in there, but make sure you save what was in that file beforehand, because you are troubleshooting the issues behind Google can't access your site, and directives in the .htaccess file could be causing some problems as well. So at the end of the day you can troubleshoot this file as well; it's a complex file, and I've just shown you a simple method. If you are not using WordPress, simply copy what was in that file previously, as in Ctrl+A and Ctrl+C, and make a local copy, just in case, so that while you're working you can put it back on your server if needed. You can do the same thing from your file manager: you can upload the .htaccess file knowing you have access to the earlier one, so you're troubleshooting this as well, because we can do that as well. Now, once you follow these methods and approaches, if at that stage you still end up seeing Google can't access your site error messages, and they happen often, then you have to contact your web hosting company to get them involved. I thank you very much for learning with me and subscribing to the Rankya YouTube channel, and I'll talk with you in the next video session.

WordPress SEO: the ultimate robots.txt file for WordPress. Hello and welcome back again. In this quick video session we're going to create a robots.txt
file for WordPress. All you do on Windows, if you're using Windows, is right-click, go to New, create a new text document, and name it robots.txt. When we open it, all we need to do is type "User-agent: Googlebot", then hit Return, that's very important, then type "Disallow:", and that's it; I save the file. Basically, this file right now tells Google that it can come and crawl my entire website. Let's close that, and let's learn about creating a better robots.txt file for a WordPress site. Let's go online. First of all, the wordpress.org forums tell us, okay, you know what, an ideal robots.txt file basically shouldn't have directives at all; that's what it's saying, okay. But we are not living in that ideal world, plus, the way WordPress is structured, Googlebot will actually end up crawling portions of your WordPress site that it shouldn't, and I'm going to explain what that is all about in a minute as well. So that's what the forum is saying; well, this particular file is extremely important if you want to better optimize your site, right, and we should disregard that insight, because you're watching this video tutorial and I'm going to show you a better way. The W3 standards talk about what the file is all about; you can read more about it, no problem. Google's guidelines also tell us the names of the user agents that it uses, and also what you can do with the file and what you cannot do with it, and so on, okay, so you can read through that information as well. So let's take a look at this particular site that's built on WordPress. Okay, now, if you request this URL from your domain and you end up seeing nothing, that means you don't have a robots.txt file, okay? But if you're seeing something that's a default, perhaps because the theme that you're using created a default file, it may look different; okay then, no problem, you're in luck. Let's go and learn all we need to put in this file, the robots.txt directives for WordPress. Let me open this up with Dreamweaver.
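The two-line file typed at the start of this session, which permits Googlebot to crawl everything, reads:

```
User-agent: Googlebot
Disallow:
```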
Currently this is what I have. If I put that number sign, like the hashtag, anything that's past that number sign is a comment, so I can put comments in, no problem, and I can leave them as such, no problems. Okay, so let me delete that, and let me copy all this, Ctrl+C, Ctrl+V; now that's all pasted, and let's talk about what's going on there. These are all comments; you can surely read through them, but you can also delete them before you upload this file to your web server, in the root directory of your web server, right? So this is the default set of robots.txt directives that you should be sending out for Googlebot. Googlebot is the name of Google's web search crawler, okay, but realize Google also has different user agents for different purposes, one being Googlebot-Image. You should always allow Google to index your images, because Google is an image search provider as well. Now, even if you're using WordPress for a news-related site, make no mistake, the insights within this document should work for any type of WordPress site, because these files and these locations, okay, that's just the way WordPress is structured. As we can see, it's got an admin folder, it's got a wp-content folder, it's got a wp-includes folder, okay, but your theme resides in the wp-content folder, so therefore Google should not see the admin side of things, right? So you disallow it as such, yeah. The way WordPress is structured, it's used for many things, including blogs, right, but you end up having archives and so on. Also, by default, you should 110% have your own categories, so you should not rely on WordPress's default categories, because they are named as what we're seeing on the screen. But keep in mind that if, for some reason, you know in your heart of hearts that you structured your WordPress site to include categories with the keyword "category" in the URL instead of having your custom categories, then simply remove that line, because if you don't, you're basically telling Googlebot not to index those folders. But
even if you're using the keyword "category" in the URL, what comes after it is the name of your file, right, an HTML file name, .php or .html or whatever, right? Yeah, well, when we disallow this, we're saying to Google to disallow everything that comes past that point, right, so double-check. Also, you have by default WordPress's "uncategorized" category. Now, you may, for some unknown reason, not categorize your posts; well, if that's the case, then Google may turn around and index those uncategorized posts, okay. So, tags shouldn't be allowed, despite what some information online says about tags and keywords for WordPress. Well, tags are just the way WordPress is structured, but you should not utilize tags to increase your keyword count, because it doesn't work like that, right? So you should 110% disallow the tags portion of your site as well. Now, the paged stuff is important if you've got pagination happening particularly, yeah, so you should disallow Google from seeing that as well. Comments: comments are fine, but the comments feed should not be indexed by Google, yeah. Trackback is the same thing; it shouldn't be indexed, or else it's just going to create duplicates. Now, time: dates are very important for WordPress, and therefore it archives posts by year and month and so on, but that's not helping your Google rankings, so therefore those archives with dates in the URL should not be indexed by Google; disallow them as such. Print is for print, it's not for the web, so therefore disallow that as well. Search is another important factor on WordPress sites, and the way it works is, Google will turn around and index those search results as well. If you really consider the search results, well, WordPress search is searching your site, and any results on those search results pages will only be duplicates, so therefore they must be disallowed, this one too. Now, this particular directive here, the one with the asterisk followed by the question mark, is for URL parameters.
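The transcript never shows the finished file itself, so here is a reconstruction consistent with each rule the narration walks through; treat the exact paths as illustrative defaults rather than the author's verbatim file:

```
User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /category/
Disallow: /tag/
Disallow: /page/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /print/
# Date-based archives, one pattern per year in use, e.g.:
Disallow: /2016/
Disallow: /*?

Allow: /wp-content/uploads/
Allow: /*.js$
Allow: /*.css$

User-agent: Googlebot-Image
Allow: /wp-content/uploads/
```

The /*? line is the URL-parameter directive just mentioned, and the Allow lines are covered in the remainder of the video.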
Here's a warning and a caution: if you have an e-commerce related site, meaning you have lots of different products and so on, then delete that line; but for the rest, 99% of sites built on WordPress, you should not have URL parameters with a question mark in them, because those are usually for search and all the rest, right? But if you have an e-commerce site where, you know, somehow you've got products that are pulled from the database with a question mark in the URL, okay, well, if that's the case, make sure you delete that line. And we know that's a comment because it's past that number sign. Let me close that and let me clean this up so you can actually see what you should have for your WordPress robots.txt directives. As I've said, this is for Googlebot; it's for web search, yeah. Let's close that, let's close that, yeah. I also have this directive which is called Allow. Googlebot, where is it, let's find it, does obey the directive Allow; it also looks at conditional allows, whether it's a full allow or a conditional allow, and what that means is, a conditional allow is "allow this" as opposed to "disallow this", as such. So what I'm saying here is, I'm allowing Googlebot to take a look at the uploads part of my site. This is the default location for your images and so on, or other multimedia and files; that's where uploads go, so I'm double-checking so that Google web search can actually go and see that, okay? So I can use Allow as well. Now, I've added these two directives here, basically saying, okay, any files that have a .js or .css extension should be allowed, yeah. The reason I've done that is because Google's mobile-friendly test may show errors such as, oh, your website is blocking JavaScript or CSS, therefore Google can't understand your site. Whether you're using the mobile-friendly test or PageSpeed, you might come across these error results, yeah. Well, if that's the case, all you do is simply copy all those URLs, you can do that no problems, and you can say, okay, let me paste them in there, let me remove that leading portion
As we can see, I would then need to find the exact location for each of those files, which I think is the wp-includes folder, and so on. I could grab those URLs that Googlebot can't access, copy the exact URLs, and simply start using the Allow directive for each of them. If I do that, then PageSpeed or the mobile-friendly test will not complain. But what I've done instead of going through every single URL is allow all files ending in .js and all files ending in .css to be seen by Googlebot. I encourage you to do the same, because Googlebot can understand what's within those files anyway. Next is Googlebot-Image: Google is a search provider for images as well, so I'm double making sure that I allow this particular location for Googlebot-Image, which, as we can see, is a different user agent. So up to here, everything is for Googlebot. What I tend to do is copy those directives and paste them all under the other user agent section as well, keeping in mind that section is for all the other user agents — that's what the asterisk (*) is all about. When I specify the directives specifically for Googlebot, I know for a fact that it will obey them, because that's what Google says it will do, and that's what it does; all the other search engines you target as such. Even if you have a WordPress site about news for your industry and so on, you can still use these directives, because as far as I'm concerned this is the ultimate robots.txt file for WordPress — none of these things are helping you if Google indexes them. Let me demonstrate that point. Let's take a look at this particular URL here. It's built on WordPress, but on wordpress.com, so it's not self-hosted. If I turn around and search Google with the site: operator, saying "show me what you have as far as this site is concerned," and take a close look at the URLs, we can see these are date archives, and here is an author URL.
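The Allow side of the file, as described above, might be sketched like this. The paths are again illustrative, and the /wp-admin/ line under the catch-all agent is my own placeholder example, not taken from the video:

```text
User-agent: Googlebot
Allow: /wp-content/uploads/   # default upload location for images and media
Allow: /*.js$                 # let Googlebot fetch all JavaScript...
Allow: /*.css$                # ...and all CSS, so rendering tests don't complain

# Google Images uses its own user agent
User-agent: Googlebot-Image
Allow: /wp-content/uploads/

# The asterisk matches all other crawlers
User-agent: *
Disallow: /wp-admin/          # placeholder example
```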
I don't want Google to index me for that author URL, because I don't want to rank for it. That one can stay there, no problem, but look at this category URL here: it's not smart for me to have such things indexed on my WordPress site. The reason that's happening for this site is that I'm limited in modifying its robots.txt file, because it's not self-hosted — it's hosted on wordpress.com, in the cloud — so I end up having things like that. You wouldn't want to rank for date archives either, and the same is true for the date archives happening on my own site: I shouldn't have that happening there. As we can see, it's there, but I don't want Google to index these; if it does, it's just not smart for me. The same is true with tags: as we can see, this tag page is actually a duplicate of this post here. Let's copy this part of the text, go and check the post itself, and you'll see it comes from here. Makes sense? I want to rank for this post; I don't want to rank for the tag page. Therefore, the rule of thumb when deciding what to allow and what to disallow is to ask yourself: what do I want to rank for? If I want to rank for this URL, then whatever else WordPress is creating by default should not be indexed by Google, because it's just going to be a duplicate. That is true for archives, dates, and the default categories — meaning you should 110% have your own categories. Categories are a very important part of WordPress, and you should definitely have your own custom categories instead of leaving things at the default, because then you're really going to see great ranking results. So you should definitely have proper categories and so on. Nonetheless, remember: this is the ultimate WordPress robots.txt file, and it will help your Google rankings if you leave it as such and upload it to your web server's root directory, for example via FTP; it has to go under your site's public folder.
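The rule of thumb above — ask what you want to rank for — can be sanity-checked locally before you upload anything. Here is a minimal sketch using Python's standard urllib.robotparser; note it only matches plain path prefixes (it does not implement Google's * and $ wildcard extensions), so test only prefix rules with it, and the example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Plain-prefix rules only: the stdlib parser does not
# implement Google's wildcard (*) path extension.
rules = """\
User-agent: Googlebot
Allow: /wp-content/uploads/
Disallow: /wp-content/
Disallow: /tag/
Disallow: /category/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Pages I want to rank for stay fetchable...
print(rp.can_fetch("Googlebot", "https://example.com/my-post/"))
# ...while duplicate-generating paths are blocked,
# and the uploads folder stays open despite the broader block.
print(rp.can_fetch("Googlebot", "https://example.com/tag/seo/"))
print(rp.can_fetch("Googlebot", "https://example.com/wp-content/uploads/logo.png"))
```

A check like this catches a rule that accidentally blocks a post you do want to rank for, before Googlebot ever sees the file.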
It has to reside in the root folder, which is public_html. If you place the robots.txt file in any other folder, Google is not going to see or honor the directives within it. I thank you very much for learning with me, and if you haven't checked out the new membership site, I encourage you to do that. I'll talk to you in the next video session.
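The point about the root folder follows from how crawlers work: they always request robots.txt from the root of the host, never from a subfolder. A small sketch makes that concrete (the example.com URL is a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Where crawlers look for robots.txt: always the root
    of the host, never a subfolder of the page."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.example.com/blog/my-post/"))
# https://www.example.com/robots.txt
```

So whatever page Googlebot is about to crawl, the only file it consults is the one sitting at the top of your public_html folder.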

