Google SEO News and Discussion Forum

Lost all rankings from Google - due to robots.txt
whatson
 2:35 am on Jun 11, 2012 (gmt 0)

I have a developer (who I am about to fire) who put up a robots.txt file containing:

User-agent: *
Disallow: /

Am I right to assume this is the cause of my substantial traffic drop and of no longer being able to see my rankings in Google?
I have since removed the file, but there has been no recovery yet. Should my rankings return? If so, how long should it take?

 

tedster
 2:40 am on Jun 11, 2012 (gmt 0)

It's one of those mistakes that is much too easy to make (especially for the non-SEO) but quite disastrous in its effect. I had one client make this error (truly a household name) and even for them, it took about two weeks to get back to near normal Google traffic again.

shri
 3:11 am on Jun 11, 2012 (gmt 0)

Here's what I've done in the past. I'm not sure if the individual elements work on their own, but the combination worked and got pretty much everything recrawled, with ranks/traffic restored in a few days (about 30K pages).

- Test robots.txt again from the Webmaster Central tool. This causes a refresh of what Google knows as your latest robots.txt.
- Submit the sitemaps again into Google and other search engines.
- Increase the crawl speed.

Like I said, these three were done together and worked for me in February. I'm not sure if any single step is enough on its own.

And needless to say, this all depends on how well your site usually gets crawled.

One more thing... our individual pages do not respond to If-Modified-Since requests, so I'm not sure whether that is also a factor in getting recrawled/ranked.
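
If you want to sanity-check what a compliant crawler now sees, Python's standard-library robots.txt parser can fetch and evaluate the live file. A minimal sketch (the example.com URLs are placeholders for your own site):

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# True means a generic crawler ("*") may fetch the URL
print(rp.can_fetch("*", "http://www.example.com/"))
print(rp.can_fetch("Googlebot", "http://www.example.com/some-page.html"))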

ChrisWilson
 3:31 am on Jun 11, 2012 (gmt 0)

If you have been doing this long enough, you know it can happen to anyone. Your rankings will be back to normal in a couple of weeks. Try not to be too hard on the guy ;)

Zivush
 4:37 am on Jun 11, 2012 (gmt 0)

In addition to what shri wrote, I always put a sitemap line in the robots.txt - simply add the following (first) line to your robots.txt file:
Sitemap: http://www.example.com/sitemap.xml

refer to - [webmasterworld.com...]
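
If you want to confirm that a parser actually picks up the Sitemap line, here is a rough check using Python's standard library (site_maps() needs Python 3.8+; the example.com URL is a placeholder):

from urllib.robotparser import RobotFileParser

# A robots.txt with the sitemap line first, then a rule block that allows everything
robots_txt = """Sitemap: http://www.example.com/sitemap.xml

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
print(rp.site_maps())  # ['http://www.example.com/sitemap.xml']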

tedster
 10:19 am on Jun 11, 2012 (gmt 0)

@Zivush, how does that sitemap line help override the Disallow rule? Is it just because adding it makes you manually inspect the robots.txt file so you catch the problem quickly?

deadsea
 10:31 am on Jun 11, 2012 (gmt 0)

Make sure that when you fix this you don't simply delete the robots.txt file, making it 404 Not Found. If you do that, Googlebot often just assumes it should honor the last robots.txt it found, and your site will continue not to be crawled.

Instead, remove the "Disallow: /" line from it, leaving a robots.txt file that explicitly allows crawling:

User-agent: *
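
Whatever the file ends up containing, it's worth confirming it is actually served with a 200 rather than a 404. A quick sketch with Python's standard library (swap in your own domain):

import urllib.request
import urllib.error

try:
    with urllib.request.urlopen("http://www.example.com/robots.txt") as resp:
        print("robots.txt status:", resp.status)  # expect 200
except urllib.error.HTTPError as e:
    print("robots.txt returned HTTP", e.code)  # a 404 here is the problem case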

g1smd
 10:44 am on Jun 11, 2012 (gmt 0)

Instead, remove the "Disallow: /" line from it, leaving a robots.txt file that explicitly allows crawling:

User-agent: *

The correct syntax is:

User-agent: *
Disallow:


with at least one blank line after the Disallow directive.


Fix the robots.txt file. Wait 48 hours. Go to WMT and then use the "Fetch as Googlebot" function to retrieve your root page (www.example.com/), then click on the "Submit page and all linked pages" option.


I can't count the number of times I have accidentally uploaded the wrong robots.txt file to a site. However, the error has almost always been corrected within a few minutes. Even so, there have been a couple of occasions where Google had already grabbed the file seconds before the corrections were applied. In those cases it took 24 hours for Google to revisit and get the right version of the file. It's a shame there isn't a WMT button that says "I've messed up my robots.txt file; please discard the last version and grab the corrected one as soon as possible". If an incorrect file is corrected within 24 hours there appears to be no damage done.
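
If you want to double-check that the corrected two-line file really allows everything, Python's standard-library parser agrees (a minimal sketch; the URL is a placeholder):

from urllib.robotparser import RobotFileParser

# The corrected file: an empty Disallow value means "disallow nothing"
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])
print(rp.can_fetch("*", "http://www.example.com/any/page.html"))  # True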

netmeg
 12:32 pm on Jun 11, 2012 (gmt 0)

I've done this. I bet most of us have.

whatson
 3:52 am on Jun 13, 2012 (gmt 0)

deadsea, I did exactly what you said I should not do.
What should the robots.txt file contain?

tedster
 4:41 am on Jun 13, 2012 (gmt 0)

Looks like a misunderstanding. deadsea said that you SHOULD have a robots.txt file that starts with "User-agent: *" - but he forgot a line. g1smd then gave the correct syntax in his follow-up post. That syntax allows everything to be crawled. Then, if you have other needs, you can develop from there.

Sgt_Kickaxe
 5:23 am on Jun 13, 2012 (gmt 0)

Google remembers the URLs it has already indexed and has a lot of data about them, so when you fix the robots.txt file Google will be quicker about restoring rank and indexing than it would be with a fresh crawl.

I've made a similar mistake (a robots.txt file from the wrong site was uploaded and, of course, it blocked entire sections of the receiving site) and was fully restored within eight days; I caught the mistake within one day.

whatson
 5:33 am on Jun 13, 2012 (gmt 0)

Tedster, what is the line? Should it be:

User-agent: *
Allow:

Is that correct?
Or just:

User-agent: *

Robert Charlton
 5:41 am on Jun 13, 2012 (gmt 0)

whatson - Note that g1smd has it correct, including the extra blank line after the Disallow directive.

I'd follow his instructions precisely.

g1smd
 5:44 am on Jun 13, 2012 (gmt 0)

The correct syntax is:

User-agent: *
Disallow:


with at least one blank line after the Disallow directive.

EvilSaint
 7:02 am on Jun 13, 2012 (gmt 0)

Also, after the additional blank line following the Disallow: directive, add the URL of the XML sitemap file for your website.

Test this with Webmaster Tools once it's live, check that it is working correctly, and also resubmit the sitemap file and the site itself, just to be sure.

Good luck...

lucy24
 7:03 am on Jun 13, 2012 (gmt 0)

Don't use "Allow" except in sections meant for specific, named robots that have explicitly said they know this word. The universally understood* word is "Disallow".

There are some awfully stupid robots out there ;) so don't get fancy.


* Understood. Not necessarily obeyed.

whatson
 9:49 am on Jun 13, 2012 (gmt 0)

Now I am confused. You realize I want Google to crawl my site, right? Why would I have Disallow in there?

g1smd
 9:54 am on Jun 13, 2012 (gmt 0)

Disallow: /

means disallow anything that begins with a slash (i.e. disallow everything).


Disallow:

means disallow nothing (i.e. allow everything).


As this is the "Robots Exclusion Protocol" everything hinges on this being a disallow list.
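
The difference is easy to demonstrate with Python's standard-library parser (a small sketch; the URL is a placeholder):

from urllib.robotparser import RobotFileParser

def allowed(lines):
    rp = RobotFileParser()
    rp.parse(lines)
    return rp.can_fetch("*", "http://www.example.com/page.html")

print(allowed(["User-agent: *", "Disallow: /"]))  # False - everything is blocked
print(allowed(["User-agent: *", "Disallow:"]))    # True - nothing is blocked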

tedster
 11:17 am on Jun 13, 2012 (gmt 0)

If you think about the robots.txt protocol from the point of view of programming a bot, the "Disallow" standard makes sense. You wouldn't usually want a potentially monstrous list of every URL you were allowed to visit - just a few "keep out" notices.

Even though both Bing and Google say they now support a few extensions to the standard syntax, the actual current standard is explained here: [robotstxt.org...]

...and here is Google's Help page: [support.google.com...] If you start blocking some URLs or URL patterns, the details Google provides can become important for getting the exact results that you intended.
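
To illustrate the bot's point of view, the core exclusion check is just a prefix match against a short keep-out list. A deliberately simplified sketch (real parsers also handle user-agent matching, wildcards, and more):

# Hypothetical keep-out list read from a robots.txt
disallowed_prefixes = ["/private/", "/tmp/"]

def may_crawl(path):
    # A path is blocked if it starts with any disallowed prefix
    return not any(path.startswith(prefix) for prefix in disallowed_prefixes)

print(may_crawl("/private/report.html"))   # False
print(may_crawl("/products/widget.html"))  # True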
