
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


Lost all rankings from Google - due to robots.txt

     
2:35 am on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 21, 2002
posts: 1541
votes: 0


I have a developer (who I am about to fire) who put up a robots.txt file containing:

User-agent: *
Disallow: /

Am I right to assume this is the cause of my substantial traffic drop and of my rankings no longer appearing in Google?
I have since removed the file, but there is still no recovery. Should my rankings return? If so, how long should it take?
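
(For reference: you can see exactly what a robots.txt file blocks, outside of Webmaster Tools, with the robots.txt parser in the Python standard library. This is only a rough sketch; the domain below is a placeholder, not the actual site.)

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt, then ask whether Googlebot may fetch a few paths.
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

for path in ("/", "/some-page.html", "/category/widgets"):
    allowed = rp.can_fetch("Googlebot", "https://www.example.com" + path)
    print(path, "allowed for Googlebot:", allowed)

With the file quoted above (Disallow: /), every path prints False, which matches the traffic drop described here.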
2:40 am on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


It's one of those mistakes that is much too easy to make (especially for the non-SEO) but quite disastrous in its effect. I had one client make this error (truly a household name) and even for them, it took about two weeks to get back to near normal Google traffic again.
3:11 am on June 11, 2012 (gmt 0)

Senior Member from HK 

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 14, 2002
posts:2288
votes: 15


Here's what I've done in the past. I'm not sure whether the individual elements work on their own, but the combination has worked for me and got pretty much everything recrawled, with rankings and traffic restored in a few days (about 30K pages).

- Test robots.txt again from the Webmaster Central tool. This refreshes the copy Google holds as your latest robots.txt.
- Submit the sitemaps again to Google and the other search engines.
- Increase the crawl speed.

Like I said, these three were done and worked for me in Feb. Not sure if one individual step is enough.

And needless to say, this all depends on how well your site usually gets crawled.

One more thing... our individual pages do not send Last-Modified headers or answer If-Modified-Since requests, so I'm not sure whether that is also a factor in getting recrawled and ranked.
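
(On the if-modified-since point: a server that supports conditional GETs sends a Last-Modified header and answers a repeat request with 304 Not Modified. A rough sketch of that check, using only the standard library; the URL is a placeholder.)

import urllib.request
from urllib.error import HTTPError

url = "https://www.example.com/some-page.html"

# First request: see whether the server sends a Last-Modified header at all.
with urllib.request.urlopen(url) as resp:
    last_modified = resp.headers.get("Last-Modified")
    print("Last-Modified:", last_modified)

if last_modified:
    # Repeat the request conditionally; urllib raises HTTPError for a 304 reply.
    req = urllib.request.Request(url, headers={"If-Modified-Since": last_modified})
    try:
        with urllib.request.urlopen(req) as resp:
            print("Status:", resp.status)   # 200: the full page was sent again
    except HTTPError as err:
        print("Status:", err.code)          # 304: the server honours conditional GETs
else:
    print("No Last-Modified header, so conditional requests are not supported here.")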
3:31 am on June 11, 2012 (gmt 0)

New User

5+ Year Member

joined:July 2, 2010
posts:34
votes: 1


If you have been doing this long enough, you know it can happen to anyone. Your rankings will be back to normal in a couple of weeks. Try not to be too hard on the guy ;)
4:37 am on June 11, 2012 (gmt 0)

Preferred Member

joined:June 10, 2011
posts:521
votes: 0


In addition to what shri wrote, I always put a sitemap line in robots.txt. Simply add the following line (as the first line) to your robots.txt file:
Sitemap: http://www.example.com/sitemap.xml

Refer to [webmasterworld.com...]
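
(If you want to confirm the line is actually being picked up, newer versions of Python (3.8+) can read it back from the live file. A quick sketch with a placeholder domain:)

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()
# Prints the Sitemap: URLs found in robots.txt, or None if there are none.
print(rp.site_maps())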
10:19 am on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


@Zivush, how does that sitemap line help override the Disallow rule? Is it just because adding it makes you manually inspect the robots.txt file so you catch the problem quickly?
10:31 am on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 7, 2003
posts: 750
votes: 0


Make sure that when you fix this you don't simply delete the robots.txt file, leaving it to return a 404. If you do that, Googlebot often assumes it should keep honoring the last robots.txt it found, and your site will continue to go uncrawled.

Instead, remove the "Disallow: /" line, leaving a robots.txt file that explicitly allows crawling:

User-agent: *
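
(One way to double-check you haven't left robots.txt returning a 404 is to fetch it directly and look at the status code. A minimal sketch; the domain is a placeholder.)

import urllib.request
from urllib.error import HTTPError

try:
    with urllib.request.urlopen("https://www.example.com/robots.txt") as resp:
        print("robots.txt status:", resp.status)           # expect 200 after the fix
        print(resp.read().decode("utf-8", errors="replace"))
except HTTPError as err:
    print("robots.txt status:", err.code)                   # 404 means the file is missing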
10:44 am on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Instead, remove the "Disallow: /" line, leaving a robots.txt file that explicitly allows crawling:

User-agent: *

The correct syntax is:

User-agent: *
Disallow:


with at least one blank line after the Disallow: directive.


Fix the robots.txt file. Wait 48 hours. Go to WMT and use the "Fetch as Googlebot" function to retrieve your root page (www.example.com/) and then click the "Submit page and all linked pages" option.


I can't count the number of times I have accidentally uploaded the wrong robots.txt file to a site. However, the error has almost always been corrected within a few minutes. Even so, there have been a couple of occasions where Google had already grabbed the file seconds before the corrections were applied. In those cases it took 24 hours for Google to revisit and get the right version of the file. It's a shame there isn't a WMT button that says "I've messed up my robots.txt file; please discard the last version and grab the corrected one as soon as possible". If an incorrect file is corrected within 24 hours there appears to be no damage done.
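
(Until a button like that exists, a scheduled check can at least catch a bad upload within minutes rather than days. A rough sketch you could run from cron; the domain is a placeholder and the alert is just a print.)

from urllib.robotparser import RobotFileParser

SITE = "https://www.example.com"

rp = RobotFileParser(SITE + "/robots.txt")
rp.read()

# If the root page is disallowed for Googlebot, something has gone badly wrong.
if not rp.can_fetch("Googlebot", SITE + "/"):
    print("WARNING: robots.txt is blocking Googlebot from the site root!")
else:
    print("robots.txt looks fine: Googlebot may crawl the root page.")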
12:32 pm on June 11, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12734
votes: 159


I've done this. I bet most of us have.
3:52 am on June 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 21, 2002
posts: 1541
votes: 0


deadsea, I did exactly what you said I should not do.
What is the robots.txt file that should be there?
4:41 am on June 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Looks like a misunderstanding. deadsea said that you SHOULD have a robots.txt file that says:

User-agent: *

...but he forgot a line. g1smd then gave the correct syntax in his follow-up post. That syntax allows everything to be crawled. Then, if you have other needs, you can develop from there.
5:23 am on June 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member

joined:Apr 14, 2010
posts:3169
votes: 0


Google remembers the URLs it has already indexed and has a lot of data about them, so once you fix the robots.txt file it will restore rankings and indexing much more quickly than it would with a fresh crawl.

I've made a similar mistake (the robots.txt file from the wrong site was uploaded and, of course, it blocked entire sections of the receiving site). I caught the mistake within a day and was fully restored within eight days.
5:33 am on June 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 21, 2002
posts: 1541
votes: 0


Tedster, what is the missing line? Should it be:

User-agent: *
Allow:

Is that correct?
Or just:

User-agent: *
5:41 am on June 13, 2012 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:11418
votes: 197


whatson - Note that g1smd has it correct, including the extra blank line after the Disallow directive.

I'd follow his instructions precisely.
5:44 am on June 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The correct syntax is:

User-agent: *
Disallow:


with at least one blank line after the Disallow: directive.
7:02 am on June 13, 2012 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 17, 2009
posts:107
votes: 0


Also, after the blank line following the Disallow: directive, add the URL of the XML sitemap file for your website.

Test this with Webmaster Tools once it's live, check that it is working correctly, and resubmit the sitemap file (and the site itself) just to be sure.

Good luck...
7:03 am on June 13, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12990
votes: 287


Don't use "Allow" except in sections meant for specific, named robots that have explicitly said they know this word. The universally understood* word is "Disallow".

There are some awfully stupid robots out there ;) so don't get fancy.


* Understood. Not necessarily obeyed.
9:49 am on June 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 21, 2002
posts: 1541
votes: 0


Now I am confused. You realize I want Google to crawl my site, right? Why would I have Disallow in there?
9:54 am on June 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow: /


means disallow anything that begins with a slash (i.e. disallow everything).


Disallow: 


means disallow nothing (i.e. allow everything).


As this is the "Robots Exclusion Protocol" everything hinges on this being a disallow list.
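
(The difference is easy to demonstrate with the standard-library parser; a small sketch:)

from urllib.robotparser import RobotFileParser

# "Disallow: /" blocks everything ...
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])
print(rp.can_fetch("Googlebot", "/any/page.html"))   # False

# ... while an empty "Disallow:" blocks nothing.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])
print(rp.can_fetch("Googlebot", "/any/page.html"))   # True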
11:17 am on June 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


If you think about the robots.txt protocol from the point of view of programming a bot, the "Disallow" standard makes sense. You wouldn't usually want a potentially monstrous list of every URL you were allowed to visit - just a few "keep out" notices.

Even though both Bing and Google say they now support a few extensions to the standard syntax, the actual current standard is explained here: [robotstxt.org...]

...and here is Google's Help page: [support.google.com...] If you start blocking some URLs or URL patterns, the details Google provides can become important for getting the exact results that you intended.
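
(As a footnote on blocking patterns: a plain Disallow rule is a simple path-prefix match, and wildcards such as * and $ are search-engine extensions that simpler parsers, including the standard-library one used in this sketch, do not understand.)

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private",   # prefix match: also blocks /private-stuff.html
])

for path in ("/private/report.html", "/private-stuff.html", "/public/page.html"):
    print(path, rp.can_fetch("Googlebot", path))   # False, False, True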
 
