Forum Moderators: Robert Charlton & goodroi

Robots.txt disallowed file shows up in SERPs & Google traffic drops

         

ianama

7:30 pm on May 14, 2009 (gmt 0)

10+ Year Member



On one of our sites, we have a robots.txt file that disallows (for googlebot and googlebot-mobile) a particular directory containing a mirror site to which our affiliates send traffic. When I searched for an affiliate name, Google's SERPs returned a URL in this disallowed directory at spot #3 (albeit without title or snippet).

When I tested this URL against our Robots.txt in G's Webmaster Tools, it showed up as BLOCKED, yet it still shows up in SERPs.
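For anyone who wants to repeat that kind of check outside Webmaster Tools, here is a minimal sketch using Python's standard `urllib.robotparser`. The directory name `/mirror/` and the `example.com` URLs are assumptions for illustration, not the actual site. The standard-library parser also matches user-agent names more loosely than Googlebot itself may, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /mirror/ stands in for the real disallowed directory.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Googlebot-Mobile
Disallow: /mirror/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from crawling url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/mirror/affiliate-page.html", "/index.html"):
        url = "http://www.example.com" + path
        print(path, "blocked" if is_blocked(ROBOTS_TXT, "Googlebot", url) else "allowed")
```

A "blocked" result only says the crawler is told not to fetch the page. It says nothing about whether the URL can still be listed in results.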

In the meantime, we lost almost all even remotely long-tail traffic virtually overnight. I'm not sure yet if the two are related, but wanted to reach out and see if anyone else has seen anything similar.

UPDATE: Hundreds of URLs in the disallowed directory are now showing up in the SERPs.

tedster

8:15 pm on May 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



albeit without title or snippet

That happens when backlinks give Google enough information about a url to show it in specific search results.

Robots.txt has prevented actual spidering, but these url-only results still may appear. If it's important not to show that result, use the url-removal tool in Webmaster Tools.

I'm pretty sure that it's not related to traffic loss.

ianama

8:27 pm on May 14, 2009 (gmt 0)

10+ Year Member



Many of the new ones we found actually contain all the info: title and snippet.

This is a bit scary.

tedster

9:02 pm on May 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google just had a major outage a few hours ago, and they are making some significant back-end changes apparently. Could well be a temporary bug.

Still, the same steps would be needed - make sure the robots.txt does what you intend and then use the url removal tool. Or, you could make sure everything in the blocked directory has a robots meta tag that says "noindex".

ianama

9:37 pm on May 14, 2009 (gmt 0)

10+ Year Member



Thanks Tedster. Will get moving on what you've suggested.

mcskoufis

9:55 pm on May 14, 2009 (gmt 0)

10+ Year Member



Just to report that I had exactly the same effect on my Greek widgets site.

URLs which shouldn't be included and which I had removed via the GWT tool a few months ago started reappearing...

And they are duplicates of the international version of the site, so I suspect that is the cause of the traffic drop (about a 20% drop from last week's average).

pazang

11:43 am on Jun 8, 2009 (gmt 0)

10+ Year Member



Hi all

Yes, we saw a similar problem on Friday: Google has listed loads of pages which had been blocked. This has caused rankings to drop significantly, as it looks like we have dupe content.

Any ideas what Google is up to?

Robert Charlton

8:42 pm on Jun 8, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Note that if you use the meta robots tag on a page, do not also use robots.txt to block it.

robots.txt tells Google not to spider the page. The meta robots tag (in the following format) tells Google not to index the page or any references to it....

<meta name="robots" content="noindex">

If you use both robots.txt and the robots meta, Google won't spider the page and thus won't see the robots meta and won't know that the page shouldn't be indexed. In such a situation, if there is a reference (ie, link) to the page on an unblocked page, Google might index the reference to the page and return the url.
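The conflict described above can be checked mechanically. The following is a rough sketch, assuming you already have the robots.txt text and the page's HTML in hand. The helper names `RobotsMetaFinder` and `noindex_is_hidden` are made up for this example, and it relies only on Python's standard library.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def noindex_is_hidden(robots_txt, user_agent, url, html):
    """True when the page says noindex but robots.txt keeps the crawler from reading it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives and not rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    rules = "User-agent: *\nDisallow: /mirror/\n"  # hypothetical directory
    print(noindex_is_hidden(rules, "Googlebot", "http://www.example.com/mirror/a.html", page))
```

If the function returns True for a page, the noindex tag is there but a crawler that obeys robots.txt will never fetch the page to see it, which is the situation to avoid.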

[edited by: Robert_Charlton at 8:43 pm (utc) on June 8, 2009]

fishfinger

7:32 am on Jun 9, 2009 (gmt 0)

10+ Year Member



A simple distinction, but one I never appreciated before. Thanks Robert.

tangor

7:57 am on Jun 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The robot rathole. All they need is a whisker to get what you don't want spidered... and they want to spider everything AND WILL. Put your locks on the CONTENT PROTECTED first (page level) then robots.txt second.

pazang

9:01 am on Jun 9, 2009 (gmt 0)

10+ Year Member



OK, so I am going to be a bit dim: what's content protected, and how do you do it?

:-P

tangor

9:41 am on Jun 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, thought it obvious. Meta Robots on PAGE you want IGNORED.

edit:

No Archive, No Cache, No Index, and no, the bad bots are not going to listen to any of those. It's a crapshoot. But G, Y and B seem to follow.