Forum Moderators: Robert Charlton & goodroi


Googlebot discovers internal links to non-existent forums directory

         

indyank

5:28 pm on Mar 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In Google Webmaster Tools, I am seeing one 404 error. This non-existent URL has hundreds of internal pages linking to it. The URL is "http://domain.com/forums/".

I never had forums on this site, but Googlebot is discovering this URL from many internal pages almost daily! How could this happen?

One thing I noticed is that most of the pages reported as having this link carry the "noindex, follow" meta tag.

The only exception is the home page, which is also reported as linking to it.

I even used the Fetch as Googlebot tool to see whether any of these pages have a hidden link to the so-called forums on my site, but I didn't find any. Why is this?

How else can I find out why Googlebot is discovering this forums link on my site?
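One way to double-check the fetched HTML is to resolve every link on the page against the page's own URL, since a relative link such as "forums/" on the home page resolves to /forums/ even though it never appears as a full URL. The following is only a rough sketch: the function and class names are made up for illustration, and it assumes the page source (for example, the Fetch as Googlebot output) has been saved as a string. It will not see links that are built by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect every href, src and action attribute value, whatever the tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.urls.append(value)


def links_resolving_to(html, page_url, target_path="/forums/"):
    """Return the raw attribute values that resolve to target_path on the page's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    wanted = target_path.rstrip("/") + "/"
    hits = []
    for raw in collector.urls:
        # Resolve relative links the way a crawler would, against the page URL.
        resolved = urlparse(urljoin(page_url, raw))
        if resolved.netloc == host and resolved.path.rstrip("/") + "/" == wanted:
            hits.append(raw)
    return hits
```

Calling `links_resolving_to(saved_html, "http://domain.com/")` returns the link text exactly as written in the page, which makes it easier to search the templates for the culprit.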

tedster

6:22 pm on Mar 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you tried running Xenu? Also, maybe try using grep on some raw server logs for any such hits to see a referer?
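If grep output is hard to read, a small script can pull out just the fields that matter. This is a minimal sketch, assuming the server writes the common Apache "combined" log format (referrer and user agent as the last two quoted fields); the function name and the file name in the usage comment are hypothetical.

```python
import re

# Combined Log Format:
# ip ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_for_path(lines, prefix="/forums"):
    """Return one dict per log line whose request path starts with prefix."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").startswith(prefix):
            hits.append(m.groupdict())
    return hits


# Example usage (access.log is an assumed file name):
# with open("access.log") as f:
#     for hit in hits_for_path(f):
#         print(hit["time"], hit["ip"], hit["referer"], hit["agent"])
```

A referrer of "-" means the client sent none, which is typical of bots and of typed-in URLs.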

Shatner

6:36 pm on Mar 25, 2011 (gmt 0)

10+ Year Member



I've had this happen a lot. To the tune of tens of thousands of pages Googlebot crawled like this, pages which didn't exist.

I actually think that's why Panda penalized me: all those bogus pages Google crawled were classified as "thin content", precisely because they were bogus.

No idea why it happened, but I've gone crazy with disallows, nofollows, and canonical links in the past couple of weeks and have had some success getting them to cut it out.

goodroi

7:24 pm on Mar 25, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Time to roll up your sleeves and start going through the raw log files.

If you are in a rush to deal with the situation, you can return a 410 code for the nonexistent URL, but that can introduce other issues, so I wouldn't suggest you rush to use it.

indyank

1:03 am on Mar 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I tried Xenu and it doesn't report any such link.

Yes, this has been happening ever since Panda arrived. But the pages on which Google discovers this strange link are not bogus pages. They are search pages, subpages of the home page, and category pages. Significantly, all of these carry the "noindex, follow" meta tag. I cannot return a 410 for some of them or block them, for example this one:

http://domain.com/page/4/

Though I add "noindex, follow" to these subpages of the home page, they are needed for navigation.

I can definitely block the search pages.

indyank

6:05 am on Mar 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tedster, I did a grep for "GET /forums" and got these two results for March 25.

14.99.***.54 - - [25/Mar/2011:12:31:12 -0500] "GET /forums/ HTTP/1.1" 404 9173 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.***.151 Safari/534.16"
89.151.***.132 - - [25/Mar/2011:12:31:18 -0500] "GET /forums/ HTTP/1.1" 404 9173 "-" "Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot"


One claims to be TweetmemeBot, but the IP is from Poland.

[edited by: tedster at 7:02 am (utc) on Mar 26, 2011]
[edit reason] obfuscate the IP addresses [/edit]

tedster

7:04 am on Mar 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So either there is no referrer or your server logs are not set up to track referrers. Well, it is a clue, anyway. You might look to see if those user agents make other requests during their session.
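To follow up on those user agents, the log can be grouped by client so that everything a visitor requested sits next to its /forums/ hit. A rough sketch, again assuming the Apache "combined" log format and treating an IP address plus user agent string as one "session" (an approximation, since the log has no real session ID); the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Minimal pattern for Combined Log Format: captures client IP, request path,
# status code and user agent; the other fields are skipped.
LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)


def sessions_touching(lines, prefix="/forums"):
    """Group requests by (ip, user agent); keep only clients that requested prefix."""
    by_client = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if m:
            ip, path, status, agent = m.groups()
            by_client[(ip, agent)].append((path, status))
    return {
        client: requests
        for client, requests in by_client.items()
        if any(path.startswith(prefix) for path, _ in requests)
    }
```

If a client requested a normal page just before /forums/, that page is a good candidate for where the link was picked up.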

[edited by: tedster at 6:33 pm (utc) on Mar 27, 2011]

leadegroot

10:26 am on Mar 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you checked Google's cache of one of the alleged referrers, to make sure you haven't been hacked and had dead links added? Could be!

indyank

3:45 pm on Mar 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



leadegroot, all the pages that are reported to have this link to the non-existent page (http://domain.com/forums/) carry the "noindex, follow" meta tag. The only exception is the home page, which is also reported to have this link.

So, there isn't any Google cache for them. But I did check those pages through GWT's "Fetch as Googlebot" tool, and the results don't have the word "forums" anywhere.

Tedster, there aren't any referrers, as I think these are bots. Otherwise, the server logs do have referrer information.