Forum Moderators: goodroi

Google indexing weird links

ones with // in them

     

netchicken1

6:44 pm on Oct 12, 2006 (gmt 0)




I have been following Googlebot as it crawls the site and checking the paths it requests against my robots.txt file.

Having disallowed every instance where there might be dup content, and feeling smug about it, I got ....

Disallow: /xmb/post.php?
Disallow: /xmb/member.php?
Disallow: /xmb/memcp.php?
Disallow: /xmb/chat/
Disallow: /xmb/cp2.php?
Disallow: /xmb/xmb/chat/

Now, however, I am finding Google getting creative with its bot and indexing links with a double / in them.

These // don't even exist in my board structure at that level, yet they are allowing the bot to index dup pages that the above rules had blocked!

/xmb//post.php?

The bots seem to just add an extra / when they feel like it

/xmb//viewthread.php
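
To see why the doubled slash slips past the rules above, here is a minimal Python sketch. It assumes plain prefix matching, which is how the original robots.txt convention works. The `is_blocked` helper and the `tid=1` query string are made up for illustration, and a real crawler may apply more rules than this.

```python
# Simplified model of robots.txt matching: a path is blocked when it
# starts with one of the Disallow values. This is an assumption for
# illustration, not a full robots.txt parser.
DISALLOW = [
    "/xmb/post.php?",
    "/xmb/member.php?",
    "/xmb/memcp.php?",
    "/xmb/chat/",
    "/xmb/cp2.php?",
    "/xmb/xmb/chat/",
]


def is_blocked(url_path, rules=DISALLOW):
    """Return True if url_path begins with any Disallow prefix."""
    return any(url_path.startswith(rule) for rule in rules)


if __name__ == "__main__":
    # "tid=1" is a hypothetical query string.
    for candidate in ("/xmb/post.php?tid=1", "/xmb//post.php?tid=1"):
        state = "blocked" if is_blocked(candidate) else "crawlable"
        print(candidate, state)
```

Under that assumption the single-slash URL is blocked, but `/xmb//post.php?` is a different string, so no prefix matches it.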

goodroi

9:36 pm on Oct 12, 2006 (gmt 0)




They probably found a link that had a typo (extra slash) in it. For those double-slash pages that Google has found, it would probably be a good idea not to return a 200 status code.
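
One way to avoid the 200 is to redirect the double-slash URL to its single-slash form. The Python sketch below is only an illustration of that idea: the `canonical_redirect` helper is hypothetical, and the choice of a 301 (rather than, say, a 404) is an assumption, not something stated above.

```python
import re


def canonical_redirect(request_target):
    """Return (301, location) if the path has repeated slashes, else None.

    request_target is the path plus optional query string, for example
    "/xmb//post.php?tid=1" (the query string is a made-up example).
    """
    # Split off the query string so slashes inside it are left alone.
    path, sep, query = request_target.partition("?")
    collapsed = re.sub(r"/{2,}", "/", path)
    if collapsed == path:
        return None  # already canonical, serve the page normally
    return 301, collapsed + sep + query


if __name__ == "__main__":
    print(canonical_redirect("/xmb//post.php?tid=1"))
    print(canonical_redirect("/xmb/viewthread.php"))
```

In a real setup this logic would live in the web server or the board software; the function only shows the decision being made.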

Also, if your problem is only with Google, you can use their wildcard option in your robots.txt. That might make your robots.txt simpler.
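
As a rough picture of how the wildcard option behaves, the Python sketch below treats `*` as "any run of characters" and a trailing `$` as "end of URL", which is how Google documents its extension. The pattern `/xmb/*post.php?` and the helper names are hypothetical examples, not rules taken from this thread.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a Google-style robots.txt pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each "*" match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, url_path):
    """Return True if url_path is covered by the wildcard pattern."""
    return wildcard_to_regex(pattern).match(url_path) is not None


if __name__ == "__main__":
    rule = "/xmb/*post.php?"  # hypothetical rule for illustration
    for candidate in ("/xmb/post.php?tid=1", "/xmb//post.php?tid=1"):
        print(candidate, matches(rule, candidate))
```

A pattern like this covers both the single-slash and double-slash forms, but it is also broader than the original rule: it would match any file under /xmb/ whose name ends in `post.php`, so it needs checking against the real URL structure before use.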

abates

11:33 pm on Oct 12, 2006 (gmt 0)




I've been getting that on my site as well. I've got several redirects using mod_rewrite, so I'm thinking it's possible that the rewrite might be causing it. I haven't seen any other spiders doing it, just Googlebot.

I do note that none of the // URLs which Googlebot has been fetching have turned up in results, and none appear if I do a site search.

jdMorgan

12:54 am on Oct 13, 2006 (gmt 0)




If you're on Apache, and have permission to use mod_rewrite in .htaccess, message #3115787 in this thread [webmasterworld.com] contains some code to cure this and some other common "bad-URL" problems.

Jim

 
