Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Google indexing weird links
ones with // in them

 6:44 pm on Oct 12, 2006 (gmt 0)

I have been following Googlebot as it crawls the site and checking the paths it requests against my robots.txt file.

Having disallowed every instance where there might be duplicate content, and feeling smug about it, I ended up with:

Disallow: /xmb/post.php?
Disallow: /xmb/member.php?
Disallow: /xmb/memcp.php?
Disallow: /xmb/chat/
Disallow: /xmb/cp2.php?
Disallow: /xmb/xmb/chat/

Now, however, I am finding Google getting creative with its bot and indexing links with a double slash (//) in them.

These // don't even exist in my board structure at that level, yet they let the bot index duplicate pages that the rules above had blocked!


The bots seem to just add an extra / when they feel like it.
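
For anyone wondering why the extra slash slips past those rules, here is a minimal Python sketch. It assumes the crawler does a plain prefix match of the requested path against each Disallow value, which is how the original robots.txt convention works. The two test URLs are made up for illustration.

```python
# Disallow values copied from the robots.txt lines in the post above.
DISALLOW_RULES = [
    "/xmb/post.php?",
    "/xmb/member.php?",
    "/xmb/memcp.php?",
    "/xmb/chat/",
    "/xmb/cp2.php?",
    "/xmb/xmb/chat/",
]


def is_disallowed(url_path: str) -> bool:
    """Plain prefix match: blocked if the path starts with any rule."""
    return any(url_path.startswith(rule) for rule in DISALLOW_RULES)


# Hypothetical URLs, for illustration only.
print(is_disallowed("/xmb/post.php?action=reply"))   # True
print(is_disallowed("/xmb//post.php?action=reply"))  # False
```

The second URL starts with "/xmb//", which is not a prefix match for any rule, so it is treated as allowed even though the server may hand back the same page.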




 9:36 pm on Oct 12, 2006 (gmt 0)

They probably found a link that had a typo (extra slash) in it. For the double-slash pages that Google has found, it would probably be a good idea not to return a 200 status code.
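
One way to avoid the 200, sketched in Python: collapse repeated slashes in the path and answer with a permanent redirect to the cleaned URL. This is only an illustration of the idea, not a drop-in for any particular server. The function name `redirect_for` is hypothetical, and choosing a 301 (rather than, say, a 404) is an assumption.

```python
import re
from typing import Optional, Tuple


def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) if the path has repeated slashes, else None."""
    # Split off the query string so slashes inside it are left alone.
    path, sep, query = url.partition("?")
    clean_path = re.sub(r"/{2,}", "/", path)
    if clean_path == path:
        return None  # Nothing to fix; serve the page normally.
    # Assumption: a 301 to the single-slash URL is the desired response.
    return 301, clean_path + sep + query


print(redirect_for("/xmb//post.php?action=reply"))
# (301, '/xmb/post.php?action=reply')
print(redirect_for("/xmb/post.php?action=reply"))
# None
```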

Also, if your problem is only with Google, then you can use their wildcard option in your robots.txt. That might make your robots.txt simpler.
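
To get a feel for what a wildcard rule would catch before putting it live, here is a rough Python approximation. It assumes Google-style matching where `*` stands for any run of characters and a trailing `$` anchors the end of the URL. This is not Google's own matcher, and the pattern `/xmb*/post.php?` is just a hypothetical example.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Approximate a Google-style robots.txt pattern as a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn the escaped '*' back into "match anything".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, url_path: str) -> bool:
    return pattern_to_regex(pattern).match(url_path) is not None


# Hypothetical rule: Disallow: /xmb*/post.php?
rule = "/xmb*/post.php?"
print(is_blocked(rule, "/xmb/post.php?action=reply"))    # True
print(is_blocked(rule, "/xmb//post.php?action=reply"))   # True
print(is_blocked(rule, "/xmb/viewthread.php?tid=1"))     # False
```

Here one wildcard rule covers both the normal and the double-slash form, where the plain prefix rule only covered the first.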


 11:33 pm on Oct 12, 2006 (gmt 0)

I've been getting that on my site as well. I've got several redirects using mod_rewrite, so I'm thinking it's possible that the rewrite might be causing it. I haven't seen any other spiders doing it, just Googlebot.

I do note that none of the // URLs which Googlebot has been fetching have turned up in results, and none appear if I do a site search.


 12:54 am on Oct 13, 2006 (gmt 0)

If you're on Apache, and have permission to use mod_rewrite in .htaccess, message #3115787 in this thread [webmasterworld.com] contains some code to cure this and some other common "bad-URL" problems.



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved