

Sitemaps, Meta Data, and robots.txt Forum

    
Google indexing weird links
ones with // in them
netchicken1

5+ Year Member



 
Msg#: 3118663 posted 6:44 pm on Oct 12, 2006 (gmt 0)

I have been following Googlebot as it crawls the site and checking its paths against the robots.txt file.

Having disallowed every instance where there might be dup content, and feeling smug about it, I got ....

Disallow: /xmb/post.php?
Disallow: /xmb/member.php?
Disallow: /xmb/memcp.php?
Disallow: /xmb/chat/
Disallow: /xmb/cp2.php?
Disallow: /xmb/xmb/chat/

Now, however, I am finding Google getting creative with its bot and indexing links with a double / in them.

These // don't even exist in my board structure at that level, yet they are allowing the bot to index dup pages that the above rules had blocked!

/xmb//post.php?

The bots seem to just add an extra / when they feel like it:

/xmb//viewthread.php
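To illustrate why the rules above stop matching, here is a minimal sketch assuming Disallow rules are compared as plain path prefixes (the basic robots.txt behaviour). The function name `is_disallowed` and the `action=reply` query string are made up for the example; only the rules and the `/xmb//post.php?` path come from the post.

```python
# Sketch only: treats each Disallow value as a literal prefix of the
# requested path plus query string. Names here are hypothetical.
DISALLOW_RULES = [
    "/xmb/post.php?",
    "/xmb/member.php?",
    "/xmb/memcp.php?",
    "/xmb/chat/",
    "/xmb/cp2.php?",
    "/xmb/xmb/chat/",
]

def is_disallowed(path, rules=DISALLOW_RULES):
    """Return True if the path starts with any Disallow prefix."""
    return any(path.startswith(rule) for rule in rules if rule)

# The single-slash URL is blocked by the first rule.
print(is_disallowed("/xmb/post.php?action=reply"))
# The double-slash variant is a different string, so no prefix matches.
print(is_disallowed("/xmb//post.php?action=reply"))
```

Under that assumption the extra slash is enough to slip past every rule, even though the server returns the same page for both URLs.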

 

goodroi

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3118663 posted 9:36 pm on Oct 12, 2006 (gmt 0)

They probably found a link that had a typo (an extra slash) in it. For the double-slash URLs that Google has found, it would probably be a good idea not to return a 200 status code.
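One way to avoid returning a 200 for those URLs is to redirect them to the single-slash form. The sketch below only shows the decision logic. The choice of a 301 redirect is an assumption (the advice above says only "not 200"), and `normalize_path` and `respond` are hypothetical names, not part of any server API.

```python
import re

def normalize_path(path):
    """Collapse any run of slashes in the path into a single slash."""
    return re.sub(r"/{2,}", "/", path)

def respond(url):
    """Return (status, location) for a requested path plus optional query.

    Assumption: double-slash paths get a 301 to the cleaned URL;
    anything else is served normally with a 200.
    """
    path, sep, query = url.partition("?")
    clean = normalize_path(path)
    if clean != path:
        # Leave the query string untouched; only the path is cleaned.
        return 301, clean + sep + query
    return 200, None

print(respond("/xmb//viewthread.php"))  # (301, '/xmb/viewthread.php')
print(respond("/xmb/viewthread.php"))   # (200, None)
```

Only the path part is rewritten, so a query string that legitimately contains `//` is passed through unchanged.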

Also, if your problem is only with Google, you can use their wildcard option in your robots.txt. That might make your robots.txt simpler.
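As a rough illustration of the wildcard idea, the sketch below imitates Google-style pattern matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The pattern `/xmb/*post.php?` is an assumed example, not a rule taken from this thread, and `google_style_match` is a hypothetical helper rather than Google's actual implementation.

```python
import re

def google_style_match(pattern, path):
    """Approximate wildcard matching: '*' = any characters, trailing '$' = end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, like a robots.txt prefix rule.
    return re.match(regex, path) is not None

PATTERN = "/xmb/*post.php?"  # assumed example pattern

print(google_style_match(PATTERN, "/xmb/post.php?action=reply"))
print(google_style_match(PATTERN, "/xmb//post.php?action=reply"))
print(google_style_match(PATTERN, "/xmb/viewthread.php"))
```

With this assumed pattern, both the single-slash and double-slash post URLs match, while `/xmb/viewthread.php` does not. Other crawlers may not support wildcards, so a rule like this would belong in a Googlebot-specific section.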

abates

10+ Year Member



 
Msg#: 3118663 posted 11:33 pm on Oct 12, 2006 (gmt 0)

I've been getting that on my site as well. I've got several redirects using mod_rewrite, so I'm thinking it's possible that the rewrite might be causing it. I haven't seen any other spiders doing it, just Googlebot.

I do note that none of the // URLs which Googlebot has been fetching have turned up in results, and none appear if I do a site search.

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3118663 posted 12:54 am on Oct 13, 2006 (gmt 0)

If you're on Apache, and have permission to use mod_rewrite in .htaccess, message #3115787 in this thread [webmasterworld.com] contains some code to cure this and some other common "bad-URL" problems.

Jim

