Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt Pattern Matching
How Well Does It Work?
WebGuerrilla




msg:1528583
 11:55 pm on Mar 8, 2002 (gmt 0)


I'm finishing up some work on a ColdFusion site where the URLs have been rewritten to remove all ?'s. However, Google has already indexed about 80 pages with .cfm? in the URL.

Since the old URLs will still work, we don't want Googlebot to return and end up indexing the same page twice, so I'm thinking of using a robots.txt that looks like this:

User-Agent: Googlebot
Disallow: /*.cfm?*

The goal is that if Googlebot returns through a previously indexed URL, it will drop the old one and pick up the page under the new one (.cfm/).

Has anyone done this, and if so, how well has it worked?

 

rjohara




msg:1528584
 1:47 am on Mar 9, 2002 (gmt 0)

I use this to prevent indexing of images and it works fine:

User-Agent: Googlebot
Disallow: /*.gif$
Disallow: /*.jpg$

(The $ indicates end of the filename if I recall correctly. I haven't tried it with a wildcard at the end.)

WebGuerrilla




msg:1528585
 4:25 am on Mar 9, 2002 (gmt 0)


Yes, I've used it for images as well and it seems to work fine. But Google's help section doesn't show any examples of a wildcard being used after the file extension. It seems like it would work fine, but I'd like to know for sure before putting it up.

Brett_Tabke




msg:1528586
 10:42 am on Mar 10, 2002 (gmt 0)

I've used *.cgi with success. I've not used the ? or anything after it.

On the images, rjohara, you can also use:

User-Agent: Googlebot-Image
Disallow: /
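
One way to sanity-check a user-agent-specific block like this before uploading it is to run it through Python's standard-library robots.txt parser. This is only a rough sketch: the path /images/logo.gif is a made-up example, and the result shows how the stdlib parser reads the file, not how Google's crawlers are guaranteed to behave. As far as I know, the stdlib parser does plain prefix matching and does not implement the * and $ extensions, so it is only useful for simple rules like this one.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post above.
rules = """\
User-Agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical path, used only for illustration.
path = "/images/logo.gif"

# The image crawler is named in the file, so everything is blocked for it.
image_bot_allowed = parser.can_fetch("Googlebot-Image", path)

# The regular crawler has no matching block, so the parser allows it.
web_bot_allowed = parser.can_fetch("Googlebot", path)

print("Googlebot-Image allowed:", image_bot_allowed)
print("Googlebot allowed:", web_bot_allowed)
```

The point of the separate user-agent block is that the image crawler is shut out without adding any per-extension rules for the regular Googlebot.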

Brett_Tabke




msg:1528587
 1:15 pm on Mar 15, 2002 (gmt 0)

Forwarded:

Patterns must begin with / because robots.txt patterns always match absolute URLs.
* matches zero or more of any character.
$ at the end of a pattern matches the end of the URL; elsewhere $ matches itself.
* at the end of a pattern is redundant, because robots.txt patterns always match any URL which begins with the pattern.

thanks
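
The four forwarded rules can be sketched as a small matcher. This is an illustration of the rules as stated in the post above, not Googlebot's actual code; the function names and the sample paths are invented for the example.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt pattern into a compiled regex (sketch only)."""
    # Rule 1: patterns must begin with "/".
    if not pattern.startswith("/"):
        raise ValueError("pattern must begin with '/'")

    # Rule 3: "$" is special only as the last character of the pattern.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern

    # Rule 2: "*" matches zero or more of any character. Everything else,
    # including "?" and any "$" that is not at the end, is matched literally.
    regex = ".*".join(re.escape(part) for part in body.split("*"))

    if anchored:
        regex += r"\Z"  # must reach the end of the URL
    return re.compile(regex, re.DOTALL)


def is_disallowed(pattern, path):
    # Rule 4: re.match anchors at the start only, so an unanchored pattern
    # matches any URL that begins with it; a trailing "*" changes nothing.
    return robots_pattern_to_regex(pattern).match(path) is not None


# Hypothetical paths for the ColdFusion case from the first post.
old_url = "/products/item.cfm?id=5"
new_url = "/products/item.cfm/id/5"

print(is_disallowed("/*.cfm?*", old_url))  # old query-string URL
print(is_disallowed("/*.cfm?*", new_url))  # rewritten URL
print(is_disallowed("/*.cfm?", old_url))   # same pattern without trailing *
print(is_disallowed("/*.gif$", "/images/logo.gif"))
print(is_disallowed("/*.gif$", "/images/logo.gif?x=1"))
```

Under these rules, Disallow: /*.cfm? and Disallow: /*.cfm?* block the same set of URLs, so the trailing wildcard in the original question is harmless but unnecessary.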


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved