
Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt help
mikemcs

10+ Year Member



 
Msg#: 237 posted 1:05 pm on Jan 15, 2004 (gmt 0)

Google seems to be ignoring my robots.txt file. Am I doing something wrong?

--robots.txt---
User-agent: googlebot
Disallow: CCS/PFCatalog.php

User-agent: googlebot
Disallow: PCR/Catalog/PFCatalog.php

-- from log----
XX.68.82.144 - - [15/Jan/2004:06:17:28] "GET /http://www.******.com/CCS/PFCatalog.php?Cat=90 HTTP/1.1 " 200 200 "None" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

XX.68.82.181 - - [15/Jan/2004:06:25:57] "GET /http://www.******.com/CCS/PFCatalog.php?Cat=91 HTTP/1.1 " 200 200 "None" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

[edited by: DaveAtIFG at 4:58 pm (utc) on Jan. 15, 2004]
[edit reason] "Neutered" IPs [/edit]

 

tschild

10+ Year Member



 
Msg#: 237 posted 2:16 pm on Jan 15, 2004 (gmt 0)

Some things to check:

1. Your robots.txt is unusual in that it has more than one record (a group of lines separated by a blank line) for the same user-agent, googlebot. Some robots may well interpret such a file differently from what was intended, e.g. by using only the first record, or only the last record, for Googlebot. Does your robots.txt include any further records for Googlebot?

2. The robots.txt specification [robotstxt.org] says:

Disallow

The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.

/CCS/PFCatalog.php?Cat=90 does not begin with CCS/PFCatalog.php. It begins with /CCS/PFCatalog.php, with a leading slash.
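The prefix rule quoted from the specification can be sketched in a few lines of Python. This is only an illustration of the matching the spec describes, not how Googlebot itself is implemented, and the helper name is_disallowed is made up for this example.

```python
def is_disallowed(url_path: str, disallow_value: str) -> bool:
    """Return True if url_path starts with the Disallow value (plain prefix match)."""
    # An empty Disallow value blocks nothing under the original specification.
    if not disallow_value:
        return False
    return url_path.startswith(disallow_value)


requested = "/CCS/PFCatalog.php?Cat=90"

# The value from the original file has no leading slash, so it never matches.
print(is_disallowed(requested, "CCS/PFCatalog.php"))   # False
# With the leading slash, the requested path starts with the value.
print(is_disallowed(requested, "/CCS/PFCatalog.php"))  # True
```

Since every requested path begins with a slash, a Disallow value without one cannot be a prefix of any of them.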

You might try whether the following works:


User-agent: *
# insert paths to be disallowed to all spiders here

User-agent: googlebot
Disallow: /CCS/PFCatalog.php
Disallow: /PCR/Catalog/PFCatalog.php
# insert other paths to be disallowed to Googlebot here
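One way to sanity-check a file like this before uploading it is Python's standard urllib.robotparser module. The sketch below feeds it only the googlebot record; the host www.example.com and the /index.html path are placeholders, not the real site. It shows how Python's parser reads the rules, which is not necessarily identical to Googlebot's behaviour.

```python
from urllib.robotparser import RobotFileParser

# The single merged record for googlebot, with leading slashes.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /CCS/PFCatalog.php
Disallow: /PCR/Catalog/PFCatalog.php
"""

merged_parser = RobotFileParser()
merged_parser.parse(ROBOTS_TXT.splitlines())

BASE = "http://www.example.com"  # placeholder host (assumption)


def googlebot_may_fetch(path: str) -> bool:
    """Ask the parser whether a Googlebot user agent may fetch the path."""
    return merged_parser.can_fetch("Googlebot", BASE + path)


for path in ("/CCS/PFCatalog.php?Cat=90",
             "/PCR/Catalog/PFCatalog.php",
             "/index.html"):  # hypothetical page that should stay crawlable
    print(path, "allowed" if googlebot_may_fetch(path) else "blocked")
```

Both catalog paths should come back as blocked and the unrelated page as allowed.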


mikemcs

10+ Year Member



 
Msg#: 237 posted 2:35 pm on Jan 15, 2004 (gmt 0)

Could I then just use the following to handle both?

User-agent: googlebot
Disallow: /PFCatalog.php

le_gber

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 237 posted 8:47 am on Jan 16, 2004 (gmt 0)

Hi mikemcs,

No, because the leading / refers to the root of the site, so that rule would only match /PFCatalog.php at the root level. Your PFCatalog.php seems to be in directories below the root.

You have to include the full path to the file/directory you want to be ignored.
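To illustrate, here is a small check with Python's standard urllib.robotparser module, assuming the shortened rule from the question. The host www.example.com is a placeholder, and the result reflects Python's parser rather than Googlebot itself.

```python
from urllib.robotparser import RobotFileParser

short_rule_parser = RobotFileParser()
short_rule_parser.parse([
    "User-agent: googlebot",
    "Disallow: /PFCatalog.php",
])


def blocked_for_googlebot(path: str) -> bool:
    """True if the shortened rule stops a Googlebot user agent from fetching path."""
    url = "http://www.example.com" + path  # placeholder host (assumption)
    return not short_rule_parser.can_fetch("Googlebot", url)


print(blocked_for_googlebot("/PFCatalog.php"))              # True: file at the root
print(blocked_for_googlebot("/CCS/PFCatalog.php?Cat=90"))   # False: not a prefix match
print(blocked_for_googlebot("/PCR/Catalog/PFCatalog.php"))  # False: not a prefix match
```

Only a file sitting directly at the root is caught, so both catalog scripts would still be crawled.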

Hope this helps

Leo

mikemcs

10+ Year Member



 
Msg#: 237 posted 1:04 pm on Jan 16, 2004 (gmt 0)

Thanks, I will give it a try.
