ukgimp

msg:1525712 | 9:01 am on Nov 3, 2003 (gmt 0) |
First thing to check: is your robots.txt valid? [searchengineworld.com...]
|
roddy

msg:1525713 | 11:46 am on Nov 3, 2003 (gmt 0) |
| I checked it carefully, validated it |
| Apologies, perhaps that wasn't clear enough. Yes I validated it. Twice. And again just now . . . Roddy
|
ukgimp

msg:1525714 | 11:55 am on Nov 3, 2003 (gmt 0) |
>>Apologies, perhaps that wasn't clear enough
No, you were; it was I that missed that bit :) I had a similar problem with images: I was still getting requests from the Google image search after disallowing them. It seems to be less and less now. I would guess you have to wait for the index to update, so that all the old data has been replaced by new. What with the update mania, I tend to wait and see how things progress. I know that is no real help for you right now, but I am sure it will work in the end. Ahh, do you see requests for robots.txt in your logs? If you do then bingo, you know it is being read, and you will just have to wait it out.
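A rough way to check for those robots.txt requests, as a minimal sketch only. It assumes an Apache-style combined log format and that the bot identifies itself with "Googlebot" in the user-agent string; the function name and the sample lines are made up for illustration and are not taken from roddy's logs.

```python
def robots_fetches(log_lines, bot_token="Googlebot"):
    """Return the log lines where the named bot requested /robots.txt.

    Assumes each line is one request in combined log format, with the
    request quoted as "GET /path HTTP/x.y" and the user agent at the end.
    """
    hits = []
    for line in log_lines:
        if bot_token in line and '"GET /robots.txt' in line:
            hits.append(line)
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [03/Nov/2003:09:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 312 "-" "Googlebot/2.1"',
    '192.0.2.1 - - [03/Nov/2003:09:00:05 +0000] "GET /posting.php?t=123 HTTP/1.0" 200 5120 "-" "Googlebot/2.1"',
    '198.51.100.7 - - [03/Nov/2003:09:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 312 "-" "Mozilla/4.0"',
]

for hit in robots_fetches(sample):
    print(hit)
```

To run it against a real log, pass an open file object instead of the sample list; the path to that file depends on the server setup.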
|
roddy

msg:1525715 | 12:04 pm on Nov 3, 2003 (gmt 0) |
So I've just got to wait, unless they've requested robots.txt, in which case I've . . . just got to wait. Which I've already done, for one week, which is 7 times as long as Google says it should take to be registered. Hmmmmmm. Roddy
|
Nick_W

msg:1525716 | 12:06 pm on Nov 3, 2003 (gmt 0) |
Are the pages it's requesting dynamic? Like privmsg.php?x=y&a=b? I think bots see that as a different page. Nick
|
roddy

msg:1525717 | 12:09 pm on Nov 3, 2003 (gmt 0) |
Yes, they are dynamic. I've disallowed (for example) posting.php. Would that still allow posting.php?t=123? Roddy
|
Nick_W

msg:1525718 | 12:12 pm on Nov 3, 2003 (gmt 0) |
I think so, yes. Check out the last couple of msgs here: [webmasterworld.com...] Although I've not checked my logs in a few days, the last time I looked those pages were still being picked up despite following Google's own advice. Nick
|
roddy

msg:1525719 | 12:17 pm on Nov 3, 2003 (gmt 0) |
Looks useful, but this will prevent Google crawling ALL dynamic pages. I only want to prevent crawling of certain pages. (Actually I'm quite happy to treat all bots the same, but Google is the only one taking any significant bandwidth.) Roddy
|
Nick_W

msg:1525720 | 12:22 pm on Nov 3, 2003 (gmt 0) |
Yes, sorry: didn't think! I rewrite my URLs. You might have to cloak them to return 404s to bots. Nick
|
jdMorgan

msg:1525721 | 3:02 am on Nov 5, 2003 (gmt 0) |
roddy,
May I suggest:

User-agent: *
Disallow: /privmsg.php
Disallow: /search.php
Disallow: /faq.php
Disallow: /memberlist.php
Disallow: /groupcp.php
Disallow: /profile.php
Disallow: /login.php
Disallow: /posting.php
Disallow: /viewonline.php

Ref: [robotstxt.org...]

The robots.txt validator -- like most other validators -- indicates that the 'code' is valid, and not that it will do what you desire it to do. Disallowing /xyz.php will also disallow /xyz.php?anything

Jim
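For anyone wanting to check the prefix-matching point for themselves, here is a small sketch using Python's standard urllib.robotparser. It only shows how that one parser reads the rules; it is not a statement of how Googlebot itself behaves. The helper name is_allowed and the viewtopic.php path are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the suggested file, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privmsg.php
Disallow: /posting.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="Googlebot"):
    """Hypothetical helper: True if this parser would let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


# Disallow rules are matched as path prefixes, so the query string
# does not turn the URL into a "different page" that escapes the rule.
for path in ("/posting.php", "/posting.php?t=123", "/viewtopic.php?t=123"):
    print(path, "allowed" if is_allowed(path) else "blocked")
```

With this parser the first two paths come out blocked and the last one allowed, since nothing in the file has a prefix matching it.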
|
roddy

msg:1525722 | 7:10 am on Nov 5, 2003 (gmt 0) |
Thanks for that. Actually the last 24 hours seem to have seen Googlebot calm down and pay attention to the robots.txt. I'll need to wait a while to be really sure, and if I have any more problems I'll try your suggestion. Thanks for all the help Roddy
|
|