Home / Forums Index / Search Engines / Ask - Teoma

Ask - Teoma Forum

    
Jeeves/Teoma and Robots.txt
Jeeves/Teoma does not follow the robots exclusion protocol
renee

10+ Year Member



 
Msg#: 207 posted 5:32 pm on Dec 11, 2002 (gmt 0)

Jeeves/Teoma has been spidering my site including directories excluded through my robots.txt file. Is anybody else seeing this or just me? I sent an inquiry to their support people and never got a response.

 

caine

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 207 posted 3:50 pm on Dec 16, 2002 (gmt 0)

renee,

Sorry for such a late reply. It's worth checking your robots.txt file with Brett's tool at Search Engine World. The article is worth a read as well.

Robots.txt exclusions [searchengineworld.com]
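For a quick local sanity check alongside an online validator, a rough Python sketch like the one below can help. It uses the standard-library `urllib.robotparser`; the rules in `ROBOTS_TXT`, the paths, and the crawler name passed in are made-up examples, not anyone's real file. It only tells you how Python's parser reads the file, which may differ from how a given crawler reads it.

```python
from urllib.robotparser import RobotFileParser

# Example rules only (assumption): replace with the contents of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if Python's parser says user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "Teoma" is used here purely as an illustrative crawler name.
    for path in ("/index.html", "/private/report.html"):
        print(path, is_allowed(ROBOTS_TXT, "Teoma", path))
```

If a path you meant to exclude comes back as allowed here, the file itself is the first thing to look at before blaming the spider.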

BTW, welcome to WebmasterWorld.

WebJoe

10+ Year Member



 
Msg#: 207 posted 12:11 am on Jan 6, 2003 (gmt 0)

Renee

I have observed the same thing and decided to ban the bot from my site entirely, after not hearing from their support personnel for several months.

I am aware that this will worsen the ranking of my site with Jeeves/Teoma, but since the content of my site is only in German and written for a very specific group of people, I believe the ban will not have an effect on the number of visitors.
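For anyone wondering what a ban like this can look like in practice, here is a minimal sketch as a Python WSGI wrapper. This is not the actual setup used on my site; the substrings in `BANNED_UA_SUBSTRINGS` and the function names are assumptions for illustration, and a real ban is often done in the web server configuration or by IP instead.

```python
# Lowercased substrings to look for in the User-Agent header.
# These values are assumptions for illustration only.
BANNED_UA_SUBSTRINGS = ("ask jeeves", "teoma")


def is_banned(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a banned substring."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BANNED_UA_SUBSTRINGS)


def ban_by_user_agent(app):
    """Wrap a WSGI app so that banned user agents get a 403 response."""
    def wrapped(environ, start_response):
        if is_banned(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapped
```

Matching on the User-Agent header is easy to get around, since any client can send whatever string it likes, so treat it as a first line of defence only.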

caine

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 207 posted 10:39 am on Jan 6, 2003 (gmt 0)

WebJoe, welcome to WebmasterWorld.

I think that with a site written in German, as things stand, your decision is not going to have a profound effect with Teoma for a while.

AnswerGuy

10+ Year Member



 
Msg#: 207 posted 1:33 pm on Jan 7, 2003 (gmt 0)

You can email me any examples of failures. For this problem I need a sample of the URLs that were downloaded in violation of robots.txt.

acurtis@askjeeves.com

WebJoe

10+ Year Member



 
Msg#: 207 posted 5:06 pm on Jan 8, 2003 (gmt 0)

@caine: Thanks. There are other "regular" bots I banned too, because I didn't approve of their behaviour and figured the negative impact (of not being listed) could be ignored... If anyone is interested in which UAs and IPs I have banned, I can give the URL.

@AnswerGuy: Sent you a sticky mail with example URL...

EDIT: That was quick....I see you found the trap :)

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 207 posted 5:50 pm on Jan 8, 2003 (gmt 0)

AnswerGuy,

First things first, unless you want your e-mail flooded... :)

What is the correct User-agent string for use in robots.txt?
And where is this information available on the AJ site?
How about Teoma? - Is the User-agent "Teoma-agent" retired now?

I too have had disallowed pages crawled, but have not yet made the assumption that I've got everything correct on my end. I strongly suggest you consider putting pages on your AJ and Teoma Web sites to deal with the robots exclusion protocol and specify your User-agent names on those pages. If such pages exist, I certainly can't find them. Webmasters cannot assume that the UA string for robots exclusion is the same as the UA string in their site access logs - that rule has been broken by too many search providers to be considered a rule, and demonstrably doesn't work with AJ.

The only way I've been able to keep "Ask Jeeves/Teoma" from getting caught in my bad-bot traps has been to feed that User-agent a blank link-to-home page instead of disallowed content. It adds complication to my sites, it is technically cloaking, it slows down the servers for all requests, and I don't like doing it.
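To make the workaround concrete, here is a rough Python sketch of the selection logic. It is not my actual server configuration; the path prefixes, the agent substring, and the names `DISALLOWED_PREFIXES`, `SOFT_HANDLED_AGENTS`, and `select_body` are all hypothetical.

```python
# Hypothetical path prefixes that robots.txt disallows on this site.
DISALLOWED_PREFIXES = ("/trap/", "/private/")

# Lowercased substrings of User-Agent headers that get the blank page
# instead of being caught by the trap (assumption for illustration).
SOFT_HANDLED_AGENTS = ("ask jeeves/teoma",)

# A blank page with nothing but a link back to the home page.
BLANK_PAGE = '<html><body><a href="/">Home</a></body></html>'


def select_body(user_agent: str, path: str, real_body: str) -> str:
    """Return the blank link-to-home page for a soft-handled agent that
    requests a disallowed path; otherwise return the real content."""
    ua = (user_agent or "").lower()
    disallowed = path.startswith(DISALLOWED_PREFIXES)
    if disallowed and any(agent in ua for agent in SOFT_HANDLED_AGENTS):
        return BLANK_PAGE
    return real_body
```

The extra check runs on every request, which is where the server overhead comes from, and serving different content by user agent is exactly why it counts as cloaking.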

For User-Agent strings in robots.txt, I've tried:
Ask Jeeves
Ask Jeeves/Teoma
ask jeeves
ask jeeves/teoma
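One way to compare candidate tokens like these offline is a small Python sketch such as the following. It builds a one-record robots.txt per token and asks the standard-library `urllib.robotparser` whether a given crawler name would be blocked. The `/trap/` path, the helper names, and the crawler names in the loop are assumptions for illustration. It only shows how Python's parser treats each token and proves nothing about how the Ask Jeeves/Teoma crawler itself matches them.

```python
from urllib.robotparser import RobotFileParser

# The tokens tried so far, as listed above.
CANDIDATE_TOKENS = ["Ask Jeeves", "Ask Jeeves/Teoma", "ask jeeves", "ask jeeves/teoma"]


def build_robots_txt(token: str, disallowed=("/trap/",)) -> str:
    """Build a single-record robots.txt for one User-agent token."""
    lines = [f"User-agent: {token}"]
    lines += [f"Disallow: {path}" for path in disallowed]
    return "\n".join(lines) + "\n"


def blocks(token: str, crawler_name: str, path: str = "/trap/page.html") -> bool:
    """Return True if Python's parser would keep crawler_name out of path
    when the robots.txt record is addressed to token."""
    parser = RobotFileParser()
    parser.parse(build_robots_txt(token).splitlines())
    return not parser.can_fetch(crawler_name, path)


if __name__ == "__main__":
    # Crawler names to test against are guesses, not published values.
    for token in CANDIDATE_TOKENS:
        for name in ("Ask Jeeves", "Ask Jeeves/Teoma", "Teoma"):
            print(f"token={token!r} crawler={name!r} blocked={blocks(token, name)}")
```

Different parsers can disagree on the same token, which is one more reason the crawler's operator needs to publish the exact string it matches against.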

I've had no problems with other legitimate robots and my robots.txt files validate with every validation tool I can find. However, they are non-trivial files, due to the different ways that search engines treat disallows in robots.txt and on-page meta-robots tag exclusions, and combinations of the two. So maybe it is on my end, and the UA string is the most suspect.

Please publish this info!

Thanks for your time,
Jim


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved