Crawler from an Ask IP requested sitemap.xml

No referrer and no identity in UA

Mokita

11:20 pm on Nov 17, 2006 (gmt 0)

10+ Year Member



As it happens, this particular site does have a (Google) sitemap.xml, but it also has a mod_rewrite rule that blocks all Java crawlers except Google, so the request got a 403.

If Ask wish to utilise Sitemaps, surely they should do so openly, not by stealth?

65.119.214.9 - - [18/Nov/2006:09:30:19 +1100] "GET /sitemap.xml HTTP/1.1" 403 - "-" "Java/1.5.0_07"

Anyone else seen it or have an opinion about this?
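
For anyone wanting to look for the same pattern in their own logs, here is a minimal sketch that parses an Apache combined-format line like the one above and flags sitemap.xml requests that arrive with no referrer and a generic HTTP-library user agent. The prefix list and the function names are assumptions made for illustration, not a definitive list of anonymous crawlers.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical list of generic HTTP-library UA prefixes; adjust for your site.
GENERIC_AGENT_PREFIXES = ("Java/", "Python-urllib", "libwww-perl", "curl/", "Wget/")


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def is_anonymous_sitemap_fetch(entry):
    """True for a sitemap.xml request with no referrer and a generic library UA."""
    parts = entry["request"].split()
    path = parts[1] if len(parts) >= 2 else ""
    return (
        path.endswith("/sitemap.xml")
        and entry["referer"] in ("", "-")
        and entry["agent"].startswith(GENERIC_AGENT_PREFIXES)
    )


line = ('65.119.214.9 - - [18/Nov/2006:09:30:19 +1100] '
        '"GET /sitemap.xml HTTP/1.1" 403 - "-" "Java/1.5.0_07"')
entry = parse_line(line)
print(entry["host"], entry["status"], is_anonymous_sitemap_fetch(entry))
```

This only matches on the UA string and the request path, so it says nothing about who owns the IP; that still needs a whois or reverse DNS lookup.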

incrediBILL

9:01 pm on Nov 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think you're overreacting. Ask is probably just running a prototype while trying to catch up with the recent industry-wide sitemap adoption [webmasterworld.com], so this is probably nothing to be concerned about.

Mokita

11:00 pm on Nov 18, 2006 (gmt 0)

10+ Year Member



As far as I know, Google and Yahoo only request sitemap.xml once they have been invited to do so by the site owner submitting it. And when they do access it, they use a readily identifiable UA.

My gripe is that I didn't submit it to Ask, plus they are hiding behind a generic UA.

incrediBILL

1:41 am on Nov 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you have ASK allowed in your robots.txt?

If so, I think they did nothing wrong.

If not, I'll bring the lynch mob, you supply the beer.

wilderness

1:55 am on Nov 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"If not, I'll bring the lynch mob, you supply the beer."

One of you will also need to add in transportation fees for the mob between North America and Australia, or vice versa ;)
I hear those Aussies enjoy their brew (ale), so Bill may not be getting off so cheap ;)

volatilegx

3:32 pm on Nov 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think that if a search engine company such as Ask is spidering (for whatever reason), they ought to identify themselves in the user agent. There is no reason why they couldn't, even in a prototype crawler.

thetrasher

4:35 pm on Nov 19, 2006 (gmt 0)

10+ Year Member



Cloaked robot from Ask.com: [webmasterworld.com]

ext9.eds.jeeves.ask.info (no A record) requests my default page every week with
"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)".
This bot doesn't read robots.txt.

Maybe cloaking detection?
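
The "no A record" detail matters if you try to verify a crawler by DNS: the usual check is a reverse lookup on the IP followed by a forward lookup on the returned hostname, and it fails when the hostname does not resolve back. Below is a rough sketch of that forward-confirmed reverse DNS check. The function name, the `ask.info` suffix (taken from the hostname quoted above), and the 192.0.2.9 documentation address are assumptions for illustration; the resolvers are injectable so the logic can be tried without touching the network.

```python
import socket


def forward_confirmed_rdns(ip, allowed_suffixes, reverse=None, forward=None):
    """Return True only if ip -> hostname -> ip round-trips within an allowed domain.

    reverse/forward default to real DNS lookups; pass fakes to test offline.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror: no PTR record
        return False

    host = host.lower().rstrip(".")
    suffixes = [s.lower().strip(".") for s in allowed_suffixes]
    if not any(host == s or host.endswith("." + s) for s in suffixes):
        return False  # PTR points outside the claimed operator's domain

    try:
        addresses = forward(host)
    except OSError:  # hostname has no A record, as reported in this thread
        return False

    return ip in addresses


def no_a_record(host):
    # Simulates a hostname that does not resolve forward.
    raise socket.gaierror("no A record")


# Hypothetical IP; the thread does not give the address behind ext9.
result = forward_confirmed_rdns(
    "192.0.2.9",
    ["ask.info"],
    reverse=lambda addr: "ext9.eds.jeeves.ask.info",
    forward=no_a_record,
)
print(result)
```

With a missing A record the check returns False, so a bot like this one cannot be confirmed as Ask's by DNS alone, whatever its reverse name claims.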