

Search Engine Spider and User Agent Identification Forum

    
Exalead
Thumbnail bot
fiestagirl

10+ Year Member



 
Msg#: 3293 posted 1:19 am on Jun 15, 2006 (gmt 0)

193.47.80.83-193.47.80.92

UA=mozilla/5.0 (compatible; konqueror/3.4; linux) khtml/3.4.1 (like gecko)
(previously NG/2.0)

Reverse DNS=thumb0.exabot.com to thumb9.exabot.com

It is still opening a "preview" of your site in a frame and removing the AdSense code.
[webmasterworld.com...]
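
For anyone who wants to flag these visits in a log, here is a minimal Python sketch built only from the IP range and reverse DNS names reported above. The function names are made up for illustration, and the range reflects what was seen in June 2006, so treat it as an assumption that may not hold later.

```python
import ipaddress
import re

# Range reported in the post: 193.47.80.83 through 193.47.80.92 (inclusive).
RANGE_START = ipaddress.IPv4Address("193.47.80.83")
RANGE_END = ipaddress.IPv4Address("193.47.80.92")

# Reverse DNS names reported in the post: thumb0 through thumb9 under exabot.com.
THUMB_HOST = re.compile(r"thumb[0-9]\.exabot\.com", re.IGNORECASE)


def in_thumbnail_range(ip: str) -> bool:
    """Return True if ip falls inside the reported range."""
    addr = ipaddress.IPv4Address(ip)
    return RANGE_START <= addr <= RANGE_END


def is_thumbnail_host(hostname: str) -> bool:
    """Return True if hostname is one of thumb0..thumb9.exabot.com."""
    return THUMB_HOST.fullmatch(hostname) is not None
```

Checking both the IP and the hostname is deliberate: the user agent here looks like an ordinary Konqueror browser, so the UA string alone says little.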

 

Mokita

5+ Year Member



 
Msg#: 3293 posted 5:32 am on Jun 15, 2006 (gmt 0)

I've been seeing the Exabot-Images crawler occasionally of late. But I have Exabot disallowed in robots.txt, plus my /images/ folder is disallowed, and it has respected that consistently.

193.47.80.140 - - [15/Jun/2006:04:25:08 +1000] "GET /robots.txt HTTP/1.1" 200 1622 "-" "Exabot-Images/1.0"
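
A rough way to sanity-check a setup like the one described is Python's standard `urllib.robotparser`. The robots.txt below is a hypothetical reconstruction (the real 1622-byte file is not shown in this thread), and Python's parser is only an approximation of how any given crawler reads the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Exabot shut out entirely, /images/ closed to everyone.
ROBOTS_TXT = """\
User-agent: Exabot
Disallow: /

User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, path: str) -> bool:
    """Return True if this parser would let user_agent fetch path."""
    return parser.can_fetch(user_agent, path)
```

Note that this parser treats the `Exabot` record as applying to `Exabot-Images/1.0` as well, because it matches the record name as a substring of the bot's name token. Whether the crawler itself reads it that way is up to the crawler.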

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3293 posted 5:50 pm on Jun 15, 2006 (gmt 0)

fiestagirl, is the UA you provided exactly as-is? All lowercase and including the "previously" paren?

mozilla/5.0 (compatible; konqueror/3.4; linux) khtml/3.4.1 (like gecko) (previously NG/2.0)

I ask because I usually see generic Konqueror UAs as upper- and lower-cased, e.g.:

Mozilla/5.0 (compatible; Konqueror/2.0.1; X11); Supports MD5-Digest; Supports gzip encoding
Mozilla/5.0 (compatible; Konqueror/2.2.2; Linux 2.4.14-xfs; X11; i686)
Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.3 (like Gecko) (Kubuntu package 4:3.4.3-0ubuntu2)

Regardless, thank goodness for [NC,OR] :)
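
For readers not using mod_rewrite, the same idea (match the UA without caring about case, which is what the NC flag does) can be sketched in Python. The pattern and function name are illustrative assumptions, not anyone's production rule.

```python
import re

# re.IGNORECASE plays the role that [NC] plays in a rewrite condition.
KONQUEROR = re.compile(r"konqueror/\d", re.IGNORECASE)


def looks_like_konqueror(user_agent: str) -> bool:
    """Return True for Konqueror-style UAs regardless of letter case."""
    return KONQUEROR.search(user_agent) is not None
```

Both the all-lowercase string reported at the top of the thread and the mixed-case examples above match this pattern, which is the point: a case-sensitive test would miss one or the other.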

incrediBILL

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 3293 posted 6:58 pm on Jun 15, 2006 (gmt 0)

Yahoo Image Search opens a "preview" as well, which disables certain features in MSIE:
[webmasterworld.com...]

I just installed a "fix" so that my site detects when Yahoo displays my page in its preview, and in MSIE the "preview" being displayed says:

"Sorry, we can't display this page because Yahoo causes the page to malfunction.
CLICK HERE to see the page operating properly in a new window."

Now Google Image Search, on the other hand, does the same thing, but my frame buster blasts out of their framing in both MSIE and Firefox, and also breaks out of Yahoo in Firefox.

I'm looking at implementing the same trick for this search engine.

When Exalead frames your site it uses a referrer like this:

"http://www.exalead.com/search?C=<gibberish>"

As opposed to when you click the link to open the site:

"http://www.exalead.com/search/C=<gibberish>q=www.yourdomain.com"

It should be easy enough to cloak this so visitors see something like what Yahooligans now see.
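
A minimal sketch of telling the two referrer shapes apart, in Python, going only by the two patterns quoted above. The function name and the labels it returns are assumptions for illustration, and the second pattern is taken exactly as posted (with `/search/` rather than `/search?`), which may or may not be what the site really sends.

```python
from urllib.parse import parse_qs, urlparse

EXALEAD_HOST = "www.exalead.com"


def classify_exalead_referrer(referrer: str) -> str:
    """Label a referrer as 'framed-preview', 'click-through', or 'other'."""
    parts = urlparse(referrer)
    if parts.hostname != EXALEAD_HOST:
        return "other"
    # Framed preview as reported: /search with a C=... query string.
    if parts.path == "/search" and "C" in parse_qs(parts.query):
        return "framed-preview"
    # Click-through as reported: everything after /search/ is part of the path.
    if parts.path.startswith("/search/"):
        return "click-through"
    return "other"
```

A page handler could then serve the "open in a new window" notice only for the `framed-preview` case and leave normal click-throughs alone.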

fiestagirl

10+ Year Member



 
Msg#: 3293 posted 6:27 pm on Jun 16, 2006 (gmt 0)

The "previously" was my addition, sorry for the confusion. All lower case.

mozilla/5.0 (compatible; konqueror/3.4; linux) khtml/3.4.1 (like gecko)

I also noticed that when I previewed my site, it generated a visit from their IP group, with my user agent.

Query string in this form:
http ://www.exalead.com/search?C=0MlEA.....

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3293 posted 4:21 am on Jun 24, 2006 (gmt 0)

Finally got around to checking current and old logs. No sign of the all-lowercase variety, via UA or IP. FWIW...

crawl103.exabot.com
"Exabot-Images/1.0"
/robots.txt

tc-gw.exabot.com
"Exabot/2.0"
/robots.txt

Granted, I've never allowed Exava under its Host, its IP, under Become's IPs as sub-domains of exava.com, or any of these bots --

"Exalead NG/MimeLive Client (convert/http/0.143)"
"NG/2.0"
"Mozilla/4.7 [en](Exabot@exava.com)"
"Mozilla/4.7 [en](BecomeBot@exava.com)"
"Mozilla/5.0 (compatible; BecomeBot/1.23; +http://www.become.com/webmasters.html)"
"Exabot/2.0"
"Exabot-Images/1.0"

-- yet the last/newest requested specific .gif files, including some from an all-bots-restricted area.
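
For what it's worth, the user agents listed above can be caught with one case-insensitive pattern. This is a Python sketch under the assumption that these strings belong together as the post groups them; the pattern and function name are made up for illustration.

```python
import re

# Built from the UA strings listed in the post; matched without regard to case.
# "^NG/\d" covers the bare "NG/2.0" agent without matching unrelated strings.
EXALEAD_FAMILY = re.compile(r"exabot|becomebot|exalead|^NG/\d", re.IGNORECASE)


def is_exalead_family(user_agent: str) -> bool:
    """Return True if user_agent resembles one of the listed bots."""
    return EXALEAD_FAMILY.search(user_agent) is not None
```

This does nothing about the thumbnail bot's Konqueror-style string reported earlier in the thread, which carries no bot name at all and would have to be handled by IP or host.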

Too many bots.

Too many (attempted) crawls.

Toodle-oo.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved