
Sitemaps, Meta Data, and robots.txt Forum

    
New spider that is actually good.
Found a "good" new spider
jcrimi




 
Msg#: 3055113 posted 12:10 am on Aug 22, 2006 (gmt 0)

I use AWStats to keep an eye on where my traffic is coming from. A few weeks ago I noticed the infamous "Unknown robot (identified by 'crawl')" line, which is reserved for bots the program doesn't recognize. As the days went on, this sucker generated 150 MB of traffic. I'll be the first to admit I am a newbie in this realm, so you can imagine the thoughts going through my head of some evil entity ransacking my website. I knew I needed to find out what the heck this was and then learn how to block it.

Thanks to the information here on WW, I learned I needed to open my access log and do some investigating. First off, tell me there is a better way to look at this file, because my eyes are still bleeding from the wall of text. After about 30 minutes of searching and comparing, I found a recurring line of text ~ pronto.com/robots.html. I was looking for IP addresses that were the same, but this thing kept coming in from different ones, which is why it took me so long to figure it out. ;-)
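For anyone else squinting at a raw access log, here is a minimal sketch of a less painful way to do the same hunt: group the lines by user-agent string instead of by IP, since a crawler that rotates IPs still tends to announce itself the same way each time. It assumes the log is in the Apache/NCSA "combined" format (the post doesn't say which format is in use), and the function names and the sample bot name are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: Apache/NCSA "combined" log format, e.g.
# ip - - [date] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_agents(lines):
    """Group log lines by user-agent; track hits, bytes sent and distinct IPs."""
    stats = defaultdict(lambda: {"hits": 0, "bytes": 0, "ips": set()})
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        entry = stats[match.group("agent")]
        entry["hits"] += 1
        size = match.group("size")
        # The size field is "-" when no body was sent (e.g. a 304 response).
        entry["bytes"] += int(size) if size.isdigit() else 0
        entry["ips"].add(match.group("ip"))
    return dict(stats)


def top_agents(stats, n=10):
    """Return the n user-agents that pulled the most bytes, largest first."""
    return sorted(stats.items(), key=lambda kv: kv[1]["bytes"], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical path; point this at your own log file.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for agent, info in top_agents(summarize_agents(log)):
            print(info["bytes"], info["hits"], len(info["ips"]), agent)
```

A bot that shows up with lots of bytes spread across many different IPs but a single user-agent (often containing a URL such as the robots page mentioned above) stands out right away in this summary.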

Anyway, this bot has been taking the product information from my website and adding it to a comparison shopping directory that many of my competitors are also on. I checked the areas the bot had visited and, sure enough, my products are listed. The site seems to be in beta, so I guess they are just seeding it. Either way, it's nice to get something free for a change. I'll have to keep an eye out for how much referring traffic/sales I get from it.
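One rough way to watch for that referring traffic is to count access-log hits whose referer points at the directory's domain. The sketch below again assumes the "combined" log format (the plain "common" format does not record referers at all), and `count_referrals` is just an illustrative name. It only counts visits; tying them to sales would need order data that the access log doesn't have.

```python
import re
from urllib.parse import urlparse

# Assumption: "combined" log format, where the quoted referer follows
# the quoted request, the status code and the byte count.
REFERER = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"')


def count_referrals(lines, domain="pronto.com"):
    """Count log lines whose referer host is `domain` or one of its subdomains."""
    count = 0
    for line in lines:
        match = REFERER.search(line)
        if match is None:
            continue
        # hostname is None for an empty referer such as "-"
        host = urlparse(match.group("referer")).hostname or ""
        if host == domain or host.endswith("." + domain):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical path; point this at your own log file.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        print(count_referrals(log))
```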

 
