Whenever I look at my site's traffic report in awstats, I always check the hosts section. I look very closely at any host/IP whose traffic exceeds the combined traffic of all the site administrators. Using this metric I've found scrapers, bad bots and even content thieves/media monitors.
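For anyone who wants to run the same check against raw logs instead of the awstats report, here is a rough sketch in Python. It assumes an Apache common/combined format access log and a hand-maintained set of administrator IPs; the names `bytes_per_host`, `heavy_hosts` and `admin_hosts` are made up for illustration and are not part of awstats.

```python
import re
from collections import Counter

# Start of an Apache common/combined log line:
# host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-)')


def bytes_per_host(lines):
    """Sum the response bytes served to each host."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        host, size = match.groups()
        totals[host] += 0 if size == "-" else int(size)
    return totals


def heavy_hosts(lines, admin_hosts):
    """Return (host, bytes) pairs for non-admin hosts whose traffic
    exceeds the combined traffic of all admin hosts, largest first."""
    totals = bytes_per_host(lines)
    admin_total = sum(totals[host] for host in admin_hosts)
    flagged = [
        (host, size)
        for host, size in totals.items()
        if host not in admin_hosts and size > admin_total
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

To use it, pass an open log file (or any iterable of lines) plus your own admin addresses, e.g. `heavy_hosts(open("access.log"), {"192.0.2.10"})`. The path and the address are placeholders.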
These media monitors watch sites, suck down content and then email it to their subscribers. Some only suck and use a paragraph with a link back to your site; some will suck until your breasts hurt. The problem I have with these media monitors/scrapers is that they charge for your content and you get nothing in return.
For a few months now, Bacons Media has appeared on my radar. After visiting their website, I found that they seem to be a pretty respectable PR company and have been around since before the internet.
Should I ban them? What are others' opinions on this subject?
TIA!
Sorry, but we'll need more info to help you out with that one, e.g., what specifically about the visits makes them look suspicious? Rate/frequency? Bandwidth? File types? User-agent [psychedelix.com] (UA)? Etc. (Aside: If a mod edits the company name in your post because you may be seeing an innocent, if avid, visitor, you can still reply with info about the visits using "example.com" or "123.456.789.XXX" and the like.)
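If it helps with gathering that info, here is a rough Python sketch that summarizes each host's request count, bandwidth, file types and user-agents from an access log. It assumes Apache combined log format; `profile_hosts` and the field names are hypothetical, and it only counts requests rather than measuring rate over time.

```python
import re
from collections import Counter, defaultdict
from os.path import splitext

# Apache combined log format; the referer/user-agent pair is optional
# so that plain common-format lines still match.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def profile_hosts(lines):
    """Build a per-host summary: requests, bytes, user-agents, file types."""
    profiles = defaultdict(lambda: {
        "requests": 0,
        "bytes": 0,
        "agents": Counter(),
        "extensions": Counter(),
    })
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        profile = profiles[match["host"]]
        profile["requests"] += 1
        profile["bytes"] += 0 if match["size"] == "-" else int(match["size"])
        profile["agents"][match["agent"] or "(none)"] += 1
        # Drop the query string, then take the file extension of the path.
        path = match["path"].split("?", 1)[0]
        extension = splitext(path)[1].lower() or "(none)"
        profile["extensions"][extension] += 1
    return dict(profiles)
```

A host that pulls only `.html` pages, never images or stylesheets, and sends a single odd user-agent tends to stand out quickly in a summary like this.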
Also, seeing as how it looks like you're new to this site --
You'll find robot watchers (and lovers and haters) galore in "The Search Engine World [webmasterworld.com]" forums. And for the most info (and opinions:) about the broadest range of bots -- a.k.a. crawlers, spiders, link-checkers, media monitors, server info seekers, page- and site-whackers, site-rippers, 'referer log'/guestbook spammers, offline downloaders, feed-seekers, speed-up extensions, pre-fetch or 'load links in background' settings, ANYTHING designed to ignore "robots.txt" and/or automatically suck things up and/or slow things down -- check out that area's "robots.txt [webmasterworld.com]" and "Search Engine Spider Identification [webmasterworld.com]" forums.
Plus if you're on an Apache webserver and you're mod_rewrite capable, check out the "Apache Web Server [webmasterworld.com]" forum for how to control and/or kick those bots not heeding robots.txt (of which there are literally thousands, alas). Ditto bots, etc., cloaked behind common UA strings, host names and IP addresses.
Generally speaking, after you spend time in those linked areas, you'll be able to decide if any suspect visitor, including any robot, is worth your bandwidth.
But one word of caution if you're newly bot-curious --
Bot-watching, like the Internet, is an addiction for which there is no cure:)