|Forum: Search Engine Spider and User Agent Identification|
|Displaying Topics 1 - 40 (55 total) Sorted by: Date-Last-Post, Direction: reverse|
|1:|| Server Farms IP Tracking Resource - February 2015|
WebmasterWorld members provide an extensive range of IP addresses and user agent names that can be used to keep vast numbers of scrapers off websites.
|Feb 23, 2015|
|2:|| At Home with the Robots: 2015 Edition|
An extensive review of robots (web crawlers) and their behaviour: good, friendly, or unhealthy.
|Feb 10, 2015|
|3:|| Blocking non-North American Traffic Made Simple|
Webmasters discuss how to make an amazingly small optimized IP block list that allows only North American traffic to access a website. The technique can easily be applied to other geographical areas.
|Apr 23, 2014|
|4:|| Googlebot Fails to Pass DNS Verification|
WebmasterWorld members have reported that an apparently valid Googlebot is failing DNS verification. This has a major impact on sites that rely on Googlebot validation.
|Apr 2, 2014|
|5:|| The User Agent Whitelist|
WebmasterWorld members discuss methods for whitelisting good requests vs blacklisting bad requests.
|Feb 7, 2014|
|6:|| Dealing With WordPress Comment Spam Escalation|
"Just to see what would happen I enabled full comments on my WordPress blog and at first I just let the comments pile up in the WordPress moderation queue as I was curious how bad it would get since nothing ever got published. It quickly ramped up from a few a day to 100s a day, peaking currently at over 500 spam posts a day."
|Jan 22, 2014|
|Oct 31, 2013|
|8:|| How to Identify and Block Fake BingBot Visits|
How do you identify and block fake BingBot visits to your sites?
|Apr 2, 2013|
|9:|| Filtering Out Really Hard To Find Bad Bots|
WebmasterWorld Members discuss how best to filter out unwanted, bad bots that are tough to find.
|Jan 18, 2013|
|10:|| Identifying Fake User Agent Strings|
User agents come in all shapes and sizes. Some, like the fake Googlebots, are easy to recognize, but what about those really long ones? WebmasterWorld Members help clarify the identification process.
|June 11, 2012|
|11:|| How To Block Thousands of Spambot IPs Hitting a Site|
WebmasterWorld members discuss the best methods of handling and blocking spambots with thousands of unique IP addresses hitting a site, causing bandwidth to rise from 1GB a month to 12GB a day.
|Dec 12, 2011|
|12:|| Microsoft Bot 157 Ranges Updated|
Microsoft's list of bots in the 157. range has been updated.
|Nov 16, 2011|
|13:|| The Best Way to Keep All Spiders/Bots Out Of A Site|
WebmasterWorld Members discuss the issue of stopping bots from crawling a site, and keeping them out. It seems it's tougher than you might think.
|Oct 3, 2011|
|14:|| Yahoo! Slurp Ignoring robots.txt|
WebmasterWorld Members report that Yahoo's Slurp is ignoring robots.txt.
|Sept 17, 2011|
This is the first time in nearly 9 years I've seen G blatantly disregard robots.txt and they're doing it with a GoogleBot UA.
|May 16, 2011|
|16:|| Stopping Scrapers From The Start|
"I'm putting a *huge* number of pages of content online. I'm looking to stop the scraping/copying/bots from the outset and I need bandwidth kept to a minimum."
|Feb 25, 2011|
|17:|| Google's Web Preview Spider|
"WebmasterWorld Members discuss the Web Preview Spider, whether it obeys robots.txt, and how to block it."
|Nov 19, 2010|
|18:|| Now Seeing Bingbot|
"Bingbot is now in the wild."
|Sept 29, 2010|
|19:|| Fresh IP's in MSN's Many Cloaked Bot Arsenal|
"No UA, no robots.txt, no REF, no nothing. Not once. Not twice. Not even three times. Try eleven."
|Sept 3, 2010|
|20:|| Casper Bot Search Attempting To Infect Sites|
"Seen quite a few of these over the past few days, generally in groups of half a dozen-ish."
|July 7, 2010|
|21:|| MSNbot Changing to Bingbot on Oct.1, 2010|
"we will drop the beta designation from the Bing crawler and change the name of the crawler to reflect Microsoft's new brand for search."
|June 29, 2010|
|22:|| The Staggering Number of Tweet Chasing Bots|
Up to 20 bots are now following the Twitter firehose feed.
|May 8, 2010|
|23:|| Facebook Sues Data Scraper|
"Warden gathered that data from public profiles using "crawling" software similar to what's commonly available on the Web..."
|Apr 4, 2010|
|24:|| Updating The htaccess Bot Ban list|
"I'm sure most of us are familiar with the classic bad bot ban list for .htaccess that gets copied and pasted wholesale from web developer forum to forum (e.g.: http://www.webmasterworld.com/forum13/687.htm )"
|Dec 29, 2009|
|25:|| Comcast Launches Anti-Botnet Initiative "Constant Guard"|
"Comcast is taking a leadership role and making a huge step forward in the eradication of botnets."
|Oct 12, 2009|
|26:|| IP Banning Primer|
"I won't ask why you want to block IPs. But supposing you do, here's how to do it."
|Sept 21, 2009|
|27:|| Digsby IM Enables Web Crawlers Control of Your PC & Bandwidth|
Did Digsby just go darkside?
|Sept 8, 2009|
|28:|| MJ12bot Implements Ground-Breaking Validation Capability|
"...first distributed spider to provide validation for webmasters."
|Sept 3, 2009|
|29:|| Microsoft Launches Azure, an AWS Competitor|
|July 15, 2009|
|30:|| Microsoft Disables Live Search "Fake Referrers"|
"(Microsoft) are working on a fix for this."
|Apr 30, 2009|
|31:|| Bot-Blocking Methodology|
"WebmasterWorld Members discuss various bot-blocking methodologies."
|Dec 13, 2008|
|32:|| New Wave of SQL Injection Vulnerability Probes|
|Aug 29, 2008|
|33:|| AVG Stops Real-Time Scanning|
|July 7, 2008|
|34:|| Another Phorm Type Ad System Discovered?|
|June 20, 2008|
|35:|| AVG - Valid Security Tool or Malware - Part Two|
AVG anti virus latest update includes a pre-fetch link scanner tool that some are viewing as malware itself.
|June 14, 2008|
|36:|| AVG Toolbar Glitch May Be Causing Visitor Loss|
"Web sites with tight security are turning away AVG visitors with security toolbar broadcasting malformed HTTP headers and user agent strings."
|May 10, 2008|
|37:|| Yahoo! Slurp 3.0 Released on New IPs|
"The new Yahoo! Slurp 3.0 recognizes the same user-agent and all robots.txt directives for 'Yahoo! Slurp,' though it'll identify itself as Slurp 3.0 in your web logs."
|Apr 15, 2008|
|38:|| Default User Agents of Programming Libraries and Command Line Tools|
|Apr 13, 2008|
|39:|| Identifying And Analyzing Hostile & Friendly Bot Activity|
"The following items can be used to identify bots and slow down and stop most unwanted traffic if applied with proper due care."
|Mar 31, 2008|
|40:|| Quick primer on identifying bot activity.|
|Mar 29, 2008|
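
Several topics in this listing (for example 4 and 8) turn on DNS verification of a crawler that claims to be Googlebot or BingBot. The threads themselves are not reproduced here, so the following is only a minimal sketch of the general forward-confirmed reverse DNS idea, not any member's method. The function name `is_verified_crawler`, its injectable lookup parameters, and the hostname suffixes in `TRUSTED_SUFFIXES` are assumptions made for illustration; check each search engine's current documentation before relying on any suffix.

```python
import socket

# Hostname suffixes assumed for illustration only; verify them against
# each search engine's own documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None,
                        suffixes=TRUSTED_SUFFIXES):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end with a trusted suffix.
    3. Forward-resolve that hostname and require the original IP
       to be among the returned addresses.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or lookup failure: treat as unverified.
        return False

    host = host.rstrip(".").lower()
    if not host.endswith(tuple(suffixes)):
        return False

    try:
        addresses = forward_lookup(host)
    except OSError:
        return False

    # A spoofed PTR record fails here because the forward lookup
    # does not point back at the requesting IP.
    return ip in addresses
```

The lookups are passed in as parameters so the logic can be exercised without network access; in real use a failed lookup is treated as "unverified" rather than "hostile", which matters given the report in topic 4 that an apparently valid Googlebot failed verification.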