Purpose of this Crawler/Bot?

         

DXL

4:29 am on Jul 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I noticed a string of related IP addresses showing up in my statistics program, all resolving to McColo. I've seen other complaints about them on this forum, but can someone explain to me why exactly they are crawling my site? Are they scraping for content, or something else?

On a side note, 222.231.21.122, which resolves to Neowiz Corp in Korea, accounts for most of the page hits on my site and uses up 10 times as much bandwidth as any other host. There seems to be some debate over who they are and what they are for. Should I ban them?
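To check a claim like "10 times as much bandwidth" against the raw server logs rather than a stats program, you can total the response bytes per client address. This Python sketch assumes Apache's common/combined log format (client host first, response size in bytes right after the status code); the function name and the sample lines are made up for illustration and use documentation-range IPs.

```python
import re
from collections import Counter

# Assumption: Apache "common"/"combined" access-log format, i.e.
# host ident user [time] "request" status bytes ...
# Adjust this pattern if your host writes logs in another format.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) (\d+|-)')


def bytes_per_host(lines):
    """Return (bytes_by_host, hits_by_host) Counters for the given log lines."""
    total_bytes = Counter()
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in an unexpected format
        host, _status, size = match.groups()
        hits[host] += 1
        if size != "-":  # "-" means no response body was sent (e.g. a 304)
            total_bytes[host] += int(size)
    return total_bytes, hits


if __name__ == "__main__":
    # Made-up sample lines using documentation IP ranges.
    sample = [
        '203.0.113.50 - - [31/Jul/2006:04:01:02 +0000] "GET /a.html HTTP/1.0" 200 52000 "-" "-"',
        '203.0.113.50 - - [31/Jul/2006:04:01:03 +0000] "GET /b.html HTTP/1.0" 200 48000 "-" "-"',
        '192.0.2.10 - - [31/Jul/2006:04:05:00 +0000] "GET /a.html HTTP/1.1" 304 - "-" "-"',
        '198.51.100.7 - - [31/Jul/2006:04:06:00 +0000] "GET /c.html HTTP/1.1" 200 10000 "-" "-"',
    ]
    total_bytes, hits = bytes_per_host(sample)
    # Print hosts from heaviest to lightest bandwidth use.
    for host in sorted(hits, key=lambda h: total_bytes[h], reverse=True):
        print(f"{host}\t{hits[host]} hits\t{total_bytes[host]} bytes")
```

On a real log you would pass an open file object (`bytes_per_host(open("access_log"))`, with the path being whatever your host uses). Lines that don't match the pattern are silently skipped, so compare the hit totals with your stats program before trusting the byte counts.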

volatilegx

2:42 pm on Jul 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> 10 times as much bandwidth as any other host

Heh, I would, unless they accounted for 10x the sales of any other host.

wilderness

3:22 pm on Jul 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Are they scraping for content, or something else?

DXL,
Unless there's a press release somewhere (hardly likely with these non-mainstream bots) or a statement from the person running the bot (again, not likely), nobody except the bot owner really knows what the spidering is for.

They are spidering (we all know that)!
They have NOT made any statement of their intent in a link accompanying the user-agent (UA) string!

Each webmaster decides what is beneficial or detrimental to their website.
Each webmaster must also decide when it is appropriate to deny a spider (some act far sooner, and with a lower level of tolerance, than others).

Romeo

3:50 pm on Jul 31, 2006 (gmt 0)

10+ Year Member



... and to extend Wilderness's statement about individual webmasters' tolerance:

A good measurement is to ask this question:
"Will banning this crude rude bot hurt my rankings on public search engines like Google/Yahoo/Msn/Ask/(and a few others)?"
And in most cases the answer will be just "NO".
Exactly.

Kind regards,
R.

incrediBILL

8:47 am on Aug 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Unless your site is of interest to Asia, you may want to consider blocking entire countries. I've been scraped so hard and for so long from a few countries, with requests for hundreds of pages a second or more, that it was just easier to drop them in the firewall than to chase them from IP to IP every other day.

Cuts down on a ton of spam too if you block them at the firewall level ;)
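Blocking a whole country in practice usually means blocking a list of CIDR ranges, which can run to hundreds of entries. Here is a minimal Python sketch, using the standard `ipaddress` module, of the membership test plus merging adjacent ranges so the firewall rule list stays short. `BLOCKED_RANGES`, `collapse` and `is_blocked` are hypothetical names, and the ranges are documentation placeholders, not real country data.

```python
import ipaddress

# Hypothetical placeholder list: substitute the CIDR ranges you actually want
# to deny. These are IETF documentation ranges, not real country allocations.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/25"),
    ipaddress.ip_network("203.0.113.128/25"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def collapse(ranges):
    """Merge adjacent or overlapping ranges (same IP version) into fewer rules."""
    return list(ipaddress.collapse_addresses(ranges))


def is_blocked(ip, ranges=BLOCKED_RANGES):
    """Return True if the address falls inside any of the given ranges."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when the address and network versions differ.
    return any(addr in net for net in ranges)


if __name__ == "__main__":
    for net in collapse(BLOCKED_RANGES):
        print(net)  # one firewall rule per merged range
    print(is_blocked("203.0.113.200"), is_blocked("192.0.2.1"))
```

`collapse_addresses` only accepts ranges of a single IP version, so keep IPv4 and IPv6 lists separate. The actual blocking still happens in the firewall (or htaccess); this only helps build and sanity-check the list.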

DXL

6:55 pm on Aug 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I tried blocking their IP address and my entire site crashed immediately afterwards (hosted on ipowerweb). I had to delete my .htaccess file for the site to come back up. ipower's support staff are generally pretty inept; they couldn't explain to me why that was happening, so I guess I can't block anyone without my site becoming one giant 404 error to anyone who visits.

wilderness

7:04 pm on Aug 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> ipower's support staff are generally pretty inept

I had similar problems with my host (which I've been with for more than five years) when I began using htaccess.
I even tried later to help others in the host's forum; however, few were interested.

Hosts generally haven't used htaccess and really have no reason to understand it or even explore its capabilities.

The reason your site crashed was a simple syntax error in your htaccess lines.
It happens to all of us in haste. It took me quite a while to get into the habit of checking that my sites still work after every change to htaccess.
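One common class of htaccess syntax error is a malformed address on a `Deny from` line (a stray comma, a fifth octet, an out-of-range number). The Python sketch below is a rough pre-upload check loosely modeled on the Apache 1.3/2.x `Deny from` syntax (full or partial IPs, network/netmask or CIDR, host names, `env=` variables, and `all`). The function names are hypothetical and it is not a real Apache parser, so it does not replace loading the site after every change.

```python
import ipaddress
import re

# Rough check loosely modeled on the Apache 1.3/2.x "Deny from" syntax.
# Accepted targets: "all", env=VAR / env=!VAR, a full IP, network/CIDR or
# network/netmask, a partial IP such as "10.1", or a (partial) host name.
# This is NOT a full Apache parser; it only flags obviously malformed tokens.
PARTIAL_IP = re.compile(r"^\d{1,3}(\.\d{1,3}){0,2}\.?$")
HOSTNAME = re.compile(r"^[A-Za-z0-9.-]*[A-Za-z][A-Za-z0-9.-]*$")


def token_ok(token: str) -> bool:
    """Return True if a single Deny-from target looks well formed."""
    if token.lower() == "all":
        return True
    if token.lower().startswith("env=") and len(token) > 4:
        return True
    try:
        # Full IP ("1.2.3.4"), CIDR ("10.0.0.0/8") or netmask ("10.0.0.0/255.0.0.0").
        ipaddress.ip_network(token, strict=False)
        return True
    except ValueError:
        pass
    if PARTIAL_IP.match(token):
        # Partial IPs match by leading octets; each octet must still be 0-255.
        return all(int(octet) <= 255 for octet in token.rstrip(".").split("."))
    return bool(HOSTNAME.match(token))


def check_deny_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, offending token) pairs for suspicious Deny lines."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == "deny" and parts[1].lower() == "from":
            targets = parts[2:]
            if not targets:
                problems.append((lineno, "<missing address>"))
            for token in targets:
                if not token_ok(token):
                    problems.append((lineno, token))
    return problems


if __name__ == "__main__":
    rules = "Order Allow,Deny\nAllow from all\nDeny from 222.231.21.122,\n"
    print(check_deny_lines(rules))  # flags the trailing comma on line 3
```

Run it on your local copy of the file before uploading; an empty list only means no obviously bad addresses were found, not that Apache will accept the file.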

If you post your file here (if it's not too big), perhaps others can help you.

edited by wilderness.

Ipower has redirects in their control panel
[ipwhelp.com...]
[ipwhelp.com...]

There may also be a similar control panel (CP) option for denying visitors (my host has one in its CP, although I never use it).