Getting slammed with phony robots

Is it my site or niche? Are others seeing this?

         

larryhatch

12:54 am on Nov 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I check my error.log file regularly. Usually it's a few hundred bytes long per day,
mostly misspelled filenames, or image name changes to fend off hot-linkers.

For the last week or two, instead of maybe 500-200 KB in length, error.log is up in the tens of thousands.

Much or most of that is from bogus / home-brew spiders sucking down my whole site.
Those hits go into the error file because the weenies convert my URLs to all lower case,
while most of my URLs are actually in UPPER case.
My host servers are case sensitive so naturally all those get kicked out.

Doing WHOIS for DNS numbers finds a mixed bag: Vancouver, BC and Toronto in Canada .. Dallas, TX .. Oklahoma City .. Rural northern Washington State, and now Vienna Austria.

Checking my access.log files for the same period, about a week now, I see even more of the same,
only with valid filenames. These suck up all my .html and .txt files, but no images.

What is going on? Did somebody give away all kinds of site-ripper software or what?
Is anyone else seeing this, over the last week or two especially?

There are no indications of the software doing the spidering, so no help there. -Larry

Brett_Tabke

3:50 pm on Nov 26, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> Much or most of that is from bogus / home-brew spiders sucking down my whole site.

Welcome to the club [webmasterworld.com] Larry. Hot towels on the left - exec wash room key on the right - thursdays are league day - no spikes in the clubhouse ;-)

> Doing WHOIS for DNS numbers finds a mixed bag

Ya - proxy bots. someone took a real hard core interest in your site. And now they are monitoring you.

larryhatch

7:38 pm on Nov 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello B_T: Very good to hear from you.

My main question is/was whether the most recent phony -bot activity was directed at my site or niche in particular.

I think it was a very gutsy move to kick out all robots, good and bad. Personally, I would not have done it.

It's so damned cute when Excite or Altavista come back from the dead. Who am I to deny them a little cat food?

It is the anonymous robots-from-hell that worry me.
If I knew what they were doing, I'd feel better.
I don't, so I don't. I also don't have or know of a magic pill for robots from nowhere.

- Larry

larryhatch

7:45 pm on Nov 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh, I forgot something:

May I take this as a piece of irony or (at worst) sarcasm:

" Ya - proxy bots. someone took a real hard core interest in your site. And now they are monitoring you. "
.. or were you trying to tell me something else?

I can be really dense at times. I'm not into conspiracy theory, never was. Best wishes - Larry

Leosghost

8:22 pm on Nov 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One can buy a bad bot with hosting and all for a dollar ( no need to know how to make one or even to understand properly what a bot does ) from a lot of places out there ..some of them even offer for a few dollars more the deluxe service ..

moving "lairs" ..

specific "sic ems" ..

hand holding on entry to target by bot master ..

and all the other stuff that's been plaguing Brett ..

not counting all those compromised boxes that are hosting bots that don't even know that they are participating in this ..( cos the critters don't always come from the mother ship ..and mommy and aunties etc move fast for money ) ..and breed rapidly ..

plenty of places that sell bots by the multiples with thousands of holes to come out of ..

think of em like insects ..

The darkside is full of insects ..

imagine if you will "starship trooper" ..

( and that is not a dig Larry )..

larryhatch

1:06 pm on Nov 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, anybody can run a cheap bot and suck down your site or mine at will.
What GOOD does it do for the weenie running the bot?
Unless somebody just loves the content and wants it online and off, I don't see the point. -Larry

DamonHD

6:39 pm on Nov 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi LH,

1) It seems that some people are too lazy or greedy to wait for the next page to load and so try to take the entire site without bothering to think how long it would take or if the site is even finite in size! It's been going on for years and I've had to have protection in place against it for years...

2) Spidering for email addresses. I have had my (current) email address(es) on the Web since '92/'93 and get up to 40,000 SPAMs per day. That has also been going on for years. I now disallow most access from proxies and compromised machines (see the SPAMHAUS "xbl" list) to try to avoid the problem getting worse...
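
For readers who have not used a DNS blocklist before, here is a minimal sketch of how a lookup against a list like the "xbl" one generally works: reverse the IPv4 octets, append the list's zone, and do an ordinary DNS query. The function names are made up for illustration, and treating only a 127.0.0.x answer as "listed" is an assumption; check the list's own documentation for its return codes and usage terms before relying on it.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "xbl.spamhaus.org") -> str:
    """Build the DNS name that a blocklist lookup would query for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "xbl.spamhaus.org") -> bool:
    """Return True if the blocklist answers with a 127.0.0.x address.

    Assumption: other answers (or no answer) are treated as "not listed".
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure
    return answer.startswith("127.0.0.")
```

For example, `dnsbl_query_name("192.0.2.10")` gives `10.2.0.192.xbl.spamhaus.org`; a server could call `is_listed()` on the client address and refuse the request when it returns True.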

Rgds

Damon

larryhatch

9:59 pm on Nov 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Damon: Yes, all that is understood.

I'm wondering if there has been a sudden spate of such spidering in the last 2 to 3 weeks,
well above the background noise. That's what I see here and wonder if my niche / site is targeted. -LH

DamonHD

4:08 pm on Dec 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi LH,

OK, that question I cannot answer.

I am not seeing anything out of the ordinary at the moment beyond the expected rise in traffic at this time of year...

Rgds

Damon

larryhatch

4:57 pm on Dec 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Damon. I had to ask.

From here, it looked like every weenie from Tulsa to Timbuktu got Site Ripper software for
their birthday about 3 weeks ago, and they are now starting to get bored with it.
The ripping is starting to slow down.

Oddly, in the middle of this, my site got into 'Yahoo Buzz'.

Whammo! Traffic up 800 percent for two days. That blew over and now it's back to abnormal.

Another odd thing. Half of the rippers get nothing but 404 errors.
Most of my URLs have upper case filenames, but many ripper applications default to lower case.
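
One rough way to confirm that those 404s really are case mismatches, sketched in Python. It assumes you have already pulled the failed request paths out of error.log and have a list of the paths that actually exist; the function name and the sample paths are hypothetical.

```python
def case_mismatch_404s(requested_paths, real_paths):
    """Pair each failed request with the real path it matches when case is ignored."""
    by_folded = {path.lower(): path for path in real_paths}
    real = set(real_paths)
    return [
        (req, by_folded[req.lower()])
        for req in requested_paths
        if req not in real and req.lower() in by_folded
    ]


# Hypothetical sample data, not taken from any real log.
real_files = ["/DOCS/INDEX.HTML", "/NOTES.TXT"]
failed = ["/docs/index.html", "/NOTES.TXT", "/cgi-bin/formmail.pl"]
mismatches = case_mismatch_404s(failed, real_files)
```

Here `mismatches` holds only the `/docs/index.html` request, paired with `/DOCS/INDEX.HTML`. Requests for things that never existed, like the cgi-bin probe, are left out.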

I also see attempts to get into my non-existent "vti-something-or-other", cgi-bin
and other stuff I don't have or use. I presume that's mostly email spammers.

Two Russians in London were trying to download /msn.text and /adsense.txt from my main directory!

Sorry Vladimir, say hello to Maxim. -Larry

larryhatch

2:24 am on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Now some weenie in/near Sterling, VA with an AOL account downloaded the HEAD (only) for the same
15 KB .gif image, around 60 times in a row without stopping for air. Makes no sense to me.

I didn't think an image HAD a head. -Larry

jdMorgan

2:40 am on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Quick note...

All AOL users show a Sterling, Va. location, because that is where their proxy server datacenter is located. I think you've been hotlinked, and the AOL proxy server does a HEAD request to see if the image has been updated. If not, it serves its cached copy of the image.

Jim

larryhatch

2:57 am on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi jdM: Good info. I didn't know AOL all went thru VA.

I understand about checking if an image has been updated, but why check scores of times in a row? -Larry

jdMorgan

3:03 am on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is it the same IP address each time? (They have hundreds of these proxy servers)

Does your server send a "Cache-control: Must-revalidate" header [webmasterworld.com] on images?

Jim

larryhatch

7:18 am on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi jd:

It's the same IP, 60-80 times in a row, and all at once, then they went away.
I can't remember seeing this quirk before.

I have no idea what my server sends w/r to cache-control.
I certainly did not set up anything like a 'must revalidate' header.
I didn't even know images HAD headers! -Larry

DamonHD

11:47 am on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi LH,

A HEAD request should yield a response containing exactly the same headers that a GET request would, but without sending any of the actual data.

Typically it might be used by a spider or proxy or browser to check if the timestamp, length, etc, of a page or image had changed since the last fetch.

The image or whatever does not itself have a "head" per se for these purposes. It's just your Web server saying what it *would* return if a full GET was done.
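
To see this for yourself, here is a small self-contained sketch using only the Python standard library. It starts a throwaway server on localhost, asks for the same made-up image path with HEAD and then GET, and returns what came back. The handler, the placeholder bytes and the header values are all invented for the demonstration; a real server such as Apache does the equivalent on its own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"GIF89a" + b"\x00" * 100  # stand-in bytes, not a real image


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(BODY)))
        self.send_header("Last-Modified", "Sat, 26 Nov 2005 00:54:00 GMT")
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()  # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(method, port):
    """Send one request and return (selected headers, body bytes)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/image.gif")
    resp = conn.getresponse()
    wanted = ("Content-Type", "Content-Length", "Last-Modified")
    headers = {k: v for k, v in resp.getheaders() if k in wanted}
    body = resp.read()
    conn.close()
    return headers, body


def compare_head_and_get():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch("HEAD", port), fetch("GET", port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `compare_head_and_get()` returns the HEAD result and the GET result; the three headers match, and only the GET carries the bytes. That is why a proxy can cheaply check Last-Modified or Content-Length before deciding whether to re-fetch.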

Rgds

Damon

larryhatch

12:15 pm on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Damon:

I take it then, that any engine can call the 'Head' of an image, see if it's been updated or whatever.

What I don't see is why some weenie on AOL would ask for the same info 60-80 times
in the space of maybe 15 seconds. -Larry
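
A burst like that can at least be picked out of the logs automatically. The sketch below assumes the access.log lines have already been parsed into (unix_time, ip, method, path) tuples in time order; the function name, the 15-second window and the threshold of 60 are illustrative choices taken from the numbers in this thread, not a recommendation.

```python
from collections import defaultdict, deque


def find_bursts(hits, window_seconds=15, threshold=60):
    """Flag (ip, method, path) keys with `threshold` identical requests inside the window.

    `hits` is an iterable of (unix_time, ip, method, path), sorted by time.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    flagged = set()
    for when, ip, method, path in hits:
        key = (ip, method, path)
        times = recent[key]
        times.append(when)
        # Drop timestamps that have fallen out of the sliding window.
        while times and when - times[0] > window_seconds:
            times.popleft()
        if len(times) >= threshold:
            flagged.add(key)
    return flagged
```

Anything returned is one address repeating the exact same request far faster than a person or a well-behaved cache would, which is a reasonable starting list for a closer look or a report to the network owner.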

kpaul

12:44 pm on Dec 4, 2005 (gmt 0)

10+ Year Member


i've had a theory that some are using it to 'take down' sites. kinda like a DDoS. i've been seeing it hardcore now for ... well, over 12 months. i wish they would stop.

i think i need to join that club, brett, and do some more reading to see if i'm missing anything in dealing with it.

best of luck, larry. my guess is you were doing good and one of the spammers decided to start causing problems. if you're spending time trying to stop them, you're not making new content, etc.

jdMorgan

9:49 pm on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



larryhatch,

I gave you the link above to test your headers, image or otherwise.

If y'all are having problems with requests from AOL, report them to AOL. AOL has a tendency to shoot first and ask questions later (if ever). They're very uptight about their network security. If someone is abusing your site through AOL's proxy then that someone is also abusing AOL's proxy. While AOL might not act to protect you, they will certainly act to protect themselves...

Jim

DamonHD

9:54 pm on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi LH,

There is no good reason that I can think of to bombard you with repeated identical HEAD requests in a short space of time.

Maybe someone was using/testing/developing a broken bot? Any piece of crud that can lowercase/truncate URLs and expect them to work is easily capable of multiple redundant requests IMHO...

Rgds

Damon

larryhatch

2:14 am on Dec 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Damon: A broken or misused robot sounds very likely to me.

Anyone still using AOL is unlikely to be the most sophisticated of visitors.
You should see the spelling, syntax etc. in the 'fan mail' I get from AOL users! I cringe when those come in.
They ask questions nobody can answer, or are impossible to decipher in the first place. -LH

StupidScript

7:41 pm on Dec 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Think "AdSense" ...

How is it possible to rake in several thousand dollars per month with AdSense, as so many claim to do?

Put up hundreds or thousands of sites.

Where does the content come from, since the previous AdSense scammers' favorite, 'directory' type link farms, are so easy to detect?

Grab a few hundred 'real' pages for AdSense to key off of.

Once you're set up, use the same rogue proxy network and some free and easily-found scripts to make the rounds, clicking on your own AdSense inserts.

Baddabing ... free money.

Google found you out and kicked you? Boo hoo. Another account, another round of scrapings, another round of clicks. Etc. etc. etc.

It's not 'you' they want ... they need content to cloak themselves in.

See the "Blocking Bad Bots" thread in the PHP forum for a neat set of concepts that may help out.