Possible Hijack or Scraper Attack

Fake but plausible URLs generating 404s in my server logs

         

scooby2

12:33 am on Mar 20, 2008 (gmt 0)

10+ Year Member



I need some help rather urgently, as I think my site is being subjected to either a proxy or DNS hijack, but I'm not sure which.

I realise I can't give specifics, so this is a made-up illustration.

Suppose my site is called foo-beaches (it's not) and I might have URLs like:
foo-beaches/Australia/Bondi
foo-beaches/Spain/CostaBrava
etc.

What I've been seeing in my server logs for 2 days is a bunch of 404s for plausible-looking (but non-existent) URLs, where the referrer is also a page that does not exist and never did exist on my site. Like this:

File does not exist; foo-beaches/Iceland, referrer foo-beaches/Iceland/Reykjavik

I'm getting hundreds of these, with numerous places and countries injected into URLs that look like my real ones but are nonsense (like beaches in Iceland, Antarctica, Vatican City - all made up, but you get the idea).

Where do I start? Anyone recognise this form of attack and know how to prevent it? Obviously my key concern is that Google is also seeing these fake URLs somehow and starting to index whatever garbage is being placed there.

Many thanks in advance - really appreciate any help I can get on this.

Cheers.

phranque

3:46 am on Mar 20, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], scooby2!

have you noticed any pattern of user agents or ip addresses in your access logs?
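One way to look for such a pattern is to tally the 404s by IP address and by user agent. The following is only a minimal sketch: it assumes the access log is in Apache's "combined" format, and the names `COMBINED` and `tally_404s` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the access log uses Apache's "combined" LogFormat.
# Adjust the pattern if your server logs in a different layout.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def tally_404s(lines):
    """Count 404 responses per client IP and per user agent string."""
    by_ip, by_agent = Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        # Skip lines that do not parse or are not 404s.
        if m is None or m.group("status") != "404":
            continue
        by_ip[m.group("ip")] += 1
        by_agent[m.group("agent")] += 1
    return by_ip, by_agent
```

To use it, open the access log, pass the file object to `tally_404s`, and print `by_ip.most_common(10)` and `by_agent.most_common(10)` to see whether a few addresses or agents dominate.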

Lorel

8:25 pm on Mar 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Try the Apache forum for code you can put in your .htaccess file to redirect any bogus URLs to your home page.

scooby2

10:13 pm on Mar 20, 2008 (gmt 0)

10+ Year Member



Thanks for the replies guys.

The 404s are coming from a range of IPs, but there is one that stands out, causing around 25% of them. It appears to be somewhere in the Philippines (not a likely market for my site). I've added a firewall rule to drop packets from that particular IP but am still seeing the 404s from other IPs.

The User Agent for *all* the requests is set to "Opera/9" (from 9.1 to 9.17).
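For anyone wanting to run the same check on their own logs, here is a rough sketch of how those two observations could be computed. It assumes the 404 hits have already been extracted as (IP, user agent) pairs; the function name `summarize_404s` is hypothetical.

```python
from collections import Counter

def summarize_404s(hits):
    """hits: iterable of (ip, user_agent) pairs, one per 404 response.

    Returns (busiest_ip, share_of_hits_from_that_ip, all_agents_are_opera9).
    """
    hits = list(hits)
    if not hits:
        return None, 0.0, False
    ip_counts = Counter(ip for ip, _ in hits)
    top_ip, top_count = ip_counts.most_common(1)[0]
    share = top_count / len(hits)
    # Simple prefix test; it does not try to parse the minor version.
    all_opera9 = all(agent.startswith("Opera/9") for _, agent in hits)
    return top_ip, share, all_opera9
```

A share near 0.25 for one address, combined with a uniform agent prefix across the rest, would match what I'm seeing.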

Still puzzling over why someone would bother - it's like a dictionary attack to guess URLs of my site but using place names; except the place names are unrelated to the subject of my site...

Have Opera users been infected with a bizarre geographical guessing redirect trojan... (;^) ? Or does this indicate some kind of distributed scraping (from multiple IPs)?

If I redirect the bogus URLs to my home page, is there a chance Google will associate whatever it might have found at those bogus URLs (assuming they ever actually got resolved) with my site?

Might there be mileage in redirecting them to a less valuable page, or even a page specifically created for that purpose that I could apply a meta noindex to?