Forum Moderators: phranque


Can a hacker make your site unspiderable?

server breach robots.txt file alterations


Whitey

6:38 am on Mar 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We run a CGI/Perl application on a MySQL/Linux platform, with a central database serving dynamic pages to around 15 websites at the time of the following event.

Around 25-31 Jul 05 our server was attacked by a hacker who breached our security [we had weak passwords at the time]. The hacker demonstrated some familiarity with our industry and with the software we use, judging by how knowledgeably they navigated the system.

We identified the hacker's activity: they used dynamic IP addresses to try to conceal their identity and, over a prolonged period of 5-6 days, repeatedly re-entered the server and made alterations, placing robots.txt files into many of the web sites. The hacker stopped when we upgraded security.
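For anyone reading along who hasn't seen this kind of attack: the damage needs only a two-line file. A robots.txt like the following (an illustrative reconstruction - I'm not posting the hacker's actual file) tells every compliant crawler to stay out of the entire site:

    User-agent: *
    Disallow: /

Googlebot and Slurp both honour this, so once it appears the engines stop fetching pages and, in time, start dropping them from their indexes.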

We first suspected a problem around 3 Aug 05, when our sites started to be eliminated from the Google and Yahoo caches. Yahoo kindly identified the presence of the robots.txt files to us [upon our seeking help]; Google provided a standard reply. A cache date of 25 Jul 05 is the consistent date of last caching over most of the site, with supplemental indexing of those pages [most, but not all].

The matter was referred to the local police and is currently under a prolonged investigation by Interpol.

Until that date Google was caching us every few days. From that date onward all of our websites failed to cache, even after we removed the robots.txt files.
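In hindsight, a simple integrity check would have caught the tampering within hours rather than days. Below is a minimal sketch in Python - the site paths and baseline directory are hypothetical, so adapt them to your own layout - that hashes each site's robots.txt and flags any change or unexpected appearance:

    import hashlib
    import os

    # Hypothetical docroots for the sites; adjust to your own layout.
    SITES = ["/var/www/site1", "/var/www/site2"]
    BASELINE_DIR = "/root/robots-baselines"  # known-good hashes live here

    def sha256_of(path):
        """Return the SHA-256 hex digest of a file, or None if it is missing."""
        if not os.path.exists(path):
            return None
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    os.makedirs(BASELINE_DIR, exist_ok=True)

    for docroot in SITES:
        robots = os.path.join(docroot, "robots.txt")
        baseline_file = os.path.join(
            BASELINE_DIR, docroot.strip("/").replace("/", "_") + ".sha256")
        current = sha256_of(robots)
        if os.path.exists(baseline_file):
            baseline = open(baseline_file).read().strip()
            if current != baseline:
                # A cron job could mail this line to the admin instead.
                print("ALERT: robots.txt changed or appeared/vanished in", docroot)
        elif current is not None:
            # First run: record the known-good hash.
            with open(baseline_file, "w") as f:
                f.write(current)

The first run records the baselines; run it from cron every few minutes afterwards and any difference gets flagged.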

My suspicion is that something triggered Google to stop spidering us, or upset the bots. Most of the pages are supplemental now, although we did potentially have some duplicate content on the sites - which is now fixed.

Yahoo mostly returned to normal, although not all pages are performing correctly yet.

Can anyone think of anything that may have been damaged or altered that could have resulted in the collapse of our caches, and therefore our results? Or are we the victims of supplemental indexing, either coincidentally or because something has triggered the current "suppression"?

Whitey

11:12 pm on Apr 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry for the delay in answering your question, tedster - but re: crawling, it appears to be going OK.

We're up to 178k cached pages as of today. Pages are indexed. Old pages have retained their PR but are not yet showing at their former positions in the SERPs.

I'll keep this updated in case it's helpful for anyone to benchmark their recovery against.

Whitey

11:16 pm on Apr 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does Yahoo also have a robots.txt suspension period like Google?

tedster

12:15 am on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't know for sure. Are you seeing Slurp in your server logs - asking for pages and not just robots.txt?
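One quick way to check, if it helps: here's a minimal sketch, assuming an Apache combined-format access log at a hypothetical path, that counts Slurp requests and separates robots.txt checks from real page fetches:

    # Count Yahoo Slurp requests in an Apache combined-format log,
    # separating robots.txt fetches from actual page fetches.
    robots_hits = 0
    page_hits = 0

    with open("/var/log/apache/access.log") as log:  # hypothetical path
        for line in log:
            if "Slurp" not in line:  # Slurp's user-agent string contains "Slurp"
                continue
            try:
                # The request is the first quoted field: "GET /path HTTP/1.0"
                path = line.split('"')[1].split()[1]
            except IndexError:
                continue
            if path.endswith("robots.txt"):
                robots_hits += 1
            else:
                page_hits += 1

    print("Slurp robots.txt fetches:", robots_hits)
    print("Slurp page fetches:", page_hits)

If the page count stays at zero while the robots.txt count climbs, Slurp is checking whether it's allowed in but never actually crawling - which points back at robots.txt (or something blocking it) rather than a ranking problem.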

marcs

12:28 am on Apr 2, 2006 (gmt 0)

10+ Year Member



As you're being spidered again, you should be OK.

Just wanted to point out that modifying the robots.txt file is but one issue. Depending on the severity of the server breach, a new firewall rule (or a few) could have been set up to block spiders at that level.
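To illustrate (a hypothetical sketch - I don't know what your firewall actually runs, so this assumes a Linux box using iptables, and the address range shown is illustrative rather than a verified crawler range): first have your administrator list the current rules so anything unfamiliar stands out,

    iptables -L -n --line-numbers

and be aware that a malicious rule silently dropping a crawler's traffic could be as small as

    iptables -A INPUT -s 66.249.64.0/19 -j DROP

It's also worth checking hosts.deny and any web-server-level deny directives for entries added around the breach dates.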

Whitey

8:51 am on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



tedster - yes we have Yahoo Slurp moving through. Not as many pages as Google though.

Whitey

8:53 am on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Depending on the severity of the server breach, a new firewall rule (or a few) could have been set up to block spiders at that level.

Can you elaborate before I send this over to our server administrator?
