Google indexes site under development - opens the door for hackers - Google Search and SEO forum at WebmasterWorld

I have a site in development, and I left the door open. I mean, I have been building the new site at www.example.com, on a live server, and didn't bother putting any password protection on it.

It does, in some places, have Lorem ipsum and debugging information exposed. Most of it is banal, but some could give malicious visitors ammunition for hacking the site: database table names, PHP error messages, SQL error messages, class names, column names, cookie values, session values, and things like that.

Well, guess who came to visit! Yes, Googlebot. I didn't expect it to find the site, but it did; it crawled the whole work-in-progress, and it's now visible in Google's index.

How did Googlebot wander in? Hmmmm, well here are some facts:

1) I can say WITH 100% CERTAINTY that no one has ever publicly linked to this site. I am the only person in the world who knows that the site exists. Well OK, just me and my registrar.

2) Looking in their index at "site:example.com", I see some interesting URLs. One of them surprised me:
www.example.com/search/?q=scubamonkey
"scubamonkey" is a word I use sometimes when I'm testing things. It's just a more colourful variant of "foo" or "bar".

Did I leave a link to that somewhere in the site as I was working on it? I don't think I did... so how could that URL have been indexed? Googlebot would have had to type "scubamonkey" into the search box and submit the form. How likely is that?

3) Are they fuzzing, too? I found index entries for pages that don't exist, with invalid URLs that return a 200 OK status (gimme a break - after all, the site is in dev). They follow the pattern of a real URL on my site, but the data in the querystring is totally whack.
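One way to stop nonsense URLs from answering 200 OK, even on a dev box, is to validate the querystring and return 404 when it doesn't match what the page expects. Here is a minimal sketch in Python rather than the site's PHP. The parameter names `id` and `page`, the whitelist, and the function name are all hypothetical, picked only for illustration:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical whitelist: parameter name -> validator for its values.
ALLOWED_PARAMS = {
    "id": str.isdigit,
    "page": str.isdigit,
}

def status_for_url(url: str) -> int:
    """Return 200 only if every querystring parameter is known and valid, else 404."""
    query = urlparse(url).query
    # keep_blank_values=True so that "?id=" is seen and rejected, not silently dropped.
    params = parse_qs(query, keep_blank_values=True)
    for name, values in params.items():
        validator = ALLOWED_PARAMS.get(name)
        if validator is None:
            return 404  # unknown parameter
        if not all(validator(value) for value in values):
            return 404  # known parameter, junk value
    return 200
```

With a check like this in front of the page, a fuzzed URL such as `/item/?id=12' OR 1` gets a 404 instead of a page full of SQL errors, which gives a crawler much less reason to index it.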

I suspect that Google was using the Google Toolbar in my browser to mine for new URLs. They were shoulder surfing while I worked on the site, then came in later and crawled it.

Now that it's in the Google index, anyone can find it. Some of the cached pages show PHP errors that I really wish were not exposed in public.

I enabled password protection on the site a few minutes ago. Too little, too late... I should have known better.
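For anyone wondering what that protection amounts to, here is a rough sketch of an HTTP Basic auth gate that also sends an `X-Robots-Tag: noindex` header, written in Python purely for illustration. The credentials, the realm name, and the function name are made up; a real setup would more likely live in the web server config than in application code:

```python
import base64
import hmac

# Hypothetical credentials for the dev site; load real ones from config, not source.
DEV_USER = "dev"
DEV_PASSWORD = "change-me"

def check_basic_auth(authorization_header):
    """Return (status, headers) for a request, given its Authorization header value or None."""
    # Ask crawlers not to index anything on this host, whatever the auth outcome.
    headers = {"X-Robots-Tag": "noindex, nofollow"}
    decoded = ""
    if authorization_header and authorization_header.startswith("Basic "):
        try:
            raw = base64.b64decode(authorization_header[len("Basic "):], validate=True)
            decoded = raw.decode("utf-8")
        except ValueError:
            # Covers malformed base64 and undecodable bytes (both are ValueError subclasses).
            decoded = ""
    user, _, password = decoded.partition(":")
    # Constant-time comparisons, so response timing does not leak how much matched.
    user_ok = hmac.compare_digest(user.encode("utf-8"), DEV_USER.encode("utf-8"))
    password_ok = hmac.compare_digest(password.encode("utf-8"), DEV_PASSWORD.encode("utf-8"))
    if user_ok and password_ok:
        return 200, headers
    headers["WWW-Authenticate"] = 'Basic realm="dev"'
    return 401, headers
```

A crawler that arrives without credentials gets a 401 and never sees the debug output. The `noindex` header is a second line of defence in case the gate is ever switched off again.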

httpwebwitch

Robert Charlton

tedster
