I'm on Google (and I don't want to be)

         

andmunn

3:38 am on Sep 10, 2003 (gmt 0)

10+ Year Member



I've been working on my site, and it isn't live yet. But just today I noticed about 10 visitors had come from Google. I've never submitted my site to any search engines, because it isn't ready "to be seen yet"...

My site isn't finished, and I don't want links pointing to it yet, because I don't want to make a "bad impression" on people.

How can I make search engines temporarily stop indexing my site? I thought I had to submit my site to them, but it appears they found me through other means. Any advice? (P.S. How did I get listed on Google in the first place?)

Thanks,
Andrew.

jdMorgan

3:55 am on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



andmunn,

> How did I get listed on Google in the first place?

Two or three possibilities, maybe more...
1) Incoming links from other sites
2) Domain registration
3) Using the Google Toolbar

Google claims that using the Toolbar should not cause your site to be indexed, but there are some here who argue that it does. I'm staying out of that argument until I have proof either way.

How to stop it:
1) Use robots.txt to disallow search engine spiders from crawling the site.
2) Use the meta-robots tag on pages you don't want listed.
3) Use .htaccess or ISAPI filters (depends on your server) to block or redirect spiders.
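For item 1, here is a minimal sketch of a pre-launch robots.txt that asks every crawler to stay out of the whole site. It is checked with Python's standard urllib.robotparser module, which reads the file the way a compliant spider would. The helper name `is_allowed` and the www.example.com URLs are placeholders. The file itself would sit at the site root (/robots.txt), and only well-behaved bots honor it.

```python
from urllib.robotparser import RobotFileParser

# Pre-launch robots.txt: "Disallow: /" asks every compliant crawler
# (User-agent: *) to stay out of the entire site.
PRELAUNCH_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder domain.
    for url in ("http://www.example.com/", "http://www.example.com/shop/index.php"):
        print(url, is_allowed(PRELAUNCH_ROBOTS_TXT, "Googlebot", url))
```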

You might want to try using the WebmasterWorld site search to find threads related to all of the above subjects - there are many, many of them.

Jim

Mark_A

3:59 am on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You were probably found through a link somewhere: perhaps a newsgroup or forum posting, or perhaps someone has linked to you.

Why upload an incomplete site?

Build it locally on your computer, upload it when it is finished.

If you want to test pages or sites online, password-protect them using something like .htaccess on Apache.
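Here is a minimal sketch of that kind of protection on Apache, assuming HTTP Basic authentication and a host that allows AuthConfig overrides in .htaccess. The directives (AuthType, AuthName, AuthUserFile, Require) are standard Apache ones. The Python helpers that write them, the user name, and the password-file path are hypothetical. The password line uses Apache's old {SHA} format only because it is easy to produce with the standard library. On a real server, the usual way to create the file is Apache's `htpasswd` tool, which has stronger options.

```python
import base64
import hashlib


def htpasswd_sha_line(user: str, password: str) -> str:
    """Build one .htpasswd line in Apache's legacy {SHA} format.

    Illustration only: {SHA} is an unsalted SHA-1 digest and is weak.
    """
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"


def htaccess_basic_auth(passwd_file: str, realm: str = "Site under construction") -> str:
    """Return .htaccess directives that require a login for the whole directory."""
    return (
        "AuthType Basic\n"
        f'AuthName "{realm}"\n'
        f"AuthUserFile {passwd_file}\n"
        "Require valid-user\n"
    )


if __name__ == "__main__":
    # Hypothetical user, password and path; keep the password file
    # outside the web root so it cannot be downloaded.
    print(htpasswd_sha_line("tester", "change-me"))
    print(htaccess_basic_auth("/path/outside/webroot/.htpasswd"))
```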

Delete the pages now and/or protect them. That way only the Google cache will remain, and only for a few weeks.

hth

** If you don't want people to find and see it
** don't put it on the www.

andmunn

4:29 am on Sep 10, 2003 (gmt 0)

10+ Year Member



hey :)

The reason I uploaded it to the web is that it runs on PHP and a MySQL database, and unfortunately I cannot test these things on my local computer.

And as a side note, I guess I could password-protect them =) But I didn't figure I would be listed on Google unless I had submitted the site to Google...

And yes, I just downloaded the toolbar yesterday... lol, coincidence? I dunno...

Mark_A

4:37 am on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmm, I do wonder about that toolbar :-)

onlineleben

9:28 am on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I cannot test these things on my local computer <
It's easy to install them on a Windows computer. You can also find complete installation suites that bundle Apache, PHP and MySQL.

As said by Mark_A, test locally and upload when satisfied.

Mark_A

9:36 am on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Incidentally, there are some nameless individuals and firms that monitor new domains so they can start their spamming games extra early!

If any of them are reading...
[unmentionable insults follow]

It is also possible that someone doing this somehow got your domain (if new) onto the public net, though perhaps not likely.

the_nerd

10:41 am on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you look through this forum, you will find hundreds of questions about how to get a site into Google.

Why don't you just put up a dummy index page, with no links on it to follow, telling people to come back later? If you try too hard to keep Google out, maybe you'll miss it later?

Quadrille

11:54 am on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you try too hard to keep Google out, maybe you'll miss it later?
Spot on. Not worth the risk.

Use robots.txt for now, then make sure there are some incoming links from launch day and rewrite the .txt file.
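Rewriting the file on launch day can be as small as emptying the Disallow value, because an empty `Disallow:` line means nothing is disallowed. Below is a sketch of the two versions, checked with Python's urllib.robotparser. The constant names and the example URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical names for the two versions of /robots.txt.
PRELAUNCH = "User-agent: *\nDisallow: /\n"  # keep compliant bots out of everything
LAUNCH = "User-agent: *\nDisallow:\n"       # empty Disallow: nothing is blocked


def crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    home = "http://www.example.com/"
    print("before launch:", crawlable(PRELAUNCH, home))  # False
    print("after launch: ", crawlable(LAUNCH, home))     # True
```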

BlueSky

12:13 pm on Sep 10, 2003 (gmt 0)

10+ Year Member



I'm in the same position of building a new site. I password protected it via .htaccess to keep out everything until I'm ready to launch it. The problem with using robots.txt is only the good bots follow it. I don't want to review logs every day on an unopened site to make sure the bad ones as well as humans stay away too.

It definitely does not take a link for the SE bots to find a site. When I registered this domain, I had two of them knocking within a day of registering. Site had and still has zero links.

plasma

12:46 pm on Sep 10, 2003 (gmt 0)

10+ Year Member



Go here:

[services.google.com:8882...]

It took less than 1 day for me to get out of the index.

Small Website Guy

6:39 pm on Sep 10, 2003 (gmt 0)

10+ Year Member



Having created active database-driven sites myself, I can tell you that you can't really be sure a site will work the same way until you upload it and test it live. There are subtle differences between your own computer and the hosting computer.

A second reason for uploading a partially completed site is so that you can get feedback from other people (friends, co-workers, or whomever).

GoogleGuy

7:01 pm on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yah:
- robots.txt
- meta tags
- password protection via .htaccess
- search for "URL removal" on Google to find our automated system.
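The meta tags option usually means a robots meta tag in each page's `<head>`, with `noindex` (don't list this page) and `nofollow` (don't follow its links). Here is a small sketch that uses Python's standard html.parser module to confirm a page actually carries the tag. The class and function names are made up for illustration. Like robots.txt, the tag only binds well-behaved bots.

```python
from html.parser import HTMLParser

# The tag a page would carry in its <head> to ask robots not to index it
# and not to follow its links.
NOINDEX_TAG = '<meta name="robots" content="noindex, nofollow">'


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases attribute names; values keep their case.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def asks_not_to_be_indexed(html: str) -> bool:
    """Return True if the page has a robots meta tag containing noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = "<html><head><title>Coming soon</title>" + NOINDEX_TAG + "</head><body></body></html>"
    print(asks_not_to_be_indexed(page))
```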

If you want people to see the site eventually, though, I personally would recommend against excluding a bot when you're relatively close to launch. Once we exclude a site, it can take some time for us to re-crawl it. If you can manage, I'd just leave what you've got up and add to it. Just my two cents.

plasma

7:19 pm on Sep 10, 2003 (gmt 0)

10+ Year Member



If you can manage, I'd just leave what you've got up and add to it.

But don't forget that there are more bots than Google's hanging around :)
The Wayback Machine, for example, keeps track of how your site has changed.

BlueSky

7:29 pm on Sep 10, 2003 (gmt 0)

10+ Year Member



Hey GoogleGuy,

Can you explain how Googlebot finds new sites that have no links yet? Does he get this info from the whois database or is there some algorithm used which generates domain names for him to go explore?

martinibuster

7:45 pm on Sep 10, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



3) Using the Google Toolbar

That's not a hard fact; it's speculation. Officially, the word from GG is that this isn't so. I'm not saying that it doesn't happen; I'm only pointing out that it isn't an established fact.

Just a friendly heads-up. :)

Sharper

1:34 am on Sep 11, 2003 (gmt 0)

10+ Year Member



Suggestion:

If you're going to set up a new site on your server and don't want it spidered yet, just don't point the DNS for the domain name to the site yet. In this specific case, now that the site is out of the bag, you could just switch the DNS to a "site" with just a single "coming soon" type page so that Google still has some minimal spider food to check back on.

Then, while you are working on it, just add the hostname to your computer's hosts file so that you can use the name, while everyone else would have to go specifically to that IP address to see anything.
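The hosts-file entry itself is a single line: the server's IP address followed by the names that should resolve to it. On Unix-like systems the file is /etc/hosts. On Windows NT-family systems it is usually %SystemRoot%\System32\drivers\etc\hosts. Below is a Python sketch that formats and sanity-checks such a line. The function name is hypothetical, 203.0.113.10 is a documentation-only address, and example.com is a placeholder. The sketch assumes the host is already set up to serve the site for that name (name-based virtual hosting) before public DNS points there.

```python
import ipaddress


def hosts_entry(ip: str, *hostnames: str) -> str:
    """Format one hosts-file line mapping hostnames to the server's IP."""
    address = ipaddress.ip_address(ip)  # raises ValueError on a mistyped IP
    if not hostnames:
        raise ValueError("need at least one hostname")
    return f"{address}\t{' '.join(hostnames)}"


if __name__ == "__main__":
    # Placeholder IP and domain; substitute your host's IP and your domain.
    print(hosts_entry("203.0.113.10", "example.com", "www.example.com"))
```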

(Note: this isn't as _secure_ as a password scheme or similar, but it's simpler and just fine as long as you don't introduce some sort of security hole in your code while modifying/testing it.)