I checked my results and noticed that Google has spidered some proxy servers.
Nearly all of my 200+ pages can be found under
www.MyDomain.de (as always)
and
belediye.nameltd.com/cgi-bin/nph-proxy.cgi/111110A/http/www.MyDomain.de
www.wakedogg.com/cgi-bin/nph-proxy.cgi/000110A/http/www.MyDomain.de
[proxy.citizenlab.org...]
www.vnphys.org/cgi-bin/nph-surfweb.cgi/11110/http/www.MyDomain.de
Now I am really worried that Google will check for duplicate pages and find both my own site and the copies on the proxy servers.
GoogleGuy – Protector of the ignorant and fearful.
Please help
Welcome to WebmasterWorld [webmasterworld.com]!
While I've never had to deal with this kind of problem before, it seems to me that you could take action to stop it.
Set up your server to refuse connections from these proxies, or refuse referrals from them, whichever is appropriate. Without knowing how Google "sees" your site through the proxies, and what kind of log entry a request through them leaves, it's impossible to say which approach fits. But you should be able to do something on your server to prevent problems with search engines and to avoid other possible problems.
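If your site runs on Apache, something along these lines in an .htaccess file might be a starting point. This is only a sketch using the proxy hostnames from your list; whether a hostname rule actually catches them depends on the reverse DNS of the proxies' IP addresses, so you may need to list the IPs instead.

# Deny requests from the known proxy hosts (mod_access).
# Hostname rules force a reverse DNS lookup on each request;
# if the proxies' reverse DNS does not match these names, use their IP addresses here instead.
Order Allow,Deny
Allow from all
Deny from belediye.nameltd.com
Deny from www.wakedogg.com
Deny from www.vnphys.org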
Jim
But I don't think this is the solution.
Reason 1:
Even if I find a way to refuse connections from these proxies, there are thousands of such proxies on the net.
Reason 2:
It's not quick enough. The proxy copies will not disappear before DeepBot spiders those pages again.
Reason 3:
My pages are already in Google's database, together with the duplicate pages from the proxies.
The moment Google starts to look for duplicate pages, I am lost!
Last month I changed the URLs of 12 pages.
I marked the old pages with NOINDEX,FOLLOW and set a link from each old page to its new page.
I thought Google would find the old page before the new page and everything would be OK.
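To illustrate the setup, each old page looked roughly like this (the file name is only a placeholder, not one of my real URLs):

<!-- Head of the old page: keep it out of the index, but let robots follow the link to the replacement. -->
<meta name="robots" content="NOINDEX,FOLLOW">
<!-- Body of the old page: a plain link pointing to the new URL. -->
<a href="http://www.MyDomain.de/new-page.html">This page has moved to a new address.</a>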
During the last 4 weeks the FreshBot visited each of these pages at least 10 times. What happened?
- 4 pages got killed because of duplicate content (when Google compares pages, it apparently ignores NOINDEX or sometimes works from old copies in its database).
- 4 pages are still in the index under the old URL.
- 4 pages have changed correctly.
My conclusion: Google only compares the pages in its database. If it finds two similar pages, it reacts. I don't think it checks the current status of the pages before deleting the duplicates.
Help! What can I do? I am really anxious.