Googlebot indexing the wrong site

.nl site with no incoming links, looking for pages from the .com

HitProf

4:38 pm on Jun 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot is now spidering my .nl site, which does nothing but load the .com site in a frameset. The .nl domain is used only to serve Dutch visitors a Dutch starting page. All pages sit on the .com domain but Googlebot is trying to get them from the .nl domain. How is this possible? As far as I know the .nl domain has never been promoted and there is not a single link pointing to it. The .com site seems to be mostly ignored by Googlebot. It is relatively new but has a few incoming links.

Any advice? Thanks in advance.

takagi

1:04 pm on Jun 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This looks like the problem described in the thread "Wrong URL at my site listings" [webmasterworld.com], where two sites are treated as one.

1. Same server (same IP)?
2. Is or was there a redirect (301 or Meta refresh) from one site to the other?

Dolemite

1:13 pm on Jun 14, 2003 (gmt 0)

10+ Year Member



Sounds like either a DNS or name-based hosting issue to me.
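To illustrate the name-based hosting angle: on a shared IP the server chooses which site to serve from the HTTP Host header. The minimal sketch below assumes a hypothetical server with made-up host names and document roots. It shows how a host name without its own entry falls back to a default root, so a crawler asking the .nl name for .com paths could get real pages back.

```python
# Hypothetical sketch of name-based virtual hosting: one IP, several host names,
# and the server picks a document root from the HTTP Host header.
# The host names and paths below are invented for illustration only.

VHOSTS = {
    "www.example.com": "/var/www/com",
    "www.example.nl": "/var/www/nl",
}
DEFAULT_ROOT = "/var/www/com"  # fallback used for any Host without its own entry


def document_root(host_header: str) -> str:
    """Return the document root this hypothetical server would use for a Host header."""
    host = host_header.split(":", 1)[0].lower()  # drop any port, normalise case
    return VHOSTS.get(host, DEFAULT_ROOT)


if __name__ == "__main__":
    print(document_root("www.example.nl"))  # has its own entry
    print(document_root("example.nl"))      # no entry, so it falls back to the .com root
```

If both names end up on the same document root (by design or by a missing entry), every .com path is also reachable under the .nl name, which is exactly what a confused crawler would need to index "the wrong site".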

shaadi

1:17 pm on Jun 14, 2003 (gmt 0)

10+ Year Member



Wrong URL at my site listings [webmasterworld.com]

I still feel like it's hand-coded :( no matter how much I convince myself that it's not and that it's a bug!

Back to the topic: HitProf, I don't think it's much of a problem since both sites are yours. I guess you just need to put robots.txt to proper use.

HitProf

5:47 pm on Jun 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the suggestions.

Takagi: they run on the same server and there are no redirects. Until recently the .nl redirected to the .com, but never the other way around.

Shaadi: I could ban Googlebot from the .nl site (both sites are mine), but I want to split the site soon, so I'd rather not.

I think it's a bug in the way Googlebot handles frames. The .com is a small non-frames site. Most internal links are relative (yes, I know...) to make splitting easier later on. The .nl loads the Dutch part of the .com site into the main frame. Googlebot should recognise that a page from another site was loaded and spider that site accordingly. Instead, it is now requesting pages from the .nl domain as if it were the .com site.
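The relative-link point can be made concrete with standard URL resolution: a relative link is resolved against the URL of the document that contains it, not the frameset that happened to load that document. The sketch below uses Python's `urllib.parse.urljoin` with hypothetical URLs standing in for the real .nl frameset and the framed .com page.

```python
from urllib.parse import urljoin

# Hypothetical URLs: the .nl frameset and the .com page it loads into its main frame.
FRAMESET_URL = "http://www.example.nl/"
FRAMED_PAGE_URL = "http://www.example.com/nl/index.html"

# A relative link found inside the framed .com page.
relative_link = "products.html"

# Correct: resolve against the URL of the document that contains the link.
correct = urljoin(FRAMED_PAGE_URL, relative_link)

# The behaviour described above: resolving against the frameset's URL instead.
wrong = urljoin(FRAMESET_URL, relative_link)

print(correct)  # http://www.example.com/nl/products.html
print(wrong)    # http://www.example.nl/products.html
```

A crawler that makes the second mistake ends up requesting .com paths from the .nl host, which matches what shows up in the logs here.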

HitProf

11:46 pm on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It looks like these false results are now showing on -ex.

HitProf

10:06 am on Jul 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I thought I had solved the problem by putting up a robots.txt with a disallow for the entire site but now Googlebot is walking right through the barrier :(
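For reference, a robots.txt that disallows an entire site is just two lines. The sketch below checks such a file with Python's standard `urllib.robotparser`. The host name is a hypothetical stand-in for the .nl domain. A compliant crawler should refuse every URL, so Googlebot fetching pages anyway points to something on the crawler's side (for example, a cached robots.txt) rather than a syntax problem in the file.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that disallows the entire site for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Any compliant crawler, Googlebot included, should skip every URL on the host.
# "www.example.nl" is a hypothetical host name.
for path in ("/", "/nl/index.html"):
    url = "http://www.example.nl" + path
    print(url, parser.can_fetch("Googlebot", url))  # prints False for each URL
```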

It's indexing pages from the wrong site again :(

I've filed a report with the Googlebot team, but I don't think it will help for this round.

HitProf

1:08 pm on Jul 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Grin!

It did help.

I saw a lot of activity in my logs from various Google URLs over a couple of days. One of them had something like 'autobadurl' in it. The next morning all the .nl pages were gone.

I laughed my butt off when I received this reply from Google:
"Upon reviewing the Google Search Results, your site was not found"

Anyway, problem solved.

Thanks guys (and girls) at the GooglePlex!

Now let's wait and see what happens when I lift the disallow in the robots.txt... :)