# http://domain1
RewriteRule /location1/car-rental /index.asp?p={1's page number}
# http://domain2
RewriteRule /location2/car-rental /index.asp?p={2's page number}
# http://domain3
RewriteRule /location3/car-rental /index.asp?p={3's page number}
None of the pages on these "sites" links to any other site except through a "Visit our Location2 Homepage" link.
Google, however, has found a way. It has indexed
http://domain1/location2/car-rental
http://domain2/location3/car-rental
and so on, for every similar page in every domain.
This is causing quite a few problems. Is Google ignoring the first "folder" (/location1/) and just reading the last piece (car-rental), crosslinking on its own? What's going on here and how can we prevent it?
Thanks!
- Bill in Kansas City, Mo, USA
Measure with a micrometer. Mark with a crayon. Cut with an ax.
i'm not sure i precisely understand your problem, but it looks like you need to externally redirect (301) the request to the correct domain based on the subdirectory before you internally rewrite the request to the asp script.
the RewriteCond directive [isapirewrite.com] will help with this.
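something along these lines ought to work in the httpd.ini (an untested sketch using the placeholder domains from your example — check the isapirewrite.com docs for the exact pattern and flag syntax; RP is a permanent redirect, I is case-insensitive, L is last rule):

# if any host other than domain2 asks for a /location2/ url,
# send a permanent (301) redirect to the same path on domain2
RewriteCond Host: (?!domain2)
RewriteRule (/location2/car-rental.*) http\://domain2$1 [I,RP]
# only then do the internal rewrite, for the correct host only
RewriteCond Host: domain2
RewriteRule /location2/car-rental /index.asp?p={2's page number} [I,L]

repeat the pair for location1/domain1 and location3/domain3.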
As a backup method you may also want to update the asp page to check the domain being used and, if it is the wrong one, redirect (301 Permanent) to the correct domain based on the path.
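For example, at the top of index.asp (an untested sketch; "domain2" stands in for the real host name, and it assumes your rewriter preserves the originally requested path in the HTTP_X_REWRITE_URL server variable, as newer versions of ISAPI_Rewrite do — check yours):

<%
Dim host, origPath
host = LCase(Request.ServerVariables("HTTP_HOST"))
' the rewriter has already turned the URL into /index.asp, so read
' the originally requested path from the variable it preserves
origPath = LCase(Request.ServerVariables("HTTP_X_REWRITE_URL"))

' a /location2/ path requested on the wrong domain gets a 301
If InStr(origPath, "/location2/") > 0 And host <> "domain2" Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", "http://domain2" & origPath
    Response.End
End If
%>

Repeat the If block for each location/domain pair.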
For posterity, I did find the solution to the original issue: the client had inadvertently introduced incorrect links very deep in the website where it wasn't immediately obvious. Google was simply doing what Google does.
But I have one website that runs under multiple domain names and tweaks the content based on that bit of information. I had to become a bit more defensive with this website due to scrapers and other assorted troublemakers.
So I use a 404 error trap to capture requests for robots.txt, which I feed into an ASP.NET module (rules engine). This code checks the domain name (among other things) and serves either (i) the standard version (free to grab everything, within certain directories) or (ii) the other version (bug off, don't read anything, or be banned).
But in your case you want multiple different versions, each tied to exactly one domain name: allow certain directories based on the domain and forbid the rest. This should keep duplicates like these from showing up in the index in the future.
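Something along these lines would do it. This is a simplified classic ASP stand-in for my ASP.NET module (untested, using the placeholder domains from your example), set as the custom 404 URL in IIS so that robots.txt requests land on it:

<%
' iis passes the original request in the query string as "404;http://host/path"
Dim orig, host
orig = LCase(Request.QueryString)
host = LCase(Request.ServerVariables("HTTP_HOST"))

If InStr(orig, "robots.txt") = 0 Then
    Response.Status = "404 Not Found"   ' not a robots.txt request
    Response.End
End If

Response.Status = "200 OK"              ' override the 404 for robots.txt
Response.ContentType = "text/plain"
Response.Write "User-agent: *" & vbCrLf
' each host disallows every location directory except its own
If host <> "domain1" Then Response.Write "Disallow: /location1/" & vbCrLf
If host <> "domain2" Then Response.Write "Disallow: /location2/" & vbCrLf
If host <> "domain3" Then Response.Write "Disallow: /location3/" & vbCrLf
%>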
You may also want to adjust the code on the dynamic pages to check the domain and issue a permanent redirect if the proper domain was not used, which will further reduce this problem in the future.