Forum Moderators: open
Pretty standard stuff, really.
We have NO duplicate content, and we are not playing any other search engine games.
But the Yahoo robot seems to take what it learns about one website's structure and then tries to access those pages under the other domain names hosted on the same IP address.
I do not understand.
The only links to our pages are by the appropriate domain name.
Here is an example:
domainA.com/contentA.html
domainB.com/contentB.html
domainA is never linked to contentB.
But Yahoo will try to look up all of the following permutations:
domainA.com/contentA.html (OK)
domainA.com/contentB.html (NOT OK) !
domainB.com/contentB.html (OK)
domainB.com/contentA.html (NOT OK) !
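The pattern above is just the cross-product of the hostnames on the IP and the paths the spider has seen. A tiny sketch (the hostnames and paths are placeholders from the example, not real sites) makes the count concrete:

```python
# Illustrate the crawl pattern described above: every combination of
# hostname and known path gets probed, not just the pairs that are
# actually linked.
from itertools import product

hosts = ["domainA.com", "domainB.com"]        # hostnames sharing one IP
paths = ["/contentA.html", "/contentB.html"]  # pages the spider has seen

probes = [f"http://{h}{p}" for h, p in product(hosts, paths)]
# Four requests in total, of which only two correspond to real links.
```

With N domains and M known paths that is N*M probes, which is why the "NOT OK" requests multiply quickly.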
I am blocking these requests, but this is making me very angry.
We are on IIS, not that it should matter...
Is anyone else experiencing this offending Yahoo spider behavior?
I see a message here:
[help.yahoo.com...]
That says:
"Sites with numerous, unnecessary virtual hostnames" are considered unwanted.
I wonder too if this is somehow related?
Thanks
Are you perhaps using the same naming convention for html files in both sites, so that what you perceived as a Slurp mix-up of IPs could just be those 404 probes?
In my case (see posts in the above-mentioned link) the ratio of "probes" by Yahoo-Slurp for non-existent URLs was 1:10 against the "legit" requests, which raised a red flag.
It cost me a couple of hours of debugging a problem that didn't exist.
Dimitris
There seem to be considerable bugs with the spider.
Some acknowledged and some still undisclosed.
The problem I highlighted is in fact real: the pages are now making it into the Yahoo index. I just checked today.
So we will probably have the "duplicate content" penalty on some of our sites.
So I will have to move all of the sites off this virtual-hostname approach.
Thanks Yahoo.
Until someone can show that Yahoo is not penalising sites simply on the basis of inbound links, I don't see the difference between that and any site on the same IP.
What do you mean by "penalising sites on the basis of inbound links"? "Many links" or "bad links" (ie bad neighbourhoods)? Any references to back this? I believe the Yahoo team members here said that they view sites with many inbound links favorably.
Also IPs get re-allocated by hosting companies. How am I supposed to know if the IP I'm going to get for the next site isn't blacklisted due to previous owners?
I just see the overall goal from Yahoo as being to reduce their idea of spam and if doing that means huge collateral damage then so be it. This may be why they are now giving a route back in.
You mean, after excluding 1000s of sites from SERPs via algo, giving a manual (via humans) override? Only the very wealthy, or determined professional webmasters (and with lots of time on their hands to frequent here) would take that route.
None of this makes much sense to me.
It has been suggested before that Yahoo are penalising on inbound links, and I haven't seen a denial of this. It could be anything from a single bad link from a bad neighbourhood, as defined by Yahoo, to a certain level of links tripping a filter. I have no idea. All I am saying is that some people have suggested inbound links can take you down, and I've not seen it refuted. It may have been, but I haven't seen it.
I am not talking about shared hosting.
I am talking about having multiple domain names on the same IP and in the *same* web hosting account.
Kinda like domain pointers. Or even domain aliases.
Each domain pulls up unique content through the "magic" of my favorite scripting language.
And all sub links point to unique directories.
One domain name does _not_ share content with another, but if you were to type in another domain's directory, the content would still show.
This is what Yahoo is doing.
They are not following links. They are "guessing" quite correctly based on the shared IP.
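One way to shut the guessed combinations down is to check the Host header against the directory it is allowed to reach. This is only a minimal sketch of that logic; the hostnames, directory layout, and `is_allowed` function are invented for illustration, not taken from any real config:

```python
# Hypothetical sketch: map each hostname to the directory tree that
# belongs to it, and refuse any request whose path falls outside that
# tree. Names and layout below are made up for the example.
ALLOWED_PREFIX = {
    "domaina.com": "/contentA",
    "domainb.com": "/contentB",
}

def is_allowed(host: str, path: str) -> bool:
    """True only when the path lives under the requesting host's own tree."""
    prefix = ALLOWED_PREFIX.get(host.lower().removeprefix("www."))
    return prefix is not None and path.startswith(prefix)
```

On IIS a check like this would live in the application script (or an ISAPI filter) rather than in Python; the code is only meant to make the host-to-path rule concrete, so the guessed cross-host requests get a 404 while the legitimate pairs pass.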
Now that I have thought about it, this is not the best way to save a few bucks, but Yahoo is the only search engine that has forced my hand.
No other search engine "guesses" links like this.
And I think it is wrong.
Oh well. It is a lot of work ripping out domain names and their content. But it is our fault, I guess...