
Lost in Google



6:39 pm on Feb 27, 2005 (gmt 0)

10+ Year Member

My site has been on the first page of SERPs for many years. Just a few weeks ago, my listing went from a title and description display to just this:

Similar pages

In addition, my Yahoo listing disappeared as well. I then did a Yahoo search for pages with my domain included and found most of my interior pages indexed but not my home page.

What happened? Is it possible my site was not ready to be crawled when the Googlebot and Slurp robots visited my site, simultaneously?

This is a very "white hat" site - no tricks at all, just good content...

I went and manually requested my site be spidered on both Google and Yahoo, and sent an email to Yahoo requesting any explanation as well.

Is there anything else I can do? Any ideas of why this happened?


7:01 pm on Mar 8, 2005 (gmt 0)

10+ Year Member


Let us see how much you will laugh by placing the address of your main website here, in this forum.

This time your pain will be associated with the grief of your site, not from laughter.

You fail to see the wood for the trees. No contribution and zero input into this problem, other than a tumultuous burst of vilification.

Please place your site URL here, in this forum. I will place a few links pointing to your site; it should give you an increase in PageRank, right?

Let us see how much you will stick to your belief regarding the comments you made.


7:36 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

As for sending referrers when spidering a page, that doesn't really make sense. If you have 5000 inbound links, do you really want Google to request the same page 5000 times with different referrers? Robots aren't browsers and they generally don't follow links. They just add the linked-to URL to their database (if it's not already there) and it'll get spidered along with all the rest at some future date. The problem is that Google is associating the content returned for the target URL with that URL and all URLs that link to it, rather than simply recording the fact that the redirect URLs are redirects and ignoring them when generating search results.
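The queue-and-crawl-later behavior just described can be sketched in a few lines (a rough illustration only, not Google's actual implementation; the function name and the toy link graph are invented):

```python
from collections import deque

def crawl(seed_urls, get_links, max_pages=100):
    """Minimal crawl-frontier sketch: every discovered URL is queued
    exactly once and fetched later, with no Referer header, rather than
    being 'followed' link-by-link the way a browser would."""
    seen = set(seed_urls)
    frontier = deque(seed_urls)
    fetched = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        fetched.append(url)             # one fetch per URL, ever
        for link in get_links(url):     # links found on the fetched page
            if link not in seen:        # 5000 inbound links -> still queued once
                seen.add(link)
                frontier.append(link)
    return fetched
```

So even if 5000 pages link to the same URL, the crawler requests it once, not 5000 times with different referrers.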

Perhaps the Request-URI could be checked on your site; if it does not match the page requested, serve a blank page. This could work. I'm going to go check whether Google actually sends the wrong Request-URI with hijacked pages.


7:43 pm on Mar 8, 2005 (gmt 0)

10+ Year Member


Are you suggesting that sending a referrer only when Googlebot has reached your site via a 302 redirect is a bad idea?

Please explain again why you think it is a bad way forward.

I don't think a site has the right to pump 302 directives at a site that does not want them. Who wrote the scripts? Was it a computer whiz kid? Is the script Googlebot-friendly? What else does the script do? Is a meta refresh generated during this process? I beg you to answer that one especially. Were you aware of that? And are you aware of the residue effect? I won't explain that one; please let us know your version.

So, really, you are saying that anybody can create a manipulative script, say a variant of NukeModule GO-PHP modified to hijack your site through an intricate deceptive process that many people do not understand (and I doubt you do), and that we should be OK living with it, sitting back and watching our sites disappear into oblivion in the results until Google sorts it out, right?

We are trying to help find a solution. Action, not speculative gestures, is what is needed.


8:04 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Google must also allow the target site to remove the offending redirect. It can be done very easily indeed, and this may help thousands of people who have lost their hard-earned cash and websites.

I don't know if this has been suggested before but perhaps the robots.txt directive could be extended to prevent crawling from a redirect. For example:

User-agent: *
Disallow: /do
Disallow-redirect: /

would disallow the /do directory for crawlers following regular links, and disallow the entire site if the crawler arrives via a redirect.

A robots.txt-based redirection filter would probably be easier for many webmasters to use, since not everyone is running Apache or PHP on the server.


8:17 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

No contribution and zero input into this problem other than a tumultuous burst of vilification.

No contribution?

I explained to you that many large legitimate sites, like search engines and Alexa, redirect to monitor actual click-throughs; other than Google, all the rest seem to redirect.

I'm sorry if that blows your mind.


8:19 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

I just added the domains of the URLs redirecting to my site to the .htaccess file. I will have to see if it fixes my problems.
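For reference, rules of the kind being described look roughly like this (a sketch only, assuming Apache with mod_rewrite enabled; badsite.example and otherbad.example stand in for the actual redirecting domains). Note that, as pointed out later in the thread, Googlebot sends no referrer, so this mainly affects human visitors arriving through the redirect:

```apache
RewriteEngine On
# Return 403 Forbidden when the Referer is one of the known redirecting domains
RewriteCond %{HTTP_REFERER} badsite\.example [NC,OR]
RewriteCond %{HTTP_REFERER} otherbad\.example [NC]
RewriteRule .* - [F]
```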


8:39 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Question about the Google removal tool, regarding a subdomain, e.g. anyone.ourdomain.com.

Because of an error on my part with a subdomain that I set up, Google indexed 366 files under the subdomain. A few weeks ago I used the removal tool and submitted a request to de-index items at anyone.ourdomain.com/. I got this reply today.

"Thank you for your reply. This message is to notify you that anyone.ourdomain.com/ has been removed from the Google index. You do not need to take any further action. Please let us know if you have additional questions or concerns."

Thing is, as of 1 hr ago all 366 entries are still showing in the index when doing site:anyone.ourdomain.com

How long does it usually take before the files are washed out of the index and no longer show up under site:anyone.ourdomain.com?
And will all entries be removed, or just the index or default page for anyone.ourdomain.com/?

The domain is now inactive in the DNS

If anyone knows please share.



8:51 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Go2, that's an interesting idea, but it seems to me the problem is that Google is being fooled into not knowing that it is being redirected, so I don't think it would work.

In any event the robots.txt should look more like this I believe:

User-agent: Googlebot
Disallow: /referrals


9:05 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

I think Bobby is right here. I tried adding the last of the URLs that were redirecting to me to robots.txt and activated the removal tool; a day later it said it was complete, but nothing has changed in the SERPs.

They really should ban all redirecting sites and use a clean www.domain.com as the index site, not www.domain.com/go-php/blabla... Always look for www.domain.com and go from there.

Well, as said, I don't think we can do anything more than wait and hope for the best, or we should start doing this ourselves if those are really the kinds of sites Google wants. I won't wait 4 months more, that's for sure; then we will see what a real webmaster can do with the bad tricks Google obviously loves.

sad sad sad


9:06 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member


How do you make a copy of it with Google cache?
What kind of problems are you dealing with?

Use your browser and do a "Save As"; it will save background colors, images, text, links, etc. You'll have to open it in your browser to see it, and tell anyone you send it to to open it in a browser as well.

The problem I was dealing with in that one instance was a client's competitor taking text right off my client's home page and putting it on his own page, or using that text in ads he was posting around the internet. Having documented, dated proof via an impartial third party, proving who had the text up first, convinced the sites/hosts that controlled those pages/websites to take them down.


Your anxiety about Alexa redirects is misguided, as many search engines (like Yahoo, Lycos, Excite and others) and directory sites redirect to your site in order to count the traffic as people click on individual websites.

If they use a tracking script OK, but not a 302 redirect.


Complaining about Alexa legitimately using 302 redirects is just silly.

Since when is posting a 302 redirect to my site silly?


9:21 pm on Mar 8, 2005 (gmt 0)

10+ Year Member


You think that these directories and many search engines plainly and cleanly count clicks through their systems.

I think you must be living on another planet, or in a utopian society where there is a total absence of crime, inhabited by sinless altruists of divine proportions, with Alexa and the likes revered as the proclaimed deities of proficiency.

It's a good job we are not talking about reciprocal link exchange, or you would advocate that no link pages exist in robots-denied pages, JavaScript-routed pages that bots cannot access, etc. There are even more tricks out there than you may think.

The internet is a dirty playing field with no referees and many casualties, who have nowhere to look other than to seek advice from us guys. We should help muster up something worth reading, not muster up a mud fight like I am doing; sorry about that.

OK, yes, many engines like Alexa redirect. So what do we have?

Let me tell you: a wild and uncontrolled mixture of deceptive redirects. And many of them cause Googlebot problems, because these scripts are deployed by anybody and everybody. Hell, I set up my own Perl-based CGI script that causes a very nasty 302, done deliberately to retaliate if needed. Working on it, I am sure I can get it to dynamically generate a zero-second meta refresh to the targeted page as well.

Is the guy implementing the scripts at Alexa conforming to a particular directive regarding the server-side redirects? And is his script compliant with Inktomi, Googlebot, MSNBot, etc.? I certainly did not conform: I did it to cause a definitive 302, and I expect it to damage the target site I point to. I have never used it.

Do you know the person at Alexa who oversees the scripting language that has been incorporated server-side to drive the script directives? Do Alexa describe anywhere that their 302 redirects are approved by Google?

Why is Alexa allowed to place a Location header pointing to your site on their server?

Google specifically indicates without a shadow of a doubt that they are against sneaky redirects. There are so many methods to create a 302.

The only good script is a script that bears the hallmark of Google's approval. Alexa must disclose whether their redirecting method has indeed been approved by Google.

Look at it this way: it is Googlebot that will be given the instruction at the server level. You simply cannot be sure that Googlebot is not being manipulated at this critical point, whether intentionally or unintentionally. Yahoo has already published an overview, so are we to assume the two bots are identical? Why, then, is there a difference between them?


9:35 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

It's strange that there is this debate on the possible evils of 302 redirects when there is a thread on this forum:

How to handle outbound links in a directory?

- that's discussing the merits of creating them.

Is it me?


9:47 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Thinking about some non-technical remedy, or prevention of this problem...
The motivation$ for these sites seem to be either:
1) AdSense (notify Google that hijacking sites are running AdSense, or contact AdSense content advertisers and let them know where their ads are running!)
2) affiliate ads (contact the merchants and let them know what their affiliate is doing)
3) other (in my case, a malicious network attempting a drive-by install of spyware/viruses; I notified G... hope they care?!)
4) ignorance (contact the offending site - unlikely - or the host)
5) malicious revenge (no remedy?)

just throwing out ideas-add your own!


10:15 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

milehigh - I have begun to tell AdSense, but so far no changes. They did reply, though, so I'm not sure if they care.


11:21 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Getting back to the original topic of this thread...
Does anyone know if G has a problem indexing doubly 301-redirected pages? Our problem may have started 2 weeks ago when we added code to our .htaccess to 301-redirect xyz.com to www.xyz.com per G's recommendation. (We suddenly started getting double indexing under xyz.com and www.xyz.com a few weeks ago on one domain after another change by G, which was hurting our rankings, so we added the code to all domains as a pre-emptive measure.) We also happen to already have a number of pages (including the home page) redirected to our new domain wxy.com. So someone trying to access [xyz.com...] is going to get 301-redirected to www.xyz.com and then 301'd to www.wxy.com. Is that a no-no? What else could we do? And why would G make it affect the entire domain?
The code added looks sorta like this:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.xyz\.com$ [NC]
RewriteRule (.*) [xyz.com...] [R=301,L]

Redirect 301 /index.htm [wxy.com...]
Redirect 301 /abcdefg.htm [wxy.com...]

P.S. Is the way that about.com links, using the following format:

considered a BAD 302?

[edited by: MikeNoLastName at 11:46 pm (utc) on Mar. 8, 2005]


11:30 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

All, after 18 months of owning a website and spending most of my time building content, I don't claim to know an eighth of what most of you know. What I do know is that my site has been hijacked by three different sites, and I now receive less than 5% of the traffic I used to; it appears to be dropping by the day.

I'm amazed this isn't the topic of every Web newsletter on the net this week, but it isn't. We've all heard the old saying "that will never happen to me". After seeing a friend go through this a few months ago, I foolishly said "that will never happen to me". Well, it has, and I can assure you that if your website or sites are good, which I suspect they are, it will happen to you in the very near future if it hasn't already.

The ball is in Google's court to do something. We can suggest ideas all day long; however, it's up to them to act. There's power in numbers, so please take five minutes of your evening tonight to drop Google a quick e-mail telling them how your site has been ruined by hijackers, or, if you're one of the lucky ones who hasn't been hijacked yet, your concerns about being hijacked. Like I said earlier, I never thought it would happen to me, but it did, and I can't describe to you the sick feeling I've had since I discovered my site was hijacked. Act now or act later; it's up to you.


11:58 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

newwebguy1, this issue has been brought to Google's attention over and over and over and over (u get the picture) again. End result: same problem, nothing done.


11:59 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Finally I managed to get through the whole thread. Good work and nice posts, everyone :)

Now it's past midnight here, so I don't really have time to write a lot, other than that japanese is right in most of the stuff about how this works. There are some minor tech details and some wording issues, but by and large he's got it 100% right.

(Please don't use so many capital letters, though; it hurts my eyes.)

Also, IncrediBILL is right above - there are legitimate referrers, and it would not be good to ban those, as that way nobody could link to you. It's a very tricky situation, and there's no apparent solution that the individual webmaster can use. I do know my .htaccess stuff ;)

It must be fixed by Google - they are the ones that have created this problem by indexing "pages" that don't exist [1]. So, they should fix it.

Before I'm off to bed, I will just point to this thread I started in May 2004: What about those redirects, copies and mirrors? [webmasterworld.com]

I have been following this and related issues since sometime in 2003, and they have not been fixed yet.

[1] Note: You're right here as well, japanese. I have used that term myself before as well. I'll post the specifics later.


12:13 am on Mar 9, 2005 (gmt 0)

10+ Year Member

Does anyone know if G has a problem indexing doubly 301-redirected pages?

MikeNoLastName: You may be confusing one problem with another. I don't think the redirect from domain.com to www.domain.com is going to cause any problems. However, the permanent redirect to newdomain.com (assuming newdomain.com really is a brand-new, just-registered domain) is probably going to run into the so-called 'sandbox' problem.

The sandbox problem prevents new sites from ranking for their terms, even though the site may be in the index. Your permanent redirect is essentially saying "don't go to olddomain.com/page.htm anymore, since it's moved permanently to newdomain.com/page.htm". Google removes olddomain.com/page.htm from the index, spiders newdomain.com/page.htm, and then puts it in the sandbox.

Matt Cutts said at one of the conferences that Googlebot can follow multiple redirects fairly well, as long as they aren't four or five levels deep. I think you've got the sandbox working against you.
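As a sanity check, the chain Mike describes (xyz.com to www.xyz.com to www.wxy.com) is only two hops. A small sketch, using a made-up redirect table instead of live HTTP, shows how a crawler with a fixed redirect budget would see it (all domain names here are placeholders):

```python
def follow_redirects(url, redirects, max_hops=5):
    """Follow a 301/302 chain, giving up after max_hops the way a
    crawler with a limited redirect budget would. `redirects` maps
    each URL to its Location target, if it has one."""
    hops = 0
    while url in redirects:
        if hops >= max_hops:
            raise RuntimeError("redirect chain too deep at " + url)
        url = redirects[url]
        hops += 1
    return url, hops

# The two-hop chain from the post, well under a four-or-five-hop limit:
chain = {
    "http://xyz.example/": "http://www.xyz.example/",
    "http://www.xyz.example/": "http://www.wxy.example/",
}
```

Two hops should be fine on its own; the sandbox on the final destination is the more likely culprit.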


1:46 am on Mar 9, 2005 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

The only good script is a script that bears the hallmark of google’s approval.

I never asked google to approve my redirects, didn't know they needed to be consulted on this.

Alexa must disclose whether their redirecting method has indeed been approved by Google.

I'm at a loss for words, hard to think while cachinnating


1:51 am on Mar 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Hi Johnrichd,
Thanks for the suggestion. No, in this case both domains have been around for a very long time, and both were PR5s (now wxy.com is a PR6). None of the pages which disappeared were redirected; in fact, they were the ones which I specifically had NOT yet moved to the 'new' domain, since they were ranking so well (all top #1-5 for their keywords). Now they aren't even in the index, and GBot hasn't visited the domain since 2/28 despite re-submits. (Is everyone else getting an "about.htm?zzzz not found" error when trying to submit on G?)
It's looking more to me like all the pages which recently disappeared are the same ones that clk.about.com has recently started framing.


2:06 am on Mar 9, 2005 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month


Thanks for the heads-up on About.com; I just dropped a frame-buster script into my site to stop that nonsense.


6:36 am on Mar 9, 2005 (gmt 0)

10+ Year Member


Many thanks for joining this thread. Your input will be much valued here.

I tried to join WebmasterWorld with the nickname of altavista, but a .htaccess file called Brett presented me with a 403 access denied. So I changed my nickname to Japanese and was presented with a 200. I hope I am not seen here as a Trojan and marked for deletion.

I have read some fabulous posts in many different threads here, but this problem with Googlebot's ability to generate pages when it encounters particular 302 redirects is probably going to dwarf the sandbox issue.

I think that it already has, and more and more people are seeing their sites collapse into total oblivion at great expense to them. Very little reference is ever made to any other search engine: Google is the internet, and many loyal fans are not happy.


7:25 am on Mar 9, 2005 (gmt 0)

10+ Year Member

While I am confident that Google is well aware of the problem, waiting around for them to fix it is just not an option for those whose businesses have been hit hard. If it were my bread-and-butter site that got hit, I'd be drowning in the mortgage and really up the creek.

I suggest we think in terms of how to arm ourselves as webmasters to prevent this happening, on a local level, to each and every one of us personally.

Last night I had a friend of mine who is a programmer come over, and we talked about a whole range of solutions, which we will begin testing tonight; all of them can be implemented by a webmaster on your site or server.

I think it is fundamental that we share ideas regardless of how silly they may seem.

In my particular case it is not just one website that has a redirect pointed at one of my websites; it is a network of websites (at last count, 50) that are deliberately using hidden redirects, even to the point of changing the status bar to appear to point at the sites whose content they are stealing, in order to take advantage of the many hours of optimization so they can rocket to the top for every conceivable search possible.

In most cases they have their throwaway domains all on one server with similar IP addresses, and I know their names and the name of whoever has the server. Can this information be used to "report" them or "denounce" them so they can no longer do so?

For example, could there be a "black list" (either by IP or by registrant) which can be verified by Google?

As far as a localized solution is concerned, how about implementing a script on our web sites that goes something like this:

If the referral comes from X, or if the referral string is longer than Y, return a 404. The reason why the length of the referral string is important is that the hijackers are using a long string to hide the redirect from the spider.

Actually, what we are considering is returning an exact copy of the referrer string so that it creates a loop, but I'm not sure what effect it will have on the server I use.
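The referrer filter sketched above might look like this as a plain function (a sketch only; the blocked-host list and the length cutoff are invented values, and the real thing would be wired into whatever CGI/PHP handler serves the pages). Note it can only affect visitors who actually send a referrer:

```python
BLOCKED_HOSTS = {"hijacker.example", "scraper.example"}  # invented blacklist
MAX_REFERRER_LEN = 200  # arbitrary cutoff; hijack redirect URLs tend to be long

def referrer_status(referrer):
    """Return the HTTP status to serve: 404 for blacklisted or
    suspiciously long referrers, 200 otherwise. Googlebot sends no
    referrer, so this only affects human visitors following the link."""
    if not referrer:
        return 200                       # direct visits pass through
    if len(referrer) > MAX_REFERRER_LEN:
        return 404                       # long strings hide the redirect
    parts = referrer.split("/")
    host = parts[2] if len(parts) > 2 else referrer
    return 404 if host in BLOCKED_HOSTS else 200
```

Serving a 404 (rather than looping the referrer back) keeps the server-side behavior simple and avoids any risk of tying up your own server.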


7:30 am on Mar 9, 2005 (gmt 0)

10+ Year Member

Googlebot doesn't carry referrers.
It merely adds each URL it finds to a database and then goes and indexes those URLs at a later date.


7:49 am on Mar 9, 2005 (gmt 0)

10+ Year Member

googlebot doesn't carry referrers

Stargeek, are you suggesting the script wouldn't work for Google? I think you are right: Google will continue to index my content as if it were the hijacker's content. The idea of the script, however, is to make it so that users have problems with the hijackers' sites, making it a problem for them to have a redirect to my site.


8:31 am on Mar 9, 2005 (gmt 0)

10+ Year Member

Good news: Google is doing something about site scrapers. The network of scraper sites that I was referring to only a few days ago has been removed completely; it's not even in the index.

Finally Google is banning sites again. If they're gaming the system, then they'll have to pay the price.


8:59 am on Mar 9, 2005 (gmt 0)

10+ Year Member

Great news here too! The network of redirecting subdomains I reported to Google only 2 days ago is totally gone from their index, as is the pseudo 'search engine' that tied it all together.
These sites also no longer appear in my allinurl: search.
Apparently they DO read their spam reports! :)


9:28 am on Mar 9, 2005 (gmt 0)

10+ Year Member

I reported some a week ago and they are still there.


9:32 am on Mar 9, 2005 (gmt 0)

1milehgh80210, pretty good news!

During the weekend, I sent an e-mail to a spammer:


"Please eliminate ASAP the following link:

Otherwise, I'll contact Google to report the address.

You have 48 hours to eliminate the link."

Well, all links to MY_SITE were eliminated within the 48 hours.

Nevertheless, it's good to know Google does take action against spammers via reports.

Google rocks!

This 206-message thread spans 7 pages.
