
Lost in Google



6:39 pm on Feb 27, 2005 (gmt 0)

10+ Year Member

My site has been on the first page of SERPs for many years. Just a few weeks ago, my listing went from a title and description display to just this:

Similar pages

In addition, my Yahoo listing disappeared as well. I then did a Yahoo search for pages with my domain included and found most of my interior pages indexed but not my home page.

What happened? Is it possible my site was not ready to be crawled when the Googlebot and Slurp robots visited my site, simultaneously?

This is a very "white hat" site - no tricks at all, just good content...

I went and manually requested my site be spidered on both Google and Yahoo, and sent an email to Yahoo requesting any explanation as well.

Is there anything else I can do? Any ideas of why this happened?


7:01 pm on Mar 8, 2005 (gmt 0)

10+ Year Member


Let us see how much you will laugh by placing the address of your main website here, in this forum.

This time your pain will be associated with the grief of your site, not from laughter.

You fail to see the wood for the trees. No contribution and zero input into this problem, other than a tumultuous burst of vilification.

Please place your site URL here, in this forum. I will place a few links pointing to your site; it should give you an increase in PageRank, right?

Let us see how much you will stick to your belief regarding the comments you made.


7:36 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

As for sending referrers when spidering a page, that doesn't really make sense. If you have 5000 inbound links, do you really want Google to request the same page 5000 times with different referrers? Robots aren't browsers and they generally don't follow links. They just add the linked-to URL to their database (if it's not already there) and it'll get spidered along with all the rest at some future date. The problem is that Google is associating the content returned for the target URL with that URL and all URLs that link to it, rather than simply recording the fact that the redirect URLs are redirects and ignoring them when generating search results.
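The queue-and-crawl-later behavior just described can be sketched in a few lines (a rough illustration only, not Google's actual implementation; the function name and the toy link graph are invented):

```python
from collections import deque

def crawl(seed_urls, get_links, max_pages=100):
    """Minimal crawl-frontier sketch: every discovered URL is queued
    exactly once and fetched later, with no Referer header, rather than
    being 'followed' link-by-link the way a browser would."""
    seen = set(seed_urls)
    frontier = deque(seed_urls)
    fetched = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        fetched.append(url)             # one fetch per URL, ever
        for link in get_links(url):     # links found on the fetched page
            if link not in seen:        # 5000 inbound links -> still queued once
                seen.add(link)
                frontier.append(link)
    return fetched
```

So even if 5000 pages link to the same URL, the crawler requests it once, not 5000 times with different referrers.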

Perhaps the Request-URI could be checked on your site; if it does not match the page requested, serve a blank page. This could work. I'm going to go check whether Google actually sends the wrong Request-URI with hijacked pages.


7:43 pm on Mar 8, 2005 (gmt 0)

10+ Year Member


Are you suggesting that sending a referrer only when Googlebot has reached your site via a 302 redirect is a bad idea?

Please explain again why you think it is a bad way forward.

I don't think a site has the right to pump 302 directives at a site that does not want them. Who wrote the scripts? Was it a computer whiz kid? Is the script Googlebot-friendly? What else does the script do? Is a meta refresh generated during this process? I beg you to answer that one especially. Were you aware of that? And are you aware of the residue effect? I won't explain that one; please let us know your version.

So, really, you are saying that anybody can create a manipulative script, say a variant of NukeModule GO-PHP modified to hijack your site through an intricate deceptive process that many people do not understand (and I doubt you do), and that we should be OK living with it, sitting back and watching our sites disappear into oblivion in the results until Google sorts it out, right?

We are trying to help find a solution. Action, not speculative gestures, is what is needed.


8:04 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Google must also allow the target site to remove the offending redirect. It can be done very easily indeed, and this may help thousands of people who have lost their hard-earned cash and websites.

I don't know if this has been suggested before but perhaps the robots.txt directive could be extended to prevent crawling from a redirect. For example:

User-agent: *
Disallow: /do
Disallow-redirect: /

would disallow the /do directory for crawlers following regular links, and disallow the entire site if the crawler arrives via a redirect.

A robots.txt-based redirection filter would probably be easier for many webmasters to use, since not everyone is running Apache or PHP on the server.


8:17 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

No contribution and zero input into this problem other than a tumultuous burst of vilification.

No contribution?

I explained to you that many large legitimate sites, like search engines and Alexa, redirect to monitor actual click-throughs; other than Google, all the rest seem to redirect.

I'm sorry if that blows your mind.


8:19 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

I just added the domains of the URLs redirecting to my site to the .htaccess file. I will have to see if it fixes my problems.
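For reference, rules of the kind being described look roughly like this (a sketch only, assuming Apache with mod_rewrite enabled; badsite.example and otherbad.example stand in for the actual redirecting domains). Note that, as pointed out later in the thread, Googlebot sends no referrer, so this mainly affects human visitors arriving through the redirect:

```apache
RewriteEngine On
# Return 403 Forbidden when the Referer is one of the known redirecting domains
RewriteCond %{HTTP_REFERER} badsite\.example [NC,OR]
RewriteCond %{HTTP_REFERER} otherbad\.example [NC]
RewriteRule .* - [F]
```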


8:39 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Question about the Google removal tool, regarding a subdomain, e.g. anyone.ourdomain.com.

Because of an error on my part with a subdomain that I set up, Google indexed 366 files under the subdomain. A few weeks ago I used the removal tool and submitted a request to de-index items at anyone.ourdomain.com/. I got this reply today.

"Thank you for your reply. This message is to notify you that anyone.ourdomain.com/ has been removed from the Google index. You do not need to take any further action. Please let us know if you have additional questions or concerns."

Thing is, as of 1 hr ago all 366 entries are still showing in the index when doing site:anyone.ourdomain.com

How long does it usually take before the files are washed out of the index and no longer show up under site:anyone.ourdomain.com?
And will all entries be removed, or just the index or default page for anyone.ourdomain.com/?

The domain is now inactive in the DNS

If anyone knows please share.



8:51 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Go2, that's an interesting idea, but it seems to me the problem is that Google is being fooled into not knowing that it is being redirected, so I don't think it would work.

In any event the robots.txt should look more like this I believe:

User-agent: Googlebot
Disallow: /referrals


9:05 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

I think Bobby is right here. I tried adding the last of the URLs that were redirecting to me to robots.txt and activated the removal tool; a day later it said it was complete, but nothing has changed in the SERPs.

They really should ban all redirecting sites and use a clean www.domain.com as the index site, not www.domain.com/go-php/blabla... Always look for www.domain.com and go from there.

Well, as said, I don't think we can do anything more than wait and hope for the best, or we should start doing this ourselves if those are really the kinds of sites Google wants. I won't wait 4 months more, that's for sure; then we will see what a real webmaster can do with the bad tricks Google obviously loves.

sad sad sad


9:06 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member


How do you make a copy of it with Google cache?
What kind of problems are you dealing with?

Use your browser and do a "Save As"; it will save background colors, images, text, links, etc. You'll have to open it in your browser to see it, and tell anyone you send it to to open it in a browser as well.

The problem I was dealing with in that one instance was a client's competitor taking text right off my client's home page and putting it on his own page, or using that text in ads he was posting around the internet. Having documented, dated proof via an impartial third party, proving who had the text up first, convinced the sites/hosts that controlled those pages/websites to take them down.


Your anxiety about Alexa redirects is misguided, as many search engines (like Yahoo, Lycos, Excite and others) and directory sites redirect to your site in order to count the traffic as people click on individual websites.

If they use a tracking script OK, but not a 302 redirect.


Complaining about Alexa legitimately using 302 redirects is just silly.

Since when is posting a 302 redirect to my site silly?


9:21 pm on Mar 8, 2005 (gmt 0)

10+ Year Member


You think that these directories and many search engines plainly and cleanly count clicks through their systems.

I think you must be living on another planet, or in a utopian society where there is a total absence of crime, inhabited by sinless altruists of divine proportions, with Alexa and the likes revered as the proclaimed deities of proficiency.

It's a good job we are not talking about reciprocal link exchange, or you would advocate that no link pages exist in robots-denied pages, JavaScript-routed pages that bots cannot access, etc. There are even more tricks out there than you may think.

The internet is a dirty playing field with no referees and many casualties, who have nowhere to look other than to seek advice from us guys. We should help muster up something worth reading, not muster up a mud fight like I am doing; sorry about that.

OK, yes, many engines like Alexa redirect. So what do we have?

Let me tell you: a wild and uncontrolled mixture of deceptive redirects. And many of them cause Googlebot problems, because these scripts are deployed by anybody and everybody. Hell, I set up my own Perl-based CGI script that causes a very nasty 302, done deliberately to retaliate if needed. Working on it, I am sure I can get it to dynamically generate a zero-second meta refresh to the targeted page as well.

Is the guy implementing the scripts at Alexa conforming to a particular directive regarding the server-side redirects? And is his script compliant with Inktomi, Googlebot, MSNBot, etc.? I certainly did not conform: I did it to cause a definitive 302, and I expect it to damage the target site I point to. I have never used it.

Do you know the person at Alexa who oversees the scripting language that has been incorporated server-side to drive the script directives? Do Alexa describe anywhere that their 302 redirects are approved by Google?

Why is Alexa allowed to place a Location header pointing to your site on their server?

Google specifically indicates without a shadow of a doubt that they are against sneaky redirects. There are so many methods to create a 302.

The only good script is a script that bears the hallmark of Google's approval. Alexa must disclose whether their redirecting method has indeed been approved by Google.

Look at it this way: it is Googlebot that will be given the instruction at the server level. You simply cannot be sure that Googlebot is not being manipulated at this critical point, whether intentionally or unintentionally. Yahoo has already published an overview, so are we to assume the two bots are identical? Why, then, is there a difference between them?


9:35 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

It's strange that there is this debate on the possible evils of 302 redirects when there is a thread on this forum:

How to handle outbound links in a directory?

- that's discussing the merits of creating them.

Is it me?


9:47 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Thinking about some non-technical remedy, or prevention of this problem...
The motivation$ for these sites seem to be either:
1) AdSense (notify Google that hijacking sites are running AdSense, or contact AdSense content advertisers and let them know where their ads are running!)
2) affiliate ads (contact the merchants and let them know what their affiliate is doing)
3) other (in my case, a malicious network attempting a drive-by install of spyware/viruses; I notified G... hope they care?!)
4) ignorance (contact the offending site - unlikely - or the host)
5) malicious revenge (no remedy?)

just throwing out ideas-add your own!


10:15 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

milehigh - I have begun to tell AdSense, but so far no changes. They did reply, though, so I'm not sure if they care.


11:21 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Getting back to the original topic of this thread...
Does anyone know if G has a problem indexing doubly 301-redirected pages? Our problem may have started 2 weeks ago when we added code to our .htaccess to 301-redirect xyz.com to www.xyz.com per G's recommendation. (We suddenly started getting double indexing under xyz.com and www.xyz.com a few weeks ago on one domain after another change by G, which was hurting our rankings, so we added the code to all domains as a pre-emptive measure.) We also happen to already have a number of pages (including the home page) redirected to our new domain wxy.com. So someone trying to access [xyz.com...] is going to get 301-redirected to www.xyz.com and then 301'd to www.wxy.com. Is that a no-no? What else could we do? And why would G make it affect the entire domain?
The code added looks sorta like this:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.xyz\.com$ [NC]
RewriteRule (.*) [xyz.com...] [R=301,L]

Redirect 301 /index.htm [wxy.com...]
Redirect 301 /abcdefg.htm [wxy.com...]

P.S. Is the way that about.com links, using the following format:

considered a BAD 302?

[edited by: MikeNoLastName at 11:46 pm (utc) on Mar. 8, 2005]


11:30 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

All, after 18 months of owning a website and spending most of my time building content, I don't claim to know an eighth of what most of you know. What I do know is that my site has been hijacked by three different sites, and I now receive less than 5% of the traffic I used to; it appears to be dropping by the day.

I'm amazed this isn't the topic of every Web newsletter on the net this week, but it isn't. We've all heard the old saying "that will never happen to me". After seeing a friend go through this a few months ago, I foolishly said "that will never happen to me". Well, it has, and I can assure you that if your website or sites are good, which I suspect they are, it will happen to you in the very near future if it hasn't already.

The ball is in Google's court to do something. We can suggest ideas all day long; however, it's up to them to act. There's power in numbers, so please take five minutes of your evening tonight to drop Google a quick e-mail telling them how your site has been ruined by hijackers, or, if you're one of the lucky ones who hasn't been hijacked yet, your concerns about being hijacked. Like I said earlier, I never thought it would happen to me, but it did, and I can't describe to you the sick feeling I've had since I discovered my site was hijacked. Act now or act later; it's up to you.


11:58 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

newwebguy1, this issue has been brought to Google's attention over and over and over and over (u get the picture) again. End result: same problem, nothing done.


11:59 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Finally I managed to get through the whole thread. Good work and nice posts, everyone :)

Now it's past midnight here, so I don't really have time to write a lot, other than that japanese is right in most of the stuff about how this works. There are some minor tech details and some wording issues, but by and large he's got it 100% right.

(Please don't use so many capital letters, though; it hurts my eyes.)

Also, IncrediBILL is right above - there are legitimate referrers, and it would not be good to ban those, as that way nobody could link to you. It's a very tricky situation, and there's no apparent solution that the individual webmaster can use. I do know my .htaccess stuff ;)

It must be fixed by Google - they are the ones that have created this problem by indexing "pages" that don't exist [1]. So, they should fix it.

Before I'm off to bed, I will just point to this thread I started in May 2004: What about those redirects, copies and mirrors? [webmasterworld.com]

I have been following this and related issues since sometime in 2003, and they have not been fixed yet.

[1] Note: You're right here as well, japanese. I have used that term myself before as well. I'll post the specifics later.


12:13 am on Mar 9, 2005 (gmt 0)

10+ Year Member

Does anyone know if G has a problem indexing doubly 301-redirected pages?

MikeNoLastName: You may be confusing one problem with another. I don't think the redirect from domain.com to www.domain.com is going to cause any problems. However, the permanent redirect to newdomain.com (assuming newdomain.com really is a brand-new, just-registered domain) is probably going to run into the so-called 'sandbox' problem.

The sandbox problem prevents new sites from ranking for their terms, even though the site may be in the index. Your permanent redirect is essentially saying "don't go to olddomain.com/page.htm anymore, since it's moved permanently to newdomain.com/page.htm". Google removes olddomain.com/page.htm from the index, spiders newdomain.com/page.htm, and then puts it in the sandbox.

Matt Cutts said at one of the conferences that Googlebot can follow multiple redirects fairly well, as long as they aren't four or five levels deep. I think you've got the sandbox working against you.
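As a sanity check, the chain Mike describes (xyz.com to www.xyz.com to www.wxy.com) is only two hops. A small sketch, using a made-up redirect table instead of live HTTP, shows how a crawler with a fixed redirect budget would see it (all domain names here are placeholders):

```python
def follow_redirects(url, redirects, max_hops=5):
    """Follow a 301/302 chain, giving up after max_hops the way a
    crawler with a limited redirect budget would. `redirects` maps
    each URL to its Location target, if it has one."""
    hops = 0
    while url in redirects:
        if hops >= max_hops:
            raise RuntimeError("redirect chain too deep at " + url)
        url = redirects[url]
        hops += 1
    return url, hops

# The two-hop chain from the post, well under a four-or-five-hop limit:
chain = {
    "http://xyz.example/": "http://www.xyz.example/",
    "http://www.xyz.example/": "http://www.wxy.example/",
}
```

Two hops should be fine on its own; the sandbox on the final destination is the more likely culprit.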


1:46 am on Mar 9, 2005 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

The only good script is a script that bears the hallmark of google’s approval.

I never asked google to approve my redirects, didn't know they needed to be consulted on this.

Alexa must disclose whether their redirecting method has indeed been approved by Google.

I'm at a loss for words, hard to think while cachinnating


1:51 am on Mar 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Hi Johnrichd,
Thanks for the suggestion. No, in this case both domains have been around for a very long time, and both were PR5s (now wxy.com is a PR6). None of the pages which disappeared were redirected; in fact, they were the ones which I specifically had NOT yet moved to the 'new' domain, since they were ranking so well (all top #1-5 for their keywords). Now they aren't even in the index, and GBot hasn't visited the domain since 2/28 despite re-submits. (Is everyone else getting an "about.htm?zzzz not found" error when trying to submit on G?)
It's looking more to me like all the pages which recently disappeared are the same ones that clk.about.com has recently started framing.


2:06 am on Mar 9, 2005 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month


Thanks for the heads-up on About.com; I just dropped a frame-buster script into my site to stop that nonsense.


6:36 am on Mar 9, 2005 (gmt 0)

10+ Year Member


Many thanks for joining this thread. Your input will be much valued here.

I tried to join WebmasterWorld with the nickname of altavista, but a .htaccess file called Brett presented me with a 403 access denied. So I changed my nickname to Japanese and was presented with a 200. I hope I am not seen here as a Trojan and marked for deletion.

I have read some fabulous posts in many different threads here, but this problem with Googlebot's ability to generate pages when it encounters particular 302 redirects is probably going to dwarf the sandbox issue.

I think that it already has, and more and more people are seeing their sites collapse into total oblivion at great expense to them. Very little reference is ever made to any other search engine: Google is the internet, and many loyal fans are not happy.


7:25 am on Mar 9, 2005 (gmt 0)

10+ Year Member

While I am confident that Google is well aware of the problem, waiting around for them to fix it is just not an option for those whose businesses have been hit hard. If it were my bread-and-butter site that got hit, I'd be drowning in the mortgage and really up the creek.

I suggest we think in terms of how to arm ourselves as webmasters to prevent this happening, on a local level, to each and every one of us personally.

Last night I had a friend of mine who is a programmer come over, and we talked about a whole range of solutions, which we will begin testing tonight; all of them can be implemented by a webmaster on your site or server.

I think it is fundamental that we share ideas regardless of how silly they may seem.

In my particular case it is not just one website that has a redirect pointed at one of my websites; it is a network of websites (at last count, 50) that are deliberately using hidden redirects, even to the point of changing the status bar to appear to point at the sites whose content they are stealing, in order to take advantage of the many hours of optimization so they can rocket to the top for every conceivable search possible.

In most cases they have their throwaway domains all on one server with similar IP addresses, and I know their names and the name of whoever has the server. Can this information be used to "report" them or "denounce" them so they can no longer do so?

For example, could there be a "black list" (either by IP or by registrant) which can be verified by Google?

As far as a localized solution is concerned, how about implementing a script on our web sites that goes something like this:

If the referral comes from X, or if the referral string is longer than Y, return a 404. The reason why the length of the referral string is important is that the hijackers are using a long string to hide the redirect from the spider.

Actually, what we are considering is returning an exact copy of the referrer string so that it creates a loop, but I'm not sure what effect it will have on the server I use.
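The referrer filter sketched above might look like this as a plain function (a sketch only; the blocked-host list and the length cutoff are invented values, and the real thing would be wired into whatever CGI/PHP handler serves the pages). Note it can only affect visitors who actually send a referrer:

```python
BLOCKED_HOSTS = {"hijacker.example", "scraper.example"}  # invented blacklist
MAX_REFERRER_LEN = 200  # arbitrary cutoff; hijack redirect URLs tend to be long

def referrer_status(referrer):
    """Return the HTTP status to serve: 404 for blacklisted or
    suspiciously long referrers, 200 otherwise. Googlebot sends no
    referrer, so this only affects human visitors following the link."""
    if not referrer:
        return 200                       # direct visits pass through
    if len(referrer) > MAX_REFERRER_LEN:
        return 404                       # long strings hide the redirect
    parts = referrer.split("/")
    host = parts[2] if len(parts) > 2 else referrer
    return 404 if host in BLOCKED_HOSTS else 200
```

Serving a 404 (rather than looping the referrer back) keeps the server-side behavior simple and avoids any risk of tying up your own server.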


7:30 am on Mar 9, 2005 (gmt 0)

10+ Year Member

Googlebot doesn't carry referrers.
It merely adds each URL it finds to a database and then goes and indexes those URLs at a later date.


7:49 am on Mar 9, 2005 (gmt 0)

10+ Year Member

googlebot doesn't carry referrers

Stargeek, are you suggesting the script wouldn't work for Google? I think you are right: Google will continue to index my content as if it were the hijacker's content. The idea of the script, however, is to make it so that users have problems with the hijackers' sites, making it a problem for them to have a redirect to my site.


8:31 am on Mar 9, 2005 (gmt 0)

10+ Year Member

Good news: Google is doing something about site scrapers. The network of scraper sites that I was referring to only a few days ago has been removed completely; it's not even in the index.

Finally Google is banning sites again. If they're gaming the system, then they'll have to pay the price.


8:59 am on Mar 9, 2005 (gmt 0)

10+ Year Member

Great news here too! The network of redirecting subdomains I reported to Google only 2 days ago is totally gone from their index, as is the pseudo 'search engine' that tied it all together.
These sites also no longer appear in my allinurl: search.
Apparently they DO read their spam reports! :)


9:28 am on Mar 9, 2005 (gmt 0)

10+ Year Member

I reported some a week ago and they are still there.


9:32 am on Mar 9, 2005 (gmt 0)

1milehgh80210, pretty good news!

During the weekend, I sent an e-mail to a spammer:


"Please eliminate ASAP the following link:

Otherwise, I'll contact Google to report the address.

You have 48 hours to eliminate the link."

Well, all links to MY_SITE were eliminated within the 48 hours.

Nevertheless, it's good to know Google does take action against spammers via reports.

Google rocks!

This 206-message thread spans 7 pages.
