Google and JavaScript problem results

Still seeing lots of sneaky client-side redirects in the Cassandra SERPs...

Giacomo

8:22 am on Apr 12, 2003 (gmt 0)

I'm still seeing tons of doorway pages with client-side redirects ranked in the top 10 results for many queries... Pages making heavy use of these kinds of tricks have actually improved their rankings since the previous update (March 2003), most of them to the detriment of relevant, good-quality pages. :-(

GoogleGuy, how are your "automatic algorithmic tests for JS parsing [webmasterworld.com]" going?

vincevincevince

12:34 pm on Apr 12, 2003 (gmt 0)

I'd like to see Google check for any kind of redirect at all. I hate going to http://www.wigetfinder.com (as indexed by Google) and getting redirected to http://www.widgetfinder.com/visitor/new/2jl45lk34ndlnmv/index.php

:p How do these sites get indexed like that? I want to know!

vitaplease

1:50 pm on Apr 12, 2003 (gmt 0)

Giacomo,

seems like hidden text and hidden links is the next thing they want to combat at Google: [webmasterworld.com...]

What surprises me is that it is seemingly so difficult to check for these redirects automatically. I'm no spider/computer wizard, but how difficult would it be to let the spider do a recheck of the source code X milliseconds after the initial spidering of the URL?

Ok, it would slow down spidering, but I suppose these sneaky redirects only occur with competitive queries?

Anyone?

Giacomo

2:37 pm on Apr 12, 2003 (gmt 0)

vince,
Looks like a server-side redirect: not necessarily "sneaky", though. In GoogleGuy's own words, "a sneaky redirect is one that effectively shows different content to the user and a bot [webmasterworld.com]". That's cloaking, really (same URL, different content). So, a sneaky redirect might be defined as one that is used for cloaking (i.e., fooling robots and search engine users).

Previous thread about sneaky client-side redirect spam in Google's index: [webmasterworld.com...]

vita,

seems like hidden text and hidden links is the next thing they want to combat at Google

Well, that's good news: 99% of the doorway pages that I've seen using sneaky client-side redirects also contain hidden text and/or links, so I hope that Google will soon be able to get rid of this type of spam one way or the other.

About checking what content is served after a client-side redirect, I'm afraid a crawler cannot do that unless it can understand JavaScript.

RawAlex

2:37 pm on Apr 12, 2003 (gmt 0)

There are many ways to parse incoming queries so as to redirect only at certain times. Because googlebots tend to come in on certain IP addresses, it isn't hard to have only those IP blocks work "as standard" without redirects, and have everyone else get redirected.

This is the sort of reason why Google needs to have actual people whose job is to take the top 1000 terms each month and actually surf the results pages (on different IPs from Google's standard ones, perhaps AOL?), checking the individual sites against the cached results pages.

I think that would get rid of a lot of spam very quickly. Once the top 1000 are cleared up, keep moving down the list. 2 or 3 people full time could eliminate much more spam (and much more incentive to spam) than all the engineers in the world can manage, IMHO.

Alex

Ps: Thanks ciml.

Giacomo

2:55 pm on Apr 12, 2003 (gmt 0)

RawAlex,
I agree, redirecting is not the only way to cloak. However, cloaking through sneaky client-side redirects has become a real plague, because it appears that Google cannot detect+follow JavaScript redirects. :-(

With a 3 billion page index, I doubt that human scrutiny would be a viable solution. I believe that stronger hidden content detection algorithms could make a difference instead.

vince,
If that is a client-side redirect that you are seeing, then you should be able to view the redirecting page's source by disabling JavaScript in your browser and/or using SamSpade Safe Browser [samspade.org].

vitaplease

3:03 pm on Apr 12, 2003 (gmt 0)

How difficult would it be for Google to set up a browser-like program that does follow redirects and checks the top 200 SERPs of the most used search queries and the most advertised AdWords/phrases per language, and that does a source-code check/comparison with Google's cache?

vincevincevince

3:12 pm on Apr 12, 2003 (gmt 0)

Right, I ran SamSpade on it, and I am seeing the following 302 response. I have seen this arrangement on this site for a number of months. Why has Google not removed the page that returns the 302? And why was no action taken when I reported it to Google?

HTTP/1.1 302 Found
Date: Sat, 12 Apr 2003 15:10:06 GMT
Server: IBM_HTTP_Server/1.3.12.6 Apache/1.3.12 (Unix)
Location: [[TOS].com...]
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

12e
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://www.[TOS]./?storeId=10001&catalogId=-11&langId=-11">here</A>.<P>

[edited by: Brett_Tabke at 3:41 pm (utc) on April 12, 2003]
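
For anyone who wants to run the same check on a saved header dump, here is a minimal Python sketch that pulls the status code and Location header out of a raw response like the one above. The function names `parse_redirect` and `is_server_side_redirect` are made up for illustration, and the sketch assumes a well-formed response whose headers are separated from the body by a blank line.

```python
def parse_redirect(raw_response):
    """Return (status_code, location) from raw HTTP response text.

    location is None when the response has no Location header.
    """
    # Normalise line endings, then keep only the header section.
    head = raw_response.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    status_code = int(lines[0].split()[1])  # "HTTP/1.1 302 Found" -> 302
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return status_code, location


def is_server_side_redirect(raw_response):
    """True when the server answered with a 3xx redirect and a target."""
    status_code, location = parse_redirect(raw_response)
    return status_code in (301, 302, 303, 307, 308) and location is not None
```

A 3xx status with a Location header means the server itself is redirecting, as opposed to a client-side JavaScript or meta-refresh redirect, which arrives with a 200 status. It says nothing about whether the redirect is legitimate.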

Giacomo

3:31 pm on Apr 12, 2003 (gmt 0)

vita,
As I stated in my first post (see link), from what GoogleGuy said in a previous post it seems that the engineers at Google are already working on a JavaScript parser that should do just that: compare the content that is served to a JS-enabled browser with the page that is served to the crawler.

How difficult? That shouldn't be too difficult: not long ago, one of our members said that even he or she could write a parser for Google that would "filter 99% of the JS redirects used on the net" [webmasterworld.com]. :-)
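
To make the idea concrete, here is a minimal Python sketch of such a filter. It assumes that most redirects are written as a handful of literal patterns (`location = "..."`, `location.href = "..."`, `location.replace("...")`, `location.assign("...")`). The name `find_js_redirects` and the pattern list are illustrative only, not anything Google or the quoted member actually wrote, and obfuscated or computed URLs would slip straight past it.

```python
import re

# Hypothetical patterns covering common literal JavaScript redirects.
_REDIRECT_PATTERNS = [
    # location = "url", window.location.href = 'url', and similar
    re.compile(
        r"""(?:window\.|document\.|self\.|top\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']""",
        re.IGNORECASE,
    ),
    # location.replace("url") and location.assign("url")
    re.compile(
        r"""location\.(?:replace|assign)\s*\(\s*["']([^"']+)["']\s*\)""",
        re.IGNORECASE,
    ),
]


def find_js_redirects(source):
    """Return the target URLs of literal JS redirects found in source."""
    targets = []
    for pattern in _REDIRECT_PATTERNS:
        targets.extend(pattern.findall(source))
    return targets
```

A crawler could flag any page where this returns a non-empty list and then fetch the target for comparison with the indexed copy. A pattern match alone does not prove the redirect is sneaky.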

However, Google's engineers may have noticed that nearly every page using sneaky client-side redirects also includes hidden text/links, so they might have chosen to improve their hidden content detection algos first.

Let me just say it once again: I really hope that Google can get rid of this spam soon. It's a real shame to see that these kinds of stupid SEO tricks are still successful in 2003. Things like that make the quality and usefulness of Google's results drop dramatically, besides disrupting the user experience.

Web site owners, please read carefully: search engine users are sick and tired of getting redirected to a page that does not match Google's description! Don't waste your money on SEO companies employing annoying tricks like hidden links, doorway pages and sneaky redirects.

Giacomo

3:34 pm on Apr 12, 2003 (gmt 0)

vince,
That looks like a server-side redirect. Can't tell whether it's for legitimate purposes or not.

<added>It's easy to check whether a certain page employs UserAgent cloaking (since anyone can spoof their UA). But if the page is using IP address cloaking, then I'm afraid only Google can tell. You might want to monitor that URL until the next Google update to check if the new SERPs still have a description that does not match the actual page's content.</added>

[edited by: Giacomo at 3:46 pm (utc) on April 12, 2003]
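
The UserAgent check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the two User-Agent strings, the 0.9 similarity threshold, and the names `looks_ua_cloaked` and `http_fetch` are hypothetical choices, and pages with rotating ads or timestamps will differ slightly between any two requests. It cannot detect IP address cloaking, for the reason given above.

```python
import difflib
import urllib.request

# Assumed User-Agent strings for illustration; real ones may differ.
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
CRAWLER_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def http_fetch(url, user_agent):
    """Fetch url with the given User-Agent and return the body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_ua_cloaked(url, fetch=http_fetch, threshold=0.9):
    """Fetch url as a browser and as a crawler, then compare the bodies.

    Returns True when the two bodies are less similar than threshold.
    """
    as_browser = fetch(url, BROWSER_UA)
    as_crawler = fetch(url, CRAWLER_UA)
    matcher = difflib.SequenceMatcher(None, as_browser, as_crawler, autojunk=False)
    return matcher.ratio() < threshold
```

The `fetch` parameter is there so the comparison can be tried against canned responses without touching the network. `SequenceMatcher` gets slow on very large pages, so a real tool would probably compare line by line or hash chunks instead.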

RawAlex

3:46 pm on Apr 12, 2003 (gmt 0)

One of the problems of server-side redirects is that, as an example, the large bookselling company from Washington state uses them on every visitor as a tracking tool. You cannot get to the index.html page that Google ends up indexing. Is that a violation or just good business?

Because we are all good at spotting Google's bots, it doesn't take a scholar to write a bot feeder/redirector to make sure your SERPs are good and your surfers get something else. As long as Googlebot and Google's employees stand out like sore thumbs, they will get trapped by this stuff.

Alex

crumpeta

4:16 pm on Apr 12, 2003 (gmt 0)

I am experiencing this sneaky redirect problem with this update too.

Google has merged my #1 search term listing with a link from another domain that redirected to our site's home page during the crawl. How and why Google merged them and gave preference to this other domain's link, I have no clue.

But now that the update is over, this link is redirecting (server-side) to some other site that has nothing to do with us.

As a side effect, because our www.<domain>.com listing was replaced by this other link as the primary link associated with our home page content, our Google directory listing was moved to near the bottom. I'm assuming this was done because our PageRank value is no longer displayed next to our link in the directory.

This is pretty frustrating, to say the least. I have filled out a spam report, so I'm waiting to see if it gets resolved.

So what I'm surmising is that someone with a better-ranking domain that redirects to your site during the crawl can effectively 'hijack' your listing if Google merges the two listings and gives preference to the redirect link.

Giacomo

4:18 pm on Apr 12, 2003 (gmt 0)

RawAlex,

A good rule of thumb I use to decide whether a redirect is "sneaky" or "good business" is to check if the redirecting page (i.e., the page that is served to crawlers and non-JS/cookie enabled browsers) is stuffed with keywords and/or contains hidden text/links.

Giacomo

4:34 pm on Apr 12, 2003 (gmt 0)

[about the thread title change:]

"Problem results"... Cool euphemism: just like the young hooligans that are often referred to as "problem kids"... :-)

Giacomo

4:48 pm on Apr 12, 2003 (gmt 0)

Thanks for the URL, crumpeta.

Really impressive. I had never seen anything of that kind, and I still can't believe that someone was able to "steal your position" in the SERPs that easily. What they did is, basically, set up a 301 or 302 redirect to your web site's home page, and get their URL indexed with your content.

This is kind of weird, because I have always thought that Google would index the destination (target) URL of a server-side redirect, not the redirecting URL itself.

A glitch on Google's part?

They might also have triggered some kind of dupe content penalty that pushed you down in the SERPs. I'm not too sure about that; I'd like to know what GoogleGuy thinks of this.

If I were you, anyway, I would not only submit a spam report, but I would also send a detailed email to Google and consider some type of legal action against the "hijackers".

nell

5:29 pm on Apr 12, 2003 (gmt 0)

What about a stylesheet redirect?

body {
background-image: url(javascript:location.replace('http://www.anywhere/'));
}

Giacomo

7:23 pm on Apr 12, 2003 (gmt 0)

crumpeta,

I was still asking myself how these folks managed to get their URL associated with your web site in Google's index... I have reached the conclusion that, instead of using a redirect, they must have pointed their URL directly to your IP address via DNS.

Should we call this "reverse domain name hijacking"? or maybe "IP address hijacking"?

Heads up, everybody. Better pay more attention to those referral logs from now on...

RonPK

8:38 pm on Apr 12, 2003 (gmt 0)

Crumpeta, Giacomo,
Scary concept. If it's true, and why shouldn't it be, it means that anybody could easily steal my search query relevance, my PR, my backlinks: everything I've worked on for years.
If it works over DNS, I guess one can prevent it by assigning the domain name to a virtual site, i.e. not to the site that shows up when you surf to your server's IP address.

Giacomo

9:04 pm on Apr 12, 2003 (gmt 0)

RonPK,

Quite scary it is. A quick way to prevent IP address hijacking is comparing the SERVER_NAME variable with the actual domain name that is supposed to be associated with your web site's IP address. If the two strings do not match, you can issue a 404, 301, whatever. This can be done very easily with both ASP and PHP.

crumpeta,
How about contacting the hijackers' netblock owner directly? Check your sticky for details.

crumpeta

9:16 pm on Apr 12, 2003 (gmt 0)

RonPK, that's how I'm feeling right now. Plus, I feel pretty helpless until someone from Google looks at this.

Thanks for looking into this, Giacomo, I appreciate it. What's more bothersome is that it seems anyone can do this to any term on any site, as long as their redirecting page has more "value" (according to the Google algo) than the destination page.

I understand that there are a lot of legit reasons for Google to spider and track redirects in their db. But why in the world would they merge AND give preference to the redirecting page *from another domain*? What sense does that make?

I have no control who redirects to me, just like I have no control who links to me.

Giacomo

9:28 pm on Apr 12, 2003 (gmt 0)

I have no control who redirects to me, just like I have no control who links to me.

Right, but you do have control over what domain name(s) can be used to access your web site!

It only takes a few lines of code...
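
For illustration, here is roughly what such a check could look like, written as a Python WSGI sketch rather than ASP or PHP. It is a sketch under assumptions: `www.example.com` and the name `require_canonical_host` are placeholders, and it tests the Host header the client sent (`HTTP_HOST` in WSGI), since whether `SERVER_NAME` reflects that header depends on how the server is configured.

```python
CANONICAL_HOST = "www.example.com"  # hypothetical; replace with your own domain


def require_canonical_host(app, canonical_host=CANONICAL_HOST):
    """WSGI middleware: 301 any request that arrives under a foreign host name."""

    def middleware(environ, start_response):
        # HTTP_HOST is the Host header the client actually sent; drop any port.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host != canonical_host:
            # Send the visitor (or crawler) to the same path on the real domain.
            location = "http://" + canonical_host + environ.get("PATH_INFO", "/")
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        return app(environ, start_response)

    return middleware
```

Wrapping the site's application with `require_canonical_host(app)` means a crawler that arrives through somebody else's domain name gets pointed back to the real one instead of being served the content. The sketch drops the query string from the redirect target, and it only helps when the foreign name resolves straight to your server. It does nothing about a redirect issued from someone else's machine.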

GoogleGuy

11:18 pm on Apr 12, 2003 (gmt 0)

crumpeta and Giacomo, I'd be interested in hearing more about both your experiences via a spam report form. Don't forget to include your nickname so I can find the report. Thanks!

crumpeta

1:05 am on Apr 13, 2003 (gmt 0)

Hi GoogleGuy,
I submitted a spam report yesterday. The nickname crumpeta was on it. Please let me know if you don't find it and I'll submit again.

Giacomo

7:14 am on Apr 14, 2003 (gmt 0)

GoogleGuy,

I am going to resubmit a detailed spam report with my nickname and the keywords "WebmasterWorld" and "GoogleGuy". Many thanks.
<added>Spam report submitted via online form at 09:03 UTC.</added>

Crumpeta,

I took a closer look at your case, and it appears that all URLs are now associated with the same IP address. It also appears that most of them (although I have not checked them all) are 302-redirecting to a referral tracker which, in turn, meta-refreshes to the destination web site. (More details via sticky).
Could it be that those links belong to some marketing/affiliate program that has somehow gotten out of hand? Just my $0.02.
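
For anyone tracing a chain like this by hand, the meta-refresh hop can be pulled out of a page's source with Python's standard HTML parser. This is a rough sketch: `meta_refresh_target` is a made-up helper name, and it only handles the usual `content="N; url=..."` form.

```python
from html.parser import HTMLParser


class _MetaRefreshFinder(HTMLParser):
    """Remembers the URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        _, _, rest = attributes.get("content", "").partition(";")
        key, _, url = rest.strip().partition("=")
        if key.strip().lower() == "url" and url:
            self.target = url.strip().strip("'\"")


def meta_refresh_target(html):
    """Return the meta-refresh destination in html, or None if absent."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    return finder.target
```

Feeding it the body returned by the tracker URL gives the final destination, which can then be compared with what the SERP listing claims.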