GoogleGuy, how are your "automatic algorithmic tests for JS parsing [webmasterworld.com]" going?
Seems like hidden text and hidden links are the next thing they want to combat at Google: [webmasterworld.com...]
What surprises me is that it is seemingly so difficult to automatically check these redirects. I'm no spider/computer wizard, but how difficult would it be to let the spider recheck the source code X milliseconds after the initial spidering of the URL?
Ok, it would slow down spidering, but I suppose these sneaky redirects only occur with competitive queries?
Anyone?
Previous thread about sneaky client-side redirect spam in Google's index: [webmasterworld.com...]
vita,
"Seems like hidden text and hidden links are the next thing they want to combat at Google"
Well, that's good news: 99% of the doorway pages I've seen using sneaky client-side redirects also contain hidden text and/or links, so I hope Google will soon be able to get rid of this type of spam one way or another.
About checking what content is served after a client-side redirect, I'm afraid a crawler cannot do that unless it can understand JavaScript.
This is the sort of reason why Google needs actual people whose job is to take the top 1000 terms each month and actually surf the results pages (from IPs other than Google's standard ones, perhaps AOL?), checking the individual sites against the cached results pages.
I think that would get rid of a lot of spam very quickly. Once the top 1000 are cleaned up, keep moving down the list. Two or three people working full time could eliminate much more spam (and much more incentive to spam) than all the engineers in the world can manage, IMHO.
Alex
PS: Thanks, ciml.
With a 3-billion-page index, I doubt that human scrutiny would be a viable solution. I believe stronger hidden-content detection algorithms could make more of a difference.
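To give an idea of what a very naive hidden-content check could look like, here is a Python sketch (standard library only). It flags text inside elements whose inline style hides them (display:none, visibility:hidden) and text in a <font color="..."> that matches the <body bgcolor="...">. These two heuristics and all the names below are assumptions for illustration, not how Google actually does it; a serious filter would also have to handle external CSS, tiny fonts, off-screen positioning, background images and so on.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


def _normalize_color(value):
    """Lower-case a color and drop a leading '#', e.g. '#FFFFFF' -> 'ffffff'."""
    return value.strip().lower().lstrip("#") if value else None


class HiddenTextFinder(HTMLParser):
    """Collect text a visitor probably cannot see (two assumed heuristics):
    1. an inline style containing display:none or visibility:hidden;
    2. a <font color> equal to the <body bgcolor>.
    """

    def __init__(self):
        super().__init__()
        self.bgcolor = None
        self.stack = []        # (tag, is_hidden) for every open element
        self.hidden_text = []

    def _is_hidden(self, tag, attrs):
        style = (attrs.get("style") or "").replace(" ", "").lower()
        if "display:none" in style or "visibility:hidden" in style:
            return True
        if tag == "font" and self.bgcolor is not None:
            return _normalize_color(attrs.get("color")) == self.bgcolor
        return False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self.bgcolor = _normalize_color(attrs.get("bgcolor"))
        if tag in VOID_TAGS:
            return
        # Anything nested inside a hidden element is hidden too.
        inherited = bool(self.stack) and self.stack[-1][1]
        self.stack.append((tag, inherited or self._is_hidden(tag, attrs)))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and self.stack[-1][1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    """Return the list of text snippets the heuristics consider hidden."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

For example, a page with `<body bgcolor="#FFFFFF">` and `<font color="#ffffff">cheap widgets</font>` would report "cheap widgets". White-on-white text is exactly the kind of thing such a check catches, while anything done through external stylesheets slips right past it.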
vince,
If that is a client-side redirect that you are seeing, then you should be able to view the redirecting page's source by disabling JavaScript in your browser and/or using SamSpade Safe Browser [samspade.org].
HTTP/1.1 302 Found
Date: Sat, 12 Apr 2003 15:10:06 GMT
Server: IBM_HTTP_Server/1.3.12.6 Apache/1.3.12 (Unix)
Location: [[TOS].com...]
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
12e
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://www.[TOS]./?storeId=10001&catalogId=-11&langId=-11">here</A>.<P>
[edited by: Brett_Tabke at 3:41 pm (utc) on April 12, 2003]
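A response like the 302 dump above can also be pulled without a browser. The Python sketch below (standard library only; the function names and the default User-Agent string are made up for illustration) sends a single GET with http.client, which does not follow redirects on its own, so the status line and the Location header come back exactly as the server sent them.

```python
import http.client
from urllib.parse import urlsplit


def split_target(url):
    """Split a URL into (scheme, host[:port], path?query) for http.client."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, path


def fetch_headers(url, user_agent="Mozilla/4.0 (compatible; header-check)"):
    """Send one GET and return (status, reason, headers).

    http.client never follows redirects by itself, so a 301/302 and its
    Location header are returned as-is instead of being chased.
    """
    scheme, host, path = split_target(url)
    conn_cls = (http.client.HTTPSConnection if scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(host, timeout=10)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()
```

Usage would be something like `status, reason, headers = fetch_headers("http://www.example.com/")`, then printing each `(name, value)` pair. Note that this only shows server-side redirects; a JavaScript or meta refresh redirect is in the body, not the headers.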
How difficult? That shouldn't be too difficult: not long ago, one of our members said that even he or she could write a parser for Google that would "filter 99% of the JS redirects used on the net" [webmasterworld.com]. :-)
However, Google's engineers may have noticed that nearly every page using sneaky client-side redirects also includes hidden text/links, so they might have chosen to improve their hidden-content detection algos first.
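For what it's worth, a crude static check of the kind that member had in mind is easy to sketch. The Python below (the patterns and names are assumptions for illustration, not anything Google has described) only pattern-matches the page source for assignments to `location` and calls to `location.replace()`/`location.assign()`. It never executes JavaScript, so it can tell you a redirect is probably there, not where it leads or what content is served afterwards, and obfuscated code (eval, unescape, string concatenation) will slip through.

```python
import re

# Naive, non-exhaustive patterns for common client-side redirects (assumed).
JS_REDIRECT_PATTERNS = [
    # location = ..., window.location.href = ..., top.location = ...
    # (?!=) skips comparisons such as "location == x"
    re.compile(r"\b(?:window\.|document\.|top\.|self\.)?location(?:\.href)?\s*=(?!=)",
               re.IGNORECASE),
    # location.replace(...) / location.assign(...)
    re.compile(r"\blocation\.(?:replace|assign)\s*\(", re.IGNORECASE),
]


def find_js_redirects(source):
    """Return the matched snippets that look like client-side redirects."""
    hits = []
    for pattern in JS_REDIRECT_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(source))
    return hits
```

A page whose script says `window.location.href = "http://example.com/";` yields `['window.location.href =']`, while a plain comparison like `if (location == home)` yields nothing.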
Let me just say it once again: I really hope Google can get rid of this spam soon. It's a real shame that these kinds of stupid SEO tricks are still successful in 2003. Things like this dramatically lower the quality and usefulness of Google's results, besides disrupting the user experience.
Web site owners, please read carefully: search engine users are sick and tired of getting redirected to a page that does not match Google's description! Don't waste your money on SEO companies employing annoying tricks like hidden links, doorway pages and sneaky redirects.
<added>It's easy to check whether a given page employs User-Agent cloaking (since anyone can spoof their UA). But if the page uses IP address cloaking, then I'm afraid only Google can tell. You might want to monitor that URL until the next Google update to check whether the new SERPs still show a description that does not match the actual page's content.</added>
[edited by: Giacomo at 3:46 pm (utc) on April 12, 2003]
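The User-Agent check can be sketched in a few lines: fetch the same URL once as a Googlebot-like UA and once as a browser-like UA, then compare the two bodies. In the Python sketch below, the two UA strings, the 0.9 similarity threshold and all function names are assumptions for illustration. A threshold below 1.0 leaves room for rotating ads, dates and counters. This says nothing about IP-based cloaking, which, as noted above, only Google can see.

```python
import difflib
import urllib.request

# Assumed User-Agent strings; adjust to whatever the cloaker keys on.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def fetch_as(url, user_agent):
    """Fetch url with the given User-Agent (urlopen follows redirects)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("latin-1")  # latin-1 never fails to decode


def bodies_differ(bot_body, browser_body, threshold=0.9):
    """True when the two versions are less than `threshold` similar."""
    ratio = difflib.SequenceMatcher(None, bot_body, browser_body).ratio()
    return ratio < threshold


def looks_ua_cloaked(url, threshold=0.9):
    """Compare what a 'Googlebot' and a 'browser' get for the same URL."""
    return bodies_differ(fetch_as(url, GOOGLEBOT_UA),
                         fetch_as(url, BROWSER_UA), threshold)
```

Because urlopen follows redirects, a server-side redirect applied only to browsers shows up as two very different final pages. A JavaScript redirect does not, since both UAs receive the same source.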
Because we are all good at spotting Google's bots, it doesn't take a scholar to write a bot feeder/redirector that makes sure your SERPs are good while your surfers get something else. As long as Googlebot and Google's employees stand out like sore thumbs, they will get trapped by this stuff.
Alex
Google has merged my #1 search term listing with a link from another domain that redirected to our site's home page during the crawl. How and why Google merged them and gave preference to this other domain's link, I have no clue.
But now that the update is over, this link is redirecting (server-side) to some other site that has nothing to do with us.
As a side effect, because our www.<domain>.com listing was replaced by this other link as the primary link associated with our home page content, our Google Directory listing was moved to near the bottom.
I'm assuming this was done because our PageRank value is no longer displayed next to our link in the directory.
This is pretty frustrating to say the least.
I have filled out a spam report, so I'm waiting to see whether it gets resolved.
So what I'm surmising is that someone with a better-ranking domain that redirects to your site during the crawl can effectively 'hijack' your listing if Google merges the two listings and gives preference to the redirect link.
Really impressive. I had never seen anything of that kind, and I still can't believe that someone was able to "steal your position" in the SERPs that easily. What they did is, basically, set up a 301 or 302 redirect to your web site's home page, and get their URL indexed with your content.
This is kind of weird, because I have always thought that Google would index the destination (target) URL of a server-side redirect, not the redirecting URL itself.
A glitch on Google's part?
They might also have triggered some kind of dupe-content penalty that pushed you down in the SERPs. I'm not too sure about that, though; I'd like to know what GoogleGuy thinks of this.
If I were you, anyway, I would not only submit a spam report, but I would also send a detailed email to Google and consider some type of legal action against the "hijackers".
I was still asking myself how these folks managed to get their URL associated with your web site in Google's index... I have reached the conclusion that, instead of using a redirect, they must have pointed their URL directly to your IP address via DNS.
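If you want to test that theory against a list of suspect hostnames (for example, names that show up in the SERPs next to your content), a resolver check is quick to sketch. The helper below is hypothetical, not an existing tool: it reports which of the given names have an IPv4 A record equal to your server's IP. Keep in mind that on shared hosting many legitimate domains share one IP, so a match alone proves nothing.

```python
import socket


def hosts_pointing_at(ip, hostnames, resolve=socket.gethostbyname_ex):
    """Return the hostnames whose IPv4 A records include `ip`.

    `resolve` defaults to socket.gethostbyname_ex, which returns
    (canonical_name, aliases, ip_addresses); it is a parameter only so the
    lookup can be swapped out (e.g. for testing).
    """
    matches = []
    for name in hostnames:
        try:
            _canonical, _aliases, addresses = resolve(name)
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            continue     # name does not resolve: skip it
        if ip in addresses:
            matches.append(name)
    return matches
```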
Should we call this "reverse domain name hijacking"? or maybe "IP address hijacking"?
Heads up, everybody. Better pay more attention to those referral logs from now on...
Quite scary it is. A quick way to prevent IP address hijacking is comparing the SERVER_NAME variable with the actual domain name that is supposed to be associated with your web site's IP address. If the two strings do not match, you can issue a 404, 301, whatever. This can be done very easily with both ASP and PHP.
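The same idea outside ASP/PHP, sketched as Python WSGI middleware: if a request arrives under any host name other than yours, answer with a 301 to your canonical host instead of serving the page. One hedge: depending on server configuration (Apache's UseCanonicalName, for instance), SERVER_NAME may report the configured name rather than the name the visitor actually requested, so this sketch reads the Host header (HTTP_HOST in WSGI) first. The function name and `www.mysite.example` are placeholders.

```python
from urllib.parse import quote


def canonical_host_middleware(app, canonical_host):
    """Wrap a WSGI app: requests for any other host get a 301 to canonical_host."""
    canonical_host = canonical_host.lower()

    def wrapper(environ, start_response):
        # HTTP_HOST is the Host header the client sent; fall back to SERVER_NAME.
        host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
        host = host.split(":", 1)[0].lower()  # drop any :port suffix
        if host != canonical_host:
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
            query = environ.get("QUERY_STRING")
            location = "http://%s%s%s" % (canonical_host, path,
                                          "?" + query if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Type", "text/plain")])
            return [b"Moved Permanently"]
        return app(environ, start_response)

    return wrapper
```

A 301 has the side benefit of pointing the hijacker's URL back at your own domain; a 404 would work too if you simply want their URL to die.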
crumpeta,
How about contacting the hijackers' netblock owner directly? Check your sticky for details.
Thanks for looking into this, Giacomo, I appreciate it. What's more bothersome is that it seems anyone can do this to any term on any site, as long as their redirecting page has more "value" (according to the Google algo) than the destination page.
I understand that there are a lot of legit reasons for Google to spider and track redirects in their database.
But why in the world would they merge AND give preference to the redirecting page *from another domain*? What sense does that make?
I have no control over who redirects to me, just as I have no control over who links to me.
I am going to resubmit a detailed spam report with my nickname and the keywords "WebmasterWorld" and "GoogleGuy". Many thanks.
<added>Spam report submitted via online form at 09:03 UTC.</added>
Crumpeta,
I took a closer look at your case, and it appears that all URLs are now associated with the same IP address. It also appears that most of them (although I have not checked them all) are 302-redirecting to a referral tracker which, in turn, meta-refreshes to the destination web site. (More details via sticky).
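Tracing a chain like that by hand gets tedious: the 302 is visible in the HTTP headers, but the meta refresh only shows up in the tracker page's HTML. A small helper to pull the target out of a meta refresh tag could look like the Python sketch below (names are illustrative; pages that redirect via JavaScript or frames are ignored).

```python
import re
from html.parser import HTMLParser

# content="0; URL=http://example.com/" -> capture the part after "url="
_URL_IN_CONTENT = re.compile(r"url\s*=\s*['\"]?([^'\"\s]+)", re.IGNORECASE)


class MetaRefreshParser(HTMLParser):
    """Remember the URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        match = _URL_IN_CONTENT.search(attrs.get("content") or "")
        if match:
            self.target = match.group(1)


def meta_refresh_target(html):
    """Return the meta refresh destination URL, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    return parser.target
```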
Could it be that those links belong to some marketing/affiliate program that has somehow gotten out of hand? Just my $0.02.