
Google SEO News and Discussion Forum

    
A suggestion of a new meta tag to prevent page hijacking
Pirates



 
Msg#: 3103952 posted 2:54 am on Oct 1, 2006 (gmt 0)

<meta name="pages" content="pages on this site=(number of pages on site)" />

This would of course only be necessary for sites with a large amount of content, and it would require monitoring of content, especially on forums, so that each new post updated the page count held in a text file included in the meta tag. Search engines could then allow a small variation before alerting themselves to possible spam activity.
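Something like this rough sketch could keep the count up to date (assuming a flat-file site; the tag itself is only my suggestion, nothing any engine recognises today):

    # Rough sketch only: count the HTML files under a site root and write the
    # proposed (hypothetical) "pages" meta tag to a small include file.
    import pathlib

    SITE_ROOT = pathlib.Path("/var/www/example")   # assumption: flat-file site
    INCLUDE_FILE = SITE_ROOT / "pagecount.inc"     # fragment pulled into <head>

    def update_page_count_tag():
        # Count every .html page under the site root; a forum would instead
        # bump this figure whenever a post creates a new page.
        count = sum(1 for _ in SITE_ROOT.rglob("*.html"))
        tag = '<meta name="pages" content="pages on this site=%d" />' % count
        INCLUDE_FILE.write_text(tag + "\n")
        return count

    if __name__ == "__main__":
        print(update_page_count_tag(), "pages recorded")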

 

tedster

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3103952 posted 5:51 am on Oct 1, 2006 (gmt 0)

I can see how you're thinking -- but there is no technical definition for "pages".

kaled

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3103952 posted 11:01 am on Oct 1, 2006 (gmt 0)

If Google treated all redirects that cross domains simply as links, the issue of page-hijacking would be solved at little or no functional cost.

One of the problems with big corporations is that it can take months or years to fix problems - even when the solution is trivial. On the other hand, when a user reports a bug in any software that I've written, I usually fix it within 48 hours.

Kaled.

Pirates



 
Msg#: 3103952 posted 1:36 am on Oct 2, 2006 (gmt 0)

but there is no technical definition for "pages".

Perhaps there is an existing, recognised meta tag that could be reused for this purpose. That would be preferable, as I wouldn't want to create a new one; these idiots would see that as a victory in itself. Any suggestions?


If Google treated all redirects that cross domains simply as links, the issue of page-hijacking would be solved at little or no functional cost.

Although this would be quite hilarious to see, and I love the idea, I think that in the long term it would lead to more spam: loads of sites would be created, and as soon as they hit a result they would 301 or 302 to an SEO target page (to be honest, I am seeing this technique already).

I think there does need to be some mistrust of 301s and 302s. That said, if the page count I suggested in the meta tag doesn't match (within a tolerance) the count of pages the search engine holds, maybe that mismatch could be the trigger for mistrusting a 301 or 302.

Also, many affected sites may not use 301s or 302s at all; perhaps there could be a way of telling the search engine, in a meta tag or in robots.txt, to ignore any redirects pointing at the site.
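Purely as a sketch of the check I have in mind (the tag, the tolerance and the numbers are all made up):

    # Sketch of the engine-side check: compare the count a site declares in the
    # hypothetical "pages" meta tag with the number of URLs actually indexed.
    def page_count_suspicious(declared_count, indexed_count, tolerance=0.10):
        """Return True if the indexed count drifts outside the allowed variation."""
        if declared_count <= 0:
            return False  # no declaration, nothing to compare
        allowed = declared_count * tolerance
        return abs(indexed_count - declared_count) > allowed

    # Example: a site declaring 5,000 pages but showing 9,000 indexed URLs
    # would trip the check and could trigger extra mistrust of inbound 301/302s.
    print(page_count_suspicious(5000, 9000))   # True
    print(page_count_suspicious(5000, 5200))   # False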

AlgorithmGuy



 
Msg#: 3103952 posted 1:40 am on Oct 2, 2006 (gmt 0)

If Google treated all redirects that cross domains simply as links, the issue of page-hijacking would be solved at little or no functional cost.

Kaled,

How would your proposition protect against a proxy type redirect?

The redirect itself has no content, but the target page does, and the deep-crawl bot is sent out to fetch information for the generated link. No 404 or error page will confront the bot; it will simply be handed the contents of the target site.

Your suggestion at least offers a halt to the bot when it reaches the vulnerable contents. Can you explain what you have in mind?

The best way to prevent this hijack is to detect and halt googlebots that have arrived via a redirect.

Yes, a very simple solution could exist. All it takes is for Google to have its bots declare that they have arrived via a redirect, and to obey the target site's meta tags as to whether that is allowed or not. Bots arriving via a direct link would gain access as normal.
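Nothing like this exists today, but if bots did declare a redirect arrival, say with an imaginary request header, the target site could refuse them with a few lines of server-side code. A rough sketch, with every name invented:

    # Hypothetical sketch: no such header exists. If a crawler declared that it
    # arrived via a cross-domain redirect (imaginary "X-Arrived-Via-Redirect"
    # header), the target site could simply refuse to serve it.
    from wsgiref.simple_server import make_server

    ALLOW_REDIRECT_SOURCED_CRAWLS = False   # the site owner's choice

    def app(environ, start_response):
        arrived_via_redirect = environ.get("HTTP_X_ARRIVED_VIA_REDIRECT")
        if arrived_via_redirect and not ALLOW_REDIRECT_SOURCED_CRAWLS:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Crawling via third-party redirects is not permitted.\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Normal page content here.\n"]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()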
.

Pirates



 
Msg#: 3103952 posted 2:03 am on Oct 2, 2006 (gmt 0)

Just thinking aloud here. OK, everyone balls things up from time to time. Many webmasters know they've done it and are faced with a choice for the page left behind: 301, 302 or 404. But what if there were another choice? A type of redirect that said "follow this for the benefit of web users and the architecture of the internet, but ignore it for search engine results". Just an idea, though I prefer the first method I suggested.

CainIV

WebmasterWorld Senior Member, 10+ Year Member



 
Msg#: 3103952 posted 6:10 am on Oct 2, 2006 (gmt 0)

What do MSN and Yahoo use to apparently bypass this issue?

AlgorithmGuy



 
Msg#: 3103952 posted 9:00 am on Oct 2, 2006 (gmt 0)

A type of redirect that said "follow this for the benefit of web users and the architecture of the internet, but ignore it for search engine results". Just an idea, though I prefer the first method I suggested.

Pirates,

Unfortunately a lot of good suggestions have been ignored by google.

And believe it or not, this is a canonical issue.

Preventing proxy IPs from visiting your website is a very good idea, and one of the most important things a webmaster should think about. Allowing proxy browsers to fetch your URLs can bring your website to its knees in Google: into oblivion, probably never to recover its ranking.

There are many ways to create the conditions that Google's method of allocating content (canonicalization) is susceptible to. In other words, Google takes links that have no content of their own and apportions someone else's content to them. There is no evidence that Google has ever altered an existing site so that it contains the contents of another website; it is always to do with an empty URL, often a PHP, CGI or ASP server-side redirect, a proxy script, or an HTML equivalent.

For instance, your website has content on its index page, while a redirect has no index and no content of its own. Let us assume it is a temporary redirect. Google now has two options: assign the content to the redirecting URL, thereby treating the redirect as the better canonical representative of that content, or make no change and leave the content where it is, because it deems the site actually holding the content the better representative.
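A sketch of that choice, with made-up URLs and a stand-in for whatever signals Google really weighs:

    # Sketch of the decision described above: the engine holds one copy of the
    # content and must pick which URL gets to represent it. URLs are made up.
    content_holder = "http://your-site.example/"            # real page, real content
    empty_redirect = "http://other-site.example/out?x=1"    # 302 to your page, no content

    def pick_canonical(candidates, signals):
        # "signals" is only a stand-in for whatever Google actually weighs
        # (PageRank, age, etc.); the point is that the empty redirect is in
        # the running at all.
        return max(candidates, key=lambda url: signals.get(url, 0))

    # If the redirecting URL happens to score higher, your content is listed
    # under it and your own URL is treated as the duplicate.
    print(pick_canonical([content_holder, empty_redirect],
                         {content_holder: 5, empty_redirect: 7}))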

I don't believe any webmaster has intentionally achieved a hijacking of any website. This process is entirely due to google and its canonicalization process.

You can indeed create the conditions Google needs, but you cannot influence the result. You would have to know exactly what criteria and process Google applies in order to deliberately hijack a target website.

Targeting a mass of sites may result in a hijack or two, maybe one in a thousand, and nothing singled those sites out other than Google deeming them unworthy canonical representatives of their own contents.

Google says that one webmaster cannot harm another website's ranking. This is absurd.

You can indeed tank or elevate another website's rankings. You can bring down a competitor very easily: not overnight, but over a period of time.

Many, many websites have tanked in Google because a surfer visited them with a proxy browser or the like.

I've seen quite a few sites tank in this manner. If you think you have resolved your website's duplicate content, think again, because you have not. A visitor arriving via a proxy can create duplicate content of your website, and so can any inbound temporary redirect. In 99.99% of cases it is unintentional.

If Google deems the residue a proxy can leave behind, or the temporary redirect, to be a better canonical representative of the index or internal page that was visited, then duplication of your website under another URL is unavoidable.

Deep-crawl Googlebots do not carry a referer. I think they arrive directly from Google, instructed to fetch content for anything from a single link to many links, and this can only happen after harvesting bots have told Google about the residue left by proxies and temporary redirects found on pages across the internet.
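Anyone who wants to check the referer claim on their own server can pull the field out of an ordinary combined-format access log; the path and log format below are assumptions about your setup:

    # Sketch: list the referer field of every Googlebot request found in an
    # Apache "combined" format access log. Log path and format are assumptions.
    import re

    LOG_FILE = "/var/log/apache2/access.log"
    # combined format: ... "REQUEST" status bytes "REFERER" "USER-AGENT"
    line_re = re.compile(r'"(?P<request>[^"]*)" \d+ \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')

    with open(LOG_FILE) as fh:
        for line in fh:
            m = line_re.search(line)
            if m and "Googlebot" in m.group("agent"):
                print(m.group("referer") or "-", m.group("request"))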

And this is very difficult to recover from.

MSN created many duplications and has probably now found a way to improve matters. Yahoo came up with a working idea but not a solution: it looks as though it has sorted the problem out, but only in the sense that the duplication does not show, and we have no idea how the final rankings in Yahoo are determined.

It really is crazy when you think of millions of webmasters spending days, weeks and months resolving their websites, only to find that yet more sinister ways exist for their websites to be duplicated.

Imagine the horror if an unethical webmaster built a sitemap of your entire site and prefixed every URL with a proxy redirect. Every single page of your website would then be at the mercy of the proxy. If that webmaster then used automated software to submit all those pages to thousands of directories and link farms, your website tanking becomes very possible indeed. That webmaster might also be able to submit that deadly sitemap, with all your pages in it, to Google; I'm not sure whether Google would reject it on the grounds that the URLs look as though they belong to the proxy webmaster.

.

[edited by: AlgorithmGuy at 9:37 am (utc) on Oct. 2, 2006]

AlgorithmGuy



 
Msg#: 3103952 posted 10:10 am on Oct 2, 2006 (gmt 0)

I wonder if a browser expert is available to help us out here.

I'm not too familiar with browsers, but here goes.

Does a typical simple proxy browser actually make all the requests to the target server, then reproduce the contents on its own server for viewing by the agent that requested the target site? And in building that dynamic page for the proxy user, does it have to issue a redirect so that the end user sees the contents?

If that is the case, this will indeed involve a server-side temporary redirect.

This would also make the visited site vulnerable to duplication in Google, especially if the proxy URL is on a high-PageRank website. Since some of these proxies are a free service, they may have many one-way inbound links that give them high PageRank.

I'm not 100% sure of the process, but this sounds deadly even to a website whose webmaster has shed blood, sweat and tears to resolve it, because the site can be duplicated simply by presenting Google with the links used to access it via a proxy.

Google will see exactly what the end user sees, via a temporary redirect, making the target website look like a temporary holder of the contents. That in turn tells Google that the proxy redirect is the canonical owner of the target site's contents.

EXAMPLE
[exampleproxy.com...]

The above will certainly send a temporary-redirect header pointing at Google's index page. No matter how one looks at it, it is a deadly way of accessing information, given that Google can end up hijacking websites through this kind of proxy surfing.
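As far as I understand it, the simplest CGI-style proxies do something like the following; this is a toy sketch, not any particular proxy script:

    # Toy sketch of a CGI-style web proxy: it fetches the target page itself and
    # then either re-serves the content under its own URL (200 OK), or bounces
    # the visitor (and any bot) onward with a 302. Which of the two it does is
    # exactly the distinction that matters here.
    from urllib.request import urlopen
    from wsgiref.simple_server import make_server

    REDIRECT_STYLE = True   # True: answer with a 302; False: re-serve with 200 OK

    def proxy_app(environ, start_response):
        target = environ.get("QUERY_STRING") or "http://target-site.example/"
        if REDIRECT_STYLE:
            start_response("302 Found", [("Location", target)])
            return [b""]
        body = urlopen(target).read()          # proxy fetches the page itself
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body]

    if __name__ == "__main__":
        make_server("", 8080, proxy_app).serve_forever()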

To put it in layman's terms, an end user can unknowingly destroy hundreds of websites in his or her wake simply by using a proxy browser.

It could be your website. Even after you resolved it. This is not something a webmaster can afford to ignore.

That proxy procedure above can cause google to hijack a website and tank it into oblivion.

.

[edited by: AlgorithmGuy at 10:21 am (utc) on Oct. 2, 2006]

kaled

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3103952 posted 11:50 am on Oct 2, 2006 (gmt 0)

Guys, you're overthinking the problem...

As a programmer, I am stating as a fact that it's an easy fix.

Google, like any search engine, indexes URLs (or URIs if you prefer). The problem arises because Google engineers are confused about what to do when a URL points to another URL (a redirect). The solution is ultra-simple: treat it as a plain HTML page with a single link on it. Do this and page-hijacking becomes utterly impossible.

However, HTTP purists talk about temporary and permanent redirects and all sorts of other things, and generally get themselves (along with other people) confused. In a very few cases, the system I have proposed would result in pages being indexed under a URL different from the one the webmaster wants, so ideally a means should be found to allow for that, but it is not essential.
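A sketch of the indexing rule I mean, with a made-up stand-in for the crawler's datastore:

    # Sketch of the rule: when a fetched URL answers with a cross-domain
    # redirect, transfer nothing to it; record it as an ordinary page whose
    # only content is a single outbound link.
    import re
    from urllib.parse import urlparse

    class Index:
        """Stand-in for a crawler's datastore (made-up interface)."""
        def __init__(self):
            self.pages = {}
        def record_page(self, url, content, outlinks):
            self.pages[url] = {"content": content, "outlinks": outlinks}

    def extract_links(body):
        return re.findall(rb'href="([^"]+)"', body)   # rough, sketch only

    def handle_response(url, status, headers, body, index):
        if status in (301, 302, 303, 307, 308):
            target = headers.get("Location", "")
            if urlparse(target).hostname != urlparse(url).hostname:
                # Cross-domain redirect: just a page with one link on it, so no
                # content or ranking can ever be hijacked onto this URL.
                index.record_page(url, content=b"", outlinks=[target])
            # Same-domain redirects keep whatever canonical handling the
            # engine normally applies; that part is omitted from this sketch.
            return
        index.record_page(url, content=body, outlinks=extract_links(body))

    idx = Index()
    handle_response("http://evil.example/jump", 302,
                    {"Location": "http://your-site.example/"}, b"", idx)
    print(idx.pages)   # the hijack URL holds nothing but a link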

Proxies and browsers, etc. are not relevant to the solution.

Kaled.

AlgorithmGuy



 
Msg#: 3103952 posted 12:34 pm on Oct 2, 2006 (gmt 0)

As a programmer, I am stating as a fact that it's an easy fix.

Kaled,

I agree with you very much.

Yep, your suggestion is a great idea.

But we pushed exactly what you are saying two years ago. We pushed it so much that it got buckled.

The only solution is for webmasters to be aware of how this anomaly takes place. The more a webmaster knows, the better he can protect his website.

.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3103952 posted 1:48 pm on Oct 2, 2006 (gmt 0)

But that is what the 301 redirect does do already:

"Get the content from that URL over there... and index it using that URL over there too. Ignore this URL completely."

Supplemental Results for URLs that are redirects only show up in search results because the URL used to deliver that content directly but no longer does so.

.

The problem with proxies is that they serve the content at their own URL with a "200 OK" as if the content were at that location. Proxies do not issue a redirect. They GET the content from the other site, and serve it to the browser directly as if it were their own. A simple robots disallow on the proxy domain would keep those proxy URLs out of the SERPs.
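For example, on the proxy's own domain, something like this (the script path is just an example) would do it for engines that honour robots.txt:

    # robots.txt on the proxy's own domain; adjust the path to wherever the
    # proxy script actually lives. A blanket "Disallow: /" works too.
    User-agent: *
    Disallow: /nph-proxy.cgi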

AlgorithmGuy



 
Msg#: 3103952 posted 2:13 pm on Oct 2, 2006 (gmt 0)

The problem with proxies is that they serve the content at their own URL with a "200 OK" as if the content were at that location. Proxies do not issue a redirect. They GET the content from the other site, and serve it to the browser directly as if it were their own. A simple robots disallow on the proxy domain would keep those proxy URLs out of the SERPs.

gIsmd,

Point noted, but:

Some, many in fact, serve the page as you describe, but not with a 200 OK: they use a 302 temporary redirect.

The writer of the CGI script, for instance, has built the proxy's server-side software to issue a 302.

That is a redirect pointed at your site, if Google is presented with the link.

Although the contents are now being presented by the proxy, that content belongs to the site the proxy visited. In the example I put forward, your browser is given a 302 so that the content can be viewed, and that is exactly how Googlebot will see it: a 302.

In this case, googlebot may never actually visit the target site, but its contents can be accessed.

We need a browser expert on this.
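In the meantime, anyone can check which kind of proxy they are dealing with by requesting the proxy URL without following redirects; host and path below are made up:

    # Sketch: see whether a proxy URL answers 200 (re-serving the content) or
    # 302 (redirecting to the target). Substitute the real proxy URL being tested.
    import http.client

    conn = http.client.HTTPConnection("exampleproxy.example")
    conn.request("GET", "/nph-proxy.cgi?http://your-site.example/")
    resp = conn.getresponse()             # http.client never follows redirects
    print(resp.status)                    # 200 = proxy re-serves the content itself
    print(resp.getheader("Location"))     # on a 302 this shows where it points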

.

[edited by: AlgorithmGuy at 2:16 pm (utc) on Oct. 2, 2006]
