A new method to steal PR?

faulty link that works

     

Lorel

3:57 pm on Aug 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I found a site linking to a client's site I manage in this manner.

[ClientExample.com...]

It looks like it should be a faulty link that doesn't work, but it redirects to my client's site.

Here is another on the same page (without the slash), and they both work:

http://www.example.com?Hijacker.com

The server header checker shows this as a 200, normal link. All other outgoing links on the page appear to be normal.

Does the "?" mean this is run by software that runs a search?

jenkers

11:08 pm on Aug 7, 2006 (gmt 0)

10+ Year Member



I just found something very similar (posted a spam report to G immediately).

The page in question has a URL in the format

http://www.example.com/nnnnnnn/querykeyword1,querykeyword2.php

where nnnnnnn is a string of numbers. Note that the keywords are separated in the URL by a comma.

When you open the page, there is no content on it apart from a context-focused Overture ad.

If you look at the page source, large snippets of text from the websites ranking high in the SERPs for that query are jumbled together and hidden in an iframe.

Today this site jumped in at position 1 for a query I watch.

Nikke

11:24 pm on Aug 7, 2006 (gmt 0)

10+ Year Member



Lorel,
Why do you think it is any kind of hijacking? I have used this method in the past when placing true links to partner sites that want a real link and an easy way to count incoming visits.

Since just about any server accepts an empty variable after a question mark, the link will work and return a 200. Nothing strange there.
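
To illustrate what "an empty variable after a question mark" looks like once the URL is split into its parts, here is a minimal Python sketch using the standard library's urllib.parse. The URLs are the placeholder ones from this thread, and the sketch only shows the parsing; it assumes nothing about any particular server.

```python
from urllib.parse import urlsplit, parse_qs

# Placeholder URLs from the thread, with and without the slash before "?".
with_slash = urlsplit("http://www.example.com/?Hijacker.com")
without_slash = urlsplit("http://www.example.com?Hijacker.com")

# Both point at the same host; everything after "?" is the query string.
print(with_slash.netloc, with_slash.query)
print(without_slash.netloc, without_slash.query)

# The query is a single name with no "=value" part, i.e. an empty variable.
print(parse_qs(with_slash.query, keep_blank_values=True))
```

A static page has nothing that reads that variable, so the request is answered with the normal page.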

However, it's next to useless as far as PR goes, and not used as much these days. It can still be practical for counting purposes where a site owner doesn't have full access to referral logs. Ask your client if they have had any previous collaboration with the site linking to them.

Lorel

1:59 am on Aug 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I consider any aberration of a normal link suspect of wrongdoing until proven innocent. I'm not a programmer, so I'm not aware of what one can do to a link. But thanks for the info about this not passing PR. I suspected as much.

KenB

2:42 am on Aug 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I keep a separate 404 log and see all kinds of goofy stuff at the end of URLs on a regular basis. Oftentimes it is caused by bad link generation (e.g. .html> instead of .html">). I see this so much that I actually have a whole series of .htaccess instructions to get people on their way to the correct spot. Here are some of the rewrite rules I use:

RewriteRule ^([a-zA-Z0-9].*)\.html\.(.*) /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.html>(.*) /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.html-(.*) /$1.html [R=301,L]
RewriteRule ^([0-9a-zA-Z].*)\.html([0-9a-zA-Z].*) /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.html(.)Enviro(.*) /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.html(.)E /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.html(.)20E /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.html(.)20 /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.html(.)$ /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)/(.)target= /$1/ [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.html(.)target= /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)/default\.htm$ /$1/ [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.htm$ /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.ht$ /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.h$ /$1.html [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)\.$ /$1 [R=301,L]
RewriteRule ^([a-zA-Z0-9].*)&gt$ /$1 [R=301,L]
RewriteRule ^&(.*) / [R=301,L]
RewriteRule ^([0-9a-zA-Z].*)/&(.*) /$1/ [R=301,L]
RewriteRule ^([0-9a-zA-Z].*)\.html&(.*) /$1.html [R=301,L]
RewriteRule ^([0-9a-zA-Z].*)\.htm&(.*) /$1.html [R=301,L]

Since all of my URLs start with a number or a letter, I use "[0-9a-zA-Z]" at the beginning of my regular expression to prevent any funny stuff. I haven't yet come up with a good instruction to strip off query strings, although I'd really like to.

jdMorgan

3:11 am on Aug 8, 2006 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



KenB,

> I haven't yet come up with a good instruction to strip off query strings...

If you add a "?" to the end of your substitution URL, any query string on the request will be cleared.

Demonstrating that, along with a generic rule you might use to replace the first ten of your ".html<plus-more>" rules:


RewriteRule ^([a-z0-9].*)\.html.+$ http://www.example.com/$1.html? [NC,R=301,L]

Note that [NC] makes the compare case-insensitive, and is more efficient than using [A-Za-z]. Also note that on many servers, the substitution will be required to be a canonical URL as shown here and below.
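
To see which requests the generic ".html plus more" pattern would catch, here is a rough Python sketch of the same regular expression. It is only an approximation for experimenting: re.IGNORECASE stands in for [NC], the function name redirect_target is made up for this illustration, and the sample paths are invented.

```python
import re

# Python rendering of the pattern; re.IGNORECASE plays the role of [NC].
# In .htaccess the pattern is matched against the path without its leading slash.
HTML_PLUS_JUNK = re.compile(r"^([a-z0-9].*)\.html.+$", re.IGNORECASE)

def redirect_target(path):
    """Return the cleaned path to 301 to, or None if the rule does not apply."""
    match = HTML_PLUS_JUNK.match(path)
    if match is None:
        return None
    return "/" + match.group(1) + ".html"

# Invented sample paths: three with trailing junk, one already clean.
for sample in ["widgets.html>", "docs/page.html.", "Page.HTML-x", "widgets.html"]:
    print(sample, "->", redirect_target(sample))
```

Because the pattern ends in ".+", a request for a clean .html URL does not match, so the rule cannot redirect a page to itself.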

Lorel,

The "?" and characters following the URL are a query string, and won't do anything unless your site is dynamic, and the page-generation script that you use accepts that query string and processes it in some way to affect the page that it produces.

Otherwise, the only negative effect is that it produces a second URL by which your page can be accessed, thus creating a minor duplicate-content annoyance.

If you're on Apache, and your site is entirely static, you can remove the query string using this general-case RewriteRule:


RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
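
For anyone who wants to check the intended effect outside Apache, the same idea can be sketched in Python with the standard library. This is an illustration of the logic only, not a replacement for the rule: strip_query is a hypothetical helper name, and like the rule it assumes a fully static site where no query string is ever legitimate.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Return the URL to 301 to if `url` has a query string, else None."""
    parts = urlsplit(url)
    if not parts.query:
        return None  # nothing to strip, serve the page normally
    # Rebuild the URL with an empty query (and no fragment).
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))

print(strip_query("http://www.example.com/page.html?Hijacker.com"))
print(strip_query("http://www.example.com?Hijacker.com"))
print(strip_query("http://www.example.com/page.html"))
```

Returning None for a URL without a query string mirrors the RewriteCond, which lets the redirect fire only when the query string contains at least one character.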

Jim

jomaxx

3:21 am on Aug 8, 2006 (gmt 0)

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It's not the best way to link, but it's valid and I think it could pass PR. It's a matter of Google recognizing that the two forms of the URL are functionally identical, which their algorithms can figure out even if it doesn't happen immediately.

KenB

4:32 am on Aug 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jdMorgan,

Plus one on your cleaner .htaccess fix.

Thanks, it worked like a charm.

I suspect I'll have to nuke the bogus query strings via a 301 redirect using PHP.

KenB

5:14 am on Aug 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's some quick PHP code I threw together that strips the query string off of a request. I exempted the contact form, as I commonly feed it query strings for things like pre-filled subjects. The code should be one of the first things PHP processes for a page request and must come before anything is output to the browser.

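// Rebuild the requested URL so parse_url() can split the path from the query.
$strURL = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
$arrayURL = parse_url($strURL);
// Compare the path only, so the contact form stays exempt even with a query string.
if (is_array($arrayURL) && isset($arrayURL['path']) && $arrayURL['path'] != "/email.html") {
    if (isset($arrayURL['query']) && $arrayURL['query'] !== "") {
        header("HTTP/1.1 301 Moved Permanently");
        // The Location header needs the scheme to be an absolute URL.
        header("Location: http://" . $_SERVER['HTTP_HOST'] . $arrayURL['path']);
        exit();
    }
}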
 
