
Google SEO News and Discussion Forum

    
Pages added to index that shouldn't be appearing?
mihomes
10+ Year Member
Msg#: 4620512 posted 7:25 am on Nov 1, 2013 (gmt 0)

I was just doing a site: search and noticed that a redirect script I use, which should be blocked, is showing up in Google's results. In the results, though, it is accompanied by 'A description for this result is not available because of this site's robots.txt - learn more.' All variations of the script are showing in the index this way.

The format of the script is link.php?p=word

link.php is disallowed in robots.txt
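
(The robots.txt entry is basically just this, assuming the script sits at the site root:)

User-agent: *
Disallow: /link.php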

I am also sending, or thought I was sending, the correct headers in the script so this would not happen:


<?php

// Read the link key from the query string (empty if not supplied)
$p = isset($_GET['p']) ? $_GET['p'] : '';

// Map of link keys to destination URLs
$link = array(
    'linkone' => 'http://www.example.com/example.htm',
    'linktwo' => 'http://www.example.com/example2.htm',
);

/* Send headers */
header('Content-Type: text/html; charset=utf-8');
header('X-Robots-Tag: noindex, nofollow, noarchive', true);

if (isset($link[$p]))
{
    header('Location: ' . $link[$p]); // Valid URL
}
else
{
    header('Location: /link/'); // Invalid URL
}

exit();
?>


I must be overlooking something... why are the different link.php?p='s showing in the Google index?

 

JD_Toims
WebmasterWorld Senior Member Top Contributors Of The Month
Msg#: 4620512 posted 8:20 am on Nov 1, 2013 (gmt 0)

They're showing because there are links to them. It's an interesting situation: usually a redirect script that's blocked in robots.txt doesn't show, but they've decided to include it, and the block in robots.txt keeps them from seeing the headers. Since they never actually access the URL, the headers you're serving really don't do any good, even though they appear to be coded correctly.
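
(For what it's worth, if you want to sanity-check that the headers really are going out for clients that are allowed to fetch the URL, something along these lines works -- the URL is a placeholder, and this assumes the curl extension is available:)

<?php
// Placeholder URL -- swap in the real host and a real 'p' value.
$ch = curl_init('http://www.example.com/link.php?p=linkone');
curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD-style request, no body
curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the result instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stay on the first response so we see its headers
echo curl_exec($ch);                             // should show the X-Robots-Tag and Location headers
curl_close($ch);
?>

Googlebot never gets that far, though, because the robots.txt block stops the request from ever being made.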

I'd probably remove the robots.txt block, go with an .htaccess "fix" again for this one, and just serve them a forbidden error. I think I'd use it as much because they're getting annoying as because it works.

RewriteEngine On
# Serve a 403 Forbidden to Googlebot (case-insensitive UA match) for link.php
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
RewriteRule ^link\.php$ - [F]

mihomes
10+ Year Member
Msg#: 4620512 posted 8:45 am on Nov 1, 2013 (gmt 0)

Well, that sucks... no wonder I wasn't seeing anything wrong with it... thanks for confirming.

As for using .htaccess, I would think serving a forbidden error would hurt me, since I use the links quite a bit within the site.

Then again, it's possible they are hurting me right now, as is, because they are in the omitted results list.

JD_Toims
WebmasterWorld Senior Member Top Contributors Of The Month
Msg#: 4620512 posted 9:08 am on Nov 1, 2013 (gmt 0)

Oh, if you're running internal links through the redirects, then I'd remove the robots.txt block and do it with PHP rather than .htaccess.

<?php

// Read the link key from the query string (empty if not supplied)
$p = isset($_GET['p']) ? $_GET['p'] : '';

$link = array(
    'linkone' => 'http://www.example.com/example.htm',
    'linktwo' => 'http://www.example.com/example2.htm',
);

// If it's a known (external) link
if (isset($link[$p]))
{
    // If it's Googlebot or Bingbot serve a 403
    if (stripos($_SERVER['HTTP_USER_AGENT'], 'googlebot') !== FALSE ||
        stripos($_SERVER['HTTP_USER_AGENT'], 'bingbot') !== FALSE)
    { header('HTTP/1.1 403 Forbidden', true); exit; }

    // Else send everyone along with a 301
    else
    { header('Location: ' . $link[$p], true, 301); } // Valid URL
}

// Else it's invalid (or internal?)
// Not sure what you want to do with those,
// but I'm sure you get the point
else
{
    header('Location: /link/'); // Invalid URL
}

exit();
?>

Added:
If the following redirects internally, you'll likely want to set a status code explicitly, because without it you're serving a 302 rather than a 301.

header('Location: /link/'); // Invalid URL

Like this:
header('Location: /link/',true,301); // Invalid URL

mihomes
10+ Year Member
Msg#: 4620512 posted 10:05 am on Nov 1, 2013 (gmt 0)

For the most part, all links in the array are external, pointing to sales links with my online processor. There are a few cases where I have internal links which are actually on the site. My original purpose was to easily change links site-wide from one file, say if I wanted to run a promo or drastically changed a product which required a new link.

The invalid URL link is an internal page that I am setting as a default: if someone enters a 'p' value that doesn't exist in the array, it defaults to this page.

I'm following what you wrote, but I am so tired I'll have to look at it again. The last thing I want to do is hurt myself... serving forbidden on purchase links to Google and Bing doesn't sound right at the moment.

JD_Toims
WebmasterWorld Senior Member Top Contributors Of The Month
Msg#: 4620512 posted 11:01 am on Nov 1, 2013 (gmt 0)

You should probably nofollow the links too, but getting the URLs they want to index out of the index means you either have to let them follow the redirect and see where it goes, or serve them an error so they remove the URL from the index.

You're not going to be able to get the URLs out with a robots.txt block [obviously], and you're not going to be able to noindex the PHP URL, since they'll just follow the redirect if you serve the PHP file to them without an error. About the only option I can see you having is to serve them an error, and I can't think of a status other than 403 that's correct.
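
(If it helps, a rough sketch of what the nofollow might look like where the links are output -- redirect_link() is just a made-up helper name here, not something you already have:)

<?php
// Hypothetical helper: prints an internal redirect link with rel="nofollow"
// so your own pages don't keep queueing the link.php URLs for crawling.
function redirect_link($key, $text)
{
    echo '<a href="/link.php?p=' . rawurlencode($key) . '" rel="nofollow">'
       . htmlspecialchars($text) . '</a>';
}

redirect_link('linkone', 'Buy now'); // example usage
?>

Visitors still go through the redirect exactly as before; the nofollow just tells the engines not to bother with those URLs in the first place.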
