Noindex, Nofollow on PHP script

     

vphoner

12:45 am on Jun 30, 2011 (gmt 0)

10+ Year Member



I have a PHP script that receives a product number and then redirects to an external site with that product number. The script works fine. The one problem is that search engines (Google in particular) are indexing all these calls to the script as separate pages, which can generate hundreds of pages. I would say these could count as "thin" content pages. My goal is to prevent these pages from being indexed, but you cannot insert the standard <meta name="robots" content="noindex,nofollow"> in the script. (I tried, and the script hung and never went to the external link.)

The question is: other than blocking a directory in robots.txt, is there a way to get Google to stop indexing these pages? And is it a good idea not to have these indexed, given the recent changes in Google's algorithms?

Is there another way to call the external link other than the way I have coded it, and also block indexing of the link?

Here is an example of the php script:

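<?php
session_start();

// The product number is passed to the script in the query string.
$PRODUCTNUM1234 = isset($_GET['PRODUCTNUM1234']) ? $_GET['PRODUCTNUM1234'] : '';

// Refuse anything that is not a plain number (assumes numeric product numbers).
if (!ctype_digit($PRODUCTNUM1234)) {
    header('HTTP/1.1 404 Not Found');
    exit();
}

// Go to the external link with this product number.
// Send no output before header(), or PHP may report "headers already sent" and skip the redirect.
header("Location: http://www.external-site.com/product/" . $PRODUCTNUM1234 . "/myid");
exit();
?>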

penders

9:41 am on Jun 30, 2011 (gmt 0)

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Is it the URL of your actual script that you want to prevent Google from indexing? Or www.external-site.com/product/... ?

By default the header("Location:...") redirect returns a 302 (Found) status code, so the search engine is still likely to index the URL. I would have said that you should send a 301 (Moved Permanently) instead.

header("Location: http://www.external-site.com/product/" . $PRODUCTNUM1234 . "/myid", true, 301);


However, this will then result in www.external-site.com/product/... being indexed instead. But it is then up to external-site.com to block that (with robots.txt or a robots META tag).

vphoner

2:48 am on Jul 1, 2011 (gmt 0)

10+ Year Member



Is it the URL of your actual script that you want to prevent Google from indexing? Or www.external-site.com/product/... ?


I would want to prevent the calls to this PHP program on my site from being indexed, if having 1000 instances of this script indexed would count as thin pages and hurt my site.

I want to prevent indexing of pages generated by a PHP script like this:
[mysite.com...]

penders

9:23 pm on Jul 1, 2011 (gmt 0)

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



So you have anchors like this...?
<a href="http://www.mysite.com/getproduct.php?PRODUCTNUM=746463">something</a>


I would have thought returning a 301 status code in your script (as indicated) should prevent these URLs being indexed. However, I think external-site.com will then get indexed. (?)

You could also place a rel="nofollow" in these anchors to prevent PageRank from being passed. Is rel="noindex" supported by search engines on anchors?
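For illustration, a minimal PHP sketch of emitting such an anchor with rel="nofollow" from a page template. It assumes the getproduct.php script and PRODUCTNUM parameter from the example anchor above; the helper name product_link() is hypothetical. rel="nofollow" asks search engines not to pass PageRank through the link; on its own it does not guarantee that the target URL stays out of the index.

```php
<?php
// Hypothetical helper: builds a link to getproduct.php with rel="nofollow",
// escaping both the URL and the link text for HTML output.
function product_link(string $productNum, string $text): string
{
    // http_build_query() URL-encodes the parameter value.
    $url = 'http://www.mysite.com/getproduct.php?'
        . http_build_query(['PRODUCTNUM' => $productNum]);

    return sprintf(
        '<a href="%s" rel="nofollow">%s</a>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
    );
}

echo product_link('746463', 'something'), "\n";
// prints <a href="http://www.mysite.com/getproduct.php?PRODUCTNUM=746463" rel="nofollow">something</a>
```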

vphoner

3:46 pm on Jul 2, 2011 (gmt 0)

10+ Year Member



So you have anchors like this...?
<a href="http://www.mysite.com/getproduct.php?PRODUCTNUM=746463">something</a>


Yes, the anchors look like that. The goal is to get Google to stop indexing the hundreds of dynamic pages generated by getproduct.php.

I guess the question is whether these count as "thin" pages to Google, or whether they are ignored for ranking/PageRank, etc.

On normal pages you can put <meta name="robots" content="noindex,nofollow"> in the head. But PHP scripts like this one, which redirect to an external site, do not allow that.

You can use robots.txt to block a folder that contains getproduct.php, but I am not sure whether search engines frown on that, or penalize you for blocking what would be hundreds of links from your site to this PHP script.
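One option not raised in this thread: the robots directives can also be sent as an X-Robots-Tag HTTP header, which needs no HTML head section and so fits a script that only redirects. This sketch assumes the external URL layout from the script in the first post; redirect_headers() and send_redirect() are hypothetical helper names, and whether a given search engine honors X-Robots-Tag on a redirect response (as opposed to a normal page) is not something the code can guarantee.

```php
<?php
// Hypothetical helper: the headers a redirect-only script could send.
// X-Robots-Tag carries the same directives as the robots meta tag,
// but in the HTTP response itself, so no HTML is needed.
function redirect_headers(string $productNum): array
{
    return [
        'X-Robots-Tag: noindex, nofollow',
        // Same external URL layout as the original script (assumed).
        'Location: http://www.external-site.com/product/'
            . rawurlencode($productNum) . '/myid',
    ];
}

// Hypothetical helper: sends the headers and stops the script.
// Call it before any output; echoing HTML first (such as a meta tag)
// can make header() fail with a "headers already sent" warning.
function send_redirect(string $productNum): void
{
    foreach (redirect_headers($productNum) as $line) {
        header($line); // a Location header defaults to a 302 status
    }
    exit();
}
```

In the script from the first post, the final header() and exit() lines would become a single send_redirect($PRODUCTNUM1234); call.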

vphoner

10:17 pm on Jul 5, 2011 (gmt 0)

10+ Year Member



After reading many articles on the subject, I saw that someone had put this in their robots.txt:

User-agent: *
disallow: /*?*


What will this block? It looks like it would block any URL with parameters, i.e. anything containing a ?.

Would a PHP URL such as get_info.php?ITEM=12345 be blocked from indexing by the above robots.txt directive?
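A rough way to check what a pattern like /*?* covers is to mimic Google's documented wildcard matching: a rule is matched from the start of the URL path (including the query string), * matches any run of characters, and a trailing $ anchors the end. This PHP sketch is an approximation under those assumptions, not Google's actual parser, and robots_rule_matches() is a hypothetical name; crawlers that follow only the original robots.txt standard may not support * at all and do plain prefix matching.

```php
<?php
// Approximation of Google-style robots.txt path matching (an assumption,
// not Google's parser): rules match from the start of the path, including
// the query string; "*" matches any run of characters; a trailing "$"
// anchors the end of the URL.
function robots_rule_matches(string $rule, string $path): bool
{
    $anchored = substr($rule, -1) === '$';
    if ($anchored) {
        $rule = substr($rule, 0, -1);
    }

    // Escape regex metacharacters in each literal piece, then join the
    // pieces with ".*" wherever the rule had a "*".
    $pieces = array_map(
        function (string $piece): string {
            return preg_quote($piece, '#');
        },
        explode('*', $rule)
    );
    $regex = '#^' . implode('.*', $pieces) . ($anchored ? '$' : '') . '#';

    return preg_match($regex, $path) === 1;
}

$paths = [
    '/get_info.php?ITEM=12345',
    '/get_info.php',
    '/getproduct.php?PRODUCTNUM=746463',
];
foreach ($paths as $path) {
    $status = robots_rule_matches('/*?*', $path) ? 'blocked' : 'allowed';
    echo $path, ' => ', $status, "\n";
}
```

With these rules, /*?* matches /get_info.php?ITEM=12345 but not /get_info.php, and a plain rule such as /getproduct.php already matches every query-string variant of that script as a prefix, so blocking a whole folder is not required. Keep in mind that robots.txt stops crawling rather than indexing: a blocked URL can still be listed (without its content) if other pages link to it.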
 
