
PHP Server Side Scripting Forum

    
Noindex, Nofollow on PHP script
vphoner




msg:4332825
 12:45 am on Jun 30, 2011 (gmt 0)

I have a PHP script that takes a product number and then redirects to an external site using that product number. The script works fine. The one problem is that search engines (Google in particular) are indexing every call to the script as a separate page, which can generate hundreds of pages. I would say these could count as "thin" content pages. My goal is to prevent these pages from being indexed, but you cannot insert the standard <meta name="robots" content="noindex,nofollow"> in the script. (I tried, and the script hung and never went to the external link.)

The question is: other than blocking a directory in robots.txt, is there a way to get Google to stop indexing these pages? Is it a good idea to keep these out of the index, given the recent changes in Google's algorithms?

Is there another way to call the external link other than the way I have coded it, and also block indexing of the link?

Here is an example of the PHP script:

<?php
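session_start();

// The product number is passed in the query string.
$PRODUCTNUM1234 = isset($_GET['PRODUCTNUM1234']) ? $_GET['PRODUCTNUM1234'] : '';

// Redirect to the external link for this product number.
// rawurlencode() keeps unexpected characters out of the Location header.
header("Location: http://www.external-site.com/product/" . rawurlencode($PRODUCTNUM1234) . "/myid");

exit();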
?>

 

penders




msg:4332961
 9:41 am on Jun 30, 2011 (gmt 0)

Is it the URL of your actual script that you want to prevent Google from indexing? Or www.external-site.com/product/... ?

By default the header("Location:..."); redirect returns a 302 (Found) status code, so the search engine is still likely to index the URL. I would have said that you should send a 301 (Moved Permanently) instead.

header("Location: http://www.external-site.com/product/". $PRODUCTNUM1234 ."/myid",true,301);


However, this will then result in www.external-site.com/product/... being indexed. But then that is down to external-site.com to block (robots.txt or robots META tag).
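
To make the 301 suggestion concrete, here is a minimal sketch rather than a definitive version. The function names buildProductUrl() and redirectToProduct() are hypothetical, and the digits-only check is an assumption based on the product numbers shown in this thread.

```php
<?php
// Hypothetical helper: builds the redirect target for a product number.
// Assumes product numbers are digits only; returns null for anything else.
function buildProductUrl(string $productNum): ?string
{
    if (!ctype_digit($productNum)) {
        return null;
    }
    return 'http://www.external-site.com/product/' . $productNum . '/myid';
}

// Hypothetical helper: sends a permanent redirect, or a 404 for bad input.
function redirectToProduct(?string $url): void
{
    if ($url === null) {
        http_response_code(404);
        return;
    }
    // The third argument sets the status code to 301 instead of the default 302.
    header('Location: ' . $url, true, 301);
}

// Typical use inside the script (no output may be sent before this point):
// redirectToProduct(buildProductUrl($_GET['PRODUCTNUM1234'] ?? ''));
// exit();
```

Rejecting non-numeric input also stops the script from redirecting for arbitrary query strings, which could otherwise create even more crawlable URL variations.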

vphoner




msg:4333407
 2:48 am on Jul 1, 2011 (gmt 0)

Is it the URL of your actual script that you want to prevent Google from indexing? Or www.external-site.com/product/... ?


I want to prevent the calls to this PHP program on my site from being indexed, in case the indexing of 1000 instances of this script would count as thin pages and hurt my site.

The goal is to prevent indexing of pages generated by a PHP script like this:
[mysite.com...]

penders




msg:4333861
 9:23 pm on Jul 1, 2011 (gmt 0)

So you have anchors like this...?
<a href="http://www.mysite.com/getproduct.php?PRODUCTNUM=746463">something</a>

I would have thought returning a 301 status code in your script (as indicated) should prevent these URLs being indexed. However, I think external-site.com will then get indexed. (?)

You could also place a rel="nofollow" in these anchors to prevent pagerank from being passed. Is rel="noindex" supported by search engines on anchors?

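
If the anchors are generated in PHP, the rel="nofollow" idea could look like the sketch below. The helper name productLink() is hypothetical, and the URL pattern is taken from the example anchor in this post. As far as I know, noindex is not a recognized rel value on anchors, so only nofollow is used here.

```php
<?php
// Hypothetical helper: renders a product anchor with rel="nofollow".
function productLink(string $productNum, string $label): string
{
    $href = 'http://www.mysite.com/getproduct.php?PRODUCTNUM=' . rawurlencode($productNum);

    // Escape both the URL and the label before placing them in HTML.
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '" rel="nofollow">'
        . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

echo productLink('746463', 'something');
```
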
vphoner




msg:4334150
 3:46 pm on Jul 2, 2011 (gmt 0)

So you have anchors like this...?
<a href="http://www.mysite.com/getproduct.php?PRODUCTNUM=746463">something</a>


Yes, the anchors look like that. The goal is to get Google to stop indexing the hundreds of dynamic pages generated by getproduct.php.

I guess the question is whether these count as "thin" pages to Google or are ignored for ranking, pagerank, etc.

On normal pages you can put <meta name="robots" content="noindex,nofollow"> in the head. But PHP scripts like this one, which only redirect to an external site, do not allow that.

You can use robots.txt to block a folder that contains getproduct.php, but I am not sure if search engines frown on that, or penalize you for blocking what would be hundreds of links from your site to this PHP script.
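
A redirect response has no HTML head to hold a robots meta tag, and any output sent before header() stops the Location header from being sent, which would explain the script hanging. The HTTP-level equivalent is the X-Robots-Tag response header, which Google documents as supported. The sketch below is only an illustration: the function names are hypothetical, and whether a given crawler honors the header on a redirect response is an assumption worth verifying.

```php
<?php
// Hypothetical helper: returns the header lines for a "noindex" redirect.
function noindexRedirectHeaders(string $targetUrl): array
{
    return [
        'X-Robots-Tag: noindex, nofollow',
        'Location: ' . $targetUrl,
    ];
}

// Hypothetical helper: sends those headers with the given redirect status.
function sendNoindexRedirect(string $targetUrl, int $status = 301): void
{
    foreach (noindexRedirectHeaders($targetUrl) as $line) {
        // Passing the status with each header leaves the response at $status.
        header($line, true, $status);
    }
}

// Typical use inside getproduct.php (no output may be sent before this point):
// sendNoindexRedirect('http://www.external-site.com/product/746463/myid');
// exit();
```

Note that a crawler has to be allowed to fetch the URL in order to see this header, so combining it with a robots.txt block on the same URL would hide the header from the crawler.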

vphoner




msg:4335431
 10:17 pm on Jul 5, 2011 (gmt 0)

After reading many articles on the subject, I saw someone had put this in their robots.txt:

User-agent: *
disallow: /*?*


What will this block? It looks like it would block any URL that contains a ?, in other words any URL with query parameters.

Would a PHP file such as get_info.php?ITEM=12345 be blocked from indexing with the above robots.txt directive?
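
One way to reason about the /*?* rule is to translate it into a regular expression and try it against sample paths. This is a rough sketch of the wildcard matching that Google describes for robots.txt ("*" matches any run of characters, a trailing "$" anchors the end of the URL). The function name robotsPatternMatches() is hypothetical, and other crawlers may interpret wildcards differently.

```php
<?php
// Hypothetical helper: checks a URL path against one robots.txt rule using
// the wildcard extension ("*" = any characters, trailing "$" = end of URL).
function robotsPatternMatches(string $pattern, string $path): bool
{
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }

    // Quote regex metacharacters, then turn each escaped "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));

    // Rules match from the start of the path; "\z" is added only for "$" rules.
    return preg_match('#^' . $regex . ($anchored ? '\z' : '') . '#', $path) === 1;
}

var_dump(robotsPatternMatches('/*?*', '/get_info.php?ITEM=12345')); // bool(true)
var_dump(robotsPatternMatches('/*?*', '/get_info.php'));            // bool(false)
```

Under this reading, the rule matches any URL containing a "?", so /get_info.php?ITEM=12345 would be blocked from crawling. Keep in mind that Disallow controls crawling rather than indexing, so a blocked URL that is linked from elsewhere can still show up as a URL-only entry.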
