Wildcards on robots.txt disallows

     
9:36 am on Nov 13, 2007 (gmt 0)

5+ Year Member



I run an osCommerce store, and I am trying to stop Google from indexing the product review pages. The reason is that some products' review pages rank on Google above the actual product page, so a potential customer lands on an empty reviews page (yeah, there are no reviews on the products, but the reviews page still shows up above the product page, doh).

So I just wanted to disallow Google from indexing my review pages, but I am using SEO URLs, so I don't really know how to do it. My real reviews URL would be:

http://www.example.com/product_reviews.php?products_id=72

but with SEO URL, it is:

http://www.example.com/vga-splitter-duplicator-puertos-pc-monitores-pr-72.html

so I do not know how to add that to my robots.txt. What all the review URLs have in common is the *-pr-number.html ending; their RewriteRule is the following:

RewriteRule ^(.*)-pr-([0-9]+)\.html$ product_reviews.php?products_id=$2&%{QUERY_STRING}
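
For readers less used to mod_rewrite, here is a rough Python sketch of the mapping that rule performs. It is only an illustration, not how Apache runs it: the function name `rewrite` is made up, the pattern escapes the dot before "html" so it matches a literal dot only, and it assumes the rule lives in .htaccess, where the leading slash is stripped before matching.

```python
import re

# Pattern from the RewriteRule, with the dot before "html" escaped.
REVIEW_URL = re.compile(r"^(.*)-pr-([0-9]+)\.html$")

def rewrite(path, query_string=""):
    """Return the internal target for a friendly review URL, or None if the rule does not apply."""
    # Assumption: per-directory (.htaccess) context, so drop the leading slash first.
    match = REVIEW_URL.match(path.lstrip("/"))
    if match is None:
        return None
    # $2 is the numeric product id; the original query string is appended after "&".
    return "product_reviews.php?products_id=" + match.group(2) + "&" + query_string

print(rewrite("/vga-splitter-duplicator-puertos-pc-monitores-pr-72.html"))
```

The point for robots.txt is that spiders only ever see the friendly URL on the left side of the rule, never the product_reviews.php target.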

So I was wondering what would be the right way of disallowing all these review files, since I do not think that placing a "Disallow: /product_reviews.php" will do the trick.

Disallow: /*-pr-*

Would this work OK? I am not sure I am using the right syntax, and I do know that this will also disallow any other URLs that have -pr- in them (but I can live with that, I doubt I am going to use the word "pr" a lot around here, hehe).

Many thanks! :)

[edited by: jatar_k at 6:34 pm (utc) on Nov. 13, 2007]
[edit reason] please use example.com [/edit]

10:35 am on Nov 13, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], mindtwist!

you should examplify your urls in this forum (use example.com)

the correct way to exclude this would be "Disallow: /product_reviews.php" since wildcarding and globbing aren't officially supported for robots.txt.

11:25 am on Nov 13, 2007 (gmt 0)

5+ Year Member



Thx phranque :)

The problem is that the URLs I want to exclude are not /product_reviews.php?products_id=72 , but /vga-splitter-duplicator-puertos-pc-monitores-pr-72.html

I have a contribution installed on my store that changes the URLs to the second form, to make them friendlier for search engines. So I guess that if I add "Disallow: /product_reviews.php" to my robots.txt, URLs like example.com/vga-splitter-duplicator-puertos-pc-monitores-pr-72.html will still be spidered, which is what I want to avoid.

Thank you!
Aitor

12:43 pm on Nov 13, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It's probably too late, but could you use either of the following formats for your reviews' friendly URLs?

/pr-vga-splitter-duplicator-puertos-pc-monitores-72.html
/pr-72-vga-splitter-duplicator-puertos-pc-monitores.html

URL systems should be designed with the limitations of robots.txt URL prefix-matching in mind.
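
A minimal Python sketch of that prefix-matching behavior, to show why putting "pr-" at the front of the path makes the difference. The helper name `is_disallowed` and the `/pr-` rule are hypothetical, and the sketch ignores user-agent groups and Allow lines.

```python
def is_disallowed(path, disallow_prefixes):
    """Original robots.txt matching: a rule blocks any path that starts with it."""
    # An empty Disallow value means "allow everything", so skip empty prefixes.
    return any(path.startswith(prefix) for prefix in disallow_prefixes if prefix)

# Hypothetical rules: the script itself plus a "/pr-" prefix for friendly URLs.
rules = ["/product_reviews.php", "/pr-"]

for path in (
    "/product_reviews.php?products_id=72",
    "/pr-72-vga-splitter-duplicator-puertos-pc-monitores.html",
    "/vga-splitter-duplicator-puertos-pc-monitores-pr-72.html",
):
    print(path, is_disallowed(path, rules))
```

Only the last URL, where "-pr-" sits at the end, slips past both prefix rules.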

Jim

5:28 pm on Nov 13, 2007 (gmt 0)

5+ Year Member



Mmmh yeah, I could probably just modify the "Ultimate SEO URL" contribution a little bit so it will not SEO product_reviews.php.

After all, if I do not want them indexed, I couldn't care less how nice they look to search engines... I will try to take this route: make it so they are not SEOed, disallow product_reviews.php in robots.txt, and wait for the old -pr- URLs to vanish from Google.

Thanks! :D

5:32 pm on Nov 13, 2007 (gmt 0)

5+ Year Member



Mmmh just saw on another thread that this could be used to disallow files that have "cat_id" as an argument.

Disallow: /*cat_id=*

Would this work for me to disallow files that have -pr- in the URL, so that Google would not index my review pages?

Disallow: /*-pr-*
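
Crawlers that implement the wildcard extension, Googlebot among them, read * as "any run of characters" and a trailing $ as "end of URL"; robots that follow only the original standard will not understand these rules. Because rules already match as prefixes, the final * in /*-pr-* is redundant. The Python sketch below approximates that matching so a pattern can be checked against sample URLs; the function names are made up for illustration and this is not any crawler's actual code.

```python
import re

def wildcard_rule_to_regex(rule):
    """Translate a Disallow value that uses * and a trailing $ into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def matches_wildcard_rule(path, rule):
    """True if the URL path would be blocked by the wildcard rule."""
    return wildcard_rule_to_regex(rule).match(path) is not None

review = "/vga-splitter-duplicator-puertos-pc-monitores-pr-72.html"
print(matches_wildcard_rule(review, "/*-pr-*"))
print(matches_wildcard_rule("/index.php?cat_id=3", "/*cat_id=*"))
```

A tighter rule such as /*-pr-*.html$ would limit the match to URLs that end in .html.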

Thx!

9:14 pm on Nov 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am not sure about the robots.txt answer, but osCommerce is written in PHP, so if you find the include that generates the head section, you could use the following to achieve the same effect:

<?php

// strstr() returns a string or false, never true, so test strpos() against false instead
if (strpos($_SERVER['REQUEST_URI'], "-pr-") !== false) {
    echo "<meta name=\"robots\" content=\"noindex,nofollow,noarchive\" />";
}

?>

Justin

<added>
If you want some SE credit for the pages, you might consider changing the line to:
echo "<meta name=\"robots\" content=\"noindex,follow,noarchive\" />";
</added>

10:49 pm on Nov 14, 2007 (gmt 0)

5+ Year Member



OMG... You really made my day now.

That is such a simple solution that it just couldn't get any easier. No messing with URLs, no messing with .htaccess, and hardly any messing with anything else. I only had to create a new variable in my meta tags module so it adds "noindex,follow,noarchive" to my product_reviews.php and product_reviews_info.php, and leaves it as "all" everywhere else.
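
The per-page switch described here boils down to a lookup on the script name. A rough sketch, written in Python purely for illustration (the real meta tags module is PHP, and the names `NOINDEX_SCRIPTS` and `robots_meta_tag` are hypothetical):

```python
# Scripts whose pages should stay out of the index but still pass links on.
NOINDEX_SCRIPTS = {"product_reviews.php", "product_reviews_info.php"}

def robots_meta_tag(script_name):
    """Build the robots meta tag for the page served by the given script."""
    if script_name in NOINDEX_SCRIPTS:
        content = "noindex,follow,noarchive"
    else:
        content = "all"
    return '<meta name="robots" content="%s" />' % content

print(robots_meta_tag("product_reviews.php"))
print(robots_meta_tag("product_info.php"))
```

Keying on the script that actually serves the page, rather than on the requested URL, means the tag stays correct whether or not the friendly URLs are in use.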

Thank you! :D

11:01 pm on Nov 14, 2007 (gmt 0)

5+ Year Member



Up and working. I checked everywhere, checked the source, and indeed I have "all" everywhere except on the reviews, where I have "noindex,follow,noarchive".

Now Google should start forgetting about those indexed pages over time, shouldn't it? :)

6:28 pm on Nov 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You really made my day now.

Thanks.
It's nice to know when something I post actually helps someone out.

Now Google should start forgetting about those indexed pages over time, shouldn't it?

Yes, they should be dropped from the index the next time they are spidered.

Justin

5:59 pm on Nov 27, 2007 (gmt 0)

5+ Year Member



Just came back to say that the solution worked great :D

I just checked with Google Webmaster Tools, and I have a long list of URLs restricted by robots.txt (386). All the review pages seem to be showing up there, so people won't find any more review pages before the one for the product :D

Thx again! ^^
