Forum Moderators: phranque


Question about robots.txt and dynamic filter drop-downs resulting in duplicate content

         

mitsu

3:44 pm on Mar 30, 2010 (gmt 0)

10+ Year Member



I have a number of drop-down filters on my osCommerce site that result in pages like these:

http://www.example.com/shop/index.php?manufacturers_id=27&filter_id=&range_id=4&sort_id=0

and

http://www.example.com/shop/index.php?cPath=123&filter_id=&range_id=4&sort_id=0

http://www.example.com/shop/?manufacturers_id=44


Now, I don't want these pages indexed, as they are duplicate content, so I was hoping someone could help me with the correct Disallow code.

I was thinking this would work to stop all variations of the above:

User-agent: *
Disallow: /shop/index.php?*
Disallow: /shop/?*

Obviously I don't want to block the index.php file itself,
just the filtered versions of it that start with a ?,
e.g. index.php?

Any help appreciated!

M

[edited by: phranque at 2:11 am (utc) on Mar 31, 2010]
[edit reason] exemplified urls [/edit]

mitsu

3:46 pm on Mar 30, 2010 (gmt 0)

10+ Year Member



Bah, typo above.

PS: these links DON'T WORK; they were added as examples of the structure of my links.
Thanks!
M

mitsu

5:11 pm on Mar 31, 2010 (gmt 0)

10+ Year Member



Can anybody help with this?

g1smd

9:17 am on Apr 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



These will do what you want:

Disallow: /shop/index.php?*filter_id=

Disallow: /shop/index.php?*range_id=

Disallow: /shop/index.php?*sort_id=


They disallow any /shop/index.php URL whose query string contains those parameters.

Notice the * is in the middle; it matches "anything" (any run of characters).

Notice there is NO * after the = sign. Robots.txt directives are 'prefix' matches, matching anything that "begins" with the pattern.
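To make the prefix-plus-wildcard behaviour concrete, here is a minimal PHP sketch that turns a Disallow pattern into a regular expression the way Google describes it: `*` matches any run of characters, a trailing `$` anchors the end, and otherwise the pattern only has to match the start of the path. The function name robots_rule_matches() is hypothetical, and some older or simpler crawlers may not support the `*` wildcard at all.

```php
<?php
// Hypothetical helper: does a robots.txt Disallow pattern match a URL path + query?
// Assumes Google-style matching: '*' = any characters, trailing '$' = end anchor,
// otherwise the pattern only needs to match the START of the path (prefix match).
function robots_rule_matches(string $pattern, string $pathAndQuery): bool
{
    $anchored = false;
    if (substr($pattern, -1) === '$') {
        $anchored = true;
        $pattern = substr($pattern, 0, -1);
    }
    // Quote each literal piece, then rejoin with '.*' where the '*' wildcards were.
    $pieces = array_map(
        fn (string $p): string => preg_quote($p, '#'),
        explode('*', $pattern)
    );
    $regex = '#^' . implode('.*', $pieces) . ($anchored ? '$' : '') . '#';
    return preg_match($regex, $pathAndQuery) === 1;
}

$rules = [
    '/shop/index.php?*filter_id=',
    '/shop/index.php?*range_id=',
    '/shop/index.php?*sort_id=',
];

$urls = [
    '/shop/index.php?manufacturers_id=27&filter_id=&range_id=4&sort_id=0',
    '/shop/index.php?cPath=123&filter_id=&range_id=4&sort_id=0',
    '/shop/?manufacturers_id=44',
    '/shop/index.php',
];

foreach ($urls as $url) {
    $blocked = false;
    foreach ($rules as $rule) {
        if (robots_rule_matches($rule, $url)) {
            $blocked = true;
            break;
        }
    }
    echo ($blocked ? 'blocked ' : 'allowed ') . $url . PHP_EOL;
}
```

Run as-is, it reports the two index.php filter URLs as blocked and leaves /shop/?manufacturers_id=44 and plain /shop/index.php allowed, since none of these three rules starts with /shop/?.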

phranque

2:45 pm on Apr 1, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Note that robots.txt can be used to block URLs from being crawled, but it won't prevent those URLs from being indexed (a blocked URL can still be indexed, without its content, if other pages link to it).

To prevent indexing, you must allow crawling and provide either a robots noindex meta tag in the document head or an X-Robots-Tag HTTP response header.
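As a minimal sketch of the header option, assuming the page is generated by PHP and the code runs before any output is sent (headers cannot be added once the body has started), the X-Robots-Tag header can be sent with PHP's built-in header() function. The helper name x_robots_tag_for() is hypothetical, and the "any query string" condition is only an example.

```php
<?php
// Hypothetical helper: which X-Robots-Tag header (if any) to send for a query string.
// The "any query string means noindex" rule is only an example condition.
function x_robots_tag_for(string $queryString): ?string
{
    return $queryString !== '' ? 'X-Robots-Tag: noindex' : null;
}

// header() only works before any output has been sent, so this must run
// at the very top of the script, before the first echo or HTML.
$header = x_robots_tag_for($_SERVER['QUERY_STRING'] ?? '');
if ($header !== null && !headers_sent()) {
    header($header);
}
```

The header route is handy when you can't easily edit the document head, since it works for any response PHP generates.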

mitsu

1:56 am on Apr 2, 2010 (gmt 0)

10+ Year Member



Thanks g1smd,
so that would stop these URLs being crawled:
http://www.example.com/shop/index.php?manufacturers_id=27&filter_id=&range_id=4&sort_id=0
but what syntax would I use for this type?
http://www.example.com/shop/?manufacturers_id=44

Something like this?
Disallow: /shop/?manufacturers*

Thanks phranque. I'm not sure how to go about preventing them from being indexed, though, as it's a page created on the fly; I may need a programmer's help there.
I might just start with this.
I'm just trying to avoid hundreds of extra pages for every drop-down filter combination;
I figured it would be duplicate content and get the site in trouble.

Thanks for all your help, guys!

mitsu

10:12 am on Apr 6, 2010 (gmt 0)

10+ Year Member



Can anyone clarify this?
Thanks

phranque

2:27 pm on Apr 6, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you want to exclude from indexing any URL that contains one or more URL parameters, insert something like this in the head of your PHP document:
<?php
// Output a robots noindex meta tag whenever the request has one or more URL parameters
if (count($_GET) > 0) {
    echo '<meta name="robots" content="noindex" />';
}
?>

mitsu

2:42 pm on Apr 6, 2010 (gmt 0)

10+ Year Member



Hi phranque,
hmm, sorry, I'm a bit confused.
How would I do that, given the pages above that I don't want indexed?

Thanks for your help on this!

phranque

3:00 pm on Apr 6, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you put that code in your index.php, then requests such as the following will be indexed:
http://www.example.com/shop/index.php
http://www.example.com/shop/

and requests such as the following will not be indexed:
http://www.example.com/shop/index.php?param=value
http://www.example.com/shop/?param1=value1&param2=value2
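The behaviour described above can be checked without a browser. This hedged sketch simulates what $_GET would contain for each example URL, using parse_url() and parse_str() (parse_str() builds an array from a query string much as PHP builds $_GET), and applies the same count() > 0 test. The function name needs_noindex() is hypothetical, and it only looks at the URL as requested; if the server rewrites URLs internally before PHP runs, the real $_GET may differ.

```php
<?php
// Hypothetical helper: would the count($_GET) > 0 test fire for this URL?
// parse_str() fills an array from a query string much as PHP fills $_GET.
function needs_noindex(string $url): bool
{
    $query = parse_url($url, PHP_URL_QUERY); // null (or '') when nothing follows '?'
    $params = [];
    if (is_string($query)) {
        parse_str($query, $params);
    }
    return count($params) > 0;
}

$examples = [
    'http://www.example.com/shop/index.php',
    'http://www.example.com/shop/',
    'http://www.example.com/shop/index.php?param=value',
    'http://www.example.com/shop/?param1=value1&param2=value2',
];

foreach ($examples as $url) {
    echo (needs_noindex($url) ? 'noindex   ' : 'indexable ') . $url . PHP_EOL;
}
```

The first two URLs come out indexable and the last two get noindex, matching the lists above.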

mitsu

3:18 pm on Apr 6, 2010 (gmt 0)

10+ Year Member



Aah, really?
That's cool!
I had no idea, looking at it, that that's what it could do.
Wow, great,
that sounds perfect!
Will give it a go and report back.
Thanks!

mitsu

3:22 pm on Apr 6, 2010 (gmt 0)

10+ Year Member



Hmm, just quickly:
it's a store that has SEO-friendly URLs,
so will this affect a URL like this one?
http://www.example.com/shop/category1/product2
I want to make sure that's indexed,
just not the duplicate variations that are being spidered from the drop-down lists / filters etc.
that aren't using the SEO-friendly URLs.

Thanks again

phranque

4:25 pm on Apr 6, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It should do exactly what I said, which is to add a robots noindex meta tag for any URL that contains one or more URL parameters.
If there is nothing following a question mark, there are no parameters in the URL.
You can test it on various URLs by viewing the page source and checking for the tag in the head of the document.
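If you'd rather not eyeball the source of every page, a small PHP sketch like the one below can look for a robots meta tag whose content includes noindex. It assumes you already have the page's HTML as a string (saved from "view source" or fetched separately) and that PHP's DOM extension, which ships with most builds, is available; has_noindex_meta() is a hypothetical name.

```php
<?php
// Hypothetical helper: does this HTML contain <meta name="robots" content="...noindex...">?
// Requires PHP's DOM extension (enabled in most builds).
function has_noindex_meta(string $html): bool
{
    if ($html === '') {
        return false; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Suppress warnings about sloppy real-world HTML while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        $content = strtolower($meta->getAttribute('content'));
        if ($name === 'robots' && strpos($content, 'noindex') !== false) {
            return true;
        }
    }
    return false;
}

$withTag = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>';
$withoutTag = '<html><head><title>Product</title></head><body></body></html>';

var_dump(has_noindex_meta($withTag));    // bool(true)
var_dump(has_noindex_meta($withoutTag)); // bool(false)
```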

mitsu

4:42 pm on Apr 6, 2010 (gmt 0)

10+ Year Member



Hi, great, thanks.
I did test it just then,
and the noindex tag appears on pages that I want to be indexed,
e.g. SEO-friendly category pages
and product pages.

This is in the head tag:
<meta name="robots" content="noindex" />

Sorry for my continual questions, but I don't know much about this. I do know that if it isn't fixed, Google will end up indexing all the different variations of the drop-down filters, and loads of duplicate content will result.

Thanks again for your help.

Any more advice is very much appreciated.

g1smd

6:12 pm on Apr 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Robots.txt uses a 'prefix matching' system.

The final character of a rule is NEVER a *.

If the final character is a *, simply OMIT it.
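Applied to the rules proposed earlier in the thread, that advice gives something like the fragment below: the same patterns with the trailing * dropped. This is an untested sketch; note that /shop/index.php? (with the question mark) matches only parameterised requests, not /shop/index.php itself, and /shop/? likewise leaves /shop/ alone.

```
User-agent: *
# Trailing * removed: prefix matching already covers "anything after this".
# Blocks /shop/index.php?anything, but not /shop/index.php itself:
Disallow: /shop/index.php?
# Blocks /shop/?manufacturers_id=44 and any other /shop/?... URL, but not /shop/ itself:
Disallow: /shop/?
```

If only the manufacturers_id form should be blocked, Disallow: /shop/?manufacturers is the narrower equivalent of the /shop/?manufacturers* rule asked about above.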

phranque

1:45 am on Apr 7, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Do the URLs that you want to be indexed include any URL parameters?

mitsu

2:59 am on Apr 7, 2010 (gmt 0)

10+ Year Member



Hmm, no,
don't think so.

There are info pages I'd like indexed,
like
http://www.example.com/shop/information.php/info_id/1

but I believe everything uses SEO-flattened URLs.

phranque

7:16 am on Apr 7, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Are you using mod_rewrite to internally rewrite requested "SEO" URLs to "ugly" URLs with URL parameters?

For example:
http://www.example.com/shop/information.php?info_id=1
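If the answer is yes, that would explain the result: after an internal rewrite, PHP fills $_GET from the rewritten URL, so count($_GET) is non-zero even on "SEO" pages. A hedged alternative, assuming a typical Apache + mod_rewrite setup, is to test the URL the visitor actually requested, which Apache passes to PHP unchanged in $_SERVER['REQUEST_URI']. The helper name requested_url_has_query() is hypothetical.

```php
<?php
// Hypothetical helper: decide noindex from the URL the visitor actually requested.
// Assumes Apache, where $_SERVER['REQUEST_URI'] keeps the original request
// (path plus any query string) even after an internal mod_rewrite rewrite,
// while $_GET reflects the rewritten "ugly" URL.
function requested_url_has_query(string $requestUri): bool
{
    $query = parse_url($requestUri, PHP_URL_QUERY);
    return is_string($query) && $query !== '';
}

$requestUri = $_SERVER['REQUEST_URI'] ?? '/';

if (requested_url_has_query($requestUri)) {
    echo '<meta name="robots" content="noindex" />';
}
```

With this check, /shop/category1/product2 and /shop/information.php/info_id/1 stay indexable, while the drop-down filter URLs such as /shop/?manufacturers_id=44 get the noindex tag.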