Forum Moderators: goodroi
As soon as I found out about the problem, I wrote a script that does a 301 redirect, but it does not seem to solve the dilemma. Rewriting the content is not an option either.
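Something like this, in ColdFusion (a rough sketch only; the new URL format and the prodid parameter name here are just for illustration):

<!--- old_product_page.cfm: answer with a permanent 301 pointing at the new URL --->
<cfheader statuscode="301" statustext="Moved Permanently">
<cfheader name="Location" value="/product_page.cfm/prodid/#url.prodid#.cfm">
<cfabort>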
So my strategy is to create a big robots.txt file to battle the problem.
Question: what is the maximum size for a robots.txt file? Mine will have to be about 3,000 lines long, or maybe there is a better way to handle this.
Thanks for your input.
You can't use wildcards or pattern matching if the file is to stay compliant with the current de facto standard.
It might be better to split the site into two logical areas, one of which (to avoid suspected duplicate content) is specifically disallowed in robots.txt. Make sure, however, that it's not the area everyone is linking to, or will be linking to!
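For example, if the pages to be blocked all lived under one directory (the path here is only illustrative), two lines would cover the whole area:

User-agent: *
Disallow: /duplicate-area/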
You can't use wildcards or pattern matching if the file is to stay compliant with the current de facto standard.
If you read my reply carefully, you'll notice that I was referring to the implicit wildcard "*" that the Robots Exclusion Standard effectively "puts" at the end of any Disallow: path, since Disallow values are matched as prefixes.
Sometimes webmasters can take advantage of this feature to reduce the number of Disallow lines in the file.
That's why I asked the original poster for an example: to understand whether the implicit wildcard can be used in this specific case to limit the number of Disallow lines.
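For instance, a single line such as

Disallow: /old_page.cfm

would block /old_page.cfm?prodid=100, /old_page.cfm?prodid=101, and every other URL beginning with that prefix, so there is no need for one Disallow line per product (the path is hypothetical).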
product_page.cfm/prodid/100.cfm <- this is the one I want to keep
Then there is:
old_product_page.cfm?prodid=100, which has a bunch of back-links from scraper sites that I have nothing to do with.
product_page.cfm?prodid=100 <- 301s to the one I want to keep, but is still in the Yahoo and Google indexes, and for some reason I can't get it to disappear; it has been five months already.
There are also:
product_page.cfm/prodid/100/item/widget-name.cfm
and
product_page.cfm/item/widget-name/prodid/100.cfm
where the widget name is different for each product.
In MSN Search we are doing great: top 5 on about 100 keyword phrases, and it shows about 1,700 pages indexed. Google shows about 4,800 pages indexed, of which roughly 3,000 no longer exist or are duplicate content, i.e. different URLs for the same products.
I never meant to create duplicate content with different URLs on the same site.
I am at the point of contacting Google to ask them to drop the site from the index completely and then re-index it. I don't know how far I will get with that request. Googlebot rolls in 2-3 times a week at least and does a pretty good job of caching pages.
When I do a site: search, it even returns separate URLs for pages with
product_page.cfm
and
Product_Page.cfm
(the same script with an uppercase and a lowercase P).
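One option I am looking at is forcing a single casing with the same 301 approach; a rough ColdFusion sketch, assuming something like an Application.cfm that runs on every request (query strings omitted for brevity):

<!--- 301 any mixed-case URL to its all-lowercase form so only one casing gets indexed --->
<cfif Compare(cgi.script_name, LCase(cgi.script_name)) NEQ 0>
    <cfheader statuscode="301" statustext="Moved Permanently">
    <cfheader name="Location" value="#LCase(cgi.script_name)#">
    <cfabort>
</cfif>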
The end result: for our main keyword phrase we are not in the first 1,000 results on Google, even though we have more widgets than any retail competitor on the market. Sad.
I am going to post the URL of the site in my profile. I am here to listen and learn.
Thanks