Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
URLs restricted by robots.txt (41) AGH!
I'm a bit stumped about what's best to do re my robots.txt
wrafter




msg:3300515
 9:41 pm on Apr 2, 2007 (gmt 0)

I have a site, which is doing OK on Google, but I'm afraid we could be stopping the site from doing better.

Somewhere along the development process we had a mix-up between whether the restaurant reviews on our site were to be at, for example, review.php?id=123 or reviews.php?id=123 - note the absence/presence of the 's'.

So we opted for one of them, which was the review.php?id=123 format and stuck with it. We then noticed that Google had indexed some restaurant reviews using the reviews.php?id=123 format, much to our horror, as it was resulting in 404s.

We responded by adding to our robots.txt file the following:

Disallow: /reviews.php

So Google stopped listing reviews.php?id=123 on the search returns, and all was well.

However, in Google Webmaster Tools, some 3 months after adding the above to robots.txt, I'm seeing today "URLs restricted by robots.txt (41)", and thereunder are listed 41 URLs (right up to yesterday's date), with reviews.php?id=123 format addresses.

Am I losing traffic from Google as a result of this?

Have I handled this correctly?

And is there anything I can do?

Appreciate the help of my peers. Thank you kindly.
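
For anyone puzzling over why Webmaster Tools keeps listing these URLs: a Disallow rule is a prefix match on the path, so a rule for /reviews.php also covers every reviews.php?id=... address while leaving review.php alone. Below is a minimal Python sketch using the standard library's urllib.robotparser to show that. The "User-agent: *" line, the www.example.com host and the Googlebot agent string are assumptions for illustration; only the Disallow line comes from the post above.

```python
from urllib.robotparser import RobotFileParser

# Only the Disallow line is from the thread; the User-agent line is assumed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reviews.php
"""


def is_blocked(url, agent="Googlebot"):
    """Return True if the robots.txt rules above forbid fetching url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical host; the paths follow the two formats described in the post.
    for url in (
        "http://www.example.com/reviews.php?id=123",
        "http://www.example.com/review.php?id=123",
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Because the rule stops a compliant crawler before it requests the page, the crawler never gets to see whatever status code the server would have returned for the old address.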

 

jimbeetle




msg:3300536
 10:01 pm on Apr 2, 2007 (gmt 0)

The only problem I see here is that Google has apparently found links to the reviews.php pages at one time. As long as access to those pages is blocked by robots.txt, G just might keep thinking they exist for a very, very long time.

The best bet is to tell the bots that the pages are either gone or moved: remove the directive in robots.txt and either serve a 404 or 410 for the reviews.php pages, or 301 redirect those pages to the proper review.php pages. I'd probably choose the 301.

Either way, G will most likely keep those pages hanging around in the supplemental index for six months to a year, but it should be no cause for concern.

g1smd




msg:3304471
 6:15 pm on Apr 6, 2007 (gmt 0)

If there is a one-to-one match for the reviews that have been inadvertently indexed vs. the reviews as they are now located at the new URLs, then the 301 redirect would be perfect. In this case the content really has moved.

In all other cases the custom 404 would be the way to go.

I would not use the robots.txt protocol for this, as you DO want Google to access the old URLs and see that the files have moved or gone.
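
If you go this route, it can help to check what the old URLs actually return once the robots.txt block is lifted. A rough Python sketch using only the standard library follows; it sends a HEAD request without following redirects and labels the result. The function names and the example URL in the comment are hypothetical, and it assumes the server answers HEAD the same way it answers GET.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Return (status, Location header) for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target from the path and query string.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status):
    """Map a status code to what it signals about the old URL."""
    if status == 301:
        return "moved permanently"
    if status in (404, 410):
        return "gone"
    return "neither moved nor gone"


# Example (hypothetical URL, needs network access):
# status, location = fetch_status("http://www.example.com/reviews.php?id=123")
# print(status, describe(status), location)
```

A 301 with a Location pointing at the matching review.php page, or a 404/410, is the kind of answer the posts above are aiming for; anything else suggests the old URLs are still sending a mixed signal.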

wrafter




msg:3311035
 11:07 am on Apr 14, 2007 (gmt 0)

Thank you for your very considered responses; they're appreciated.

Can I ask one final question, following on from your advice?

If we opt to implement a 301 redirect, as you suggest, what should we use in the .htaccess file?

Note we have, according to Google Webmaster Tools Diagnostic, Web Crawl page, 54 instances of "URLs restricted by robots.txt"

Therefore, do I need to include 54 lines in my .htaccess file, or is there one line that I can use? All 54 instances of "URLs restricted by robots.txt" are of the same type, where thedomaininquestion.com/reviews.php?id=128 SHOULD BE thedomaininquestion.com/review.php?id=128

All 54 instances have this one and the same issue.

So presumably there's some way to include one line, rather than repeating a line like the following one 54 times, no?

redirect 301 /reviews.php?id=128 [thedomaininquestion.com...]

Thoughts?

Thanks in advance!

wrafter




msg:3319786
 9:47 am on Apr 24, 2007 (gmt 0)

Reminder. :)

goodroi




msg:3321213
 1:18 pm on Apr 25, 2007 (gmt 0)

hi wrafter,

to get good answers on questions about htaccess you might want to post your question here [webmasterworld.com...]

jay5r




msg:3321228
 1:31 pm on Apr 25, 2007 (gmt 0)

What you're looking for is this (just one line, not many):

redirect 301 /reviews.php [thedomaininquestion.com...]

That's worked for me in the past - test it to make sure it works for you.
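
As far as I understand Apache's Redirect directive, it matches on the URL path only and carries the query string over to the target, which is why one line can cover every id. The mapping it is expected to perform can be sketched in Python like this. It is an illustration of the old-to-new URL mapping under that assumption, not Apache's actual code; the function name is made up, and the paths are the ones from wrafter's post.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_PATH = "/reviews.php"  # the mistaken format that got indexed
NEW_PATH = "/review.php"   # the format the site actually uses


def redirect_target(url):
    """Return the URL a request should be redirected to, or None if no rule applies."""
    parts = urlsplit(url)
    if parts.path != OLD_PATH:
        return None
    # Swap only the path; scheme, host and query string (e.g. id=128) are kept.
    return urlunsplit(parts._replace(path=NEW_PATH))


if __name__ == "__main__":
    print(redirect_target("http://thedomaininquestion.com/reviews.php?id=128"))
```

Under the same reading, a rule that includes ?id=128 as part of the path to match would likely never fire, since the query string is not part of the path being compared. As jay5r says, test it on your own server.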

wrafter




msg:3321730
 8:38 pm on Apr 25, 2007 (gmt 0)

Thank you kindly all. Will try that, but will also go to Apache section just to be sure to be sure to be sure.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved