|URLs restricted by robots.txt (41) AGH!|
I'm a bit stumped about what's best to do with my robots.txt.
| 9:41 pm on Apr 2, 2007 (gmt 0)|
I have a site, which is doing OK on Google, but I'm afraid we could be stopping the site from doing better.
Somewhere along the development process we had a mix-up between whether the restaurant reviews on our site were to be at, for example, review.php?id=123 or reviews.php?id=123 - note the absence/presence of the 's'.
So we opted for one of them, which was the review.php?id=123 format and stuck with it. We then noticed that Google had indexed some restaurant reviews using the reviews.php?id=123 format, much to our horror, as it was resulting in 404s.
We responded by adding to our robots.txt file the following:
So Google stopped listing reviews.php?id=123 on the search returns, and all was well.
However, in Google Webmaster Tools, some 3 months after adding the above to robots.txt, I'm seeing today "URLs restricted by robots.txt (41)", and thereunder are listed 41 URLs (right up to yesterday's date), with reviews.php?id=123 format addresses.
Am I losing traffic from Google as a result of this?
Have I handled this correctly?
And is there anything I can do?
Appreciate the help of my peers. Thank you kindly.
| 10:01 pm on Apr 2, 2007 (gmt 0)|
The only problem I see here is that Google has apparently found links to the reviews.php pages at one time. As long as access to those pages is blocked by robots.txt, G just might keep thinking they exist for a very, very long time.
The best bet is to tell the bots that the pages are either gone or moved: remove the directives in robots.txt and either serve a 404 or 410 for the reviews.php pages, or 301 redirect those pages to the proper review.php pages. I'd probably choose the 301.
Either way, G will most likely keep those pages hanging around in the supplemental index for six months to a year, but it should be no cause for concern.
| 6:15 pm on Apr 6, 2007 (gmt 0)|
If there is a one-to-one match for the reviews that have been inadvertently indexed vs. the reviews as they are now located at the new URLs, then the 301 redirect would be perfect. In this case the content really has moved.
In all other cases the custom 404 would be the way to go.
I would not use the robots.txt protocol for this, as you DO want Google to access the old URLs and see that the files have moved or gone.
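To illustrate why the robots.txt block gets in the way, here is a minimal Python sketch using the standard library's urllib.robotparser. The actual robots.txt lines were not quoted in this thread, so the Disallow rule below is an assumption, as are the www.example.com host and the crawler_may_fetch helper name. Under that assumption, a compliant crawler is not allowed to request the old URLs at all, so it would never get to see a 301, 404, or 410 served for them.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the thread does not quote the real robots.txt lines.
# This is one plausible rule that would block the reviews.php URLs.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /reviews.php
"""


def crawler_may_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a robots.txt-compliant crawler may request the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder host, not the poster's domain.
    for path in ("/reviews.php?id=128", "/review.php?id=128"):
        url = "http://www.example.com" + path
        print(path, crawler_may_fetch(ASSUMED_ROBOTS_TXT, url))
```

Once the Disallow line is removed, the same check should report the old URLs as fetchable, which is what lets the crawler discover the redirect or the 404/410.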
| 11:07 am on Apr 14, 2007 (gmt 0)|
Thank you for your very considered responses, it's appreciated.
Can I ask one final question, following on from your advice?
If we opt to implement a 301 redirect, as you suggest, what do you suggest we use in the .htaccess file?
Note that, according to the Google Webmaster Tools Diagnostic, Web Crawl page, we have 54 instances of "URLs restricted by robots.txt".
So do I need to include 54 lines in my .htaccess file, or is there one line that I can use? All 54 instances of "URLs restricted by robots.txt" are of the same type,
where thedomaininquestion.com/reviews.php?id=128 SHOULD BE thedomaininquestion.com/review.php?id=128
All 54 instances have this one and the same issue.
Therefore it's probable there's some way to include one line rather than a line like this one (following), 54 times, no?
redirect 301 /reviews.php?id=128 [thedomaininquestion.com...]
Thanks in advance!
| 9:47 am on Apr 24, 2007 (gmt 0)|
| 1:18 pm on Apr 25, 2007 (gmt 0)|
To get good answers to questions about .htaccess, you might want to post your question here: [webmasterworld.com...]
| 1:31 pm on Apr 25, 2007 (gmt 0)|
What you're looking for is this (just one line, not many):
redirect 301 /reviews.php [thedomaininquestion.com...]
That's worked for me in the past - test it to make sure it works for you.
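For the "test it" part, a rough Python sketch along these lines might help. It sends one HEAD request per old URL without following redirects, then checks for a 301 whose Location header is the same URL with review.php in place of reviews.php and the id query string intact. The helper names are made up for illustration, and the exact-match comparison is an assumption: if your server also switches between www and non-www, or http and https, you would need to loosen it.

```python
import http.client
import sys
from urllib.parse import urlsplit, urlunsplit


def expected_target(old_url: str) -> str:
    """Map a /reviews.php URL to the /review.php URL it should redirect to."""
    parts = urlsplit(old_url)
    if parts.path != "/reviews.php":
        raise ValueError("not a reviews.php URL: " + old_url)
    # Only the path changes; scheme, host, and query string are kept.
    return urlunsplit(parts._replace(path="/review.php"))


def fetch_status_and_location(url: str, timeout: float = 10.0):
    """Send one HEAD request and return (status, Location) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_is_correct(old_url: str) -> bool:
    """True if the old URL answers with a 301 pointing at the expected new URL."""
    status, location = fetch_status_and_location(old_url)
    return status == 301 and location == expected_target(old_url)


if __name__ == "__main__":
    # Pass the old URLs to check as command-line arguments.
    for old_url in sys.argv[1:]:
        print(old_url, "OK" if redirect_is_correct(old_url) else "CHECK")
```

You would run it with a handful of the old addresses listed in Webmaster Tools as arguments; any line marked CHECK is worth looking at by hand.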
| 8:38 pm on Apr 25, 2007 (gmt 0)|
Thank you kindly all. Will try that, but will also go to Apache section just to be sure to be sure to be sure.