Forum Moderators: Robert Charlton & goodroi

Meaning of "URLs restricted by robots.txt" - in Webmaster Tools


fraudcop

11:15 pm on Nov 3, 2006 (gmt 0)

10+ Year Member



Inside Google's Webmaster Tools

I reached 9,800 URLs restricted by robots.txt.
Now, after almost two months, it shows 7,407 URLs.

Can anyone explain the meaning of the lowered number?

thanks in advance

thecoalman

5:55 am on Nov 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It goes by whatever was restricted in the last x number of days; looking at my page, it only goes back two weeks. The total will change whenever the latest days are added.

fraudcop

11:05 am on Nov 4, 2006 (gmt 0)

10+ Year Member



thanks for the answer.

I'm wondering what happens with my duplicate content pages
(30 login pages, 70 registration pages, etc.).

How long will they stay in the index before they get deleted by Google and stop doing harm?

g1smd

11:15 am on Nov 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If they are marked as Supplemental Results they can take a year to drop out.

If they have a noindex tag then they are already not causing a problem.

fraudcop

1:13 pm on Nov 4, 2006 (gmt 0)

10+ Year Member



Thanks, g1smd, for the answer.

Is this disallow inside robots.txt

User-agent: *
Disallow: /cgi-bin/Register
Disallow: /cgi-bin/Login

enough to stop it causing problems, or should I add a noindex tag inside each page's code?

g1smd

1:38 pm on Nov 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The robots.txt disallow stops the page being spidered, but if another page links to that disallowed URL then it can still appear in Google results as a URL-only entry.

The meta robots noindex tag allows the page to be spidered, but says to not allow the content to appear in the SERPs at all. Nothing about that page will appear in the SERPs.

Use whichever one is appropriate. If you use both, then Google will not ever get to the page to see the meta tag.
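The robots.txt side of this is easy to check for yourself. A minimal sketch using Python's standard-library `urllib.robotparser`, with the two Disallow rules quoted above (the example.com URLs are hypothetical):

```python
# Minimal sketch: parse the robots.txt rules quoted earlier in the
# thread and test which URLs a crawler is allowed to fetch.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/Register
Disallow: /cgi-bin/Login
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Disallowed URL: crawlers must not fetch it, but if other pages link
# to it, it can still appear in results as a URL-only entry.
print(rp.can_fetch("*", "https://example.com/cgi-bin/Login"))

# Allowed URL: crawlers may fetch it; whether its content is indexed
# is then up to any meta robots noindex tag on the page itself.
print(rp.can_fetch("*", "https://example.com/index.html"))
```

This also shows why combining the two is pointless: once `can_fetch` is False, the crawler never downloads the page, so a noindex tag in its HTML is never seen.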

hooter

1:43 pm on Nov 4, 2006 (gmt 0)

10+ Year Member



Just as an aside... the results Google gives in the aforementioned "URLs restricted by robots.txt" report are completely broken and worthless. I manage several sites for clients, including through the Webmaster Tools interface. These sites are a mix of small static-page sites, larger dynamic query-URL sites, and some using mod_rewrite for their URLs. All report thousands of restricted URLs, yet in no way, shape, or form are these URLs restricted by their respective robots.txt files.

In fact, I can take any of these so-called "restricted" URLs and paste them directly into Google's OWN robots.txt analysis box under the Diagnostics tab, and they all come back as allowed.

thecoalman

6:22 pm on Nov 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a lot of supplementals and have been following the progress of them getting removed quite closely over the last month or so. Everything that shows up is legitimately blocked in my case.

One thing that set the alarm bells off for me recently was a URL that at first glance should have been indexed but was denied. Upon further investigation, it was a URL that went to a page I had removed from public view. The redirect was going to the login page, which of course is denied in robots.txt.
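That situation is easy to reproduce in a sketch: follow the redirect to its final target, then test that target against robots.txt. The paths and redirect map below are hypothetical, standing in for the removed page and login page described above:

```python
# Hypothetical sketch of the case above: a removed page redirects to a
# login page that robots.txt blocks, so the original URL is reported
# as restricted even though the original path itself is not disallowed.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/Login",
])

# Hypothetical redirect map: removed page -> login page.
redirects = {"/old-public-page.html": "/cgi-bin/Login"}

def final_target(path):
    """Follow redirects until the path stops changing."""
    while path in redirects:
        path = redirects[path]
    return path

path = final_target("/old-public-page.html")
blocked = not rp.can_fetch("*", "https://example.com" + path)
print(path, blocked)
```

The original URL is allowed by robots.txt, but the crawler lands on the blocked login page, which is why the report lists it under "restricted by robots.txt".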

fraudcop

12:51 am on Nov 5, 2006 (gmt 0)

10+ Year Member




So in my case, having to remove many duplicate pages (9,000 pages) from the index, it seems that

<meta name="robots" content="noindex, nofollow">

is better than robots.txt.