robots.txt & Disallow: / - What happens next?

   
9:46 pm on Oct 30, 2012 (gmt 0)

5+ Year Member



I accidentally copied across the wrong robots.txt to my live site around 10 days ago, and ended up disallowing everything (for around 4 days).

User-agent: *
Disallow: /

What exactly is the above saying? What would you expect to happen to my existing pages?

Traffic has since dived around 50% and I can no longer see some of my best pages in the SERPs.

Ekk! :D
5:35 am on Oct 31, 2012 (gmt 0)



It means you blocked all search engine bots from crawling (and therefore indexing) your whole site.

Either delete the file or change it to:

User-agent: *
Allow: /

Reference: [google.com...]
7:06 am on Oct 31, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Errr, you need:

User-agent: *
Disallow:


for the widest compatibility. An empty Disallow means "disallow nothing" and is part of the original robots.txt standard, whereas Allow is a later extension that not every crawler understands.

If you are lucky, it will take a week or two for your pages to come back into the SERPs. If you are unlucky, it will take a lot longer.
2:49 pm on Oct 31, 2012 (gmt 0)

5+ Year Member



Errr, thanks :)

So, just to clarify... is Disallow: / saying "delete the previously indexed pages from the SERPs"?
3:20 pm on Oct 31, 2012 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



robots.txt is about crawling, not indexing.
the Disallow directive means: if the URL matches the pattern from left to right, don't request that URL.
if the pattern is just a slash "/", it matches every URL on the site, because every URL path starts with that root slash.
so "Disallow: /" means don't crawl anything, which in turn stops you from exercising any on-page indexing control.
3:30 pm on Oct 31, 2012 (gmt 0)

5+ Year Member



Thanks phranque. So this also says 'de-index' what you have already, right?
3:44 pm on Oct 31, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



As the end result, that's basically what happens.
3:45 pm on Oct 31, 2012 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



So this also says 'de-index' what you have already, right?


As phranque said, robots.txt controls crawling and not indexing. It does NOT mean "de-index what you have already".

However, without crawling, Google does not know what is on the page, so it has to rely on off-page signals only. Furthermore, off-page signals that come from other pages within your site are also lost. Therefore the ranking will almost certainly drop - which is what has happened to you.

But the URL will still be indexed. You can verify this with Google's site: operator, or with a combination of the site: and inurl: operators to check a particular URL.
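For example (with example.com standing in for your domain and "widgets" as a stand-in keyword):

site:example.com
site:example.com inurl:widgets

The first query lists the URLs Google has indexed for the site; the second narrows the list to indexed URLs containing "widgets".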

Hopefully by now you have reinstated the correct robots.txt. Your rankings will almost certainly return to where they were, but you need to give Google time to pick up the new robots.txt and then to re-crawl all the URLs it was forbidden to crawl.

Only after re-crawling all the URLs that were blocked should your rankings return to where they were (*), since within a site one page often supports another: it is not enough that the blocked page is re-crawled; the pages that link to it must be re-crawled too.

So all that you can do now is just wait and monitor.

*Disclaimer: assumes no other algo changes in that period which would affect you
3:48 pm on Oct 31, 2012 (gmt 0)

5+ Year Member



Perfect, thanks all.

aakk9999 - You're 100% right: I can see the page using inurl:, just not by searching for "keyword1 keyword2 mysitename".

Thanks
5:28 pm on Oct 31, 2012 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Congratulations, you've made bonehead mistake #237. Up next is uploading a header file belonging to site A onto site B just before going to bed and confusing the heck out of your visitors for a night!

I don't think you can really call yourself a webmaster until you've made enough of these mistakes to learn from :)
5:47 pm on Oct 31, 2012 (gmt 0)

5+ Year Member



Heh, couldn't agree more. Made a few classic mistakes in the past - where does accidentally adding a noindex/nofollow for a couple of weeks rank in the bonehead index?

Jeez... what's #1, or is 1-10 reserved by Google?
5:53 pm on Oct 31, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



But the URL will still be indexed.

Note that the above means only that there will be an entry in Google's database that records that a link to "http://www.example.com/somepage" has been seen at some time in the past.

A "URL being indexed" purely means that the URL has been noted. Actually requesting the page and looking at the content on it is a whole other matter.
6:22 pm on Oct 31, 2012 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



A "URL being indexed" purely means that the URL has been noted.

I would like to add: "... and is available to show in SERPs."
A URL that is not indexed will not be shown in SERPs.
A URL that is not crawled, but "noted" and indexed, *may* show in SERPs.

In almost all cases, however, URLs blocked by robots.txt are as good as "not indexed" (i.e. useless) when searching for keyword1 keyword2 or similar* (although backlinks may give Google some clue).

* Perhaps this is what g1smd meant in his middle post further above.
10:33 pm on Oct 31, 2012 (gmt 0)



This discussion is related to some Disallow changes I recently added to avoid a faceted navigation spider trap, so I wanted to chime in with a related question. The disallowed URLs stopped appearing in site: inurl: results within a week. This was for roughly a couple of hundred thousand URLs.

From a search engine perspective, is this enough to keep those URLs from counting for things like duplicate content checks? I've also added a noindex, follow meta tag, FWIW.

Thanks,
Tom
10:39 pm on Oct 31, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Once you disallow a page, it is never fetched again - so any tags that you add to the page will never be seen.
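
If the goal is to get those faceted URLs de-indexed via the meta tag, one common sequence (a sketch, not a guarantee) is the reverse order: temporarily remove the Disallow rules for those paths so the pages can be fetched, and let the engines see the noindex on each page, e.g. in the <head>:

<meta name="robots" content="noindex, follow">

Then re-add the Disallow only after the URLs have dropped out of the index; otherwise the noindex will never be read.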
4:16 pm on Nov 10, 2012 (gmt 0)

5+ Year Member



Just an update...

After 2 weeks, I can now see the page again in the #1 spot by searching for 'keyword1 keyword2 mysitename'.

However, I'm not in the first 50 pages if I search for 'keyword1 keyword2', whereas before I was #3.
10:54 pm on Nov 11, 2012 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



You need to give it more time.
12:29 am on Nov 12, 2012 (gmt 0)

5+ Year Member



I know, I'm just giving an update to those that posted earlier.

I'm treating the whole process as a learning experience :)
 
