Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
robots.txt & Disallow: / - What happens next?
Karma
msg:4513944 - 9:46 pm on Oct 30, 2012 (gmt 0)

I accidentally copied across the wrong robots.txt to my live site around 10 days ago, and ended up disallowing everything (for around 4 days).

User-agent: *
Disallow: /

What is the above saying 'exactly'? What would you expect to happen to my existing pages?

Traffic has since dived around 50% and I can no longer see some of my best pages in the SERPs.

Ekk! :D

 

sunnyujjawal
msg:4514053 - 5:35 am on Oct 31, 2012 (gmt 0)

It means you blocked all search engine bots from crawling (and thus indexing) your whole site.

Either delete this file or make it:

User-agent: *
Allow: /

Reference: [google.com...]

g1smd
msg:4514072 - 7:06 am on Oct 31, 2012 (gmt 0)

Errr, you need:

User-agent: *
Disallow:


for the widest compatibility.

It will take a week or two if you are lucky for your pages to come back into the SERPs. If you are unlucky it will be a lot longer.
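The two variants discussed above can be sanity-checked before deploying with Python's standard urllib.robotparser; a minimal sketch (the robots.txt contents are the two versions from this thread, the URL is just an example):

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

# The accidental file: "Disallow: /" blocks every URL on the site.
block_all = "User-agent: *\nDisallow: /\n"

# The fix with widest compatibility: an empty Disallow allows everything.
allow_all = "User-agent: *\nDisallow:\n"

print(allowed(block_all, "http://www.example.com/somepage"))  # False
print(allowed(allow_all, "http://www.example.com/somepage"))  # True
```

This only checks how a spec-following parser reads the file; individual crawlers may differ.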

Karma
msg:4514192 - 2:49 pm on Oct 31, 2012 (gmt 0)

Errr, thanks :)

So, just to clarify... the Disallow: / is saying delete the previously indexed pages from the SERPs?

phranque
msg:4514212 - 3:20 pm on Oct 31, 2012 (gmt 0)

robots.txt is about crawling not indexing.
the Disallow directive means if the URL matches the pattern from left-to-right, don't request that URL.
if the pattern is just a slash "/" it matches everything because the slash means the root directory and matches everything in it.
so "Disallow: /" means don't crawl anything which actually prevents you from providing any indexing control.
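The left-to-right matching described above boils down to a plain prefix test (in the original robots.txt convention, before Google added wildcard extensions); a sketch:

```python
def blocked(disallow_pattern: str, url_path: str) -> bool:
    # Classic robots.txt matching is a simple left-to-right prefix test:
    # the URL path is disallowed if it starts with the pattern.
    return url_path.startswith(disallow_pattern)

print(blocked("/", "/index.html"))        # True: "/" prefixes every path
print(blocked("/private", "/private/x"))  # True
print(blocked("/private", "/public/x"))   # False
```

This is why "Disallow: /" catches everything: every path begins with the root slash.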

Karma
msg:4514213 - 3:30 pm on Oct 31, 2012 (gmt 0)

Thanks phranque. So this also says 'de-index' what you have already, right?

g1smd
msg:4514220 - 3:44 pm on Oct 31, 2012 (gmt 0)

As the end result, that's basically what happens.

aakk9999
msg:4514222 - 3:45 pm on Oct 31, 2012 (gmt 0)

So this also says 'de-index' what you have already, right?


As phranque said, robots.txt controls crawling, not indexing. It does NOT mean "de-index what you have already".

However, without crawling, Google does not know what is on the page, so it has to rely on off-page signals only. Further, off-page signals that come from other pages within your site are also lost. Therefore the ranking will almost certainly drop - which is what has happened to you.

But the URL will still be indexed. You can verify this by using Google's site: operator, or a combination of the site: and inurl: operators, to check a particular URL.

Hopefully by now you have reinstated the correct robots.txt. Your rankings will almost certainly return to where they were, but you need to give Google time to pick up the new robots.txt and then to re-crawl all the URLs it was forbidden to crawl.

Only after all the blocked URLs are re-crawled should your rankings return to where they were(*), since within a site one page often supports another; it is not enough that the page that got blocked is re-crawled, the pages that link to it must be re-crawled too.

So all that you can do now is just wait and monitor.

*Disclaimer: assumes no other algo changes in that period which would affect you

Karma
msg:4514223 - 3:48 pm on Oct 31, 2012 (gmt 0)

Perfect, thanks all.

aakk9999 - You're 100% right, I can see the page using inurl:, just not when searching for "keyword1 keyword2 mysitename".

Thanks

Sgt_Kickaxe
msg:4514254 - 5:28 pm on Oct 31, 2012 (gmt 0)

Congratulations, you've made bonehead mistake #237, up next is uploading a header file belonging to site A onto site B just before going to bed and confusing the heck out of your visitors for a night!

I don't think you can really call yourself a webmaster until you've made enough of these mistakes to learn from :)

Karma
msg:4514265 - 5:47 pm on Oct 31, 2012 (gmt 0)

Heh, couldn't agree more. Made a few classic mistakes in the past - where does accidentally adding a noindex/nofollow for a couple of weeks rank in the bonehead index?

Jeez... what's #1, or is 1-10 reserved by Google?

g1smd
msg:4514267 - 5:53 pm on Oct 31, 2012 (gmt 0)

But the URL will still be indexed.

Note that the above means only that there will be an entry in Google's database that records that a link to "http://www.example.com/somepage" has been seen at some time in the past.

A "URL being indexed" purely means that the URL has been noted. Actually requesting the page and looking at the content on it is a whole other matter.

aakk9999
msg:4514276 - 6:22 pm on Oct 31, 2012 (gmt 0)

A "URL being indexed" purely means that the URL has been noted.

I would like to add: ...and is available to show in SERPs.

A URL that is not indexed will not be shown in SERPs.
A URL that is not crawled, but "noted" and indexed, *may* show in SERPs.

In almost all cases, however, URLs blocked by robots.txt are as good as "not indexed" (i.e. useless) when searching for keyword1 keyword2 or similar* (although backlinks may give Google some clue).

* Perhaps this is what g1smd meant in his middle post further above.

serpsup
msg:4514411 - 10:33 pm on Oct 31, 2012 (gmt 0)

This discussion is related to some Disallow changes I recently added to avoid a faceted navigation spider trap, so I wanted to chime in with a related question. The disallowed URLs stopped appearing in site: inurl: results within a week. This was for roughly a couple hundred thousand URLs.

From a search engine's perspective, is this enough to keep those URLs from counting for things like duplicate content checks? I've also added a noindex, follow meta tag, FWIW.

Thanks,
Tom

g1smd
msg:4514413 - 10:39 pm on Oct 31, 2012 (gmt 0)

Once you disallow a page, it is never fetched again - so any tags that you add to the page will never be seen.
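This ordering - the robots.txt check happens before the request, so an on-page noindex tag is never seen - can be sketched with urllib.robotparser (the fetch step here is hypothetical, not a real HTTP request; the /facets path is an example tied to the faceted-navigation question above):

```python
from urllib.robotparser import RobotFileParser

def crawl(url: str, robots_txt: str) -> str:
    """Sketch of a crawler's decision order for a single URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch("*", url):
        # The request is never made, so any <meta name="robots"
        # content="noindex"> on the page is never seen.
        return "skipped: disallowed by robots.txt"
    return "fetched: meta robots tags would now be read"

rules = "User-agent: *\nDisallow: /facets\n"
print(crawl("http://www.example.com/facets/colour-red", rules))  # skipped
print(crawl("http://www.example.com/widgets", rules))            # fetched
```

The practical upshot matches g1smd's point: to get a noindex honoured, the URL must be crawlable, so Disallow and meta noindex work against each other.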

Karma
msg:4517802 - 4:16 pm on Nov 10, 2012 (gmt 0)

Just an update...

After 2 weeks, I can now see the page again in #1 spot by searching for 'keyword1 keyword2 mysitename'.

However, I'm not in the first 50 pages if I search for 'keyword1 keyword2' whereas before I was #3.

aakk9999
msg:4518136 - 10:54 pm on Nov 11, 2012 (gmt 0)

You need to give it more time.

Karma
msg:4518148 - 12:29 am on Nov 12, 2012 (gmt 0)

I know, I'm just giving an update to those that posted earlier.

I'm treating the whole process as a learning experience :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved