Forum Moderators: Robert Charlton & goodroi

Removing robots.txt That Disallows All

     
3:58 am on Jan 23, 2017 (gmt 0)

Junior Member from GB 

10+ Year Member Top Contributors Of The Month

joined:Oct 16, 2002
posts: 182
votes: 3


I am about to start working with a client who has a fairly sizeable site, with a robots.txt that blocks everything. It has probably been in place for ten years. My concern is that if I just delete it, a whole load of pages that were not previously available to Google will suddenly become available, which may trigger a sandbox-type penalty for the sudden big change.

I would like advice please, on the possible things to be concerned about, whether to be concerned about them and what course of action would be better if so.
7:34 pm on Jan 23, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3652
votes: 369


Before you make any decisions, I suggest that you take a look at the raw logs to see what the bots are doing now.

Also, if googlebot has been totally blocked for ten years, it would likely show a message about this in the search results, and in any case the algorithm wouldn't be able to rank the pages properly. So you might take a look at rankings and any google traffic it may be getting now.

So if you investigate these types of things, you might get a better picture of the situation as it stands now, which could help you decide what to do.
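
For anyone wanting to do that log check quickly, here is a rough Python sketch. It assumes the server writes the common "combined" access log format, and the names in it (tally_bot_requests, BOT_MARKERS) are made up for illustration. It only matches on the user-agent string, which can be spoofed, so treat the numbers as a first look rather than proof of who is crawling.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

# Substrings used to recognise crawlers; an illustrative list, not exhaustive.
BOT_MARKERS = ("Googlebot", "bingbot")


def tally_bot_requests(lines):
    """Count requests per (bot, kind), where kind is 'robots.txt' or 'other'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path, agent = match.groups()
        for marker in BOT_MARKERS:
            if marker.lower() in agent.lower():
                is_robots = path.split("?")[0] == "/robots.txt"
                counts[(marker, "robots.txt" if is_robots else "other")] += 1
                break
    return counts
```

You would feed it the lines of the log, for example `tally_bot_requests(open("access.log"))`, where "access.log" is a placeholder for wherever the host keeps the raw logs.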
9:26 pm on Jan 23, 2017 (gmt 0)

Junior Member from GB 

10+ Year Member Top Contributors Of The Month

joined:Oct 16, 2002
posts: 182
votes: 3


Thanks. I'll do that when I can.

The only page listed on a "site:" search is the homepage with "A description for this result is not available because of this site's robots.txt" beneath it. None of the other pages are listed at all. It says "similar results were excluded". If I show them, I simply get another two entries for the homepage with different titles, all with the same robots.txt message.

Checking the logs is a good idea, I'll do that as soon as I get access to them. Webmaster Tools will give useful info too when I can get into it. They come up for a company name search, but I'm not expecting them to rank for much else.
10:06 pm on Jan 23, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts:1194
votes: 284


Maybe you should also identify why the robots.txt was blocking all the pages. There might be a reason that needs to be taken into consideration too.
10:29 pm on Jan 23, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15812
votes: 848


I simply get another two entries for the homepage with different titles

That's characteristic of any site that's fully roboted-out. Since they can't crawl, they don't know about the site's preferred name (with or without www) or protocol (https or not). They may even offer up a nonexistent /m/ version.

A "penalty" isn't really relevant, since there has never been anything to apply a penalty to. What you're really talking about is a site that was previously not indexed and is now entering the index--almost as if it's a brand-new site that just happens to have the same name as a pre-existing site. In fact, for all Google knows, it is a brand-new site.
10:57 pm on Jan 23, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11823
votes: 237


The server logs won't tell you much. Once googlebot has received the robots.txt file it will not request any URLs that are excluded.
11:05 pm on Jan 23, 2017 (gmt 0)

Junior Member from GB 

10+ Year Member Top Contributors Of The Month

joined:Oct 16, 2002
posts: 182
votes: 3


Thanks. So in terms of it being effectively a new site, is it better to just drop the robots.txt, or release the pages steadily over a period of time?
1:33 am on Jan 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3652
votes: 369


The server logs won't tell you much. Once googlebot has received the robots.txt file it will not request any URLs that are excluded.

My thought was that the logs would tell you how often googlebot comes around to check. If robots.txt hasn't changed in ten years, and the site hardly gets any traffic, then googlebot might not show up very often. This might be useful information.
1:58 am on Jan 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 15, 2003
posts:958
votes: 33


Don't worry about unblocking the site. Google isn't a giant game of "gotcha" with secret rules. It isn't an unheard-of situation for a website to have a complete block for an extended period and then later purposely become unblocked. But don't delete the robots.txt file. Change it to:

User-agent: *
Disallow:

so that you send an affirmative signal that you deliberately removed the blocking instruction and didn't just accidentally delete the robots.txt file. I'd follow this up by using the robots.txt Tester in the Crawl menu of Google Search Console (GSC) to have Google immediately fetch the updated file, and then use the Fetch As Google tool to fetch and submit the home page to the index.
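
If you want to sanity-check the new file before uploading it, a minimal sketch with Python's standard urllib.robotparser shows the difference between the old and new rules. The helper name can_crawl and the example.com URLs are placeholders, and this is Python's parser rather than Google's, so it only confirms how the rules read under the usual robots.txt conventions.

```python
from urllib.robotparser import RobotFileParser

# The old rule set (blocks everything) and the suggested replacement.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def can_crawl(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for label, rules in (("old", BLOCK_ALL), ("new", ALLOW_ALL)):
    for url in ("https://example.com/", "https://example.com/some/page.html"):
        print(label, url, can_crawl(rules, url))
```

An empty Disallow: value is read as "nothing is disallowed", which is why the second file opens the whole site while still being an explicit instruction.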
2:12 am on Jan 24, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15812
votes: 848


If robots.txt hasn't changed in ten years, and the site hardly gets any traffic, then googlebot might not show up very often.

I checked logs for my test site, which is 100% roboted-out. On average, the Googlebot (and also the bingbot) continues to get robots.txt once every day or so. That's for a site that has existed in the same form for several years; I only looked at the past year's logs.

If you make a substantial change in robots.txt, it may take a day or so for the major search engines to notice--but when they do, you can expect a full top-to-bottom crawl almost immediately.
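
To get the same kind of "how often do they fetch robots.txt" figure for another site, something like the sketch below would do. It assumes combined-format access logs with English month abbreviations in the timestamp; the function names are invented for the example, and it trusts the user-agent string, which anyone can fake.

```python
import re
from collections import Counter
from datetime import datetime

# Captures the date of a GET for /robots.txt and the final quoted field,
# which is the user-agent in the assumed combined log format.
ROBOTS_FETCH_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "GET /robots\.txt[ ?][^"]*" \d{3} .*"([^"]*)"\s*$'
)


def robots_fetches_per_day(lines, bot="Googlebot"):
    """Count robots.txt requests per calendar day for one crawler."""
    per_day = Counter()
    for line in lines:
        match = ROBOTS_FETCH_RE.search(line)
        if match and bot.lower() in match.group(2).lower():
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            per_day[day] += 1
    return per_day


def average_gap_days(per_day):
    """Average number of days between days on which a fetch was seen."""
    days = sorted(per_day)
    if len(days) < 2:
        return None  # not enough data to measure a gap
    return (days[-1] - days[0]).days / (len(days) - 1)
```

Running robots_fetches_per_day over a few months of logs and passing the result to average_gap_days gives a rough idea of how long a robots.txt change might sit before the crawler sees it.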
2:27 am on Jan 24, 2017 (gmt 0)

Junior Member from GB 

10+ Year Member Top Contributors Of The Month

joined:Oct 16, 2002
posts: 182
votes: 3


Thanks all for the advice.

@rainborick great plan, I'm going to follow that, thank you.
 
