Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
After using robots.txt none of the bots are crawling my site! Help
robots.txt is not working properly
shamims



 
Msg#: 4088929 posted 5:32 pm on Feb 28, 2010 (gmt 0)

I have created a blog that runs on WordPress. Only 1 page had been crawled when I changed my theme. I don't know why, but the robots.txt file was changed automatically. It now looks like this:

User-agent: *
Disallow: /


However, I changed my robots.txt file to:
User-agent: *
Disallow:

I think this is the correct format for allowing all robots.

But the sad thing is that, although I have added lots of pages and posts to my blog, none of the posts are being indexed. It is worth mentioning that I made this change only 1 day ago.

So my question is: how long will the search engines take to notice my changed robots.txt?
Is anything else wrong?

Help.

 

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4088929 posted 8:49 pm on Feb 28, 2010 (gmt 0)

Disallow means not allowing, blocking
* means everybody

so basically you are telling every bot NOT to access pages on your site.

If you want every bot to crawl your site just leave your robots.txt file empty.

shamims



 
Msg#: 4088929 posted 3:35 am on Mar 1, 2010 (gmt 0)

@Staffa: Thanks for your quick response.

Suppose I change my robots.txt file according to your tips. When will Google and the other search engines notice that I have changed the robots file (my site is new and just 1 page has been indexed)?

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4088929 posted 10:24 am on Mar 1, 2010 (gmt 0)

so basically you are telling every bot NOT to access pages on your site.

not true!
the robots exclusion protocol uses the most specific rule and matches the pattern from left to right.
a blank pattern for Disallow means "disallow NOTHING".

from "The Web Robots Pages" [robotstxt.org] of the "official" REP site:
To allow all robots complete access

User-agent: *
Disallow:


(or just create an empty "/robots.txt" file, or don't use one at all)


you will find similar information on SE help pages regarding robots.txt syntax.
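
As a quick sanity check of the point above, Python's standard-library `urllib.robotparser` can parse both versions of the file from the first post. This is only a sketch: the `is_allowed` helper, the `AnyBot` user-agent, and the example.com URL are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /"  # the file the OP found after the theme change
ALLOW_ALL = "User-agent: *\nDisallow:"    # the file the OP changed it to
EMPTY = ""                                 # an empty robots.txt

URL = "https://example.com/some-post/"     # hypothetical page, not from the thread

print(is_allowed(BLOCK_ALL, "AnyBot", URL))  # False: "/" matches every path
print(is_allowed(ALLOW_ALL, "AnyBot", URL))  # True: blank Disallow blocks nothing
print(is_allowed(EMPTY, "AnyBot", URL))      # True: no rules at all
```

With this parser, the blank `Disallow:` line and the empty file give the same result, which matches the robotstxt.org text quoted above.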

tangor

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4088929 posted 12:26 pm on Mar 1, 2010 (gmt 0)

I would go with a whitelist of allowed bots, disallowing all others... Along the lines of:

# Whitelisted user-agents are allowed

User-agent: googlebot
Disallow: /cgi-bin

User-agent: msnbot
Disallow: /cgi-bin

User-agent: teoma
Disallow: /cgi-bin

User-agent: slurp
Disallow: /cgi-bin

User-agent: atomz
Disallow: /cgi-bin

# Disallow all others
User-agent: *
Disallow: /
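
To see how a whitelist like this behaves, here is a small sketch using Python's standard-library `urllib.robotparser`. It uses a shortened copy of the rules above (two of the listed bots plus the catch-all). The `check` helper, `SomeOtherBot`, and the example.com URLs are invented for illustration, and real crawlers may match user-agents differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the whitelist: two named bots plus the catch-all group.
WHITELIST = """\
# Whitelisted user-agents are allowed
User-agent: googlebot
Disallow: /cgi-bin

User-agent: msnbot
Disallow: /cgi-bin

# Disallow all others
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(WHITELIST.splitlines())


def check(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on a hypothetical site."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(check("Googlebot", "/blog/post"))          # True: whitelisted, path not blocked
print(check("Googlebot", "/cgi-bin/script.pl"))  # False: /cgi-bin is disallowed
print(check("SomeOtherBot", "/blog/post"))       # False: falls through to "User-agent: *"
```

A named group replaces the catch-all group for that bot rather than adding to it, which is why the whitelisted bots are not affected by the final `Disallow: /`.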

penders

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4088929 posted 9:33 am on Mar 3, 2010 (gmt 0)

I would go with a whitelist of allowed bots, disallowing all others...


So you allow very few robots? What about all the other 'good bots'? Or do they not count for anything towards site promotion?

tangor

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4088929 posted 10:00 am on Mar 3, 2010 (gmt 0)

That is correct. I allow very few bots as most of the others do little toward site promotion. It also allows me to keep my last 13 hairs since I am not tempted to pull them out when things do not go well. :)

(and teoma is on a short list of things to get rid of...)

But that's just me!

Obviously one can whitelist what they like. The above is my list... and one of them is for an onsite search box that keeps the visitor on my site...

jameswsparker

5+ Year Member



 
Msg#: 4088929 posted 7:49 am on Mar 18, 2010 (gmt 0)

If you want Google to start indexing your site again, then add it into:

[google.co.uk...]

It'll normally start crawling the site within the next few days and, as always, can take up to a month to show it in its search results.

martinibuster

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4088929 posted 9:12 am on Mar 18, 2010 (gmt 0)

phranque is correct. Re-read the original post. The OP states that they changed robots.txt to the version that allows bots to access the site, after first showing the wrong version. But that's not what is under discussion, despite what the title of this discussion says, which is the source of the confusion. Here is what the discussion is about:

So my question is: how long will the search engines take to notice my changed robots.txt?


This is why it's important to accurately describe what the discussion is about. ;)

As for the answer, keep building links. Bots revisit a site according to how many links point to it and how worth crawling its pages are. Robots.txt will not slow down how often a bot visits your site. So once a bot follows a link to your site, it will see the corrected robots.txt and proceed to index your site.

goodroi

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4088929 posted 12:11 pm on Mar 18, 2010 (gmt 0)

Google and the other search engines strongly prefer to crawl based on links, not URL submissions. I would not spend any time submitting a URL to a search engine. Even if a mistake has been made, you will be crawled automatically by the search engines as long as you have links pointing to the content.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved