

Sitemaps, Meta Data, and robots.txt Forum

Robots.txt
Does Google honour it?
Rusky
10+ Year Member
Msg#: 156 posted 2:43 pm on Jul 3, 2001 (gmt 0)

Hi All

I've read in the past that Google did not play the game and ignored robots.txt instructions. Is this still the case, or has it started behaving?

starec
10+ Year Member
Msg#: 156 posted 3:21 pm on Jul 3, 2001 (gmt 0)

Yes, Googlebot understands and follows robots.txt instructions. I don't know anything about its past behavior regarding robots.txt.
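For anyone wondering what "following robots.txt" means in practice, here is a minimal sketch using Python's standard-library urllib.robotparser. It is not Googlebot's own implementation, and the robots.txt content, the /drafts/ and /cgi-bin/ paths, the bot name OtherBot and the example.com URLs are all hypothetical. It shows a compliant crawler checking each URL against the group for its user agent before fetching. Note that in this parser a bot with its own group (here Googlebot) uses only that group and does not also apply the rules under "User-agent: *".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /cgi-bin/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler named user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network fetch
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "http://www.example.com/drafts/page.html"),  # blocked by its own group
        ("Googlebot", "http://www.example.com/cgi-bin/search"),    # the * group is not applied to Googlebot
        ("OtherBot", "http://www.example.com/cgi-bin/search"),     # blocked by the * group
        ("OtherBot", "http://www.example.com/drafts/page.html"),   # allowed: /drafts/ is only blocked for Googlebot
    ]
    for agent, url in checks:
        verdict = "allowed" if allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent:10} {url}: {verdict}")
```

The check only tells a crawler what it may fetch; whether and when an already indexed page drops out of a search engine's index is a separate matter.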

Brett_Tabke
WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 156 posted 3:24 pm on Jul 3, 2001 (gmt 0)

They've gotten better this year. After you put a robots ban on something, expect 90-120 days for it to be removed from the Google system. They don't remove it right away. Nor will it stop them from spidering the pages banned by the robots.

Rusky
10+ Year Member
Msg#: 156 posted 3:28 pm on Jul 3, 2001 (gmt 0)

Thanks.

What about the first time you expose a site to its spidering, Brett? I know you have had some problems in the past with sites that were already indexed.

optimizing123
10+ Year Member
Msg#: 156 posted 3:02 am on Jul 4, 2001 (gmt 0)

It slipped up on my site about two months ago, even though robots.txt was in place and was obeyed by other search engines. So I think it is still unreliable.

Brett_Tabke
WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 156 posted 8:35 am on Jul 4, 2001 (gmt 0)

That way doesn't appear too bad, Rusky. It was in the next cycle.

Rusky
10+ Year Member
Msg#: 156 posted 8:42 am on Jul 4, 2001 (gmt 0)

OK, thanks for the info, everybody.

Son_House
10+ Year Member
Msg#: 156 posted 12:57 am on Jul 5, 2001 (gmt 0)

> Nor will it stop them from spidering the pages banned by the robots.

Brett, any idea why they still spider banned pages? I would think it would be a waste of time and bandwidth for them. I recently added a number of pages to the banned list and was hoping they would not spider them anymore. Well, as long as they don't index them, that's what counts.
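One way to check whether a bot is still fetching pages you have excluded is to compare your access log against your robots.txt. The sketch below assumes Apache's "combined" log format, a hypothetical robots.txt that excludes /drafts/ for Googlebot, and made-up log lines using documentation IP addresses. It only flags requests whose user-agent string contains the given token, and user-agent strings can be faked, so a hit is something to investigate rather than proof.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumes the Apache/NCSA "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical robots.txt and log lines, for illustration only.
ROBOTS_TXT = "User-agent: Googlebot\nDisallow: /drafts/\n"
SAMPLE_LOG = [
    '192.0.2.1 - - [05/Jul/2001:00:12:01 +0000] "GET /drafts/old.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.1 - - [05/Jul/2001:00:12:05 +0000] "GET /index.html HTTP/1.0" 200 8042 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '198.51.100.7 - - [05/Jul/2001:00:13:00 +0000] "GET /drafts/old.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
]


def disallowed_hits(robots_txt, log_lines, bot_token="Googlebot"):
    """Yield (time, path, status) for requests by bot_token to URLs robots.txt disallows for it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for line in log_lines:
        m = LOG_LINE.match(line)
        # Skip unparseable lines and requests from other user agents.
        if not m or bot_token.lower() not in m["agent"].lower():
            continue
        if not parser.can_fetch(bot_token, m["path"]):
            yield m["time"], m["path"], m["status"]


if __name__ == "__main__":
    for when, path, status in disallowed_hits(ROBOTS_TXT, SAMPLE_LOG):
        print(f"[{when}] {path} -> {status}")
```

With the sample data only the first line is reported: the /index.html fetch is allowed, and the /drafts/ request from the browser is ignored because it is not Googlebot.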

Brett_Tabke
WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 156 posted 1:32 am on Jul 5, 2001 (gmt 0)

Data mining.

2_much
WebmasterWorld Senior Member, 10+ Year Member
Msg#: 156 posted 2:25 am on Jul 5, 2001 (gmt 0)

I've never seen this happen. I thought banned sites stopped getting spidered. Every one of our sites that has been spidered is either in the index or gets added in the next update.

I had assumed that not getting spidered is the only indication that a site is banned.

Son_House, how do you know that the pages are banned?

The other thing I've seen is that pages with no inbound links aren't listed in the directory but are still spidered. As soon as they get inbound links, they're added to the database.

Brett_Tabke
WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 156 posted 3:15 am on Jul 5, 2001 (gmt 0)

We meant pages excluded by robots.txt, 2_much (not really banned, just blocked from indexing).

On another note, it is interesting to block a nonexistent directory in robots.txt and watch which spiders try to crawl it. You can really spot who isn't playing fair that way.
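The trap idea can be sketched as follows, assuming a robots.txt containing "User-agent: *" and "Disallow: /no-such-dir/", where /no-such-dir/ is a hypothetical directory that is never created and never linked. Since nothing points there except robots.txt itself, a request for it most likely came from a client that read the file and ignored it. The log format (Apache combined), the bot names PoliteBot and RudeBot, and the log lines are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical trap directory: disallowed in robots.txt, never linked, never created.
TRAP_PREFIX = "/no-such-dir/"

# Assumes the Apache/NCSA "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

SAMPLE_LOG = [
    '192.0.2.10 - - [05/Jul/2001:03:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 52 "-" "PoliteBot/1.0"',
    '198.51.100.20 - - [05/Jul/2001:03:01:00 +0000] "GET /robots.txt HTTP/1.0" 200 52 "-" "RudeBot/0.9"',
    '198.51.100.20 - - [05/Jul/2001:03:01:02 +0000] "GET /no-such-dir/ HTTP/1.0" 404 210 "-" "RudeBot/0.9"',
    '198.51.100.20 - - [05/Jul/2001:03:01:03 +0000] "GET /no-such-dir/index.html HTTP/1.0" 404 210 "-" "RudeBot/0.9"',
]


def trap_offenders(log_lines, trap_prefix=TRAP_PREFIX):
    """Count requests into the trap directory, keyed by (host, user agent)."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m["path"].startswith(trap_prefix):
            hits[(m["host"], m["agent"])] += 1
    return hits


if __name__ == "__main__":
    for (host, agent), count in trap_offenders(SAMPLE_LOG).items():
        print(f"{host} {agent!r}: {count} trap request(s)")
```

Here PoliteBot reads robots.txt and stays away, while RudeBot reads it and then requests the blocked directory twice.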

Son_House
10+ Year Member
Msg#: 156 posted 5:58 am on Jul 5, 2001 (gmt 0)

Thanks for the info Brett.

2_much, as Brett said, we were talking about pages listed in robots.txt that we don't want added to the index. Sorry I wasn't clearer on that. I need some rest :)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved