Forum Moderators: goodroi

Robots.txt

Does Google honour it....

     
2:43 pm on Jul 3, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 30, 2001
posts:191
votes: 0


Hi All

I read in the past that Google did not play the game and ignored robots.txt instructions. Is this still the case, or has it started behaving?

3:21 pm on July 3, 2001 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 17, 2001
posts:409
votes: 0


Yes, Googlebot understands and follows the instructions in robots.txt. I don't know anything about its past behavior regarding robots.txt.
3:24 pm on July 3, 2001 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


They've gotten better this year. After you put a robots ban on something, expect 90-120 days for it to be removed from the Google system. They don't remove it right away. Nor will it stop them from spidering the pages banned by the robots.
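
For anyone following along, a "robots ban" here is just a Disallow rule in robots.txt. The following is a minimal sketch, not anything Google publishes: the /private/ directory and the example.com URLs are made up for illustration, and the check uses Python's standard urllib.robotparser to show what a compliant spider should conclude from the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /private/ path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a compliant spider named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The banned directory versus an ordinary page, as seen by Googlebot.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/private/page.html"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html"))
```

This only tells you what the file asks for. Whether a given spider honours it, and how long already-indexed pages take to drop out, is a separate matter.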
3:28 pm on July 3, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 30, 2001
posts:191
votes: 0


Thanks.

What about the first time that you expose a site to its spidering, Brett? I know that you have had some problems in the past with sites that are already indexed.

optimizing123

3:02 am on July 4, 2001 (gmt 0)

Inactive Member
Account Expired

 
 


It slipped up on my site about 2 months ago, even though the robots.txt was in place and obeyed by other SEs. So I think it is still unreliable.
8:35 am on July 4, 2001 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


That way doesn't appear too bad, rusky. It was in the next cycle.
8:42 am on July 4, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 30, 2001
posts:191
votes: 0


OK, thanks for the info everybody
12:57 am on July 5, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 1, 2001
posts:183
votes: 0


> Nor will it stop them from spidering the pages banned by the robots.

Brett, any idea why they still spider banned pages? I would think it would be a waste of time and bandwidth for them. I recently added a number of pages to the banned list and was hoping they would not spider them anymore. Well, as long as they don't index them, that's what counts.

1:32 am on July 5, 2001 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


Data mining.
2:25 am on July 5, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 12, 2000
posts:1808
votes: 0


I've never seen this happen. I thought banned sites stop getting spidered. Any site that we have that has been spidered is either in the index or added in the next update.

I had assumed that not getting spidered is the only indication that a site is banned.

Son_House, how do you know that the pages are banned?

The other issue I've seen is that pages that have no inbound links aren't listed in the directory, but are still spidered. As soon as they get inbound links, then they're added to the database.

3:15 am on July 5, 2001 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


We meant pages excluded by robots.txt, 2_much (not banned really, just blocked from indexing).

On another note, it is interesting to block a nonexistent directory with a robots.txt and watch some spiders try to spider it. You can really spot who isn't playing fair that way.
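
A rough sketch of that trap, under some assumptions: /bot-trap/ is a made-up directory name that is disallowed in robots.txt, does not exist, and is never linked from anywhere. The server log is assumed to be in the common or combined log format, and the sample lines are invented. Any client that requests a path under the trap either ignored robots.txt or read it looking for ideas.

```python
import re
from collections import defaultdict

# Hypothetical trap directory. The matching robots.txt entry would be:
#   User-agent: *
#   Disallow: /bot-trap/
TRAP_PREFIX = "/bot-trap/"

# Common/combined log format:
#   ip ident user [time] "METHOD path proto" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def find_trap_hits(lines, trap_prefix=TRAP_PREFIX):
    """Map each client IP that requested a trap path to the user agents it sent."""
    hits = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(trap_prefix):
            # The agent field is missing in plain common log format.
            hits[match.group("ip")].add(match.group("agent") or "-")
    return dict(hits)


# Invented sample log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [05/Jul/2001:03:15:00 +0000] "GET /index.html HTTP/1.0" 200 1043 "-" "PoliteBot/1.0"',
    '192.0.2.77 - - [05/Jul/2001:03:16:12 +0000] "GET /bot-trap/secret.html HTTP/1.0" 404 209 "-" "RudeBot/0.9"',
]

if __name__ == "__main__":
    for ip, agents in find_trap_hits(SAMPLE_LOG).items():
        print(ip, sorted(agents))
```

Since nothing links to the trap directory, well-behaved spiders and ordinary visitors have no reason to ever request it, which is what makes a hit meaningful.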

5:58 am on July 5, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 1, 2001
posts:183
votes: 0


Thanks for the info Brett.

2_much, like Brett said, we were talking about pages added to the robots.txt that we don't want added to the index. I'm sorry I was not clearer on that. I need some rest :)