
Robots.txt

Does Google honour it?

     

Rusky

2:43 pm on Jul 3, 2001 (gmt 0)

10+ Year Member



Hi All

I have read that in the past Google did not play the game and ignored robots.txt instructions. Is this still the case, or has it started behaving?

starec

3:21 pm on Jul 3, 2001 (gmt 0)

10+ Year Member



Yes, Googlebot understands and follows robots.txt instructions. I don't know anything about its past behavior regarding robots.txt.

Brett_Tabke

3:24 pm on Jul 3, 2001 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



They've gotten better this year. After you put a robots.txt ban on something, expect 90-120 days for it to be removed from the Google system. They don't remove it right away. Nor will it stop them from spidering the pages banned by robots.txt.
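For readers new to the mechanics: robots.txt is a list of rules that a compliant crawler consults before it requests a URL. The sketch that follows uses Python's standard `urllib.robotparser` module to show that check. The robots.txt contents, the `www.example.com` URLs, and the helper names `build_parser` and `may_fetch` are hypothetical, and this is not Googlebot's actual implementation; it only illustrates what "honouring" robots.txt means for a well-behaved spider.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com: everything under /private/
# is excluded for all user agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt content from a string, without any network fetch."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return rp


def may_fetch(rp, user_agent, url):
    """Return True if a compliant crawler named user_agent may request url."""
    return rp.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/report.html"):
        print(url, may_fetch(rp, "Googlebot", url))
```

Running it prints `True` for `/index.html` and `False` for `/private/report.html`. Note that this check only governs fetching; as described in the thread, removal of already-indexed pages can lag well behind a robots.txt change.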

Rusky

3:28 pm on Jul 3, 2001 (gmt 0)

10+ Year Member



Thanks.

What about the first time you expose a site to its spidering, Brett? I know you have had some problems in the past with sites that were already indexed.

optimizing123

3:02 am on Jul 4, 2001 (gmt 0)

10+ Year Member



It slipped up on my site about two months ago, even though robots.txt was in place and obeyed by other SEs. So I think it is still unreliable.

Brett_Tabke

8:35 am on Jul 4, 2001 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



That way doesn't appear too bad, Rusky. It was in the next cycle.

Rusky

8:42 am on Jul 4, 2001 (gmt 0)

10+ Year Member



OK, thanks for the info, everybody.

Son_House

12:57 am on Jul 5, 2001 (gmt 0)

10+ Year Member



> Nor will it stop them from spidering the pages banned by the robots.

Brett, any idea why they still spider banned pages? I would think it would be a waste of time and bandwidth for them. I recently added a number of pages to the banned list and was hoping they would not spider them anymore. Well, as long as they don't index them, that's what counts.

Brett_Tabke

1:32 am on Jul 5, 2001 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Data mining.

2_much

2:25 am on Jul 5, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've never seen this happen. I thought banned sites stop getting spidered. Any site that we have that has been spidered is either in the index or added in the next update.

I had assumed that not getting spidered is the only indication that a site is banned.

Son_House, how do you know that the pages are banned?

The other issue I've seen is that pages that have no inbound links aren't listed in the directory, but are still spidered. As soon as they get inbound links, then they're added to the database.

Brett_Tabke

3:15 am on Jul 5, 2001 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



We meant pages excluded by robots.txt, 2_much (not banned really, just blocked from indexing).

On another note, it is interesting to block a nonexistent directory in robots.txt and watch which spiders try to crawl it. You can really spot who isn't playing fair that way.
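Brett's trap can be checked with a short log scan. Because nothing on the site links to the blocked directory, a request for it suggests the client learned the path from robots.txt and then ignored the Disallow line. This is a minimal sketch assuming Apache's "combined" access-log format and a hypothetical trap path `/no-such-dir/` listed in robots.txt as `Disallow: /no-such-dir/`; the function name `find_trap_hits` and the sample log lines are also made up for illustration.

```python
import re
from collections import Counter

# Hypothetical trap directory: it does not exist on the server and is
# listed only in robots.txt, e.g. "Disallow: /no-such-dir/".
TRAP_PREFIX = "/no-such-dir/"

# Assumes Apache "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def find_trap_hits(log_lines, trap_prefix=TRAP_PREFIX):
    """Count requests per (host, user-agent) for any path under trap_prefix.

    Lines that do not match the combined format are skipped.
    """
    hits = Counter()
    for line in log_lines:
        m = COMBINED.match(line)
        if m and m.group("path").startswith(trap_prefix):
            hits[(m.group("host"), m.group("agent"))] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample log lines, for illustration only.
    sample = [
        '192.0.2.10 - - [05/Jul/2001:03:20:11 +0000] "GET /robots.txt HTTP/1.0" 200 64 "-" "BadBot/1.0"',
        '192.0.2.10 - - [05/Jul/2001:03:20:12 +0000] "GET /no-such-dir/ HTTP/1.0" 404 210 "-" "BadBot/1.0"',
        '192.0.2.20 - - [05/Jul/2001:03:21:00 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1"',
    ]
    for (host, agent), count in find_trap_hits(sample).most_common():
        print(f"{host}\t{agent}\t{count} trap request(s)")
```

Each output line is a client that requested the trap path; a spider that honours robots.txt should never show up in this list.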

Son_House

5:58 am on Jul 5, 2001 (gmt 0)

10+ Year Member



Thanks for the info Brett.

2_much, like Brett said, we were talking about pages added to the robots.txt that we don't want added to the index. I'm sorry I was not clearer on that. I need some rest :)
