starec

msg:1527156 | 3:21 pm on Jul 3, 2001 (gmt 0) |
Yes, Googlebot does understand and follow the instructions in robots.txt. I don't know anything about its past behavior regarding robots.txt.
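For reference, a crawler that follows robots.txt checks each URL against the file's rules before fetching it. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/private/` path, the `example.com` URLs and the `SomeOtherBot` agent are all hypothetical illustrations, not anything from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot is kept out of /private/,
# every other agent is allowed everywhere (empty Disallow).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# A compliant bot asks before every fetch.
print(parser.can_fetch("Googlebot", "http://www.example.com/private/page.html"))     # False
print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))            # True
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/private/page.html"))  # True
```

Note that this only models the "don't fetch" decision. How long an engine keeps already-indexed pages after a new Disallow line appears is a separate matter, and the posts below suggest it can take a while.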
|
Brett_Tabke

msg:1527157 | 3:24 pm on Jul 3, 2001 (gmt 0) |
They've gotten better this year. After you put a robots ban on something, expect 90 to 120 days for it to be removed from the Google system. They don't remove it right away. Nor will it stop them from spidering the pages banned by the robots.
|
Rusky

msg:1527158 | 3:28 pm on Jul 3, 2001 (gmt 0) |
Thanks. What about the first time you expose a site to its spidering, Brett? I know that you have had some problems in the past with sites that are already indexed.
|
optimizing123

msg:1527159 | 3:02 am on Jul 4, 2001 (gmt 0) |
It slipped up on my site about two months ago, even though robots.txt was in place and obeyed by the other SEs. So I think it is still unreliable.
|
Brett_Tabke

msg:1527160 | 8:35 am on Jul 4, 2001 (gmt 0) |
That doesn't appear too bad, Rusky. It was in the next cycle.
|
Rusky

msg:1527161 | 8:42 am on Jul 4, 2001 (gmt 0) |
OK, thanks for the info everybody
|
Son_House

msg:1527162 | 12:57 am on Jul 5, 2001 (gmt 0) |
> Nor will it stop them from spidering the pages banned by the robots.

Brett, any idea why they still spider banned pages? I would think it would be a waste of time and bandwidth for them. I recently added a number of pages to the banned list and was hoping they would not spider them anymore. Well, as long as they don't index them, that's what counts.
|
Brett_Tabke

msg:1527163 | 1:32 am on Jul 5, 2001 (gmt 0) |
Data mining.
|
2_much

msg:1527164 | 2:25 am on Jul 5, 2001 (gmt 0) |
I've never seen this happen. I thought banned sites stopped getting spidered. Every site of ours that has been spidered is either in the index or gets added in the next update. I had assumed that not getting spidered is the only indication that a site is banned. Son_House, how do you know that the pages are banned? The other issue I've seen is that pages with no inbound links aren't listed in the directory but are still spidered. As soon as they get inbound links, they're added to the database.
|
Brett_Tabke

msg:1527165 | 3:15 am on Jul 5, 2001 (gmt 0) |
We meant pages excluded by robots.txt, 2_much (not really banned, just blocked from indexing). On another note, it is interesting to block a nonexistent directory with robots.txt and watch some spiders try to spider it. You can really spot who isn't playing fair that way.
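The trap-directory trick can be sketched as a small log scan. This assumes a hypothetical trap path `/trap-dir/` that appears only in a `Disallow: /trap-dir/` line of robots.txt (never created, never linked) and an Apache-style "combined" access log. Because nothing links to the trap path, a client that requests it most likely learned about it from robots.txt and then ignored the rule. The sample log lines and bot names below are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical trap path: listed under "Disallow: /trap-dir/" in robots.txt
# but never created or linked, so only robots.txt readers know it exists.
TRAP_PREFIX = "/trap-dir/"

# Captures host, request method/path and user agent from an
# Apache/NCSA "combined" format log line.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def find_rule_breakers(log_lines, trap_prefix=TRAP_PREFIX):
    """Count trap-path requests per (host, user agent); non-matching lines are skipped."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("path").startswith(trap_prefix):
            hits[(match.group("host"), match.group("agent"))] += 1
    return hits

# Made-up sample lines using documentation-range IP addresses.
sample_log = [
    '192.0.2.10 - - [05/Jul/2001:03:15:00 +0000] "GET /robots.txt HTTP/1.0" 200 64 "-" "GoodBot/1.0"',
    '192.0.2.20 - - [05/Jul/2001:03:16:00 +0000] "GET /robots.txt HTTP/1.0" 200 64 "-" "RudeBot/0.9"',
    '192.0.2.20 - - [05/Jul/2001:03:16:05 +0000] "GET /trap-dir/ HTTP/1.0" 404 210 "-" "RudeBot/0.9"',
]

for (host, agent), count in find_rule_breakers(sample_log).items():
    print(f"{agent} from {host} requested the disallowed trap path {count} time(s)")
```

Grouping by both host and user agent helps separate a real crawler from something merely borrowing its name, since user-agent strings are easy to fake.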
|
Son_House

msg:1527166 | 5:58 am on Jul 5, 2001 (gmt 0) |
Thanks for the info Brett. 2_much, like Brett said, we were talking about pages added to the robots.txt that we don't want added to the index. I'm sorry I was not clearer on that. I need some rest :)
|