The problem is that what you're talking about is content that might amount to a few thousand static pages but, because it is created dynamically, ends up as a ridiculous number of URLs, and Google's index gets clogged.
A recent observation I made that may support your theory:
Session IDs Freshbot & Deepbot tend to favor [webmasterworld.com]
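The session-ID problem can be sketched in a few lines. When a session parameter is appended to every link, the same product page shows up under a new URL on each visit, so a few thousand pages can look like far more to a crawler. This is a minimal Python illustration, not anything Google has published: the parameter names in `SESSION_PARAMS` and the example.com URLs are hypothetical. It strips those parameters and sorts the rest so that equivalent URLs collapse to one.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session-ID parameter names; adjust for the site in question.
SESSION_PARAMS = {"sid", "sessionid", "session_id", "phpsessid"}

def canonicalize(url: str) -> str:
    """Drop session-ID parameters and sort the remaining ones so that
    URLs serving the same content map to a single string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()
    # Lower-case the host (hostnames are case-insensitive), keep the path,
    # and drop any #fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))

if __name__ == "__main__":
    crawled = [
        "http://example.com/product.asp?id=123&sid=A1B2",
        "http://example.com/product.asp?sid=Z9Y8&id=123",
        "http://example.com/product.asp?id=123",
        "http://example.com/product.asp?id=124&sid=A1B2",
    ]
    distinct = {canonicalize(u) for u in crawled}
    print(len(crawled), "URLs ->", len(distinct), "distinct pages")
```

Run as a script, the four sample URLs reduce to two distinct pages. Whether any particular search engine normalizes URLs this way is not something this thread establishes.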
Somewhere on Webmasterworld.com, I read that it was 1.5 months.
What I feel is that it's almost unknown. Can GoogleGuy comment on this? It would really help SEO consultants to know what is happening.
We just keep giving vague ideas to our clients. Does anybody else think the same?
I seem to think something new is coming, but don't I think that every month?
The idea behind the site was that with 11,000 pages of fairly unique content, we'd blow the competition away.
Are you saying that this strategy might be dead? So far we've had about 20 static pages showing temporarily in Google and the site went live 2 weeks ago.
Even if G just picks up my 500 static pages, I'll be very happy. Someone said elsewhere that it took 3-4 months to get all their dynamic pages spidered, so I've passed that nugget on to the client.
It's hard to give good advice right now 'cos, as you say, only God and GG (and maybe not even GG!) know :)
(This way the spider avoids looping on the page)
Granted, that advice was for SEs in general rather than Deepbot specifically. Is it also good advice for Deepbot?
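The advice being quoted is not reproduced here, but the looping it mentions is easy to illustrate. In this minimal Python sketch, a breadth-first spider remembers every URL it has already queued, so a page that links to itself (or a cycle of pages) is fetched only once. The site map in `SITE` is a made-up, in-memory stand-in for real HTTP fetches.

```python
from collections import deque

# Hypothetical in-memory site: each URL maps to the URLs it links to.
# "/product.asp?id=1" links back to itself, which could trap a naive spider.
SITE = {
    "/": ["/products.asp", "/about.htm"],
    "/products.asp": ["/product.asp?id=1", "/product.asp?id=2", "/"],
    "/product.asp?id=1": ["/product.asp?id=1", "/products.asp"],
    "/product.asp?id=2": ["/products.asp"],
    "/about.htm": ["/"],
}

def crawl(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl that records every URL it has queued,
    so self-links and cycles are visited only once."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        order.append(url)  # "fetch" the page
        for target in links.get(url, []):
            if target not in seen:  # skip anything already queued
                seen.add(target)
                queue.append(target)
    return order
```

`crawl("/", SITE)` visits each of the five pages exactly once. A visited set only helps when URLs repeat exactly, though: if every link carries a fresh session ID, each one looks new and the spider can keep going.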
I'm using a cheap Windows host for this, so no mod_rewrite. There's the q.asp component, but is there an easy way of achieving the same thing if you don't have IIS/server access? E.g., ASP code I can run on my dynamic template page at .com/product.asp?id=123 to turn it into ../product/123.html.
TIA
J
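Whatever mechanism ends up serving the pages, the URL mapping itself is simple. This Python sketch shows only the two-way translation between `/product.asp?id=123` and `/product/123.html`; it is not ASP, and it assumes both paths sit at the site root. Wiring it into IIS without server access is the hard part. One workaround sometimes suggested for shared hosts is a custom 404 page that parses the requested path, but that assumes your host lets you set one.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, parse_qs

# Assumed URL shapes, both relative to the site root.
DYNAMIC_PATH = "/product.asp"
STATIC_RE = re.compile(r"/product/([0-9]+)\.html")

def to_static(dynamic_url: str) -> Optional[str]:
    """/product.asp?id=123 -> /product/123.html (None if it doesn't match)."""
    parts = urlsplit(dynamic_url)
    if parts.path != DYNAMIC_PATH:
        return None
    ids = parse_qs(parts.query).get("id", [])
    # Accept exactly one purely numeric id.
    if len(ids) != 1 or not re.fullmatch(r"[0-9]+", ids[0]):
        return None
    return f"/product/{ids[0]}.html"

def to_dynamic(static_path: str) -> Optional[str]:
    """/product/123.html -> /product.asp?id=123 (None if it doesn't match)."""
    match = STATIC_RE.fullmatch(static_path)
    if match is None:
        return None
    return f"{DYNAMIC_PATH}?id={match.group(1)}"
```

For example, `to_static("/product.asp?id=123")` gives `/product/123.html`, and `to_dynamic` turns it back. The template page would emit links in the static form, and whatever handles the static request would use the reverse mapping to find the product ID.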
216.239.45.4 - - [05/Jun/2003:23:19:54 -0400] "GET XXXX.htm HTTP/1.0" 200 30034 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.45.4 - - [05/Jun/2003:23:19:56 -0400] "GET XXXX.htm HTTP/1.0" 200 30034 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.45.4 - - [05/Jun/2003:23:20:00 -0400] "GET XXXX.htm HTTP/1.0" 200 30034 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.45.4 - - [05/Jun/2003:23:20:00 -0400] "GET XXXX.htm HTTP/1.0" 200 30034 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
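To pull apart lines like these (for example, to confirm that the same page was fetched four times in six seconds), a small parser for the Apache combined log format is enough. This Python sketch assumes the server writes the standard combined format shown here; the regular expression is illustrative and won't handle every edge case, such as escaped quotes inside the user-agent.

```python
import re
from datetime import datetime
from typing import Optional

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[dict]:
    """Split one combined-format log line into named fields (None if it doesn't match)."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # e.g. "05/Jun/2003:23:19:54 -0400" -> timezone-aware datetime
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry

if __name__ == "__main__":
    agent = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
    first = parse_line('216.239.45.4 - - [05/Jun/2003:23:19:54 -0400] '
                       f'"GET XXXX.htm HTTP/1.0" 200 30034 "-" "{agent}"')
    last = parse_line('216.239.45.4 - - [05/Jun/2003:23:20:00 -0400] '
                      f'"GET XXXX.htm HTTP/1.0" 200 30034 "-" "{agent}"')
    print((last["time"] - first["time"]).total_seconds(), "seconds apart")
```

Grouping parsed entries by `host` and `agent` is then enough to see which IPs are hitting which pages, and how often.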
And the bad news: she didn't stay very long.
It's a 2nd level page. A PR0 casualty, probably a real PR4.
You're right that it was the same page (URL replaced with XXXX as a forum courtesy).
I looked for the deep crawler IP range with the site search, but couldn't find it.
Been so long since I've seen anything starting with 216.239.
...did I jump the gun with undue excitement? :)
If it's not a deepie, then what is it?
216.239.45.4 - - [03/Apr/2003:11:18:13 -0800] "GET /XXXXXX HTTP/1.1" 200 22811 "http://www.corp.google.com/cgi-bin/asolovyova/feeds.cgi?id=FROOGLEID&user=dscotton&ticket=1943517&prompt=true&submit=Submit" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2.1) Gecko/20021130"
It is not Googlebot, and I don't care what the user-agent says. For over a year this IP has been visiting my site about once a week, picking up tidbits that a bot never does: GIFs, a Java applet, etc. This is a live human. They even key characters into forms to do searches. If Google's robots were this smart, their index would be up to date by now.
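The reasoning here (a crawler doesn't fetch GIFs or applets, and doesn't type into search forms) can be written down as a crude per-visitor check. This Python sketch is only a heuristic built on that observation: the extension list and the `/search` path are hypothetical, and a crawler that fetches page resources would defeat it.

```python
from urllib.parse import urlsplit

# Illustrative list of "page furniture" a text-only crawler typically skipped.
ASSET_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".class", ".jar")
# Hypothetical path of the site's search-form handler.
SEARCH_PATH = "/search"

def looks_human(requested: list[str]) -> bool:
    """Heuristic: a visitor that fetched images/applets or ran a site search
    behaves like a browser with a person behind it, not a crawler."""
    for url in requested:
        path = urlsplit(url).path.lower()  # ignore any query string
        if path.endswith(ASSET_EXTENSIONS) or path == SEARCH_PATH:
            return True
    return False
```

Feed it the list of URLs one IP requested in a session; `looks_human(["/index.htm", "/img/logo.gif"])` is `True`, while a visitor that only pulled HTML pages comes back `False`.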
If you're cloaking for Google based on the user-agent, you'd be better off reverse-resolving the visitor's IP address and cloaking for hosts under googlebot.com, but not google.com.
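A minimal Python sketch of the reverse-resolving check, assuming the rule stated here (trust `googlebot.com`, not `google.com`). The forward lookup that confirms the hostname maps back to the same IP is an extra safeguard not mentioned in the post; it stops someone from simply pointing their own reverse DNS at a googlebot.com name. The resolver functions are parameters so the check can be tested without network access. The hostnames Google's crawlers use may have changed since this thread, so treat the domain as an assumption.

```python
import socket
from typing import Callable, List

def _reverse(ip: str) -> str:
    # PTR lookup: IP address -> hostname
    return socket.gethostbyaddr(ip)[0]

def _forward(host: str) -> List[str]:
    # A lookup: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(host)[2]

def is_googlebot(ip: str,
                 reverse: Callable[[str], str] = _reverse,
                 forward: Callable[[str], List[str]] = _forward) -> bool:
    """True only if the IP reverse-resolves to a googlebot.com host
    and that host resolves forward to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    if not host.endswith(".googlebot.com"):
        return False  # google.com and everything else are rejected
    try:
        return ip in forward(host)
    except OSError:
        return False
```

DNS lookups are slow, so in practice the result would be cached per IP rather than computed on every request.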
Another one that behaves the same way is 216.239.44.189.
All of the deep bot URLs I've ever seen have been 216.239.46.X, and I haven't seen it for weeks now.
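Checking a log IP against that observed range is a one-liner with Python's standard `ipaddress` module. The `216.239.46.0/24` block below comes only from the observation above, not from any published Google list, and `216.239.46.20` is a made-up sample address.

```python
import ipaddress

# Range taken from the forum observation only; not an official Google list.
DEEPBOT_NET = ipaddress.ip_network("216.239.46.0/24")

def in_deepbot_range(ip: str) -> bool:
    """True if the address falls inside 216.239.46.0 - 216.239.46.255."""
    return ipaddress.ip_address(ip) in DEEPBOT_NET

if __name__ == "__main__":
    for ip in ("216.239.46.20", "216.239.45.4", "216.239.44.189"):
        print(ip, in_deepbot_range(ip))
```

By that rule, both 216.239.45.4 and 216.239.44.189 from this thread fall outside the range.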
And finally, Freshbot has started to crawl pages I've had up for about a year now :-)))
Backlinks are still the same as they were nine months ago.
(It is a very, very specific site)
*bigfriendlyletters* DON'T PANIC */bigfriendlyletters* ;-)
73,
Captain / Austria