Once hijacked/penalized site now getting many deep crawls

How long til titles/descriptions appear?

crobb305

9:08 pm on Apr 30, 2005 (gmt 0)

Well, my once penalized/hijacked website has started getting hit hard by Gbot in the past week and a half. In 10 days, I have had deep crawls (100% of pages spidered) on 4 separate days, including last night. Prior to my reinclusion request 2 weeks ago, only robots.txt and the index page had been spidered for months.

My pages are still listed as url only, so does anyone know what the deep crawls indicate, and when title/desc may appear in serps?

Thanks!

AlexK

10:15 pm on May 8, 2005 (gmt 0)

walkman:

Slurp indexed them all...a day after letting the spiders in. I was shocked

Annoyed at not having proper stats ... copied the logs across ... installed AWStats. Now I'm shocked ... first report is for 4+ days (1-6 May) ... Search bots account for 30% of the hits and 23% of the bandwidth.

It gets worse. Bots from Abovenet Communications are crawling all over my site; they fill the top-10 Hosts. Even though they are clearly bots, AWStats reports them as (human) visitors (1+ months of reports):

64.124.85.79.become.com - 656/656 - 12.05 MB - 05 May 2005 - 13:35
64.124.85.76.become.com - 616/616 - 11.05 MB - 05 May 2005 - 13:43
64.124.85.78.become.com - 525/525 - 9.57 MB - 05 May 2005 - 11:39
64.124.85.206.become.com - 518/518 - 9.40 MB - 05 May 2005 - 12:18
64.124.85.207.become.com - 513/513 - 10.31 MB - 05 May 2005 - 12:14
64.124.85.205.become.com - 421/421 - 7.25 MB - 05 May 2005 - 13:28
64.124.85.77.become.com - 417/417 - 6.94 MB - 05 May 2005 - 13:40
64.124.85.202.become.com - 380/380 - 8.74 MB - 05 May 2005 - 13:08
64.124.85.204.become.com - 355/355 - 7.12 MB - 05 May 2005 - 13:31
64.124.85.203.become.com - 336/336 - 5.74 MB - 05 May 2005 - 10:26

There are actually 17 different IPs. Abovenet has 2 x Class-B IP ranges (64.124.0.0/15) and a fair proportion of them was devoted to hammering my site on May 4/5. All perfectly legal, of course. How much of a benefit this was to my site is a moot point.
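
For anyone who wants to check whether a given address from their own logs sits inside that /15 (which spans 64.124.0.0 through 64.125.255.255, i.e. two old Class-B blocks), here is a minimal sketch. The function name in_cidr is made up for illustration, and it assumes plain dotted-quad IPv4 input and a POSIX awk:

```shell
# in_cidr IP NETWORK/BITS
# Exit status 0 if IP lies inside the CIDR block, 1 otherwise.
# Hypothetical helper; handles IPv4 dotted-quad addresses only.
in_cidr() {
    awk -v ip="$1" -v cidr="$2" '
        # Convert a dotted-quad address to a single integer.
        function to_int(addr,    o) {
            split(addr, o, ".")
            return o[1] * 16777216 + o[2] * 65536 + o[3] * 256 + o[4]
        }
        BEGIN {
            split(cidr, parts, "/")
            size = 2 ^ (32 - parts[2])      # number of addresses in the block
            # Two addresses share a block when they fall in the same
            # size-aligned slice of the address space.
            same = (int(to_int(ip) / size) == int(to_int(parts[1]) / size))
            exit same ? 0 : 1
        }'
}
```

Usage would be along the lines of: in_cidr 64.124.85.79 64.124.0.0/15 && echo "inside the block"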

It has taken quite some time, but the scale of bot-scraping current at this moment is beginning to sink in.

walkman

11:17 pm on May 8, 2005 (gmt 0)

"It has taken quite some time, but the scale of bot-scraping current at this moment is beginning to sink in."

Some j-rk with Wget almost crashed my server last night. I have about 50 sitemaps (500+ links each) that take about 5 seconds and a lot of CPU time, and he kept pulling them about 10-15 times a second. The same file was pulled over 30 times by the time I stopped it by banning his IP. My CPU was at 99% constantly...

On Google's front: I'm back in the cache, but still can't find anything by searching (not even the longest phrase, or www.name.com) but I'll give it a few more days. The cache was from last night so it may need a day or two to spread. All the pages linked from the homepage were indexed, so I just added a random feature to pull 10-15 random pages each time. Hopefully in a week, everything will be indexed.

theBear

12:32 am on May 9, 2005 (gmt 0)

AlexK,

Assuming the site in question is the one in your profile.

Do you also have the same content on another domain like .co.uk?

I went searching for some pages from your site and some of them actually appear in the serps but are marked supplemental (possibly duplicate content?).

It however looks like Google is slowly doing something with your pages.

You might also want to look at the various pages produced by the search script you are using, it looks like title issues.

I've got to actually hit the correct keys.

AlexK

3:59 am on May 9, 2005 (gmt 0)

walkman:

some j-rk with Wget ... (50 sitemaps) kept pulling them about 10-15 times a second ... pulled over 30 times by the time I stopped it by banning his IP. My CPU was at 99% constantly...

Look at this bot-blocking thread [webmasterworld.com] (msg 25). Also keep your eye on the thread, as I will be uploading a revised algorithm soon-ish.
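
This is not the algorithm from the linked thread, only a rough after-the-fact sketch for spotting the kind of hammering described above. It assumes an Apache common/combined format access log (IP in field 1, timestamp in field 4); the function name busiest_seconds and the default threshold of 10 are made up for illustration:

```shell
# busiest_seconds LOGFILE [THRESHOLD]
# Prints "count ip timestamp" for every (IP, second) pair that made at
# least THRESHOLD requests (default 10), busiest first.
# Hypothetical helper; assumes common/combined log format.
busiest_seconds() {
    awk -v min="${2:-10}" '
        {
            ts = $4
            sub(/^\[/, "", ts)          # "[18/Apr/2005:14:24:28" -> "18/Apr/2005:14:24:28"
            hits[$1 " " ts]++           # count requests per IP per second
        }
        END {
            for (key in hits)
                if (hits[key] >= min)
                    print hits[key], key
        }' "$1" | sort -rn
}
```

Something like busiest_seconds /var/log/httpd/access_log 10 would then list any IP that managed ten or more requests within a single second, which is the pattern a runaway Wget produces.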

AlexK

4:21 am on May 9, 2005 (gmt 0)

theBear:

Assuming the site in question is the one in your profile

It is.

Do you also have the same content on another domain like .co.uk?

Same content, no. Another domain, yes. However, html-only site, no dupe issues.

look at the various pages produced by the search script you are using, it looks like title issues.

That is very useful advice, thank you (never thought of the obvious).

Now, if you can only work out why so many of the site:mysite.com searches are url-only you will be my saviour! No title issues there (erm, I think).

BTW: isn't The Bear the creature that classically gets 'a sore head'? Is that why you chose your nick?! I am getting towards your (assumed) age, but no genetic gift of arthritis for me yet. Heart attacks and a wayward imagination - yes. Arthritis - no.

Reid

9:40 am on May 9, 2005 (gmt 0)

AlexK

I took a look at your site. It looks to me like all is well.

6. Where is my page's title?
Unlike many search engines, Googlebot can return results for pages that are known but haven't been crawled yet. Since we haven't looked at those pages yet, their titles aren't shown; the Google results page displays the URL instead.
[google.com...]
Your pages are dynamically generated. We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index.
[google.com...]
It looks to me like you have changed server location (or something) and it is just taking time to index it.

AlexK

11:09 am on May 9, 2005 (gmt 0)

Reid:

It looks to me like you have changed server location (or something) and it is just taking time to index it.

The .com site has been in its current format since 10 Nov 2002 with the same domain name throughout. Google took 3,035 pages last month (25 MB), and has crawled it virtually every day since conception. If they haven't got every page by now they never will.

I took a look at your site. It looks to me like all is well.

Well, I may agree with you (and thanks for looking), but a site:mysite.com Google search shows just 13 of the first 100 results with title + snippet. The other 87 are url-only. It also knows of 12,700 pages but only shows 908. Visitors from G have fallen by ~90% since last Dec. So, G thinks different.

AlexK

11:28 am on May 9, 2005 (gmt 0)

It gets worse. Bots from Abovenet Communications are crawling all over my site

Continuing the theme of heavy bot-crawls, how about the following (check out the timings):

fgrep "GET /robots.txt" /var/log/httpd/* | less
...
207.241.230.217 - - [18/Apr/2005:14:24:28 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.220 - - [18/Apr/2005:14:24:30 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.230.228 - - [18/Apr/2005:14:24:32 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.196 - - [18/Apr/2005:14:25:10 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.205 - - [18/Apr/2005:14:25:20 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.230.152 - - [18/Apr/2005:14:25:20 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.222 - - [18/Apr/2005:14:25:22 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.201 - - [18/Apr/2005:14:25:22 +0100] "GET /robots.txt HTTP/1.0" 301 326 "-" "ia_archiver-web.archive.org"
207.241.228.201 - - [18/Apr/2005:14:25:22 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.229.193 - - [18/Apr/2005:14:25:23 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.231.242 - - [18/Apr/2005:14:25:24 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.225 - - [18/Apr/2005:14:25:25 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.210 - - [18/Apr/2005:14:25:26 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.216 - - [18/Apr/2005:14:25:27 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.208 - - [18/Apr/2005:14:25:28 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.197 - - [18/Apr/2005:14:25:30 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.227 - - [18/Apr/2005:14:25:30 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.209 - - [18/Apr/2005:14:25:33 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.135 - - [18/Apr/2005:14:25:33 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.228 - - [18/Apr/2005:14:25:35 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.213 - - [18/Apr/2005:14:25:35 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.134 - - [18/Apr/2005:14:25:39 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
84.104.217.38 - - [18/Apr/2005:14:25:43 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "appie 1.1 (www.walhello.com)"
207.241.229.160 - - [18/Apr/2005:14:26:15 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.229.151 - - [18/Apr/2005:14:26:17 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.211 - - [18/Apr/2005:14:26:19 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.231.241 - - [18/Apr/2005:14:26:21 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.207 - - [18/Apr/2005:14:26:26 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.226 - - [18/Apr/2005:14:26:26 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.231 - - [18/Apr/2005:14:26:28 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.230.219 - - [18/Apr/2005:14:26:29 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.193 - - [18/Apr/2005:14:26:32 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.214 - - [18/Apr/2005:14:26:32 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.230.149 - - [18/Apr/2005:14:26:34 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.194 - - [18/Apr/2005:14:26:36 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.231.240 - - [18/Apr/2005:14:26:37 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"
207.241.228.202 - - [18/Apr/2005:14:26:38 +0100] "GET /robots.txt HTTP/1.0" 200 23 "-" "ia_archiver-web.archive.org"

I believe that British soccer-players call this "roasting".
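
To put numbers on a burst like the one above, a sketch along these lines counts robots.txt requests and distinct requesting IPs per minute. It assumes the same Apache log format as the listing; the function name robots_per_minute is made up for illustration:

```shell
# robots_per_minute LOGFILE...
# Prints "minute requests distinct_ips" for robots.txt fetches.
# Hypothetical helper; assumes common/combined log format. The final sort
# is plain text order, so it is only chronological within a single day.
robots_per_minute() {
    awk '
        index($0, "\"GET /robots.txt ") {
            minute = substr($4, 2, 17)      # "[18/Apr/2005:14:24:28" -> "18/Apr/2005:14:24"
            reqs[minute]++
            if (!seen[minute, $1]++)        # first time this IP shows up in this minute
                ips[minute]++
        }
        END {
            for (m in reqs)
                print m, reqs[m], ips[m]
        }' "$@" | sort
}
```

A high request count spread over nearly as many distinct IPs, as in the listing, points to a distributed crawler where each node fetches robots.txt for itself.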

theBear

1:14 pm on May 9, 2005 (gmt 0)

AlexK,

About the internet archive bot.

Send them an email showing the information from your log and politely ask them to look into the matter.

Explain to them that your next step if the bot isn't controlled will be a ban of the bot, but you are willing to allow them some time to make a change on their end first.

I've had very good luck with them. They listen and act when presented with hard evidence.

Now I sent you a sticky back a bit.

If I have further information I may be able to provide more help.

But I'll need specifics that don't work well when exemplified for the board.

AlexK

3:26 pm on May 9, 2005 (gmt 0)

theBear:

Now I sent you a sticky back a bit.

Stickies aren't working. I reported it yesterday.

'Recent Posts' aren't working either (I was looking at someone else's posts - not sure whose).

Reid

4:17 am on May 11, 2005 (gmt 0)

AlexK, I rechecked your site.
I didn't know the history last time, so it seemed like it was just taking its time.

There is something funny but I'm not sure what it is.
When I use 'poodle predictor' on your site it seems to render some strange results.

Type in w*w.yoursite.com
and it seems to have numerous links to itself.
I don't know what's causing it, but this could be the root of your problems.
This could explain the multiple requests for robots.txt and maybe the non-listings in Google.

I don't know the cause - but it sure is producing strange results.

AlexK

12:48 am on May 12, 2005 (gmt 0)

Reid:

When I use 'poodle predictor' on your site it seems to render some strange results.

Many thanks for the reference - first time I have seen this.

I certainly agree that the Home page appears to have many links to itself (looks very strange on the 'Predictor' page). This is easily explained; they are internal links of the form:

<a href='#link' title='A link on this page'>

Very standard, and surely not a reason to be penalised?

Reid

1:04 am on May 12, 2005 (gmt 0)

duh - i should have known

I don't see anything wrong with having several links like that; it does look strange on the poodle tool though.
If each one of those 'counts as a link' it may cause some confusion though.

Oh well - hope you enjoy the tool

AlexK

3:26 am on May 12, 2005 (gmt 0)

Reid:

duh - i should have known

I welcome your input and criticism - if something is wrong with the coding on my script-based site then I desperately want to find (then fix) it. My concern is that the G-boffins have made a similar arsehole-level mistake with *their* coding, but no-one will ever find it. Just look at the to-do with the 302-redirects to understand what I'm talking about.

In the meantime, virtually all of the tens of thousands of pages on my site are URL-only.

theBear

3:28 am on May 12, 2005 (gmt 0)

AlexK,

Is your sticky mail still down?

AlexK

7:36 am on May 12, 2005 (gmt 0)

theBear:

Is your sticky mail still down?

I have received one email notification (from Walkman's sticky) that I *had* a sticky (if you are reading this, Walkman, I sent a sticky back to you) but no sticky in the box. In fact, no mails of any kind in any box :(. Apart from that, nothing.

Send me one to test. I *really* would like to know.

[Thinks: best to test myself also, so I will try to send one to *you*, Mr Bear.]

theBear

1:01 am on May 13, 2005 (gmt 0)

You have sticky mail and regular mail.

Let me know what else you think we might want to look at.

But that should help further run down the situation.

This looking at problems that first surfaced months ago can be very slow going.

AlexK

2:26 am on May 13, 2005 (gmt 0)

Thanks, Mr Bear - Stickies are now working for me.

crobb305

9:21 pm on May 13, 2005 (gmt 0)

No changes for me yet. I started this thread discussing how my once hijacked/banned site suddenly started getting deep crawled the day after my reincl request. This spider was Mozilla deep bot, and only index page and ONE of my internal pages have returned with full title/desc. All others url only.

So, I don't know what to think. I see Gbot much more frequently and much deeper than I ever did before the reincl request. But weeks and weeks go by and no changes to the stale serps.

zeus

10:44 pm on May 13, 2005 (gmt 0)

I'm just thinking out loud here: what if we 301-redirect from the old domain to a new domain and then just copy the front page or the whole site to the new domain? Would that have any positive effect?

And is it possible to change the 301 if it doesn't work? Something must happen now; it cannot be that we have to create similar sites just because Google cannot fix this.

Reid

4:23 am on May 14, 2005 (gmt 0)

crobb305, did you look at your site with poodle predictor?

If you sticky me your URL I could take a look at it.

You know google does have half-indexed pages
"we know the page exists but we havn't crawled it yet" thats what url only means.

zeus

9:10 am on May 14, 2005 (gmt 0)

You are right about URL only, but there have NEVER been so many URL-only listings as in the last 6-8 months.

Another thing: I also did a 301 from non-www to www about 2 months ago, and there is still no change when I type my /domain.com

Reid

9:15 am on May 14, 2005 (gmt 0)

zeus - what do you get when you try site:www yourdomain? How old is the cache, are there any duplicate listings, and is the whole site indexed properly?

zeus

9:49 am on May 14, 2005 (gmt 0)

Reid, I have removed the cache, but the last time I checked, it was early May and the site only had about 10% of its pages indexed.

I was hijacked once: another domain.com came up when I searched for www.mydomain.com in Google, and the worst thing is that that site had a meta tag that said noindex. It all happened on 3 Nov.: from 38,000 unique visits to 0 from Google.

zeus

10:27 am on May 14, 2005 (gmt 0)

Another thing: I think it's weird that sometimes I see in the logs a hit from google with the old keyword and placement, but when I go to that results page my site is gone. Why is that happening, and what can we make of this?

crobb305

9:42 pm on May 14, 2005 (gmt 0)

I see in the logs a hit from google with the old keyword and placement

I see this happening with my site sometimes too. So, I go to my raw stats log and extract the exact search url which shows the serps page on which the search phrase was found. I then go to that results page for the search query and see a 302. The traffic coming to my site from Google is coming through these 302s.

Zeus, have you checked this out, to see if the visitors are entering your site via a redirect/302 url ranking on that phrase where your canonical url used to be?

zeus

10:30 pm on May 14, 2005 (gmt 0)

Crobb - when I get to the Google results page my site is gone and I don't see anything other than normal sites.

zeus

10:44 pm on May 14, 2005 (gmt 0)

Of course what happened to a lot of sites is a bad thing, but I would really like to understand this situation; I just cannot see any logic to it.

If a site is replaced in the serps by another site, that's hijacking; then it's removed, plus the Google-bug sites.

Logic then tells me I would maybe not be in the serps for 1, at most 2, months because of possible duplicate content that was spidered through the other Google-bug (302) sites and hijacking sites. But nothing is happening after the fix, so what's wrong here? With time I really think all this mess is still related to all those "sites" added in the middle of last year. With that we saw a lot of mess, and webmasterworld was full of weird questions and observations, which is still the case, as all the fake sites are still indexed.

AlexK

9:34 pm on May 15, 2005 (gmt 0)

crobb305:

I started this thread ... site suddenly started getting deep crawled ... only index page and ONE of my internal pages have returned with full title/desc. All others url only.

The following may be interesting; it is a combination of

fgrep -c 'Googlebot' access_log*

and

ls -al access_log*

May 15 04:03 access_log.1 : 600
May 8 04:03 access_log.2 : 3143
May 1 04:03 access_log.3 : 568
Apr 24 04:03 access_log.4 : 713

This is an astonishing rise in G-hits between May 1-8, and is still unexplained. For the record, 87% of the first 100 site:mysite.com SERPS are url-only (also a change of just 1 page from before the G-explosion).

Comparative figures for the MSN-Bot are astonishing:

May 15 04:03 access_log.1 : 5324
May 8 04:03 access_log.2 : 6154
May 1 04:03 access_log.3 : 5490
Apr 24 04:03 access_log.4 : 6230
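
For anyone wanting to reproduce this kind of per-file comparison, a minimal sketch follows. The function name bot_hits_per_file is made up for illustration; it prints only the file name and count, so the rotation dates shown above would still have to come from ls -al:

```shell
# bot_hits_per_file PATTERN FILE...
# Prints "file : count" where count is the number of lines in FILE that
# contain the fixed string PATTERN (e.g. a bot's user-agent token).
# Hypothetical helper built on grep -F -c.
bot_hits_per_file() {
    _pattern=$1
    shift
    for _file in "$@"; do
        # grep -c prints 0 (and returns non-zero) when nothing matches,
        # which is fine here because only the printed count is used.
        printf '%s : %s\n' "$_file" "$(grep -F -c -- "$_pattern" "$_file")"
    done
}
```

Usage would be along the lines of bot_hits_per_file Googlebot access_log*, then again with msnbot to get the comparison.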

Reid:

You know google does have half-indexed pages "we know the page exists but we havn't crawled it yet" thats what url only means.

Plain misleading bull#*$!, I am afraid. Sorry to be so blunt, but I can prove it. This is the very first url-only result for my site (also the very first result in the site:mysite.com SERPs):

http://www.mysite.com/mfcs.php?nocompress=1&mid=30

Now consider these (edited) results:

egrep "GET /mfcs\.php\?(.*)&mid=30(.*)Googlebot" access_log*
66.249.64.37 - - [12/May/2005:06:55:51 +0100] "GET /mfcs.php?nocompress=1&mid=30&nid=3770 HTTP/1.0" 200 32175 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
66.249.66.137 - - [03/May/2005:20:05:36 +0100] "GET /mfcs.php?nocompress=1&mid=30 HTTP/1.1" 200 53900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.28 - - [28/Apr/2005:07:16:22 +0100] "GET /mfcs.php?nocompress=1&mid=30&nid=3770 HTTP/1.0" 200 32342 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
66.249.71.32 - - [17/Apr/2005:07:49:38 +0100] "GET /mfcs.php?nocompress=1&mid=30&nid=3770 HTTP/1.0" 200 32110 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
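
Note that the pattern above also matches URLs carrying extra parameters such as &nid=3770. A stricter check, sketched below, compares the request path as an exact string so that only fetches of the precise URL are listed. The function name crawled_by is made up for illustration, and it assumes the common/combined log format shown above (method in field 6, path in field 7):

```shell
# crawled_by URL_PATH BOT FILE...
# Prints only the log lines where the GET request path is exactly URL_PATH
# and the line mentions BOT (e.g. "Googlebot").
# Hypothetical helper; assumes common/combined log format and a URL_PATH
# without backslashes (awk -v would interpret them as escapes).
crawled_by() {
    _path=$1
    _bot=$2
    shift 2
    awk -v want="$_path" -v bot="$_bot" \
        '$6 == "\"GET" && $7 == want && index($0, bot)' "$@"
}
```

For example: crawled_by '/mfcs.php?nocompress=1&mid=30' Googlebot access_log*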

AlexK

9:15 am on May 17, 2005 (gmt 0)

Reid:

"we know the page exists but we havn't crawled it yet" thats what url only means.

This statement has annoyed the living daylights out of me for some months now. With 87 of the first 100 SERPs url-only for my site, yet in operation since 2001, it is clearly wrong. My site is moving host right at this moment so, whilst I cannot do anything else, I sought to set this mis-direction to rights.

17 of the first 20 SERPs for a site:mysite.com are url-only. Using a variation of the following I found that:

not

The following is from just 4+ weeks of logs.

egrep "GET /mfcs.php\?nocompress=1&mid=30 (.*)Googlebot" access_log*
66.249.66.137 - - [03/May/2005:20:05:36 +0100] "GET /mfcs.php?nocompress=1&mid=30 HTTP/1.1" 200 53900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Here are the full (though edited to remove repetition) results:

URL-only:
01#1: 66.249.66.137 - - [03/May/2005:20:05:36 +0100] HTTP/1.1 200 53900 "-" "Mozilla/5.0
02#1: 66.249.66.137 - - [03/May/2005:09:06:14 +0100] HTTP/1.1 200 7021 "-" "Mozilla/5.0
03#1: 66.249.66.137 - - [03/May/2005:21:34:04 +0100] HTTP/1.1 200 6929 "-" "Mozilla/5.0
04#1: [no hit]
05#1: 66.249.66.137 - - [03/May/2005:13:43:55 +0100] HTTP/1.1 200 7408 "-" "Mozilla/5.0
.
06#1: 66.249.64.79 - - [26/Apr/2005:13:10:23 +0100] HTTP/1.0 200 32584 "-" "Googlebot/2.1
06#2: 66.249.65.80 - - [29/Apr/2005:17:41:23 +0100] HTTP/1.1 200 8188 "-" "Mozilla/5.0
.
08#1: 66.249.66.137 - - [03/May/2005:22:55:22 +0100] HTTP/1.1 200 6226 "-" "Mozilla/5.0
09#1: 66.249.66.137 - - [03/May/2005:23:42:48 +0100] HTTP/1.1 200 7750 "-" "Mozilla/5.0
10#1: [no hit]
11#1: 66.249.66.137 - - [03/May/2005:13:40:59 +0100] HTTP/1.1 200 7644 "-" "Mozilla/5.0
12#1: 66.249.66.137 - - [03/May/2005:22:59:54 +0100] HTTP/1.1 200 7678 "-" "Mozilla/5.0
.
14#1: 66.249.66.73 - - [13/May/2005:04:11:40 +0100] HTTP/1.1 200 6458 "-" "Mozilla/5.0
14#2: 66.249.66.137 - - [04/May/2005:01:59:26 +0100] HTTP/1.1 200 6594 "-" "Mozilla/5.0
.
15#1: 66.249.66.137 - - [04/May/2005:04:01:49 +0100] HTTP/1.1 200 10504 "-" "Mozilla/5.0
16#1: 66.249.66.137 - - [03/May/2005:19:41:11 +0100] HTTP/1.1 200 6964 "-" "Mozilla/5.0
18#1: 66.249.66.137 - - [03/May/2005:19:41:11 +0100] HTTP/1.1 200 6964 "-" "Mozilla/5.0
19#1: 66.249.66.137 - - [03/May/2005:20:25:51 +0100] HTTP/1.1 200 6931 "-" "Mozilla/5.0
20#1: 66.249.66.137 - - [03/May/2005:21:39:45 +0100] HTTP/1.1 200 7764 "-" "Mozilla/5.0
.
Title + snippet:
07#1: 66.249.71.69 - - [15/May/2005:07:27:16 +0100] HTTP/1.0 200 43804 "-" "Googlebot/2.1
07#2: 66.249.71.73 - - [30/Apr/2005:23:03:22 +0100] HTTP/1.0 200 43630 "-" "Googlebot/2.1
07#3: 66.249.64.30 - - [20/Apr/2005:01:19:02 +0100] HTTP/1.0 200 43625 "-" "Googlebot/2.1
.
13#1: 66.249.71.28 - - [15/May/2005:00:14:51 +0100] HTTP/1.0 200 42648 "-" "Googlebot/2.1
13#2: 66.249.71.72 - - [30/Apr/2005:21:33:47 +0100] HTTP/1.0 200 44252 "-" "Googlebot/2.1
13#3: 66.249.64.66 - - [20/Apr/2005:00:01:04 +0100] HTTP/1.0 200 41401 "-" "Googlebot/2.1
.
17#1: 66.249.64.33 - - [15/May/2005:02:41:22 +0100] HTTP/1.0 200 27565 "-" "Googlebot/2.1
17#2: 66.249.71.18 - - [30/Apr/2005:21:53:21 +0100] HTTP/1.0 200 27547 "-" "Googlebot/2.1
17#3: 66.249.71.29 - - [20/Apr/2005:01:08:09 +0100] HTTP/1.0 200 27493 "-" "Googlebot/2.1