
Google News Archive Forum

This 120 message thread spans 4 pages (this is page 2).
Deepbot is in the house....
I just saw deepbot on my server, all
teeceo




msg:47528
 10:30 pm on Apr 15, 2003 (gmt 0)

It's deepbot time!

teeceo.

 

NexDog




msg:47558
 8:11 am on Apr 16, 2003 (gmt 0)

Can your rankings in the SERPs change after a deep crawl? I think not, but there was a shift after last month's deep crawl....

Roscoe




msg:47559
 8:32 am on Apr 16, 2003 (gmt 0)

Just woke up to find my site was down between 12.30am UTC and 8am UTC. (Still waiting for the excuse from the ISP)

No Googlebot in my log files for the past two days....does this mean that I'm out of Google next month?

swones




msg:47560
 9:02 am on Apr 16, 2003 (gmt 0)

Deep Crawl starting for me in the UK, Woohoo!

crawl2.googlebot.com - - [16/Apr/2003:00:22:57 +0100] "GET /robots.txt HTTP/1.0" 200 24 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

Simon.

creative craig




msg:47561
 9:36 am on Apr 16, 2003 (gmt 0)

Glad I got those changes in last night now :)

Visited all of my sites, and took the top level pages this morning, just started to see it come back for more :)

Craig

Mercenary




msg:47562
 9:52 am on Apr 16, 2003 (gmt 0)

Just wondering what stats programs you guys are all using?

I'm using Webalizer and I think it updates at 12am every day.

Do your stats show real-time data so you can see if Google is at your sites?

What do you reckon is the best stats proggy?

cheers,

Josh

NexDog




msg:47563
 10:28 am on Apr 16, 2003 (gmt 0)

Webalizer typically runs at 2am unless your host has set it differently.

MattJ




msg:47564
 10:48 am on Apr 16, 2003 (gmt 0)

For Webalizer, I usually download my log and process it locally. Good for seeing up to date info and also allows you to customize the results (e.g. see the first 500 referrers instead of 50).

Mercenary




msg:47565
 11:14 am on Apr 16, 2003 (gmt 0)

I didn't know you could do this.

How would you do it?

- josh

MattJ




msg:47566
 11:23 am on Apr 16, 2003 (gmt 0)

My hosting provider provides a link to the log file from the control panel. The log file is not visible from FTP for me so you may be out of luck if your provider doesn't give you a link.

But if you can get the log file, download Webalizer (or the log file analyzer of your choice) from [mrunix.net...]. It is command-line based, so it helps to use a batch file that you can drag the log file onto. The batch file will contain a command like the following:

"c:\program files\webalizer\webalizer.exe" -o "c:\program files\webalizer\stats" -R 1000 %1

This puts the results into a stats subdirectory and gives you the top 1000 referrers.

These instructions are for Windows of course.
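
If you would rather not use a batch file, the same call can be wrapped in a few lines of Python. This is only a sketch: the two paths are copied from the command above and are assumptions about where Webalizer is installed, and `build_command` and `run` are names made up for this example. The only Webalizer options used are the `-o` and `-R` ones already shown.

```python
import subprocess
from pathlib import Path

# Assumed install locations, copied from the batch command above.
# Adjust both to match your own machine.
WEBALIZER = Path(r"c:\program files\webalizer\webalizer.exe")
OUTPUT_DIR = Path(r"c:\program files\webalizer\stats")


def build_command(log_file, top_referrers=1000):
    """Return the argument list for one Webalizer run.

    -o sets the output directory and -R the size of the top referrers
    table, the same two options the batch file passes.
    """
    return [
        str(WEBALIZER),
        "-o", str(OUTPUT_DIR),
        "-R", str(top_referrers),
        str(log_file),
    ]


def run(log_file, top_referrers=1000):
    """Run Webalizer on a downloaded log file; raises if it exits with an error."""
    subprocess.run(build_command(log_file, top_referrers), check=True)
```

Passing the arguments as a list means the spaces in "program files" need no extra quoting. Call `run("access.log")` once you have downloaded the log.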

dunnthat




msg:47567
 12:13 pm on Apr 16, 2003 (gmt 0)

OK....so you can pick on me for being totally naive...but where in the webalizer reports would I look for googlebot?

Thanks and sorry for such a stupid question

geckofuel




msg:47568
 12:36 pm on Apr 16, 2003 (gmt 0)

dunnthat,
Webalizer may not show you Googlebot. Some domain GUIs (graphical user interfaces) provide a link to something like "Last 20 visitors". This is a nice way to monitor current traffic and to find Googlebot. If you use CPanel, I can help you even more...just sticky me.

Even better though, what you need is access to your Raw Access Logs. Does your ISP provide these as a download? If you can get your Raw Access Logs then you can purchase an analysis program which will allow you to, for example, view all trips by Googlebot to your website.

Anyway, hopefully you've got one of two things:

1. Access to a "Last 20 Visitors" feature
2. Access to your Raw Access Logs
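
If you do have the Raw Access Logs, you don't strictly need to buy anything to list Googlebot's trips. Here is a rough Python sketch, assuming the log is in the common Apache "combined" format (host, timestamp, request, status, size, referrer, user agent). The function name `googlebot_visits` is made up for this example, and lines in other formats are simply skipped.

```python
import re

# Apache "combined" log format (an assumption about your server's setup).
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_visits(lines):
    """Yield (time, host, request) for each hit whose user agent names Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that do not fit the assumed format are ignored.
        if match and "googlebot" in match["agent"].lower():
            yield match["time"], match["host"], match["request"]
```

To use it, open the downloaded log and loop over the result, e.g. `for visit in googlebot_visits(open("access.log")): print(visit)`.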

mvl22




msg:47569
 12:48 pm on Apr 16, 2003 (gmt 0)

How does one recognise the difference between deep and fresh - is this purely by how extensively it grabs pages, or by the DNS name of the crawler?

Alphawolf




msg:47570
 12:50 pm on Apr 16, 2003 (gmt 0)

Same here. Just checked logs and he showed up after midnight:

Company or ISP : Google Inc. US,CA
IP Range : 216.239.46.0 - 216.239.46.255
Total Visits : 22

AW

qball0213




msg:47571
 1:57 pm on Apr 16, 2003 (gmt 0)

Glad I stayed up late now, googlebot, Fast and scooter are all dancing around my sites.

Jesse_Smith




msg:47572
 2:01 pm on Apr 16, 2003 (gmt 0)

:::How does one recognise the difference between deep and fresh

freshBot: 64.68.82.* bah, listed for only a few days, short dinner date then it dumps you.
deepcrawler: 216.239.46.* Good bot, it likes you, listed until death do you part. So don't make him mad or it will divorce your site. You can divorce the Googlebot by using your robots.txt file. It's much cheaper and faster than going to the courts.
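
If you want to label hits automatically, a small Python sketch along these lines would do it. The two ranges are just the ones quoted above as seen in April 2003; they are observations from this thread, not an official list, and `classify` is a name made up for the example.

```python
import ipaddress

# Ranges as reported in this thread (April 2003). These are posters'
# observations, not an official list, and may not hold at other times.
CRAWLER_RANGES = {
    "freshbot": ipaddress.ip_network("64.68.82.0/24"),
    "deepbot": ipaddress.ip_network("216.239.46.0/24"),
}


def classify(ip):
    """Return 'freshbot', 'deepbot', or None for an IPv4 address string."""
    address = ipaddress.ip_address(ip)
    for name, network in CRAWLER_RANGES.items():
        if address in network:
            return name
    return None
```

Anything outside the two ranges comes back as `None`, so an unfamiliar Google address is not silently counted as either bot.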

ogletree




msg:47573
 2:53 pm on Apr 16, 2003 (gmt 0)

I use a program called web log expert. It can fetch logs via FTP or HTTP. It can also send the report the same way, plus it can email it to you. It is pretty cool. You can run it manually on your machine or have it scheduled.

raymurphy




msg:47574
 3:21 pm on Apr 16, 2003 (gmt 0)

Slightly off-topic, but for a small (only 25 pages) site I've just developed, I've taken a "roll-your-own" approach to log file analysis - basically, on the site's home page, each visit to that page is written to an xml file. I've then got an admin asp page, which reads the xml file, and does an xsl transform to present it in the browser - two of the values in the xml file are User Agent and IP Address.

So, for Googlebot, I see an entry of
Googlebot/2.1 (+http://www.googlebot.com/bot.html) for User Agent and 216.239.46.185 for IP address. At the moment the xsl transform just shows ALL visits, but I hope to modify the transform to allow me to search for Googlebot/FAST/Jeeves etc ....

Just an alternative to relying on a hosting provider for access to the log files, but might be impractical as the number of visits to the page increases ...
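
For anyone who wants to try the same idea without ASP and XSL, here is a rough Python sketch of it, not the actual code from my site. The element names (`visits`, `visit`, `userAgent`, `ipAddress`) and both function names are made up for the example. It rewrites the whole file on every visit, which is the part that gets impractical as traffic grows.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def record_visit(xml_path, user_agent, ip_address):
    """Append one <visit> element to the XML log, creating the file if needed."""
    path = Path(xml_path)
    if path.exists():
        tree = ET.parse(str(path))
        root = tree.getroot()
    else:
        root = ET.Element("visits")
        tree = ET.ElementTree(root)
    visit = ET.SubElement(root, "visit")
    ET.SubElement(visit, "userAgent").text = user_agent
    ET.SubElement(visit, "ipAddress").text = ip_address
    # The whole document is rewritten each time a visit is recorded.
    tree.write(str(path), encoding="utf-8", xml_declaration=True)


def find_visits(xml_path, agent_substring):
    """Return (user_agent, ip_address) pairs whose agent contains the substring."""
    root = ET.parse(str(xml_path)).getroot()
    needle = agent_substring.lower()
    return [
        (v.findtext("userAgent"), v.findtext("ipAddress"))
        for v in root.iter("visit")
        if needle in (v.findtext("userAgent") or "").lower()
    ]
```

`find_visits("visits.xml", "googlebot")` is the kind of search I hope to add to the transform.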

Ray

Stefan




msg:47575
 5:21 pm on Apr 16, 2003 (gmt 0)

Mercenary, If you can download the raw log files, you can open them in Wordpad etc and search for 64.68 (freshie) or 216.239 (deepbot). Analog is a good free analyzer, although you have to edit the config file first.

EliteWeb




msg:47576
 7:29 pm on Apr 16, 2003 (gmt 0)

Googlebot is one bot I don't mind eating up all my pages. So far so good, on one of my forum sites it has taken 250 pages so far, brand new site -- last round freshbot picked up like 50 or so of them. I love this new site, so Google should too!

askjoe




msg:47577
 8:19 pm on Apr 16, 2003 (gmt 0)

Does this mean that the deepcrawl didn't happen last month? I noticed a lot of pages I added weren't included in the index. Will the next update be much more thorough? Just curious - would like to know more.

Stefan




msg:47578
 12:56 am on Apr 17, 2003 (gmt 0)

Hey askjoe

The deepcrawl happened last month during the second week (it crawled me on Mar 11-12). Perhaps those pages you added went up after the deepbot came through.

clarksc3




msg:47579
 2:29 am on Apr 17, 2003 (gmt 0)

So what is the E.T.A. on the next update? I know some of you guys have documented these lag times. I have a life so I do not have time for that kind of thing. :)

Oaf357




msg:47580
 3:14 am on Apr 17, 2003 (gmt 0)

WTH... I'm still seeing freshbot as recently as thirty minutes ago.

futureX




msg:47581
 4:10 am on Apr 17, 2003 (gmt 0)

yes! From the hits it looks like G has deepcrawled my main site and is currently hanging around my forums... A few thousand pages there to index... You think he will take it all up in one go? This is the first time G has seen this forum ready for indexing so will it be in this index or will it be flagged for the next one?

BGumble




msg:47582
 4:13 am on Apr 17, 2003 (gmt 0)

I think you'll get in the first round. Deepbot is pretty considerate on my forum archives and hangs around for quite a few days instead of hitting the servers too hard all at once.

futureX




msg:47583
 4:25 am on Apr 17, 2003 (gmt 0)

Inktomi has been hitting my forums pretty hard every fortnight recently. I know it has been hitting several different pages in the forums, but so far only one page has actually been included in the index. So I hope Google gives me a few more results :)

pardo




msg:47584
 9:53 pm on Apr 17, 2003 (gmt 0)

I think she's bringing her little brother to work:

216.239.37.5

is hitting separate pages starting just a couple of hours ago.

Whois query shows that it's from the same ip-range as the 216.239.46.x crawl bot.

Any ideas?

RankOutsider




msg:47585
 10:12 pm on Apr 17, 2003 (gmt 0)

Deepbot came whilst our site was down for maintenance - will that be it for another month or will it re-appear in the next couple of days?

aristoteles




msg:47586
 10:32 pm on Apr 17, 2003 (gmt 0)

Hi,

I have a PR1 site (quite new site, I think PR will go up to 4 after one or two updates), and deepbot has visited my site also.

On the log subject: I use analog for general-purpose log file analysis. It crunches 100 MB log files within seconds (on only a 1 GHz machine). If you have access to a Unix machine, I would use analog.

For searching for googlebot, I wrote a shell-script that does this:

cat /var/log/apache/access.log | grep googlebot > /home/myaccount/google.log
vi /home/myaccount/google.log

Grep pulls out every line that contains the term googlebot. I then "analyse" the file with vi. When vi opens the file, I immediately see the number of lines, which is the number of hits. Quite low-tech but amazingly effective and easy to implement. You can grep the result again to separate freshbot from deepbot as you wish.
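
If you prefer the count and the freshbot/deepbot split in one pass, something like this Python sketch should work. It assumes the client address is the first field on each line, as in standard Apache logs, and groups hits by the first three octets so the different crawler ranges show up as separate totals. The function name `count_googlebot_hits` is made up for the example.

```python
from collections import Counter


def count_googlebot_hits(lines):
    """Count log lines mentioning googlebot, grouped by client IP prefix.

    Assumes the client address is the first whitespace-separated field.
    If the server logs hostnames instead of IPs, the key is built from
    the first three dot-separated parts of the hostname.
    """
    hits = Counter()
    for line in lines:
        # Same test as the grep above, but case-insensitive.
        if "googlebot" not in line.lower():
            continue
        client = line.split(maxsplit=1)[0]
        prefix = ".".join(client.split(".")[:3])
        hits[prefix] += 1
    return hits
```

`sum(hits.values())` gives the same total that vi shows as the line count.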

Bas

Stefan




msg:47587
 10:59 pm on Apr 17, 2003 (gmt 0)

So what is the E.T.A. on the next update? I know some of you guys have documented these lag times. I have a life so I do not have time for that kind of thing. :)

It gets a bit crazy here around update time, clarksc3. We've just been through it, things seem good, most people are happy...

Let's not even talk about the next update. :-)

Deepbot came whilst our site was down for maintenance - will that be it for another month or will it re-appear in the next couple of days?

RankOutsider, if your site was previously in the index then usually the deepbot will look for it again during the crawl. Keep your fingers crossed... it might be back before this session is finished.

ircgeeks




msg:47588
 11:41 pm on Apr 17, 2003 (gmt 0)

On my second deep crawl for this site and it's at 1321 hits this time. Still trucking. Go deepbot go!

I have been seeing a lot of Inktomi deep crawls too; that is a first for me. About 600 hits so far on this crawl.
