
Inktomi used 1 GB of my bandwidth this month!

Inktomi bot constantly on my site


HostingDirectory

4:38 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



For the last two months, Inktomi's bot has been using a lot of bandwidth on my site. This month (my web log was last updated on the 20th), the bot has already used almost 1 gigabyte of bandwidth.

Has anyone else had this type of problem? What could be causing it?

trillianjedi

4:44 pm on Aug 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How big is your site?

Are you serving Ink with a session ID?

TJ

Lynn_Terry

6:08 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



I haven't checked my logs, but I have noticed them on my forum almost every day this past week. I also got an invoice from my host for over-usage (LOL, better go check my logs!).

I implemented a mod on my forum to remove the session ID when Googlebot visits. I'm going to add Ink to that mod!
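A mod like that presumably checks the visitor's User-Agent and leaves the session ID out of links when it sees a crawler. Here is a minimal Python sketch of the idea, assuming the crawlers can be recognised by the substrings "googlebot" and "slurp" (Slurp being Inktomi's bot, as it appears in the log stats later in this thread). The function names, the `s` query parameter and the user-agent strings are hypothetical, not vBulletin's actual code.

```python
from urllib.parse import urlencode

# Hypothetical list of User-Agent substrings treated as crawlers.
# "googlebot" and "slurp" (Inktomi) are assumptions for this sketch.
CRAWLER_TOKENS = ("googlebot", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header looks like a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def build_link(path: str, params: dict, session_id: str, user_agent: str) -> str:
    """Build a forum link, appending the session ID only for non-crawlers."""
    query = dict(params)
    if not is_crawler(user_agent):
        query["s"] = session_id  # hypothetical session parameter name
    return f"{path}?{urlencode(query)}" if query else path


# Illustrative user-agent strings, not copied from real logs.
print(build_link("/showthread.php", {"t": 42}, "abc123", "Mozilla/5.0 (compatible; Slurp)"))
print(build_link("/showthread.php", {"t": 42}, "abc123", "Mozilla/5.0 (Windows NT 5.1)"))
```

The crawler gets the same stable link on every visit, while a normal browser still gets its session ID appended.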

Thanks for bringing it up ;)

kpaul

6:43 pm on Aug 21, 2003 (gmt 0)

10+ Year Member



Yes, I was hit hard for a while. It's interesting that they're grabbing about 10x more pages than Google and yet Goog is where I get most of my traffic.

I changed my robots.txt to keep them out of a section with session IDs, and they seem to be behaving better now.
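The actual robots.txt wasn't posted, so the one below is a hypothetical example: it assumes the session-ID section lives under `/forum/` and that Inktomi's crawler answers to the `Slurp` token. Python's standard `urllib.robotparser` can be used to sanity-check that such a file blocks Slurp from that section while leaving other bots alone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Slurp out of the session-ID section
# (the /forum/ path is an assumption) and leave the site open to others.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /forum/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check a forum URL and a normal page for two different crawlers.
for agent in ("Slurp", "Googlebot"):
    for url in ("http://www.example.com/forum/showthread.php?t=42",
                "http://www.example.com/index.html"):
        print(agent, url, parser.can_fetch(agent, url))
```

An empty `Disallow:` line means "allow everything", so only Slurp is kept out of `/forum/`. This only helps with well-behaved crawlers that actually read robots.txt.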

Looks like they're also getting better at dynamic URLs from what I can tell.

my two bits,
kpaul

HostingDirectory

12:09 pm on Aug 22, 2003 (gmt 0)

10+ Year Member



My site isn't that big, I would say about 300 pages.
The bot seems to be scanning the forums more than anything.
I am using vBulletin.

Can someone explain session IDs to me? What are they, and how would dealing with them help me fix this problem?

Below I have pasted some of my log stats showing the Inktomi bot and the other bots, and how much bandwidth each has used in the last 21 days.

Inktomi Slurp
Hits - 37101
Bandwidth - 1016.33 MB
Date - 21 Aug 2003 - 14:14

Unknown robot (identified by 'crawl')
Hits - 2130
Bandwidth - 23.12 MB
Date - 21 Aug 2003 - 11:15

Googlebot (Google)
Hits - 1497
Bandwidth - 84.86 MB
Date - 21 Aug 2003 - 03:17

Scooter (AltaVista)
Hits - 875
Bandwidth - 39.79 MB
Date - 21 Aug 2003 - 12:50

WISENutbot (Looksmart)
Hits - 469
Bandwidth - 11.90 MB
Date - 21 Aug 2003 - 13:35

Jeeves
Hits - 396
Bandwidth - 21.19 MB
Date - 21 Aug 2003 - 12:36

Alexa (IA Archiver)
Hits - 372
Bandwidth - 15.96 MB
Date - 20 Aug 2003 - 08:21

Fast-Webcrawler (AllTheWeb)
Hits - 120
Bandwidth - 7.12 MB
Date - 16 Aug 2003 - 21:27

Total bandwidth for all of the top crawling bots on my site in the last 21 days = 1220.27 MB.
That's over 1 GB.
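The totals can be double-checked with a few lines of Python. The figures are copied from the log stats above; converting MB to KB assumes the stats tool counts 1 MB as 1024 KB, which is an assumption about the log software.

```python
# Figures copied from the log stats above (last 21 days): (hits, MB).
stats = {
    "Inktomi Slurp": (37101, 1016.33),
    "Unknown robot ('crawl')": (2130, 23.12),
    "Googlebot": (1497, 84.86),
    "Scooter": (875, 39.79),
    "WISENutbot": (469, 11.90),
    "Jeeves": (396, 21.19),
    "Alexa": (372, 15.96),
    "Fast-Webcrawler": (120, 7.12),
}

total_mb = sum(mb for _, mb in stats.values())
print(f"Total: {total_mb:.2f} MB")

for bot, (hits, mb) in stats.items():
    share = 100 * mb / total_mb       # percentage of all bot bandwidth
    kb_per_hit = mb * 1024 / hits     # assumes the tool's MB = 1024 KB
    print(f"{bot:26} {share:5.1f}% of bot bandwidth, {kb_per_hit:5.1f} KB/hit")
```

The sum comes out at 1220.27 MB, matching the total above. Slurp accounts for about 83% of it and made roughly 25 times as many requests as Googlebot, although its average response was only about half the size (about 28 KB vs 58 KB per hit).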

trillianjedi

12:12 pm on Aug 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<edit>Just read your post properly lol</edit>

Session IDs are used like cookies to keep track of your users' details, such as whether they're logged in to the forum. They produce URLs like:

www.forum.com/message12345&SID=1234354376767656

Each time a bot visits (same as a human), it gets a new "session" started and a new session ID (SID).

So the bot thinks each one is a new link, because the URLs are unique (thanks to the SID), and it ends up making thousands of requests for the same page.
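To see why this inflates the crawl, here is a small Python sketch. It uses a hypothetical `showthread.php?t=...&SID=...` URL layout rather than a real vBulletin one. Three visits to the same thread, each with a fresh SID, look like three different URLs. Once the SID parameter is ignored, they collapse to a single page.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_session_id(url: str, param: str = "SID") -> str:
    """Return the URL with the given session-ID query parameter removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


# Three visits to the same (hypothetical) thread, each handed a fresh SID.
crawled = [
    "http://www.forum.com/showthread.php?t=12345&SID=1234354376767656",
    "http://www.forum.com/showthread.php?t=12345&SID=9876543210123456",
    "http://www.forum.com/showthread.php?t=12345&SID=5555444433332222",
]

print(len(set(crawled)))                            # distinct URLs as the bot sees them
print(len({strip_session_id(u) for u in crawled}))  # distinct pages once the SID is ignored
```

This prints 3 and then 1. A crawler that can't tell the SID apart from real content keeps fetching the same page under new names.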

Does that make sense or should I explain further?

TJ

HostingDirectory

4:13 pm on Aug 22, 2003 (gmt 0)

10+ Year Member



It makes sense, but how can I stop it while using vBulletin?
If I use a hack to make the pages search-engine friendly, will that work?

trillianjedi

4:50 pm on Aug 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It makes sense, but how can I stop it while using vBulletin?

Search for "vbulletin" and "SID" and "search engine".

There's a bit of info around on it, but to be honest I don't think there's an easy fix other than not using SIDs. That means users must have cookies enabled to use the site properly.
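In a cookies-only setup, the session ID travels in a cookie instead of in the URL. A crawler that ignores cookies may still get a new session on each visit, but the links it sees no longer change. This is a minimal sketch using Python's standard `http.cookies` module, not vBulletin's actual mechanism. The cookie name `sessionhash` and both helper functions are hypothetical.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "sessionhash"  # hypothetical cookie name


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie header value carrying the session ID,
    so links no longer need an SID in the URL."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = session_id
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["httponly"] = True
    return cookie[COOKIE_NAME].OutputString()


def session_from_cookie_header(header: str) -> Optional[str]:
    """Read the session ID back from a request's Cookie header, if present."""
    cookie = SimpleCookie()
    cookie.load(header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel else None


sid = secrets.token_hex(16)
print("Set-Cookie:", session_cookie_header(sid))
print(session_from_cookie_header(f"{COOKIE_NAME}={sid}") == sid)
```

The trade-off is the one mentioned above: visitors with cookies disabled won't be able to stay logged in.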

I assume WebmasterWorld uses cookies only?

TJ

Visi

2:24 am on Aug 31, 2003 (gmt 0)

10+ Year Member



A follow-up to the above posts: good old Slurp is out hammering this month. Lo and behold, it has lost its database again and is hitting obsolete pages that haven't been on the server for months. We saw this before, around February, but it had found its way over the last few months. Now, out of the blue, it's back to an old database? Not sure what's going on, but it really is a bandwidth hog for us this month too.