Forum Moderators: Robert Charlton & goodroi
The problem
Listed below are my site stats for spiders/robots
(This is just for the last 5 days)
Googlebot --12192+16 --428.12 MB
MSNBot --60+31 --1.05 MB
Unknown robot (identified by 'spider') --39+33 --626.64 KB
Inktomi Slurp --25+233 --353.72 KB
Unknown robot (identified by hit on 'robots.txt') --0+10 --2.27 KB
Unknown robot (identified by 'crawl') --3+1 --64.23 KB
AskJeeves --1+1 --6.51 KB
As you can see, Googlebot is eating up a lot of bandwidth, and I would like to reduce this.
The Site
First off, the site is ...
has a dynamically generated store
has phpBB on it
has a Coppermine photo gallery
and a web calendar
What I have done to fix the problem
To try and solve the problem:
I have used my robots.txt to block access to all unnecessary parts of the site (including images and the web calendar).
I have also put a separate robots.txt file in the phpBB directory to cut access to everything apart from those files I want indexed (index.php, viewForum.php, viewtopic.php).
I have also done the hack to remove the session ID in phpBB.
I am at my wits' end. I don't want to affect my ranking, but this bandwidth usage is really quite high, accounting for about a third of the site's bandwidth on average (although this month it is about 50%).
Things I have heard about but am too chicken to try
Contacting Google
Apparently you can contact Google and they can dial down the crawling, but I have read that this can really adversely affect rankings.
Setting a crawl delay
How does this work?
If I set it to a week, does that mean Googlebot will index the whole site and then disappear for a week, or does it mean it will crawl one page and then wait a week before crawling the next page? Sorry, this delay thing confuses me.
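For what it's worth, Crawl-delay is a non-standard extension that Slurp and MSNBot honor; it is a number of seconds to wait between successive requests, not a revisit interval, so "come back in a week" is not something it can express. Googlebot does not honor Crawl-delay at all. A hypothetical fragment (the user-agent tokens are as those engines documented them at the time; the 10-second value is just an illustration):

```
# Crawl-delay means "wait N seconds between requests",
# NOT "come back after N days". Googlebot ignores this directive.
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10
```

At 10 seconds per request a bot can still fetch up to 8,640 pages a day, so this throttles burst load without hiding the site.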
Any help would be greatly appreciated
Cheers
Nigel
How long ago did you take this step? Is much of the spidering bandwidth still going to "sessionID" urls?
Thanks
I am still having Googlebot problems.
These are the latest stats:
Googlebot --16938+25 --597.46 MB
MSNBot --122+61 --1.91 MB
Inktomi Slurp --60+53 --1.21 MB
AskJeeves --4+3 --140.58 KB (that's a record for Jeeves on my site, more than the previous 2 months put together :) )
I had thought the problem was phpBB, as that was where the majority of Googlebot hits were coming from, but now I am not so sure; in the last 4 hours Googlebot ate 50 MB of bandwidth.
Is there a way to work out which parts of the site are responsible for the bandwidth?
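One way to answer this yourself is to sum the bytes column of the server access log per top-level directory, filtered to one crawler's user-agent. A minimal sketch, assuming Apache combined-format logs (the function and its names are illustrative, not from any particular tool):

```python
import re
from collections import defaultdict

# Matches an Apache combined-format log line: request path, status,
# bytes sent, referer, and the user-agent string.
LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def bandwidth_by_section(lines, agent_substring="Googlebot"):
    """Sum bytes served to one crawler, grouped by first path segment."""
    totals = defaultdict(int)
    for line in lines:
        m = LINE.search(line)
        if not m or agent_substring not in m.group("agent"):
            continue
        sent = m.group("bytes")
        if sent == "-":  # e.g. 304 responses log no body size
            continue
        # Group by the top-level directory, e.g. "/phpBB".
        section = "/" + m.group("path").lstrip("/").split("/", 1)[0].split("?", 1)[0]
        totals[section] += int(sent)
    return dict(totals)
```

Sorting the result by value, largest first, shows at a glance whether the forum, the gallery, or the store is eating the bandwidth.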
My alteration of the robots.txt file to exclude parts of phpBB is now being recognised, and none of the bots are trying to index any of the excluded parts.
However, session IDs are still appearing in a small portion of the Googlebot results; will this gradually decrease over time?
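On the leftover session-ID URLs: Google documents its own wildcard extension to robots.txt, so a pattern Disallow can keep Googlebot off any URL carrying a session parameter. This goes beyond the robotstxt.org standard (only Googlebot is guaranteed to honor it), and the parameter name below is an assumption based on phpBB's default:

```
User-agent: Googlebot
# The '*' wildcard is a Google-specific extension,
# not part of the original robots.txt standard.
Disallow: /*sid=
```

Already-indexed sid URLs will still take a while to drop out as Google recrawls.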
When I look at my Webmaster Tools (part of the Google Gmail thingie), I have an option to reduce the frequency of indexing carried out by Googlebot; will this affect my ranking if I do this?
You can't do that, either. You must merge the per-robot records from both files into one file.
Review this doc: [robotstxt.org...]
Read it conservatively: If it doesn't say you can do something, then you cannot do it.
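To make that concrete: crawlers only ever fetch /robots.txt from the site root, so a file dropped into a subdirectory is never read. A hypothetical merged file covering the exclusions described earlier in the thread (the directory and script names are assumptions, not taken from the actual site):

```
# One file at the site root; one record can cover all robots.
User-agent: *
Disallow: /images/
Disallow: /webcalendar/
Disallow: /coppermine/
# phpBB: the original standard has no Allow line, so to leave
# index.php/viewforum/viewtopic crawlable you list the scripts
# to exclude instead.
Disallow: /phpBB/posting.php
Disallow: /phpBB/privmsg.php
Disallow: /phpBB/profile.php
Disallow: /phpBB/login.php
Disallow: /phpBB/search.php
Disallow: /phpBB/memberlist.php
```

Blank lines separate records, so any per-robot records (e.g. a Crawl-delay block for Slurp) go in the same file below this one.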
Jim
No. See the bolded phrase "completely worthless" in this FAQ: [code.google.com...]
That tag was invented and used by one small directory about ten years ago, and its use spread like wildfire among Webmasters who believed they could tell robots when to come back. It's like telling the tax man how much you'll pay and when... :)
Jim