I'm currently hosting two websites, one of which is a Lasso-based e-commerce site. My hosting provider recently disabled my site entirely because his server was getting overloaded and it was affecting his other clients on the same box.
He found that a specific IP was hitting our site and said that it "appeared" to be a bot. He then blocked this IP and restored our site, but said that if this continued he'd have to disable it again. He also implied that it was my fault for not having a robots.txt in place to stop this from happening.
I'm still relatively new to web design and don't have much experience with the robots.txt file; however, I've read the tutorial and it seems relatively straightforward. Even if I had had a robots.txt file in place, it seems that this could still have happened. Isn't that correct?
So some questions came to mind that I'm not sure how to answer:
Are there adverse effects to leaving your entire site open to bots? I've read the responses about protecting user data, etc., but if I don't have that kind of sensitive information to protect, what's the problem? Wouldn't it be better for search engines to let the bots see as much as possible?
What types of files would you always want to exclude from bot searches?
What files are useless to have them look at?
If a bot is pounding my site, is it my responsibility to resolve this problem, or my ISP's, or both?
Thanks for any advice.
Welcome to WebmasterWorld [webmasterworld.com]!
This can be a pretty big subject area, and I'd suggest you read several of the posts in this forum and try a few site searches on this subject. But yes, it's your responsibility to control robots on your site. It's also a very good idea, both from the standpoint of preventing your hosting provider from having to disable your site and from the standpoint of presenting the content you feel is most important to the search engines for listing in their indices.
An often-discussed problem is that script-based sites (as I presume Lasso is) are problematic for search engines. If the URLs you present to them contain query strings and/or session IDs, then the URL-space of your site can look "infinite" to them. After spidering for a while (down a link path you aren't controlling), they simply give up and go elsewhere. So, it's very important to present them with "search-engine-spider-friendly" URLs, and to steer them away from your shopping cart pages. Do you really want a robot "clicking" on your "Buy Now" button?
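To make the "steer them away from your shopping cart" idea concrete, here is a minimal sketch using Python's standard urllib.robotparser to check what a robots.txt would tell a well-behaved robot. The /cart/ and /checkout/ paths, the example.com URLs, and the "ExampleBot" name are all assumptions for illustration; substitute whatever your Lasso site actually uses. Keep in mind that robots.txt is only advisory: polite robots honor it, but an abusive one can ignore it, so a file like this would not necessarily have stopped the IP your host blocked. Crawl-delay is also a non-standard directive that only some robots respect.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The /cart/ and /checkout/ paths are
# assumptions; use the real paths of your own shopping cart pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent; it falls under the "*" rules above.
product_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/products/widget.html"
)
cart_blocked = not parser.can_fetch(
    "ExampleBot", "http://www.example.com/cart/add?item=42"
)
# Requested pause between fetches in seconds, or None if no Crawl-delay is set.
delay = parser.crawl_delay("ExampleBot")

print("product page allowed:", product_allowed)
print("cart page blocked:", cart_blocked)
print("crawl delay:", delay)
```

The text in ROBOTS_TXT is what you would save as /robots.txt at the root of your site. The Python around it is only a way to confirm that the rules match the URLs you expect before you upload the file.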
After reviewing the posts in this forum and in others concerned with search engine spiders on dynamic shopping sites, you'll likely come up with some more specifically focused questions, and there are lots of people here to help you out.
Enjoy your visits,