


HTML for guiding the search engine bots

     
7:33 pm on Jul 1, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 23, 2005
posts:132
votes: 0


Hello there, HTML pros. I think this is the first time I'm posting in this area of WW, and I hope someone here can help. I have started to learn HTML, but my skills, and those of my designer and programmer, aren't enough to find a solution for the following issue.

I have learned a lot on WW, but unfortunately much of it only after putting a rather large and complex site on the web, and now, a year and a half later, it's sometimes tough to make changes that should have been in place before launching the site.

OK, here it is. Our site is too heavy on graphics. We have started to resize images and cut file size where we can, but it's still nowhere near enough. I have a rather heavy top-of-page banner and some other images, plus a sort of top bar on the template with many links. Our left sidebar is also full of links and images. The site looks great, but the search engine bots aren't loving it. When I go through the logs and track the bots, I see that Google especially leaves the page after only reaching the top of the template, probably due to the many links and the file size it encounters. The problem is that it never reaches the body text with all the keywords and relevant content.

I have read on this site about two options. One is a CSS trick where the left sidebar floats to the right. My guys looked at that and said that, the way my site is built, it can't be implemented without rebuilding basically the whole site. The way I see it, the top bar with all its links would still sit between the head section and the body text, so that would still be a problem, right?

The other option I read about on an SEO blog was using follow and nofollow in robots.txt. This would work much the same way the blogs do it: they all have the same or similar sidebars and headers, so they have created a system that takes the bot straight from the meta tags to the body text, completely ignoring the sidebars, banners, etc. Does this work? And if so, does the amount of code the bot is "guided" past before reaching the body text have any influence on how your site ranks?

Sorry for the lengthy post; I am not very technologically gifted and need a few more words than the average person on WW to explain myself. I hope you don't mind, and I hope someone can point me in the right direction. Thanks. :-) Jim.

8:44 pm on July 1, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Here are some thoughts and questions your post brings up for me.

When I go through the logs and track the bots, I see that Google especially leaves the page after only reaching the top of the template...

This is not clear to me. Do you mean that the logs are showing only a partial delivery for the page.html file? An error, in other words? Or is each page a frameset and the content frame is never requested?

If the log shows googlebot requested the page and got a response of "200 OK", then I'm not sure how you can tell it was "only at the top of the template".

However, I will agree that too large a file is not a good thing. But search engine bots only grab the html file; they don't usually request the images at the same time. How big is the html file alone?

You can often use css positioning (position:absolute) to place the content at the top of the html source, even though it displays under a header and to the right of a sidebar. This approach used to give a nice boost, but not so much these days.
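A minimal sketch of that source-ordering idea (the id names and pixel sizes here are placeholders, not your actual markup):

```html
<html>
<head>
<style type="text/css">
  /* header and sidebar are taken out of the normal flow
     and pinned where they should appear visually */
  #header  { position: absolute; top: 0; left: 0; width: 100%; height: 110px; }
  #sidebar { position: absolute; top: 120px; left: 0; width: 170px; }
  /* the content leaves room for them on screen */
  #content { margin: 120px 0 0 180px; }
</style>
</head>
<body>
  <!-- content comes first in the source, so spiders reach it immediately -->
  <div id="content">Main body text, keywords, relevant copy...</div>
  <!-- header and sidebar come later in the source
       but still display at the top and on the left -->
  <div id="header">Banner and top navigation</div>
  <div id="sidebar">Long list of left-hand links</div>
</body>
</html>
```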

9:57 pm on July 1, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 23, 2005
posts:132
votes: 0


Hi Tedster,

Thanks for the reply. The forum members here are just so helpful. No other site compares!

Maybe I misunderstand what the log analyzer offers me in stats. I know, for example, that my average page is about 40k. The HTML file on my main page is around 60k, and I have other pages that even reach 120k. The tool I use shows how users and bots move throughout my site and how much of a given page's file they downloaded. What I saw was that the Google bot left after 3k. The problem is that my banner alone is 10+k.

Another thing I notice is that although I have many pages listed in Google, it only crawls a small number of pages over a given period compared with Y and MSN. I have asked my guys about the CSS and they said it would be a nightmare to do. I am not a designer or programmer, so it's sometimes hard to know what, or whom, to listen to. I do know that the best thing I ever did was to start learning HTML and some other technical matters, as I have since found plenty of places where we can improve.

Anyway, back to the issue at hand: the option of using follow / nofollow in robots.txt. I know we can use it to keep bots out of directories and pages of the site, but can it be used to let the bots see only what you want them to see on a page, and therefore, in this instance, route the bot straight from the meta tags / head to the body text? It sounds like an easy fix, but I don't know whether it would work. Any experience with this?

10:21 pm on July 1, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


my banner alone is 10+k

But that's not part of the html file -- it's a separate file that is sourced by the html. The size of a header image is definitely NOT a problem for googlebot's spidering of the html file.

... google bot left after 3k

How big is the html file -- just the html itself, not counting any other files used to display the visible page? I will grant you that 3k sounds too small. Since I don't know how your analytics works, I can't say for sure. It might be counting a "partial download" from the error file.

Google...they only crawl a small number of pages over a certain period

Yes, there is a new Google crawling pattern. It seems to be based on click paths from the home page, or click paths from an inbound link to an internal page. As a general rule, the higher the PR, the more frequent the spidering. How frequently a page changes, and the "type" of site involved (news, information, e-commerce, etc.), also play in to a degree.

You mentioned images. If the site is using images for the main menu, you may get a lot of help by switching to text links with css style rules.
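For example, an image-based menu can usually be replaced with something like this (the names and colors are only placeholders):

```html
<style type="text/css">
  #menu a { display: block; width: 150px; padding: 4px 8px;
            background: #336; color: #fff; text-decoration: none; }
  #menu a:hover { background: #669; }  /* replaces the javascript rollover */
</style>
<ul id="menu">
  <li><a href="/products/">Products</a></li>
  <li><a href="/about/">About Us</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>
```

A few lines of css replace many kilobytes of menu graphics, and the link text itself becomes spiderable.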

follow / nofollow in robots.txt...can it be used to let the bots see only what you want them to see on a page

No, it cannot. The only thing I know of that is something like what you've described is for Google's mediabot when crawling an AdSense page to determine its topic, but that is special code just for Adsense pages, not standard code for basic spiders.
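For the record, that AdSense-only mechanism is a pair of HTML comments, and it only affects how the mediabot determines the ad topic -- regular search spiders ignore it:

```html
<!-- google_ad_section_start -->
<p>Main article text that should influence the ad topic...</p>
<!-- google_ad_section_end -->

<!-- google_ad_section_start(weight=ignore) -->
<div>Sidebar links the mediabot should discount...</div>
<!-- google_ad_section_end -->
```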

Nevertheless, your impulse to trim down the file size, whether html or images, is a good one. In my experience, many web designers are focused on appearance but not file size. I've had many clients bring me templates with images that can be reduced by 75% and show NO visible degradation, or that use javascript rollover images when simple css hover rules and text links would save relatively huge amounts of file size and look very spiffy.

I would also encourage you to validate your html. Search engines may be hitting an error that they cannot recover from, even though browsers can handle it.

W3C Validator - HTML [validator.w3.org]
W3C Validator - CSS [jigsaw.w3.org]

If your team has trouble with the messages that the W3C validator gives them, then it's time for them to learn more about the tools and languages they use to earn their pay.

10:34 pm on July 1, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 23, 2005
posts:132
votes: 0


Thanks again Tedster!

Firstly, about the validation tool. I have run that tool and found plenty of errors on the page. A year or so ago I knew little about web design or programming, and the guys involved only said that it was showing irrelevant errors. One of the companies they worked for has been number 1 for many years for extremely competitive keywords, and when I ran the test on their domain it had close to 220 errors on one page alone! I do realize now that their being number 1 is down to a variety of reasons, and they have been in the market for almost 10 years, so I should NOT pay attention to that. When I run the test I also get plenty of errors, but except for 2 or 3, they are all the same one: required attribute "ALT" not specified. It's not the alt text of the pictures, as those are labeled. They seem to have used gif files as part of the design of the template, if I am not mistaken, which would account for all these errors. Is there anything I can do about that? Should I give each one alt text even though it's only colors and decoration?

When I look at the properties of the header, it's a jpeg file of nearly 30K. When I look at the properties of the template, it tells me the HTM file is nearly 60K.

11:20 pm on July 1, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


You are correct -- <img> tags with missing alt attributes will not cause a spidering problem. However, do fix any other errors. And using the alt attribute in images (even if it's alt="") is the right move for any visitors who come with a screen reader, or with images turned off. Alt attributes can also be a help to a search engine algorithm.
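In practice the fix looks like this (the filenames and text are just examples):

```html
<!-- purely decorative graphic: an empty alt satisfies the validator
     and tells screen readers to skip it -->
<img src="corner.gif" width="10" height="10" alt="">

<!-- meaningful image: describe what it shows -->
<img src="banner.jpg" width="760" height="110"
     alt="Acme Widgets - hand-made widgets since 1996">
```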

Yes, 60 kb is a huge page, in my opinion. I'm guessing here, but rather than mostly text content, there's a whole bunch of <script> and/or <style> on the page -- possibly created automatically (and wastefully) by a wysiwyg editor such as Dreamweaver. If so, then move as much of that as possible to external .js and/or .css files.
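The move itself is simple (the paths here are just examples): cut the inline blocks out of each page and reference them once in the head, so browsers and bots can cache them instead of re-downloading them with every page:

```html
<head>
  <title>Page title</title>
  <link rel="stylesheet" type="text/css" href="/css/site.css">
  <script type="text/javascript" src="/js/site.js"></script>
</head>
```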

At this point, I think you need to get someone working hands on with your actual templates and pages. Let your team know your exact concerns about spidering, and don't take "no" for an answer. Also, 60kb of html plus many more kb of images is bound to be a slow download for any visitor on a dial-up connection. That can't be helping your business either.