For example:
xyzcompany.com/support/default.htm vs.
support.xyzcompany.com
If a spider comes in via the second URL, all relative links are broken, since the spider completely ignores the base tag. Is there a workaround (other than setting absolute references)? Note: when the page is viewed by a human user, all links resolve properly because the base tag is acknowledged by the user's browser.
I'm also wondering: if a page has a significant number of broken images, would a spider still crawl the page? Or would this raise a red flag, causing the spider to ignore the page and/or de-list it?
I can't say as I notice a problem with relative addressing in the logs. Are you sure you don't have some other obscure construct on the page that may be causing it?
It is only if it enters through the other, direct URL (i.e. support.xyz.com) that the problems start. Even then, it can crawl links to pages located within the same subdirectory (in this case "/support", where the relative link is simply filename.htm). However, it will break anything that is not within the same directory (e.g. ../images/file.gif).
It appends the file path to the URL it came in at (support.xyz.com/images/file.gif) instead of what is specified in the base tag... and then, of course, the file does not exist.
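To make the difference concrete, here is a minimal sketch of the two resolution behaviors described above, using Python's standard `urljoin`. The `http://` scheme and the `resolve` helper are assumptions for illustration only; the thread gives just the hosts and paths, and this does not claim to reproduce any particular spider's code.

```python
from urllib.parse import urljoin

# Assumed for illustration: the thread omits the scheme, so "http://" is a guess.
BASE_HREF = "http://xyzcompany.com/support/default.htm"  # what the base tag would declare
ENTRY_URL = "http://support.xyzcompany.com/"             # URL the spider arrived at


def resolve(link, honor_base_tag):
    """Resolve a relative link against the base tag (browser) or the entry URL (spider)."""
    return urljoin(BASE_HREF if honor_base_tag else ENTRY_URL, link)


for link in ("filename.htm", "../images/file.gif"):
    print(link)
    print("  browser:", resolve(link, honor_base_tag=True))
    print("  spider: ", resolve(link, honor_base_tag=False))
```

With the base tag ignored, a same-directory link still lands on a page that exists (assuming the subdomain maps to the /support directory), while the `../images/` link resolves to a path on the subdomain that the server never had.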
Are your
xyzcompany.com/support/default.htm
and
support.xyzcompany.com
pages mirrors of each other? That would be a problem.
As to broken links, I feel the cleaner the content, code and links, the smoother the spidering.
The site is not mirrored - the URLs point to exactly the same page, on the same server. This was set up for client convenience, but over the years it made its way into some engines. I now have the problem of being listed as one or the other in various engines, and I do not want to compromise the indexes if at all possible.
We recently revised the site. The old version had a very simple menu structure, and all linked pages (on any given page) were contained in the same directory. To move to a new 'section' you had to return to the homepage, which was the only required absolute link.
The new site however mixed all of this up. So links are mostly to other subdirectories on the server - hence the new problem.
Also, I definitely don't want to have to set the images as absolutes. I realize the spider will slow as a result of the broken links... but is this really a big deal?
What I'd really like to know is if a spider would penalize the page (or site) by leaving, or worse, dropping it from the existing index, if it detects a significant number of broken image links on a page.
BTW I'm really impressed with this forum (and site) - what a fantastic resource. Thanks to everyone for sharing!
>>…Seems my only option is to set text links as absolutes. How much of a performance hit will this cause? There's about 40 links…>>
Why do you think you’d have a problem with setting the text links as absolutes? Is it the loading time? Does anyone else have a solution for this or a comment? I’m not sure if I’m reading the problem correctly, because I’m not seeing why creating absolute links is a problem.
>>…The site is not mirrored - the urls point to exactly the same page, on the same server…>>
See, this is where I would make them different. Create an actual canonical for your
support.xyzcompany.com
page. Place in it unique content, different from
xyzcompany.com/support/default.htm
and then link them to each other. Make use of two pages instead of one. That’s what I would do, because of the linking and theme strength that you can gain from that. I know that’s not what you are asking. I’m just the type that wants to run with every opportunity. It looks as if the indexing in the engines you speak of has handed you an interesting opportunity.
>>…To move to a new 'section' you had to return to the homepage, which was the only required absolute link…>>
In my experience there is nothing wrong with using absolute links. When I link pages I consider the impact of that link more than if I can use a base href or not. Less internal linking isn’t necessarily better. I think the choices you make in linking two pages together and the future impact of those decisions is more important. Again, I don’t think that’s what you are asking.
>>…The new site however mixed all of this up. So links are mostly to other subdirectories on the server - hence the new problem…>>
Use absolute links and be very specific about where you want each link to go. If it’s the time it takes to clean up the links that is worrying you, then Dreamweaver is a big help for large search-and-replace edits. Or use this as an opportunity to go in and clean up pages: fix code, add alt tags and h2 headings, rearrange content. Help your new site by cleaning up your old site, since it sounds like you have to get in and straighten up the links anyway.
>>…Also, I definitely don't want to have to set the images as absolutes. I realize the spider will slow as a result of the broken links... but is this really a big deal?…>>
I think it matters, yes. I look at every page as a human editor would, and I hold each page to those standards. All links must work, including images. I suggest creating an image file in your canonical or just slipping the images into the directory where you need them rather than leaving them broken. Add alt tags to help the images work harder for you, and even add a link to them for the fullest impact. I use few images on a page, but the images I do use, I make work for me in more than design.
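If the cleanup is done with a script rather than an editor's search and replace, the idea might look something like the sketch below. It is only an illustration: the base URL's `http://` scheme, the `absolutize` helper, and the `../products/index.htm` link in the sample are all made up for the example, and the regular expression handles only quoted `href` and `src` values, not every form of markup.

```python
import re
from urllib.parse import urljoin

# Hypothetical base; the scheme is assumed, the thread only gives the host and path.
BASE = "http://xyzcompany.com/support/default.htm"

# Matches href="..." or src="..." with either single or double quotes.
ATTR = re.compile(r'\b(href|src)\s*=\s*(["\'])(.*?)\2', re.IGNORECASE)


def absolutize(html, base=BASE):
    """Return html with every quoted href/src value resolved against base."""
    def fix(match):
        attr, quote, value = match.groups()
        return f"{attr}={quote}{urljoin(base, value)}{quote}"
    return ATTR.sub(fix, html)


# Sample markup; the products link is invented for the example.
sample = '<a href="../products/index.htm">Products</a> <img src="../images/file.gif">'
print(absolutize(sample))
```

Links that are already absolute pass through unchanged, so running it twice over the same page should be harmless. It would still be worth checking the output by hand before publishing.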
>>…What I'd really like to know is if a spider would penalize the page (or site) by leaving, or worse, dropping it from the existing index, if it detects a significant number of broken image links on a page…>>
That’s a very good question, and I wonder about it myself. Perhaps the competition for the keyword spaces would affect this. If I were an engine and could set standards, I suppose this would be my first: give me a clean page with clean code and working links before you give me anything else.
Again, I may have missed the essence of your problem and if these are not the answers you are looking for then let us know so someone else can jump in here and help you out. I add my welcome to the forums, pixie, have fun.