Now, let's get into some of the more detailed and more difficult things that I, as a spider, might have problems with.
Those are just a few things that I would look for if I were a Spider. For those of you who are in the design/marketing industry online (web designers, search engine marketing consultants, etc.), I would also flag any sites where a link back to the design source appears on all pages of a site. I believe there are some current discussions about this right now with the Google Florida update.
Additions to the list would be appreciated. As the topic progresses, we'll refine the above list and hopefully come up with a valid list of things to look out for when designing web pages.
[edited by: pageoneresults at 1:49 am (utc) on Dec. 27, 2003]
CSS: Anything with a value of display:none would be flagged. This would also require a manual review at this time, since the technology is not yet good enough to determine whether there is a problem.
Is this also an issue with Google, AllTheWeb, etc.?
I have a separate CSS include with media="print" for printing, which hides things like the right-column news and left-column menus.
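For anyone who wants to see what such a flag might look like, here is a minimal sketch, assuming the spider only has the raw CSS text to work with. The function name `hidden_selectors` and the simple regular expressions are my own assumptions, not anything a real engine is known to use. It only lists the selectors for a human to review, because it cannot tell a harmless print stylesheet from hidden keyword text.

```python
import re

# Very rough CSS rule matcher: "selector { declarations }".
# Assumption: no comments or braces inside strings; a real parser would be safer.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
HIDDEN = re.compile(r"display\s*:\s*none", re.IGNORECASE)

def hidden_selectors(css: str) -> list[str]:
    """Return the selectors whose rule body contains display:none."""
    return [
        selector.strip()
        for selector, body in RULE.findall(css)
        if HIDDEN.search(body)
    ]

if __name__ == "__main__":
    sample = "#menu { display: none; } p { color: red } .news{display:none}"
    print(hidden_selectors(sample))
```

Run against the sample above, this lists `#menu` and `.news`. Whether those came from a media="print" include is exactly the part that would still need the manual review.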
Do you mean a URL like www.paper-clip-solutions.com would be fine but www.paper-clip-solutuion-now.com would be bad?
If I were a Spider, yes, the third hyphen in the URI would raise a flag. Now keep in mind that this scenario is not real; I am just trying to come up with a list of things that might trip a filter with the various search engine spiders that are traversing the net.
I'd say that over 90% of hyphenated URIs are SEO'd. When you get into 3+ hyphens, most are over-optimized and many of the things I've mentioned above are prevalent.
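To make the hyphen idea concrete, here is a minimal sketch of how a pretend spider could count hyphens in the host name. The threshold of 3 comes from the "third hyphen" remark above; the names `count_host_hyphens` and `hyphen_flag` are made up for illustration, and nothing here reflects how any real engine works.

```python
from urllib.parse import urlsplit

HYPHEN_LIMIT = 3  # assumed threshold: the third hyphen raises the flag

def count_host_hyphens(url: str) -> int:
    """Count hyphens in the host part of a URL (the path is ignored)."""
    # urlsplit only fills in the hostname when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.count("-")

def hyphen_flag(url: str) -> bool:
    """True when the host has HYPHEN_LIMIT or more hyphens."""
    return count_host_hyphens(url) >= HYPHEN_LIMIT

if __name__ == "__main__":
    for candidate in ("www.paper-clip-solutions.com",
                      "www.paper-clip-solutuion-now.com"):
        print(candidate, count_host_hyphens(candidate), hyphen_flag(candidate))
```

With the two example domains, the first has two hyphens and passes, while the second has three and gets flagged.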
[edited by: pageoneresults at 8:27 pm (utc) on Dec. 26, 2003]
I didn't quite understand your question, but hey, 'tis the day after Xmas and I'm enjoying some free time to stir up the pot!
If I were an engine that used links as a criterion for ranking, I would think that a new site, or an old site that historically had 6 links pointing to it, would merit a flag if it suddenly jumped (within a month) to thousands of links pointing to it.
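A rough sketch of that link-jump check, assuming the engine keeps a link count from a month ago and a current one. The function name and both thresholds (`min_jump`, `ratio`) are invented for illustration; the post only says "6 links" jumping to "thousands".

```python
def link_spike_flag(previous_links: int, current_links: int,
                    min_jump: int = 1000, ratio: float = 50.0) -> bool:
    """Flag a site whose inbound link count exploded within one period.

    min_jump and ratio are assumed thresholds, not known values.
    """
    gained = current_links - previous_links
    if gained < min_jump:
        return False  # growth too small in absolute terms
    baseline = max(previous_links, 1)  # avoid dividing by zero for new sites
    return current_links / baseline >= ratio

if __name__ == "__main__":
    print(link_spike_flag(6, 3000))         # small site, sudden thousands
    print(link_spike_flag(100000, 102000))  # big site, ordinary growth
```

Requiring both an absolute jump and a large ratio is a design choice so that an established site picking up a couple of thousand links in a month is not treated the same as a 6-link site doing so.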
- Keywords in META keywords tag that don't appear on the page
I have several technical students working for me on content building. They generally take an existing page and do "save as" for a new page. They very often forget to change the title, description and/or the keywords. I would say the majority of web content builders have little idea about meta tags.
- Excessive repetition of keywords and keyword phrases in a short period. For example, a keyword phrase repeated 4+ times in a short paragraph gets flagged.
When doing technical description lists or product comparisons, this type of repetition can easily occur innocently. (It does not always occur in tabular form.)
- Lists of keyword phrases that appear toward the end of the HTML. If that list exceeds a certain character count, it gets flagged.
I've seen many scientific publications end with a list of keyword/phrases under which the document should be listed/categorised.
------------------------------------------------------------
Ideally when the spider trips on too many of the above filters, the algo ignores any "beneficial" effects.
Keywords in META keywords tag that don't appear on the page.
I have several technical students working for me on content building. They generally take an existing page and do "save as" for a new page. They very often forget to change the title, description and/or the keywords. I would say the majority of web content builders have little idea about meta tags.
If I were a Spider and found instances of duplicate metadata on 2 or more pages, that particular metadata would be ignored. There would be no downgrade other than that metadata content not being indexable.
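Here is a minimal sketch of that "ignore duplicated metadata" idea, assuming the spider has already collected one metadata string (say, the description) per page. The function name `ignored_metadata` and the normalisation (lowercase, collapsed whitespace) are assumptions of mine.

```python
from collections import defaultdict

def ignored_metadata(pages: dict[str, str]) -> set[str]:
    """Return the URLs whose metadata also appears on at least one other page.

    pages maps a URL to its metadata text. Matching is done on a
    lowercased, whitespace-collapsed copy (an assumed normalisation).
    """
    by_value = defaultdict(list)
    for url, meta in pages.items():
        by_value[" ".join(meta.lower().split())].append(url)
    return {
        url
        for urls in by_value.values()
        if len(urls) >= 2
        for url in urls
    }

if __name__ == "__main__":
    site = {
        "/a": "Blue widgets",
        "/b": "blue  widgets",  # a "save as" copy with a stray space
        "/c": "Red widgets",
    }
    print(sorted(ignored_metadata(site)))
```

In this sample, `/a` and `/b` would have their metadata ignored and `/c` would keep its own, which matches the "no downgrade, just not indexed" outcome described above.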
Excessive repetition of keywords and keyword phrases in a short period. For example, a keyword phrase repeated 4+ times in a short paragraph gets flagged.
When doing technical description lists or product comparisons, this type of repetition can easily occur innocently. (It does not always occur in tabular form.)
This is a tough one. If I find that there is excessive repetition in other areas/elements, I'm going to take that into consideration and flag it accordingly.
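For what it's worth, the basic count is easy to sketch; it is the "is it innocent?" judgement that is hard. The sketch below assumes a "short paragraph" means 60 words or fewer, which is purely my guess, and uses the 4+ repeats from the list. The function name is hypothetical.

```python
import re

def repeated_phrase_flag(paragraph: str, phrase: str,
                         max_words: int = 60, min_repeats: int = 4) -> bool:
    """Flag a short paragraph that repeats a phrase min_repeats times or more.

    max_words (what counts as "short") is an assumed value.
    """
    words = re.findall(r"[a-z0-9']+", paragraph.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not target or len(words) > max_words:
        return False
    n = len(target)
    # Slide a window of n words across the paragraph and count exact matches.
    hits = sum(words[i:i + n] == target for i in range(len(words) - n + 1))
    return hits >= min_repeats

if __name__ == "__main__":
    text = ("Blue widgets are great. Buy blue widgets here. "
            "Our blue widgets beat other blue widgets.")
    print(repeated_phrase_flag(text, "blue widgets"))
```

Note that a product comparison list would trip this just as easily as a stuffed paragraph, which is the objection raised above; the count alone cannot tell the two apart.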
Lists of keyword phrases that appear toward the end of the HTML. If that list exceeds a certain character count, it gets flagged.
I've seen many scientific publications end with a list of keyword/phrases under which the document should be listed/categorised.
If I were a Spider, I've got to take into consideration all surrounding elements. I'm not too certain if I'm smart enough to determine that the list of keywords and keyword phrases at the bottom of the page is of a scientific nature. If it falls within my parameters for flagging, it gets flagged.
Great closing...
Ideally when the spider trips on too many of the above filters, the algo ignores any "beneficial" effects.
Flag a page if the proportion of anchor text is near to or greater than that of plain text, AND the anchor text is similar, AND the links either have a query string OR all link within the same host. Reason: such pages tend to carry a navbar full of Overture search terms, with pages to match the anchor text, or the links are possibly affiliate codes.
bol, although I understand the intention, wouldn't this flag all directories too? Even DMOZ, Yahoo... They all have more anchor text than plain text, or at least a pretty close proportion, on their category pages.
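Here is one possible reading of that anchor-text rule as a sketch, so the directory question can be tested rather than argued. Everything specific is an assumption: "near to" is taken as 80% of the plain-text length, "similar" as 80% of the anchors sharing their most common word, and the query-string test is applied to all links. The function name `anchor_heavy_flag` is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def anchor_heavy_flag(plain_chars: int, links: list[tuple[str, str]],
                      page_host: str, near: float = 0.8,
                      similar: float = 0.8) -> bool:
    """links is a list of (anchor_text, href) pairs found on the page."""
    if not links:
        return False
    # 1. Anchor text near to or greater than plain text (by character count).
    anchor_chars = sum(len(text) for text, _ in links)
    heavy = anchor_chars >= near * plain_chars
    # 2. Similar anchors: share of anchors containing the most common word.
    word_sets = [set(text.lower().split()) for text, _ in links]
    counts = Counter(word for words in word_sets for word in words)
    top_share = counts.most_common(1)[0][1] / len(links) if counts else 0.0
    alike = top_share >= similar
    # 3. All links carry a query string, OR all stay on the same host.
    parts = [urlsplit(href) for _, href in links]
    all_query = all(p.query for p in parts)
    same_host = all((p.hostname or page_host) == page_host for p in parts)
    return heavy and alike and (all_query or same_host)

if __name__ == "__main__":
    navbar = [("cheap blue widgets", "/search?q=cheap+blue+widgets"),
              ("cheap red widgets", "/search?q=cheap+red+widgets"),
              ("cheap widgets online", "/search?q=cheap+widgets+online")]
    print(anchor_heavy_flag(40, navbar, "www.example.com"))
```

Under these assumptions, a category page whose anchors are all different words ("Arts", "Business", "Computers") passes the anchor-heavy test but fails the similarity test, so in this sketch it is the similarity condition that would decide whether a directory gets caught.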
Title followed by description
<title>Big Bob's Widget Web Site</title>
<meta name="description" content="Big Bob's Widget Web Site offers how-to articles and reviews...">
Head and text separated by a graphic
<h1>Big Bob's Widget Web Site</h1>
¦---Large Graphic alt=""---¦
<p>Big Bob's Widget Web Site operates from the heartland of America's widget country...</p>
Head or other text followed by a graphic
<h1>Big Bob's Widget Web Site</h1>
¦---Linked logo with alt="Big Bob's Widget Web Site Home"---¦
_________________________________
To a spider each of these would read as:
Big Bob's Widget Web Site Big Bob's Widget Web Site...
(Just used sim spider to check and I'm actually changing an occurrence similar to the last example on each page of one site as I write this.)
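If you don't have a spider simulator handy, the effect is easy to reproduce with a small sketch. This assumes the spider reads visible text, image alt text and the META description in document order; real spiders and simulators may differ. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser

class SpiderText(HTMLParser):
    """Collect the text a simple spider might read, in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self._add(attrs.get("content"))
        elif tag == "img":
            self._add(attrs.get("alt"))  # alt text reads like body text

    def handle_data(self, data):
        self._add(data)

    def _add(self, text):
        text = " ".join((text or "").split())  # collapse whitespace
        if text:
            self.chunks.append(text)

def spider_view(html: str) -> str:
    parser = SpiderText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

if __name__ == "__main__":
    page = ('<h1>Big Bob\'s Widget Web Site</h1>'
            '<a href="/"><img src="logo.gif" '
            'alt="Big Bob\'s Widget Web Site Home"></a>')
    print(spider_view(page))
```

For the third example above this prints "Big Bob's Widget Web Site Big Bob's Widget Web Site Home", which shows the back-to-back repetition the layout hides from a human visitor.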
I skipped right over the word 'keyword' in the context it was intended and went right on to comment about commas between 'words' in a Descriptive META situation.
Ex:
<"I, have, a, site, about, widgets. We, have, blue, widgets, black, widgets, tall, widgets, sell, more, widgets, than, any, other, site, on, the, internet, v, mc, ae, all, welcome, over, night, over-night, shipping, around, the, universe, sorry, no, c., o., d, best, viewed, while, breathing, f., o., b., detroit.">
Maybe the delineation lies somewhere in e-commerce?
I've not seen it in academics.
Pendanticist.