If I were a Spider...

What would trip my filter?

         

pageoneresults

6:54 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I were a Spider, what would trip my filter? After reviewing thousands of sites over the years, I've come up with a list of things that might trip my filter if I were a Spider. Some of those are...

  1. Multi-hyphenated domain name. Three or more hyphens and it's flagged.

  2. Comma separated list of keywords and keyword phrases in title. More than a certain number of characters and it's flagged.

  3. Comma separated list of keywords in META description tag.

  4. Keywords in META keywords tag that don't appear on the page.

  5. Obscure third party metadata that contains comma separated lists of keywords and keyword phrases. For example, the Dublin Core Abstract Tag.

  6. HTML comments tag with keywords and keyword phrases.

  7. <h> tags wrapping entire paragraphs. I'd have a character limit assigned and anything over that limit would be flagged.

  8. Alt text that exceeds 80 characters.

  9. Alt text that reads like a comma separated META keywords tag.

  10. Lists of keyword phrases that appear toward the end of the html. If that list exceeds a certain character count, it gets flagged.

  11. Excessive repetition of keywords and keyword phrases in a short period. For example, a keyword phrase repeated 4+ times in a short paragraph and it gets flagged.

  12. Links that appear on pages that lead to other properties that are not on topic.
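Two of the simpler checks in the list above (items 8 and 11) can be sketched in a few lines. This is only a rough illustration: the 80-character limit and the "4+ times" figure come from the list, while the function names and the word-boundary matching are assumptions made for the example.

```python
import re

ALT_LIMIT = 80      # item 8: alt text longer than this is flagged
REPEAT_LIMIT = 4    # item 11: phrase repeated this many times in one paragraph


def alt_too_long(alt_text):
    """Flag alt text that exceeds the character limit."""
    return len(alt_text) > ALT_LIMIT


def phrase_repeated(paragraph, phrase):
    """Flag a paragraph in which the phrase occurs REPEAT_LIMIT or more times."""
    # Case-insensitive, whole-word match (an assumption of this sketch).
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, paragraph.lower())) >= REPEAT_LIMIT
```

What counts as a "short paragraph" is left open here; a real filter would presumably also weigh the paragraph's length.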

Now, let's get into some of the more detailed and more difficult things that I as a spider might have problems with.

  1. CSS; Anything with a negative value causing elements to be positioned off the page would get flagged.

  2. CSS; Anything with a value of hidden would get flagged. This includes many of the menu functions out there that use this value. Unfortunately, this particular flag would require a manual review because my technology is not sufficient to determine whether there is a problem.

  3. CSS; Anything with a value of display:none would be flagged. This also would require a manual review at this time due to insufficient technology to determine a problem.

  4. Text links without underlines whose color is the same, or close to, the text that surrounds it.
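The first three CSS checks above could be approximated with pattern matching over the raw style sheet text. The sketch below is hypothetical: the flag names and regular expressions are assumptions, and a real spider would need a proper CSS parser. The third element of each entry records whether the flag needs a manual review, as described in items 2 and 3.

```python
import re

# (flag name, pattern, needs manual review) -- all three are assumptions
CSS_CHECKS = [
    ("negative-offset", re.compile(r"(?:left|top|text-indent)\s*:\s*-\d", re.I), False),
    ("hidden-value", re.compile(r":\s*hidden\b", re.I), True),
    ("display-none", re.compile(r"display\s*:\s*none\b", re.I), True),
]


def css_flags(css_text):
    """Return (flag, needs_review) pairs for every check the CSS trips."""
    return [(name, review)
            for name, pattern, review in CSS_CHECKS
            if pattern.search(css_text)]
```

Because plenty of legitimate drop-down menus use `hidden` and `display:none`, the sketch only marks those two for review rather than treating them as a verdict.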

Those are just a few things that I would look for if I were a Spider. For those of you who are in the design/marketing industry online (web designers, search engine marketing consultants, etc.), I would also flag any sites where a link back to the design source appears on all pages of a site. I believe this is being discussed right now in connection with the Google Florida update.

Additions to the list would be appreciated. As the topic progresses, we'll refine the above list and hopefully come up with a valid list of things to look out for when designing web pages.

  1. If the proportion of anchor text is near to/or greater than plain text AND the anchor text is similar, AND the links have a query string OR they all link to within the same host. Reason: those pages that contain a navbar full of overture search terms, with pages to match the anchor text, or they're possibly affiliate codes.
    [brotherhood of lan]

  2. If i were an engine that used links as criteria for ranking, i would think that a new site or an old site who historically had 6 links pointing to it would merit a flag if it suddenly jumped (within a month) to thousands of links pointing to it.
    [bakedjake - msg #:11]

  3. Besides excessive repetition I would add "successive repetition" that's easy to slip in without noticing. It's sometimes natural when writing, sometimes disguised by graphics or table layout.
    [jimbeetle - msg #:16]

[edited by: pageoneresults at 1:49 am (utc) on Dec. 27, 2003]

brotherhood of LAN

7:00 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



/ edited

[edited by: brotherhood_of_LAN at 7:28 pm (utc) on Dec. 26, 2003]

humpo

7:08 pm on Dec 26, 2003 (gmt 0)

10+ Year Member



> CSS; Anything with a value of display:none would be flagged. This also would require a manual review at this time due to insufficient technology to determine a problem.

Is this also an issue with google, alltheweb etc?

I have separate css include with media="print" for printing which hides things like right column news and left column menus.

pageoneresults

7:12 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Is this also an issue with google, alltheweb etc?

I personally don't think so. If I were a Spider, I'd be programmed to ignore print style sheets as I know they would contain CSS attributes that would be flagged in normal mode.
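As a rough illustration of "ignore print style sheets", a hypothetical spider could collect the linked style sheets from a page and drop those whose `media` attribute is print-only before running any CSS checks. The class and function names below are made up for the example; only Python's standard `html.parser` is used.

```python
from html.parser import HTMLParser


class StylesheetCollector(HTMLParser):
    """Collect hrefs of linked style sheets, skipping print-only ones."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}
        if "stylesheet" not in a.get("rel", "").lower().split():
            return
        media = [m.strip() for m in a.get("media", "all").lower().split(",")]
        if media == ["print"]:
            return  # print-only sheet: not checked for hidden content
        self.hrefs.append(a.get("href", ""))


def sheets_to_check(html):
    collector = StylesheetCollector()
    collector.feed(html)
    return collector.hrefs
```

A sheet declared as `media="screen, print"` is still checked in this sketch, since it also applies on screen.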

chuq_2001

7:39 pm on Dec 26, 2003 (gmt 0)

10+ Year Member



You mentioned that three or more hyphens might trip a spider. Do you mean a url like www.paper-clip-solutions.com would be fine but www.paper-clip-solutuion-now.com would be bad? Also, what is the overall effect of any hyphens in the url?

Thanks

pendanticist

7:57 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> 3. Comma separated list of keywords in META description tag.

Are you looking at this 'pre-emptively', as in before folks use this methodology to enhance placement/rank? I don't think I've ever seen a site without comma separations in the META description tags.

P,e,n,d,a,n,t,i,c,i,s,t.

pageoneresults

8:18 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



chuq_2001, Welcome to WebmasterWorld!

> Do you mean a url like www.paper-clip-solutions.com would be fine but www.paper-clip-solutuion-now.com would be bad?

If I were a Spider, yes, the third hyphen in the URI would raise a flag. Now keep in mind that this scenario is not real, I am just trying to come up with a list of things that might trip a filter with the various search engine spiders that are traversing the net.

I'd say that over 90% of hyphenated URIs are SEO'd. When you get into 3+ hyphens, most are over-optimized and many of the things I've mentioned above are prevalent.
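The hyphen rule is easy to sketch. The limit of three is the figure from the list; the function name is hypothetical, and counting hyphens only in the host name (not in the path) is an assumption.

```python
from urllib.parse import urlsplit

HYPHEN_LIMIT = 3  # "three or more hyphens and it's flagged"


def hyphen_flag(url):
    """Flag a URL whose host name contains HYPHEN_LIMIT or more hyphens."""
    if "//" not in url:
        url = "//" + url  # let urlsplit treat a bare host as the netloc
    host = urlsplit(url).hostname or ""
    return host.count("-") >= HYPHEN_LIMIT
```

With the two URLs from the question, `hyphen_flag("www.paper-clip-solutions.com")` is false and `hyphen_flag("www.paper-clip-solutuion-now.com")` is true.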

[edited by: pageoneresults at 8:27 pm (utc) on Dec. 26, 2003]

rogerd

8:20 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



chuq_2001, welcome to WebmasterWorld. Not to speak for pageoneresults, but I believe this is a hypothetical list of possible things spiders could look for - not a list of certain risk factors. The specific issue of hyphenated domains has been discussed often, with some concluding that a large number of hyphens is frequently an indicator of either spam or SEO (either of which could be a risk factor). The Florida update seemed to hit hyphenated domains in some areas, although these sites may well have had other risk factors.

pageoneresults

8:21 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



pendanticist, I've seen many META description tags that look like a META keywords tag. From my perspective, a description is a meaningful 25-30 word summary of the content on that page, not a comma separated list of words.
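One hedged way to tell a sentence-like description from a keyword list is to look at how many comma-separated items there are and how short they are. The thresholds here (at least five items, averaging two words or fewer) are pure assumptions for illustration.

```python
def looks_like_keyword_list(description, min_items=5, max_avg_words=2.0):
    """Guess whether a META description is really a comma separated keyword list."""
    items = [part.split() for part in description.split(",") if part.strip()]
    if len(items) < min_items:
        return False
    avg_words = sum(len(words) for words in items) / len(items)
    return avg_words <= max_avg_words
```

An ordinary sentence with a comma or two has few, long items and passes; a run of two-word phrases does not.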

I didn't quite understand your question, but hey, tis the day after Xmas and I'm enjoying some free time to stir up the pot!

rogerd

8:36 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Oops, pageoneresults snuck in ahead of me... Looks like we agree, though. :)

bakedjake

8:55 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



pageone: this isn't strictly spider based, but it's related.

if i were an engine that used links as criteria for ranking, i would think that a new site or an old site who historically had 6 links pointing to it would merit a flag if it suddenly jumped (within a month) to thousands of links pointing to it.
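A minimal sketch of that kind of link-growth flag, assuming the engine keeps a count of inbound links from a month ago. Both thresholds (1000 new links and a 100x jump) are invented for the example and are not figures from any engine.

```python
def link_spike(previous_count, current_count, min_new_links=1000, growth_factor=100):
    """Flag a sudden jump in inbound links between two monthly snapshots."""
    new_links = current_count - previous_count
    baseline = max(previous_count, 1)  # avoid dividing by zero for new sites
    return new_links >= min_new_links and current_count / baseline >= growth_factor
```

Requiring both an absolute and a relative jump keeps a large, established site from being flagged just for gaining a couple of thousand links.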

vitaplease

9:19 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree in general on the list as a preliminary flag raiser. It should be a first sign to check if others also raise flags. IMO too many innocent pages would get hit if any one of these "filter trippers" were applied on its own. However, trip on too many and...

- Keywords in META keywords tag that don't appear on the page

I have several technical students working for me on content building. They generally take an existing page and do "save as" for a new page. They very often forget to change the title, description and/or the keywords. I would say the majority of webcontent builders have little idea about meta tags.

- Excessive repetition of keywords and keyword phrases in a short period. For example, a keyword phrase repeated 4+ times in a short paragraph and it gets flagged

When doing technical description lists or product comparisons, this type of repetition can easily occur innocently. (they do not always occur in a tabular form)

- Lists of keyword phrases that appear toward the end of the html. If that list exceeds a certain character count, it gets flagged

I've seen many scientific publications end with a list of keyword/phrases under which the document should be listed/categorised.

------------------------------------------------------------
Ideally when the spider trips on too many of the above filters, the algo ignores any "beneficial" effects.
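The "trip on too many" idea amounts to counting flags rather than acting on any single one. A hypothetical sketch, where the flag names and the threshold of three are assumptions:

```python
def should_ignore_benefits(flags, threshold=3):
    """flags maps a filter name to True if the page tripped it.

    Returns (ignore, tripped): whether enough filters were tripped to
    discount any "beneficial" effects, and which filters they were.
    """
    tripped = sorted(name for name, hit in flags.items() if hit)
    return len(tripped) >= threshold, tripped
```

A page that trips only one filter, such as a forgotten keywords tag, stays below the threshold and is left alone.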

pageoneresults

10:09 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Good points vitaplease.

> Keywords in META keywords tag that don't appear on the page.

> I have several technical students working for me on content building. They generally take an existing page and do "save as" for a new page. They very often forget to change the title, description and/or the keywords. I would say the majority of webcontent builders have little idea about meta tags.

If I were a Spider and found instances of duplicate metadata on 2 or more pages, that particular metadata would be ignored. No downgrade other than not having that metadata content indexable.
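A sketch of that rule, assuming the spider has already collected one metadata string per URL. The function name and the whitespace and case normalization are assumptions of the example.

```python
from collections import Counter


def usable_metadata(pages):
    """pages maps URL -> metadata string.

    Returns URL -> metadata, or None where the same metadata appears
    on two or more pages and is therefore ignored.
    """
    normalized = {url: " ".join(meta.lower().split()) for url, meta in pages.items()}
    counts = Counter(normalized.values())
    return {url: (pages[url] if counts[normalized[url]] == 1 else None)
            for url in pages}
```

The pages themselves are untouched; only the duplicated metadata is dropped from consideration.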

> Excessive repetition of keywords and keyword phrases in a short period. For example, a keyword phrase repeated 4+ times in a short paragraph and it gets flagged.

> When doing technical description lists or product comparisons, this type of repetition can easily occur innocently. (they do not always occur in a tabular form)

This is a tough one. If I find that there is excessive repetition in other areas/elements, I'm going to take that into consideration and flag it accordingly.

> Lists of keyword phrases that appear toward the end of the html. If that list exceeds a certain character count, it gets flagged.

> I've seen many scientific publications end with a list of keyword/phrases under which the document should be listed/categorised.

If I were a Spider, I've got to take into consideration all surrounding elements. I'm not too certain if I'm smart enough to determine that the list of keywords and keyword phrases at the bottom of the page is of scientific nature. If it falls within my parameters for flagging, it gets flagged.

Great closing...

> Ideally when the spider trips on too many of the above filters, the algo ignores any "beneficial" effects.

Yidaki

10:40 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> If the proportion of anchor text is near to/or greater than plain text AND the anchor text is similar, AND the links have a query string OR they all link to within the same host. Reason: those pages that contain a navbar full of overture search terms, with pages to match the anchor text, or they're possibly affiliate codes.

bol, although i understand the intention, wouldn't this flag all directories too? Even dmoz, yahoo ... They all have more anchor text than plain text or at least a pretty close proportion at their category pages.

brotherhood of LAN

10:43 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



yidaki, i deleted that ;)

I'm playing with a home made parser here. I noticed yahoo has more anchor text than plain text, but no, I doubt dir pages would trigger it. I'll sticky you a few examples of statistics of "OK" pages and suspect pages if you want.
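For anyone who wants to try the first part of that test (anchor text versus plain text), here is a minimal sketch using Python's standard `html.parser`. It is not the parser mentioned above, just an assumed stand-in. It counts non-whitespace characters inside and outside `<a>` elements and ignores script and style content; the similarity, query string and same-host conditions are not covered.

```python
from html.parser import HTMLParser


class AnchorTextCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0         # > 0 while inside an <a> element
        self.skip = 0          # > 0 while inside <script> or <style>
        self.anchor_chars = 0
        self.plain_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1
        elif tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth:
            self.depth -= 1
        elif tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if self.skip:
            return
        n = len("".join(data.split()))  # count non-whitespace characters
        if self.depth:
            self.anchor_chars += n
        else:
            self.plain_chars += n


def anchor_ratio(html):
    """Share of visible text characters that sit inside links (0.0 to 1.0)."""
    counter = AnchorTextCounter()
    counter.feed(html)
    counter.close()
    total = counter.anchor_chars + counter.plain_chars
    return counter.anchor_chars / total if total else 0.0
```

A ratio near or above 0.5 corresponds to "anchor text near to or greater than plain text"; on its own that would also describe most directory category pages, which is why the other conditions matter.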

jimbeetle

10:49 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Besides excessive repetition I would add "successive repetition" that's easy to slip in without noticing. It's sometimes natural when writing, sometimes disguised by graphics or table layout.

Title followed by description
<title>Big Bob's Widget Web Site</title>
<meta name="description" content="Big Bob's Widget Web Site offers how-to articles and reviews...">

Head and text separated by a graphic

<h1>Big Bob's Widget Web Site</h1>

¦---Large Graphic alt=""---¦

<p>Big Bob's Widget Web Site operates from the heartland of America's widget country...</p>

Head or other text followed by a graphic

<h1>Big Bob's Widget Web Site</h1>

¦---Linked logo with alt="Big Bob's Widget Web Site Home"---¦

_________________________________

To a spider each of these would read as:

Big Bob's Widget Web Site Big Bob's Widget Web Site...

(Just used sim spider to check and I'm actually changing an occurrence similar to the last example on each page of one site as I write this.)
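Successive repetition of this kind can be spotted once the page has been flattened to the text a spider would read. The sketch below is an assumption-laden illustration: it looks for a phrase of at least three words that is immediately followed by itself, and the function name and the three-word minimum are made up for the example.

```python
import re


def successive_repeats(text, min_words=3):
    """Return phrases that are immediately repeated in the flattened text."""
    words = re.findall(r"[\w']+", text.lower())
    found = []
    i = 0
    while i < len(words):
        hit = None
        # Try the longest candidate first so a whole phrase is reported once.
        for n in range((len(words) - i) // 2, min_words - 1, -1):
            if words[i:i + n] == words[i + n:i + 2 * n]:
                hit = n
                break
        if hit:
            found.append(" ".join(words[i:i + hit]))
            i += 2 * hit  # skip past both copies
        else:
            i += 1
    return found
```

Fed the spider's-eye text from the examples above, it reports the doubled site name as a single repeated phrase.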

pendanticist

11:46 pm on Dec 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Now I get it. Duh!

I skipped right over the word 'keyword' in the context it was intended and went right on to comment about commas between 'words' in a Descriptive META situation.

Ex:
<"I, have, a, site, about, widgets. We, have, blue, widgets, black, widgets, tall, widgets, sell, more, widgets, than, any, other, site, on, the, internet, v, mc, ae, all, welcome, over, night, over-night, shipping, around, the, universe, sorry, no, c., o., d, best, viewed, while, breathing, f., o., b., detroit.">

Maybe the delineation lies somewhere in e-commerce?

I've not seen it in academics.

Pendanticist.

pendanticist

3:22 pm on Dec 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Where'd you go? Hope you didn't stop on my account...

Yidaki

7:49 pm on Dec 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>I'll sticky you a few examples of statistics of "OK" pages and suspect pages if you want.

Yes, please. Maybe you're interested in running your parser over my own directories ...? Sticky me.