Forum Moderators: mack


Someone HELP me before Google spiders me!

Test Spider not following links, already submitted to EVERYONE!


kstprod

6:07 pm on Oct 24, 2002 (gmt 0)

10+ Year Member



Someone please help me out here... I'm a somewhat intermediate webmaster, but I've done something seriously wrong in the HTML code across my whole site. I used SEW's spider test and it's not picking up my meta keywords OR ANY of my links. I've already been hit by a couple of spiders, though not the biggies (thank God). I've already submitted to ALL of the search engines and indexes, and I need someone to help me figure out what I've done BEFORE Googlebot or any other biggie hits me. What I did in creating my pages was just copy from the first one and add content for the second, and so on. My guess is that every single page on my site is now screwed up.

Just in case it's needed, I'm <bleeep>

Please someone save my butt....lol

Thanks, Karen

[edited by: rcjordan at 6:10 pm (utc) on Oct. 24, 2002]
[edit reason] no references to your site, please [/edit]

rcjordan

6:13 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld.

Start by running your page(s) through the validator
[validator.w3.org...]

oilman

6:18 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remove the DOCTYPE line and then try it again. The validator is flagging it as an error.

Powdork

6:20 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are you using JS dropdown menus for navigation? I've heard tell Googlebot can read these if the http: is present even if the <a href is not. I can't verify this, however; it's just what I heard. The SEW spider test may not read them, but I'm sure someone here knows.

roscoepico

6:25 pm on Oct 24, 2002 (gmt 0)

10+ Year Member



I just ran her site through Xenu and it appeared to follow all her links, so I'm not sure this is something to worry about. As for the other errors, I haven't the slightest.

Michael

oilman

6:29 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The W3C validator is lighting up all over the JS stuff in there. The regular links are fine, so assuming Gbot ignores the JS, it should find the links just fine.
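To illustrate the point: a spider that ignores JavaScript only ever sees the plain <a href> markup. A rough sketch of that behaviour (Python, purely illustrative; the page snippet and URLs are made up):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values from plain <a> tags. The parser treats
    <script> contents as raw text, so JS-generated links are never
    seen -- much like a spider that ignores JavaScript."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = """
<script>document.write('<a href="/js-only.html">hidden</a>');</script>
<a href="/about.html">About</a>
<a href="http://www.example.com/contact.html">Contact</a>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # only the two plain links; the JS one is invisible
```

The link written by document.write never reaches the parser as markup, which is exactly why navigation that exists only inside JS can be invisible to a crawler.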

kstprod

6:38 pm on Oct 24, 2002 (gmt 0)

10+ Year Member



Ok, thanks guys... I'm changing my DOCTYPE and adding a meta tag for content type. I hope this helps... I sure don't want to wait forever to be spidered and then have it all screwed up. Regarding the JS: will this affect page ranking? Should I maybe try to get rid of some of it?

Thanks again for the quick response... I think I went into a full-fledged panic attack when I saw that my links weren't spidering... lol

Karen

mediaman

6:38 pm on Oct 24, 2002 (gmt 0)

10+ Year Member



Does not having the doc type listed make any difference to the site being crawled successfully?

(Fatal Error: no document type declaration; will parse without validation)

This is what I have for doc type: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

Can anyone tell me if this will be a problem for the spiders?

Thanks in advance.

kstprod

6:51 pm on Oct 24, 2002 (gmt 0)

10+ Year Member



Thanks again guys for the immediate responses!

I did the following on tbindex3.html, rather than just plain index, so I wouldn't confuse myself....

Ok, now I've added the correct stuff at the top (DOCTYPE, etc.) and added the meta tag. Then I ran the spider test again, and still no keywords or links were pulled. It must be something in my HTML somewhere, because I have an imagemap of links, text links on the side, and text links at the bottom (all internal).

Eeek.....all those validation errors scare me....

Karen

lorax

7:00 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you're using the Spider tool at SEW then it may not be you. I used that tool today and it didn't see my keywords or description. I went back and looked at them twice just to be sure they were there - and yes.

Re: the errors. Start at the top. Sometimes one syntax issue will generate more than one error as the validator reads through the page. Fix them one at a time and you'll get through them fine.

Andrew Thomas

7:18 pm on Oct 24, 2002 (gmt 0)

10+ Year Member



It may sound daft, but make sure your links are :

htttp://www.yoursite.com/link2.asp

instead of:

/link2.asp

as the spider test looks for the full URL of each page.

Andy
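For what it's worth, a real crawler normally resolves relative links against the page's own address, so either form should end up at the same place. A quick sketch with Python's standard library shows how the resolution goes (example.com is just a stand-in domain):

```python
from urllib.parse import urljoin

# The URL of the page the link appears on
base = "http://www.example.com/products/index.html"

# A root-relative link (leading slash) resolves from the site root...
print(urljoin(base, "/link2.asp"))
# -> http://www.example.com/link2.asp

# ...a bare relative link resolves from the current directory...
print(urljoin(base, "link2.asp"))
# -> http://www.example.com/products/link2.asp

# ...and an absolute link is taken as-is.
print(urljoin(base, "http://www.example.com/link2.asp"))
# -> http://www.example.com/link2.asp
```

So if a tool only follows full URLs, that's a quirk of the tool rather than a rule of crawling; still, switching to absolute links is a harmless workaround.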

TallTroll

7:19 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A couple of quirks that I have discovered when using the SEW SimSpider :

1) When entering a URL, use a trailing slash (i.e. [domain.com...] rather than [domain.com)...]); otherwise it can have trouble reading links correctly.
2) To get it to read the keywords and description tags, they have to appear in the correct order; otherwise it seems not to see them. I think it's description then keywords. Also ensure you have used "" to enclose the contents of any and all element attributes.

lorax

7:29 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



htttp://www.yoursite.com/link2.asp

You sure about that Andrew? ;)

Thanks for the tip TallTroll. I think my order may be reversed. Is that just a quirk of the SEW spider or should that be considered a good SOP?

Andrew Thomas

7:49 pm on Oct 24, 2002 (gmt 0)

10+ Year Member



Oh, sorry if I'm wrong, but when I tried it without the [blahblah...] it would not spider any more of my pages, so I changed all the links on my page to the FULL URL and it worked fine.

Is it OK to do it this way? Or have I made a mistake?

Marcia

9:01 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



According to policy we don't do individual site reviews or checks here, but a site search (top of page in the menu) for dynamic links and javascript links can yield a lot of great information related to which links are spiderable and what kind present challenges.

The grand-daddy of resources for anything about HTML is the W3C [w3.org], where complete specifications are available for HTML and validation. There's also a validator onsite and a link to HTML Tidy, a free tool that's very helpful.

lorax

12:14 am on Oct 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Andrew, I was ribbing you about the URL you posted - look very closely at the protocol (the very beginning of the URL).

GB

kstprod

12:58 am on Oct 25, 2002 (gmt 0)

10+ Year Member



Ok guys... I fixed most of the errors, but it is still spitting out this message:

Sorry, this document does not validate as HTML 4.01 Transitional.

The couple of errors that are left are no biggie to me... but what I'm wondering is whether being validated will affect how the SEs spider me or rank me.

Thanks guys,

Karen

Marcia

1:03 am on Oct 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Karen, you don't have to validate to be spidered. Googlebot is voracious; if other search engine spiders get through, she will too.

kstprod

1:52 am on Oct 25, 2002 (gmt 0)

10+ Year Member



Marcia,

Ok wheeeeew - big breath of relief. Thank you. This was beginning to drive me insane. Just out of curiosity, if I removed ALL of the errors, would I then get validated? :)

Karen

Marcia

2:27 am on Oct 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes Karen, error free can generally validate. And it's kind of nice to put the little W3C graphic on, too.

politicsandlabor

2:56 am on Oct 25, 2002 (gmt 0)

10+ Year Member



I find the "validation" process to be over-rated.

Not a single one of my pages will pass the W3C validator, and yet they can all be read by a wide range of browsers on different OS platforms (and versions!), and the pages all get hit by the G-bot (and a lot of other 'bots).

I have two Macs and two PCs, which between them can boot into Mac OS 9, Mac OS X, Windows 98 and 2000, and two different flavors of Linux, all of which I use to test-run pages (alternating between dial-up and cable).

By far, my best resource is a group of people with a wide range of computer set ups (whom I call the "Un-Indicted Co-Conspirators") (anybody remember that moniker?) and they report back problems to me.

If a page passes this process- Google will handle it.

Then I proceed to get a good night sleep, because tomorrow we've got 50,000+ postcards to label...by hand.

TTFN

Andrew Thomas

1:08 pm on Oct 25, 2002 (gmt 0)

10+ Year Member



lorax

Sorry, I was half asleep when I wrote that (and the typo) - I thought I'd given out some misinformation :)

Andy

kstprod

4:40 pm on Oct 25, 2002 (gmt 0)

10+ Year Member



Thanks everyone for helping! All is well, except one thing still concerns me...

When I use NetMechanic to check my HTML, I get the following error:

Warning: HTML content does NOT match <!DOCTYPE>

This scares me... how important is this? I currently use HTML 4.01 Transitional for the DOCTYPE. It does the same thing when I use the HTML 3.2 DOCTYPE. Am I possibly using a mixture of both or something?... lol

You guys are the best!

Karen

lorax

4:45 pm on Oct 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



how important is this?

It may or may not be important. How's that for a clear answer?

Have you validated the code with the SEW validator or W3C's validator?

kstprod

6:52 pm on Oct 25, 2002 (gmt 0)

10+ Year Member



I can't get validated through W3C or SEW. Here are my errors, which I don't want to fix because of style issues.

Results of attempting to parse this document with an SGML parser.

(The 1st 10 errors below are from the code from my stats program, which I was told NOT to change in any way)

16 7 required attribute "TYPE" not specified
<script><!--

19 41 required attribute "TYPE" not specified
</script><script language="javascript1.2"><!--

21 16 required attribute "TYPE" not specified
</script><script><!--

23 33 "ALT" missing ... </script><noscript><img src=http://x3.extreme-dm.com/z/?tag= ...

23 68 general entity "p" not defined and no default entity ... //x3.extreme-dm.com/z/?tag=kstprod&p=http%3A%2F%2Fwww.trendy ...

23 104 general entity "j" not defined and no default entity ... /?tag=kstprod&p=http%3A%2F%2Fwww.trendybabies.com&j=n height=1 width=1 ...

24 7 required attribute "TYPE" not specified <script><!--

27 41 required attribute "TYPE" not specified </script><script language="javascript1.2"><!--

29 16 required attribute "TYPE" not specified </script><script><!--

31 33 "ALT" missing ... </script><noscript><img src=http://x3.extreme-dm.com/z/?tag= ...

(The only browser that doesn't support this is NS 3 so I don't care)
64 19 there is no attribute "BORDERCOLOR" <table bordercolor="#FA8BB6" width="100%" border="1" cellspacing="0" ...

(No versions of NS support this, but all IE's do, so NS users can just see the default color)
72 10 there is no attribute "COLOR" <hr color="#ffffff" width="95%" align=center>

(This I use just to indent)
107 23 start tag for "LI" omitted <center><ul><blockquote> ...

Sorry, this document does not validate as HTML 4.01 Transitional.

So that's what I get. I guess if validation has no effect on spidering/ranking, then I don't care. But the DOCTYPE error stated earlier DOES bother me. Any ideas?

Thanks!

Karen

kstprod

7:28 pm on Oct 25, 2002 (gmt 0)

10+ Year Member



Ok... In the midst of testing some things, I decided to take out the DOCTYPE statement entirely and change the META statement to:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

Now the NetMechanic warning that said my HTML didn't match my DOCTYPE is gone. NOW, will NOT having a DOCTYPE statement at the top of my page affect anything? Meaning, is it required by anyone? Or is the META tag enough?

Have patience with me please :)

Karen

g1smd

8:26 pm on Oct 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




My 'minimum' header has all of this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<HTML>

<HEAD>
<TITLE> Your Title Here </TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META HTTP-EQUIV="Content-Language" CONTENT="EN-GB">
<META NAME="Keywords" CONTENT=" your keyword list here ">
<META NAME="Description" CONTENT=" Your Description Here. ">
</HEAD>

<BODY>

You may need EN-US rather than EN-GB for the Content-Language.

The Title, Keywords, and Description are useful to some search engines (and the Title is displayed by the browser along the top of the window). The DOCTYPE is useful for validation and tells the browser what version of HTML is being used.

The Character Set and Content-Language tags are going to become more important to search engines in the future. At present, the browser uses this information to determine which character set to display the page with (important for people who also have DBCS and shifted character set support installed, and non-US defaults on their browser). These tags will also help with online translation tools, and help visitors to your web site who use settings other than US as their default.

Additional items such as Copyright, Distribution, PICS rating, and others are left for you to decide. As XML becomes more popular, then all the 'DC.Description', and so on, tags will become more well known, but can be safely ignored by most people for now.

I will sometimes also add:

<META NAME="MSSmartTagsPreventParsing" CONTENT="TRUE">
<META NAME="Generator" CONTENT="Wordpad">
<META NAME="Author" CONTENT="Your Name Here">
<META NAME="Date" CONTENT="2002-09-30">

Any more is usually overkill; but some commercial entities like to add a Copyright line, just to keep their corporate bean counters happy; not that it can easily be enforced.

If you remove these two lines (it is a good idea to remove them):
<META NAME="GENERATOR" CONTENT="Microsoft FrontPage 5.0">
<META NAME="ProgId" CONTENT="FrontPage.Editor.Document">
then, unless you tick the appropriate box in the FrontPage settings, you will find that they are automatically added back in each time you re-edit the page.

After writing all this stuff the last job is to send the page to [validator.w3.org...] and get the HTML syntax checked out for errors.

More about validation in a mo...

g1smd

8:36 pm on Oct 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> (Fatal Error: no document type declaration; will parse without validation) <<

>> This is what I have for doctype: <<
>> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <<

That isn't the DOCTYPE! The line above is your CONTENT type and should be within the HEAD.

The DOCTYPE should be something like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

and must be the very first line of the file, even before the <HTML> statement.

>> I find the "validation" process to be over-rated. <<

>> Not a single one of my pages will pass the W3C validator and yet they can all be read by a wide range of browsers on different OS platforms (and versions!) and the pages all get hit by the G-bot (and a lot of other 'bots.) <<

One point to make here, is that many people use the HTML validator at [validator.w3.org...] not to get their code exactly as per the W3C standards but to provide a list of problems such as:

  • Tags with typos like <TBALE> or <IMGG>
  • Missing > or < such as <TD or ...blarg.gif"/A>
  • Unescaped ampersands: & should be &amp;
  • Wrongly nested tags, tags closed in the wrong order
  • Tags opened but not closed
  • Essential elements missing, like having a table with <TD> but no <TR>
  • Block elements wrongly being contained inside Inline elements
  • Missing or wrongly formed META tags.
  • Notices of attributes without the value in quotes

There are a number of extensions to HTML that only work in Netscape, or only work in Infernal Exploiter (such as <MARQUEE>). Using the validator is not a requirement to remove all such proprietary extensions to the HTML coding. Use the validator to help you write code without logic and structural errors; don't worry too much that you may use a few non-standard tags - these will be ignored by browsers that do not use or require them.

So, your specific errors are all very easy to fix...

>> 16 7 required attribute "TYPE" not specified <<
>> <script><!--

These should be:
<script language="javascript" type="text/javascript"><!--

>> 19 41 required attribute "TYPE" not specified <<
>> <script language="javascript1.2"><!--

These should be:
<script language="javascript1.2" type="text/javascript"><!--

>> 33 "ALT" missing <<

On every IMG tag you need to add >> ALT="some text" << where "some text" describes the image or what clicking there will do. For images that are logos use ALT="logo" or just ALT="", and for images that are bullet points or spacers use ALT="".

>> 68 general entity "p" not defined and no default entity ... <<
>> //x3.extreme-dm.com/z/?tag=kstprod&p=http%3A%2F%2Fwww.trendy ... <<

Easy - just change every occurrence of & to &amp; instead.
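If there are many occurrences, that substitution can be scripted. A minimal sketch using Python's standard library (the query string below is a shortened, illustrative stand-in for the tracker snippet, with an example domain substituted):

```python
from html import escape

# Illustrative stand-in for the tracker's query string
raw = "?tag=kstprod&p=http%3A%2F%2Fwww.example.com&j=n"

# escape() rewrites bare & as &amp; so the URL is valid inside an
# HTML attribute; the %-encoded characters are left untouched.
fixed = escape(raw)
print(fixed)  # ?tag=kstprod&amp;p=http%3A%2F%2Fwww.example.com&amp;j=n
```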

>> 23 start tag for "LI" omitted <center><ul><blockquote> ... <<

Yes, each item in a list MUST be marked with an LI (list item) tag. The principle is to use
<UL>
<LI>Item 1
<LI>Item 2
<LI>Item 3
</UL>
You have missed out the <LI> each time.

Run it through [validator.w3.org...] again after fixing. It should be OK.

Don't worry about the BORDERCOLOR and COLOR error messages (just make sure all attributes are in quotes, like foo="bar"); browsers that do not support BORDERCOLOR and COLOR will simply ignore them.

kstprod

2:22 pm on Oct 26, 2002 (gmt 0)

10+ Year Member



I got it, thanks!

I really appreciate all of the help, everyone! You guys were ON the ball and very informative. WW will be my ONE and only source for help from now on!

Karen