No deep crawl due to java-script?

any other reason?

         

Oliver Henniges

12:09 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,
In order to make all niche terms available to search engines, I decided to split my product catalogue into 160 sub-pages, all of which were indexed perfectly by Google in autumn 2001. The main site has a PageRank of 4.
I then moved to a frame-based design with a single-line JavaScript redirect at the top of each of the 160 pages: no problem.
To make the individual pages bookmarkable and get rid of the redirect delay, I finally decided to split the pages into two versions: one that may be indexed by search engines, which loads the JavaScript shop code from an external source and provides the frameset, and another for when the page is shown inside the frame. Since then only my index.html page has remained indexed; all the catalogue pages are gone from Google.
I put considerable effort into making the pages available (links in the noframes section, a robots.txt file to exclude the in-frame versions, etc.), but nothing helped. The only solution I see for the moment is to move to a server-side scripting version and go back to plain HTML, but this would be a huge effort for me, since so far I know nothing about Perl, CGI and the like.

Any other ideas would be much appreciated.

kstprod

12:22 pm on Nov 5, 2002 (gmt 0)

10+ Year Member



I don't claim to be an expert, but I do know that some spiders will get "caught" by JavaScript unless you call it from an external .js file. I learned this the hard way, so when using JavaScript, be sure to always call it in the following manner:

<script language="javascript" type="text/javascript" src="yourscript.js"></script>

Also, try running the Sim Spider on SEW to make sure everything still looks OK there after your changes.

You may already know this, but just in case you don't :)

Karen

HarryM

1:29 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oliver,

If you want to get away from frames, it might be worthwhile looking at PHP. It's very easy to use, especially if you know JavaScript, and offers (in my opinion :) ) a better solution than Perl and CGI.

ciml

1:54 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld [webmasterworld.com], Oliver.

People who have frame-based sites would be better able to comment on this, but I've had the impression over this year that Google is less willing to follow links in NOFRAMES (i.e. the PageRank threshold for crawling seems to be higher).

In your position I would certainly look toward a more robot friendly approach to selling the products, either pre-processing the HTML pages (and linking those to the shopping system), or moving to a system that can be robot friendly up until the 'order now' link.

Calum

martinibuster

3:32 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



As regards Google, if this is a relatively NEW page, then it simply may be that you don't have enough PageRank. GoogleGuy stated here last week that PR determines the depth of a crawl.

Also, you may want to set up a SITE MAP. Link it off the bottom of your homepage, and list all of the pages you want to have crawled.

But make sure your pages validate, or are very close to validating.

Gruesome code is another spider stumbling block that can keep a site from being thoroughly indexed.
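
For anyone wanting to generate such a site map rather than type 160 links by hand, here is a minimal sketch. It only illustrates the idea of a plain list of ordinary links that a spider can follow without running any script; the function names, file names and titles are all made up for the example, and the script is meant to be run offline to produce static HTML, not in the visitor's browser.

```javascript
// Hypothetical helper: builds a plain-HTML site map that a crawler can follow
// without executing any script. File names and titles are invented examples.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildSiteMap(pages) {
  // Each entry becomes an ordinary <a href> link, one per list item.
  const items = pages.map(
    (page) =>
      `  <li><a href="${escapeHtml(page.url)}">${escapeHtml(page.title)}</a></li>`
  );
  return ["<ul>", ...items, "</ul>"].join("\n");
}

// Example with two invented catalogue pages.
const siteMapHtml = buildSiteMap([
  { url: "catalogue/page001.html", title: "Product group 1" },
  { url: "catalogue/page002.html", title: "Product group 2" },
]);
console.log(siteMapHtml);
```

The output would be pasted into a static page linked from the bottom of the homepage, so the links themselves are plain HTML.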

Oliver Henniges

5:39 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you very much so far.

Yes, my site has been indexed for more than a year now, with a PageRank of 4, which as far as I know is sufficient to get the rest indexed. Yes, the Sim Spider has no problems with it. Yes, I have a sitemap, or let's say at least twenty or so links in the noframes area of the first page pointing to the rest (someone suggested that more would be spam).

I think I'll give it one last try and add a sort of sitemap in one of the frames on the first page pointing to the indexable mirror-pages, because maybe, as Calum suggested, Google no longer follows noframes links at all. We'll see by the end of November.

The reason why I posted this here is that no one has given an explanation so far: Could it be because the Javascript is named "steuerung.txt" and not "xxx.js"? Which java-scripts are accepted and which not? Could it be because of the embedded frame-structure? Is it because google is not fed with fully grammatical sentences but rather product-adjectives?

ciml

6:35 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Could it be because the Javascript is named "steuerung.txt" and not "xxx.js"?

Nope, Google will ignore both.

> Which java-scripts are accepted and which not?

None are.

> Could it be because of the embedded frame-structure?

I'm not suggesting that Google never follows links in noframes, just that the PageRank required may be higher.

Maybe a "sort of sitemap in one of the frames on the first page pointing to the indexable mirror-pages" would work well. Last time I looked into frames, Googlebot seemed more willing to follow frame src attribute values than normal links in noframes.

> Is it because google is not fed with fully grammatical sentences but rather product-adjectives

Some people believe that grammatical sentences get a boost, but using product names should not be a barrier to crawling.

WebGuerrilla

7:50 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Remove the <noframes> tag and use an external JS file to write your frameset. Google gets the non-framed version and you don't take a hit for the noframes tag.
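
A rough sketch of what such an external file might look like follows. It is not taken from anyone's live site: the file name frameset.js, the menu.html frame, the "inframe/" directory and both function names are assumptions for illustration. The idea is that a spider which ignores scripts sees only the plain page, while a browser running the script gets the frameset written around the in-frame version. How reliably browsers accept a frameset written this way may vary, so treat it as a starting point to test.

```javascript
// Contents of a hypothetical external file, e.g. frameset.js, included from
// the <head> of each indexable page with <script src="frameset.js"></script>.
// The frame file names (menu.html, the "inframe/" directory) are assumptions.

function buildFrameset(contentUrl) {
  // Markup for a two-frame layout: a menu frame plus the content frame.
  return (
    '<frameset cols="200,*">' +
    '<frame src="menu.html" name="menu">' +
    '<frame src="' + contentUrl + '" name="content">' +
    "</frameset>"
  );
}

function inFrameUrl(pathname) {
  // Map ".../page001.html" to "inframe/page001.html", relative to the
  // current page (hypothetical naming scheme for the in-frame mirror pages).
  const fileName = pathname.substring(pathname.lastIndexOf("/") + 1);
  return "inframe/" + fileName;
}

// Only run in a browser, and only when the page is not already inside a frame.
if (typeof window !== "undefined" && window.top === window.self) {
  document.write(buildFrameset(inFrameUrl(window.location.pathname)));
}
```

The check against window.top keeps the page from wrapping itself in a frameset again if it is ever loaded inside a frame.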

Powdork

7:59 am on Nov 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, it seems that so many people decided to spam with <noframes> content that Google places less importance on it, if any, these days. I recently moved my original site out of frames and it is doing sooo much better. Of course, all other things have not been equal. Equally important, it is sooo much easier to maintain. People can use their back button, the page gets cached, and Google sent me a free mousepad (not really :(). By the way, welcome to WebmasterWorld. The instant gratification here is what brought me over from 'groups. One question. Is it possible you are being penalized for duplicate content?

Oliver Henniges

9:07 am on Nov 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> One question. Is it possible you are being penalized for duplicate content?

1) How could I find out whether I am?
2) The in-frame mirror pages are marked "noindex" in the robots.txt, so I don't actually think I am, but who knows...
3) The main page remains indexed; I thought any penalty would affect the whole site or IP. By the way, I run four URLs on the same IP: one is in the index, the others aren't. Is this new? I am not sure, but I think a year or so ago PR showed up for all four. A site search for "url", "ip" and "google" does not turn up anything helpful, because these terms are probably mentioned in 50 percent of all postings.