Forum Moderators: open

Message Too Old, No Replies

Newbie Spidering Question

How does that work?


lanesharon

2:52 am on Jan 26, 2003 (gmt 0)

10+ Year Member



I am relatively new to the world of web building and I am having difficulty with spidering. I have a site with a search from SiteLevel on the first page. One thing they do for me is respider the account on request so that all new pages get reindexed for the search. I added some pages yesterday and put the site in the queue to be respidered. For some reason it picked up only two of my web pages, and oddly enough, they were the two pages that are accessed through buttons in the first table of the page. I have no idea what is wrong with the rest of the site's coding that would cause this, so I am posting this question: exactly how do spiders work? I know this is an entry-level question, so if you do not want to answer it, can you at least give me a website address that explains it? Thanks and take care, Sharon

[edited by: Woz at 2:55 am (utc) on Jan. 26, 2003]
[edit reason] no urls please: TOS#13&20 [/edit]

Woz

2:58 am on Jan 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Sharon, Welcome to WebmasterWorld,

You say the only pages spidered are those accessed through buttons; how is the rest of the site accessed?

>One thing that they do for me is respider the account when requested
Who is "they"?

Onya
Woz

lanesharon

3:48 am on Jan 26, 2003 (gmt 0)

10+ Year Member



There are two buttons at the top of the page that are referenced in the HTML of the index.html page: the home page and the contact-us page. When I use your spider simulator, it picks up only those two pages. The other pages are accessed through JavaScript routines and buttons; they are not mentioned in the HTML code of the index.html page. Would this create a problem with spidering? If so, how can I get around it? Thanks for your reply.

<snip - no urls please>

Take Care, Sharon

[edited by: Woz at 3:50 am (utc) on Jan. 26, 2003]

lanesharon

3:50 am on Jan 26, 2003 (gmt 0)

10+ Year Member



Sorry, I forgot to mention: what I refer to as buttons are actually images I created to access the pages, with HTML tags behind them. And the "they" I am referring to is SiteLevel. I have a search routine through them, and they will respider my site on request in order to update that search engine for my site. I also used the spider simulator discussed in this forum; it, too, spiders only those same two pages. Thanks.

Woz

3:54 am on Jan 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah!

Generally speaking, spiders cannot follow links in JavaScript. If your whole menu is JavaScript, then you have a problem. The solutions are either to change the menu system to straight HTML, or to use a site map so the engines can find the other pages. Or even both. Use the search function at the top of the page and search for "site map" and you should find heaps of info.
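To make the difference concrete, here is a minimal markup sketch (the filenames and button images are invented for illustration) contrasting a link a spider can follow with one it cannot, plus the plain-HTML workaround:

```html
<!-- A spider can follow this: the URL sits in a real href attribute. -->
<a href="sitemap.html"><img src="sitemap-button.gif" alt="Site Map"></a>

<!-- A spider cannot follow this: the URL exists only inside JavaScript. -->
<img src="articles-button.gif" onclick="window.location='articles.html'">

<!-- One workaround: keep the script menu for visitors, but repeat the
     links as plain HTML, e.g. in a small footer at the bottom of the page. -->
<p>
  <a href="articles.html">Articles</a> |
  <a href="resources.html">Resources</a> |
  <a href="sitemap.html">Site Map</a>
</p>
```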

Onya
Woz

PS, check your sticky mail, at the top of the page, click on "You have mail". - Woz

lanesharon

4:21 am on Jan 26, 2003 (gmt 0)

10+ Year Member



I know this is a basic question, but I have to ask it. I used the search on this forum to try to find the answer using the term "spider" and got a number of posts on subjects much more advanced than mine, so forgive me for asking something so basic. Here goes....

I am assuming from your answers to my posts that the HTML coding for your entry page (index) needs to have an actual HTML link for each page on your website in order for a search engine like Google to spider your site fully and access all of your pages in its search routines. I thought the way search engines worked was that they went to the directory where the page resides and spidered anything in that directory that was not in the robots.txt disallows. I guess I was wrong.

It took me a long time to build that JavaScript menu. I had to do it to make the page readable in a large-print format. Many of my users are people whose cancers have caused them visual problems, so I created a website with large buttons that move and allow easier viewing by these people, regardless of screen-size settings. So the JavaScript pretty much has to stay to make the site usable by all of my users.

I do have a sitemap page, and I will use that temporarily to help with my current internal site-search problem, but I am concerned that the majority of the information will be passed by on the major search engines. Do I submit my sitemap page to them instead of my index page? Or both? I am a little confused about the mechanics of search engines, so please forgive my repeat posts. Thanks so much for taking the time to train a newbie. I appreciate the help.
Take Care, Sharon

Woz

4:36 am on Jan 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Spiders follow links to find new pages; they do not get the directory contents. So for a spider to get all your pages, there need to be links to all of them. Not necessarily on one page, but page linking to page linking to page, you get the idea.
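A toy sketch of that process (the filenames and page contents below are invented, and no real engine works exactly this way): the crawler fetches a page, pulls out the `href` targets, and repeats. A page reachable only through a JavaScript handler is never discovered, because no `href` points at it.

```python
import re

# Hypothetical four-page site. "about.html" is reachable only via an
# onclick handler, so no plain <a href> link points at it.
SITE = {
    "index.html": '<a href="home.html">Home</a>'
                  '<a href="contact.html">Contact</a>'
                  "<img src=\"btn.gif\" onclick=\"go('about.html')\">",
    "home.html": '<a href="index.html">Back</a>',
    "contact.html": '<a href="index.html">Back</a>',
    "about.html": "Never found by the spider.",
}

def crawl(start):
    """Breadth-first crawl: fetch a page, extract href targets, repeat."""
    seen, queue = set(), [start]
    while queue:
        page = queue.pop(0)
        if page in seen or page not in SITE:
            continue
        seen.add(page)
        # Only real href attributes count; the onclick URL is invisible.
        queue.extend(re.findall(r'href="([^"]+)"', SITE[page]))
    return seen

print(sorted(crawl("index.html")))
```

Running the sketch shows the crawler finding only the three pages connected by plain links, which mirrors what the spider simulator reported.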

The site map is a way to make sure the spiders get either all the pages, or at least the major section pages, which in turn list the sub-pages in each section.

So, if the JavaScript menu needs to stay, then you need to look at additional linking techniques to ensure overall coverage. One suggestion: link to the major section pages at the bottom of the home page, then link to the sub-pages at the bottom of those section pages, and so on.
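That footer-link idea might look like the following sketch (all page names here are hypothetical). The JavaScript menu stays at the top for visitors; the plain links at the bottom exist for the spiders:

```html
<!-- Bottom of index.html: plain links to the major sections. -->
<p>
  <a href="articles.html">Articles</a> |
  <a href="treatment.html">Treatment</a> |
  <a href="support.html">Support</a> |
  <a href="sitemap.html">Site Map</a>
</p>

<!-- Bottom of articles.html: plain links to that section's sub-pages. -->
<p>
  <a href="articles/vision.html">Vision problems</a> |
  <a href="articles/coping.html">Coping</a>
</p>
```

Because each page links onward in plain HTML, a spider starting at the home page can reach every level without ever executing the JavaScript menu.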

Suggested reading:-

Successful Site in 12 Months with Google Alone [webmasterworld.com] & Theme Pyramids [searchengineworld.com]

>please forgive my repeat posts

No problem, glad to help.

Onya
Woz

lanesharon

5:20 am on Jan 26, 2003 (gmt 0)

10+ Year Member



One final question, please, before I can go to bed tonight. Based on your explanation of spidering, it would seem logical that if I just include a link to the sitemap page on that first entry page, the full site would get spidered. But when I changed the code to include a link to the sitemap in the body of the page and ran it through the spider simulator, it picked up just the first two pages (the ones that have been spidered all along) and the sitemap page itself, but not the rest of the pages listed on the sitemap page. Do I assume that spidering search engines would then spider each of those lower-level pages before completing the spidering operation? (In the simulator there is an icon next to each page for spidering those lower-level pages.)

Thank you for all of your help and all the great resources. I have been looking through the site and have bookmarked several pages with great info. I am so glad I found your site tonight. Thanks and take care, Sharon