Forum Moderators: open


leading the googlebot to water

How to get the googlebot to crawl and list pages in your site


cfx211

7:00 pm on Dec 4, 2002 (gmt 0)

10+ Year Member


Here is the story. We run the largest and most popular site on a particular subject. Let's call it hardware even though it is not.

One of the things we have on this site is a shop. The company tells me to start paying attention to shop to drive sales. I take a look at it and realize that for search engines the shop is configured terribly. One generic title for all pages and images instead of text. The only thing going for it is that the shop home page has a PR of 6 and our site's index page has a PR of 7. My goal is to get the shop's home page listed high for the big subject keywords, and then to get the individual product pages listed.

3 months into this, I have the shop's home page PR up to 7 and we are listed highly for the shop's big subject keywords, but I cannot get Google to list individual pages in the shop. Here is a chronology of what we have done.

1. Changed the page title of the shop's home page, swapped out the top image which was stylized words in a gif with actual words, and linked to the shop's home page (PR6) from our index (PR7).

This gets us top 10 listings for the 3 big subject keyword phrases we are after, and raises the shop's home page PR to 7.

2. We then added page titles to all of the other pages in the shop that are specific to the category and product on the page. For example if we are selling hammers the page title will be red ball point hammer.

This gets us nothing. Google does not crawl these pages in the next update. Researching it a bit, I think it is because of the url structure or because of our nav. URLs look like:

http://www.hardwareworl.com/hardware_shop.asp?cat=10&prod=1234&shop=boutique

The navs are all JavaScript.

3. We decide to build a flat html page listing all of the products in our shop. The page is linked to from the shop's home page. This page has about 500 links straight into each product with the product name in the link. The page size is about 100k. The hope is the googlebot crawls this page and jumps into each product from there.

This page goes up, gets crawled immediately by the freshbot, and ends up doing OK in search listings for individual products. If you searched on Penguin Grip Rainbow Hammer and that was a product we carried, then it would show up on the first page and drop you into this big product list page. The only problem is that the page is a mess for navigation, and conversion from there to product is terrible.

When the monthly update comes around, the googlebot does not go past this page into the products. This makes me think the URL structure is the problem.

4. At this point we do two things. First we redesign this massive product list so it is more user friendly. Second we bring a 404 redirector into play and change all the links on the page from

http://www.hardwareworl.com/hardware_shop.asp?cat=10&prod=1234&shop=boutique

to

http://www.hardwareworl.com/hardware_shop.asp/cat/10/prod/1234/shop/boutique

You hit this link and it will redirect you to the product page, which is still in its original .asp form.
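To illustrate, the mapping that 404 redirector has to perform can be sketched like this (a hypothetical Python sketch; the real handler is ASP, and the function name here is made up):

```python
from urllib.parse import urlencode

def path_to_query(path):
    """Map a flat-looking path like
    /hardware_shop.asp/cat/10/prod/1234/shop/boutique
    back to the query-string URL the shop script actually serves.
    Hypothetical sketch of what the 404-handler redirector does."""
    script, _, rest = path.partition(".asp/")
    segments = rest.split("/")
    # pair up alternating key/value segments: cat/10 -> cat=10
    params = dict(zip(segments[0::2], segments[1::2]))
    return script + ".asp?" + urlencode(params)

print(path_to_query("/hardware_shop.asp/cat/10/prod/1234/shop/boutique"))
# /hardware_shop.asp?cat=10&prod=1234&shop=boutique
```

The key point is that the flat URL carries all the information of the dynamic one; the 404 handler just reverses the encoding and issues a redirect.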

This takes place just before the last update. When the deep crawl comes around, the googlebot hits the product list and jumps into just about all of the product pages from there. There is only one problem, it does not add any of them into its index.

To throw one last curve into this problem, we have another redirector which we use to keep track of what links people click on. You change the URL to look like

http://www.hardwareworl.com/redir/redirect.asp?value=123

it hits an XML page, writes a row to a DB table, and then sends you to the destination.
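As a rough sketch of what that tracking redirector does (hypothetical names throughout; the real one is an ASP page, and the lookup table and list below stand in for its XML page and DB table):

```python
# Hypothetical sketch of the click-tracking redirector described above.
DESTINATIONS = {
    123: "/hardware_shop.asp?cat=10&prod=1234&shop=boutique",
}
CLICK_LOG = []

def redirect(value):
    CLICK_LOG.append(value)           # stands in for "writes a row to a DB table"
    return 302, DESTINATIONS[value]   # HTTP status plus Location header value

status, location = redirect(123)
print(status, location)
```

Note that this is an ordinary redirect hop between the link and the product page, the very thing later posters in this thread warn that spiders distrust, so it is interesting that this was the path that got products listed.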

We have a couple of links on one of our main pages into some products in our shop. They are in the style directly above. Recently I noticed that Google has picked up those links and is dropping people right into the product. You search for Penguin Grip Rainbow Hammer and Google puts you on the product page, which is what we want the massive product list page to do.

So now that I have the longest post ever, my question is what do you think we should do here? Google has crawled these product pages for the first time, but not added them. Do you think it will add them next month, or will it not because of the 404 redirector or because the links are all on an enormous (101k) page?

I've led the googlebot to water, but it does not want to drink...

amue1977

7:21 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



I had a similar experience with a full product list in the noframes section of the index page.
We had the products link to an extra frameset that would call the correct parts and finally show the product in the appropriate context (about 280 items).
At first this worked beautifully, but after some weeks Google listed the products with different links, pointing to a major category listing page on our site (I would have liked to do that myself, but it looked rather too complex :-). How?
We tried some changes in the linking and ended up being blocked completely! No trace of that domain whatsoever, and no reaction from Google.
We scaled back again and - joke - now the page and products (with links changed yet another way!) show up again, but under an alternative domain that was intended for internal use only and never submitted to any engine.

To me this looks like Google might not like that kind of link very much. I don't believe it can be done reliably.

BigDave

7:42 pm on Dec 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



cfx211,

Does your navigation really need to be in javascript?

How about putting in a <noscript> section for the bots and the users that prefer to surf with JS off.

There are places where JS is useful, but in most cases, it is more of a "see what I can do" sort of thing.

I think you would be much better served by getting googlebot to follow your actual link structure than depending on a sitemap type of structure.
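A minimal sketch of what a noscript fallback looks like - plain links the bot (and JS-off users) can follow, alongside the existing script nav. The file name and URLs here are just placeholders:

```html
<!-- the JS nav stays as-is for everyone else -->
<script src="nav.js"></script>

<!-- plain-link fallback that spiders can crawl -->
<noscript>
  <a href="/hardware_shop.asp?cat=10">Hammers</a>
  <a href="/hardware_shop.asp?cat=11">Nails</a>
</noscript>
```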

cfx211

7:51 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



Our site has a lot of dynamic elements to it, and I believe that we have to keep our nav in JS to accommodate those elements. I do not code so I am not sure, but I do know that from a practical standpoint our company will not change the nav structure anytime soon just because of the level of effort it would require. Outside of the crawling, we have not run into any other problems that would call for us to change.

jpavery

7:54 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



I fully read your post, but started to lose focus... a few things did pop into mind.

I went through similar challenges. We had an e-commerce site with ? and = in the URLs. We installed an HTTP rewrite... which is not a redirect.

1 - SEs do not like DBs - afraid of getting caught in a black hole... google apparently crawls one level deep.
2 - SEs do not like redirects. (or pop-ups/unders)

Get the rewrite working without using redirects.

500 links on one page sounds like a lot... maybe they are being crawled but ignored...
JP

Brett_Tabke

8:07 pm on Dec 4, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to the board cfx. A good site map is about the only way.

jimh009

9:17 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



Instead of using one big site map, how about using several mini site maps, broken down by categories of hammers, nails, etc.? Thus, a hammer page would have a mini site map that leads to all hammer pages, the nail page would have a mini site map that leads to all nail pages, etc.

I've had good success with that on my site (a content site) in getting Google to take it all in, despite having a deep directory structure. I use some Flash menus on my site for my visitors' convenience, but Google didn't like them. When I made these mini site maps, Google took everything in - plus the visitors seem to like it too.

Jim

Birdman

10:00 pm on Dec 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome, cfx211! I just recently had the same problem, except I skipped the query string method and used your second method with the variables passed between slashes. You can read the thread [webmasterworld.com], but it deals with Apache mod_rewrite.

>>>.com/hardware_shop.asp/cat/10/prod/1234/shop/boutique

What I think happens is, the bot gets confused because it looks like a malformed url. If you used that method straight from the top-level directory, it may work.

>>a.com/cat/10/prod/1234/shop/boutique

The problem with that is, it looks like you are jumping seven levels deep. Mod_rewrite solved my problem to perfection, but I do not know if there is an M$/ASP equivalent to mod_rewrite.
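For anyone on Apache, the kind of mod_rewrite rule being discussed would look roughly like this (a hypothetical pattern, not taken from either site; on IIS you would need a third-party ISAPI rewrite filter to do the same):

```apache
# Map flat-looking URLs onto the dynamic script internally --
# the bot only ever sees the clean URL, and no redirect is involved.
RewriteEngine On
RewriteRule ^cat/([0-9]+)/prod/([0-9]+)/shop/([a-z]+)$ /hardware_shop.asp?cat=$1&prod=$2&shop=$3 [L]
```

Because the rewrite happens internally on the server, this avoids the redirect hop that the 404-handler approach requires.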

cfx211

10:21 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



Thanks for the suggestions. What bugs me about this is that the bot crawled these pages and just didn't add them into the index.

We also have a paid vendor directory which we are trying to drive traffic into. As a part of that we have done something similar to the product site map, except these are much, much smaller: 20 links instead of 500. I have not seen those crawled yet, but if they do get picked up then I will know it is the size of the map, and not the URL format, that is scaring Google off.

If they don't get picked up then it is back to the URL redrawing board.

taxpod

10:31 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



This was kind of painful to read, but your answer is in what you said. You said something like "we changed these right before the last update." Likely these pages are just being crawled now. Here's an example:

Pages changed Nov 24
Update occurs Nov 28
Deepcrawl begins Dec 2 and ends something like Dec 10

The changes you made Nov 24 will be in with the Dec update. They will only be in there sooner if the freshbot hits them. They won't be in the Nov update.

Regarding the URL query strings: somebody else said don't use redirects, and I agree. Don't simply use flat pages to redirect to your dynamic ones, thinking that all you have to do is get Googlebot to your pages. The target URLs need to appear flat. ASP will give you the HTML output, but your URLs still have multiple variables in them. Use one variable or rewrite the URL.

Finally, with dynamic pages, patience seems to be important. Google sometimes takes time to list dynamic pages. I have a ton of single variable asp pages and it took two to three months, maybe longer before they were listed properly.