Forum Moderators: Robert Charlton & goodroi

Google won't follow RewriteEngine-generated links


namcoke

10:50 pm on Feb 14, 2005 (gmt 0)

10+ Year Member



Hi,

I've been reading this forum for a while and it's been a really great help. This is my first post, and I am truly stumped on this!

I think this is something Googlebot-specific.

I have made a dynamic website using PHP/MySQL and tried to make it search-engine friendly by using the RewriteEngine in the .htaccess file to make my dynamic pages look like static pages.

The implementation works: I can follow the links with my browser, and a few spiders have been through the links.

The problem is, Google just won't follow the links. I thought I was suffering from the "sandbox" effect, but the other night I created a single static page on my site, with a link to it from the main page (which has been indexed), and Google spidered this page within 6 hours.

So the problem doesn't seem to be the sandbox effect, just that Googlebot won't follow my RewriteEngine-generated links.

I would be grateful for any ideas that anyone has about this.

Thanks,

Nam

tantalus

4:55 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



namcoke:

Oops, didn't read your post properly.

Have you checked your robots.txt for errors?

namcoke

5:57 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Hi,

Thanks for that - I don't have a robots.txt file, do you think it would help?

I made another couple of static pages last night with normal links to them, and they haven't been spidered yet. Maybe it is just the sandbox. If the new static pages get spidered, I'll know it's definitely something to do with the RewriteEngine-generated links.

pmkpmk

6:00 pm on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All of my sites are done with the Typo3 CMS, which only uses dynamic pages. They are all transformed to static pages via the RewriteEngine, and Googlebot is a daily (and very hungry) guest. Therefore I assume you are running into some other issue.

PatrickDeese

6:01 pm on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It might be worthwhile to test your server headers with the tool in your control panel here on WW, just in case it's an error in your .htaccess or something.

theBear

6:32 pm on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What return codes are you sending from the rewrite rules?

And while we are at it, what robots meta tag does one of the pages generate?
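The return codes are the HTTP status the server sends back for a URL (200, 301, 302, 404 and so on). As a minimal Python sketch, the standard `http.client` module can show what a crawler would receive, because it never follows redirects on its own. The function name, host, path, and User-Agent string are placeholders for this example.

```python
import http.client


def check_status(host, path, port=80, user_agent="Mozilla/5.0 (status check)"):
    """Fetch `path` once and return (status code, Location header or None).

    Redirects are not followed, so an internal rewrite should show 200,
    while a rule with [R] would show 302 (or 301) plus a Location header.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Example with a placeholder host and path:
# print(check_status("www.example.com", "/toys/acme/robots/42/1/20/index.htm"))
```

Anything other than 200 for the rewritten URLs would be worth chasing before looking at Google itself.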

namcoke

6:37 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Hi, I'm not sure what you mean by return codes, but here is my full .htaccess file.

I am passing 6 variables through:

RewriteEngine on
RewriteBase /
RewriteRule ^(.*)/(.*)/(.*)/(.*)/(.*)/(.*)/index.htm /index.php?masterCategory=$1&advertiser=$2&brand=$3&productID=$4&sspage=$5&sslimit=$6
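One way to see exactly what that rule does is to run the same pattern outside Apache. This is a Python sketch, not mod_rewrite itself: it assumes the .htaccess sits in the document root, so the path being matched has no leading slash, and it uses Python's `re`, which behaves like Apache's regex engine for a simple pattern like this one. The function name and sample path are made up.

```python
import re

# The pattern from the .htaccess above, unchanged. Note that the "." in
# "index.htm" is unescaped (it matches any character) and there is no "$".
PATTERN = re.compile(r"^(.*)/(.*)/(.*)/(.*)/(.*)/(.*)/index.htm")
TARGET = ("/index.php?masterCategory={0}&advertiser={1}&brand={2}"
          "&productID={3}&sspage={4}&sslimit={5}")


def rewrite(path):
    """Return the internal URL the rule would produce, or None if no match.

    `path` is given without a leading slash, as mod_rewrite sees it in a
    .htaccess file in the document root.
    """
    m = PATTERN.search(path)
    if m is None:
        return None
    return TARGET.format(*m.groups())


print(rewrite("toys/acme/robots/42/1/20/index.htm"))
# /index.php?masterCategory=toys&advertiser=acme&brand=robots&productID=42&sspage=1&sslimit=20
```

Because each `(.*)` is greedy, a URL with an extra directory in front ends up inside the first variable (for example `a/toys`); `([^/]+)` in place of `(.*)` would stop one variable from swallowing slashes.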

namcoke

6:42 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Generated meta tags are:

<meta name='robots' content='all'>
<META NAME="revisit-after" CONTENT="1 days">

The robots meta tag is generated within my PHP. I just noticed the single quote marks and lower case; could that be the issue?
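Single quotes and lower case should not matter: HTML allows attribute values in single or double quotes, and tag and attribute names are case-insensitive. As a small illustration, Python's standard `html.parser` reads both spellings of the tag the same way. The class and function names are made up for this sketch.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.robots = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, and strips the
        # quotes (single or double) from attribute values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            self.robots.append(attrs.get("content"))


def robots_meta(page):
    finder = RobotsMetaFinder()
    finder.feed(page)
    finder.close()
    return finder.robots


print(robots_meta("<meta name='robots' content='all'>"))  # ['all']
print(robots_meta('<META NAME="ROBOTS" CONTENT="all">'))  # ['all']
```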

Thanks for your help on this

Wizard

7:18 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



RewriteRule ^(.*)/(.*)/(.*)/(.*)/(.*)/(.*)/index.htm

Maybe you made a mistake by using such deep paths. I'd rather use a different separator than a slash, '-' for instance.

Has anyone heard about how G treats urls with very deep paths?
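Wizard's dash suggestion, sketched in Python with made-up values: the six parameters are joined with dashes into a single page name, along with the reverse mapping that a rewrite rule would then have to perform. The `.htm` suffix, the field order, and the requirement that no value is empty or contains a dash or slash are all assumptions of this example. A brand such as "coca-cola" would break the split.

```python
import re

# Field order taken from the original query string.
FIELDS = ["masterCategory", "advertiser", "brand", "productID", "sspage", "sslimit"]

# Six non-empty, dash-free values followed by ".htm" (hypothetical scheme).
DASH_URL = re.compile(r"^([^-/]+)-([^-/]+)-([^-/]+)-([^-/]+)-([^-/]+)-([^-/]+)\.htm$")


def build_url(values):
    """Join six values into a single-level page name such as /a-b-c-1-2-3.htm."""
    if len(values) != len(FIELDS) or any(not v or "-" in v or "/" in v for v in values):
        raise ValueError("need six non-empty values with no '-' or '/'")
    return "/" + "-".join(values) + ".htm"


def parse_url(path):
    """Map a page name (without leading slash) back to the query parameters."""
    m = DASH_URL.match(path)
    return dict(zip(FIELDS, m.groups())) if m else None


print(build_url(["toys", "acme", "robots", "42", "1", "20"]))  # /toys-acme-robots-42-1-20.htm
print(parse_url("toys-acme-robots-42-1-20.htm")["productID"])  # 42
```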

namcoke

7:29 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



I was starting to worry that the paths were too deep. It would be interesting to find out how deep G goes on normal URLs.

I've taken the robots meta tag out of the PHP and put it in as straight HTML because of the quote marks issue; maybe that will help.

If the URLs are too deep, I've got a bit of a redesign on my hands...

Cheers

namcoke

7:49 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Wizard, the penny just dropped on replacing slashes with dashes.

That should make it look like all my pages are on the first level, right? Just with very long page names?

arrowman

7:51 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



That's a rewriting RewriteRule (as opposed to a redirecting one with [R]), which means Google doesn't even notice it. There must be something else going on.

theBear

8:47 pm on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Rewrite rule is fine.

Google doesn't give a hoot about your directory structure; it cares about your link structure.

The home page is the first level, pages linked to from the home page are the second, and so forth.
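This link-level idea can be expressed as a breadth-first search over the site's links: what matters is how many clicks separate a page from the home page, not how many slashes are in its URL. Here is a minimal Python sketch over a made-up link map (all page names are hypothetical).

```python
from collections import deque


def link_depths(links, home):
    """Breadth-first search over a site's link graph.

    `links` maps each page to the pages it links to. The home page gets
    level 1, pages it links to get level 2, and so on. Pages that cannot
    be reached from the home page are left out.
    """
    depth = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


site = {
    "/": ["/static.htm", "/toys/acme/robots/42/1/20/index.htm"],
    "/toys/acme/robots/42/1/20/index.htm": ["/toys/acme/robots/43/1/20/index.htm"],
}
print(link_depths(site, "/"))
# {'/': 1, '/static.htm': 2, '/toys/acme/robots/42/1/20/index.htm': 2,
#  '/toys/acme/robots/43/1/20/index.htm': 3}
```

The first product page is at the same level as the static page, even though its URL is seven segments deep.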

Do you have access to your server logs? See exactly what G is indexing, and check your robots.txt file.

In fact why don't you post your robots.txt file if it isn't a concern for you to do so.
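To see what G is actually requesting, here is a minimal Python sketch that pulls the Googlebot lines out of a raw Apache access log. It assumes the standard "combined" LogFormat; the regex group names are this example's own. Because the User-Agent header can be faked, matches are only claimed Googlebot visits.

```python
import re

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Yield (time, request, status) for lines whose User-Agent mentions Googlebot."""
    for line in lines:
        m = LINE.match(line)
        if m and "Googlebot" in m.group("agent"):
            yield m.group("time"), m.group("request"), int(m.group("status"))


# Usage, assuming a log file path on your server:
# with open("access_log") as f:
#     for hit in googlebot_hits(f):
#         print(hit)
```

Note that the AdSense crawler (Mediapartners-Google) does not match this filter, so its visits stay separate from Googlebot's.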

namcoke

10:51 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Yeah, I've got access to the server logs. I'm recording hits to a MySQL db right now, so I can see every one.

I've not got a robots.txt file yet - do you think it would help?

Thanks to Wizard, I've converted the links to use dashes between variables instead of slashes. No Googlebot yet, but Mediapartners-Google/2.1 is a lot more active, even though I've been displaying AdSense for weeks.

Just Guessing

11:16 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Depth of link structure is very important. Links from the home page are usually spidered quickly. Links from sub-pages can be very slow to be spidered. Try putting a random link on the home page to give googlebot a new link on every visit to the home page. Visitors might like it too.
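The random-link idea fits in a few lines of code. Since the site itself is PHP, this Python version is only an illustration: the function name and the `(url, title)` page list are hypothetical, for example rows pulled from the products table.

```python
import html
import random


def random_home_link(pages, rng=random):
    """Return an <a> tag for one randomly chosen (url, title) pair from `pages`."""
    url, title = rng.choice(pages)
    # Escape both parts so titles such as "Robots & Co" stay valid HTML.
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True), html.escape(title))


# Usage: call once per home-page request, so each crawl sees a different deep link.
# featured = random_home_link([("/toys-acme-robots-42-1-20.htm", "Acme robot"), ...])
```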

incrediBILL

11:28 pm on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I use a rewrite rule with no paths and google gobbles it right up.

theBear

12:11 am on Feb 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Having no robots.txt file tells all the bots that they can go anywhere on your site.

Are you certain there is no robots.txt file on your site? In a browser, go to http://www.example.com/robots.txt, where www.example.com is your site.

I have never seen G not follow links unless told not to. You have only a few ways of telling G's bots to hold off:

1: robots meta tag in pages (remove it; the default is to allow access).
2: robots.txt file (remove it; the default is to allow access).
3: .htaccess banning.
4: server firewall rules.
5: script programming.
6: active conditional HTML.

Other than the fact that we can't schedule the bots' visits, I haven't a clue as to what is going on.

Here is my robots.txt file; the site gets plenty of bot visits every day:

User-agent: *
Disallow:
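A robots.txt like this one can be checked offline with Python's standard `urllib.robotparser`. The sketch below parses the file's text and asks whether a given crawler may fetch a path (the helper name and paths are made up). It only models robots.txt, not the meta tags, .htaccess bans, or firewall rules in the list above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, path):
    """Return True if the robots.txt text lets `user_agent` fetch `path`.

    An empty string stands for "no robots.txt file", which allows everything.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed("User-agent: *\nDisallow:\n", "Googlebot", "/toys/acme/robots/42/1/20/index.htm"))  # True
print(is_allowed("User-agent: *\nDisallow: /\n", "Googlebot", "/index.htm"))  # False
```

The empty `Disallow:` line is what makes the first file allow everything; a single `/` after it blocks the whole site.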