

Blog permanent link structure

Can the SEs handle this?


CernyM

2:39 pm on Jul 25, 2004 (gmt 0)

10+ Year Member



I have a semi-personal, semi-business blog that I regularly update, running as a subdomain on a purely personal website.

I don't have access to mod_rewrite, so the articles were all permalinked off index.php with query parameters:

blogtopic.example.com/index.php?article=XX

I found some blogging software that created links of the form:

blogtopic.example.com/index.php/year/month/day/article-title/

It seemed to me that the second approach was more likely to get those old pages indexed and into the engines than the first. Is that true?
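The two link styles differ only in where the article identifier travels: in the query string, or in extra path segments after the script name. On Apache with PHP, the segments after index.php are typically handed to the script as PATH_INFO, which is why the second form works without mod_rewrite. A minimal Python sketch of pulling the fields back out of each form, assuming the exact URL shapes above; the function names and sample values are hypothetical, and this is not the blog software's actual code:

```python
from urllib.parse import urlsplit, parse_qs

def parse_query_style(url):
    """Return the article id from a URL like /index.php?article=XX (hypothetical helper)."""
    parts = urlsplit(url)
    values = parse_qs(parts.query).get("article")
    return values[0] if values else None

def parse_path_style(url):
    """Split /index.php/year/month/day/article-title/ into its fields (hypothetical helper)."""
    path = urlsplit(url).path
    prefix = "/index.php/"
    if not path.startswith(prefix):
        return None  # not the path-info form
    # Drop empty segments caused by the trailing slash.
    segments = [s for s in path[len(prefix):].split("/") if s]
    if len(segments) != 4:
        return None
    year, month, day, title = segments
    return {"year": year, "month": month, "day": day, "title": title}
```

For a crawler both are just URLs; the path form simply looks like a static page and carries the date and title as readable words.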

Also, the blog has an RSS feed. Do any SEs use it, or do they all crawl the pages?

encyclo

1:38 am on Jul 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The rewritten URL is certainly cleaner and more user-friendly, but the search engines should have no real difficulty with the unmodified version.

As for your RSS feed, if I remember correctly, Googlebot can read XML as plain text, but it isn't really worth letting the bot in; I would just exclude the RSS feed in robots.txt.

CernyM

7:34 am on Jul 26, 2004 (gmt 0)

10+ Year Member



The Googlebot is visiting regularly and going right for the rss.xml feed file.
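One plausible reason a crawler fetches the feed first is that each RSS item carries the permalink of a post, so the feed doubles as a compact list of new URLs. A minimal Python sketch that extracts those links, assuming an RSS 2.0 feed; the sample feed, its titles and dates are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; the real rss.xml will differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://blogtopic.example.com/</link>
    <item>
      <title>Newest post</title>
      <link>http://blogtopic.example.com/index.php/2004/07/26/newest-post/</link>
    </item>
    <item>
      <title>Older post</title>
      <link>http://blogtopic.example.com/index.php/2004/07/25/older-post/</link>
    </item>
  </channel>
</rss>"""

def item_links(feed_xml):
    """Return the <link> of every <item>, i.e. the post permalinks."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("link") for item in root.iter("item")]
```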

Should I exclude this in robots.txt and force it to read the PHP files instead?

encyclo

11:22 am on Jul 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"going right for the rss.xml feed file"

If this is the case, then you should definitely block the RSS file in robots.txt. The change will take a little while to "register" with Googlebot, but once done, it will mean that only the true content files (those for the end user) are indexed.

User-agent: * 
Disallow: /rss.xml
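A rule like this can be checked before deploying it with Python's standard urllib.robotparser. The sketch below assumes the file is served as blogtopic.example.com/robots.txt (robots.txt applies per host, so the subdomain needs its own) and that the feed sits at the root; the post URL is hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the post above.
ROBOTS_TXT = """User-agent: *
Disallow: /rss.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow matches by path prefix, so only /rss.xml (and anything starting with it) is blocked.
feed_allowed = parser.can_fetch("Googlebot", "http://blogtopic.example.com/rss.xml")
post_allowed = parser.can_fetch(
    "Googlebot", "http://blogtopic.example.com/index.php/2004/07/26/newest-post/"
)
```

Here feed_allowed comes out False and post_allowed True, which is the intended split: the feed is excluded while the index.php permalinks stay crawlable.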

CernyM

4:46 am on Aug 30, 2004 (gmt 0)

10+ Year Member



Update...

I decided not to block Google from fetching the RSS file, mostly because I noticed that Google generally hits it a few minutes after I put up a new post. (I ping the usual blog update services, which must be notifying Google somewhere along the line.)
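Blog update services are commonly pinged with the XML-RPC method weblogUpdates.ping, which takes the blog's name and URL. A minimal Python sketch using the standard xmlrpc.client module, assuming that convention; the blog name is made up, and the endpoint passed to send_ping is a placeholder since the post doesn't say which services are used:

```python
import xmlrpc.client

def build_ping(blog_name, blog_url):
    """Serialize a weblogUpdates.ping call to XML without sending it."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")

def send_ping(endpoint, blog_name, blog_url):
    """Send the ping to an update-service endpoint (placeholder URL; needs network access)."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.weblogUpdates.ping(blog_name, blog_url)
```

Whether a given service passes the ping on to Google is not something this call controls; it only announces that the blog has changed.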

It took a couple of weeks, but Google eventually started indexing my archived content as well.