URL abbreviation using Apache's mod_rewrite module

which link does the search engine spider read?

         

jamie

10:03 am on Oct 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi y'all,

I have just read a wonderful article on HTML optimization (WebReference) in which they encourage the use of

"URL abbreviation using Apache's mod_rewrite module"

as Yahoo does, i.e. instead of using the full URL in every a href: [domain.com...]

it is mapped to /st/page.htm

We have lots of URLs on most pages and this could certainly save a few KB, not to mention making it easier for me as webmaster to maintain everything.

BUT... what does the search engine spider read? Does it read the full URL from the txt file, or will it only read the abbreviated URL /st/page.htm?

Obviously we have named our directories logically after keywords, and would like the benefit of these keywords being spidered by the engines.

many thanks!

andreasfriedrich

2:03 pm on Oct 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Spiders crawl the web by following links. If a spider sees a URI like [domain.tld...] it will request that resource and associate the body of the response with this Uniform Resource Identifier.

It's just like somebody calling out the names of people in a crowd. Once a person answers, the caller associates the name with that person. If a given person goes by two names and answers twice, it depends on the intelligence of the caller whether the two answers are recognized as coming from the same person.

If spiders are smart enough to recognize alias URIs, then they certainly won't react kindly to them but will suspect spam instead.
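To make the alias problem concrete, here is a minimal sketch of how a crawler might notice that two URIs answer with the same body. It is not how any particular search engine works; the function name `find_alias_uris`, the sample paths, and the use of an exact SHA-256 match are all assumptions made for illustration.

```python
import hashlib


def find_alias_uris(fetched):
    """Group URIs whose response bodies are byte-identical.

    `fetched` maps each crawled URI to the body returned for it.
    """
    by_digest = {}
    for uri, body in fetched.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_digest.setdefault(digest, []).append(uri)
    # Only groups with more than one URI are aliases of each other.
    return [sorted(uris) for uris in by_digest.values() if len(uris) > 1]


# Hypothetical crawl result: the long and the short URI return the same page.
fetched = {
    "/keyword-directory/page.htm": "<html>same page</html>",
    "/st/page.htm": "<html>same page</html>",
    "/other.htm": "<html>different page</html>",
}
aliases = find_alias_uris(fetched)
print(aliases)
```

Each URI is indexed under its own name first; only a later comparison of the bodies reveals that two names belong to the same "person".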

"what does the search engine spider read? does it read the full URL from the txt file, or will it only read the abbreviated URL /st/page.htm"

The spider will follow links or will try to identify URIs that are not in links. If some URI points to the txt file and a spider finds that URI, then it will index that file and will possibly try to index any resources that are referenced in it. If you use the short URIs in your HTML pages, then the spider will certainly follow those URIs as well.
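For reference, the mapping itself happens on the server, so a spider that only sees /st/page.htm in your pages never learns the long form from the txt file. A rough Python sketch of that lookup is below. It only imitates the idea of a key/value map file; it is not mod_rewrite itself, and the helper names `parse_map` and `expand` as well as the sample map entry are assumptions.

```python
def parse_map(text):
    """Parse 'key value' lines, skipping blank lines and # comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        mapping[key] = value.strip()
    return mapping


def expand(path, mapping):
    """Expand '/<key>/<rest>' to the mapped prefix plus '<rest>'.

    Paths whose first segment is not in the map are returned unchanged.
    """
    parts = path.lstrip("/").split("/", 1)
    key = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    if key not in mapping:
        return path
    return mapping[key].rstrip("/") + "/" + rest


# Hypothetical map file content: abbreviation, then the full directory.
MAP_TEXT = """
# abbreviation   full path
st   /keyword-directory/sub-directory/
"""
mapping = parse_map(MAP_TEXT)
print(expand("/st/page.htm", mapping))
```

The spider requests whichever form appears in the link; the expansion to the keyword-rich path is invisible to it unless the server redirects or the long URI is linked somewhere as well.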

Andreas

jamie

7:01 am on Oct 23, 2002 (gmt 0)




So that's basically a no-no then, so that the search engines don't start spidering my site twice.

thanks andreas

andreasfriedrich

11:20 am on Oct 23, 2002 (gmt 0)




Yes, I would use only one URI per file just as most people only go by one name.

If you decide to go for the original, longer URIs on your website, implementing abbreviated URIs is worthwhile nevertheless. I use those short URIs in emails. The shorter those URIs are, the better the chance that they will be displayed on one line and rendered as correct links by the email program.

Andreas