Forum Moderators: open
I have to get our 150,000+ (multilingual) product catalog indexed in Google (currently only about 100 pages are indexed). The site is built around a database, and while surfing around, a session ID and more ends up in the URL. I have read before that you have to get rid of that sort of URLs.
Second is the navigation through categories and products. It goes sort of like this: <a href="javascript:showCategory('-here goes_the_number_of_the_category')">
The site is in English. The different countries have no capacity to get all products translated into their own language; only some non-product-related pages and page elements could be translated.
My plan is to get the database modified so that each country can translate the most important information (product name and HTML title tag). Next to this, the URLs have to be rewritten into static-looking URLs for each language, so that it's fairly easy to index all URLs and each URL says something about its content (../red-widgets.html is more likely to be clicked than ../categoryID=12345 when someone is searching for red widgets).
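As a sketch of that rewriting step (the script name, parameter names, and slug pattern here are assumptions, not your actual setup), an Apache mod_rewrite rule could map the static-looking URL back onto the dynamic catalog script:

```apache
# Hypothetical example: /uk/red-widgets.html -> catalog.php?lang=uk&slug=red-widgets
# "catalog.php" and the slug-to-product lookup are placeholders.
RewriteEngine On
RewriteRule ^([a-z]{2})/([a-z0-9-]+)\.html$ /catalog.php?lang=$1&slug=$2 [L,QSA]
```

The spider then only ever sees the clean URLs, while the script resolves the slug to a product ID internally.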
Last thing is the navigation. Either I'll get it out of the 'javascript method' or set up some proper, dynamic sitemaps that point the spider into the product database down to the deepest level.
Besides feedback on this approach, some questions remain for me:
1. what harm do the session IDs do?
2. how about duplicate content:
.../de/blue-widgets.html
.../uk/blue-widgets.html
where many of 'the same' elements will appear on the page?
This will be my first major product catalogue, so any help would be welcome to give the 'techies' in our company some evidence to get things changed...
Many thanks in advance!
1. If the content of
.../de/blue-widgets.html
.../uk/blue-widgets.html
are the same, then ban Google from all directories but one (e.g. uk) using robots.txt.
2. Google doesn't like session IDs, so try to pass these in a cookie instead of the URL.
3. To guarantee Google will be able to index the site, I would rewrite it to use no URL parameters, e.g.
/area/category/productid
You *may* get away with having simple parameters e.g.
/viewproduct.php?id=4532
4. Get rid of the javascript if possible, robots generally cannot understand it and users may also have it disabled.
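To illustrate point 1, a robots.txt that blocks the duplicate language directories while leaving one crawlable could look like this (the directory names are assumptions based on the URLs above):

```
# Block duplicate language copies; /uk/ is left out so it stays crawlable.
User-agent: *
Disallow: /de/
Disallow: /fr/
# ...one Disallow line per remaining duplicate language directory
```

Note that robots.txt matches by path prefix, so each `Disallow: /xx/` covers everything under that directory.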
I've got a directory with around 70-90 thousand pages, where subject pages may appear in many topics with different paths but the same page, and Google has never penalised it for that yet.
Also, think carefully about your URLs up front; it'll make the design and layout of the site much easier. The URL structure will almost be your site structure map.
Good luck with the conversion.
SN
first of all, build an _accessible_ site.
i think a user should be able to use a site with a browser that can't handle anything but plain html.
no javascript, no cookies, no css, no java, no plugins.
if you can do this, then you don't have to worry much about search-bots.
there's quite a few resources on w3.org dealing with web content accessibility.
a site navigation based on javascript simply destroys the idea behind hypertext.
such a thing is bad for the _user_ in the first place - and it's also bad for search-bots.
try to get rid of it.
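one way to get rid of it without losing the dynamic behaviour (a sketch; the URL, category number, and showCategory function are just placeholders from the post above) is to put a real href in the link and keep the script as an optional layer on top:

```html
<!-- crawlers and script-less browsers follow the href; -->
<!-- browsers with javascript run showCategory() instead -->
<a href="/uk/red-widgets.html"
   onclick="showCategory(12345); return false;">Red widgets</a>
```

that way the same markup works for bots, plain-html browsers, and javascript users alike.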
sessions:
make sure that an http client can go through all relevant pages without triggering a session.
sessions could indicate temporary and/or personalized content.
moreover, googlebot would receive a new session-id (i.e. a new url) every time it requests a document.
regards
martin dunst
// get agent ($HTTP_SERVER_VARS is the old style; use $_SERVER on PHP 4.1+)
$agent = $HTTP_SERVER_VARS['HTTP_USER_AGENT'];
// check for some known spider user agents
if (
    stristr($agent, "Googlebot") ||
    stristr($agent, "inktomi") ||
    stristr($agent, "scooter") ||
    stristr($agent, "webcrawler")
)
{
    // known spider: start no session, so no session ID
    // ends up in the URL -- add any special code here,
    // or do nothing
}
else
{
    // normal visitor: do session stuff
    // 15 minutes lifetime
    ini_set("session.gc_maxlifetime", 900);
    // 10 percent probability
    // for session garbage collection
    ini_set("session.gc_probability", 10);
    session_start();
}
----8<-----8<----
I put this in my header file, which is always loaded with any of my dynamic pages.
Of course: no warranty ;-)
Jever
Killroy, it makes perfect sense to me to serve a page in the language of the user, but I have to agree with Darkness where he points out his concern. The pages wouldn't be exactly the same: some database fields will be all English, while some elements will be translated by the countries. Together with the rewritten URLs, these will be separate pages, I think...