Forum Moderators: Robert Charlton & goodroi
Looking in Google Webmaster Tools, though, I see that 8,667,936 pages are indexed with the h parameter.
If I mark it to be ignored, though, am I going to lose 8 million indexed pages?
What I DON'T know, though, is whether example.com/board/1234/ is indexed at all, or if it's only indexed using the h parameter.
As it seems you are internally linking to URLs with the ?h parameter, it is possible that the URLs without the h parameter are not indexed at all. You can try to check this using a combination of the site: and inurl: operators
site:example.com inurl:/board/1234/
and see what versions of URLs are returned.
What currently happens if your date changes? I presume internal links are replaced with a new value of the ?h parameter, but what happens to the old one? Are you redirecting it to the new ?h, or does it return a 404, or something else? If you are not redirecting it, then over time you will accumulate a ton of duplicate content.
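One way to avoid that duplicate-content buildup is to 301 any stale ?h value to the current one. Here is a rough Python sketch of the idea; the function and parameter names are illustrative (the real forum software will differ), and it assumes the server knows which h value it is currently linking internally:

```python
from urllib.parse import parse_qs, urlencode

def check_h_param(path, query, current_h):
    """Return (status, location) for a request, 301-ing stale ?h values.

    path      -- e.g. "/board/1234/"
    query     -- raw query string, e.g. "h=oldhash"
    current_h -- the h value currently used in internal links
    """
    params = parse_qs(query)
    h = params.get("h", [None])[0]
    if h is not None and h != current_h:
        # A stale cache-buster URL: redirect to the current version
        # so only one URL per page can accumulate in the index.
        params["h"] = [current_h]
        return 301, path + "?" + urlencode(params, doseq=True)
    return 200, None
```

With this in place, an old link like /board/1234/?h=oldhash would permanently redirect to /board/1234/?h=currenthash instead of serving a second copy of the same content.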
However, this is a very unusual and not recommended way of forcing cached pages to be served. Are your pages so big that you need to employ this kind of mechanism for page caching? Or do you have a problem with the amount of traffic to your server? Ideally you would stop using this mechanism of interlinking.
If your non-parameter pages aren't being indexed now
In your case, the first option is applicable, and when selected you have no further choices (such as the Every URL, No URLs etc. options that exist for the "Changes content" option).
GoNC, when you talk about caching, are you simply referring to the browser's own cache? "I saw this URL yesterday, no need to reload it". Or do you serve up static versions of each URL from copies on your server?
Sgt Kickaxe, it sounds like he doesn't have any non-h-parameter pages.
[edited by: aakk9999 at 12:34 am (utc) on Feb 4, 2014]
How's your search traffic been since you implemented this caching solution?
2) add a canonical link element to this page:
<link rel="canonical" href="http://www.example.com/board/1234/">
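Generating that canonical href just means dropping the cache-buster parameter from the requested URL. A small Python sketch (the parameter name "h" is taken from this thread; everything else is illustrative):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonical_href(url, drop=("h",)):
    """Strip the cache-buster parameter(s) to produce the canonical URL,
    while keeping any other query parameters (e.g. pagination) intact."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

So /board/1234/?h=abc123 canonicalizes to /board/1234/, and a URL with other parameters such as ?page=2&h=abc123 keeps its ?page=2.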
When search engines meet URLs of the form /dir1/dir2/pagename.html, they will eventually ask for /dir1/ and /dir1/dir2/ even if those pages don't actually exist and nobody links to them. Do they do the same with parameters? That is, if they habitually see something with h=some-value, do they eventually ask for the same URL without the h= parameter? Seems like sooner or later they would.
Any other way to accomplish the caching?
Otherwise, you're still sending visitors to a huge number of pages with incorrect URLs.
Any other way to accomplish the caching?
User clicks on a link leading to post # suchandsuch. Browser says "Oh, I've been there before, I'll just serve up my cached copy." But what if new posts have come in to the same page of the thread? Wouldn't the user then miss out on those new posts because the browser doesn't know they exist and therefore doesn't put in a new request?
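That staleness worry is exactly what HTTP conditional requests solve without any ?h cache-buster: the server sends a validator (an ETag), the browser revalidates with If-None-Match on each visit, and gets a cheap 304 when nothing changed or a full 200 when new posts arrived. A rough sketch of the server side, assuming the ETag is a hash of the rendered page (a real framework would set these headers for you):

```python
import hashlib

def respond(page_body, if_none_match=None):
    """Conditional GET sketch: 304 when the client's cached copy is
    still current, otherwise 200 with a fresh ETag.

    page_body     -- the rendered thread page as a string
    if_none_match -- the ETag the browser sends back, if any
    """
    etag = '"%s"' % hashlib.md5(page_body.encode()).hexdigest()
    if if_none_match == etag:
        return 304, etag, b""  # browser's cached copy is still good
    # New posts change the body, which changes the ETag,
    # so the browser is told to fetch the fresh page.
    return 200, etag, page_body.encode()
```

The request still hits the server, but a 304 response carries no body, so you keep the bandwidth savings while visitors never see a stale thread.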
Have you thought of removing the ?h= parameter on one small section of the forum and monitoring the traffic/page loads/time on page/number of visitors Google sends?
Someone, somewhere, probably has data on the two aspects of server load: the mere fact of a request, vs. the size of the material sent out.