Forum Moderators: Robert Charlton & goodroi
Has anyone experienced problems getting osCommerce sites indexed in Google? I'm only assuming it's an osCommerce issue as there's no other logical explanation.
We have:
> rewritten the URLs to .html format (3 months ago)
> implemented dynamic meta tags (3 months ago)
> implemented an XML Google Sitemap (2 weeks ago)
However, Google is still only indexing the old dynamic URLs and the old generic meta (title & description).
I don't think it's a time issue as other google sitemaps that we submitted at the same time have had the desired effect.
Can this be an osCommerce issue, or has Google crossed the site off its Christmas card list? (The Google PR hasn't changed and no unethical linking strategies are being used.)
Thanks in advance!
Lee
I checked a hundred times and couldn't detect or find any problems with the osC pages, and I have no clue what kept them from being indexed. But that was about the time Google was working on Big Daddy, so perhaps that was the reason.
The software is very clever, and very flexible, but the interface presented to search engine spiders is poorly thought out (ummmm, NO thought has gone into it at all) and very badly implemented.
However, Google is still only indexing the old dynamic URLs and the old generic meta (title & description).
Is it possible that the dynamic URIs are still browsable? If so, that may be one of the major issues. When you implement a rewrite, you should not be able to browse to those dynamic URIs anymore; they should 301 to the new rewritten URI.
I'm going to assume that you've updated all internal links to reflect the rewritten URIs? And, that you've addressed any other issues that may cause a spider to get into a dynamic URI instead of the rewritten one?
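To make that concrete, here's a minimal mod_rewrite sketch of the kind of thing being described. This is an illustration, not the poster's actual config: the rewritten URL pattern (/product-123.html) and the osCommerce script names are assumptions to adapt to whatever the SEO mod actually generates.

```apache
# Hypothetical .htaccess sketch for an osCommerce catalog directory.
# Adjust the /product-123.html pattern to match your mod's URL scheme.
RewriteEngine On

# 301 only genuine external requests for the old dynamic URL.
# Matching against THE_REQUEST (the raw request line) avoids looping
# on the internal rewrite below, which doesn't change THE_REQUEST.
RewriteCond %{THE_REQUEST} /product_info\.php\?products_id=(\d+)
RewriteRule ^product_info\.php$ /product-%1.html? [R=301,L]

# Internally map the clean URL back to the real script.
RewriteRule ^product-(\d+)\.html$ product_info.php?products_id=$1 [L]
```

The trailing `?` on the redirect target drops the old query string, so spiders see one clean URL per product instead of two.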
Which SEO mod are you using?
I think the one I used was called Easy SEO URLs v2 or something (it comes prepackaged with osCMax). It requires you to reset the cache after every product add or edit; if you don't, every product added since the last reset will use the regular URL structure, not the rewritten one. So if you aren't careful, you end up with half a site rewritten and half not.
If I remember right, the popular Google sitemap and Froogle mods for the standard build also didn't use the rewritten URL structure, since they pulled straight from the database.
I never used any of the meta mods; I just edited the template so the product name echoed out as the title, and then used the short-description contrib to populate the meta description.
Also, check and make sure you have sessions turned off for spiders in your admin configuration.
Of course, you probably don't want to hear this, but since it was the first osCommerce project I did, there were a lot of mistakes, mainly the sitemap and Froogle feed using the dynamic URLs while the site itself used the rewritten URLs. But that site ranks really well, with some product pages actually outranking the manufacturer's pages, and it has held steady for a year. Then again, the domain was 4 years old and was a hobby site turned into a store, so it was well linked and I got really, really lucky lol...
but there are some of my mistakes; hopefully you can learn from them.
Let me paint a scenario and then you tell me if I'm "barking up the wrong tree". From this point forward, think footprints.
So, a particular piece of software becomes very popular amongst certain groups. Because of its popularity, it gains more exposure, not only from a marketing perspective but from an indexing one too.
The number of users increases steadily over time. The number of sites built using that software increases steadily over time.
During that time period, Googlebot is indexing anything and everything it can get its hands on, and this includes both static and dynamic content.
Due to inherent problems with the software, certain things begin to happen. In this case, we have rewritten URIs and dynamic URIs being indexed simultaneously from a variety of queries. Duplicate content. Footprints.
If you were a Search Quality Engineer (SQE), would you want your spider to get trapped within an environment where you would be indexing the same content under 10-15 different URIs?
So, are we possibly seeing a cleansing of the index based on certain technical issues? It's easier to filter based on the footprints and of course you're going to get caught in the net, there's no way around it. Or is there? ;)
P.S. I've watched Google (and other search engines) wipe out entire networks over the years based on footprints. They are easy to detect and it's unfortunate but there is going to be unintentional fallout.
Let me paint a scenario and then you tell me if I'm "barking up the wrong tree"
Great point... and to be honest, on the whole most osCommerce shops aren't essential to the index:
two clicks in Fantastico and you have your own online "business". And like you said, why risk having a spider
get trapped, especially gathering "useless" information.
Any estore is easy to spot, not just osCommerce, lol. And for the conspiracy-minded: wouldn't Google be better off
if those sites out to make money paid for their traffic via AdWords?
I'll go dig up the store logs and give them a quick look to see if anything has changed drastically over the past few months,
but considering I haven't gotten a panicked email about traffic drop-offs from the store owner, everything is probably about the same.
So it may be that domain and link age are the way around it; lemme double check.
I don't see the issue as being one of footprints per se, but rather of Failure. That's failure to fix the errors of these applications to generate decent, search engine friendly, long term stable urls.
The only application I've seen that does that pretty much straight out of the box is wordpress (maybe others are finally now catching on), the rest require a lot of tweaking and hacks to the supposed 'seo' mods that supposedly 'fix' these issues. And some are just a disaster when they do try to implement these fixes out of the box, like the vbulletin stuff for example.
But that's not the only failure point, the real one is from people who don't spend the time or money to fix these errors, then fill up the google etc forums for the next few years complaining about their site not being indexed, whatever. This is to me probably the single most annoying thing I come across year in and year out here, especially the abject refusal to even admit that this was and probably still is the case.
I haven't seen any real indication of the footprint phenomenon, but I have seen plenty of indication that if you fix your URLs ASAP, use mod_rewrite well to redirect the fixes, and so on, the benefits are unmistakable.
Re the topic thread, do yourself a huge favor and dump osCommerce. It has some nice offshoots (Zen Cart is well regarded; I don't know the other one that was mentioned), but I do know that I have NEVER seen a worse-executed open source application than osCommerce. Rumour has it that the key developers have sort of gotten to like the fact that they get paid to fix their own work, or something to that effect.
The dynamic URLs are using a 301 redirect to the new HTML pages, and spider sessions are set to off.
The server headers look as follows; I can't see anything wrong with them myself:
#1 Server Response: <snip>
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Date: Sun, 21 May 2006 14:56:15 GMT
Server: Apache 3
X-Powered-By: PHP/4.3.11
Set-Cookie: cookie_test=please_accept_for_session; expires=Tue, 20-Jun-06 14:56:15 GMT; path=/catalog/; domain=<snip>
Set-Cookie: osCsid=c1d4826240264f1962394cea0600b282; path=/catalog/; domain=<snip>
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: <snip>
Connection: close
Content-Type: text/html
Redirect Target: <snip>
#2 Server Response: <snip>
HTTP Status Code: HTTP/1.1 200 OK
Date: Sun, 21 May 2006 14:56:15 GMT
Server: Apache 3
X-Powered-By: PHP/4.3.11
Set-Cookie: cookie_test=please_accept_for_session; expires=Tue, 20-Jun-06 14:56:15 GMT; path=/catalog/; <snip>
Set-Cookie: osCsid=0395ad236823f335aab1e47045147011; path=/catalog/; <snip>
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Content-Type: text/html
[edited by: lawman at 3:31 pm (utc) on May 21, 2006]
[edit reason] No Urls Please [/edit]
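Actually, there is something visible in that dump: the 301 is setting an osCsid session cookie, which is exactly what "spider sessions off" is supposed to prevent for a cookieless client. As an illustrative sketch (not anyone's actual tooling), here's a small Python function that takes a status code and header list like the dumps above and flags that sort of thing:

```python
# Sketch: audit a (status, headers) pair like the server-header dumps above.
# Flags things that would confuse a cookieless spider: session IDs being
# set, an unexpected status, or a Location header on a 200 response.

def audit_response(status, headers):
    """headers is a list of (name, value) tuples; returns a list of warnings."""
    warnings = []
    if status not in (200, 301):
        warnings.append("unexpected status %d" % status)
    for name, value in headers:
        lname = name.lower()
        if lname == "set-cookie" and "osCsid=" in value:
            # Spiders don't accept cookies; an osCsid here means the
            # "sessions off for spiders" setting isn't actually working.
            warnings.append("osCsid session cookie set: " + value.split(";")[0])
        if lname == "location" and status == 200:
            warnings.append("Location header on a 200 response")
    return warnings

# Example, modeled loosely on response #1 above:
print(audit_response(301, [
    ("Set-Cookie", "osCsid=c1d4826240264f1962394cea0600b282; path=/catalog/"),
    ("Location", "/new-page.html"),
]))
# prints ['osCsid session cookie set: osCsid=c1d4826240264f1962394cea0600b282']
```

Feed it the headers from your own fetch (with redirects disabled and no cookie jar) and the redirect chain problems show up immediately.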
For one, you have blank pages returning a 200 status. You have popup pages being indexed. You have some fatal errors in the html. The list could get quite lengthy upon a full blown discovery.
If you did all this three months ago, you may be dealing with Google's time factor in getting things sorted out after the rewrite. You've got enough going on there to confuse the heck out of a spider. It's going to take some time for a spider to sort out what you've done.
My advice? Hire someone who knows exactly what they are doing. Have them do a complete discovery of before and after. There will need to be some cleanup from the previous incarnation and also from the current one. This is not a push-button fix. You first need to undo what is being done and then redo it the correct way.
Google is going to continue to index the old URIs for a few months. At first, they will go supplemental. Then they will slowly start to disappear. During this process, the new URIs are getting indexed and may go supplemental too. But, once they get fully indexed, the supplemental tags should go away. This will all take time. I'd plan on at least 6-9 months to sort everything out that you have going on.
Personally, I would have consulted with someone who specializes in this area before making the change. You may have added insult to injury and it is really difficult to tell without a full discovery.
Good luck.
Getting Google to deindex stuff they had already found at the "wrong" URL was the most difficult of all the tasks.
Many of the problems come from basic design errors within the package itself. It needs a lot of work to make it SE friendly.
Personally, I would have consulted with someone who specializes in this area before making the change. You may have added insult to injury and it is really difficult to tell without a full discovery.
This is exactly what I saw: implementing a complete, permanent SEO URL solution on a prepackaged product, especially one as bad as osCommerce, is not trivial, and the odds of someone who isn't fairly well versed in mod_rewrite etc. doing it successfully are very low. But, sadly, so are the odds of people actually doing what's necessary and paying someone to help them.
For one, you have blank pages returning a 200 status. You have popup pages being indexed. You have some fatal errors in the html.
In fact, the mod_rewrite one is having a lot of supplemental issues with Google, and it's taking forever to get new pages indexed.
I agree there are issues with the oscommerce stores. However, prior to Big Daddy, my osc stores ranked very, very well.
With so many sites reporting google issues (not just osc stores), I am not sure jumping in and changing everything is such a good idea.
Hopefully google will fix their latest buggy update.
The only people I see in denial about this are the ones who think that it's the same google with some light window dressing type changes. And whose sites not only do not have any trust, they will never have it. And who will never fix their site's technical errors, let alone admit that they exist, or should be fixed.
I did get a dedicated OS commerce company to do this for me.. someone I actually found on this forum!
LOL, that's really funny. I did one OSCommerce install, and it will be the last one I ever do, a repair actually. Anyone who can work with that product and not be totally repulsed... and then who decides it's a good idea to form a 'dedicated oscommerce' company, is probably one of the developers I've heard about who have come to realize they have the name recognition that will make people use their product, and the product itself is so bad that you have to pay a dedicated company to implement it. And I'll bet at least one employee of that company is an oscommerce developer.
zencart etc forked because oscommerce was so bad. To me, anybody who gets paid to setup shopping carts who does not exert every possible effort to guide you away from oscommerce is highly suspect. And is definitely not a good programmer, since no good programmer I know of would want to touch that garbage. You couldn't pay me any amount to ever work on any oscommerce install again, I consider implementing that product, as a developer, a highly unethical action, since I know how bad it is, I could and would never allow any client to use it.
And as you can see by pageoneresult's quick overview of the site in question, in fact that company did exactly as bad a job as I would expect, since nobody good would use or specialize in that product.
And just for the record, I use almost all open source / free software products, for development, desktop, and hosting.
The thing that really bit was the lack of standardised methods across the package. Different bits did the same task in completely different ways.
The package has several huge stylesheets, yet parts of it are stuffed full of font tags.
Almost no thought.... No. No. Absolutely no thought had gone into how the damn thing interfaces with search engines.
There is no attempt to keep "useless" pages or duplicate content out of the index.
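One partial mitigation people use is robots.txt exclusions for the cart's utility pages. A sketch along these lines; the script names are the stock osCommerce ones, but treat the exact list and the /catalog/ path as assumptions to verify against your own install:

```
User-agent: *
Disallow: /catalog/popup_image.php
Disallow: /catalog/shopping_cart.php
Disallow: /catalog/checkout_
Disallow: /catalog/login.php
Disallow: /catalog/account
Disallow: /catalog/product_reviews_write.php
```

It doesn't fix the duplicate-URL problem itself, but it keeps the cart, checkout, and popup pages out of the index.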
The same things are true of many such programs and scripted packages: CMSes, carts, forums, etc.; even the likes of vBulletin and phpBB fare little better.
I recently spent several hours in on/off PM exchanges about a site showing obvious duplicate content problems (from "same meta description on every page" design).
Even after showing explicit search examples that screamed "big problem here", I was brushed off with "that can't be the problem".
Having seen and fixed the same problem on several dozen websites alone this year, I can assure you that it is the problem.....