Forum Moderators: Robert Charlton & goodroi
I've added a permanent 301 redirect, switched to absolute links, and removed all duplicate content. As I was reviewing the site, I discovered a few things that seem strange. FYI:
1. Back in January, Google indexed some pages that do not exist on my site's server, e.g. <snip>
There are a handful of these dynamic URLs still showing up in the index. Any idea where they came from? I have two valid PHP pages on my site that ask the user to reserve a table or send in a comment. They contain an image-verifier script via a formmail.php file that I downloaded from <snip>. My fear is that these URLs are being seen as duplicate content.
2. I have noticed that there are countless restaurant/wine related Web site directories that have hijacked a lot of my site's content/wine lists/menus and posted it on their own sites (which, in turn, link back to our site). Could the duplicate content on these other Web sites be a contributing factor in my site's severe loss of page ranking?
I recently found this thread, <snip>, about finding and dealing with duplicate sites.
Thanks for any assistance.
[edited by: trillianjedi at 3:10 pm (utc) on July 11, 2005]
[edited by: ciml at 3:34 pm (utc) on July 11, 2005]
[edit reason] No URL drops please as per TOS #13 [/edit]
<snip>
Any idea why these bogus (but live) dynamic URLs are appearing in the Google index?
[edited by: trillianjedi at 3:11 pm (utc) on July 11, 2005]
[edit reason] See above. Thread also being moved to google forum. [/edit]
Google: There's almost nothing a competitor can do to harm your ranking or have your site removed from our index.
I wonder if I can think of one...
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]
Not saying this is what happened, but it sure seems possible.
I wonder how many times this page will get indexed?
I wonder if they all 'count' as the same page, or duplicate content?
Justin
Edit: Attribution
Google currently has many thousands of these against its own directory. They all end with ?il=1.
Now what does one make of that?
I gave up trying to find a site using those URLs, so I assume that someone or something submitted them and went bye-bye.
Some are fully indexed and now supplemental, some are url only, cache dates go back to last year and some have current cache dates.
BTW Justin, thanks for the sticky, I will give it a try.
I have two sites that are now immune to that problem, one left to go.
PS: In ref to the links, is that before or after the redirect hits Google?
I'm sure MikeNoLastName would say they will get indexed.
PS: In ref to the links, is that before or after the redirect hits Google?
If you mean the code I sent you, the code will serve a 301 to the non-? version of the page, and then a 200 if the page exists or a 404 if not, which might be confusing to the little bots, but I need to be able to pass query strings and that approach lets you.
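A minimal sketch of that kind of rule, assuming Apache mod_rewrite in an .htaccess file (the pattern and target here are hypothetical illustrations, not the actual code from the thread):

```apache
# Hypothetical sketch: if a request carries any query string,
# 301 it to the same path with the query string stripped.
# The trailing "?" in the substitution discards the old query string.
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ /$1? [R=301,L]
```

After the 301, Apache serves the clean URL normally, so the follow-up request gets a 200 if the file exists and a 404 if it doesn't, matching the behavior described above.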
If you are asking about other redirects/links I am not sure what you mean... the three links above serve a 200 on the first request, just like they should =).
The tough part with the whole situation is if the content changes based on the ?blah=stuff you want the pages indexed, but if it does not, then you don't. So, in some cases (where a script uses the parameters to serve the right information) it is right for SE's to index ?page=1, ?page=2, ?page=3 as different pages, but in other cases (where the content stays the same) they should not be indexed.
Unfortunately, there is not really an easy way for an SE to determine which ones should be indexed and which should be dropped, so that leaves it to us to protect our sites the best we can.
I believe the best way around this is to serve all pages as html and rewrite to any necessary script(s). Then you can catch any, uh, *bad* requests on the way in and decide what to do with them.
Personally, I use php and do not serve a file that needs parameters (or one that doesn't) as php unless I have to. I use mod_rewrite to pass the variables and serve all my pages as html. I initially started doing this to protect my scripts, but in hindsight, some of the other added benefits far outweigh the 'script hiding' aspect.
Justin
Added: I use mod_rewrite to pass the variables and serve all my pages as html. This is not the same as parsing html as php.
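The rewrite approach described above can be sketched like this, again assuming Apache mod_rewrite; the file and parameter names are hypothetical examples, not the poster's actual rules:

```apache
# Hypothetical sketch: expose a clean .html URL to visitors and
# search engines while the real PHP script receives the parameter.
RewriteEngine On
RewriteRule ^menu/([a-z0-9-]+)\.html$ /display.php?page=$1 [L]
```

With a rule like this, a request for /menu/wine-list.html is handled internally by display.php?page=wine-list, so the script name and its query string never appear in the URL, and each parameter value still gets its own indexable address.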