Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
GWT Reporting Duplicate titles & metas? Where are they coming from?
Frost_Angel




msg:4497100
 9:48 pm on Sep 18, 2012 (gmt 0)

I went into my GWT account today and normally I might have 2-3 HTML optimization suggestions from Google in there.

Today I have over 1,000?

My site is in WordPress, so I'm not sure what I might be doing wrong, because it's been in WordPress for 1.5 years. Why are there errors now?

I'll use "example-page" instead of my actual page name.

This is what I am seeing:

/example-page/1345455629000/
/example-page/1345547746000/
/example-page/1345557128000/
/example-page/1345560196000/
/example-page/1345560437000/
/example-page/1345677539000/
/example-page/1345696266000/
/example-page/1345803012000/
/example-page/1345803037000/
/example-page/1345803063000/
/example-page/1345885749000/
/example-page/

Every one of the URLs Google reports leads back to the original/actual page. But where are they coming from?

 

phranque




msg:4497151
 12:56 am on Sep 19, 2012 (gmt 0)

you might be serving the same content at those urls and at one or more other urls.

Frost_Angel




msg:4497153
 1:06 am on Sep 19, 2012 (gmt 0)

I know the same content is being served. That's the issue.
How are the pages with the numbers on the end being generated? Somehow they are being generated and Google is calling them duplicate content when there is really only the original post?

phranque




msg:4497160
 1:10 am on Sep 19, 2012 (gmt 0)

what does the duplicate url path look like?
normally the non-canonical url gets internally rewritten to index.php and the script issues a 301 external redirect in response.

MinosTheNinth




msg:4497216
 7:16 am on Sep 19, 2012 (gmt 0)

What do you see in GWT when you navigate to Configuration -> URL Parameters?

lucy24




msg:4497251
 9:00 am on Sep 19, 2012 (gmt 0)

Unfortunately these aren't parameters. They're part of the URL.
/example-page/1345455629000/
/example-page/1345547746000/
/example-page/1345557128000/

For comparison purposes, the present page is

http://www.example.com/google/4497098.htm

As it were. And in fact if you search these very forums for anything specialized enough to yield only a page or two of results, you'll find half a dozen variant names leading to the identical thread. But I don't think the People Up Top are worried ;)

gehrlekrona




msg:4497342
 1:27 pm on Sep 19, 2012 (gmt 0)

I also see the same thing, plus a lot of weird paths that never existed and a series of page-not-found errors since the last crawl spike. Maybe something went wrong?

g1smd




msg:4497345
 1:36 pm on Sep 19, 2012 (gmt 0)

Those numbers look like datestamps. Is this related to sessions in some way?
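One quick way to test the datestamp idea is to decode a few of the trailing numbers. This is a minimal sketch, assuming they are Unix timestamps in milliseconds (the thread does not confirm that); `decode_ms_timestamp` is a made-up helper name.

```python
from datetime import datetime, timezone

# Trailing path segments copied from the opening post.
SUFFIXES = ["1345455629000", "1345547746000", "1345885749000"]


def decode_ms_timestamp(value: str) -> datetime:
    """Interpret a numeric string as milliseconds since the Unix epoch.

    The millisecond interpretation is an assumption, not something
    confirmed anywhere in the thread.
    """
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)


for suffix in SUFFIXES:
    print(suffix, "->", decode_ms_timestamp(suffix).isoformat())
```

Under that assumption the sample values all decode to dates in late August 2012, a few weeks before the opening post, which would fit the datestamp reading.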

Frost_Angel




msg:4497353
 1:43 pm on Sep 19, 2012 (gmt 0)

@MinosTheNinth

This is what is in the box at Configuration -> URL Parameters:

c month day week

indyank




msg:4497368
 3:21 pm on Sep 19, 2012 (gmt 0)

Aren't you using canonical URLs? Unfortunately, in WordPress you can append any number to the end of a URL, like in your example, and it will return the same content as /example-page/.

The best way to handle this is by using canonical URLs on your posts.

There are millions of WordPress blogs that suffer from this. Googlebot is probably discovering these URLs through its buggy JavaScript crawler. Do you see any referrers for these links in your logs? If not, it is very likely the result of the JavaScript crawler.
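To check the logs for referrers, something like the following could work. It is only a sketch: it assumes an Apache "combined" style access log and that the padded URLs always end in a 13-digit segment, as in the opening post. The sample log lines, IP addresses, and the function name are invented for illustration.

```python
import re
from collections import Counter

# Request paths ending in a 13-digit segment, e.g. /example-page/1345455629000/
# (the 13-digit rule is an assumption based on the URLs in the opening post).
PADDED_PATH = re.compile(r"^/.+/\d{13}/$")

# Assumed "combined" log layout: "METHOD path PROTO" status bytes "referrer" ...
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def referrers_for_padded_urls(lines):
    """Count the referrers recorded for requests to padded URLs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and PADDED_PATH.match(match.group("path")):
            counts[match.group("referrer")] += 1
    return counts


# Invented sample lines; in practice you would iterate over your real log file.
SAMPLE = [
    '192.0.2.10 - - [19/Sep/2012:09:00:00 +0000] '
    '"GET /example-page/1345455629000/ HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [19/Sep/2012:09:01:00 +0000] '
    '"GET /example-page/ HTTP/1.1" 200 5120 "http://www.example.com/" "Mozilla/5.0"',
]

print(referrers_for_padded_urls(SAMPLE))
```

A referrer of "-" on every padded request would mean no page was recorded as linking to those URLs, which is the case described above.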

Frost_Angel




msg:4497409
 4:48 pm on Sep 19, 2012 (gmt 0)

@indyank

Hopefully I'm not a total doofus - trying to understand what you're saying.

Do you mean that any links on my site should use the FULL URL when linking?
Like it should be: http://www.example.com/page1/
and NOT just: /page1/

tedster




msg:4497478
 7:18 pm on Sep 19, 2012 (gmt 0)

Just my personal opinion, but I don't think that's exactly what indyank meant. It seems to me that he's talking about the canonical link element in the <head> section. That's a very reasonable approach to this kind of WordPress trouble, and there are canonical tag plug-ins available to help with the job.

<link rel="canonical" href="http://example.com/page1/">

That way, search engines that read the canonical link will know that, no matter what exact URL they requested, the URL to be indexed is shown in the <head>. There are over 40 possible canonical problems [webmasterworld.com] and many of them can be challenging to deal with depending on your hosting.

The URL listed in a canonical link is not 100% binding, technically - but Google does take it as a very strong suggestion. For that reason it is best to deal with the potential canonical errors [webmasterworld.com] directly on your server whenever you can.

Before you go live with a canonical link, it's good to double check to be sure you aren't creating any canonical disasters [webmasterworld.com].
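As a rough way to double check, here is a small sketch that pulls the canonical URL out of a page's HTML so it can be compared against the URL you expect. It uses only Python's standard `html.parser`; the sample markup and the `find_canonical` helper are made up for illustration, and fetching the live page is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Invented sample page using the link element quoted above.
SAMPLE_HTML = (
    '<html><head><title>Page 1</title>'
    '<link rel="canonical" href="http://example.com/page1/">'
    '</head><body>...</body></html>'
)

print(find_canonical(SAMPLE_HTML))
```

Running it against the HTML served at both the real URL and a number-padded one should give the same canonical URL; if it gives the padded URL or nothing at all, the canonical link is not doing its job.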

Frost_Angel




msg:4501309
 7:26 pm on Sep 28, 2012 (gmt 0)

I don't know if I can post a link here to the Google forums - but my original issue is something others are experiencing, and it seems to be an issue with Google and Disqus conflicting or something. My errors have skyrocketed to 7500+ and grow daily. In case someone else has this issue, I thought this might help, since it was none of what was suggested here by those who commented.
https://productforums.google.com/forum/#!msg/webmasters/zvez-eib0Ao/lf08lf_K8AkJ

tedster




msg:4501451
 4:04 am on Sep 29, 2012 (gmt 0)

A link to an official communication from Google staff is fine - and this one comes from John Mueller, who was hands-on in SEO before he took the job at Google. He's been an excellent advocate for webmasters and a good communicator on the Google forums.

According to John, commenting on an explosion of 404 errors, it looks like this is the bottom line:

It does look like we're picking up something funny via JavaScript there. We're looking into what can be done in this particular case. In the meantime, keep in mind that 404 errors of URLs that are invalid...are not something that would affect your site's indexing or ranking...


This is a bit different from the opening post, however, because the server IS resolving the links rather than sending a 404 Not Found. While the source of those URLs may be the same JavaScript crawling problem, the fact that your server is resolving those URLs is something you should address. It increases your site's vulnerability to both accidental errors and malicious attacks.

I'd look into what kind of add-on characters can be appended to a valid URL and have the new URL actually resolve. Then take steps to either correct the bad configuration, or add a line in .htaccess to at least NOT return a 200 OK for those artificially padded URLs.
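To make the idea concrete, here is a sketch of the matching logic only, written in Python rather than as server configuration. It assumes the padding is always a trailing 13-digit segment, as in the opening post; `strip_padding` is a hypothetical helper, and whether to answer with a 301 to the clean URL or a 404 is left to you.

```python
import re
from typing import Optional

# A valid path followed by one extra 13-digit segment (assumed pattern,
# taken from the example URLs in the opening post).
PADDED = re.compile(r"^(?P<base>/.+/)\d{13}/$")


def strip_padding(path: str) -> Optional[str]:
    """Return the clean path if `path` is artificially padded, else None."""
    match = PADDED.match(path)
    return match.group("base") if match else None


for path in ["/example-page/1345455629000/", "/example-page/"]:
    clean = strip_padding(path)
    if clean is None:
        print(path, "-> serve normally")
    else:
        print(path, "-> do not return 200; clean URL is", clean)
```

The same pattern could then be carried over into a rewrite rule or a WordPress hook, tested first against URLs that legitimately end in numbers, such as date archives or paginated posts.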

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved