Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Can't decrease crawl errors or delete non-existent pages from Google
Drag1



 
Msg#: 4633924 posted 3:18 pm on Dec 28, 2013 (gmt 0)

Hi everyone,

A few months ago I installed sample data that came with a new template I bought for my website. The sample data included thousands of random pages with random content and a few modules that I thought were ready-made for me to simply change, so that I could keep the original coding/quality/look... Oh no. Not that easy.

My website dropped from no. 3 in the SERPs to outside the top 500 within a week. I have been trying for months to fix this but I just can't. I deleted the site from my server and 'started again', but I now have over 1300 crawl errors in Webmaster Tools due to all the 404 pages that I can't delete from Google's cache. I can't 301 them as they are completely irrelevant, e.g. a page will show content about the template features, which I can't redirect anywhere as my site is about the finance industry.

I know Webmaster Tools has a tool for this but the results are temporary. What is the best way to tell Google that those sample data pages are worthless and need to be deleted from its index?

If you can help me with this I would really appreciate it.

 

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4633924 posted 6:17 pm on Dec 28, 2013 (gmt 0)

Is there any pattern to the URLs of nonexistent pages? Google does tend to understand a 410. So if it's practical-- and if you don't mind "taking responsibility" for pages that never existed in the first place-- returning an explicit 410 might work.

But even with a 404, the number of requests should drop off. If they're requesting the pages just as often now as they did at the beginning, it implies that the request is being reinforced somewhere. Pick some random 404s in wmt and see where google claims to have heard about it. Make sure they're not linked from anything currently in your site, and make sure they're not on an auto-generated sitemap. (The words "in sitemap" by themselves don't necessarily mean you did anything wrong. It just means that some sitemap at some time in the historical past had this URL on it.)
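The sitemap check suggested above can be scripted. The following is a minimal sketch, assuming the sitemap follows the standard sitemaps.org format and that the dead URLs share a recognisable substring. The function name `stale_sitemap_urls` and the default marker `"sample-data"` are illustrative choices, not anything Google or WMT provides.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_sitemap_urls(sitemap_xml, marker="sample-data"):
    """Return every <loc> URL in the sitemap text that contains `marker`.

    `marker` is an assumption: pick whatever substring identifies the
    pages you have removed.
    """
    root = ET.fromstring(sitemap_xml)
    found = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if marker in url:
            found.append(url)
    return found
```

Feeding it the contents of the current sitemap file should return an empty list. Anything it does return is a removed URL that the sitemap is still advertising.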

Drag1



 
Msg#: 4633924 posted 8:27 pm on Dec 28, 2013 (gmt 0)

There is no pattern really except that all the URLs are derived from the sample data. For example http://www.example.com/66-sample-data-articles/joomla/extensions/modules/display-modules/19-footer-module - note the 'sample-data' string in the URL.

So, pages like this existed for a few days when I downloaded the sample data (big mistake); then, after I deleted the site and rebuilt it, they became 404s. The thing is that the 'Linked from' tab in WMT for the above 404 URL shows that the page is linked from other pages, but those are 404 pages too?! How can it state they are linked from other pages when the pages they are linked from either don't exist or they are not linked from at all? For example, there are a few 404 pages that WMT states are linked from my homepage, but that is not true because I have rebuilt the site.

Also, when I hover over the URL in WMT to see a preview I get a really strange/basic text version of a site with half the page showing my content and half showing the sample data content. So, I thought the cache must be out of date.

[edited by: aakk9999 at 8:55 pm (utc) on Dec 28, 2013]
[edit reason] Exemplified - No URLs as per Charter [/edit]

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4633924 posted 8:34 pm on Dec 28, 2013 (gmt 0)

How can it state they are linked from other pages when the pages they are linked from either don't exist or they are not linked from at all?

It's WMT -- Notoriously slow to update and "glitchy" on a good day.

For example, there are a few 404 pages that WMT states are linked from my homepage, but that is not true because I have rebuilt the site.

Nothing you can do except remove them [already done], wait, and quit worrying about WMT telling you things you know aren't accurate.

-- The pages and links are removed. That's all you can really do except "get back to business" and keep building your site.

BTW: Welcome to WebmasterWorld!

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4633924 posted 9:17 pm on Dec 28, 2013 (gmt 0)

There is no pattern really except that all the URLs are derived from the sample data. For example http://www.example.com/66-sample-data-articles/joomla/extensions/modules/display-modules/19-footer-module - note the 'sample-data' string in the URL.

So there are other URLs containing the string "sample-data" that are legitimate pages? No recurring theme to the numbers (here "66-"), or some other part of the URL? Eeuw.

The term "linked from" doesn't necessarily mean the link is present right now. Like "in sitemap", it simply means that's how the search engine first learned about the page.

aakk9999

WebmasterWorld Administrator 5+ Year Member



 
Msg#: 4633924 posted 9:48 pm on Dec 28, 2013 (gmt 0)

Welcome to WebmasterWorld, Drag1!

Google search for inurl:/66-sample-data-articles/joomla/extensions/modules/display-modules/ returns over 3 million results from many websites, so I would guess you are not the only one with this problem of random URLs being created.

Unfortunately, whilst it is so easy to leak URLs to Google, it can take a long time for Google to drop URLs from its 404/410 graph. Google will drop URLs a bit faster if you return 410 instead of 404.

As Lucy says, you should investigate whether you have legitimate pages with 66-sample-data-articles pattern in URL and if you don't, use this pattern to return 410 Gone.
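The thread points to .htaccess for this. Purely as an illustration of the same idea, here is a minimal Python sketch: a WSGI wrapper that answers 410 Gone for any path starting with the sample-data prefix and passes everything else through. It assumes no legitimate page lives under `/66-sample-data-articles/`. The names `gone_middleware`, `site` and `application` are hypothetical, and `site` is only a stand-in for the real application.

```python
import re

# Assumption: nothing legitimate is served under this prefix.
GONE_PATTERN = re.compile(r"^/66-sample-data-articles/")


def gone_middleware(app, pattern=GONE_PATTERN):
    """Wrap a WSGI app so that paths matching `pattern` get 410 Gone."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if pattern.search(path):
            body = b"410 Gone\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else is handled by the wrapped application as usual.
        return app(environ, start_response)
    return wrapped


def site(environ, start_response):
    """Hypothetical stand-in for the real site."""
    body = b"home page\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = gone_middleware(site)
```

The pattern is anchored at the start of the path on purpose, so a legitimate URL that merely contains the same string further along is not caught.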

Even then it can take more than a year for Google to drop these pages - depending on the size of the site, the number of URLs that return 404/410 and also depending on how often the site is crawled.

tangor

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4633924 posted 10:11 pm on Dec 28, 2013 (gmt 0)

All good advice, yet do know that none of the search engines, G, B, and Y in particular, ever "forget" a URL they have met. Expect that at some time in the distant future that URL request will come back again... so keep those 410s in place.

I still have pages deleted (properly) over 10 years ago still being requested on occasion.

Drag1



 
Msg#: 4633924 posted 1:11 pm on Dec 30, 2013 (gmt 0)

I honestly never knew that Google would act like this. Thanks so much everybody, you have been a real help.

From what you all say, the first step is to return a 410 error code for all those pages. Does anybody know a place on the web where I can find out how to do that? I haven't done this before so I want to make sure I do it correctly.

Thanks

aakk9999

WebmasterWorld Administrator 5+ Year Member



 
Msg#: 4633924 posted 4:16 pm on Dec 30, 2013 (gmt 0)

From what you all say, the first step is to return a 410 error code for all those pages. Does anybody know a place on the web where I can find out how to do that?

I suggest that you search our Apache Web Server [webmasterworld.com] forum.

There are many examples there of how to serve 410 Gone using .htaccess directives. You may also post a thread there with your question - but try to give your best shot at creating the .htaccess directives yourself first, then post what you have come up with, replacing your domain name with example.com

Drag1



 
Msg#: 4633924 posted 12:34 pm on Jan 3, 2014 (gmt 0)

I will. Thanks again for everyone's help. I really appreciate it.

andrewc



 
Msg#: 4633924 posted 6:55 am on Feb 14, 2014 (gmt 0)

@tangor. @aakk9999

I have the same problem, trying to get rid of some URLs. I added the 410 header this week, but the URLs show up with a 404 response code in Webmaster Tools. Shouldn't I see 410, or does Google not display this?


Andrew

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4633924 posted 9:59 am on Feb 14, 2014 (gmt 0)

Double check, please: In wmt there's a category for "page not found". You then have to look closer to see which response code is returned, 404 or 410. I think Bing goes into even finer detail.

What exactly do you mean by "410 header"? Where servers are concerned, "exactly" is key. That's assuming the response is being returned by the server as such; if so, you can also check your access logs. If the response header is generated by php or similar, you won't learn anything from logs. But it's worth trying some random URLs with Live Headers or equivalent to make sure you're getting the intended response.
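As one possible "equivalent" to a header-viewing browser add-on, the response code can be checked from a short script. This is a minimal sketch using only the Python standard library. The function name `fetch_status` and its `opener` parameter are illustrative, with `opener` there only so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server sends for `url`.

    Note that urlopen follows redirects, so the code reported is the one
    for the final URL in the chain.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we are after.
        return err.code
```

Calling `fetch_status(...)` on a handful of the removed URLs should give 410 if the server is sending what was intended. A 404 would suggest the 410 rule is not matching those URLs or is being overridden somewhere.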

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4633924 posted 11:32 am on Feb 14, 2014 (gmt 0)

http://support.google.com/webmasters/answer/2409439 [support.google.com]:
Currently Google treats 410s (Gone) the same as 404s (Not found).

