|Using Google Sitemaps To Find Panda-Hit pages?|
I was looking up something Panda-related yesterday and stumbled across an article written in August of this year about Panda recoveries. One suggestion it had for figuring out which section(s) of your site were hit by Panda was to break down your site into small chunks in sitemaps (by category, folder, or whatever) and submit those to GWT and see which areas had noticeable discrepancies between submitted and indexed numbers.
Has anyone tried this idea and found it to be helpful (or accurate)? I've already noindexed my "bad" content so I'm not sure I'd see anything really stand out at this point but I'm curious about this method. I was also under the assumption that Panda-hit pages were still in the index and that was what was bringing the site down (or is that assumption wrong?)
This can't be accurate. There will always be discrepancies between submitted and indexed numbers and you can't control how many pages Google chooses to index out of each sitemap.
Breaking down your site into small chunks is a good start, but you have to do this in Analytics. Find out which section has been hit by comparing traffic before and after the Panda update.
I have done this and it is a good indicator, depending on the quantities of pages you are talking about.
The sector I was performing in allowed me to create dozens of sitemaps of 100 pages each.
No reason why any of the pages should not be indexed.
I found some sitemaps with 0 listed then others from 25 up to the full 100.
I then discovered trends, i.e. pages with similar title tags and URLs (the on-page content was considerably different, which is why I did not remove them initially).
I then did different experiments with each sitemap group, until I saw a recovery, then applied the solutions across the board.
Did you change the titles and URLs of those that weren't listed (at first) until they were listed/indexed?
|No reason why any of the pages should not be indexed. |
First, you're assuming that Google's indexing process is perfect.
Second, you're assuming that, if Google is failing to index certain pages, it's because those pages aren't passing the Panda sniff test.
The first assumption is highly questionable, and the second assumption is just a shot in the dark.
Let's just clarify that I answered the original question and don't have an issue anymore.
I now have around 98% indexed in Google relative to my sitemaps.
I got my recovery.
The pages that were not listed were similar versions of those that were.
Then, looking at search criteria, I noticed that traffic-wise the missing pages didn't deliver that much.
As I started to clean these pages away, other, better pages returned and showed in sitemaps. It was interesting to see pages clashing where I would never have come to that conclusion myself, so I got an indicator I was on the right track.
As I say, I have a 98% listing in sitemaps compared to a much smaller percentage originally.
Traffic is down 20% on the old days, but with 80% fewer pages than the original site.
Yes, 80% of my site's pages were delivering only 20% of the traffic.
I created a second site with a completely different approach and content structure that targeted the specialty search terms I had lost, and with a relatively small number of pages, got the last 20% or so of traffic back as well.
I would say that conversion is down, though, as those specific long-tail terms did convert better when just a small change in page content (product info) would relate to a customer's need more specifically.
Could you please provide a link to the article you are referring to so that we can take a look?
Getcooking, how can I split my sitemap? I think this trick will help me a lot. I have a lot of supplementary posts (omitted results) in the Google SERPs.
May I ask what the solutions were that you applied across the board once you'd finished experimenting with each sitemap group?
I originally thought the issue was quality of content, so I rewrote the content in a different way in the first 3 subgroups.
Subgroup 4: I focused on removing h1 tags etc.
Subgroup 5: title tag changes.
Subgroup 6: sub-URL names.
I got no improvement in any pages (removed from the index and from the sitemap counts) returning until I started on URLs (subgroup 6).
If I had 3 or 4 URLs with a similar description (actually more specific to the products), I removed certain variations (using the Remove URL tool in WMT, even though they were not counted in sitemaps or appearing in the index) until one version reappeared live and I saw an increased index count in sitemaps.
Just to clarify: the URLs were similar, but the content was not as similar.
(The sector I am in would have suggested I needed each page: similar in URL name, but very different in actual products.)
But in the end it affected 20% of traffic and reduced the conversion rate as well.
I learned I hadn't got to grips with Panda, focusing way more on the "improve content" aspect rather than on the URLs.
The content had always been good enough and unique, but when Panda went live I was initially convinced that's all it could be.
Once I started choosing which urls to keep and which to let go, the count of other pages I had chosen to keep increased daily.
I then repeated the same process on the other 5 groups, and even with (now) different content, title tags etc., they too started to come back.
When creating new pages (even blog posts and news items), I think very carefully about what to name the URL and whether it will be too similar to other pages, even when the actual written content follows a completely different line of subject.
Unfortunately the Forum Charter [webmasterworld.com] prohibits us linking to that article, but the article in question suggests putting each of your site categories into a separate sitemap and then comparing the Pages Submitted vs. Pages Indexed for each category. It does not say much more than this on the subject.
It is important that you include in each (sub)sitemap all pages that are allowed to be indexed for that category. (I say that because sometimes sitemaps include only important pages. For this analysis to be of any use, you must not restrict the sitemap to important pages only.)
Read these results with care. If you have a duplicate content problem, the reason a page from the sitemap is not indexed may be that Google picked a different canonical URL than the one you have in the sitemap for the same page.
Create multiple sitemaps (sitemap1.xml, sitemap2.xml, etc.), each containing the URLs from one category (or section) of your website.
Then you can either submit all these different sitemaps to WMT or, better, create a Sitemap Index file where you list all the sitemaps you created. More on the sitemap index file here:
Sitemap index file
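For reference, a sitemap index file is just a small XML document that points at the individual category sitemaps. A minimal sketch, assuming the sitemaps sit at the site root (the example.com filenames are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <sitemap> entry per category sitemap -->
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>
```

You then submit only the index file, and WMT still reports submitted vs. indexed counts for each child sitemap separately, which is what this analysis needs.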
Thanks for the explanation.
So you basically just changed your urls to make them distinct within your site and your rankings recovered. Have I understood that correctly?
Congratulations on your recovery by the way.
yes I suppose so
Some URLs were changed altogether; for others, I just chose a master URL and removed those that were similar.
I normally chose the shortest one.
I didn't change the URL/subfolder structure, just the URL names within that structure/folder, which in my case was 2 levels in.
It was almost as if the URL names were more important than the content within.
The URLs were similar, but even today I think the actual written content was different enough to deserve being presented as separate pages.
Hi new here,
My site was also hit badly by Panda back in Feb 2011. I pretty much stopped posting on my site, hoping the 'panda' would be short-lived and Google would find a new, improved way to detect quality content. 3 years later my site has plummeted to the depths of Google search.
The last 2 weeks I've been working on my site, blocking Googlebot on dupe pages and cleaning up junk pages as much as possible on my WordPress website.
I luckily stumbled onto this thread and would like to learn more on how to make category sitemaps like flanok did. I'm almost 100% sure my site has the same problem he is suggesting with titles/URLs being too similar as my website is about guitars. I mean I post about Les Paul guitars all the time as they have so many and every page is unique, but I think Google sees them as similar and hit my site.
Anyhow, any video tutorials or a reference on how to make multiple XML sitemaps? I looked for a WordPress plugin that does this, but had no luck.
Any help would be greatly appreciated.
I expect if I were going to do it, I'd probably do it manually to start, just to see if it made a difference. I'd probably run my favorite SEO spider tool (which does create sitemaps) on it, and just break it down into chunks with a text editor.
If you have thousands of pages, then you're going to want to code something that will create it for you.
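If you do script it, here's a minimal sketch in Python. The grouping rule and names are my own assumptions: it groups URLs by their first path segment (which roughly matches splitting by category/folder) and caps each sitemap at 100 URLs, like the chunks flanok described.

```python
# Sketch: split a flat list of URLs into per-category sitemap files.
# Assumes URLs follow a /category/page pattern; adjust the grouping
# rule to match your own site structure.
from collections import defaultdict
from urllib.parse import urlparse
from xml.sax.saxutils import escape

def build_sitemaps(urls, max_per_file=100):
    """Group URLs by top-level path segment; return {filename: xml_string}."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        category = path.split("/")[0] if path else "root"
        groups[category].append(url)

    sitemaps = {}
    for category, members in groups.items():
        # Chunk each category so no sitemap exceeds max_per_file URLs.
        for i in range(0, len(members), max_per_file):
            chunk = members[i:i + max_per_file]
            body = "\n".join(
                "  <url><loc>%s</loc></url>" % escape(u) for u in chunk
            )
            name = "sitemap-%s-%d.xml" % (category, i // max_per_file + 1)
            sitemaps[name] = (
                '<?xml version="1.0" encoding="UTF-8"?>\n'
                '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                + body + "\n</urlset>\n"
            )
    return sitemaps
```

Write each entry of the returned dict to disk (and list the filenames in a sitemap index) and you have one submittable sitemap per category, each small enough to spot submitted-vs-indexed gaps.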
But all that said, I'm not sure I buy into this. I have some sites that have 99% of all pages indexed according to GWT, and I have sites that have about 50% indexed according to GWT, and they both seem to do equally well. Moreover, the type of page that GWT claims it isn't indexing (one of the taxonomy classes) is in fact indexed, ranking, and bringing in traffic, no matter *what* GWT says. So try it by all means, but don't be surprised if it doesn't seem to help.
I've noticed that sites that have lots (lots) of intermediate reading level content, or sites that have virtually zero advanced reading level content (as rated by Google), seem to get a "free pass" on some Panda penalties.
Frankly, I don't understand it.
If the conclusions of this thread are true, Google has stamped a history of good software engineering practice (unique, specific, useful filenames) as bad practice.
In the days of assembly code programming, punch cards, paper tape, and cassette tape, (60's,70's), one dreamed of a future where one could use more than 6 characters to name anything concisely.
I will say, similar file and path names are the hallmark of scraper sites; of course, they typically have tons (tonnes) of pages.
I find this topic interesting because I don't think the sitemap tells us anything. For some reason, Google's never indexed about three out of 1,500 pages. It's been that way for about two years. I add about six pages a day and Google typically shows a six page increase in about four days.
This particular site has shown signs of recovery recently, but has about 20% of its peak traffic.