
Forum Moderators: goodroi

Sitemap Creation

     
6:19 am on Sep 12, 2019 (gmt 0)

Junior Member

10+ Year Member

joined:July 10, 2007
posts:122
votes: 0


Hello Forum,
I need a suggestion regarding sitemap creation for our company website.
Our company website has a very large number of pages, and honestly speaking we do not have reliable information about how many pages we actually have.
Recently I was given charge of supervising the SEO activities. The first thing I discovered is that in GSC a huge number of pages are de-indexed due to redirect issues. I am working with the developer to solve that.
Another big problem is that our website has no sitemap submitted in GSC. I have suggested creating a proper sitemap so that Google can crawl our site through it. My idea is that until we solve the redirect issues (the pages de-indexed by Google), I don't want Google to crawl those pages. Those pages (the ones with redirect issues) are linked both from within the site and from outside it.
My concern is: if we submit an XML sitemap in GSC, will Google crawl only the listed pages and ignore the pages not listed there?
Any suggestion would help me move forward with the sitemap issue.
Regards
Utsav
12:25 pm on Sept 12, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4453
votes: 330


"If we submit an XML sitemap in GSC, will Google crawl only the listed pages and ignore the pages not listed there?"
A sitemap should list all your pages, but there are cases when you do not want to list all URLs. Not all websites are built the same way, and platforms such as Joomla and WordPress have their own URL environments.

You do not mention whether the site you are asking about is static HTML, dynamically generated, or built on a CMS platform, so I have no specific suggestions. In general, you want a clear idea of all the existing files and the indexing goals for your pages and directories (or folders).

Google will generally follow all links found on a site unless they are disallowed in your robots.txt file. You don't want to list pages in a sitemap that are blocked by robots.txt, and Google will still crawl pages that are not listed in a sitemap. A sitemap lets you show Google the pages you want to have indexed; it is like a guide or map.
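If it helps, here is a minimal sketch of how a sitemap file can be generated from a plain list of the URLs you want indexed. This is only an illustration, written in Python; the urls.txt input file is a hypothetical list with one canonical URL per line.

    # build_sitemap.py - build a minimal sitemap.xml from a list of canonical URLs.
    # "urls.txt" is a hypothetical input file, one URL per line.
    from xml.sax.saxutils import escape

    with open("urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in urls:
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")

    with open("sitemap.xml", "w") as out:
        out.write("\n".join(lines) + "\n")

    print("Wrote sitemap.xml with %d URLs" % len(urls))

Once you have a file like that, you submit it in GSC; Google treats it as a hint about which URLs matter to you, not as a restriction on what gets crawled.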
3:10 pm on Sept 12, 2019 (gmt 0)

Junior Member

10+ Year Member

joined:July 10, 2007
posts:122
votes: 0


Thanks for your suggestion; our company URLs are generated through a CMS. But my question is about the de-indexed pages with redirect issues. Until we solve those, I want Google not to crawl those URLs. The list is huge and the URLs follow different patterns, so you can understand that it would be difficult to block them through robots.txt. What can be done in this scenario? I am not even sure whether it is a good idea to stop Google from crawling those pages, or whether we should leave things as they are and keep rectifying those URLs.
3:32 pm on Sept 12, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4453
votes: 330


Some pages have been changed or replaced with newer pages? If that is the case, you may be able to redirect the old URL to the new page URL and just remove the old page. You would only want to use a rewrite/redirect when the new page is a replacement for the old page. If the new page is not a replacement, you can remove the old pages and return a 410 (Gone) server response.

Old pages that are being replaced should include a noindex header so that the new URL that replaces the old URL is seen as its replacement. If an old page is just being dropped, it should not redirect to any other page. As you can see, there is no single rule for all changes; it depends on the purpose of the change.
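As a rough illustration of those cases (not specific to any CMS; the paths and the mapping below are made up), the decision logic looks something like this in a small Python/Flask sketch:

    # Illustration of "replacement vs. removal" handling; all paths below are hypothetical.
    from flask import Flask, redirect, abort

    app = Flask(__name__)

    # Old URLs that have a true replacement -> permanent (301) redirect.
    REPLACED = {
        "/old-product": "/new-product",
        "/2018-campaign": "/2019-campaign",
    }

    # Old URLs that were simply dropped -> 410 Gone, no redirect.
    REMOVED = {"/retired-page", "/obsolete-landing"}

    @app.route("/<path:page>")
    def handle(page):
        path = "/" + page
        if path in REPLACED:
            return redirect(REPLACED[path], code=301)  # genuine replacement exists
        if path in REMOVED:
            abort(410)                                 # gone for good, no redirect
        return "Normal page content for %s" % path     # everything else serves as usual

In a real setup this would live in the CMS or the web server configuration rather than in application code, but the decision is the same: redirect only when there is a genuine replacement, otherwise return 410.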

You mention that the URLs are CMS generated but without knowing "which CMS?" I can't offer any better suggestions.
4:47 am on Sept 13, 2019 (gmt 0)

Junior Member

10+ Year Member

joined:July 10, 2007
posts:122
votes: 0


We are using Adobe Experience Manager as our CMS. I am not knowledgeable enough to give you any insight about it; it is managed entirely by our dev team.
Regarding new pages: as requirements come in, the dev team creates new pages every now and then. Sometimes they set up 302 redirects from the old pages to the new ones, and sometimes they don't even bother to do that. This has been going on for a long time. As a result, the list of such URLs is now huge, and our SEO team is struggling to manage those pages in Google's de-indexed list.
5:05 am on Sept 13, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4453
votes: 330


That sounds like quite a large number of pages to carry with an uncertain status. A 302 redirect is a "Moved Temporarily" status, so the "new" URL is not likely to be indexed. It sounds like you need some direction from management to be able to create a useful sitemap.
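One way to see the size of the problem is to take the list of old URLs and check what each one actually returns, without following the redirects. A minimal sketch, assuming Python with the requests library and a hypothetical old_urls.txt file (one URL per line):

    # Report the immediate status code of each old URL (301 vs 302 vs 404/410 etc.).
    # "old_urls.txt" is a hypothetical input file, one URL per line.
    import requests

    with open("old_urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    for url in urls:
        try:
            r = requests.head(url, allow_redirects=False, timeout=10)
            target = r.headers.get("Location", "")
            print("%s  %s  %s" % (r.status_code, url, target))
        except requests.RequestException as exc:
            print("ERR  %s  %s" % (url, exc))

Any URL in that report that answers 302 for a page that has permanently moved is a candidate for changing to a 301.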
6:12 am on Sept 13, 2019 (gmt 0)

Junior Member

10+ Year Member

joined:July 10, 2007
posts:122
votes: 0


So what do you suggest: should we go for a sitemap only after rectifying all the de-indexing issues? If yes, is the time required to solve the de-indexing issues going to cause any harm?
12:00 pm on Sept 13, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10281
votes: 1049


Do the housecleaning (i.e., if a page is truly gone, let it die as a 404, or preferably as a 410... which requires a server rule, such as a Redirect directive).

Quit making mistakes. That should be rule #1.

A sitemap will not cure the prior ills, but it will at least let g know what you WANT to be indexed.

G never forgets a url it has met... they continue asking for those DECADES after the page disappeared. YOU CAN'T FIX THAT and trying to is a waste of time.

GSC is not that reliable, just keep that in mind. Think of it as an indicator of how befuddled g is at times. :)

As for how many pages you have... check your folders/files and run a site review. Then you will know exactly how many YOU have. As for what g thinks you have, don't lose too much sleep over that... that's their problem, not yours.
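For a static site, something as simple as the Python sketch below does the counting; the document root is a placeholder, and for a CMS-driven site the equivalent is an export from the CMS or a crawl of your own site.

    # Count the pages that exist on disk; DOCROOT is a hypothetical document root.
    import os

    DOCROOT = "/var/www/example"          # placeholder - point at your own docroot
    PAGE_EXTENSIONS = (".html", ".htm")   # adjust for your setup

    count = 0
    for dirpath, dirnames, filenames in os.walk(DOCROOT):
        for name in filenames:
            if name.lower().endswith(PAGE_EXTENSIONS):
                count += 1
                print(os.path.join(dirpath, name))

    print("Total pages found: %d" % count)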

Your site map, if you use one (I never have), should be EXACTLY what you want indexed/crawled. You cannot do better than that for that purpose. If g still gets it wrong, that's on them, not you.

Sometimes you can't fix stupid.
 
