I used the free Xenu program to generate an XML sitemap for my medium-sized Q&A site that has around 1,500 individual URLs.
However, when I opened the XML file it generated, I saw that it included about DOUBLE the number of URLs my site actually has. It had picked up all sorts of variants of the same pages, for example with different sort orders or filters applied. The XML sitemap included URLs like:
- http://example.com/threads?direction=asc&page=17&sort=title (index of threads sorted by title)
- http://example.com/threads?direction=desc&page=2&sort=title&tag%23Education=on (index of threads with both a sort-by-title and category-filter applied)
- http://example.com/threads/show/2283-lawyer?sort_by=newest (an individual Q&A page with content sorted by date)
My question is simply whether those URL variants belong in an XML sitemap. Or does it not really matter (i.e., will Google ignore them anyway)? I figure cleaner is always better, so should I manually remove all of the URLs that are just sort/filter variants, leaving only the "real" URLs?
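If the answer is to remove them, doing it by hand across a few thousand entries is tedious. Here is a minimal Python sketch that drops those entries. It assumes the file follows the standard sitemaps.org 0.9 schema and that `direction`, `sort`, `sort_by` and the `tag#...` filters are the only sort/filter parameters. Those parameter names come from the example URLs above. The helpers `is_variant` and `prune_sitemap` are hypothetical names.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Standard sitemap namespace (sitemaps.org protocol, version 0.9).
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Assumption: parameters that only re-sort or filter a page.
# Names taken from the example URLs; adjust for your own site.
SORT_FILTER_PARAMS = {"direction", "sort", "sort_by"}
FILTER_PREFIXES = ("tag#",)  # tag%23Education=on decodes to "tag#Education"


def is_variant(url: str) -> bool:
    """Return True if the URL carries any sort or filter parameter."""
    query = urllib.parse.urlsplit(url).query
    # parse_qsl percent-decodes names, so "tag%23Education" becomes "tag#Education".
    for name, _value in urllib.parse.parse_qsl(query, keep_blank_values=True):
        if name in SORT_FILTER_PARAMS or name.startswith(FILTER_PREFIXES):
            return True
    return False


def prune_sitemap(xml_text: str) -> str:
    """Remove <url> entries whose <loc> is a sort/filter variant."""
    ET.register_namespace("", NS)  # serialize without an "ns0:" prefix
    root = ET.fromstring(xml_text)
    # Iterate over a copy so removing children is safe.
    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.findtext(f"{{{NS}}}loc", default="").strip()
        if is_variant(loc):
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")
```

The script keeps plain pagination such as `/threads?page=2`, because `page` is not in the assumed sort/filter set. Whether paginated index pages belong in the sitemap is a separate judgment call.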