Forum Moderators: phranque

How to map / get idea of a website with 4,000,000 pages?

         

tpb101

10:29 am on Feb 13, 2015 (gmt 0)

10+ Year Member



I don't have access to the server, and I would need to get to know the structure of a website with 4,000,000 pages. This is the site I will be working on. I need to look at the website as a whole, and at all its individual elements, and possibly come up with a plan for what would need to be changed.

There are some sitemap tools, but from what I remember, they don't work very well with big sites. What would be the best way to do it these days? How can I get the website structure, and all the URLs, fairly quickly?

Thanks.

n0tSEO

12:49 pm on Feb 13, 2015 (gmt 0)

10+ Year Member



Is this a static or a dynamic website (e.g. WordPress, Joomla!, etc.)?

If it runs WordPress, I found that Yoast's SEO plugin helps with XML sitemap generation; you can paginate your sitemap with the "Max entries per sitemap page" option.
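
If the site already publishes paginated XML sitemaps, they can be read from the outside without server access. Below is a minimal sketch, assuming the files follow the sitemaps.org 0.9 schema; the function name parse_sitemap is made up for illustration, and downloading the files is left out. It tells a sitemap index apart from a plain URL set and returns the <loc> values of either.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema (assumed here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (kind, locs) for one sitemap document.

    kind is "index" for a <sitemapindex> (locs are child sitemap URLs)
    or "urlset" for a <urlset> (locs are page URLs).
    """
    root = ET.fromstring(xml_text)
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == SITEMAP_NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")

    locs = []
    for entry in root.findall(SITEMAP_NS + child):
        loc = entry.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text:
            locs.append(loc.text.strip())
    return kind, locs
```

For an index, each returned location is another sitemap file to fetch and parse the same way; for a URL set, the locations are the pages themselves.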

Hoople

2:47 am on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Xenu Link Sleuth is worth looking into. The Xenu Wikipedia page at 'Xenu's_Link_Sleuth' has a References section with many links too.

The author has a Yahoo! support group at 'tech DOT groups DOT yahoo DOT com/neo/groups/linksleuthupdates/info'. In that group there is a beta version tailored for huge sites. Questions posted there do get answered; the tool's author usually replies within a day or so.

tangor

3:36 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Map from the outside with Xenu (as mentioned above), but I do have to ask why you don't have access to the server. Are you a member of that site's team? If so, why no access? If not, then what is the reason for needing this info?
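
To illustrate what mapping from the outside means in practice, here is a rough sketch of a same-host, breadth-first link crawl in Python. This is not how Xenu works internally; the names crawl, fetch and LinkParser are invented for this example. It only shows the idea: on a site with millions of pages you would also need to honor robots.txt, rate-limit requests and persist the queue, none of which is done here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Default page fetcher (assumption: pages decode as UTF-8)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=1000):
    """Breadth-first crawl limited to the start URL's host.

    Returns the set of same-host URLs discovered (fragments removed).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to load
        fetched += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

The fetch_page parameter is there so the crawl logic can be tried against canned pages before pointing it at a live site.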

lucy24

8:56 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would need to get to know the structure of a website

Do you mean the visible URL structure, or the physical file structure?

Even if there's some legitimate security reason why they can't give you server access, there's no reason they can't send you a screenshot of the directory structure.
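
For the visible URL structure, a flat list of URLs (from a crawl or from sitemaps) can be rolled up by directory prefix to get a first picture of how the site is laid out. The following is only a sketch under that assumption; summarize_urls is a hypothetical helper name, and it treats the last path segment as the page itself unless the path ends in a slash.

```python
from collections import Counter
from urllib.parse import urlparse


def summarize_urls(urls, depth=1):
    """Count URLs per directory prefix, up to `depth` path levels."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        if not path.endswith("/"):
            # The last segment is the page itself, not a directory.
            segments = segments[:-1]
        prefix = "/" + "/".join(segments[:depth])
        counts[prefix] += 1
    return counts
```

Calling summarize_urls(urls, depth=2).most_common(20) would list the twenty largest second-level sections, which is usually enough to see where most of the pages live.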