If you understand the underlying technologies you'll see where this question loses a little steam. :-)
First, HTML markup: a method of marking up content so that it tells the browser how to render it. HTML has a fixed set of elements the browser can interpret, right? <p>, <h1>, etc. There is an "agreed" set of standards for what HTML should contain and how browsers interpret it, set forth by the W3C (and Internet Explorer, historically, has chosen to take its own path in this agreement - but that's another story.)
Alongside this is XML, which stands for Extensible Markup Language. It has very few "fixed" elements, because nearly all of an XML vocabulary is created by the developer. That is, you create the tags you need to do what you need to do.
I use XML nearly every day, or connect with an API that uses it. Two examples are payment gateways and other XML-based APIs, such as Salesforce and SOAP services in general. You compose a valid XML string, pass it to the API, and it responds with an XML string you parse and interpret. RSS reader programs generally request the URLs of valid XML RSS feeds; that is what those feeds are for, so you can filter the items you want to see based on the XML definition. Whether XML is "required for putting up a site" depends on what you want to do.
For the sake of argument, we'll say you mean just outputting pages. Take the previous scenario, a user checkout: you post XML to the gateway, parse the response (and thus have "used" XML), and output the result to the browser. This is where we get into XHTML and HTML.
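A minimal Python sketch of that compose / send / parse round trip, using only the standard library. The element names (payment, amount, currency, status, transactionId) and the canned response are made up for illustration; a real gateway defines its own schema, and the actual HTTP post is left out here.

```python
import xml.etree.ElementTree as ET


def build_request(amount, currency):
    """Compose a small XML request string (hypothetical element names)."""
    root = ET.Element("payment")
    ET.SubElement(root, "amount").text = amount
    ET.SubElement(root, "currency").text = currency
    return ET.tostring(root, encoding="unicode")


def parse_response(xml_text):
    """Pull the fields we care about out of the XML the API sent back."""
    root = ET.fromstring(xml_text)
    return {
        "status": root.findtext("status"),
        "transaction_id": root.findtext("transactionId"),
    }


request_xml = build_request("19.99", "USD")

# Stand-in for what a gateway might return; no network call is made here.
response_xml = (
    "<response><status>approved</status>"
    "<transactionId>abc123</transactionId></response>"
)
result = parse_response(response_xml)
print(request_xml)
print(result)
```

In practice the request string would be posted to the gateway's URL and the response body handed to parse_response.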
The idea occurred to mix HTML with XML. "Hey, let's apply the robustness of an extensible markup language to HTML, so we can embed XML directly into web pages, and provide devices with contextual information." With a custom DTD, you can do this. Consider my often posted example:
<p><movie>The Titanic</movie> was a movie about a <adjective>titanic</adjective> ship that sank in the North Atlantic. The <ship>Titanic</ship> was the largest ship of its kind.</p>
Read the paragraph: humans know what each context means. Machines don't; to them it's all the same dictionary word. By defining a custom DTD that declares these elements, we can spell out the context. Our paragraph then becomes relevant for searches for the movie The Titanic, and not relevant at all for the definition of titanic or the history of the Titanic.
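To make the point concrete, here is a small Python sketch that treats a condensed version of that paragraph as XML. A plain text search cannot tell the three uses of the word apart, while a lookup by tag can. The helper name mentions is made up for this illustration, and no DTD validation is performed; the parser only checks well-formedness.

```python
import xml.etree.ElementTree as ET

# Condensed variant of the example paragraph, with the movie and the ship tagged.
paragraph = (
    "<p><movie>The Titanic</movie> was a movie about a titanic ship. "
    "The <ship>Titanic</ship> was the largest ship of its kind.</p>"
)

root = ET.fromstring(paragraph)


def mentions(element, tag):
    """Return the text of every descendant element with the given tag."""
    return [el.text for el in element.iter(tag)]


# A plain word search sees one dictionary word, three times over.
plain_text = "".join(root.itertext())
word_hits = plain_text.lower().count("titanic")

# A tag-aware search knows which mention is the movie and which is the ship.
movie_hits = mentions(root, "movie")
ship_hits = mentions(root, "ship")
print(word_hits, movie_hits, ship_hits)
```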
You can see what a powerful tool this could be, affording real relevance to your copy. It was predicted that XHTML would eventually replace HTML, and that, for most people, is about all they hear: HTML is old school, be on the cutting edge, use XHTML.
Sadly, this never happened. Most developers never really understood XHTML and just output vanilla HTML with the funny />'s, touting their code as "cleaner" because of it - but it's just as easy to have "clean" HTML. In my opinion, plain HTML is also far more relevant.
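For what it's worth, the funny /> exists because XHTML has to be well-formed XML, and an XML parser will not accept an element that is never closed. A rough Python sketch of the difference, using the standard library XML parser as a stand-in for an XHTML-aware consumer (the function name is_well_formed is just for this example):

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_style = "<p>line one<br>line two</p>"    # fine as HTML, not as XML
xhtml_style = "<p>line one<br/>line two</p>"  # self-closed, so valid XML

html_ok = is_well_formed(html_style)
xhtml_ok = is_well_formed(xhtml_style)
print(html_ok, xhtml_ok)
```

A browser parsing the page as ordinary HTML recovers from both forms, which is why the slash by itself buys you nothing unless something actually treats the page as XML.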
If the only thing you use a page for is plain old HTML, there really is no reason to use an XHTML doctype. If it's HTML, call it what it is!
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
Or, since it's pretty evident HTML is here to stay, use the HTML5 doctype: <!DOCTYPE html>
If, however, you actually intend to use the features of XHTML, create DTDs and extend the HTML set, by all means - call it what it is and use it accordingly.
A historical (I'd almost say epic) thread that is still relevant today:
Why most of us should NOT use XHTML [webmasterworld.com]
So you can see, you will probably have the opportunity to use XML at some point, but the decision to commit to XHTML or HTML is really only relevant if you have a basis for making it. Personally, sometimes I use an XHTML doctype simply because this trend has made it too difficult not to - WordPress, for example, uses (aaargh! should be configurable) XHTML-style <br/> and <img/> tags in its core code, which forces the issue unless you want to hack it every time an update is deployed (which has been a couple of times a month lately.)
Check the horizon before starting the project . . . if all you see is HTML, use HTML. :-)