If you view the source code of a typical web page, you are likely to see something like this near the top:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
and/or
<html xmlns="http://www.w3.org/1999/xhtml" ...>
These refer to HTML DTDs and namespace documents hosted on W3C's site.
Note that these are not hyperlinks; these URIs are used for identification. This is a machine-readable way to say "this is HTML". In particular, software does not usually need to fetch these resources, and certainly does not need to fetch the same one over and over! Yet we receive a surprisingly large number of requests for such resources: up to 130 million requests per day, with periods of sustained bandwidth usage of 350 Mbps, for resources that haven't changed in years.
That's roughly 1,500 requests a second, with close to 100% of them totally unnecessary. Note that normal web browsers such as IE or Firefox don't fetch the DTD at all, and even if they did, they would cache it.
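For what it's worth, any decent XML toolchain can avoid these fetches entirely by resolving the DTDs locally. Here's a minimal sketch using Python's lxml, assuming you keep local copies of the W3C DTDs on disk (the /usr/local/share/xhtml1/ directory is made up for the example):

from lxml import etree

class LocalDTDResolver(etree.Resolver):
    # Serve W3C DTDs from local copies instead of hammering www.w3.org.
    def resolve(self, url, id, context):
        if url and url.startswith("http://www.w3.org/TR/xhtml1/DTD/"):
            filename = "/usr/local/share/xhtml1/" + url.rsplit("/", 1)[1]
            return self.resolve_filename(filename, context)
        return None  # anything else falls through to default resolution

parser = etree.XMLParser(load_dtd=True, no_network=True)  # never hit the network
parser.resolvers.add(LocalDTDResolver())
doc = etree.parse("page.xhtml", parser)

The same idea goes by the name "XML catalogs" in most other toolchains.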
I believe this is a good example of the effect that rogue crawlers are having on websites. There are a large number of crawlers which grab pages and fetch everything that looks like a link, and they are not sophisticated enough to analyse the HTML and realize that the URIs in the DOCTYPE and xmlns declarations should be ignored. So, as there are millions of documents out there which declare a "full" doctype, we are all contributing to a permanent distributed denial-of-service (DDoS) attack on the W3C!
The W3C are partially to blame: they placed the DTDs under their primary website (www.w3.org) instead of a dedicated subdomain, and they encouraged the use of their DTDs for all (X)HTML documents, despite the fact that the XML folks all know that DTDs Don't Work on the Web [hsivonen.iki.fi]. Perhaps it's time for me to update the Doctype FAQ [webmasterworld.com] and suggest some of the (several) doctypes which preserve standards-compliance mode but don't include the DTD link...
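For instance, if memory serves, both of these trigger standards mode in the major browsers without offering any DTD URI for a bot to chase:

<!DOCTYPE html>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">

(The first is the short HTML5-style doctype; the second is HTML 4.01 Strict with the system identifier, i.e. the DTD URL, dropped.)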
I agree with encyclo about the subdomain idea. Bots are just plain stupid: they fill access logs with endless 404s, and it would be difficult to find and repair legitimate 404s if it weren't for clever counter-tactics and intelligent statistics scripts.
A good question: what browsers have a legitimate reason to fetch doctypes? I presume browsers that don't keep an internal copy?
- John
Like I said, the traffic problem doesn't come from browsers: they don't actually take any notice of the doctype apart from using it as a rendering-mode switch. However, many bots are simple downloaders which just parse for anything that looks like a link without analyzing the page content at all, and this is probably why they are requesting the DTDs for every page which includes them.
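To illustrate the difference: a crawler only needs to follow real hyperlinks, not every URI-shaped string in the file. A rough sketch of the distinction using Python's standard html.parser (just an illustration of the principle, not any particular bot):

from html.parser import HTMLParser

class HrefExtractor(HTMLParser):
    # Collect only genuine hyperlinks; DOCTYPE and xmlns URIs never match.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

extractor = HrefExtractor()
extractor.feed('<html xmlns="http://www.w3.org/1999/xhtml">'
               '<body><a href="/page2">next</a></body></html>')
print(extractor.links)  # ['/page2'] -- the xmlns URI is ignored

A naive regex scan for http://... strings, by contrast, would happily pick up the DTD and namespace URIs and fetch them for every single page.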
There is no real possibility of the W3C moving the DTDs: they are the biggest proponents of "Cool URIs don't change", after all, and moving them would break many thousands of documents and applications.