> Could someone please point me to a description of legal html?

The current W3C-released version of HTML is 4.0.  The following is an
introduction to HTML 4.0:

> I'm working on a web crawling program and am continually astonished at
> the bizarre things in html that seem to be a) done in the real
> world, and b) acceptable to at least Netscape.

The ambiguities found in the world of the web are sometimes overwhelming.
First, HTML is not the only thing going on in a web page.  A number of
technologies are involved [HTML, Cascading Style Sheets (CSS), the Document
Object Model (DOM), Extensible HTML (XHTML), and scripting].

Neither Netscape nor Microsoft fully supports HTML 4.0 [or previous versions].
Microsoft dreamed up a bunch of tags that are useful but unfortunately not
recognized by any other browser.  Netscape has done the same thing.

Microsoft claimed that IE 5.0 would be W3C compliant on HTML, Cascading Style
Sheets (CSS), the Document Object Model (DOM), Extensible HTML (XHTML), and
various XML components.  They did not deliver.  Microsoft finally gave a
statement as to their compliance with W3C [or any other] standards:

"Microsoft will comply to a standard when it is in Microsoft's interest."

It is nice to know where the customer stands in the Microsoft world.

Netscape 6.0 was supposed to be the benchmark for W3C standards compliance.
After nearly two years of a complete redesign based on the Mozilla code base,
it was released before the bugs were worked out.  There is still a way to
go.  Netscape 4.76 is the current stable release.

The link to information on Netscape's technologies is:

The link to information on Internet Explorer is: