Google

Syntax error handling
>=======================<

[This file explains how netrik handles syntax errors in HTML pages, why it does so, and what to do about that. See index.txt or index.html for an overview of available netrik documentation.]

or:

Why does netrik always complain about HTML syntax errors??

or:

No matter what site I load, (almost) always I get error messages. That can't be OK?!

Well, it *is* OK... The problem is: Actually almost all sites *are* more or less broken.

Other browsers simply ignore the errors, hoping that the effect will resemble more or less what the author intended -- and nobody ever knows, but for some little blemish maybe, where no one cares about. (If Netscape wasn't so tolerant about syntax errors in the first place, we wouldn't have that problem today :-() However, the browser can only "guess" what the author intended, and that doesn't always work out; such errors can cause the page to be layouted completely wrong, including missing text parts etc. That's why netrik warns about them, so the user at least knows what the matter is, and also gets the chance to tell the page author about the problem. (s.b.)

Note that XHTML even *requires* a browser to abort on syntax errors -- sadly, XHTML is not very popular. (Yet?...)

Of course, it might also happen that netrik sees an error where there is none -- but no such case is known yet, so that is quite unlike.

Can't I turn that off?

Well, actually, you can: There are the two options "--broken-html" and "--ignore-broken". "--ignore-broken" will prevent netrik complaining about *any* syntax errors, while "--broken-html" only turns off warnings about common errors which in most cases can be guessed correctly and nicely worked around. (But not always!)

Add "--broken-html" to your ~/.netrikrc file (see config.* for details) to generally disable halting on less important errors.

However, please avoid usage of these options if possible. We know that the way the errors messages are presented now is quite insistent; we can't change that behaviour as long as multi-windowing is not implemented, though. Thus, using --broken-html probably can't be helped when using netrik seriously -- sadly almost all pages on the Net are more or less broken. (But please try to avoid --ignore-broken -- it's generally a bad idea to use this!)

Still, please take the (little) trouble to tell the page author about the problem.

Reporting a problem

Normally you will find an e-mail address of the right contact at the bottom of the page, or on some "contact" page. Most authors will be greatful for the information, and will gladly fix the problem -- once and for all.

You can improve your chances by telling exactly what the problem is. To find out, you can load the page again, but using the "--debug" option. While parsing the page, netrik will dump the source in this mode. When an error is encountered, the last charactacter printed before the error message is the offending one. Note however that in some cases the real reason for the problem may be somewhere before.

If the output is too big, you may either use the --dump option and pipe the output through some pager (but note that the debug output is written to stderr, so you'll need to redirect it: "netrik --debug --dump <url> 2>&1 |less" or something the like), or you can use the "--correct-html" option, causing netrik to abort after the first error is detected.

Alternatively, you can use the W3Cs (WWW Consortium) HTML validating service. (http://validator.w3.org) This will create a report with all problems nicely listed -- very helpful for the page author.

A note on comments

Netrik now has full SGML comment parsing. The problem is that the comment syntax in SGML is fairly complicated; most authors do not know it, and thus do not know that putting a "--" string inside a comment normally generates errors. In most cases these are quite obvious; netrik will print an error message, and use a workaround which will work in most cases, but not always -- just like for other typical errors.

However, there are some constructs which *are* valid SGML -- but do not make what the author probably intended. "<!------>" is a typical example: It is valid, but the comment doesn't terminate; everything behind it will also be treated as part of the comment! As it is not an error, netrik can't print an error message or apply a workaround; instead, just a warning is printed.

Often such an unterminated comment will generate other errors later, because it interferes with other comments or because it stretches to the end of the file; in such a case, the warning may help finding the problem. In other cases however, no error is generated at all -- the warning is the only cue that something went wrong.