HTML5, a reprieve for HTML?

For over a year now, we have been talking more and more about HTML5. This new standard, still in development, was driven by the new way we have been using the web since about 2005, known as “Web 2.0”. But something bothers me. It is not really about HTML5 itself, but using these new tags made me wonder about the purpose of HTML, its reason for being. I may be wrong, but even if I am, what is wrong with giving one’s opinion? Perhaps someone could improve on it, who knows?

So first, a little history. Work on HTML began in 1989, when Tim Berners-Lee invented the World Wide Web. The beginnings were a bit wobbly, and the first standardization of the language was published in November 1995 (as RFC 1866) under the name HTML 2.0. At the time, the need was to give semantic meaning to the information in a web page by structuring it. That’s why HTML was based on SGML (the predecessor of XML). But as these were early days, web pages were very rudimentary and a small range of tags was sufficient: titles, paragraphs, anchors… The problem is that not all tags were semantic. Indeed, to make an important piece of text bold, underlined or italic, for instance, there were the tags <b>, <u>, <i> and so on. I think the reason why these tags are a problem is obvious enough today that I need not detail it here. Thus CSS was born, whose goal was to style pages. And since then, semantics and style have been separated… Well, in theory.

I could go on with the story of HTML, but that would not be useful for what follows. So, we said that semantics and style are now separated. Even assuming that all tags in HTML5 have a true semantic meaning, a problem remains. Is there a tag for every meaning you want to express? Obviously not. HTML5 introduced tags such as <header>, <footer> and <article>. Okay, nice, we now have a tag for articles! Very useful for a blog. But not all websites are blogs! What about forums? Why not create tags such as <topic>, <post>, <author-info> or whatever? What about e-commerce websites? Why not create tags such as <product>, <price>, <details> or whatever? What I mean is that we cannot have a single semantic language. Each website has its own semantic rules, just as RSS has its own, or SVG, and so on.

I think you can see what I am getting at. Every website should define its own XML DTD. I know that this seems unthinkable for many reasons, and I will answer them one by one.

The first one is that building a DTD is no cakewalk, so developing a website becomes more difficult. To this I have two replies. First, nothing prevents you from creating a document without a DTD. This is dirty but doable. Second, nothing prevents you from sharing a DTD on the internet. There could thus be “libraries” of DTDs for different types of websites. Seen in this light, the problem actually becomes a great asset!
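
To make this concrete, here is a minimal sketch in Python of a forum page written in its own vocabulary, with a small inline DTD. The tag names <topic>, <post> and <author-info> come from the examples above; <title> and <body>, the sample content and the summarize helper are assumptions made up for illustration. Note that Python's standard-library parser only checks that the document is well-formed; it does not validate against the DTD, which would need a validating parser.

```python
import xml.etree.ElementTree as ET

# A hypothetical forum page in a site-specific vocabulary.
# The inline DTD documents the structure; the stdlib parser does NOT validate it.
FORUM_PAGE = """
<!DOCTYPE topic [
  <!ELEMENT topic (title, post+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT post (author-info, body)>
  <!ELEMENT author-info (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<topic>
  <title>Is HTML5 semantic enough?</title>
  <post>
    <author-info>alice</author-info>
    <body>Not every website is a blog.</body>
  </post>
  <post>
    <author-info>bob</author-info>
    <body>So let each site define its own tags.</body>
  </post>
</topic>
""".strip()


def summarize(xml_text):
    """Return the topic title and a list of (author, body) pairs."""
    root = ET.fromstring(xml_text)
    posts = [
        (post.findtext("author-info"), post.findtext("body"))
        for post in root.iter("post")
    ]
    return root.findtext("title"), posts


title, posts = summarize(FORUM_PAGE)
print(title)
for author, body in posts:
    print(f"{author}: {body}")
```

Because the tags say exactly what the content is, a program reading the page can ask for "every post and its author" directly, without guessing from class names or layout.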

Okay, we now have a set of languages that carry true semantic meaning. But how do I create a form? How do I create a link? How do I create a canvas? I can say “This is a form”, but I can’t say what a form is. We can’t do it in CSS. True, we can’t, and that is not its goal. But is it the goal of HTML? I don’t think so. To me, the semantics should only say “This is a form” or “This is a canvas”, and so on. What a form is, or what a canvas is, is currently the job of the web browser. Here is the problem: how could the browser know the DTD of every custom language? This is impossible. Yes, it is. But by asking this question, we confine ourselves to what we are used to. We don’t think beyond what we know.

Actually, there is a solution to this problem, and it opens up new ways of thinking about a website: it lets developers imagine new features, new conveniences and so on. The solution is to create a new language so that the code of a website is separated into three parts: “Meaning”, “Style” and “Features”. The idea of this language would be to say “A <textctrl> is a part of the screen where the user can type text; when the user presses ‘Enter’, the form is submitted”. The power of such a language is that we could also say “A <textctrl> is a part of the screen where the user can type text; when the user presses ‘Enter’, the focus is sent to the next <textctrl>” or “A <textctrl multiple='True'> is a <textctrl>, but when the user presses ‘Enter’, it inserts a carriage return and a new line.” Moreover, this third language would do away with browser compatibility problems. Or at least reduce them.
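
The “Features” part can be sketched as a table that maps a tag to what happens on a given key press. The following Python simulation is only an illustration under loud assumptions: the Form class, the selector strings such as "textctrl[multiple]", the SITE_A and SITE_B tables and the press function are all hypothetical names, not part of any existing language or browser API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Form:
    """Tiny stand-in for a page: ordered text controls, one of which has focus."""
    controls: List[str]
    focus: int = 0
    values: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def type_text(self, text: str) -> None:
        """Append text to the control that currently has focus."""
        name = self.controls[self.focus]
        self.values[name] = self.values.get(name, "") + text


# Three possible behaviours for the 'Enter' key, as described in the text.
def submit_form(form: Form) -> None:
    form.submitted = True


def focus_next(form: Form) -> None:
    form.focus = min(form.focus + 1, len(form.controls) - 1)


def insert_newline(form: Form) -> None:
    form.type_text("\n")


# Hypothetical "Features" sheets: selector -> key -> behaviour.
Features = Dict[str, Dict[str, Callable[[Form], None]]]

SITE_A: Features = {
    "textctrl": {"Enter": submit_form},
    "textctrl[multiple]": {"Enter": insert_newline},
}
SITE_B: Features = {
    "textctrl": {"Enter": focus_next},
    "textctrl[multiple]": {"Enter": insert_newline},
}


def press(form: Form, features: Features, selector: str, key: str) -> None:
    """Look up the behaviour the site declared for this tag and key, and run it."""
    handler = features.get(selector, {}).get(key)
    if handler is not None:
        handler(form)


form_a = Form(controls=["name", "email"])
press(form_a, SITE_A, "textctrl", "Enter")  # site A: Enter submits

form_b = Form(controls=["name", "email"])
press(form_b, SITE_B, "textctrl", "Enter")  # site B: Enter moves focus

form_c = Form(controls=["comment"])
form_c.type_text("first line")
press(form_c, SITE_A, "textctrl[multiple]", "Enter")  # multi-line: Enter adds a newline

print(form_a.submitted, form_b.submitted, form_b.focus, repr(form_c.values["comment"]))
```

The same <textctrl> tag behaves differently on the two sites because the behaviour lives in the site's own “Features” sheet rather than being fixed by the browser.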

There could be yet another problem: robots that parse the code would be a bit lost. But with the progress of artificial intelligence, I don’t think this is a real problem.

Well, I hope I was precise enough for you to grasp what I have in mind. This is not even a draft. Just an idea. I hope someone will be interested in it and make it better. Once again, I’m not a professional and I may be wrong. But maybe the web could take a new path.

Of course, such a new paradigm would take a lot of time to set up and web browsers would still keep HTML compatibility for a long time.

In a nutshell, I think that the web needs a fresh coat of paint. Websites should no longer be considered parts of a big engine; they should be regarded as software in their own right.