Dr. Dobb's TechNetcast shows




XML - Extensible Markup Language

with Tim Bray, hypertext evangelist, member of the W3C XML Working Group and co-editor of the XML specifications.

XML adds a new dimension to web publishing by making possible the public distribution of documents of any type, not only HTML. In this new world, browsers dynamically extract information about the document's structure and markup tags from the XML metadata. XML brings the web closer to the original model envisioned by hypertext evangelists like Tim Bray. Catch up with the latest news and get the technical scoop.

Links

  • Textuality, Tim's home
  • W3C XML Working Group
  • Robin Cover's XML Web Page, extensive links to specifications and resources, includes annotations
  • The XML FAQ maintained by Peter Flynn
  • DataChannel, member of the XML Active Content Technologies Council (X-ACT): up-to-date XML news and resources
  • xml.com, XML news and resources by Seybold/O'Reilly
  • XML: Recommended Reading, by Adam Rifkin and Rohit Khare
  • finetuning.com, annotated links


transcript:

TNC: Welcome to the program. My name is Philippe Lourier.

Our topic today is XML, Extensible Markup Language, and our guest is Tim Bray. Tim has been working with text-based systems since well before the Web existed. He is currently a member of the XML Working Group and a co-editor of the specification.

Tim, are you with us?

TB: I'm here.

TNC: Welcome to the show.

You're calling in from Vancouver.

TB: Vancouver, Canada.

TNC: That's where you're based. One of the most beautiful cities in the world.

TB: And it's sunny today, as it always is up here.

TNC: I was there twice and it rained. It didn't stop raining, but I guess my experience is really not representative.

TB: It never rains here.

The Process

TNC: Now, you're part of the XML Working Group. Let's talk about the process for a while before we get into the details of what XML is and how it works.

What is the XML Working Group?

TB: Well, to answer that question, it might be wise to back off and just outline briefly what the consortium is that we're all working inside. And the consortium is not a government organization or anything like that. It's just a consortium with some 240 members, each of whom pays to play. And it is designed to increase the interoperability of the Web.

And it really grew out of the difficulties that the IETF process was having. The IETF has been wonderfully successful, but in the intensely commercialized atmosphere of the Web its process was running into severe static.

So the Web consortium is acknowledged to be commercial top to bottom. And the premise is you lock the key engineers in a room and you don't let them come out until they have agreement.

TNC: What does it take to be part of that, part of the group?

TB: You just have to pay. If your revenue is more than 50 million U.S. dollars a year, it will cost you 50,000 a year to play. If your revenues are under 50 million, it will cost 5,000 to play. So pretty well anybody can get in.

TNC: There is some criticism that maybe some of the lesser players, consultants are not represented properly.

TB: I have also heard criticism that the larger players are predominant. So I guess that depends how you're looking at it. I'm certainly a very minor player. I'm a one-man consulting shop, and I have achieved a reasonably loud voice up there. So I'm not sure I'd go for that.

TNC: The Working Group proposed XML as an official recommendation in February.

TB: That's correct. And I never really answered your first question as to what the Working Group is. Let me just do that briefly.

The Working Group has a small number of people. It started with eleven and we're up to about 20 now. And we also have a much larger group called the Interest Group. The way it works is, the Working Group figures out what the design issues are and marshals them up and the Interest Group hashes them out by E-mail, and then the Working Group, the smaller group, resolves them by voting, trying to achieve consensus where possible.

TNC: When you say by E-mail, these are the lists, XML-DEV and XML-L?

TB: No, these are World Wide Web Consortium lists. You have to be a member of the consortium to join the list.

TNC: So they're private lists among members of the consortium.

TB: Right. I mean, they're not all that private. There are hundreds of members and in fact we also have some non-member invited experts who have been invited in because of what they have to offer. I am one of those, in fact.

TNC: The entire process started for XML in 1996, summer 1996. And now it's a proposed recommendation. So this is happening pretty fast.

TB: Well, we're trying to operate in Internet time.

A very tiny correction. We're not a proposed recommendation. We are now a recommendation, which means as far as the Web Consortium is concerned, the XML 1.0 process is done.

TNC: What is the next stop?

TB: That is an issue that's currently under intense debate. We're working on a hyper-linking facility for XML. We are working on a namespace facility for XML. And there are several other issues that are trying to jostle their way to the front of the queue.

So we are actually at the moment in the middle of an intense meta-discussion as to what should be done next.

TNC: We'll talk about this when we get back. We have to take a quick break. This is the Dr. Dobb's TechNetcast.

(Commercial break)

TNC: Welcome back. This is Philippe Lourier. We're talking today with Tim Bray about XML, Extensible Markup Language. Tim is a member of the XML Working Group and a co-editor of the XML specifications.

Tim, are you there?

TB: Yes, and I must say that I popped up your site and I'm looking at you in RealVideo.

TNC: Okay, so it works. It actually works.

TB: Yes, indeed.

Custom Tags

TNC: Let's start with maybe the basics about XML. XML has been described as SGML Lite. One of its main characteristics is that it allows for the creation of documents that contain tags that are not fixed. Is that correct?

TB: Well, that's the idea. In fact, if you look inside an HTML page, which I'm sure everybody listening to this has, you see tags. And HTML comes with a pre-cooked set of 50 or 60 or some number of tags like that, and those are the ones you use.

The idea with XML, which it inherited from SGML, is that you make up your own. If you have something in a Web page that is, for example, a part number or a date of birth or a state of the union, well, you can put a tag in there called "date of birth" or "part number" or "state of the union", and XML has rules that let you go ahead and do that.

Now, having done that, that doesn't really buy you very much. Of course, the browser isn't going to know how to display that. So inventing tags is nice, but it also means that you need some ancillary machinery, such as style sheets or Java classes or whatever. But, nonetheless, you know, the ability to invent your own tags is something that I think is going to prove very, very useful.
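A minimal sketch of the point about inventing tags, written in Python rather than the Java mentioned in the program. The document and its tag names are made up for illustration; because XML names cannot contain spaces, "date of birth" is written here as `date-of-birth`. A generic parser reads the invented tags without their being registered anywhere:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these tag names are invented for this example.
DOCUMENT = """<customer>
  <name>Jane Example</name>
  <date-of-birth>1970-01-31</date-of-birth>
  <part-number>A-1234</part-number>
</customer>"""

root = ET.fromstring(DOCUMENT)

# Collect each child element's tag name and text content.
fields = {child.tag: child.text for child in root}
print(fields)
```

The parser only recovers names and structure. Nothing here says how a part number should be displayed or what it means, which is the gap style sheets and program code are meant to fill.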

TNC: How do you introduce a tag and what information about the tag do you need to expose for the browser to be able to do something with it?

TB: Well, to invent a tag, you just go ahead and type it into your file. I mean, there's no central authority that authorizes people to invent tags. You just go ahead and do it.

Having done it, if you want the browser to display it, you're going to need a style sheet. At the moment, the Web browsers are starting to do a pretty good job of doing CSS, cascading style sheet, and that is presumably the way that people will go about displaying XML.

TNC: Now, when I create a tag, there's more than just the name. The tag may have a certain behavior. Any XML browser would be able to parse it. But, for example, if I'm a pharmaceutical company and I create a tag that identifies a certain medicine, an XML browser may be able to parse it but won't know what to do with it.

TB: Well, that's exactly correct. So what do you want to do with the tag? Well, the first thing you want to do is to display it, and that's what I was talking about with the style sheet.

But probably you'd like to do something more ambitious. Let's take on your example of the pharmaceutical situation, and presumably when you have, you know, the trade name of a drug you'd like to click on it and get an abstract or research results, or actually run some tests.

Well, if you're going to do that, you're going to need to associate some program code with the tag. An obvious way to do that would be with Java classes.
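One way to picture "associating program code with a tag" is a lookup table from tag names to functions. The sketch below uses Python instead of Java classes, and the tag names, the `handles` decorator and the handler functions are all hypothetical; it is an illustration of the idea, not a mechanism defined by XML or by any browser.

```python
import xml.etree.ElementTree as ET

# Registry mapping a tag name to the function that knows what to do with it.
HANDLERS = {}

def handles(tag):
    """Decorator that registers a function as the handler for one tag name."""
    def register(func):
        HANDLERS[tag] = func
        return func
    return register

@handles("drug")
def describe_drug(element):
    # A real handler might fetch an abstract or research results.
    return f"look up abstract for {element.text}"

@handles("dose")
def describe_dose(element):
    return f"{element.text} {element.get('unit', '')}".strip()

def run_handlers(xml_text):
    """Walk the document and call the handler for every tag that has one."""
    results = []
    for element in ET.fromstring(xml_text).iter():
        handler = HANDLERS.get(element.tag)
        if handler is not None:
            results.append(handler(element))
    return results

SAMPLE = '<rx><drug>Examplinol</drug><dose unit="mg">20</dose><note>x</note></rx>'
print(run_handlers(SAMPLE))
```

Tags without a registered handler, such as `note` above, are simply skipped, which mirrors the situation where a browser can parse a tag but does not know what to do with it.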

TNC: What is the interface between an XML document and behavior implemented in Java? How does that work? What's the glue there?

TB: That's still being worked out. Probably the best candidate for that is another standard under development in the consortium called the Document Object Model. This is an application programming interface which can be used to talk to the data in a web page after it's been loaded into the browser. And it's designed for HTML and XML, but I suspect that XML will be the most popular format for use with the document object model. So that gives you exactly what you need. You can query a page, find out what the tag names in it are, and then run your own code on the data.
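The "query a page, find out what the tag names are, then run your own code" workflow can be sketched with Python's `xml.dom.minidom`, which implements a subset of the W3C DOM as it was later standardized. This is not the draft API under discussion at the time of the program, and the sample document and helper names are assumptions made for the example.

```python
from xml.dom import minidom

# Hypothetical page content.
PAGE = """<prescription>
  <patient>J. Doe</patient>
  <drug>Examplinol</drug>
  <dose unit="mg">20</dose>
  <drug>Placebex</drug>
</prescription>"""

dom = minidom.parseString(PAGE)

def element_names(node):
    """Return the distinct element names under node, in document order."""
    names = []
    for element in node.getElementsByTagName("*"):
        if element.tagName not in names:
            names.append(element.tagName)
    return names

def text_of(element):
    """Concatenate the text nodes that are direct children of element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )

tag_names = element_names(dom)
drugs = [text_of(e) for e in dom.getElementsByTagName("drug")]
print(tag_names)
print(drugs)
```

The same two steps, discovering the tag names and then pulling out the data for the ones the program cares about, are what would let identical code run against any browser that exposes the document through a common object model.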

TNC: Tags can also be contained in a separate file in a document type definition file.

TB: Yes. The document type definition is a facility that you can use if you want to declare your tags and give some extra information about them.

Now, the document type definition, which is always abbreviated DTD, which is hard to pronounce, but it's short, anyhow, is particularly useful if you're trying to write an authoring system or an editing program. Because the DTD tells you what tags are going to be in this document, what order you'd like them to appear in, and which ones can be contained in other ones. And obviously it's nice to know that information if you're writing an editing system to create such documents.

Now, the DTD does not really tell you that this is a date and this is a number and that's a string, and so on, and it also doesn't tell you which Java classes should be run, nor is it a style sheet. So it's really more of an authoring support facility at the moment.

TNC: Well, would it make sense to put type information in the DTD? And indicate, for example, that the value associated with this tag is a date. The document refers to the rules that apply to the syntax of tags as constraints. You explain how users may want to be able to constrain the value of a tag to a certain value.

Wouldn't it make sense to also store type information in the DTD?

TB: Absolutely. And you've put your finger on one of the hot issues we're facing right at the moment, which is that DTDs are real handy. They're very, very useful. But for a bunch of applications that people want to do on the Web you need more than a DTD can do.

Data typing is one example and there are some others. So, as I said, one of the things that's trying to jostle its way to the front of the queue in the process is a way of getting a more advanced and flexible schema system, as we're tending to call it, that will allow us to do some of the things you were just talking about.

TNC: And you're opening a whole can of worms here. This was probably not in the original intention of XML. When you originally designed XML, you probably didn't have this in mind.

TB: Well, it wasn't one of our design goals but I don't think I would agree that we didn't have it in mind. Remember, this was pulled out of the SGML community. And people have been clamoring for the addition of data typing to SGML for years and years and years. So it is absolutely the case that this is not a new request.

There were some other things that needed to be done more urgently, namely just to produce XML itself in a way that would be simple and usable on the Web. But the addition of data typing is something that's been hanging fire for a long time.

TNC: Now, when we're talking data typing, we're talking not only of simple type definitions -- for example, this is a date and this must conform to this type of format, character format -- but also of robust OO concepts such as inheritance.

TB: Well, sure. And a good example of that is, well, suppose I have some element, as we call them, that represents a person. Well, it would be nice to say, okay, I'm going to define another element that represents a golfer. And this is just the same as a person element, only it also has a handicap datum. So that's the kind of inheritance thing that we'd like to be able to do.

There's a lot of stuff here -- data typing, inheritance -- there's a bunch of things that we could ask for in the next generation of schemas. And people want all this stuff, no doubt about it.

TNC: Schemas are part of another specification called XML Data. What can you tell us about XML Data and what are its prospects?

TB: Well, XML Data is a submission primarily from Microsoft to the Web Consortium, and it is aimed specifically at the area we've just been talking about, that is to say, the realization that we need a more advanced, new generation of schemas. And it makes a whole bunch of proposals for how these kinds of facilities could be added to schemas.

At the moment it is a submission to the Web Consortium. It is my impression that there will be a couple of other submissions also in the same area. And I think that XML Data has some real good ideas that will absolutely make it into whatever we end up building as the next generation schema system.

I'd be surprised if it actually ended up being called XML Data, but some of those ideas are absolutely going to prove to have been good ones.

TNC: Currently, what functionality can be specified in the DTD? How extensively can you describe your data?

TB: Well, let's think of an E-mail message, okay? So if I were going to do a DTD for an E-mail message, I could say, okay, well, first of all, the whole thing has to be contained in an E-mail tag. And then inside that E-mail tag I can say there is a header tag and a body tag. And inside the header tag I would have "to" and "from" and "cc" and "subject" and so on and so forth. And I would say that you can have lots of "to" tags. You can only have one "from". You can only have one "subject". You can have zero or more "cc" tags. And that's basically the kind of things that DTDs can do.
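The e-mail example can be written down as a small DTD. In the sketch below the element names, their order inside the header, and the reading of "lots of" as "one or more" are assumptions; the DTD is carried in the document's internal subset. Python's standard-library parsers read a DTD but do not validate against it, so the occurrence rules are restated by hand in `HEADER_RULES` purely to show what a validating processor would check.

```python
import xml.etree.ElementTree as ET

# DTD sketch: '+' means one or more, '*' means zero or more,
# and a bare name means exactly one.
EMAIL_DTD = """
<!ELEMENT email   (header, body)>
<!ELEMENT header  (to+, from, cc*, subject)>
<!ELEMENT to      (#PCDATA)>
<!ELEMENT from    (#PCDATA)>
<!ELEMENT cc      (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body    (#PCDATA)>
"""

MESSAGE = f"""<?xml version="1.0"?>
<!DOCTYPE email [{EMAIL_DTD}]>
<email>
  <header>
    <to>tim@example.com</to>
    <to>philippe@example.com</to>
    <from>editor@example.com</from>
    <subject>XML show</subject>
  </header>
  <body>See you on the air.</body>
</email>"""

# Hand-written restatement of the header rules: tag -> (minimum, maximum or None).
HEADER_RULES = {"to": (1, None), "from": (1, 1), "cc": (0, None), "subject": (1, 1)}

def header_problems(xml_text):
    """Return a list of occurrence-rule violations found in the header."""
    header = ET.fromstring(xml_text).find("header")
    problems = []
    for tag, (low, high) in HEADER_RULES.items():
        count = len(header.findall(tag))
        if count < low or (high is not None and count > high):
            problems.append(f"{tag}: found {count}")
    return problems

print(header_problems(MESSAGE))
```

The hand-written check covers only how many times each header element occurs. The DTD's content model also fixes the order of the elements, which this sketch does not test.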

"One of our primary design goals was that it's easy to implement."

TNC: Looking at the spec, it seems that in some ways XML is designed from the bottom up. I mean that it's designed so that it can be very easily parsed. Implementation of the parser is an important aspect of the way the language is designed. This was an important concern in the design of XML, more than it was in SGML. Is that correct?

TB: Much more important. I mean, SGML was designed to be totally general and totally flexible, without all that much concern for how easy it was to implement. And as a result, it was hard to implement.

And XML, one of our primary design goals was that it's easy to implement. And you don't have to be a rocket scientist to see why. Formats succeed when there's good tools for them. And you're not going to get good tools unless it's easy for programmers to build them.

Now, our design goal specifically was that somebody with a CS degree should be able to build a general XML processor in a week. And we didn't quite hit that. It tends to take a little longer than that. But there's lots of them out there now. They're all free. So they are in fact not that hard to build at all.

In the one that I built, which is in Java, the compiled files are only 45K, and it runs at two or 200 K's per second and implements the whole specification. So this is lightweight, portable software. And that's exactly what we were trying to achieve.

TNC: You're talking about Lark.

TB: That's correct.

TNC: Okay, Tim, we have to take a quick break. We'll be back on the Dr. Dobb's TechNetcast.

(Commercial break)

TNC: We're back on the Dr. Dobb's TechNetcast. We're talking about XML with Tim Bray.

Before the break, Tim was telling us about his XML parser, Lark. And I simply want to give out Tim's Web site here: www.textuality.com, where you can find Lark and other XML pertinent information.

Tim, do you want to tell us about your Web site?

TB: Well, I'm not just an XML guy. I'm an electronic publishing guy. So if you go to textuality.com, what you'll find is sort of an extended hypertextual essay on everything, more or less, having to do with electronic publishing.

And if we're going to be plugging Web sites, I should plug my other Web site, which is www.xml.com. This is a commercial Web site solely dedicated to XML, produced in cooperation with O'Reilly and Seybold. And it's got some amusing stuff on it, too. So...

TNC: Yes, actually I was there. And among the very interesting stuff there is the annotated specification.

Annotated Specification and X-Link

TB: Right. I'm super pumped up about that. That was just immense fun to do.

Now, that is based on XML and also one of the associated technologies that's not quite cooked yet, and that is the extended hyperlink specification.

TNC: That's XLL?

TB: Its name is even in flux. I think X-Link is probably what it's going to be called. Because the XML specification itself is in XML, I was able to construct an annotation that has 300 and change hyperlinks into it that provide historical background and technical exegesis and further illumination and examples for all the stuff in the XML spec. And one of the nice things is that with the XML hyperlinking facility, you can link into read-only documents such as this one without ever actually having to touch them.

We're coming, you know, a little bit closer to Ted Nelson's original notion of hypertext with this kind of stuff.

TNC: The user is looking at a normal HTML page with the regular, horizontal hyperlinks and a new type of link. When the user clicks on one of the new hyperlinks, the associated text pops up in a frame window on the right side. How does this work internally?

TB: Well, of course, it turns out that ordinary Web browsers and HTML don't do these extended hyperlinks. So there's some trickery involved.

What happens is there's the XML version of the specification, and there's another XML file that has all these hundreds of hyperlinks in it. And a Java program processes them both, hooks up all the hyperlinks and dumps it all out with all these magic hyperlinks as ordinary HTML links. So it all runs in a perfectly ordinary set of HTML files in a frame document.

Of course, our expectation is that before too much longer this kind of functionality will be supported directly by the browsers, which will be an immense step forward.
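The processing step described above, one untouched XML document plus a separate file of links merged into ordinary HTML, can be sketched as follows. This is written in Python rather than Java, and the element names, the `target` and `href` attributes, the placeholder text and the `render` function are all assumptions; it is not a reconstruction of the actual program or of the draft link syntax.

```python
import xml.etree.ElementTree as ET
from html import escape

# Stand-in for the specification: read-only, with identifiers on its parts.
SPEC = """<spec>
  <p id="p1">First paragraph of the specification.</p>
  <p id="p2">Second paragraph of the specification.</p>
</spec>"""

# The links live in a separate file and point into the spec by id.
ANNOTATIONS = """<annotations>
  <note target="p1" href="notes.html#n1">History</note>
  <note target="p2" href="notes.html#n2">Example</note>
  <note target="p1" href="notes.html#n3">Usage</note>
</annotations>"""

def render(spec_xml, annotations_xml):
    """Merge out-of-line annotations into the spec and emit plain HTML."""
    notes = {}
    for note in ET.fromstring(annotations_xml):
        notes.setdefault(note.get("target"), []).append(note)
    lines = []
    for para in ET.fromstring(spec_xml):
        # Each annotation becomes an ordinary HTML link aimed at a "notes" frame.
        links = "".join(
            f' <a href="{escape(n.get("href"))}" target="notes">[{escape(n.text)}]</a>'
            for n in notes.get(para.get("id"), [])
        )
        lines.append(f"<p>{escape(para.text)}{links}</p>")
    return "\n".join(lines)

print(render(SPEC, ANNOTATIONS))
```

The specification text is never edited: all of the linking information stays in the second file, and the HTML is regenerated whenever either input changes.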

TNC: So let's talk about X-Link. What kind of functionality can we expect to find in X-Link?

TB: Well, you know, the Web has been wonderfully successful and the hypertext on it works pretty well. But there's lots of things that you could do ten years ago in Hypercard on the Mac that you can't do today on the Web. So you've got blue underlined text staring at you. Well, if you click on it, you're off, hurtling into cyberspace. You don't know where, you don't know what's at the other end. It clears your screen. You lose your context.

So what we've tried to do in X-Link is take the basic HTML paradigm and extend it in a few painless ways.

First of all, you can provide multi-ended links so you can have a link that goes to several things. Secondly, you can put labels on it, so when you click on something, instead of just going there you get a little menu with a bunch of labels saying, well, do you want to go here, do you want to go there. Third, you can arrange that rather than going and replacing your current display, the result of the link gets embedded right in your current display or gets popped up in a different window. Finally --

TNC: That's similar to target=_blank?

TB: Similar to target, but you don't need frames to do it.

And finally, there's a little addressing language that allows you to point right into the middle of other documents, even if they haven't put anchors in there to point at.

TNC: Is this related to X-Pointer?

TB: Yes.

TNC: So what is X-Pointer?

TB: Well, the X-Link specification is currently made up of two parts, which are called, just to be confusing, X-Link and X-Pointer. And X-Link gives you the basic machinery that allows you to identify one of these hyperlinks and tells what it is and how many ends it has and so on.

TNC: Okay. Let me just stop you here.

TB: Sure.

TNC: When you say how many ends...

TB: Well, at the moment, you know, a Web HREF only points at one thing. An XML hyperlink can be set up to point at five different things.

TNC: So you would have a little pop-up box and you would have the links.

TB: Exactly. Right. So that's what X-Link says.
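A multi-ended, labelled link can be pictured as one element holding several destinations, from which a browser would build a small menu. The markup below is a made-up illustration, with hypothetical `link` and `end` elements and example.com addresses; it is not the syntax of the X-Link working draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical multi-ended link with a label on each end.
LINK = """<link title="Lark">
  <end label="Download" href="http://example.com/lark/download"/>
  <end label="Documentation" href="http://example.com/lark/docs"/>
  <end label="Source code" href="http://example.com/lark/src"/>
</link>"""

def menu(link_xml):
    """Return the (label, address) pairs a pop-up menu would offer."""
    link = ET.fromstring(link_xml)
    return [(end.get("label"), end.get("href")) for end in link.findall("end")]

for label, href in menu(LINK):
    print(f"{label}: {href}")
```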

Now, X-Pointer is this little miniature language that you can use to point inside documents at parts of them.

TNC: Without having, without the author of the document specifically putting anchors in there.

TB: Exactly.
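Addressing a part of a document by its position in the structure, rather than by an anchor the author planted, can be illustrated with the limited XPath support in Python's `xml.etree.ElementTree`. This is a stand-in chosen for the example and not X-Pointer syntax; the document is hypothetical and deliberately contains no ids or anchors.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with no anchors or identifiers of any kind.
DOC = """<report>
  <section>
    <para>Introduction.</para>
  </section>
  <section>
    <para>Method.</para>
    <para>Results.</para>
  </section>
</report>"""

root = ET.fromstring(DOC)

# "Second paragraph of the second section": positions are 1-based.
target = root.find("./section[2]/para[2]")
print(target.text)
```

The address depends only on the document's structure, so it can be written by someone who has no ability to edit the document being pointed into.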

TNC: That seems like basic functionality that should already be there. There is no way to link directly to a specific location in a document if you have not authored it and inserted an anchor.

"The Web is essentially FTP with pictures today."

TB: Well, no kidding. This is one of the big missing pieces we have now on the Web.

Now, I don't want to dis the Web. One of the reasons that the Web has worked so well is that the hypertext today is simple and linear and does a few things very well.

TNC: OK. Let me just remind you of what you said a while ago, Tim. "I don't want to dis the Web"? "The Web sucks. It is lightweight, shallow, trivial and disposable."

Do you take responsibility for those comments?

TB: Oh, I do, yes. I published that in "Wired" magazine in 1994 or '95. And to a certain extent I stand by that. The Web is essentially FTP with pictures today. And also it does allow you to get by at a very shallow level of engagement.

Having said that, I have to acknowledge the fact that over 100 million people have gone and downloaded a browser because they wanted to, not because their boss told them to. And the thing about the Web is that it does so much with so little.

TNC: Now, you were there before the Web, before HTML.

TB: Absolutely.

TNC: Did it surprise you that it took off like that, or did you keep saying, well, it's not really ready for prime time and does not have some of the essential functionality required to make it successful. HTML is fixed tags. It's not dynamic enough. It just won't take off. And then suddenly --

TB: No, I think I can say this. I was like everybody else. The first time somebody popped that Mosaic in front of me, I said, oh, yes, this is it. This will take off.

And I cannot claim to have predicted it. You know, I was caught by surprise as much as anybody else. But, you know, people like it.

"I think it's terribly, terribly wrong that the world's documents should be locked up in proprietary binary formats."

TNC: On your screens right now you can see xml.com, Tim's Web site.

How far does XML go towards meeting your vision of what a global hypertext system should be?

TB: Not very far. I think that we really need the X-Link work to get finished. (Unintelligible)

That's not the real important vision about XML. The important vision is this: I think it's terribly, terribly wrong that the world's documents should be locked up in proprietary binary formats. And unfortunately, too many of them are. And the real vision of XML, what we're really trying to accomplish is to increase the proportion of the world's documents that are open, that are available, that are usable, that aren't partially owned by any word processing vendor. That's really what it's all about.

TNC: What are the prospects of X-Link and X-Pointer?

TB: They're still working drafts. They will probably become recommendations mid-year, we hope. I'm real optimistic that the boys at the big browsers are going to implement them real soon. It turns out that they're not that hard to implement. And the engineering groups of both sides have reacted positively.

Netscape and Microsoft

TNC: Since you brought up the topic, let's talk about specific vendors.

One of the big news items in the last few weeks in the XML community is that Mozilla has full XML support in the version that was just released to the public, the public source code version.

Now, apparently that was a surprise.

TB: Well, it was a surprise. I'd been doing --

TNC: Maybe not to you because you've been working with Netscape.

TB: Right. I'd been doing some consulting for Netscape last year. So I knew they were working on it. But it is the case that Netscape had essentially been saying nothing about XML for months and months and months. Then they popped up out of the blue at a big XML conference in Seattle last month and pulled this nice-looking page up in the browser and did a view source, and it was XML. So that was pretty dramatic.

Now, I'm not sure I would go so far as to say they have full XML support, because at the moment Mozilla doesn't have full support for anything. It's not finished.

TNC: Does it support XML, the February specification?

TB: Yes, absolutely. Namespaces aren't quite finished yet. It's very close. I think everybody is pretty well supporting namespaces now. They're something that's obviously necessary.

TNC: It's probably more problematic as a design issue than as an implementation issue. Implementation is just a matter of opening a connection and getting whatever document is referred to.

TB: That's correct.

TNC: How about Microsoft? For a while Microsoft was actually ahead of Netscape in terms of XML functionality. Now they've been leapfrogged and they're keeping quiet about their XML plans.

TB: Well, I wouldn't go so far as to say they've been leapfrogged. And in some respects they still are ahead. The problem that we have been having is that, first of all, Netscape wasn't saying anything. And now that Netscape is talking, at the moment Netscape and Microsoft are kind of pointed in different directions. And we need to sort that out.

Microsoft has been talking up XML big time, but primarily in the role of a medium for program to program database interchange. They've talked about using it for structured data interchange on the Web. They have not been talking very much about just taking XML and parsing it and displaying it directly.

I suspect now that Netscape has shown that this is a perfectly sensible thing to do, that Microsoft will probably climb on board that bandwagon. I can't see them not doing that.

On the other hand, Microsoft has been doing some very clever and advanced things in terms of wiring stuff into IE to load chunks of XML data, not particularly document-oriented stuff. You know, weather forecasts, auction bids. And then use their programming facilities to make IE into an application delivery platform based on XML. And whereas that's a different kind of thing, they're clearly ahead in that area.

TNC: They also have something called CDF, Channel Definition Format, which is the basis of the channel system in IE 4.0.

TB: Yeah, but come on. Push is dead. Nobody cares.

XML, A Protocol

"nobody is going to invent any more new syntaxes. They'll just invent little XML languages."

TNC: CDF is an example of an XML document. It's very basic. There's really not much in there.

TB: Well, that's fine. The idea is that any time you are going to put some sort of a new facility up on the Net, you need a protocol. So that the software that implements the facility can talk back and forth across the Net. And one of the things you need in a protocol is typically a syntax at the bottom level so you can pump the bytes back and forth.

Well, XML is a perfectly good syntax. It's easy to implement. It's standardized. There's processors everywhere. So I suspect, I hope that nobody is going to invent any more new syntaxes. They'll just invent little XML languages. And I think for simple, straightforward tasks, such as managing channels, a simple, straightforward language such as CDF is totally appropriate.

TNC: You were talking about a weather report document that would be delivered through XML. So I guess this would contain weather-related tags and area tags, maybe, and a temperature tag embedded in there?

TB: Exactly. Yes. Low, high, sunshine, probability of precipitation, that kind of stuff. And then they've got the stuff where you can load it into IE and have this dynamically sorted and displayed and graphed and so on, on your screen.

TNC: What this means is that IE knows about these tags. Hard-coded in IE somewhere, there's information on what these tags mean, how to display them.

TB: No, that's not true. What they actually have done is, they've got this general purpose facility in IE so you can load some XML into it and then write some program code to access it. So they've got a little Visual Basic code, I think it is, that you can get from the Microsoft site, and it reads this thing and does the work.

But, no, they haven't got hard-wired code. They've got a general purpose interface.
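The pattern described here, a general-purpose loader plus a little application code that knows what the tags mean, can be sketched in Python instead of Visual Basic. The forecast document, its tag names and the figures in it are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical weather data; the tags and numbers are made up.
FORECAST = """<forecast>
  <area name="Vancouver"><low>8</low><high>15</high><pop>10</pop></area>
  <area name="Seattle"><low>9</low><high>17</high><pop>40</pop></area>
  <area name="Calgary"><low>-2</low><high>6</high><pop>20</pop></area>
</forecast>"""

def load(xml_text):
    """Generic parse, then application code that knows these particular tags."""
    return [
        {
            "area": area.get("name"),
            "low": int(area.findtext("low")),
            "high": int(area.findtext("high")),
            "pop": int(area.findtext("pop")),  # probability of precipitation, percent
        }
        for area in ET.fromstring(xml_text).findall("area")
    ]

# Sort on the client side, warmest first, without asking the server again.
by_high = sorted(load(FORECAST), key=lambda row: row["high"], reverse=True)
for row in by_high:
    print(row["area"], row["high"])
```

Only `load` knows that `high` is a temperature and `pop` a percentage. The parser itself has no knowledge of weather, which matches the description of a general-purpose interface with no hard-wired tags.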

TNC: So the document is parsed and loaded into a tree. Some tags are associated with Java code, and the Java code executes with that data.

TB: That's right. Exactly.

TNC: There is no global dictionary anywhere -- the dictionary is the document.

TB: Well, that's right. I mean, we don't have anything like a universal semantic definition language. Right now, if you want to make a tag do something interesting, you can, A, either do a style sheet or, B, write some code. We don't really have any other options at the moment.

TNC: I guess my point is that Netscape can retrieve the same document, but it can't do anything with it.

TB: Well, that's right. But, you know, there's a solution to that, too. And I think I mentioned that earlier. And that's the document object model. This is a sort of universal Web API.

The right way to do what Microsoft's doing right now in IE is to pull that stuff into your browser and expose it through the DOM, document object model, API. Then the Java code that does this cool stuff should work the same on both browsers.

TNC: Okay, Tim, we have to take a quick break. We'll be back here on the Dr. Dobb's TechNetcast.

(Commercial break)

TNC: Welcome back. This is Philippe Lourier and you're watching the Dr. Dobb's TechNetcast on the Pseudo Online Network. Join us in our chat room at irc.pseudo.com. The room is #tech.technetcast.

And we're talking with Tim Bray about XML, XSL, XLL, XPL, XML Data, DTD and DOM. There's a lot of abbreviations here.

TB: Yes.

TNC: Wouldn't it be useful to have some kind of convention that describes how to generically handle any tag?

TB: Well, it would be, but that's kind of a blue sky dream, I think. I'm not aware of anything like a universal, you know, semantic behavior definitional language. Sure, you can get agreement probably on style sheet facilities, so that you have a portable way to describe how you want something displayed. But once you actually want to do something I don't know any way to do that aside from writing code.

TNC: So XML does not try to address that problem.

TB: Absolutely not.

TNC: It's just a matter of structuring the data for certain applications and the applications know what to do with that data.

TB: That's right.

TNC: So this does not resolve the tag wars. Microsoft can come up with a new tag, and even though it is defined, in the DTD perhaps, Netscape won't know what to do with it.

TB: That's absolutely true. And as a result, given that the market is so split, it's very unlikely that anybody would use such a tag.

On the other hand, if Microsoft, Netscape or Joe's garage were to introduce a new tag, and also introduce along with it a set of portable, freely available Java classes --in conformance with the document object model-- and if this were freely interoperable on all the browsers, people might pick it up if it was useful.

"The right way to think about XML is just like a next generation of ASCII."

TNC: Okay, Tim, we have a question here. I just want to read it to you. By E-mail we received this. This is from Mike Spreitzer.

"I wonder if you could ask Tim if there is anything XML is not good for. For example, I've heard people advocating using XML at all protocol levels above TCP. The thinking is, 'Why define message syntax when you can just send XML?' I've even heard people musing about using XML in programming languages instead of the kind of type data we usually see today."

TB: Well, I would agree with part of that. I would suspect that once you get above TCP, XML's a good candidate for the next level up or two. I mean, I agree. Why invent a new syntax?
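As a rough illustration of "just send XML" instead of inventing a message syntax, the sketch below serializes a small message to bytes and reads it back with Python's standard `xml.etree.ElementTree`. The `<order>` message, its fields, and both function names are hypothetical; a real protocol would still have to agree on framing and transport.

```python
import xml.etree.ElementTree as ET


def encode_order(part, quantity):
    """Serialize a hypothetical <order> message to UTF-8 bytes."""
    order = ET.Element("order")
    ET.SubElement(order, "part").text = part
    ET.SubElement(order, "quantity").text = str(quantity)
    return ET.tostring(order, encoding="utf-8")


def decode_order(payload):
    """Parse the bytes produced by encode_order back into Python values."""
    root = ET.fromstring(payload)
    return root.findtext("part"), int(root.findtext("quantity"))


wire_bytes = encode_order("A-100", 3)  # what would be written to the socket
print(decode_order(wire_bytes))
```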

On the other hand, there are lots of things that XML isn't good for. Although there's a rumor going around that Microsoft is about to announce a COM-to-CORBA two-way bridge based on XML, which sounds kind of weird to me.

The right way to think about XML is just like a next generation of ASCII. All XML does is allow you to encode a document in text and break it up into parts and give the parts names. And also it's nicely internationalized and it's got some other sugar. But that's basically all it is.

Now, that is a real handy thing to be able to do in a standardized lightweight way. But it's not -- you hear all this wild talk going on about a new object-oriented paradigm and multimedia convergence. Well, hey, that's not what we're doing. We're just trying to replace ASCII.
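The "text broken into named parts" description can be made concrete with a very small sketch. It assumes a made-up `<memo>` document (with some non-ASCII characters, to touch on the internationalization point) and uses Python's standard `xml.etree.ElementTree` to list each part with its name.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo; the element names are invented for this sketch.
DOC = "<memo><to>José</to><from>Tim</from><body>Déjeuner at noon</body></memo>"


def named_parts(xml_text):
    """Return the document's top-level parts as (name, text) pairs."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


for name, text in named_parts(DOC):
    print(name, "=", text)
```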

TNC: What you are describing has the characteristics of a protocol. You build functionality --applications-- on top of the protocol, and the protocol makes it possible for applications of the same type to exchange information. ASCII made it possible for different computers to intercommunicate.

TB: That's exactly the idea. You know, I think ASCII is a good analogy. Nobody ever advertised themselves as being an ASCII expert. And applications didn't advertise themselves as supporting ASCII. But you couldn't get by without it. And I think that's kind of the role that XML is going to play.

TNC: Actually, a couple weeks ago we had Bob Bemer on the show, the father of ASCII.

TB: Oh, I would have loved to have heard that.

TNC: Yes, it was very inspiring to have him on.

Style Sheets

TNC: XSL is one of the specifications associated with XML. You mentioned CSS as a way to bind presentation to the data, to the tags.

Why is XSL needed if there is CSS?

TB: Well, it turns out that CSS1, which is what we have now, doesn't do a lot of things that professional publishers like to do. It is suitable for the Web in that it's compatible with Web culture. And anybody can learn it quickly and start to get some good results.

But there are a lot of things that publishers have routinely done that you can't do in CSS. So in reaction to this, a bunch of people (Microsoft was involved in this, and also some other companies) proposed XSL, which is a much, much more advanced manipulation and presentation language that really solves an immensely wide range of problems.

Now, I think that XSL is basically a good idea and I like it. But it should be said that it's not the only game in town. The people who built CSS have come back and proposed CSS2, which goes a long way to address some of the perceived shortcomings of CSS. Furthermore, there's another proposal on the table from HP and some other people called Spice that also tries to address some of these problems.

So I think the two things you can say are: what we have today is CSS1 and, hey, you can do quite a few things with it, so that's what we should use today because that's what we've got. And while it is clear that more advanced style sheet facilities are required, it is not clear at all who's going to win.

TNC: But work on XSL is continuing.

TB: As is work on CSS2, yes.

TNC: What features must these specifications include to be worthwhile for XML?

TB: Well, some of the obvious ones are reorganizing a document.

For example, suppose I have a document that has people's names, addresses and phone numbers. And for one reason or another, maybe they're not in the order I want to present them, or maybe they come in different orders in different documents. When I put it up on the screen, I always want to have the name first, the phone number second and the address third.

Well, one of the things that XSL does is it allows you to rearrange the document for display. You also need to do that, obviously, to generate a table of contents. You also need that, obviously, to do nicely illustrated cross-references.
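The rearrangement described here can be sketched outside of any style sheet language. The following is plain Python with the standard `xml.etree.ElementTree`, not XSL; the `<entry>` record, its field names, and the `reorder` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Desired display order: name first, phone second, address third.
DISPLAY_ORDER = ["name", "phone", "address"]


def reorder(entry_xml):
    """Return the entry with its fields rearranged into DISPLAY_ORDER."""
    entry = ET.fromstring(entry_xml)
    by_tag = {child.tag: child for child in entry}
    ordered = ET.Element(entry.tag)
    for tag in DISPLAY_ORDER:
        if tag in by_tag:  # tolerate records that lack a field
            ordered.append(by_tag[tag])
    return ET.tostring(ordered, encoding="unicode")


# Hypothetical record whose fields arrive in a different order.
print(reorder("<entry><address>12 Elm St</address><name>Ann</name><phone>555-0100</phone></entry>"))
```

Because the fields are identified by name rather than by position, the same function handles documents whose fields arrive in any order.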

Some other things you need are what's called generated text. So suppose I have a tag saying, hey, the chapter starts here. First of all, I need to be able to compute the chapter number and insert the word "Chapter" before the chapter number, so that when you see it on the screen it says Chapter 18.
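Generated text of this kind can be sketched in a few lines. Again this is plain Python rather than a style sheet language, and the `<book>` structure and the `chapter_headings` helper are hypothetical: the chapter number is computed from the element's position and the word "Chapter" is inserted in front of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical book; the source contains no chapter numbers at all.
BOOK = (
    "<book>"
    "<chapter><title>Origins</title></chapter>"
    "<chapter><title>Syntax</title></chapter>"
    "</book>"
)


def chapter_headings(book_xml):
    """Compute 'Chapter N: title' headings from the chapter tags alone."""
    root = ET.fromstring(book_xml)
    headings = []
    for number, chapter in enumerate(root.iter("chapter"), start=1):
        headings.append(f"Chapter {number}: {chapter.findtext('title')}")
    return headings


print(chapter_headings(BOOK))
```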

These are just some of the kinds of things that we want in an advanced style sheeting capability.

Text Processing and Searching Automation

TNC: You worked a while ago on the New Oxford English Dictionary project. And I guess this was before the Web, and you were involved with indexing and other types of text-related functionality.

XML is also touted as being a tool that will make text processing automation easier. What facilitates this in XML?

TB: Well, automation of text processing in general and searching in particular. Text processing automation is pretty straightforward, because rather than having things tagged as bold or italic or 14 point, you have them tagged as title and part number and author and subject and so on. So that just makes automatic processing immensely easier with XML.

As a specific example for search, tagging something as an XML item should, in theory, enable you to do much more effective searches than you can now, because you can search in the context of a part number or a title or a date of birth or something like that, because they're clearly identified as being what they are.

So, yes, one of the big chunks of the XML dream is search that is much more powerful than what we have today.
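A small sketch of what searching "in the context of" a tag can mean, assuming a made-up `<catalog>` and using Python's standard `xml.etree.ElementTree`. A plain text search for 1066 would hit both items below, while the field-aware search only matches the one whose part number is 1066.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; "1066" appears once as a part number and once in a title.
CATALOG = """
<catalog>
  <item><title>Bolt cutter</title><partnumber>1066</partnumber></item>
  <item><title>History of 1066</title><partnumber>2210</partnumber></item>
</catalog>
"""


def find_by_field(xml_text, field, value):
    """Return titles of items whose given field is exactly equal to value."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("title")
        for item in root.iter("item")
        if item.findtext(field) == value
    ]


print(find_by_field(CATALOG, "partnumber", "1066"))
```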

TNC: And also the fact that XML is what you call "well-formed" makes it easier to process?

TB: Well, that's right. The definition of XML is very tight and rigorous. And that's why you can have programs like mine, for example, that are less than 50K, and do the whole thing.

And that means you have the possibility, a very realistic possibility, of shipping little applets and so on around the Web that will actually do seriously intelligent things with XML. Because it's put together in such a way as to make it easy and straightforward to parse.
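One practical consequence of the tight definition is that a parser can give a clear yes-or-no answer on whether a document is well-formed. The sketch below assumes Python's standard `xml.etree.ElementTree` as the parser (not the processor mentioned above); `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b>ok</b></a>"))    # properly nested tags
print(is_well_formed("<a><b>oops</a></b>"))  # tags closed in the wrong order
print(is_well_formed("<a>unclosed"))         # missing end tag
```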

TNC: Okay, Tim, we only have a minute left. Very quickly, is XML ready for prime time, in the sense that people, developers should start building for XML?

TB: Absolutely. XML 1.0 is frozen. It's not going to change anymore. There may or may not be other versions in the future, but XML 1.0 is a totally safe thing. It's ready to go for lots of applications. There's lots of excellent freeware tools out there.

TNC: What resources are available for developers and programmers?

TB: Oh, there's freeware processors of various kinds, various kinds of programming libraries. There are tools for authoring. There are tools for distributing and browsing and so on. And, of course, the big-name browsers are getting in line too.

TNC: And what does the future hold for you?

TB: What does the future hold for me? I'm bored.

TNC: You're bored.

TB: I've been doing XML for a year and a half now, and it's like doing ASCII. It's important, but it's done.

TNC: So you're going to take a break, a vacation.

TB: I'm going to work on other things.

TNC: Okay. Tim Bray, member of the XML Working Group, thanks a lot for joining us today from Vancouver and talking to us about XML.

TB: No problem.

TNC: Thank you very much.