OOXML Probably Set for ISO Ratification

Various reports now indicate that Microsoft's OOXML (Office Open XML) office file format is going to be ratified by the ISO (officially the International Organization for Standardization, though often called the International Standards Organization in English). This would mean that it would join ODF (Open Document Format) as an accepted international standard. This is the file format in use in Office 2007's .***x documents.

Without trying to sound too partisan, this process has been very complicated. The implementation documentation released by MS to ISO members runs to more than 6,000 pages. Some things in the standard are not well documented at all, for the sake of maintaining backward compatibility with older MS formats without having to release implementation documentation for those formats. A common example is that some elements, say a table that was originally in an Office 97 document, will not be re-implemented. Rather, the standard suggests that the implementer carry over the same structure as the earlier undocumented format. Critics suggest that this sort of practice makes the "standard" not fully implementable by any vendor that is not Microsoft.

Furthermore, they suggest that MS is not currently using the published standard with its current .docx, etc. documents and that it will never implement a 100% standard-compliant format, for the purpose of maintaining vendor lock-in. Vendor lock-in, in this case, means that your organization has a bunch of documents in MS's format and, because nobody else can properly read the existing documents, you are compelled to stay with MS Office. Critics say that this is an underhanded move to appear to bend to the will of international standards without losing their valuable market position by becoming fully interoperable.

What do you guys think about the inclusion of OOXML as an ISO standard? What do you think it represents to the computer industry as a whole? We all know that the world seems to run on MS Office documents.

Do you agree with the critics? Are they a bunch of nutjob conspiracy theorists?
 
I don't know what all the fuss is about.

It seems that a lot of "open sauce zealots" believe that just because it will be an ISO standard, everyone will suddenly have to start using it... Well no... that's not what an ISO standard is about. An ISO standard is simply a way that Microsoft can prove to its big customers that there won't be any repeat of the debacle of the .doc binary format becoming deprecated. An ISO standard ensures that, with enough demand, there will still be products in 20-30 years' time that can read the format. Whereas the .doc binary format is already starting to die a fast death because it is of archaic design.

Open Office can continue developing their own format. OOXML should not affect them in any way. Of course once OOXML has been ratified they may want to add support - just because there is no reason not to. Once it has been ratified then the format is set in stone. So basically they will only need to implement a file reader/writer once and then can forget about it more or less.
 
It seems that a lot of "open sauce zealots" believe that just because it will be an ISO standard, everyone will suddenly have to start using it...
I don't see or hear that at all. I think that's a misconception. More below...
NathanE said:
An ISO standard is simply a way that Microsoft can prove to its big customers that there won't be any repeat of the .doc binary format becoming deprecated debacle. An ISO standard ensures that, with enough demand, there will still be products in 20-30 years time that can read the format. Whereas the .doc binary format is already starting to die a fast death because it is of archaic design.
As I see it, it's a bit of an underhanded move. Some big customers, like those you mention, particularly governments, are requiring that documents be stored and distributed in interoperable formats so that they can, ostensibly, avoid the lock-in mentioned before. MS releases this and tells those customers that all is OK now because this is an approved standard. However, all of the docs currently being created are NOT compliant. They seem to want to give the appearance of standardization without actually giving up those customers by allowing competitors into the marketplace. Judging from *** ratification controversies and new-member-state debacles, it would appear that it's all a dog and pony show to appease those customers who looked like they might switch to another format. Yes, there are plugins for Office that allow you to export in Open Document. Yes, there are plugins for other office suites that allow you to export in OOXML. However, if you can't be guaranteed that the document you exported in, say, AbiWord will open correctly everywhere else, your organization cannot even consider a switch, since your business interest in being productive now is vastly more important than being productive in the mysterious future.
NathanE said:
Open Office can continue developing their own format. OOXML should not affect them in any way. Of course once OOXML has been ratified they may want to add support - just because there is no reason not to. Once it has been ratified then the format is set in stone. So basically they will only need to implement a file reader/writer once and then can forget about it more or less.
The Mono guys at Novell, including Miguel de Icaza, have already written a reader for .***x that works pretty well. I don't, however, think that the format will be set in stone when it's an official standard. After Ecma ratified it, they continued to make changes that have not been registered in the Ecma standard. Why stop development now if they didn't stop then?

Going back to your first point, it's not that people think that you'll somehow magically be compelled to use OOXML formats. It's that it's a sham to keep the status quo. Such zealots saw .doc as a lock because the format, as exported by, say, Office for Mac, was not the same as the format exported by Office 2003, which was not the same as the format exported by Office XP, etc., ad infinitum. They now see .docx as being identical, but now it's dressed in a cloak of supposed interoperability to soothe uppity bureaucrats. The format as exported by other suites, even ones written by Microsoft, is not and will not be the same as that of others, specifically the official current Office for Windows. They don't think that in 30 years' time you'll be able to take an OOXML 1.3 (I'm making up that number) document from 2009, read the 6000 pages of documentation for the OOXML 1.3 standard, and be able to completely and totally implement the standard. This is a significant barrier to entry in the open market.

Jeez, I sound like a frothy-mouthed zealot myself now. :p
 
The old .doc binary formats were... erm... binary :). They were highly bespoke and usually there was a separate parser written for each major Office revision. This, however, meant that older versions could rarely open .doc files created in a newer Office environment.

This is not the case with OOXML. In fact OOXML first debuted in Office 2003 (although it was not the default document format). Then it was upgraded in Office 2007 to support a few new features. But Office 2003 can still open Office 2007's OOXML files just fine... So that is some credible proof right there that the "lock in" created in the old 90's era of .doc/.xls binary formats has ended.

There is a lot of conspiracy theorising flying around in discussion of OOXML. I don't really think conspiracy has any place in this type of discussion. What people don't realise is that Office was first created in a computing era where every byte counted and spending 24 hours of work to save 1KB was seen by development team leaders as cost effective. As with any successful product, eventually "feature creep" sets in, which results in the code base expanding at an almost exponential rate as more and more people get added to the development team and tasks get assigned. Soon enough all the "original" crew that wrote Office v1.0 are long gone and what is left is a team that doesn't really know much about the fundamental underpinnings of the file format. Eventually what is needed is a complete rewrite... and this is what happened in Office 2003 when they introduced the .***x file formats.

When I said "set in stone" I meant that particular revision. It's like how XML 1.0 was ratified and hasn't fundamentally changed since. That doesn't mean further developments were halted... but never were those developments allowed to cause breaking changes that would affect earlier XML parsers. Although you rarely hear of it, IIRC there is actually an XML 1.1 standard. But because 1.0 pretty much "hit the nail on the head" the first time around, you rarely see the revised standard being used.

Yes OOXML is not perfect. And in fact Microsoft is not proposing it to be the multi-platform "go everywhere" document format. They are just trying to standardise it to provide assurances to customers. They couldn't give a toss if a bunch of Linux/OSX geeks get wound up about it not being as "elegant" and "perfect" and "well designed" as ODF. That's not what Microsoft is trying to achieve by standardising it.

I fully expect Microsoft to keep revising OOXML. But I doubt they will ever make changes to it that render older file parsers unable to read the file correctly.

The thing with OOXML is it is actually a very simple standard. It only defines quite a nominal set of markup in its namespace. The rest is just proprietary Microsoft extensions... but these are of no concern to third party OOXML parsers as third parties shouldn't be looking to emulate the exact functionality and nuances of Office. Of course there may be areas that third parties DO want to support and that is why Microsoft has gone out of their way to document everything.
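
To make that concrete, here is a rough sketch in Python of what a bare-bones third-party reader could look like if it only cares about the core WordprocessingML namespace and skips everything else. It relies on two things I'm fairly sure of: a .docx is a ZIP package, and the body lives in word/document.xml. The function names, the "urn:example:vendor-extension" namespace and the tiny in-memory sample package are made up purely for illustration, and the sample is not a complete package that Office itself would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by the body markup of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_bytes):
    """Return the plain text of each paragraph, ignoring non-core markup."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Only <w:t> text runs are collected. Elements from any other
        # namespace (for example vendor extensions) are never matched.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx():
    """Build a tiny, hypothetical package in memory (demo only)."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}" '
        'xmlns:x="urn:example:vendor-extension">'
        "<w:body><w:p>"
        "<w:r><w:t>Hello</w:t></w:r>"
        "<x:fancy>ignored</x:fancy>"  # made-up extension element
        "<w:r><w:t> world</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


if __name__ == "__main__":
    print(extract_paragraphs(make_sample_docx()))
```

The text comes out, but whatever the skipped elements were meant to do is simply lost, which is the trade-off being argued about in this thread.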
 
The thing with OOXML is it is actually a very simple standard. It only defines quite a nominal set of markup in its namespace. The rest is just proprietary Microsoft extensions... but these are of no concern to third party OOXML parsers as third parties shouldn't be looking to emulate the exact functionality and nuances of Office. Of course there may be areas that third parties DO want to support and that is why Microsoft has gone out of their way to document everything.
It seems to me that it's the proprietary Microsoft extensions that are the hangup! Not all of the proprietary extensions mentioned are documented in a way that can be implemented by a 3rd party.

Imagine that you have an old .xls file you created in Office 97. You open it with Office 2007, make changes, and save it as .xlsx. In that .xlsx file some of the data is stored in an undocumented-ish way, a way described in the documentation as essentially "store this the way that Office 97 did." You then email it to your buddy who is using Gnumeric. Let's imagine that Gnumeric has implemented .xlsx as well as is theoretically possible. He opens it and finds that all is not as it should be. The only way to get it to work exactly as it should is to use MS's parser, found only in Office. The undisclosed parts are, perhaps, withheld due to licensing agreements; MS licensed a technology or method and now they don't have the right to release that other entity's code or procedure.

Now imagine the same situation but 50 years from now. Your boss tells you that you need to get the data out of these old .docx files. MS Office is long gone, having been replaced by trained monkeys who take your dictation and communicate through telepathic links. Aha!, you say. I can simply implement the OOXML ISO standard. You can't, though. There isn't documentation that will tell you how to read some parts of the files. You can't contact Microsoft and sign an NDA. You can't do anything about it if the original functioning parsers are gone. This is why the binary formats were being discouraged in the first place.

OOXML is, of course, MUCH better than the old formats but it's still not all the way there. I would be happier with it if they stood up and acknowledged the reality of what I described above, rather than wrapping themselves in marketing drivel about openness and interoperability. Yes, it's better. No, it's not as good as it could possibly be. They could say "Sure, it's not perfectly interoperable but there are good reasons for that such as X, Y, and Z. Things will be better as legacy formats go away and by doing this we can have a better time of it in the future."

I do suppose that it's the marketing that has me more miffed than anything.
 