OOXML: Why the debate?

by Jonathan Robie

Contributing author: Melanie Chernoff

You probably don’t lose any sleep worrying that your word processor is saving files in the wrong format. You may have some old files that don’t open correctly, or someone might have sent you a spreadsheet that doesn’t work in anything but Excel, but you probably found some way to work around the problem.

But when data is important and needs to be used in different ways or archived for a long time, the format really does matter. It all boils down to one question: who owns your data? If your data can be used in a wide variety of applications, you own it. If it can only be used cleanly with one vendor’s applications, that vendor is really the one with control.

This is why standards are so important. A complex standard that can only be fully implemented by one vendor does nothing to solve this problem, especially when the format was designed for only that one vendor’s data. This is the crux of the debate over OOXML, an XML format designed for Microsoft’s office suite, which was submitted to ISO for fast-track standardization as DIS 29500. This proposal will be accepted or rejected on March 29, 2008.

In this first article, we will examine the background of ODF and OOXML. Next week, we’ll talk about the ISO certification process and how Red Hat is participating in this effort. The final article will discuss the implications of next month’s OOXML vote for Red Hat and the FLOSS community.

ODF: An ISO standard for office data

The Open Document Format, or ODF (ISO/IEC 26300), is an XML format designed to exchange office document data. Originally developed by Sun, it has been reviewed and developed by OASIS (Organization for the Advancement of Structured Information Standards) since 2002. ODF was unanimously approved as an ISO standard on May 3, 2006. The ODF specification is a little over 700 pages long, was created by an open process that involved multiple vendors, and has been implemented in a variety of products, including Open Office®, KOffice, Google Docs, IBM® Lotus® Symphony, and Macintosh® TextEdit. ODF is the only existing ISO standard for office document data.

Because many government bodies mandate that standards be used, ODF has benefited greatly from being an ISO standard. Government agencies need to share information widely, so they want to create and save documents in a format that can be accessed by multiple applications. They also need to ensure that important documents like patent applications and legal contracts can be read long into the future, an important consideration when word processors have difficulty with documents only a decade old. By January 2008, several government groups, including twelve national and seven regional organizations, had adopted pro-ODF policies. Technology research firm Gartner estimates that by 2011, 50% of government and 20% of commercial organizations will require ODF.

ODF is a smaller and simpler specification than Microsoft’s OOXML. ODF was designed to represent office documents; OOXML was designed to represent Microsoft Office applications.

OOXML: A proposed standard for Microsoft Office Data

Microsoft has been using XML in some file formats since 2000, and they provided full support for exporting office data to XML in Microsoft Office 2003. These XML formats were designed by Microsoft for the exchange of Microsoft Office data.

Office Open XML (OOXML) is a further development of the formats used in Microsoft Office 2003. OOXML is not only complex; it also cannot be completely implemented without access to inside information. Although its specification is more than 6,000 pages long, it contains various references to things that are defined only in Microsoft’s software, not in the specification itself.

The European Computer Manufacturers Association (ECMA) submitted the DIS 29500 proposal, and in its disposition of comments, it notes that:

“Many National Bodies requested more complete documentation for some legacy application compatibility settings in DIS 29500, such as ‘AutoSpaceLikeWord95,’ ‘truncateFontHeightsLikeWP6,’ and others. ECMA agrees with this comment, and will provide the full information necessary to implement all compatibility settings within DIS 29500.”

Although many such features are being deprecated for use with legacy software, it makes no sense to put deprecated features in the first version of a standard to accommodate one vendor’s data without explaining how other applications should process it. And this is just one of over 1,000 unique issues that have been submitted by the National Bodies. There’s simply not enough time for ECMA to fully resolve all the technical issues with OOXML by March 29, let alone review the resolutions. Microsoft is hoping the delegates will approve OOXML anyway, on the promise of a future release of this information. But the standard that the National Bodies are being asked to vote on is the one in front of them, not something they hope will materialize some time in the future.

Where do we go from here?

The fact that there are several XML formats for office data demonstrates that people want to use data from their office documents in a flexible, open way. Red Hat supports ODF because it is a simple standard designed by an open process to support multiple products and has already been approved as an ISO standard.

We believe Microsoft should review ODF and identify any missing functionality so that it can be added to the existing standard before creating a completely new standard.

OOXML, despite its complexity, is not currently well-defined enough to be fully implementable. It would take a great deal of time to resolve all of the issues that have been identified, and the current ballot resolution process simply does not provide enough time to fix these issues and create a truly open standard that all vendors can implement.

Next week we will look at the ISO certification process and how Red Hat has been participating in the discussion.

5 responses to “OOXML: Why the debate?”

  1. Cómo convertir de .docx a .doc at El Módem says:

    […] Our friends in Redmond had the brilliant idea of creating a new format for documents in the Office 2007 suite: Office Open XML (OOXML), in opposition to the open standard ODF. There is currently an “effort” (which apparently includes “incentives”) on Microsoft’s part to get OOXML accepted as a standard. I recommend reading the Red Hat Magazine post (in English), OOXML. Why the Debate?, and the Jugando a Crear post, Apoya a ODF frente a OOXML. […]

  2. Zaine Ridling says:

    This is so true. I recognized this problem early on when I switched from the MS .doc format to ODF to preserve my master’s thesis and doctoral dissertation, two documents I had spent years researching and writing. Knowing that Microsoft had changed its .doc/.xls binary formats seven times over the years, the later versions making earlier versions unreadable even with Microsoft software, I knew there was a problem.

    ODF solves so many problems, and as Andy Updegrove argues, a universal open file format even becomes a human rights issue when you consider its ramifications around the globe and for the future.

    MS-OOXML? Not even Microsoft is committed to it, and that should tell you everything.

  3. Troy says:

    I recognized the format problem when I was tasked with changing a company’s WordPerfect 5.1 (*.WPD) documents into Word 6.0 (*.DOC) format. The transfer was automated, but it was not pretty in process or result. The worst part is that the organization went from one proprietary format for their document store to another, and so guaranteed that they would have to perform this process again.

    ODF presents significant opportunities to any business or arm of government that needs a low cost, convertible, and maintainable document store with the inherent ability to mine data from it.

  4. Remy Mudingay says:

    Let me say that I support the use of open standards, from open formats such as ODF and PDF to protocols such as TCP/IP and programming languages such as C#.
    I do not think it is unreasonable to hope that the ISO will reject OOXML as the DIS 29500 standard, since there are many issues that have been identified. However, if Microsoft is truly committed to making OOXML an open format, then this should be beneficial in giving the end user/developer a choice in deciding which format to use.

  5. Ruiyuan Li says:

    Oh my god, this is my pain: I can do everything to my satisfaction on a *nix system, except when I need to prepare a doc for my boss.
