Making XML Precise, Concise, and Fast

What is ConciseXML?

ConciseXML refers to four open, language-independent specifications that make XML precise, concise, and fast. ConciseXML solves many of XML's limitations, making XML practical for a range of new applications. The ConciseXML specifications are completely free and open, and parsers are strongly encouraged to be open source.

ConciseXML Encoding

ConciseXML™ Encoding (CXE) is an XML 1.0 encoding designed to represent hierarchical (and non-hierarchical) data, program logic, and document markup. SOAP-encoding is an example of another XML encoding. CXE was designed to offer the benefits of ConciseXML while remaining compatible with standard XML parsers and tools. The specification can be found here.

ConciseXML Syntax

ConciseXML™ Syntax (CXS) is a backward-compatible superset of XML 1.0. It is more concise than ConciseXML Encoding, and has features making it ideal for representing both data and logic. CXS also has features to support efficient embedding of binary data. A draft specification of CXS is available here.

ConciseXML Reduced

ConciseXML™ Reduced (CXR) is a minimal subset of ConciseXML Syntax designed for small size and high performance. It is compatible with CXS, but not with XML 1.0. A draft specification of CXR is available here.

ConciseXML Binary

ConciseXML™ Binary (CXB) supports extremely high-performance transmission of ConciseXML data by eliminating parsing. CXB is currently under development; if you are interested, please email info@clearmethods.com.


Features of ConciseXML

Conciseness

Many people avoid XML because of its verbosity. ConciseXML represents logic as concisely as the semicolon-delimited syntax of C, C++, Java, and C#, and it represents data as concisely as the Comma Separated Value (CSV) syntax.

Precision

ConciseXML has no ambiguity of meaning. There is only one way to represent parts/fields of an object. This stands in contrast to XML where there can be many syntactic forms for the identical meaning or semantics. Here's a good test: Ask five developers to create a simple XML document that represents a sequence (ordered list) of three integer values. You will probably get at least five different results. If you get such variation for such a simple case, imagine the problems you'll encounter for any non-trivial case.
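
To make the test concrete, here is a small Python sketch, using only the standard library's xml.etree.ElementTree, of three plausible XML encodings of the sequence 1, 2, 3. The element and attribute names (list, item, a, b, c) are invented for illustration and do not come from any specification. The point is that each encoding needs its own extraction code.

```python
import xml.etree.ElementTree as ET

# Three hypothetical ways a developer might encode the sequence 1, 2, 3.
AS_CHILD_ELEMENTS = "<list><item>1</item><item>2</item><item>3</item></list>"
AS_ATTRIBUTES = '<list a="1" b="2" c="3"/>'
AS_TEXT = "<list>1 2 3</list>"


def from_child_elements(xml_text):
    """One element per value; the sequence is the document order."""
    root = ET.fromstring(xml_text)
    return [int(item.text) for item in root.findall("item")]


def from_attributes(xml_text):
    """One attribute per value.

    XML does not treat attribute order as significant, so this sketch
    sorts by attribute name to recover the sequence.
    """
    attrs = ET.fromstring(xml_text).attrib
    return [int(attrs[name]) for name in sorted(attrs)]


def from_text(xml_text):
    """All values in a single whitespace-separated text node."""
    return [int(token) for token in ET.fromstring(xml_text).text.split()]
```

All three functions return the same Python list, but none of them can read the other two documents, which is the kind of variation the test above is meant to expose.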


XML Compatibility

Is ConciseXML compatible with XML?

ConciseXML Encoding (CXE) is an XML 1.0 encoding, just as SOAP-encoding is an XML 1.0 encoding. CXE strictly adheres to the XML 1.0 specification.

ConciseXML Syntax (CXS) is a backward-compatible superset of XML. Because it extends XML, a document that uses CXS extensions does not conform to XML 1.0 or SGML. The extensions only remove XML constraints; they do not change the fundamental features of XML. Any XML document is also a valid CXS document, and a CXS document can be converted into XML using ConciseXML Encoding without any loss of information, because each CXS extension has an equivalent form in XML. A CXS expression or document can mix CXS and XML at all levels.


ConciseXML Extends XML

ConciseXML makes XML more flexible by eliminating many of the unnecessary constraints of XML. Here are the extensions:

  1. Attribute values can be any expression
    Example: <input size=3/> or
    <person birth=<date year=2002 month=10 day=2/>/>
    XML requires that all attribute values be quoted. That effectively requires that all values be of type string. Elements are often used to work around this limitation, but that presents another set of problems.
  2. Attribute keys can be any object
    Example:
    <thing 0="foo" <date 2002 10 10/>="mplusch"/>
    XML does not let an attribute key start with a digit or contain angle brackets. That effectively limits attribute keys to strings. ConciseXML makes it easy to represent array-like fields with integer keys, and to use any object as a key by writing it in the ConciseXML call syntax.
  3. Tagname of an element can be any expression
    Example: <foo.bar/>
    XML Namespaces are a step in this direction, but ConciseXML makes it possible to have any expression as the tagname of an element. The tagname may be a path or a call/tag.
  4. Attribute keys are optional
    Example: <date 2002 month=10 day=28/>
    In the CSV (Comma Separated Value) syntax and in most major programming languages, field or argument values can be given by position rather than by keyword.
  5. Closing tagname is optional
    Not only does this remove unnecessary clutter, but when ConciseXML is used as the syntax for dynamic languages, the tagname may not be known until runtime, and therefore the closing tagname must be optional.
  6. Top-level can be any expression, not just an element
    Example: true
    It is surprisingly difficult in XML 1.0 to create a document whose value is a simple type such as a string, number, or boolean value.
  7. Multiple top-level expressions
    The CSV file format and most programming languages allow multiple top level expressions. XML 1.0 only allows a single root element in a file, while ConciseXML permits any number of expressions at the top level.
  8. Attribute type
    In addition to a key and a value, an attribute can also have an optional type, which follows the value and is delimited by a second equals sign.
    Example: <thing some_key="some_value"=some_type/>
  9. Paths
    Expressions can be joined using dots. Paths are frequently used in programming languages to express the traversal of a data structure or the control flow between processing stages.
    Example: foo.bar and 1.<plus 1/>
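
As a rough illustration of extensions 1, 2, and 4 (unquoted values, non-string keys, and optional keys), the following Python sketch reads a single flat, self-closing element. It is not a ConciseXML parser: nesting, paths, attribute types, and closing tags are left out, and the token rules are simplified guesses. Storing positional values under the integer keys 0, 1, 2, ... is an assumption of this sketch, and the function names are hypothetical.

```python
import re

# Simplified token rule for this sketch: a double-quoted string, an equals
# sign, or a run of characters with no whitespace and no equals sign.
TOKEN = re.compile(r'"[^"]*"|=|[^\s=]+')
INTEGER = re.compile(r"-?\d+")
DECIMAL = re.compile(r"-?\d+\.\d+")
LITERALS = {"true": True, "false": False, "null": None}


def parse_value(token):
    """Convert one token into a Python value."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]              # quoted string
    if token in LITERALS:
        return LITERALS[token]          # true, false, null
    if INTEGER.fullmatch(token):
        return int(token)
    if DECIMAL.fullmatch(token):
        return float(token)
    return token                        # bare name, kept as text


def parse_flat_element(text):
    """Parse '<tag item item ... />' into (tag, fields).

    Each item is either 'key=value' or a bare value. Bare values are
    stored under 0, 1, 2, ... in order of appearance (an assumption).
    """
    text = text.strip()
    if not (text.startswith("<") and text.endswith("/>")):
        raise ValueError("expected a self-closing element such as <date 2002/>")
    tokens = TOKEN.findall(text[1:-2])
    if not tokens or tokens[0] == "=":
        raise ValueError("missing tag name")
    tag, rest = tokens[0], tokens[1:]

    fields = {}
    position = 0
    i = 0
    while i < len(rest):
        if i + 1 < len(rest) and rest[i + 1] == "=":
            # Keyed item: key = value
            if i + 2 >= len(rest):
                raise ValueError("missing value after '='")
            fields[parse_value(rest[i])] = parse_value(rest[i + 2])
            i += 3
        else:
            # Positional item: no key given
            fields[position] = parse_value(rest[i])
            position += 1
            i += 1
    return tag, fields
```

With these assumptions, parse_flat_element('<date 2002 month=10 day=28/>') returns the tag date with fields {0: 2002, 'month': 10, 'day': 28}: the values arrive as integers rather than strings, and the first one has no key in the source text.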

ConciseXML Expressions

A ConciseXML expression can be:
  1. a simple value such as a number, a string, true, false, and null
  2. a complex value such as <book isbn=2323/> or <H1>hello</H1>
  3. a path -- a sequence of other expressions separated by dots: person.manager.2
  4. a name (also known as a symbol) such as 'foo' that references another expression

Example using all expression types:
<foo.bar xx=10 <date 2002 10 2/>.year />
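
One way to read a path such as person.manager.2 is as a chain of lookups, each step applied to the result of the previous one. The Python sketch below shows that reading with nested dictionaries and lists standing in for ConciseXML objects. The data, the function name resolve_path, and the zero-based treatment of integer steps are all assumptions made for illustration; how a path is actually evaluated is defined by the language that uses the syntax, not by this sketch.

```python
def resolve_path(root, path):
    """Follow a dotted path through nested dicts and lists.

    Steps made only of digits are treated as integer keys or indexes;
    all other steps are treated as names.
    """
    current = root
    for step in path.split("."):
        key = int(step) if step.isdecimal() else step
        current = current[key]
    return current


# Hypothetical data: names map to objects, and "manager" has positional fields.
environment = {
    "person": {
        "name": "example",
        "manager": ["first", "second", "third"],
    }
}
```

Here resolve_path(environment, "person.manager.2") returns "third". Steps that are themselves calls, such as the <plus 1/> in 1.<plus 1/>, are outside what this sketch handles.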


XML was not designed for data

XML was not originally designed to represent data and using XML for data is often complex and ambiguous. To demonstrate this, ask someone who knows a programming language how to represent a simple data structure, such as the string "hello". The answer is trivial in almost every language because it would be the quoted string "hello". Now ask someone who knows XML the same question: How would you represent a string as an XML document?
If the person tells you that XML wasn't meant to handle that kind of data, or that you need the schema, or that you could do it in many different ways, you have seen one of the problems with XML. If even the most trivial example is hard, imagine how cumbersome it becomes for moderately complex problems.

A ConciseXML file representing the string "hello" would be:

"hello"

In XML, returning a string might look something like this:

<value xsi:type="string">hello</value>
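
The difference shows up as soon as each form is read by a program. In the Python sketch below, a quoted literal is read directly (Python's own literal syntax is used only as a stand-in for a programming-language-style syntax), while the XML form needs a wrapper element, a type attribute, and a namespace declaration. The XML fragment above, taken alone, is rejected by a namespace-aware parser because the xsi prefix is not declared. The function names are hypothetical.

```python
import ast
import xml.etree.ElementTree as ET

# Namespace conventionally bound to the xsi prefix.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def read_literal(text):
    """Read a quoted string literal; the literal is the whole document."""
    return ast.literal_eval(text)


def is_well_formed(text):
    """Return True if ElementTree's namespace-aware parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def read_xml_string(text):
    """Read a string wrapped as <value xsi:type="string">...</value>."""
    root = ET.fromstring(text)
    declared_type = root.get("{%s}type" % XSI_NS)
    if declared_type != "string":
        raise ValueError("expected xsi:type='string', got %r" % declared_type)
    return root.text or ""


BARE_FRAGMENT = '<value xsi:type="string">hello</value>'
FULL_DOCUMENT = (
    '<value xmlns:xsi="%s" xsi:type="string">hello</value>' % XSI_NS
)
```

read_literal('"hello"') and read_xml_string(FULL_DOCUMENT) both return the string hello, while is_well_formed(BARE_FRAGMENT) is False because the prefix is unbound.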

Articles and Papers

A Comparison of ConciseXML, S-Expressions, and XML, by Mike Plusch

The Trouble with XML, by Christopher Fry

ConciseXML Syntax and XML 1.0
A chapter from Water: Simplified Web Services and XML Programming by Mike Plusch, Wiley & Sons


Free and Open Licensing

To promote ConciseXML as a broad-based standard syntax, ConciseXML.org has adopted a completely free and open licensing policy. ConciseXML will never require a license fee or royalty payment. The use of the validation test suites will always be free, and ConciseXML.org encourages the development of Open Source and commercial ConciseXML parsers. Any parser may use the ConciseXML brand if it passes the ConciseXML test suite. A product that supports the ConciseXML syntax may display the ConciseXML logo.


File extension of .CXS

The standard file extension for the ConciseXML syntax is .cxs. A lowercase file extension is preferred over uppercase. The MIME type for ConciseXML is text/cxs. Programs that store file data in ConciseXML format may prefer to use a file extension that relates to the application. For example, Water programs use the ConciseXML syntax, but have a .h2o file extension. ConciseXML is listed at FILExt.


Products using ConciseXML


The Water language is the first programming language to use ConciseXML syntax.


The Steam platform and IDE from Clear Methods support reading, writing, and manipulating both CXS and CXE.

If you have a product or service that uses the ConciseXML syntax, please send us an email.

Syntax vs. Language

A language has symbols that have a specific meaning. For example, HTML is a language because the tags have meanings associated with them. The XHTML language uses XML syntax.

A syntax is structural only. It only defines valid expressions. For example, XML is a syntax because it does not associate meanings with terms.

ConciseXML Standardization

ConciseXML.org has contacted OASIS and other standards organizations, but no working group has yet been formed. If you would like to help to establish ConciseXML as a standard, please email us. Visit this site for more developments in this area.


What is ConciseXML.org?

ConciseXML.org is an organization dedicated to the support and adoption of the ConciseXML syntax. ConciseXML.org was founded in June 2003. Its first corporate member is Clear Methods, and the individual founding members are Christopher Fry and Mike Plusch.

Members support the goals of ConciseXML and encourage its use. There is no membership fee. To become a member, simply send an email with your name, company, and email address. Your information will only be used to send quarterly newsletters about ConciseXML.


Standard Types

Although ConciseXML does not have any standard types that need to be supported by a parser, the following symbols have special meaning by convention:

  1. true, false, null, optional, required
  2. negative values can be represented using a leading minus sign (Example: -3.2) or with the following ConciseXML expression (Example: <minus 3.2/> )
  3. rational values (also known as ratios) can be represented as <divide 5 2/>
  4. character <char "f"/>
  5. a generic object with fields: thing

Indentation Conventions

Readability improves when you indent ConciseXML in a consistent way. This document recommends specific rules for indenting ConciseXML. Courtesy of waterlanguage.org


ConciseURI

The URI encoding is one of the most widely used formats in the world. ConciseXML Reduced can be translated into the URI form-encoded format. This is the format that is used for the query part of the URI that starts with a question mark (?). The draft document shows examples in ConciseXML, ConciseURI, and JavaScript syntax.
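
The draft document defines the actual mapping. As a rough picture of the target format only, the Python sketch below form-encodes the flat fields of an object such as <date year=2002 month=10 day=2/> with the standard library's urllib.parse. Treating each field as one key=value pair is an assumption of this sketch, not the ConciseURI rule, and the function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def fields_to_query(fields):
    """Form-encode a flat mapping of fields as a URI query string."""
    return urlencode(fields)


def query_to_fields(query):
    """Decode a form-encoded query string into a dict of strings."""
    return dict(parse_qsl(query))


# Hypothetical flat fields, as they might come from a date object.
date_fields = {"year": 2002, "month": 10, "day": 2}
```

fields_to_query(date_fields) gives year=2002&month=10&day=2, which would follow the question mark in a URI. Decoding it with query_to_fields returns every value as a string, so the plain form-encoded format on its own does not record that 2002 was a number.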

Copyright 2003-2006, Clear Methods. All rights reserved. ConciseXML is a trademark of Clear Methods, Inc.