James Clark's Random Thoughts


Why Ballerina is a language

A new programming language is a lot of work, and the chances of any new programming language getting traction are small. Many new languages are created. Very few make it.

Perhaps even more work than the language is the platform - all the other things that are needed to make users productive with the language: standard library, package manager, IDE, debugger, documentation system, testing tools, etc. One way to reduce the cost of a new language is to leverage an existing platform, such as Java or .NET, and rely on that platform for some of the needed functionality.

Ballerina is a new programming language, and is also a platform. Although it's implemented on top of the JVM, it does not embrace the JVM. It is designed with the goal that we can do another implementation that does not use the JVM, and user code will run unchanged. (We do provide JVM interop features, but that is specifically for when you want to interop with existing JVM code.)

This raises an obvious question. Why? In this post, I want to address this question by explaining what there is in the Ballerina language and platform that could not be done except with a new language and platform. These can be grouped into three areas:

  • networking: networking abstractions, which are part of the language, and implementations of those abstractions provided by the standard library;
  • data and types: the kinds of values that the language operates on and the ways that the type system provides to describe these;
  • concurrency: how the language enables the program to describe concurrent execution of code and control concurrent access to mutable state.

These three areas are fundamental: they could not be grafted onto another language. They are also deeply interconnected. In addition, there are some supporting features that are not so fundamental, but which together provide significant value.

This blog is not a complete answer to the "Why?" question. Ballerina's development is funded by WSO2 and WSO2's ultimate goal is to create a product that is useful to its customers. But Ballerina is not itself the product: both Ballerina the language and Ballerina the platform are free and open source. The product is separate: it's a cloud service that takes advantage of Ballerina's capabilities.

Network abstractions

The Ballerina language provides abstractions for both network services and network clients, but knows nothing about specific protocols. Protocol-specific library code is needed to make these abstractions available for a specific protocol. The standard library includes this for several protocols, including HTTP, GraphQL and gRPC.

The network abstractions for clients are more straightforward than for services. For clients, the network abstraction consists of a distinct kind of object, called a client object, which has a distinct kind of method, called a remote method, which represents outbound network messages. The standard library supports a protocol by providing an implementation of a client object for that protocol. The language provides a distinctive syntax for remote method calls on client objects and syntactically restricts where such calls can appear. This enables the Ballerina VS Code extension to provide a graphical view of a function or program, which uses a sequence diagram to show the interactions between client objects and remote services. This graphical view always remains in sync with the textual view, and both views are editable.

For services, Ballerina provides a distinct kind of object, called a service object. A remote method on a service object represents a network-callable method. Incoming network messages are dispatched to service objects by using objects implementing the language-defined Listener type. The standard library supports a protocol for services by providing an implementation of the Listener type for the protocol. The language also provides convenient syntax for a module to construct a Listener object, and to define a service object and attach it to a Listener object.

For many languages, the execution model of a program is simply to call a function, which represents the entry point of the program. In Ballerina, the services defined by a program's modules are the network entry points of the program, and this is incorporated into the execution model of a Ballerina program. When a program is executed, every module will first be initialized; this will construct and connect up the module's Listener and service objects. After all modules have been initialized, the program enters a listening phase, which makes the Listeners start accepting network input. The execution model also deals with shutting down services.
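The phases described above can be sketched as a rough Python analogy. This is not Ballerina code and not how the Ballerina runtime is implemented; `Listener`, `run_program` and the two module initializers are hypothetical names, and the listener only records lifecycle events instead of opening sockets.

```python
class Listener:
    """Stand-in for a protocol listener; it only records lifecycle events."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.services = []

    def attach(self, service):
        self.services.append(service)
        self.log.append(f"{self.name}: attached {service}")

    def start(self):
        self.log.append(f"{self.name}: listening")

    def stop(self):
        self.log.append(f"{self.name}: stopped")


def run_program(module_inits):
    """Initialize every module, then start all listeners, then shut down."""
    log = []
    listeners = []
    # Phase 1: module initialization constructs listeners and attaches services.
    for init in module_inits:
        listeners.extend(init(log))
    # Phase 2: only after all modules are initialized do listeners accept input.
    for listener in listeners:
        listener.start()
    # Phase 3: shutdown (a real runtime would wait for a signal first).
    for listener in reversed(listeners):
        listener.stop()
    return log


def init_orders(log):
    listener = Listener("http:8080", log)
    listener.attach("/orders")
    return [listener]


def init_billing(log):
    listener = Listener("http:9090", log)
    listener.attach("/billing")
    return [listener]


events = run_program([init_orders, init_billing])
```

The point of the sketch is the ordering: no listener starts accepting input until every module has finished initializing.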

The language-provided network abstractions make a program's interaction with the network explicit. This is used to provide network observability. It is also the basis for the code-to-cloud support, which uses compiler extensions to generate artifacts needed for deployment to different cloud platforms (K8s, Azure, AWS).

Service objects support the concept of a resource method, which enables a more data-oriented view of services. This can be thought of as a network-oriented generalization of OO getter/setter methods, where get/set is generalized to the protocol-defined method (e.g. the HTTP method name get/put/post) and the property name is generalized to a path. The standard library provides implementations of this for HTTP and GraphQL services. This avoids the pain that comes from having to artificially combine the HTTP method name and the resource path into a single identifier (what OpenAPI calls the operationId). We are working on extending the resource method concept to client objects.
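To illustrate the idea of keying a handler by method and path rather than by a single combined identifier, here is a minimal Python sketch. It is an analogy only: the `Service` class, its `resource` decorator and the `/stock/apple` path are hypothetical, and real Ballerina resource methods are checked by the compiler rather than registered at runtime.

```python
class Service:
    """Toy service whose resources are keyed by (method, path)."""

    def __init__(self):
        self._resources = {}

    def resource(self, method, path):
        def register(handler):
            self._resources[(method.lower(), path)] = handler
            return handler
        return register

    def dispatch(self, method, path, payload=None):
        handler = self._resources.get((method.lower(), path))
        if handler is None:
            return 404, None
        return 200, handler(payload)


stock = {"apple": 3}
service = Service()


# The handlers need no meaningful name: method and path identify them.
@service.resource("get", "/stock/apple")
def _(payload):
    return stock["apple"]


@service.resource("put", "/stock/apple")
def _(payload):
    stock["apple"] = payload
    return payload
```

For example, `service.dispatch("GET", "/stock/apple")` reads the value and `service.dispatch("PUT", "/stock/apple", 7)` replaces it, much like a getter and setter on a property.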

Normally, when a service and client use a request-response message exchange pattern, the remote method on the service can use its return value to provide its response to a request. But this is not always sufficient: in some cases, the service may want to control what happens if there is an error in sending the response; in other cases, the service and client may be using a more complex message exchange pattern. Ballerina models this by passing a client object as an argument to the service's remote method; the service's remote method calls remote methods on this client object to send messages back to the client.
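A small Python sketch of this pattern, under the assumption of a hypothetical `Caller` class with a `respond` method (neither name comes from Ballerina's libraries): the service method replies through the object it is given, which lets it send more than one message and decide what to do when sending fails.

```python
class Caller:
    """Stand-in for the client object handed to a service method."""

    def __init__(self, fail_on_send=False):
        self.fail_on_send = fail_on_send
        self.sent = []

    def respond(self, message):
        if self.fail_on_send:
            raise ConnectionError("client went away")
        self.sent.append(message)


def handle_order(caller, order, audit_log):
    """Service method that replies via the caller instead of a return value."""
    try:
        # More than one message can be sent back for a single request.
        caller.respond({"status": "accepted", "id": order["id"]})
        caller.respond({"status": "done", "id": order["id"]})
    except ConnectionError as err:
        # The service decides what happens when a response cannot be sent.
        audit_log.append(f"order {order['id']}: response not delivered ({err})")
```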

Data and types

Plain data

One of the most fundamental aspects of Ballerina is its focus on plain data. This is called anydata in Ballerina and is analogous to the POD (Plain Old Data) concept in C++. It is pure data, independent of processing that might be applied to the data.

Messages exchanged by network protocols are represented by plain data; the implementations of network protocols can automatically serialize plain data in a format appropriate to the protocol. In particular, plain data can be directly serialized to and from JSON in a simple, natural way.

The whole Ballerina platform is designed to maximize use of plain data. Objects, which bundle methods with data, are not plain data, and the platform uses plain data rather than objects, unless the specific functionality provided by objects is needed. Services and clients are represented as objects; the parameters and return values of remote methods are plain data.

Structured data throughout the platform is represented using the built-in map and array types, which are plain data, rather than using library-defined collection types. In addition to maps and arrays, Ballerina provides a built-in table type, which allows for collections with arbitrary plain data keys (maps have only string keys as in JSON); tables are automatically transformed into arrays of objects when serializing to JSON. The table type provides enough power that even sophisticated, complex programs can be written using only the language-provided collection types.

Ballerina has a structural type system, which has several features typically found in schema languages, such as unions and open records. The overall result is that Ballerina types for plain data work well as schemas for network messages. Subtyping is simple and flexible because it is semantic: types are thought of as sets of values, with subtyping corresponding to the subset relationship between the corresponding sets. For example, a user-defined record type is a map. This allows the platform to easily convert between user-defined types and generic types (like anydata). Converting to the generic type is a no-op, because of the subtype relationship. In the other direction, the platform uses a language capability (similar to a type cast) to validate and convert the value to a user-defined type.
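The two ideas in this paragraph can be sketched in Python, with caveats. Python has no semantic subtyping, so the first part models only finite types literally as sets of values. The second part is a hypothetical `convert` helper that validates a generic mapping against a record-like dataclass; it is an assumption-laden stand-in for the language capability mentioned above, and unlike an open record it silently drops extra fields.

```python
from dataclasses import dataclass, fields

# Finite types modelled literally as sets of values.
HttpVerb = frozenset({"get", "put", "post"})
ReadVerb = frozenset({"get"})


def is_subtype(sub, sup):
    """Semantic subtyping: every value of `sub` is also a value of `sup`."""
    return sub <= sup


@dataclass
class Person:
    name: str
    age: int


def convert(value, record_type):
    """Validate a generic mapping against a record type and build the record."""
    if not isinstance(value, dict):
        raise TypeError("expected a mapping")
    kwargs = {}
    for field in fields(record_type):
        if field.name not in value:
            raise TypeError(f"missing field {field.name!r}")
        item = value[field.name]
        if not isinstance(item, field.type):
            raise TypeError(f"field {field.name!r} is not {field.type.__name__}")
        kwargs[field.name] = item
    return record_type(**kwargs)
```

Here `is_subtype(ReadVerb, HttpVerb)` holds because the first set is contained in the second, and `convert({"name": "Ann", "age": 30}, Person)` succeeds while a mapping with a string-valued `age` is rejected.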

Types for services

The user defines a service by writing resource methods or remote methods. For the HTTP and GraphQL protocols, the user can define types (most often record types) and use them for the parameters and return values. The platform's Listener implementation for the protocol makes this just work: the incoming messages will be validated and converted using the parameter types. Annotations can be used to fine-tune this, for example to control whether a method parameter should come from a query parameter or the payload.

The platform can also use the service definitions to generate an IDL. For HTTP, this would be OpenAPI. The types specified for the parameters and return value are converted to a JSON schema.

This works for GraphQL in a similar way: the GraphQL Listener exposes a GraphQL service; it constructs the GraphQL schema for this service from the types in the resource methods; GraphQL introspection is used to make the schema available to clients at runtime.

For gRPC, the platform uses an IDL-first approach (the gRPC community's preferred approach). The platform allows a Ballerina service definition stub to be generated from the gRPC service definitions.

Types for clients

The platform uses two approaches to allow clients to work with typed data. The remote method on the generic client class can at runtime convert the response to a user-specified type passed as an argument.

Alternatively, the platform can generate an application-specific client class from the service's IDL. Note that this supports the same graphical view as the generic client. For GraphQL, the platform can generate an application-specific typed client using a user-specified set of GraphQL queries.


Concurrency

The graphical view of a function as a sequence diagram provided by the VS Code extension shows not only how the function accesses network services, but also the concurrent logic of the function. The language's worker-based concurrency primitives are designed for this graphical view. In the sequence diagram, each worker is represented by a vertical lifeline, and a message passed between workers is represented by a horizontal arrow between the corresponding lifelines. The textual representation is more complex, and requires the compiler to pair up sends and receives. The compiler can also detect potential deadlocks. These primitives have limited expressiveness compared to the concurrency primitives offered by most languages, but are much easier and safer to use in cases where this expressiveness is sufficient.
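As a rough analogy for two workers exchanging messages, the following Python sketch uses threads and queues. It is not Ballerina's worker syntax, and nothing here is checked at compile time; each thread plays the role of a lifeline and each `put`/`get` pair plays the role of an arrow between lifelines. All names are hypothetical.

```python
import queue
import threading


def run_workers():
    a_to_b = queue.Queue()  # arrow from lifeline A to lifeline B
    b_to_a = queue.Queue()  # arrow from lifeline B to lifeline A
    results = {}

    def worker_a():
        a_to_b.put(20)               # send to B
        results["a"] = b_to_a.get()  # receive from B

    def worker_b():
        n = a_to_b.get()             # receive from A
        b_to_a.put(n + 1)            # send to A
        results["b"] = n

    threads = [threading.Thread(target=worker_a),
               threading.Thread(target=worker_b)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In this version the programmer has to pair each send with its receive by hand, which is the bookkeeping the paragraph above assigns to the compiler.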

Ballerina allows programmers to make use of shared mutable state in a familiar way, yet the platform also allows user-defined services to be executed in parallel, with a compile-time safety guarantee that this will not cause data races. This leverages a combination of language features: a simple locking primitive, read-only types, and a concept of isolation. The last of these is a complex, multi-faceted feature, but the compiler can infer it within a single module. The overall effect is the compiler can check whether a service's access to mutable state is always properly locked; if it is, then the Listener implementation allows parallel execution of that service; if not, then the compiler can tell the user that they need to add locks.
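The locking half of this can be illustrated with a Python sketch, assuming a hypothetical `HitCounter` service that keeps shared mutable state. Python offers no compile-time guarantee here: if the `with self._lock` line were removed, nothing would report it, which is precisely the gap the isolation analysis described above is meant to close.

```python
import threading


class HitCounter:
    """Toy service with shared mutable state guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = 0

    def handle_request(self):
        with self._lock:  # every access to the shared state is locked
            self._hits += 1

    @property
    def hits(self):
        with self._lock:
            return self._hits


def serve_in_parallel(counter, clients, requests_each):
    """Drive the service from several threads and return the final count."""
    def client():
        for _ in range(requests_each):
            counter.handle_request()

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.hits
```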

The platform uses asynchronous IO throughout, but this is not exposed to the programmer. Async functions are not distinguished as a separate kind of function. The programmer can instead think in terms of logical threads of control, which Ballerina calls strands; these are similar to virtual threads proposed for Java, or goroutines in Go.

Supporting features


Transactions

The language provides transaction-related features, which make it easier to code robust transaction logic and enable some logic errors to be caught at compile-time. (Note that this is not transactional memory.) These rely on there being a transaction manager provided by the runtime and standard library.

The language accommodates distributed transactions by allowing service and client remote and resource methods to be transaction-aware. It also supports a form of compensation by allowing participants in a distributed transaction to register code to be run when a distributed transaction completes. What makes distributed transactions work is not so much the language as the runtime and standard libraries: these provide a distributed transaction manager and support for transactions in the HTTP listener and client implementations.


Database access

The standard library provides support for accessing SQL databases. A SQL database is accessed using a client object. Database transactions integrate with the language's transaction features by making the remote methods on the SQL client object be transactional.

Data with types defined by the SQL schema are transformed into Ballerina values having user-defined record types using the same language features that other network clients use to transform data received from the server.

SQL queries are represented using a template feature similar to JavaScript template literals. This allows Ballerina values representing query parameters to be automatically converted to SQL values.

The language-provided stream type is used to return the results of a query. The language-integrated query feature can be applied to streams directly, so that program code can further refine the query results or combine them with the results of queries from other databases, without having to keep the full result in memory.
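The combination of parameterized query templates and streamed results can be approximated in Python with the standard `sqlite3` module. This is a sketch under stated assumptions: the `sql` helper, which takes alternating SQL fragments and values, is a hypothetical stand-in for a template literal, and the `item` table is invented for the example.

```python
import sqlite3


def sql(*parts):
    """Build (query_text, params) from alternating SQL fragments and values.

    Even positions are SQL text; odd positions are values, which become `?`
    placeholders and are never spliced into the SQL text.
    """
    text, params = [], []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            text.append(part)
        else:
            text.append("?")
            params.append(part)
    return "".join(text), params


def stream_rows(conn, query):
    """Yield result rows one at a time instead of materializing them all."""
    text, params = query
    for name, price in conn.execute(text, params):
        yield {"name": name, "price": price}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, price INTEGER)")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [("pen", 2), ("book", 12), ("lamp", 30)])

min_price = 10
rows = stream_rows(conn, sql("SELECT name, price FROM item WHERE price >= ",
                             min_price, " ORDER BY price"))
# Further refinement happens lazily, row by row, in program code.
cheap_names = [row["name"].upper() for row in rows if row["price"] < 20]
```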

Configuration data

Most real-life programs need access to configuration data at runtime. Ballerina has language support for this. The language support consists just of allowing specific module-level variables to be declared as configurable; there can be a default for the value or it can be required to be specified in the configuration. The runtime uses a TOML file to initialize configurable variables.

Although this is a very simple language feature, it combines with other Ballerina language features (types, plain data, read-only) to provide a powerful capability: the structure and type of all configuration input to a program is known at compile time, which greatly facilitates the management of the data by higher-level layers.

Language-integrated query

Ballerina provides a language-integrated query feature, which is a generalization of the list comprehensions found in many programming languages. The syntax is similar to C# LINQ declarative query syntax. But whereas the semantics of the C# LINQ syntax are defined in terms of a desugaring into method calls, the semantics of the Ballerina query syntax (which are inspired by XQuery FLWOR expressions) are defined directly in terms of operations on Ballerina's built-in collection types.

The table collection type and query are designed to work nicely together. Tables are similar to lists of records with a primary key. List comprehensions can be extended to handle these more smoothly than maps, where the key and value are separate. Queries have a join clause that turns into a hash join when used with tables.
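What a join against a keyed table amounts to can be shown with a short Python sketch. The data and the `join_orders` function are invented for illustration, and a dictionary keyed by `id` stands in for a table with a primary key; this is the general hash-join technique, not a description of the Ballerina implementation.

```python
# A "table" keyed by its primary key `id`.
customers = {
    1: {"id": 1, "name": "Ann"},
    2: {"id": 2, "name": "Bob"},
}
orders = [
    {"order_id": 10, "customer_id": 2, "total": 5},
    {"order_id": 11, "customer_id": 1, "total": 8},
    {"order_id": 12, "customer_id": 3, "total": 1},
]


def join_orders(orders, customers):
    """Hash join: probe the keyed table once per order, no nested scan."""
    for order in orders:
        customer = customers.get(order["customer_id"])  # key lookup
        if customer is not None:
            yield {"order_id": order["order_id"], "name": customer["name"]}


joined = list(join_orders(orders, customers))
```

Order 12 has no matching customer and is dropped, as in an inner join.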

Query allows many data transformations to be written in a declarative way, using expressions rather than statements, which enables a graphical user interface based on data flow.


XML

Ballerina has a separate xml data type, modeled after XQuery, which also counts as plain data.

The platform supports two ways of serializing xml values. When the entire network message is XML, then the xml value is serialized as an XML document. When an xml value is included within a structure serialized as JSON, the xml value is serialized as a JSON string. This is convenient when the xml value is being used to represent HTML.
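The two serializations can be mimicked in Python with the standard `xml.etree.ElementTree` and `json` modules. The function names and the sample fragment are assumptions made for the example; only the shape of the output is the point.

```python
import json
import xml.etree.ElementTree as ET

snippet = ET.fromstring("<p>Hello <b>world</b></p>")


def as_xml_document(element):
    """The whole message is XML: serialize the value as XML text."""
    return ET.tostring(element, encoding="unicode")


def as_json_payload(title, element):
    """XML nested in a JSON structure is carried as a JSON string."""
    return json.dumps({
        "title": title,
        "body": ET.tostring(element, encoding="unicode"),
    })
```

A consumer of the JSON form receives the markup as an ordinary string field, which is convenient when that field holds an HTML fragment to be inserted into a page.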

The language-integrated query feature also works with xml: XML structures can be used as input and/or output to a query. This combines with a specialized XPath-like XML-navigation syntax.


Future

The long-term vision for Ballerina includes a number of important features which are not yet implemented, but which have a foundation in existing language features.

  • Event streams (unbounded streams of records with timestamps). Be able to both generate them and query them (using various kinds of windows). Related to this is more first-class support for a client subscribing to a stream of events from the server.
  • Network security. Language support to help the user avoid network security problems (we have experimented with a feature similar to tainting in Perl); this can leverage the explicitness of network interactions in Ballerina.
  • Service choreography. Be able to write a single description that describes how multiple services interact and use that to derive the types of individual services. This could handle services implemented in other programming languages by using Ballerina service types as an IDL.
  • Workflow. Support long-running process execution. Be able to suspend a program and later resume it as a result of an incoming network message. This also requires that transactions get better support for compensation.


Ballerina Programming Language - Part 1: Concept

In the previous post, I talked about the context for Ballerina. In this post, I want to explain what kind of programming language it is. We can summarize this as a number of design goals:

  1. Provide abstractions for networking
  2. Use sequence diagrams as the visual model
  3. Minimize cognitive load
  4. Leverage familiarity
  5. Enable a semantically-rich static program model
  6. Provide a complete platform, not just a language
  7. Allow multiple implementations, based on different runtime environments

This is not exactly the list we started out with: it has evolved in the light of experience.

I will talk about each of these points in turn. The first two of these are a bit different. They correspond to the two fundamental features that make Ballerina unique.


Abstractions for networking

The primary function of an ESB is to send and receive network messages. So language-level support for this is central to the Ballerina project.

On the sending side, the key abstraction is a remote method. A remote method is part of a client object; there is a distinct syntax for calling remote methods. A program sends a message by calling a remote method; the return value of the remote method can describe the response to that message. Implementation of remote methods is typically provided by library code or is auto-generated; each protocol will have its own client object implementation. Application code calls remote methods on client objects.

On the receiving side, the key abstraction is a resource method; a resource method is part of a service. Application code provides services by implementing resource methods. This works in conjunction with listener objects. Implementation of listener objects is typically provided by library code; each protocol will have its own listener object implementation. Listener objects call resource methods on services provided by application code.

There's a final twist that ties together the sending and receiving side: resource methods are typically passed a client object to allow them to send messages back to the client.

So what is gained by providing language-level support for networking?

Most importantly, it enables a visual model that shows the program's behaviour in terms of these abstractions: the visual model can show how the program interacts using network messages. This links up with the second unique feature of Ballerina - the use of sequence diagrams as the visual model.

It also provides a purpose-designed syntax, which does not require the developer to jump through a series of hoops. You use the language-provided syntax and it just works. It's as easy as writing a function. This is not all that important for a large program. But many programs that perform integration tasks are small, and with small programs reducing the ceremony matters. You could compare this with how AWK takes care of opening files and iterating over each line of the file: it's not hard to do, but for a small program the fact that AWK takes care of this for you is a significant convenience.

Related to this is that Ballerina's model of program execution incorporates the concept of running as a service. You don't have to write an explicit loop waiting for network requests until you get a signal. The language runtime deals with all that for you. Again, not revolutionary, but it makes a difference.

The final advantage relates to typing. At the moment the type of a resource method is just a function type, and the type of a service is just a collection of these function types. But we want to do better than this. The type of a resource method should capture not just the type of its parameters and return value, but the type of the messages that it expects to receive, and the type of the message that it will send in response. (The former is at the moment partially captured by annotations, which can be generated from, for example, Swagger/OpenAPI descriptions.) Furthermore, the type of a service should capture not just the type of each message exchange separately, but also the relationship between the exchanges. This is usually called session typing and is an active area of research.

Sequence diagrams

WSO2's experience from working with customers over many years has been that drawing a sequence diagram is typically the best way to describe visually how services interact. An ESB's visual model is typically based on a dataflow model, which works well for simple cases but is not as expressive. So one big idea underlying Ballerina is that you should be able to visualize a function or program as a sequence diagram.

It is important to understand that the visualization of Ballerina code as a sequence diagram is not simply a matter of tooling that is layered on top of the Ballerina language. It took me a long time to really grok Sanjiva's concept for how the language relates to sequence diagrams. My initial reaction was that it seemed to me like a category error. Sequence diagrams are just a kind of picture. What's that got to do with the syntax and semantics of a programming language?

The concept is to design the syntax and semantics of the language's abstractions for sending network messages, for in-process message passing and for concurrency so that they have a close correspondence to sequence diagrams. This enables a bidirectional mapping between the textual representation of a function in Ballerina syntax and the visual representation of the function as a sequence diagram. The sequence diagram representation fully shows the behaviour of the function as it relates to concurrency and network interaction.

The closest analogy I can think of is Visual Basic. The visual model of a UI as a form is integrated with the language semantics to make writing a Windows GUI application much easier than before. Ballerina is trying to do something similar but for a different domain. You could think of it as Visual Basic for the Cloud.

Cognitive load

Programming languages differ in the demands they make of a programmer. One way to look at this is in terms of different developer personas, such as Microsoft's Einstein, Elvis and Mort personas. But it's hard to do that without implying that one kind of developer is inherently superior to another, and I don't think that's a helpful way to look at things. I prefer to think of it like this: a programming language both gives and takes. It gives abstractions to make it convenient to express solutions, and it gives the ability to detect classes of errors at compile time. But it takes intellectual effort to understand the abstractions that are provided and to fit the solution into those abstractions. In other words, it relieves the programmer of some of the cognitive load required to write and maintain a program, but it also imposes its own cognitive load. Every programming language needs to strike a balance between what it gives and what it takes that is appropriate for the kind of program for which it is intended to be suitable.

For Ballerina, the goal has been for it to make only modest demands of the programmer. Integration tasks are often quite mundane; people just want to get things working and move on. But these integrations, although mundane, can be critically important to a business: so they need to be reliable and they need to be maintainable. So the language tries to nudge programmers in the direction of doing things in a reliable and maintainable way.


Familiarity

One way to reduce cognitive load is to take advantage of people's familiarity with programming languages. Specifically, Ballerina tries to take advantage of familiarity with programming languages in the C family, such as C, C++, Java, JavaScript and C#. This applies to both syntax and semantics. It is not a hard and fast rule, but a guideline: don't be different from C without a good reason, and elegance does not by itself count as a good reason. A good example would be the rules for operator precedence: the C rules are quite a bit different from what I would design if I was starting from scratch, but the benefits from better rules just aren't enough to make it worth being different from all the other languages in the C family.

Semantically-rich static program model

I have struggled to find the right phrase to describe this. It is a generalization of static typing. The idea is that the language should enable programs to describe their semantic properties in a machine-readable way. The objective is to enable tools to construct a model of the program that incorporates these properties, and then use that model to help the developer write correct programs. This ties up with the cognitive load point. A semantically-rich model enables more powerful tools, which help reduce the effective cognitive load on the programmer.

"Static" means that a tool can build a model of the program just by analysing the source code, without needing to execute the program. Often this is called "compile-time", but that doesn't seem appropriate for the way an IDE will use this model. Visual Basic used to call it "design time", but that seems a bit narrow too: continued maintenance is just as important as initial design.

For types, it means we want a static type system. But our approach to static typing is pragmatic. The static type system is there to help the programmer. We don't want the static system to be so sophisticated or so inflexible that it becomes an obstacle to writing programs. The goal is not to statically type as much as possible, but to statically type to the extent that it is likely to be helpful to the programmer writing the kinds of program for which we intend Ballerina to be used.

Types are just one kind of semantic richness. There are many others.

  • Sequence diagrams depend on building a model of the program where sends and receives are matched up.
  • Documentation that is structured, not simply free-form comments, can be checked for consistency with the program, and can be made available through the IDE.
  • Properties of services and listeners can be used to automate deployment of Ballerina programs to the cloud.
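The first point in the list above, matching up sends and receives, can be illustrated with a toy static check in Python. This is only a sketch under strong assumptions and not the Ballerina compiler's algorithm: each worker is a straight-line list of operations with no branching, sends are modelled as non-blocking, and `check_workers` is a hypothetical name. It pairs each receive with the send that feeds it and reports workers that can never finish.

```python
from collections import deque


def check_workers(program):
    """Pair sends with receives and report workers that stay blocked.

    `program` maps a worker name to a list of ("send", peer) or
    ("recv", peer) operations. Returns (pairs, stuck), where each pair is
    (sender, send_index, receiver, recv_index).
    """
    position = {worker: 0 for worker in program}
    in_flight = {}  # (sender, receiver) -> indices of sends not yet received
    pairs = []
    progressed = True
    while progressed:
        progressed = False
        for worker, ops in program.items():
            while position[worker] < len(ops):
                index = position[worker]
                kind, peer = ops[index]
                if kind == "send":
                    in_flight.setdefault((worker, peer), deque()).append(index)
                else:
                    pending = in_flight.get((peer, worker))
                    if not pending:
                        break  # blocked until the peer sends
                    pairs.append((peer, pending.popleft(), worker, index))
                position[worker] += 1
                progressed = True
    stuck = sorted(w for w, ops in program.items() if position[w] < len(ops))
    return pairs, stuck
```

Each returned pair corresponds to one horizontal arrow in a sequence diagram; a non-empty `stuck` list means the workers named in it are waiting on messages that will never arrive.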


Platform

There's a distinction between the core of a programming language, which defines the syntax of the language and the semantics of that syntax, and the surrounding ecosystem. Often the core comes first, and the ecosystem develops organically as the core gains popularity. A lot of the utility of any language comes from the surrounding ecosystem.

In Ballerina, we refer to the core as "the language" - it's the part that's defined in the language specification. With Ballerina, the language has been designed in conjunction with key components of the surrounding ecosystem, which we call the "platform".

The platform includes:

  • a standard library
  • a centralized module repository, and the tooling needed to support that
  • a documentation system (based on Markdown)
  • a testing framework
  • extensions/plug-ins for popular IDEs (notably Visual Studio Code).

This all takes a lot of work, and is a big factor in why Ballerina has required such a large investment of resources from WSO2.

Multiple implementations

Although the current implementation of Ballerina compiles to JVM byte codes, Ballerina is emphatically not a JVM language. We are planning to do an implementation that compiles directly to native code and we've started to look at using LLVM for this. I suspect that an implementation targeting WebAssembly will also be important long-term.

We have been careful to ensure that the language semantics, particularly as regards concurrency, are not tied to the JVM. This was part of the motivation for the initial proof-of-concept implementation approach, which compiled into bytecode for its own virtual machine (BVM), which was then interpreted by a runtime written in Java. Although the 1.0 implementation compiles directly to the JVM, it is not an entirely straightforward mapping; it takes some tricks to implement the Ballerina concurrency semantics (similar to how Kotlin implements coroutines).

There are languages that are defined by an implementation and there are languages defined by a specification. For a language with multiple implementations, it is much better if the language is defined by a specification, rather than by the idiosyncrasies of a particular implementation. The Ballerina language is defined by its specification. This specification does not in any way depend on Java.

Initially, the specification was a partial description of the implementation. But now we have evolved to a situation where the implementation is done based on the specification. From a language design point of view, we are ready for multiple implementations. It is "just" a matter of finding resources to do the implementation. One of my hopes in writing this sequence of blog posts is that somebody outside WSO2 will feel inspired to do their own implementation. We would be more than happy to work with anybody who wants to take this on.


There has long been a distinction, originally due to John Ousterhout, between systems programming languages, and scripting or glue languages. Systems programming languages are statically typed, high performance and designed for programming in the large. Scripting/glue languages are dynamically typed, low performance and designed for programming in the small.

There are elements of truth in this distinction, but I see it more as a spectrum than as a dichotomy, and I see Ballerina as being somewhere in the middle of that spectrum. It has static typing, but it's much less rigid than the kind of static typing that systems programming languages have. It is capable of decent performance: it should be possible to make it quite a bit faster than Python, but it will never rival Rust or C++. It's not designed for programs with hundreds of thousands of lines of code, but it's also not designed for one-liners. Here's how I would place Ballerina on the spectrum relative to some other languages:

  • Assembly
  • Rust, C
  • C++
  • Go, Java, C#
  • Ballerina
  • TypeScript
  • Python, JavaScript
  • PowerShell, Bourne shell, TCL, AWK

Go is a bit hard to place relative to Java/C#. In some ways, it's more on the systems side (no VM); in some ways, it's more on the scripting side (typing). I would put Ballerina between Go and TypeScript.

In future posts, I will get into the concrete language features that these design goals have led us to. The details of the core language are in the language specification. The rest of the platform does not yet have proper specifications, but there is lots of documentation on the web site.


Ballerina Programming Language - Part 0: Context

Well, it's been 9 years since my last blog post. It's been an eventful period in real life: I got married, we have two children, I became a Thai citizen, built a house and had major back surgery.

For the last 18 months, I have been working on the design of a new programming language called Ballerina. Version 1.0 of Ballerina has just been released, so now is a good time to start explaining what it's all about. In subsequent posts, I will delve into the technical details, but in this post I want to provide some context: the "who" and the "why".

TL;DR Ballerina was designed to be the core of a language-centric, cloud-native approach to enterprise integration. It comes from WSO2, which is a major open source enterprise integration vendor. I have been working on the language design and specification. I think it has potential beyond the world of enterprise integration.

The main person behind Ballerina is Sanjiva Weerawarana. I've known Sanjiva since around 1999 (20 years!), when we were both on the W3C XSL WG doing XPath and XSLT 1.0. Sanjiva at that time was working for IBM Research (where his boss at one point was Sharon Adler, who I had worked with on the ISO DSSSL committee).

This was the era of peak XML, before JSON was invented, and people were using XML for all sorts of things for which it was not very well-suited, including SOAP and the whole Web Services stack built on top of that. Sanjiva worked on several important parts of that including WSDL and BPEL.

Around 2005, Sanjiva decided he wanted to leave IBM and start a company with some fellow IBMers. He is from Sri Lanka, and wanted to go back. At that time, I was working for the Thai government. I had persuaded them to start an open source promotion activity, and I was running that for them (one day I should write a blog about that).

On Boxing Day 2004, there was a huge tsunami in the Indian Ocean, which was a disaster for several countries including Thailand and Sri Lanka. As part of the recovery process, the Thai government had organized an international IT conference in Phuket at the beginning of 2005. Sanjiva came to talk about Sahana, which was an effort started in Sri Lanka to use open source to help with recovery from the tsunami.

On the sidelines of the conference Sanjiva pitched me the idea for the company, at that time called Serendib Systems (the word serendipity comes from Serendip, which is an old name for Sri Lanka). The idea was to do open source related to web services, based in Sri Lanka. It was at the intersection of a number of my main interests at the time (XML and open source in developing countries), and I had confidence in Sanjiva, so it wasn't a hard decision to invest.

The name was changed to WSO2 (WS as in web services, O2 as in oxygen), Sanjiva took the role of CEO and I joined the board. WSO2 has grown steadily in the 14 years since it was founded, and now has about 600 employees. It has remained an open source company and it has developed a comprehensive open source enterprise integration platform. You may well never have heard of WSO2; we have always been rather better at the technical side of things than the marketing side. But we are actually a major vendor in the open source enterprise integration space, with lots of global Fortune 500 customers. In fact, there’s some Gartner report that says we are the world’s #1 open source integration vendor, although I’m not quite sure on what metric.

For quite some time, the workhorse of enterprise integration has been the Enterprise Service Bus (ESB). An ESB sends and receives network messages over a variety of transports, and there is a configuration language, typically in XML, that describes the flow of these messages. The configuration language can be seen as a domain-specific language (DSL) for integration. It supports abstractions like mediators, endpoints, proxy services and scheduled tasks, which allow a given message flow to be described at a higher-level than would be possible if the equivalent code were written in a programming language such as Java or Go. ESB products (including WSO2's) typically include a GUI for editing the configuration language. The ESB's higher-level abstractions allow for a much more useful graphical view than would be possible with a solution that was written in a programming language.

The fact that an ESB is not a full programming language has important consequences. It means that at a certain point you fall off a cliff: there are things you simply cannot express in the XML configuration language. ESBs typically deal with this by allowing you to write extensions in Java. In practice, this means that complex solutions are written as a combination of XML configuration and Java extensions. This creates a number of problems. First, the ESB is tied to Java. 10 years ago that wasn't really a problem, but increasingly Java is the new COBOL. The cool kids are interested in Go, TypeScript or Rust and would not even consider Java. Oracle's stewardship of Java does not help. Second, the Java extensions are a black box as far as the graphical interface is concerned. Third, using multiple languages creates additional complexity for many aspects of the software development process: build, deployment, debugging. Fourth, it's bad in terms of the cognitive load that it places on the developer team: the developers have to learn two quite different languages, and continually switch gears between them.

The other fundamental problem with the ESB concept is that it is designed for a centralized deployment model. The idea is that the IT department of an enterprise runs the Enterprise Service Bus for the entire enterprise. It is not only the large footprint of an ESB that pushes in this direction, but also the licensing model: ESBs are typically not cheap and are licensed on a per-server basis. If you think of the XML configuration language as a domain-specific programming language, and of the ESB as the runtime for that language, you in effect have one large program, controlling integration across the entire enterprise. Furthermore, this program is not written in a pleasant, modern programming language, with support for modularity, but is rather just a pile of XML. As you can imagine, this is not good for agility or DevOps.

This is the background that led to the creation of Ballerina. The high-level goal is to provide the foundation of a new approach to enterprise integration that is a better fit for current computing trends than the ESB. Obviously, the cloud is a hugely important part of this. The Ballerina concept evolved over a number of years. I see three stages:

  1. Let’s do a better DSL that looks more like a programming language!
  2. Let’s make it a full programming language!
  3. Let’s take a shot at becoming a mainstream programming language!

Stage 2 marks the start of the Ballerina project, and was when the name was chosen; that happened in August 2016.

My first involvement with Ballerina was at the beginning of 2017, when Sanjiva asked me to help with the design of the language support for XML. But I only started to get really deeply involved in Ballerina in February 2018. At that point there was already a working, proof-of-concept implementation. Sanjiva asked me to help write a language specification.

When I started, we did not think it would take all that long for me to write a specification. We were completely wrong about that! It's been 18 months already, and it is still a work-in-progress. What happened is that as we dug into the details of the language, it became apparent that there was a lot of scope for improvement in the design. The job turned out to be more about refining and evolving the language design, rather than just documenting what had been implemented. As it became clearer that the goal was to try eventually to become a mainstream programming language, so the quality bar for the implementation needed to be raised.

Sanjiva's primary area of expertise is distributed systems, and WSO2's collective expertise is centered around enterprise middleware, rather than programming language design and implementation. When they started the Ballerina project, I think they underestimated the enormity of the project that they had taken on, as did I to some extent. As I have been wrestling with the Ballerina language design, I have gained a much better appreciation of just how hard programming language design is. I have looked at many other programming languages for inspiration. I've been incredibly impressed by how good the current generation of programming languages are. I would particularly highlight TypeScript, Go, Rust and Kotlin. Each of them has a very different language concept, but every one of them has done an amazing job of designing a programming language that realizes their concept. I take my hat off to their designers.

I should say something about what a version of 1.0 means. I should first explain that we make a distinction between the implementation version and the specification version. 1.0.0 is the implementation version. Language specifications are labelled chronologically (it's a living standard!). The 1.0.0 implementation is based on the language specification labelled 2019R3, which means the 3rd release of 2019.

1.0 does not mean that we have got either the language design or implementation to where we want it. If we lived in a world unsullied by commercial or competitive reality, we could easily spend a couple of years extending and improving the design and implementation. But WSO2 is not a huge company, and we have already made a very substantial investment in Ballerina (of the order of 50 engineers over 3 years). So we need to get something out there, so that we can get some proof points to justify continued investment. The benchmark for 1.0 is whether it works better for enterprise integration than our current ESB-based product. It needs to be sufficiently stable and performant that we can support it in production for enterprise customers.

We also have a reasonable degree of alignment between the language specification and the compiler: what the compiler implements is a subset of what the specification describes, with a couple of caveats. The first caveat is that there are some non-core features that are not quite stable. These are labelled "preview" in the specification. We expect to stabilise these soon, and that will involve some minor incompatible changes. The second caveat is that the implementation has some experimental features, which are not in the specification; we plan that the language will eventually include features that provide similar functionality.

The language design described by the current specification has two fundamental features that are unique (at least not part of any mainstream programming language). Its combination of other features is also unique: each feature is individually in some language, but no language has all of them. I think the language design is interesting not just for enterprise integration, but for any application which is mainly about combining services, whether consuming them or providing them. As things move to the cloud, more and more applications will fall into this category. Although the current state of the language design is interesting, I think the potential is even more interesting. Over the next year or two, we will stabilize more of the integration-oriented language features, which will make Ballerina quite different from any other programming language. Unfortunately, it takes a lot of work to get the general-purpose features solid and that has to be done before the more domain-specific features can be finalized.

Overall, the 2019R3 language design and the 1.0 implementation are an initial, stable step, but there is still a long way to go.

In future posts, I will get into the design of the language. In the meantime, you can try out the implementation and read the specification. The design process was initially quite closed, but has gradually become more open. Most of the discussion on the spec happens in issues in the spec's GitHub repository. Major new language features have public proposals. Comments and suggestions are welcome; the best way to provide input is to open a new issue.

See the next post in the series.


More on MicroXML

There's been lots of useful feedback to my previous post, both in the comments and on xml-dev, so I thought I would summarize my current thinking.

It's important to be clear about the objectives. First of all, MicroXML is not trying to replace or change XML.  If you love XML just as it is, don't worry: XML is not going away.  Relative to XML, my objectives for MicroXML are:

  1. Compatible: any well-formed MicroXML document should be a well-formed XML document.
  2. Simpler and easier: easier to understand, easier to learn, easier to remember, easier to generate, easier to parse.
  3. HTML5-friendly, thus easing the creation of documents that are simultaneously valid HTML5 and well-formed XML.

JSON is a good, simple, extensible format for data.  But there's currently no good, simple, extensible format for documents. That's the niche I see for MicroXML. Actually, extensible is not quite the right word; generalized (in the SGML sense) is probably better: I mean something that doesn't build-in tag-names with predefined semantics. HTML5 is extensible, but it's not generalized.

There are a few technical changes that I think are desirable.

  • Namespaces. It's easier to start simple and add functionality later, rather than vice-versa, so I am inclined to start with the simplest thing that could possibly work: no colons in element or attribute names (other than xml:* attributes); "xmlns" is treated as just another attribute. This makes MicroXML backwards compatible with XML Namespaces, which I think is a big win.
  • DOCTYPE declaration.  Allowing an empty DOCTYPE declaration <!DOCTYPE foo> with no internal or external subset adds little complexity and is a huge help on HTML5-friendliness. It should be a well-formedness constraint that the name in the DOCTYPE declaration match the name of the document element.
  • Data model. It's a fundamental part of XML processing that <foo/> is equivalent to <foo></foo>.  I don't think MicroXML should change that, which means that the data model should not have a flag saying whether an element uses the empty-element syntax. This is inconsistent with HTML5, which does not allow these two forms to be used interchangeably. However, I think the goal of HTML5-friendliness has to be balanced against the goal of simple and easy and, in this case, I think simple and easy wins. For the same reason, I would leave the DOCTYPE declaration out of the data model.
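The equivalence of <foo/> and <foo></foo> mentioned in the data model item can be observed with an ordinary XML parser. The following is only a rough sketch using Python's standard xml.etree.ElementTree as a stand-in for a MicroXML processor; the data_model helper is a hypothetical name introduced for illustration, and representing text as strings is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

def data_model(elem):
    """Reduce a parsed element to (name, attributes, content).

    Nothing in the result records whether the empty-element syntax was used.
    """
    content = []
    if elem.text:
        content.append(elem.text)
    for child in elem:
        content.append(data_model(child))
        if child.tail:
            content.append(child.tail)
    return (elem.tag, dict(elem.attrib), content)

# The same element written with the two syntaxes.
empty_tag = data_model(ET.fromstring('<foo a="1"/>'))
start_end = data_model(ET.fromstring('<foo a="1"></foo>'))
```

Both calls produce the same tuple, which is the behaviour that a data model without an empty-element flag would preserve.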

Here's an updated grammar.

# Documents
document ::= comments (doctype comments)? element comments
comments ::= (comment | s)*
doctype ::= "<!DOCTYPE" s+ name s* ">"
# Elements
element ::= startTag content endTag
          | emptyElementTag
content ::= (element | comment | dataChar | charRef)*
startTag ::= '<' name (s+ attribute)* s* '>'
emptyElementTag ::= '<' name (s+ attribute)* s* '/>'
endTag ::= '</' name s* '>'
# Attributes
attribute ::= attributeName s* '=' s* attributeValue
attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
                 | "'" ((attributeValueChar - "'") | charRef)* "'"
attributeValueChar ::= char - ('<'|'&')
attributeName ::= "xml:"? name
# Data characters
dataChar ::= char - ('<'|'&'|'>')
# Character references
charRef ::= decCharRef | hexCharRef | namedCharRef
decCharRef ::= '&#' [0-9]+ ';'
hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
namedCharRef ::= '&' charName ';'
charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
# Comments
comment ::= '<!--' (commentContentStart commentContentContinue*)? '-->'
# Enforce the HTML5 restriction that comments cannot start with '-' or '->'
commentContentStart ::= (char - ('-'|'>')) | ('-' (char - ('-'|'>')))
# As in XML 1.0
commentContentContinue ::= (char - '-') | ('-' (char - '-'))
# Names
name ::= nameStartChar nameChar*
nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
                | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
# White space
s ::= #x9 | #xA | #xD | #x20
# Characters
char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
surrogateChar ::= [#xD800-#xDFFF]
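
As a rough illustration of the name and attributeName productions in the grammar above, here is a minimal Python sketch that turns the nameStartChar and nameChar ranges into regular expressions. The function names is_name and is_attribute_name are hypothetical; this is not a MicroXML parser, only a check of the two name rules.

```python
import re

# Character ranges copied from nameStartChar and nameChar in the grammar above.
NAME_START = (
    "A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + "0-9\\-.\u00B7\u0300-\u036F\u203F-\u2040"

# name ::= nameStartChar nameChar*   (no colons allowed)
NAME_RE = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")
# attributeName ::= "xml:"? name
ATTRIBUTE_NAME_RE = re.compile(f"(?:xml:)?[{NAME_START}][{NAME_CHAR}]*")

def is_name(text: str) -> bool:
    """True if text matches the 'name' production."""
    return NAME_RE.fullmatch(text) is not None

def is_attribute_name(text: str) -> bool:
    """True if text matches the 'attributeName' production."""
    return ATTRIBUTE_NAME_RE.fullmatch(text) is not None
```

Under these rules "xmlns" and "xml:lang" are acceptable attribute names, while "xmlns:x" is not, because the only colon permitted is the one in the "xml:" prefix.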



There's been a lot of discussion on the xml-dev mailing list recently about the future of XML.  I see a number of different possible directions.  I'll give each of these possible directions a simple name:

  • XML 2.0 - by this I mean something that is intended to replace XML 1.0, but has a high degree of backward compatibility with XML 1.0;
  • XML.next - by this I mean something that is intended to be a more functional replacement for XML, but is not designed to be compatible (however, it would be rich enough that there would presumably be a way to translate JSON or XML into it);
  • MicroXML - by this I mean a subset of XML 1.0 that is not intended to replace XML 1.0, but is intended for contexts where XML 1.0 is, or is perceived as, too heavyweight.

I am not optimistic about XML 2.0. There is a lot of inertia behind XML, and anything that is perceived as changing XML is going to meet with heavy resistance.  Furthermore, backwards compatibility with XML 1.0 and XML Namespaces would limit the potential for producing a clean, understandable language with really substantial improvements over XML 1.0.

XML.next is a big project, because it needs to tackle not just XML but the whole XML stack. It is not something that can be designed by a committee from nothing; there would need to be one or more solid implementations that could serve as a basis for standardization.  Also given the lack of compatibility, the design will have to be really compelling to get traction. I have a lot of thoughts about this, but I will leave them for another post.

In this post, I want to focus on MicroXML. One obvious objection is that there is no point in doing a subset now, because the costs of XML complexity have already been paid.  I have a number of responses to this. First, XML complexity continues to have a cost even when XML parsers and other tools have been written; it is an ongoing cost to users of XML and developers of XML applications. Second, the main appeal of MicroXML should be to those who are not using XML, because they find XML overly complex. Third, many specifications that support XML are in fact already using their own ad-hoc subsets of XML (eg XMPP, SOAP, E4X, Scala). Fourth, this argument applied to SGML would imply that XML was pointless.

HTML5 is another major factor. HTML5 defines an XML syntax (ie XHTML) as well as an HTML syntax. However, there are a variety of practical reasons why XHTML, by which I mean XHTML served as application/xhtml+xml, isn't common on the Web. For example, IE doesn't support XHTML; Mozilla doesn't incrementally render XHTML.  HTML5 makes it possible to have "polyglot" documents that are simultaneously well-formed XML and valid HTML5.  I think this is potentially a superb format for documents: it's rich enough to represent a wide range of documents, it's much simpler than full HTML5, and it can be processed using XML tools. There's a W3C WD for this. The WD defines polyglot documents in a slightly different way, requiring them to produce the same DOM when parsed as XHTML as when parsed as HTML; I don't see much value in this, since I don't see much benefit in serving documents as application/xhtml+xml.  The practical problem with polyglot documents is that they require the author to obey a whole slew of subtle lexical restrictions that are hard to enforce using an XML toolchain and a schema language. (Schematron can do a bit better here than RELAX NG or XSD.)

So one of the major design goals I have for MicroXML is to facilitate polyglot documents.  More precisely the goal is that a document can be guaranteed to be a valid polyglot document if:

  1. it is well-formed MicroXML, and
  2. it satisfies constraints that are expressed purely in terms of the MicroXML data model.

Now let's look in detail at what MicroXML might consist of. (When I talk about HTML5 in the following, I am talking about its HTML syntax, not its XML syntax.)

  • Specification. I believe it is important that MicroXML has its own self-contained specification, rather than being defined as a delta on existing specifications.
  • DOCTYPE declaration. Clearly the internal subset should not be allowed.  The DOCTYPE declaration itself is problematic. HTML5 requires valid HTML5 documents to start with a DOCTYPE declaration.  However, HTML5 uses DOCTYPE declarations in a fundamentally different way to XML: instead of referencing an external DTD subset which is supposed to be parsed, it tells the HTML parser what parsing mode to use.  Another factor is that almost the only thing that the XML subsets out there agree on is to disallow the DOCTYPE declaration.  So my current inclination is to disallow the DOCTYPE declaration in MicroXML. This would mean that MicroXML does not completely achieve the goal I set above for polyglot documents. However, you would be able to author a <body> or a <section> or an <article> as MicroXML; this would then have to be assembled into a valid HTML5 document by a separate process (albeit a very simple one). It would be great if HTML5 provided an alternate way (using attributes or elements) to declare that an HTML document be parsed in standards mode. Perhaps a boolean "standard" attribute on the <meta> element?
  • Error handling. Many people in the HTML community view XML's draconian error handling as a major problem.  In some contexts, I have to agree: it is not helpful for a user agent to stop processing and show an error, when a user is not in a position to do anything about the error. I believe MicroXML should not impose any specific error handling policy; it should restrict itself to specifying when a document is conforming and specifying the instance of the data model that is produced for a conforming document. It would be possible to have a specification layered on top of MicroXML that would define detailed error handling (as for example in the XML5 specification).
  • Namespaces. This is probably the hardest and most controversial issue. I think the right answer is to take a deep breath and just say no. One big reason is that HTML5 does not support namespaces (remember, I am talking about the HTML syntax of HTML5). Another reason is that the basic idea of binding prefixes to URIs is just too hard; the WHATWG wiki has a good page on this. The question then becomes how does MicroXML handle the problems that XML Namespaces addresses. What do you do if you need to create a document that combines multiple independent vocabularies? I would suggest two mechanisms:
    • I would support the use of the xmlns attribute (not xmlns:x, just bare xmlns). However, as far as the MicroXML data model is concerned, it's just another attribute. It thus works in a very similar way to xml:lang: it would be allowed only where a schema language explicitly permits it; semantically it works as an inherited attribute; it does not magically change the names of elements.
    • I would also support the use of prefixes.  The big difference is that prefixes would be meaningful and would not have to be declared.  Conflicts between prefixes would be avoided by community cooperation rather than by namespace declarations.  I would divide prefixes into two categories: prefixes without any periods, and prefixes with one or more periods.  Prefixes without periods would have a lightweight registration procedure (ie a mailing list and a wiki); prefixes with periods would be intended for private use only and would follow a reverse domain name convention (e.g. com.jclark.foo). For compatibility with XML tools that require documents to be namespace-well-formed, it would be possible for MicroXML documents to include xmlns:* attributes for the prefixes they use (and a schema could require this). Note that these would be attributes from the MicroXML perspective. Alternatively, a MicroXML parser could insert suitable declarations when it is acting as a front-end for a tool that expects a namespace-well-formed XML infoset.
  • Comments. Allowed, but restricted to be HTML5-compatible; HTML5 does not allow the content of a comment to start with - or ->.
  • Processing instructions. Not allowed. (HTML5 does not allow processing instructions.)
  • Data model.  The MicroXML specification should define a single, normative data model for MicroXML documents. It should be as simple as possible:
    • The model for a MicroXML document consists of a single element.
    • Comments are not included in the normative data model.
    • An element consists of a name, attributes and content.
    • A name is a string. It can be split into two parts: a prefix, which is either empty or ends in a colon, and a local name.
    • Attributes are a map from names to Unicode strings (sequences of Unicode code-points).
    • Content is an ordered sequence of Unicode code-points and elements.
    • An element probably also needs to have a flag saying whether it's an empty element. This is unfortunate but HTML5 does not treat an empty element as equivalent to a start-tag immediately followed by an end-tag: elements like <br> cannot have an end-tag, and elements that can have content such as <a> cannot use the empty element syntax even if they happen to be empty. (It would be really nice if this could be fixed in HTML5.)
  • Encoding. UTF-8 only. Unicode in the UTF-8 encoding is already used for nearly 50% of the Web. See this post from Google.  XML 1.0 also requires support for UTF-16, but UTF-16 is not in my view used sufficiently on the Web to justify requiring support for UTF-16 but not other more widely used encodings like US-ASCII and ISO-8859-1.
  • XML declaration. Not allowed. Given UTF-8 only and no DOCTYPE declarations, it is unnecessary. (HTML5 does not allow XML declarations.)
  • Names. What characters should be allowed in an element or attribute name? I can see three reasonable choices here: (a) XML 1.0 4th edition, (b) XML 1.0 5th edition or (c) the ASCII-only subset of XML name characters (same in 4th and 5th editions). I would incline to (b) on the basis that (a) is too complicated and (c) loses too much expressive power.
  • Attribute value normalization. I think this has to go.  HTML5 does not do attribute value normalization. This means that it is theoretically possible for a MicroXML document to be interpreted slightly differently by an XML processor than by a MicroXML processor.  However, I think this is unlikely to be a problem in practice.  Do people really put newlines in attribute values and rely on their being turned into spaces?  I doubt it.
  • Newline normalization. This should stay.  It makes things simpler for users and application developers.  HTML5 has it as well.
  • Character references.  Without DOCTYPE declarations, only the five built-in character entities can be referenced. Things could be simplified a little by allowing only hex or only decimal numeric character references, but I don't think this is worthwhile.
  • CDATA sections. I think it is best to disallow them. (HTML5 allows CDATA sections only in foreign elements.) XML 1.0 does not allow the three-character sequence ]]> to occur in content. This restriction becomes even more arbitrary and ugly when you remove CDATA sections, so I think it is simpler just to require > to always be entered using a character reference in content.
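
To make the data model and the prefix convention from the list above more concrete, here is a minimal Python sketch. The class and function names (Element, split_name, is_private_prefix) are hypothetical, and storing runs of code points as Python strings is a simplification made for the sketch; the tentative empty-element flag is included only because the list says it is probably needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

@dataclass
class Element:
    """An element: a name, a map of attributes, and ordered content."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    # Content mixes text (runs of code points) and child elements, in order.
    content: List[Union[str, "Element"]] = field(default_factory=list)
    # Tentative flag: True if the empty-element syntax was used.
    empty: bool = False

def split_name(name: str) -> Tuple[str, str]:
    """Split a name into (prefix, local name).

    The prefix is either empty or ends in a colon.
    """
    prefix, colon, local = name.partition(":")
    if not colon:
        return ("", name)
    return (prefix + ":", local)

def is_private_prefix(prefix: str) -> bool:
    """Prefixes containing a period are for private use; others would be registered."""
    return "." in prefix

# A whole document is just a single element.
doc = Element(
    name="html:p",
    attributes={"xml:lang": "en"},
    content=["Hello", Element(name="br", empty=True)],
)
```

For example, split_name("com.jclark.foo:bar") yields the prefix "com.jclark.foo:", which is_private_prefix classifies as private, whereas the prefix of "html:p" has no period and would fall under the lightweight registration procedure.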

Here's a complete grammar for MicroXML (using the same notation as the XML 1.0 Recommendation):

# Documents
document ::= (comment | s)* element (comment | s)*
element ::= startTag content endTag
          | emptyElementTag
content ::= (element | comment | dataChar | charRef)*
startTag ::= '<' name (s+ attribute)* s* '>'
emptyElementTag ::= '<' name (s+ attribute)* s* '/>'
endTag ::= '</' name s* '>'
# Attributes
attribute ::= name s* '=' s* attributeValue
attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
                 | "'" ((attributeValueChar - "'") | charRef)* "'"
attributeValueChar ::= char - ('<'|'&')
# Data characters
dataChar ::= char - ('<'|'&'|'>')
# Character references
charRef ::= decCharRef | hexCharRef | namedCharRef
decCharRef ::= '&#' [0-9]+ ';'
hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
namedCharRef ::= '&' charName ';'
charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
# Comments
comment ::= '<!--' (commentContentStart commentContentContinue*)? '-->'
# Enforce the HTML5 restriction that comments cannot start with '-' or '->'
commentContentStart ::= (char - ('-'|'>')) | ('-' (char - ('-'|'>')))
# As in XML 1.0
commentContentContinue ::= (char - '-') | ('-' (char - '-'))
# Names
name ::= (simpleName ':')? simpleName
simpleName ::= nameStartChar nameChar*
nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
                | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
# White space
s ::= #x9 | #xA | #xD | #x20
# Characters
char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
surrogateChar ::= [#xD800-#xDFFF]
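
The comment productions are the least obvious part of the grammar above, so here is a small Python sketch that transcribes them into a regular expression. The names COMMENT_RE and is_comment are hypothetical, and the sketch deliberately ignores the forbiddenChar restriction on char, so it is slightly more permissive than the grammar.

```python
import re

# comment ::= '<!--' (commentContentStart commentContentContinue*)? '-->'
# commentContentStart:    any char except '-' or '>', or '-' followed by such a char
# commentContentContinue: any char except '-', or '-' followed by a non-'-' char
COMMENT_RE = re.compile(r"<!--(?:(?:[^->]|-[^->])(?:[^-]|-[^-])*)?-->")

def is_comment(text: str) -> bool:
    """True if text is a single comment under the productions above."""
    return COMMENT_RE.fullmatch(text) is not None
```

With this transcription, "<!-- hi -->" and the empty comment "<!---->" are accepted, while comments whose content starts with ">" or "->", or contains "--", are rejected.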