XLIFF of the Future: an Interview with David Filip

Posted by Jim Compton on Thu, Jul 27, 2017 @ 10:10 AM

In my article Is It Time to Kiss the Digital File Goodbye?, I cite the XLIFF OMOS (“XML Localization Interchange File Format Object Model and Other Serializations”) initiative as evidence that a file-less future is achievable in the localization industry.

Since then, I had the pleasure of chatting about the project in more detail with David Filip—technologist, research fellow at ADAPT, and chair of the XLIFF OMOS committee.

Read on to geek out about the future of XLIFF and how you can take part in shaping it.

JIM: Can you give me the high-level overview of what gave birth to the project, and what it is trying to accomplish?

DAVID: We were working hard on the XLIFF 2.0 project. That was back in 2012-2013; the major work on the technical elaboration of the standard—the traditional XML serialization. We were almost done. In 2014 we were basically in the formal approval rounds, getting it rubber-stamped by the wider standardization organization. And some Microsoft people came with this idea to see a JSON (JavaScript Object Notation) serialization of XLIFF.

A few people got quite enthusiastic about that, and Yves Savourel from Okapi started immediately prototyping possible JSON serializations.

We knew that we first needed to abstract a general model that is not XML-centric, and from that abstract object model go on to other serializations, not only JSON. JSON is obviously the first target because it is so sexy nowadays. But once you have the abstract object model, you arrive at what “XLIFF OMOS” or “XOMOS” means: Object Model and Other Serializations.

It has XLIFF in the name to make it absolutely clear that it is a sister committee to the original XLIFF. We were cautioned by the OASIS administration that we cannot actually do this work on the original XLIFF committee, because it was mandated back in 2001 to only work on XML vocabularies. It is really important in standardization—for your legal footing—to do only the chartered work. If you don’t do only the chartered work, you’re basically leaving the safe haven of the standardization body and are no longer protected by their IPR policy, because this is only valid for the chartered scope.

David Filip: technologist, research fellow at ADAPT, and chair of the XLIFF OMOS committee

There was a clear business need for doing the other things—for extracting the abstract business constraints and object model constraints from the XML serialization—because there’s no doubt great value in it. But you need to unleash it. You need to make it available for the other technologies; you need to decouple it from the XML serialization first. 

JIM: What in terms of industry capabilities becomes possible with this alternative serialization for XLIFF?

DAVID: The honest answer: not much now, because it’s a brand new technology. Currently we have three strands of work in the XOMOS committee, and two of them are directly relevant to what you are interested in. One is the object model, which is basically the OMLIFF: Object Model for Localization Interchange Fragment Format. In “LIFF” the “FF” is not “file format” but “fragment format”. But still, we want this LIFF to be recognizable as a sister to XLIFF.

I recently started working on the prose representation. It pretty much takes the structure of the XLIFF standard but strips it of XML-isms, making it XML-independent. Already, working on the UML representation helps a lot in decoupling it from XML thinking.

Also, in parallel we work on the JLIFF (JSON Localization Interchange Fragment Format) strand, which is just another GitHub repo. People are looking at XLIFF examples and they’re also looking at the UML diagram from the OM repo, and they’re trying to come up with a suitable JSON representation, captured as a JSON schema. I think the progress is really great. We have attracted Robert van Engelen from Genivia, which is a small API company in the US. Robert, their CTO, is also a professor of computer science. He really is a brain hacker in terms of JSON schema, so we have a great skillset on the committee and representatives from Microsoft, SDL, Intel, Spartan Software, Vistatec, Genivia and Okapi/ENLASO.…

I’m the chair of the committee and am overseeing all three tracks of the development (the third track is on TBX mapping, so not directly relevant here). I am active as the lead editor on the object model, but I am mostly moderating what happens on the JLIFF strand.
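To make the idea of one abstract object model with several serializations concrete, here is a minimal Python sketch. It is an illustration only: the class names `Unit` and `Segment`, the helper functions, and especially the JSON layout are assumptions made for this example and are not taken from the JLIFF schema, which is still in flux. The XML side only mimics the unit/segment/source/target nesting of XLIFF 2.0 and omits the namespace and the enclosing elements.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One source/target pair, independent of any serialization."""
    source: str
    target: str = ""


@dataclass
class Unit:
    """A hypothetical, heavily simplified translation unit."""
    id: str
    segments: List[Segment] = field(default_factory=list)


def to_xml(unit: Unit) -> str:
    """Serialize to an XLIFF-2.0-style <unit> fragment (namespace omitted)."""
    unit_el = ET.Element("unit", {"id": unit.id})
    for seg in unit.segments:
        seg_el = ET.SubElement(unit_el, "segment")
        ET.SubElement(seg_el, "source").text = seg.source
        ET.SubElement(seg_el, "target").text = seg.target
    return ET.tostring(unit_el, encoding="unicode")


def to_json(unit: Unit) -> str:
    """Serialize the same object to an assumed JSON layout (not JLIFF)."""
    payload = {
        "id": unit.id,
        "segments": [
            {"source": s.source, "target": s.target} for s in unit.segments
        ],
    }
    return json.dumps(payload, ensure_ascii=False)


# The same in-memory object feeds both serializations.
unit = Unit(id="u1", segments=[Segment(source="Hello", target="Bonjour")])
xml_fragment = to_xml(unit)
json_fragment = to_json(unit)
```

Neither serializer knows about the other; both read only from the shared model, so adding a third serialization would not touch the existing two.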

JIM: Are there people in the industry who have plans to use the standard, or are developing against it today?

DAVID: The whole project originated from industry need and we do have industry on the committee. We are excited about having the first prototype hopefully ready for LocWorld’s FEISGILTT and XLIFF Symposium on October 31 and November 1st.

It is really moving fast and people are interested. We have Spartan Software, Vistatec, and ENLASO all prototyping as the schema is being developed by Genivia. At this point you obviously can’t build anything production-ready on it, because it’s in flux. But it is a really exciting time, developing a technology from scratch.

Moravia’s Ján Husarčík sent a whole lot of important and useful comments on XLIFF 2.1 during the public review of that next full release of the classical XML serialization. He was also looking at the JLIFF schema and said it would probably be cool for Moravia’s MT broker architecture. This is the first positive voice from outside the committee, I would say.

JIM: Are there other programs that you’re aware of, or other standards that are getting alternative serialization? Are there other “OMOS” programs?

DAVID: Certainly not in the localization industry. I think that we are absolutely at the forefront. And it’s kind of important to do it as fast as possible, basically to prevent people on the JSON front from inventing things from scratch in a slightly different way that would not be semantically interoperable.

In other areas, this is happening all the time. All over the place you can see micro-services and service architectures, messaging architectures and things like that, which need standard formats for exchanging data. So, it’s quite common in industry to have different bindings.

For instance, the CMIS standard (another OASIS standard) is for content management system interoperability, and it works with different bindings. One of them is the XML Atom binding. Another is the JSON browser binding. So it is quite common for standards, in more mature or less fragmented parts of the IT industry, to try to reach both the XML and JSON worlds, offering generalized capabilities through different sorts of serializations and bindings.

JIM: Do you have any advice for how folks can prepare their operations or technology for this new modality for XLIFF?

DAVID: In general I keep telling people you shouldn’t wait for the standard to hit you. You should try to engage with the standard while it’s in the making, not complain later on that something was done in a way that you don’t like. That’s no good.

Standards are an altruistic piece of technology created by the community for the community. And if you are not part of the community, if you are not trying to influence the standard when it’s happening, then you are not making it easy for the standard-makers to make the standard. And you are making it hard for you to adopt it.

So, the best thing is to join the OASIS committee and help develop the standard. You will be aware of what’s happening there, and you will be able to prepare your tools for the standard.

We do the development on GitHub and GitHub is open for general feedback. So you don’t need to be a committee member to give general feedback to the development.

When standards get to the formal approval phase, they go out for public reviews. To engage in the public reviews you don’t need any membership at all; anyone can send feedback to the TC comment list.

At the very least, people should engage with the standard when it gets to the public reviews. We expect a demo of a prototype this autumn and the start of formal publishing procedures of the standard early next year. So the public reviews should start happening sometime in the course of 2018. Then, people should really look out for the public reviews and engage with them. That would be the best way to prepare for it.

Another option for engaging with this type of development is TAPICC, a GALA-initiated project for making “the industry API.” You don’t even need to be a GALA member to participate in TAPICC; you just need to sign up and click through an open source contributor agreement. So GALA TAPICC is another good venue to influence the development of JLIFF and the object model if you aren’t able to do it through OASIS.

Thanks, David, for sharing!

For me, the XLIFF OMOS project is an exciting development, as it represents the possibility of extending XLIFF’s strengths, including its advantage of being a well-adopted standard, to solutions that aren’t file-centric. It’s an innovative step for localization towards our new world of agile, web-based, API-fueled, system-to-system process capability.

What do you think? What new possibilities do you see from having “serialization alternatives” to XLIFF?
