A language is difficult enough to master with all its nuances. It’s even trickier if you want to come up with complex rules describing its behavior, and it becomes quite a challenge if you want to use those rules to put the right words together to convey the intended meaning. Now expand that from a single language to a few hundred, and you have the perfect playground. A playground where linguists, hackers, and data enthusiasts can get together and create some cool new things.
A few weeks back, Moravia partnered with Kiwi.com to do exactly that. In our first-ever hackathon, we put together sample data from both the localization and travel industries and let a handful of teams play with it. Our hackers came from all walks of life: programmers working for major corporations, freelancers, system architects, data analysts, linguists, researchers, and students. The common denominator was the eagerness to build something cool.
The recipe for a hackathon like this is simple:
- Find a bunch of clever people eager to take things apart and put them back together (not necessarily in the same way).
- Give them shiny tools and interesting data to play with in a cozy place where they can work and (optionally) sleep for a day or two.
- Ensure a continuous supply of coffee, pizza, and beer (if you’re hacking in the Czech Republic).
- Have coaches stir the batch when needed.
- Be impressed with the results. :)
Now let me dive deeper into what the teams built. One created a multilingual iPhone Travel Assistant app. It lets you quickly search for plane tickets and hotels, using speech-to-text for input and text-to-speech for output. It’s a great example of how quickly you can put together a specialized app from existing APIs.
Another team came up with a somewhat cryptic motto: “Dibuk. Read a book!” They put together a web app where you can upload or link to an ebook and get back a list of the most difficult words in the book along with their translations. Let’s be honest, even native speakers could use it, as some words are difficult no matter what. Learning those words in advance helps people stay immersed while reading the book. It’s a nice example of natural language processing at work. The word difficulty baseline comes from an analysis of millions of sentences from Wikipedia, which were passed through a rather advanced lemmatization step. The difficulty score itself is heavily based on the tf-idf principle with some tweaks.
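To give a rough feel for the idea, here is a minimal Python sketch of a tf-idf-style difficulty score. It is not Dibuk’s actual code: the function names (`tokenize`, `document_frequencies`, `hardest_words`), the smoothing formula, and the tiny background corpus are all assumptions made for illustration, and plain lowercasing stands in for real lemmatization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercasing stands in for the lemmatization step.
    return re.findall(r"[a-z']+", text.lower())


def document_frequencies(background_docs):
    # Count in how many background documents each word appears.
    df = Counter()
    for doc in background_docs:
        df.update(set(tokenize(doc)))
    return df


def hardest_words(book_text, background_docs, top_n=3):
    df = document_frequencies(background_docs)
    n_docs = len(background_docs)
    tf = Counter(tokenize(book_text))
    scores = {}
    for word, count in tf.items():
        # Smoothed idf: words that are rare in the background corpus score higher.
        idf = math.log((1 + n_docs) / (1 + df[word])) + 1
        # Dampened term frequency: words repeated in the book matter a bit more.
        scores[word] = idf * (1 + math.log(count))
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


background = [
    "The cat saw an old dog",
    "The dog saw the cat",
    "An old cat sat",
]
print(hardest_words("The cat saw an obstreperous dog", background, top_n=1))
# ['obstreperous']
```

A real system would swap the three toy sentences for a large reference corpus and would probably filter out names and typos, which also look "rare" to a score like this.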
Then there was a team working on a chatbot that provided customer support by pre-processing users’ requests. In a nutshell, it is a flow driven by an analysis of the user’s intent. As long as the user’s intent is unambiguous, the bot is in charge of answering. The moment it becomes unclear what the user needs, the flow switches to a real support person, and the operator receives both the bot’s best guess at the intent and the conversation so far.
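The hand-off logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team’s implementation: the keyword sets, the canned answers, and the names `guess_intent` and `Conversation` are hypothetical, and a real bot would use a trained intent classifier rather than keyword overlap.

```python
import re
from dataclasses import dataclass, field

# Hypothetical intents and answers, for illustration only.
INTENT_KEYWORDS = {
    "refund": {"refund", "money", "cancel"},
    "baggage": {"baggage", "luggage", "bag"},
}
CANNED_ANSWERS = {
    "refund": "You can request a refund from your booking page.",
    "baggage": "Your baggage allowance is listed on your ticket.",
}


def guess_intent(message):
    """Return (best_intent_or_None, is_ambiguous) based on keyword overlap."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {name: len(words & kws) for name, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0:
        return None, True  # nothing matched, so the intent is unclear
    tied = sum(1 for s in scores.values() if s == top) > 1
    return best, tied


@dataclass
class Conversation:
    history: list = field(default_factory=list)

    def handle(self, message):
        self.history.append(("user", message))
        intent, ambiguous = guess_intent(message)
        if not ambiguous:
            reply = CANNED_ANSWERS[intent]
            self.history.append(("bot", reply))
            return {"handled_by": "bot", "reply": reply}
        # Unclear intent: escalate with the best guess and the full transcript.
        return {
            "handled_by": "human",
            "suggested_intent": intent,
            "transcript": list(self.history),
        }


chat = Conversation()
print(chat.handle("I want a refund")["handled_by"])       # bot
print(chat.handle("My bag and my refund")["handled_by"])  # human
```

The second message mentions both a bag and a refund, so the sketch treats it as ambiguous and hands the operator the guess plus everything said so far.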
Last but not least, one of the teams created a large-scale crawler of multilingual news articles (luckily, we provided the teams with plenty of AWS credits). Users could ask a chatbot in their native language to watch out for specific keywords. Whenever an article containing those keywords was published anywhere in the world, in any language, users received a notification along with a translation of the article.
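Here is a rough Python sketch of how such keyword watches might be matched against incoming articles. It is a guess at the general shape, not the team’s design: the `Watch` class, the `notifications_for` function, and the `translate` callable are all hypothetical, and the sketch assumes each article is translated into the user’s language before matching. The toy translator below simply returns a hard-coded sentence where a real system would call a machine translation service.

```python
import re
from dataclasses import dataclass


@dataclass
class Watch:
    user: str
    language: str   # language the user writes keywords in
    keywords: set   # words the user asked the bot to watch for


def notifications_for(article_text, watches, translate):
    """Return (user, matched_keywords, translated_article) for every hit."""
    results = []
    for watch in watches:
        # Assumption: matching happens on a translation into the user's language.
        translated = translate(article_text, watch.language)
        words = set(re.findall(r"\w+", translated.lower()))
        hits = {k.lower() for k in watch.keywords} & words
        if hits:
            results.append((watch.user, sorted(hits), translated))
    return results


def toy_translate(text, target_language):
    # Stand-in for a real translation service (hypothetical).
    return {"de": "Streik am Flughafen Prag"}.get(target_language, text)


watches = [
    Watch("anna", "de", {"Streik"}),
    Watch("tom", "en", {"volcano"}),
]
for user, hits, text in notifications_for("Strike at Prague airport", watches, toy_translate):
    print(user, hits, text)
# anna ['streik'] Streik am Flughafen Prag
```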
And the winner is...
In summary, the results of the 24-hour effort were astonishing. Frankly, it was difficult to pick the winner, as all teams did great, especially considering the time constraints. If you’re curious who won, it was Dibuk, the vocabulary builder, but that’s not the main point. The moral of the whole hackathon is that even if you’re a fairly nimble company, such as Moravia or Kiwi.com, you can still get inspired by the fresh ideas people will have when you let them be creative. Are we going to do another hackathon? Hell yeah!