This was our first Demo Night event of 2020, and we were really glad it was hosted in partnership with Myplanet. In this edition of Demo Night, startups from the UK, Latvia, the USA, and Canada showcased amazing applications across various dimensions of voice: utilizing image and speech processing, incorporating emotional intelligence into couples counselling, assigning an address to every location on the planet, supporting a fully voice-based social network, and simplifying the development of voice applications.
Meet the Startups
Liopa is a spin-out from Queen's University Belfast and the Centre for Secure Information Technologies (CSIT). Liopa was incorporated in November 2015 and is commercialising over 10 years of research in the field of speech and image processing, with a particular focus on the fusion of speech and lip movements for robust speech recognition in real-world environments.
What3words is a really simple way to talk about location. It has assigned every 3m square in the world a unique 3-word address that will never change, and its vision is to become a global standard for communicating location. People use what3words to find their tents at festivals, navigate to B&Bs, and direct emergency services to the right place.
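The core idea of mapping fixed grid squares to word triples can be sketched in a few lines. This is a toy illustration only, not what3words' actual algorithm: the wordlist, grid dimensions, and index-to-words encoding below are invented for demonstration. (The real system uses a vocabulary of roughly 40,000 words, which yields about 64 trillion combinations, enough to cover the planet's roughly 57 trillion 3m squares.)

```python
# Toy sketch of grid-square-to-word-triple addressing.
# NOT the real what3words algorithm; wordlist and mapping are invented.

WORDS = ["apple", "river", "stone", "cloud", "maple", "tiger", "ocean", "lamp"]

GRID_COLS = 13_000_000  # rough count of ~3m columns around the equator

def cell_index(row, col):
    """Flatten a (row, col) grid cell into a single integer index."""
    return row * GRID_COLS + col

def index_to_words(index, words=WORDS):
    """Encode an integer index as a 3-word address in base len(words)."""
    n = len(words)
    a = index // (n * n) % n
    b = index // n % n
    c = index % n
    return ".".join([words[a], words[b], words[c]])
```

With only 8 demo words this covers just 512 cells; the point is the shape of the mapping, a deterministic, never-changing word triple per square, not the scale.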
We Thank Them For Supporting The Event
Myplanet is a software studio and certified B Corp that helps the world's most influential organizations craft intelligent software to empower their employees and engage their customers. Myplanet has deep capabilities in data-driven design and AI implementation. Check out their website here.
What languages are supported by Liopa?
Liopa: Yes, so today we only support English. We will support other languages on a business-need basis. For us, as with any other speech recognizer, it's just a matter of developing training data in that language, transcriptions of people saying a certain set of phrases in that language.
What's the processing time like for videos of different lengths? A lot of audio processing systems now have, I'm not gonna say "real-time" but "near real-time", processing, so they can sort of simultaneously upload and get back responses. Is that possible now with your system, or is that something you look to include?
Liopa: Our processing is faster than real-time. What I mean by that is, if you had a clip of five seconds' duration, it would take less than five seconds to process it. What you're talking about there, this sort of immediate response, is a streaming mode where the processing would start while you're still talking. That actually requires a different processing chain. We can do that; generally you'd lose a bit of accuracy when you do. If you've used the voice system on your phone, you might notice when you're talking to it that it'll try to transcribe as you speak, but when you've finished, it will actually go back and maybe change what you said. That's one way around the problem. So the short answer is yes, we can do it, but we don't have a demo of that today. What you saw there was the batch mode: you make a recording, then you upload it to the server, and it produces the results. The idea there is to prove that it works. The streaming option, which is what you suggested, is what we're currently working on.
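The batch-versus-streaming distinction Liopa describes can be sketched as follows. All function names here are hypothetical illustrations, not Liopa's actual API; the point is that streaming yields provisional transcripts after each chunk, and later chunks may revise earlier output, exactly like a phone's live dictation.

```python
# Hedged sketch of the two recognition modes described above.
# `recognize` stands in for any transcription function; these names
# are hypothetical, not Liopa's real interface.

def transcribe_batch(frames, recognize):
    """Batch mode: process the complete recording at once."""
    return recognize(frames)

def transcribe_streaming(frame_chunks, recognize):
    """Streaming mode: yield a provisional transcript after each chunk.
    Each yielded result re-decodes everything seen so far, so earlier
    output can be revised once more context arrives."""
    seen = []
    for chunk in frame_chunks:
        seen.extend(chunk)
        yield recognize(seen)  # provisional; may change on the next chunk
```

The final streaming result matches the batch result; the trade-off mentioned in the answer is that intermediate results, produced with less context, are less accurate.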
What is your current business model (Liopa)?
You mentioned at the last minute the point about privacy, not storing the data, or seeing the data, or analyzing what's going on. I'm just curious how you improve your own models (Asya)?
Is either of you married and for how long?
The freedom to operate for machine-vision-based social applications is heavily constrained: there are literally thousands of patents in this space that could limit this kind of application. Nothing's impossible, so I'd love to know your thoughts on how you currently plan to navigate the world of patent protection in this space?
I'm just wondering if you've got any feedback about the accuracy of the emotion detection. You did the training on a visual data set, which obviously has some correlation with the voice, but there's also the context of the words, the language, and how people communicate in different ways. I'm just interested in the feedback, what you're hearing from the people actually using the application?
How are you doing right now in terms of revenue? Do you feel a lot of traction with your current audience?
What is your business model? So how do you make money? Because I feel like to really roll this out, you'd want to replace the entire addressing system across all applications (What3words)?
My question has to do with sort of the word selection. So that's a lot of grid squares on Earth. I assume you didn't choose them all manually. So my question is, is there a correlation between locations and what words are chosen? And I guess relatedly if you are the lucky business, for example, that is on the square like smelly fart apple. How do you deal with that being your address?
In one of your slides, you mentioned the Uber API. I was curious to know whether Uber agreed to let you access their API. And my other question is: how do you approach the vertical space?
Like a building with hundreds of floors?
I just wanted to understand how the Voiceflow tool is better than Dialogflow, because one thing I observed in Dialogflow was that maintaining the context of the conversation is very difficult. If you say something and then continue it with something else, it is difficult to organize that flow, because it branches into many flows and the last sentence doesn't really retain word context. So how do you manage context in Voiceflow?
You said that Voiceflow supports all of the SSML stuff? Is that the full W3C standard, or is it what Dialogflow and/or Alexa support? They both have limited feature sets for SSML.
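For context on the question above: SSML is the W3C Speech Synthesis Markup Language, and platforms like Alexa and Dialogflow each accept their own subset of it (plus vendor extensions). A minimal sketch of what SSML output looks like, using only tags defined in the W3C SSML 1.1 spec (`<speak>`, `<break>`, `<prosody>`), follows; the helper function is an invented illustration, not part of any platform's API.

```python
# Minimal SSML illustration. <speak>, <break>, and <prosody> are
# standard W3C SSML 1.1 tags that major voice platforms support;
# this helper is a hypothetical example, not a real platform API.

def ssml_response(text, pause_ms=300):
    """Wrap plain text in a simple SSML document with a leading pause."""
    return (
        "<speak>"
        f'<break time="{pause_ms}ms"/>'
        f'<prosody rate="medium">{text}</prosody>'
        "</speak>"
    )

print(ssml_response("Welcome back."))
```

The practical upshot of the question: a tool that claims "all of SSML" still has to render through each platform's synthesizer, so tags outside that platform's supported subset are typically ignored or rejected.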