By Yami Payano, CEO - Sign-Speak and Nicholas Wilkins, CTO - Sign-Speak
Sign-Speak is a first-of-its-kind sign language recognition engine designed to provide equitable access for Deaf and Hard of Hearing individuals. At Sign-Speak, we provide solutions that enable bi-directional communication between English and ASL. To do this, we employ a variety of technologies, from computer vision and machine learning to avatars. For communication to be truly seamless, we strive for the lowest latency possible. To that end, we’ve worked with Verizon to achieve low latency at the edge.
An introduction to Sign-Speak
Sign-Speak is a communication platform that translates between ASL and English so that Deaf and Hard of Hearing people can easily communicate with hearing non-signers. Sign-Speak was born out of the frustrations one of our co-founders, Nikolas Kelly, experienced as a Deaf individual interacting and communicating with hearing non-signers. Specifically, Sign-Speak provides an API that interprets video streams of ASL into spoken or written English (via machine learning), and English streams (both spoken and written) into ASL (via an avatar). This bi-directional communication in ASL is crucial because ASL is the first and primary language of many Deaf and Hard of Hearing individuals. It is a fully-fledged language with its own distinct grammar and syntax. Being asked to constantly use written English is fatiguing and impractical for many individuals.
Our solution operates on any camera-enabled device anywhere, at any time. Specifically, we provide a Software as a Service API that can be integrated into any application or service with minimal impact on end-user performance. To do this, we offload all of our processing to our backend servers, to enable accurate and real-time inference on any device regardless of computing power.
We have several sample solutions built on our API. One notable application is our ASL chatbot, which uses our API to let Deaf and Hard of Hearing individuals communicate easily with internet customer-support chatbots, all while using their native language. The user signs their questions to the chatbot and is guided along by an avatar that prompts them to answer several questions.
Engineering performant ML is no longer enough
Sign-Speak’s real-time inference currently runs on a microservices architecture deployed on Kubernetes and hosted on Google Kubernetes Engine (GKE). Our cluster auto-scales with the request load so that we stay performant under demand. Notably, to drive down costs, we run all of our ML on specialized CPU models. This remains performant and scalable thanks to the asynchronous system we’ve engineered. As a result, most of our end-to-end delay currently comes from transport latency rather than from the model itself.
Interpreter lag time and the importance of latency
All interpreted (or translated) conversations have a “lag time” for processing. Lag time is roughly the time that elapses between a message being uttered in a source language (e.g., English) and its translation into the destination language (e.g., ASL). For Sign-Speak, lag time falls into roughly three buckets:
- Latency: the time it takes for frames to stream from the device to our cloud (600 ms on average)
- Processing: the time it takes for the frames and underlying linguistic information to be processed (300 ms on average)
- Inherent: the time we must wait to receive enough of the full sentence to start translating (varies with the phrase and its phrasal structure)
While the third bucket, inherent lag time, is unavoidable because of differences in grammatical structure, latency and processing time eat up a sizable chunk of our overall lag time.
To facilitate conversations effectively, both on the web and in person, we want to drive down lag time (especially latency and processing time) to provide as seamless an experience as possible.
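To make the three buckets concrete, here is a minimal sketch of the lag-time arithmetic. The 600 ms and 300 ms figures are the averages quoted above. The `LagBudget` class and the 500 ms inherent wait are purely illustrative assumptions, since the inherent component varies from phrase to phrase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LagBudget:
    """Lag-time buckets in milliseconds (hypothetical helper for illustration)."""

    transport_ms: float   # frames streaming from the device to the cloud
    processing_ms: float  # frame and linguistic processing
    inherent_ms: float    # waiting for enough of the sentence; phrase-dependent

    @property
    def total_ms(self) -> float:
        return self.transport_ms + self.processing_ms + self.inherent_ms

    @property
    def reducible_ms(self) -> float:
        # Only transport and processing can be engineered down.
        return self.transport_ms + self.processing_ms

    def reducible_share(self) -> float:
        """Fraction of total lag that is not inherent to the languages."""
        return self.reducible_ms / self.total_ms


# 600 and 300 are the averages from the list above; 500 is a made-up
# placeholder for the inherent wait, not a measured value.
budget = LagBudget(transport_ms=600, processing_ms=300, inherent_ms=500)
print(f"total lag: {budget.total_ms:.0f} ms")               # total lag: 1400 ms
print(f"reducible share: {budget.reducible_share():.0%}")   # reducible share: 64%
```

Under that assumed inherent wait, roughly two thirds of the lag would come from the two buckets that infrastructure choices can influence.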
Pilot testing Verizon 5G Edge with AWS Wavelength
In our experience, Verizon 5G Edge with AWS Wavelength was an opportunity to level the playing field between best-in-class experiences within the home and those in the world outside of it.
To get started with our pilot testing efforts, we needed to migrate a portion of our stack over to AWS. As we use Kubernetes and Terraform, this migration was straightforward and took only a few hours.
To test the impact of AWS Wavelength, we created two copies of our stack: one within the us-east-1 region and one within the Boston Wavelength Zone. We measured the overall round-trip latency from a Verizon device to each of these stacks. The results (and a sample application!) are shown below:
As you can see, translation is significantly faster via edge computing. This speedup of over 40% is clearly noticeable to, and impactful for, the end consumers of our services.
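For readers who want to run a similar comparison, the sketch below shows one way to time round trips against two deployments and express the difference as a percentage. It is not the harness we used in the pilot. The function names are hypothetical, and the two `call_*` functions are stand-ins that simply sleep for arbitrary durations in place of real network requests.

```python
import statistics
import time
from typing import Callable


def median_round_trip_ms(send_request: Callable[[], None], samples: int = 20) -> float:
    """Return the median wall-clock time of `send_request` in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        send_request()  # one full request/response cycle
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def speedup_percent(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage reduction in latency relative to the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


# Stand-ins for real calls to each stack. The sleep durations are arbitrary
# illustration values, not measurements from the pilot.
def call_region_stack() -> None:
    time.sleep(0.010)


def call_edge_stack() -> None:
    time.sleep(0.005)


region_ms = median_round_trip_ms(call_region_stack, samples=5)
edge_ms = median_round_trip_ms(call_edge_stack, samples=5)
print(f"region: {region_ms:.1f} ms, edge: {edge_ms:.1f} ms")
print(f"reduction: {speedup_percent(region_ms, edge_ms):.0f}%")
```

Using the median rather than the mean keeps a single slow request from skewing the comparison. In a real test, the stand-in functions would be replaced with requests sent from a device on the carrier network.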
One additional optimization often used to speed up ML applications is switching from CPU-backed virtual machines to GPU-backed ones. In our experiments, this switch often saves around 30–40% of the total latency budget, depending on the architecture, batching, and other factors. Wavelength, coincidentally, saves roughly the same amount of latency as switching to GPU instances. Together, these two optimizations unlock unmatched performance for our mobile application to deliver frictionless translation.
Overall, running this test was surprisingly easy (it took less than an afternoon to throw together an MVP), as Verizon provided Terraform modules that made it easy to set up Kubernetes clusters at the edge. As our entire architecture is fully elastic, we were able to deploy an entire duplicate cluster to the edge at minimal cost.
At Sign-Speak, we envision a world with functional equivalency on all devices for the Deaf or Hard of Hearing consumer. D/HH individuals pay for the same services that hearing individuals do, yet cannot use the same functionality as those who can hear. Today's technology can change this. Integrating the Sign-Speak API will be a plug-and-play solution for companies that are looking to address Diversity, Equity and Inclusion from a holistic perspective.
Through Verizon 5G Edge, we’ve been able to supercharge this reality and provide real-time, low-latency interactions. We’re excited by the prospect of partnering with Verizon to make the future more accessible for all and unlock digital inclusion for Deaf and Hard of Hearing individuals like never before. Sign-Speak envisions a future without communication barriers, where any Deaf or Hard of Hearing individual can communicate with any hearing individual anywhere, at any time. If you’re interested in learning more about our technology, piloting our tech at your business, or investing, please reach out to email@example.com or firstname.lastname@example.org.