Toshiba Digital Solutions has built up an extensive track record with Systems of Record (SoR), such as enterprise systems for companies and social infrastructure. One project representative of its digital transformation activities was the development of "coestation," a completely new speech platform service that uses Toshiba's speech technologies to easily generate human voices and to connect "people who want to use voices" with "people who want their voices to be utilized." In this issue, we introduce the key points of coestation's development, which used the methods required of Systems of Engagement (SoE) to stimulate needs and to create and grow new markets and businesses.
"Let me do it!" That's what I said, without a moment's hesitation, when I was approached about Toshiba Digital Solutions' new business creation project. The project's theme was the creation of a completely novel platform that leveraged Toshiba's speech technologies. The innovative concept was to collect and store people's voices as digital data and to provide this data as services in various forms.
It would use state-of-the-art digital technologies such as the Internet of Things (IoT) and Artificial Intelligence (AI) to create new systems and services that directly contributed to business. Achieving this kind of digital transformation (DX) with SoE was no easy task.
In some cases, DX is achieved by making bold investments to acquire outside technologies and resources, but coestation was developed with an eye on ROI, keeping expenses to a minimum. Furthermore, because the final shape of the service was still undefined during development, we had to determine what needed to be done, and to what degree, to respond rapidly to changing needs and environments and to keep up with the evolution of state-of-the-art technologies.
The Solutions Center, which has a track record of developing systems for a wide range of business types and operations, has a culture of enthusiastically tackling new challenges. Our IT industry customers are highly attuned to the cutting-edge technologies that are constantly being created. As times change, the shapes of our customers' businesses are also changing, at an accelerating pace. We keep our eyes on the future vision of our direct customers, and of our customers' customers, while actively deploying various state-of-the-art digital technologies to create solutions and offer operation support.
Given this background, the members of the department handling the project showed no resistance to this unprecedented undertaking; on the contrary, I sensed strong motivation from them. People spontaneously spoke up, saying "I'm also interested" and "Count me in, too." We formed a team, made up primarily of young engineers in their 20s and 30s, that brimmed with a willingness to take on new challenges.
The goal of coestation was to create a platform service that would support the realization of "a world in which everything speaks."
Dialog-based UIs that use natural language to control machines are being employed in smartphones, smart speakers, car navigation systems, and other devices, and are already becoming part of our everyday lives. People can easily and effortlessly enjoy a variety of experiences by speaking to their devices. This speech-based operation will continue to evolve and become more widespread. The value of voice is changing significantly, and the future depicted in science fiction, in which we talk to AIs through wearable devices and robots in our day-to-day lives, may one day become a reality.
The concept we developed was a service that connected "people who want to use voices" with "people who want their voices to be utilized." First, we recorded people's voices. We then used Toshiba's audio technologies to learn the characteristics of each voice and generate "coe" (voice dictionary) data, digital alter egos of people's voices. These "coe" could then be made to speak arbitrary text for use in a variety of forms of communication (Fig. 1).
We also prepared methods for voice suppliers to secure the rights to their voices, and various means of distribution for using the voices in different forms. Our goal for the future is to collect a wide range of voices from people around the world and to build, from this massive collection of "coe," a voice platform from which the optimal "coe" can be selected and used with robots, wearable computers, games, social networks, and other content.
We decided that the first step of this coestation project would be to create a smartphone app that offered the service to general users. The development project set off, led by project team members overflowing with a drive to innovate.
To develop coestation, we needed to overhaul our previous approach from the ground up. In conventional development, which focuses primarily on enterprise systems, detailed designs are drawn up from thorough studies at the initial stage of the process. Detailed resource allocation is then performed, and the project is implemented by following the planned procedure. We excel at this type of waterfall development, which emphasizes high quality and stable operation, and have an extensive track record with it.
However, this project sought to create a service for the general public, and it required flexibility and agility. How would coestation be used? What functions would it need to achieve this? What market size could we expect? Although the ultimate vision for the application was still unclear, we decided to start with scalable development at minimal cost. This is what led us to choose cloud native development. Cloud native development makes it possible to scale the system automatically according to the volume of traffic and requests, without fixing the scale of the IT infrastructure, such as the number and capacity of servers. This in turn helps reduce expenses.
We also chose agile development, as it would allow us to confirm and improve operation immediately as each function was completed. We used short cycles of design, implementation, and testing for individual functions, which made it easy to feed results back into the design and development processes. This enabled us to adjust the trajectory of the project flexibly and rapidly to meet changing customer needs. We also introduced Continuous Integration (CI)/Continuous Delivery (CD) tools to automate building, testing, and deployment, which had previously been done by hand. Tests now run automatically without anyone having to direct them, their results are fed back to the developers, and the development environment is always kept up to date.
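The article does not say how coestation's automatic scaling was configured. Purely as an illustration of "scaling based on the amount of traffic and requests," the sketch below implements a proportional target-tracking rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × observed load / target load)). The function name, the requests-per-second metric, the 10% tolerance, and the replica bounds are hypothetical assumptions, not details of the actual service.

```python
import math


def desired_replicas(current_replicas: int,
                     current_rps_per_replica: float,
                     target_rps_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Hypothetical proportional scaling rule (illustrative only).

    Keeps the observed load per server instance (replica) near a target
    by resizing the pool in proportion to how far the load deviates.
    """
    if current_replicas <= 0 or target_rps_per_replica <= 0:
        raise ValueError("replica count and target load must be positive")

    ratio = current_rps_per_replica / target_rps_per_replica

    # Inside the tolerance band, keep the current size so that small
    # fluctuations in traffic do not cause constant resizing.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        # Scale proportionally, rounding up so capacity is never short.
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured lower and upper bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 150, 100))   # traffic above target: grow
    print(desired_replicas(4, 20, 100))    # quiet period: shrink
    print(desired_replicas(4, 1000, 100))  # spike: capped at max_replicas
```

For example, four replicas each handling 150 requests per second against a target of 100 would grow to six, while the same four replicas at 20 requests per second would shrink to one. Paying only for the capacity the current traffic needs is what makes this style of scaling attractive when, as with coestation, the eventual market size is unknown.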
Refining requirements on actual working systems rather than on paper, together with tools that support efficient work, improved project productivity and created an environment more conducive to collaboration among members. New ideas could be implemented as functions immediately, and team members could promptly evaluate them and discuss which direction to take in making improvements. This led to franker exchanges of opinion and closer communication within teams, and contributed significantly to making the service more appealing (Fig. 2).
Furthermore, because the service is aimed at the general public, launching it required not only knowledge of the internal and external evaluations and procedures used to commercialize conventional business systems, but also a great deal of support from individual departments across the company. This support, together with repeated implementation and testing, enabled the product to clear its final hurdle, external review, without problems. This brought a tremendous sense of joy and, at the same time, a strong company-wide desire to support the tackling of new challenges.
In April 2018, we released the coestation smartphone app*. Through the agile cycle of implementation and testing, we built various functions for generating and adjusting voices from voice recordings and provided them through a user-centric, easy-to-use interface. As a result, the app is now used by a wide range of people, from teenagers to users in their 40s, to easily experience a new world of voice communication.
* As of May 2019, the app is available for iPhones and iPads, and over 30,000 accounts have been registered.
In November 2018, we released two services for business users: an easy-to-use, web browser-based editor for creating and editing audio content such as narration and guidance information, and a Web API (cloud service) for a speech synthesis engine that converts user-specified text into speech in real time. Users can select the services that fit their needs and easily synthesize speech for a wide range of applications, such as in-store PA announcements, train station announcements, games, and animated works.
New services which use "coe," developed through co-creation with customers, are starting to make their appearance.
We are also building security mechanisms that prevent others from using a person's "coe" without permission, along with systems that let users enjoy the convenience of the service with peace of mind while securely protecting voice-related rights.
Speech-based communication is growing, and the value of voice is about to undergo a major transformation. In the midst of these changes, we have successfully sown the seeds of the future with coestation. Going forward, the development team and the operation & maintenance team aim to collaborate seamlessly to develop solutions, to strengthen their teams, and to expand functions while actively listening to and incorporating user feedback. With coestation, Toshiba Digital Solutions aims to play a central, leading role in the evolution of voice culture.
* The corporate names, organization names, job titles and other names and titles appearing in this article are those as of May 2019.