Voice-related technology, combined with AI (Artificial Intelligence) and machine learning, presents an opportunity for a revolutionary jump in how intelligence is processed and exploited by Defence. What if I told you that this article was transcribed by voice in under an hour, and that we could use the same technology to automatically transcribe radio logs, bulk-analyse intelligence data and automatically match the identities of persons of interest whilst patrolling?
Voice technology has come a long way since the early office speech software of the late 90s and early 00s. Devices all around us are better at interacting by voice, almost to a conversational level. The main benefit is that it allows us to operate hands free, and the proportion of internet searches made by voice is predicted to continue to rise. The rate of growth in this area is rapid, given that the first major voice assistant, Siri, only appeared in 2011, with follow-ons from Amazon (Alexa) and Microsoft (Cortana) in 2014. Business use cases are even more progressive, especially in call management and triage at major call centres. Entire conversations can be transcribed into text, and the vast quantities of data can then be analysed for patterns to improve the customer experience (or to provide the call handler with suggestions that increase the likelihood of a sale). The technology is readily available now and is becoming increasingly normalised in society, so what benefits does it offer Defence?
Two use cases will be put forward, but there are many more that could be considered, such as automatically transcribing tactical radio logs and automating the SPOC (Single Point Of Contact) IT phone systems. The use cases do not consider routine office business, which it could be argued is where the greatest impact could be delivered with the ability of voice technology to automatically transcribe meetings, capture action items, prepare summaries and schedule future meetings.
Use case – Dismounted Close Combat
We have reached the point where dismounted soldiers will soon carry some form of smart device with them, usually to provide real-time situational awareness in a small form factor such as ATAK (Android Tactical Assault Kit). In previous operations we have carried additional devices to biometrically enrol individuals encountered on patrol and to alert the operator if an individual has been enrolled previously or is wanted for detainment. This process took time and had the drawback of requiring additional equipment, but it is likely to remain a core capability now that the tender to replace the current Biometric Data Capture System has been issued. What if the identity of the person could be verified in the background whilst they are being questioned, effectively as an app running on the existing situational-awareness device (ATAK)?
Voice identification is already used by banks such as HSBC and even by HMRC, so it is at a high technology readiness level (TRL), but apparently not without flaws. This is not a suggestion to use it to post a cruise missile through someone’s front door, but to enhance situational awareness at the tactical level. What if the same device could provide automatic real-time translation to the tactical user? Going one step further, it could automatically document the conversation as text in near real time and upload it to a higher formation or into a central intelligence database for future processing. It might even be able to indicate an individual’s regional dialect, which could then be used to support questioning. In summary, the main benefits at the tactical level are as a means of confirming identity, or building identity intelligence, and as a means of real-time automatic translation and transcription.
Use case – Intelligence Analysts
The second use case steps back a level to the intelligence analysts who could be situated forward with the BG (Battle Group), or operating at Brigade or Divisional HQ. It is at this point that fusion of intelligence sources will first take place. The single greatest constraint on the processing of information at this level is the sheer quantity; there is so much that it would take too long to go through everything manually. What if software could run over content extracted from all of the mobile phones, physical media (CD/DVD/etc), laptops and computer hard drives, and subsequently transcribe all of the audio and video files it found? The transcripts could then be searched for key words or, as at the tactical level, used to identify the speaker to enhance the intelligence picture. A human would still need to be in the loop, but this would allow more effective prioritisation. Further processing could be used to spot trends (such as the use of new code words), and once again suspects could be identified and passed onto the watchlist used at the tactical level. The scale of this voice-to-text transcription could be expanded to a theatre or national exploitation facility. At this level information could be integrated with other national and international agencies in a step towards fusion doctrine.
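To make the prioritisation idea above concrete, here is a minimal sketch in Python. It assumes the audio and video files have already been transcribed to text by some separate speech-to-text step, which is not shown. The function name, the file names and the watch terms are all hypothetical and for illustration only; a real system would need far more than simple word matching.

```python
import re
from collections import Counter


def rank_transcripts(transcripts, watch_terms):
    """Rank transcripts by how many watch-term hits they contain.

    transcripts: dict mapping a source identifier to its transcribed text
                 (assumed to come from an earlier speech-to-text step).
    watch_terms: iterable of single key words an analyst cares about.
    Returns a list of (source_id, total_hits, hits_per_term), highest first.
    Transcripts with no hits are left out.
    """
    terms = {term.lower() for term in watch_terms}
    scored = []
    for source_id, text in transcripts.items():
        # Split the transcript into lower-case words, ignoring punctuation.
        words = re.findall(r"[a-z0-9']+", text.lower())
        hits = Counter(word for word in words if word in terms)
        if hits:
            scored.append((source_id, sum(hits.values()), dict(hits)))
    # Most hits first; ties broken alphabetically by source identifier.
    scored.sort(key=lambda row: (-row[1], row[0]))
    return scored


# Hypothetical transcripts from exploited media.
transcripts = {
    "phone_01.wav": "Meet at the bridge, bring the package to the bridge",
    "laptop_07.mp4": "The weather is fine today",
    "dvd_03.mp3": "The package arrives tomorrow",
}

for source_id, total, detail in rank_transcripts(transcripts, ["bridge", "package"]):
    print(source_id, total, detail)
```

The output is only a suggested reading order for the analyst; the human remains in the loop and decides what each hit actually means.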
Obstacles
That is not to say that this future of real-time automatic translation and voice recognition is immediately available to us. There are many challenges that need to be overcome, some of which are quite significant. The ethical and policy hurdles are probably the most difficult; this is a legislative minefield that could bog down the delivery of any capability for years, by which point any technological advantage that could have been gained will have eroded. With the recent Home Office Biometrics Strategy some progress is being made here, but that is contrasted with HMRC having to delete 5 million voice files because the manner in which they were collected broke privacy rules. Data management is another challenge. It encompasses both the difficulty of dealing with vast quantities of data and the standardisation of formats between various systems so that they can exchange data without conversion problems. How do we decide what should be transported across the enterprise, and do we have the bandwidth to do so?
Not all of the technology discussed here can simply be rolled out to the close combat soldier. A significant hurdle is the ability to process this content in a disconnected tactical environment without using some form of reach-back processing (as most current voice assistants do). As processing power on small devices increases this will become less of a problem. To identify someone by their voice you first need a known sample to compare against, which requires some initial collection, but records can still be created against unknown persons of interest. The key challenges of integration, training and equipment would also need to be addressed to make this a cohesive capability. It is clear, then, that many areas need consideration before the military adoption of this technology, which is perhaps why it may be several years before we see the widespread introduction of such capabilities, but it is surely only a matter of time.
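The point about needing a known sample, while still being able to open a record against an unknown speaker, can be sketched as follows. This is an illustration under stated assumptions, not a description of any fielded system. It assumes each voice sample has already been reduced to a numeric "voiceprint" vector by some speaker model (not shown), and it compares vectors with cosine similarity. The function names, the identifiers and the 0.8 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_or_enrol(sample, watchlist, threshold=0.8):
    """Compare a voiceprint against known samples, or open a new record.

    sample:    voiceprint vector for the person being questioned (assumed input).
    watchlist: dict mapping a person identifier to a reference voiceprint.
    threshold: illustrative similarity cut-off; a real value would need tuning.
    Returns (person_id, best_score, matched).
    """
    best_id, best_score = None, -1.0
    for person_id, reference in watchlist.items():
        score = cosine_similarity(sample, reference)
        if score > best_score:
            best_id, best_score = person_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score, True
    # No confident match: create a record against an unknown person of interest.
    # Simplistic numbering that assumes records are never deleted.
    new_id = f"UNKNOWN-{len(watchlist) + 1:04d}"
    watchlist[new_id] = list(sample)
    return new_id, best_score, False


# Hypothetical reference voiceprints collected earlier.
watchlist = {
    "POI-0001": [1.0, 0.0, 0.0],
    "POI-0002": [0.0, 1.0, 0.0],
}

print(match_or_enrol([0.9, 0.1, 0.0], watchlist))  # close to POI-0001
print(match_or_enrol([0.0, 0.0, 1.0], watchlist))  # no match, new record created
```

A comparison this small could in principle run on the device itself, which is relevant to the disconnected tactical environment described above; producing the voiceprint in the first place is the expensive step.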
This might all sound highly fantastical, but the technology in each of the areas touched on is rapidly evolving. Identity matching by voice is already in commercial use in highly sensitive areas such as banking and tax management, and there is exponential growth in voice transcription in call centre management and health services. So, what would need to happen next? Firstly, the benefits that could be brought to Defence use cases, such as those suggested above, need to be properly mapped out. Then a route can be plotted to achieve those benefits, identifying areas for quick wins and the technological critical path, and fully defining the requirement. Lastly, some financial resource would be needed, which presents a challenge in the current financial and economic climate. The world is rapidly changing and this is an area that presents an evolutionary jump in capability across many areas. We just need to have the vision and commitment to make the most of this opportunity.