The military information sciences competency focuses on “underpinning sciences, physical autonomy, and enablers required to provide timely, mission-aware information to humans and systems at speed and scale for all-domain and coalition operations.”1 The competency’s mission is to “research and develop component technologies for machine-assisted perception, machine-assisted decision-making and autonomous teammates that enable the Army to compete with near-peer opponents in contested multidomain environments.”2 For this review, the competency chose to present its work in three research thrusts, rather than by core competencies: (1) look farther, (2) think faster, and (3) act smarter.
The look farther research thrust focuses on data and knowledge applicable to distributed operations, decision advantage, and defeat pathways. This includes machine-assisted battlefield perception and understanding using distributed networked sensors. Research in these areas includes algorithms that produce accurate integration and semantic interpretation of diverse, high-volume, machine-analyzed sensor data for machine-assisted situational awareness. The think faster research thrust focuses on learning and reasoning applicable to decision advantage. The research uses reinforcement learning (RL) and game-theoretic approaches to produce tactically sensible courses of action, involving planning and task-allocation processes for manned and unmanned platforms that decompose complex tasks into interrelated subtasks with an understanding of the diverse capabilities and limitations of each member of the team. The act smarter research thrust focuses on action and collaboration applicable to distributed operations and defeat pathways. Research in this thrust will enable intelligent unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) platforms to perform autonomous navigation, path planning, and fast, dynamic driving and flying through complex, dense, highly unstructured ground environments. Reasoning will allow a platform to maneuver effectively with respect to threats and friendly supporting assets, and to do so collaboratively with multiple, heterogeneous autonomous and semi-autonomous systems.3
The teams within this research vein focus on understanding the principles and methods of inferencing, planning, and learning that accelerate individual, organizational, and autonomous-agent decision-making. The act smarter research thrust has a heavy focus on complex behaviors for ground autonomy, C4 multidomain operations, and robotics and physical autonomy. These teams compositionally develop autonomy algorithms that enable intelligent systems to operate robustly, team with soldiers, and conduct joint multidomain operations in military-relevant environments.4 On September 6–8, 2023, the Panel on Assessment of Military Information Sciences received presentations at the Aberdeen Proving
___________________
1 U.S. Army Combat Capabilities Development Command (DEVCOM) Army Research Laboratory (ARL), 2022, “DEVCOM Army Research Laboratory Foundational Research Competencies and Core Competencies,” March.
2 ARL, 2023, “Military Information Sciences Competency Story Operationalizing Science for an Army that Looks Farther, Thinks Faster and Acts Smarter,” August 25.
3 Ibid.
4 ARL, “Military Information Sciences Competency Overview.”
Ground on the military information sciences competency, focused on its three research thrust areas. The discussion below provides a summary of the panel’s findings with reference to the nine assessment criteria questions.
The science in the look farther research thrust area was found to be on par with that of leading universities, and the competency’s scientists were making good connections to outside researchers. Collaborations such as the Army Research Laboratory’s (ARL’s) participation in the 1-year Sprint awards demonstrate a cost-effective and impactful approach to taking the pulse of the technology base and conducting preliminary feasibility studies. Both the intramural and extramural scientific questions being addressed, such as model parameter explosion, learning in the presence of untrustworthy data, and integrating multi-modal data in relation to human understanding of language, were on target with current science and reflected a good identification of problems and challenges. This demonstrated that ARL has focused on the most pressing barriers and opportunities to advance artificial intelligence (AI), utilize sensor data, and create robust methods for integrating multi-modal data.
The researchers at ARL (both intramural and extramural) demonstrated a broad understanding of research conducted outside the organization, including at other federal agencies (e.g., NASA). Many other organizations host staff in visiting positions to foster long-term, research-focused collaborations. One suggestion that may strengthen these collaborations is to bring in researchers from many different institutions through a formalized program. A good example of such a collaboration network is the NASA Joint Venture program, which brought researchers to NASA facilities over the summer and continued the collaborations after the researchers returned to their institutions.5 Additionally, since there was a heavy emphasis in the ARL knowledge base on research coming out of academia, strengthening these opportunities to quickly ramp up projects and connect with future intramural employees could be impactful. ARL may also consider bringing experts into ARL on longer sabbaticals.
Furthermore, while the intramural researchers have a good understanding of the research done outside of ARL, there is an opportunity for these researchers to gain greater knowledge of the cutting-edge work being done within ARL itself, as the level of awareness they show of the external field does not appear to be matched by awareness of work across the ARL organization.
While it is not feasible for all research teams to be aware of every other research project under way at ARL, a useful exercise may be to have each team find at least three efforts by other ARL intramural researchers where their work could (1) utilize and share data being collected or already collected and (2) broaden the impact of the team’s research by parlaying the outputs and outcomes of that research into other research ideas. Such cross-pollination between ARL researchers may lead to innovative research directions and, potentially, joint projects that leverage the talents of the intramural teams through greater connection and cooperation with each other.
Continuing this theme, several projects internal to ARL presented to the review panel could benefit from greater cross-pollination between the intramural researchers. For example, the act smarter presentation “Terrain Perception for Autonomous Maneuver” looks at Army systems that must maneuver through arbitrary environments (daytime and nighttime, novel domains, and degraded perceptual and
___________________
5 See D. Wold, 1996, JOVE Final Report, Little Rock, AR, University of Arkansas at Little Rock Department of Physics and Astronomy, https://ntrs.nasa.gov/api/citations/19970041452.
uncertain traversability conditions) and exhibits an excellent identification of key technical challenges, including how to manage data modalities and representative data sets, how to approach multi-modal fusion, domain adaptation, uncertainty estimation, and semantic-based learned traversability estimation.6 This project could be foundational for exploring some of the multi-modal data fusion and data-collection approaches for others at ARL to follow. The “Perception Under Extreme Conditions, Including Computational Imaging” presentation showed a good understanding of how synthetic data can be utilized in conjunction with real data. The Archangel data set, the first UAV-based human-detection data set with real and synthetic sub-data sets, captured with a UAV flying at altitudes and rotation radii of 15–50 m in 5 m increments and with complete UAV position metadata,7 was a good model for others within the organization for enabling external engagement with benchmark data sets. Exploring methods for data set augmentation for weather, lighting, clutter, and recoloring may also be helpful to this effort.
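As an illustration only, a minimal sketch of such an augmentation pipeline appears below; the transforms and parameters are placeholders chosen to stand in for lighting, weather-like degradation, recoloring, and clutter, not a recommendation for ARL’s actual data pipeline.

```python
# Hedged sketch of a photometric/occlusion augmentation pipeline; transforms and
# parameters are illustrative only (torchvision >= 0.8 API assumed).
import torchvision.transforms as T

augment = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),  # lighting / recoloring
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),                       # haze- or fog-like blur
    T.RandomGrayscale(p=0.1),                                              # sensor/recoloring variation
    T.ToTensor(),
    T.RandomErasing(p=0.5, scale=(0.02, 0.2)),                             # synthetic clutter / occlusion
])

# usage: augmented = augment(pil_image)  # applied per image during training
```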
The presentation “Heterogeneous LiDAR Sensing” focused on vortex detection and atmospheric turbulence, a scientifically important problem. This work could be impactful for other projects, such as the act smarter project “Synergistic Deception and Swarm Comprehension in Swarm/Integrated Air Defense System (IADS) Engagements,” which is looking at swarm and counter-aircraft research to help identify when aircraft are most vulnerable. Fusion of other multi-modal data and computational fluid dynamics (CFD) data could also be incorporated. Additionally, the project may benefit from collaboration with the Federal Aviation Administration or the NASA Data Visualization Animation Lab, whose point of contact would be Kurt Severance.
In other projects presented during the visit, the team saw common thrusts and synergies in some of the research. For example, the project “Synthetic Data” has a good research thrust in generating visual-to-electro-optical imagery. This connects to the previously discussed project, “Perception Under Extreme Conditions Including Computational Imaging,” which investigates the limitations and benefits of using synthetic versus real data. This is impactful work that is leading a multi-modal data fusion thrust and could be instrumental in connecting and building synergies between projects. These projects could also benefit from using the ARL Robotics Research Collaboration Campus (R2C2) facility at Graces Quarters to perform testing.
An important research question that emerged from reviewing these works is how to understand the limitations of synthetic data. Such a question could be elevated across the organization, as it has ties to non-line-of-sight imaging and counter-autonomy. Synthetic data are widely used to train systems, but few researchers have investigated how such training compares and contrasts with using real-world data. This is important as more systems trained with synthetic data move to higher readiness and deployment levels; systems trained on synthetic data could fail badly when deployed in a real-world scenario. Contrastive learning was another positive and innovative direction. This work shows ARL’s leadership in developing new AI architectures beyond traditional off-the-shelf neural networks and other standard deep learning algorithms. It will be essential for distinguishing which features present or absent in synthetic data set it apart from real-world data. The references in the footnotes below on contrastive learning may be helpful to ARL.8
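For readers unfamiliar with the technique, a minimal sketch of a generic contrastive (InfoNCE/NT-Xent) objective is shown below. It is not ARL’s implementation; the embedding batches of matched real and synthetic views are hypothetical inputs, and the sketch only illustrates how contrastive training can expose which features do, or do not, transfer across the synthetic-to-real gap.

```python
# Generic InfoNCE/NT-Xent contrastive objective over hypothetical paired batches
# of real and synthetic embeddings (shape [N, D]); illustrative only.
import torch
import torch.nn.functional as F

def info_nce(z_real: torch.Tensor, z_syn: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    z_real = F.normalize(z_real, dim=1)
    z_syn = F.normalize(z_syn, dim=1)
    logits = z_real @ z_syn.t() / temperature     # [N, N] cosine-similarity matrix
    labels = torch.arange(z_real.size(0))         # matched pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

# Features that align across real/synthetic pairs drive the loss down; features
# that keep the domains separable remain discriminative and flag the domain gap.
loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```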
___________________
6 P. Osteen, 2023, “Terrain Perception for Autonomous Maneuver,” DEVCOM ARL, September 6.
7 J. Hyatt and H. Kwon, 2023, “Perception Under Extreme Conditions, Including Computational Imaging,” DEVCOM ARL, September 6.
8 Y. Tian, O.J. Hénaff, and A. van den Oord, 2021, “Divide and Contrast: Self-Supervised Learning from Uncurated Data,” IEEE/CVF International Conference on Computer Vision (ICCV), https://doi.org/10.1109/ICCV48922.2021.00991; H. Kuang et al., 2021, “Video Contrastive Learning with Global Context,” IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), https://doi.org/10.1109/ICCVW54120.2021.00358; K. Kotar, G. Ilharco, L. Schmidt, K. Ehsani, and R. Mottaghi, 2021, “Contrasting Contrastive Self-Supervised Representation Learning Pipelines,” ICCV, https://doi.org/10.1109/ICCV48922.2021.00980.
The assessment criteria ask for commentary on ARL’s use of research methods and methodologies. The researchers at ARL were consistent in showing state-of-the-art methods,9,10 that is, the most innovative and best-performing models, techniques, algorithms, and technologies in machine learning (ML). While utilization of state-of-the-art models from the public domain is commendable for developing baseline approaches, new problems may require new approaches. There is an opportunity for novel model development, as well as for foundation models that build on standard models, transfer learning, and quaternion architectures, especially given the trend toward greater model complexity and hyper-parameter optimization. ARL can push the state of the art both by attacking new problems and, in some cases, by developing new approaches.
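As a point of reference, the “public-domain baseline” pattern mentioned above often amounts to transfer learning from a pretrained model, as in the hedged sketch below. The backbone choice, layer sizes, and class count are illustrative placeholders (torchvision 0.13+ API assumed), not a recommendation of any particular architecture for ARL problems.

```python
# Hedged transfer-learning baseline: freeze a pretrained off-the-shelf backbone
# and train only a small task head. All specifics are illustrative.
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                 # keep pretrained features fixed for the baseline

num_classes = 5                             # placeholder for the task at hand
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
# only backbone.fc.parameters() would be passed to the optimizer during training
```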
Additionally, in viewing the overall portfolio, there are many areas that could leverage more modeling and simulation. Modeling and simulation tools were fundamental throughout all the presentations, and because they are so important, it would be beneficial to create a synergistic platform of tools that every project could use for its simulation and modeling needs. Projects that develop their own specific tools could have that work integrated into the platform for wider dissemination, so others do not need to “reinvent the wheel.” Modeling and simulation algorithms depend on robust data, which means that every project needs access to, or the capability to create, acquire, and curate, data; data set curation is therefore critical to supporting a strong foundational modeling and simulation framework.
Increasing the scale of experimentation and the curation of data sets that can be shared across ARL would create a robust foundation for all military information science projects to leverage. Too many projects are trying to build their own frameworks for experimentation and modeling. By using more mixed-mode simulators/emulators, both hardware and software system designs could be tested, and software could be developed before the hardware is built. As more human-interaction and human-behavior-centered experiments are needed to understand how real-world scenarios affect human reaction and decision-making, such a framework would be a valuable tool for any team developing systems that enhance human decision-making performance without cognitive overload. This could be a focus that is distributed across projects and programs.
Additionally, every team appeared to utilize its own modeling and simulation tools, yet no unifying framework for the entire competency was presented. Teams should be able to plug into a common simulator that accepts different kinds of models. For instance, as a simplified real example, the Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL) used for circuit design supports different levels of abstraction: a model can start at a high (behavioral) level, and as the design matures, more structural, component-level, and lower-level detailed models can be substituted for the higher level of abstraction. This allows greater experimentation, where a specific team can exercise a detailed computational model in a system environment and test it against other interactions and models. Model sharing and reuse would help expedite research across the organization. One suggestion is to utilize the Institute of Electrical and Electronics Engineers (IEEE) and SPIE, organizations that have established standards and provide a wealth of publications on these topics.11
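To make the interchangeable-abstraction idea concrete, the sketch below shows a simulation harness that accepts any model satisfying a shared interface, so a behavioral model can later be swapped for a more detailed one without changing the experiment code. The interface, models, and dynamics are invented for illustration and do not represent an ARL framework.

```python
# Hedged sketch: one simulation harness, interchangeable model fidelities.
from typing import Protocol

class VehicleModel(Protocol):
    def step(self, throttle: float, dt: float) -> float:
        """Advance the model by dt seconds and return speed (m/s)."""

class BehavioralModel:                       # high-level abstraction: first-order lag
    def __init__(self) -> None:
        self.speed = 0.0
    def step(self, throttle: float, dt: float) -> float:
        self.speed += dt * (10.0 * throttle - 0.5 * self.speed)
        return self.speed

class DetailedModel:                         # lower-level abstraction: adds mass and drag
    def __init__(self, mass: float = 800.0) -> None:
        self.speed, self.mass = 0.0, mass
    def step(self, throttle: float, dt: float) -> float:
        force = 4000.0 * throttle - 0.9 * self.speed ** 2
        self.speed += dt * force / self.mass
        return self.speed

def run(model: VehicleModel, seconds: float = 5.0, dt: float = 0.1) -> float:
    t, speed = 0.0, 0.0
    while t < seconds:
        speed = model.step(throttle=0.7, dt=dt)
        t += dt
    return speed

print(run(BehavioralModel()), run(DetailedModel()))   # same harness, different fidelity
```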
Furthermore, achieving data fusion is an incredibly important aspect of research that cuts across the entire portfolio. A first step could be a survey of which data sets and data types are most commonly collected in ARL’s intramural and extramural experiments, such as video, images, audio, satellite, underwater, and eye-tracking data, as well as thermal, hyperspectral, and other kinds of sensor data. After understanding the most commonly used forms of data, the investigation could establish which ARL teams are experts in manipulating each kind of data and develop a strategy for more
___________________
9 N. Srivastava, 2022, “What Is SOTA in Artificial Intelligence?” E2E Cloud, https://www.e2enetworks.com/blog/what-is-sota-in-artificial-intelligence.
10 R. Chandra et al., 2020, “Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs,” IEEE Robotics and Automation Letters 5(3):4882–4890, https://doi.org/10.1109/LRA.2020.3004794.
11 IEC and IEEE, 2021, “IEC/IEEE International Standard-Behavioral Languages—Part 6: VHDL Analog and Mixed-Signal Extensions,” June, https://doi.org/10.1109/IEEESTD.2021.9456808.
data fusion. For instance, drone video of scenes could be combined with satellite imagery, or night-vision and thermal imagery could be combined with virtual-reality-generated model data. A search of recent work in data fusion shows very little that comprehensively reviews fusion of more than two modalities. This is an opportunity for ARL to lead, because no one else has access to such rich and diverse modalities of data with which to perform an in-depth analysis and create standards for data fusion. ARL may find the references on data fusion in the footnotes below to be helpful.12
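For illustration, one of the simplest fusion patterns is late fusion by feature concatenation, sketched below for two hypothetical modalities (e.g., electro-optical and thermal features). Encoder sizes, dimensions, and the fusion strategy are placeholders, not a recommended design; the point is only to fix terminology for the survey suggested above.

```python
# Hedged late-fusion sketch: per-modality encoders feed a shared head.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, eo_dim=256, thermal_dim=128, num_classes=4):
        super().__init__()
        self.eo_encoder = nn.Sequential(nn.Linear(eo_dim, 64), nn.ReLU())
        self.thermal_encoder = nn.Sequential(nn.Linear(thermal_dim, 64), nn.ReLU())
        self.head = nn.Linear(64 + 64, num_classes)      # fusion by concatenation

    def forward(self, eo_features, thermal_features):
        fused = torch.cat([self.eo_encoder(eo_features),
                           self.thermal_encoder(thermal_features)], dim=1)
        return self.head(fused)

logits = LateFusionClassifier()(torch.randn(8, 256), torch.randn(8, 128))
```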
The assessment criteria ask for commentary on any specific areas, if found, where the research may be at major risk of not meeting its objectives, with supporting reasoning. The major risks to the portfolio are the duplication and independent collection and usage of data by different teams. Some teams are scraping data and some are constructing experiments to create custom data sets. This is a labor-intensive process, and it could put progress at risk because different standards and annotation requirements are being used. Training researchers to help scope and write requirements for data would be impactful.
Some opportunities for synergies among individual ARL projects are listed above. Additionally, the “Synthetic Aperture Radar (SAR) with Graph Neural Networks (GNNs)” presentation showed good utilization of off-the-shelf techniques. Still, it had an opportunity to explore performance using new network architectures; this work could move toward other types of models, including quaternions, or toward models developed by ARL. The work shows potential for improved vision and detection systems that can see through cloud cover and better recognition for UAVs that must deal with shadows on objects and varied lighting conditions. If aspects of this work could be published, it would be very valuable to industries working with the Department of Homeland Security, and the IEEE Homeland Security Technologies Conference would be an excellent venue for disseminating it.
The presentations and posters shown during the review that focused on the think faster research thrust represent an impressive array of activities spanning the domains of natural language processing (NLP), including large language models; human–robot interaction, including dialogue management; knowledge representation, using connectionist and neural net models; and knowledge and skill acquisition using reinforcement learning. Overall, the think faster research thrust is doing high-quality work. In some cases, especially with the extramural work, it includes leading advances on par with those of top funding agencies and research institutions; in others, especially with the intramural work, it showed a narrower focus on applications that, while not always groundbreaking, clearly relate to unique scientific needs at ARL. Some of these presentations represent cutting-edge, state-of-the-art research in the field of artificial intelligence. Evidence of the high quality of this work, in the form of both prestigious peer-reviewed publications and demonstrations, was presented to the review panel. One example worth noting is the work in information fusion. This has been a focus of numerous ARL efforts over the years, and for a long time progress was relatively slow. However, newer research that ARL presented on “uncertain likelihood” (the presentation “Strategy Adaptation of an AI Commander’s Assistant via Quickest Change Detection with Uncertain Models”) has taken advantage of emerging techniques, has
___________________
12 M. Schmitt and X. Zhu, 2016, “Data Fusion and Remote Sensing: An Ever-Growing Relationship,” IEEE Geoscience and Remote Sensing Magazine, December 4, https://doi.org/10.1109/MGRS.2016.256102; J. Gawlikowski, S. Saha, J. Niebling, and X.X. Zhu, 2022, “Robust Distribution-Shift Aware SAR-Optical Data Fusion for Multi-Label Scene Classification,” IEEE International Geoscience and Remote Sensing Symposium, https://doi.org/10.1109/IGARSS46834.2022.9884880.
recognized the underlying scientific issues (rather than the surface needs), and has published in the appropriate literature. This work may not be highly cited compared to some other work, as it occupies a niche that is not “mainstream,” but within the specialized communities that require the capability it is a leading result, and it shows how a long-term commitment can lead to significant results when it is properly focused on the underlying issues rather than simply on external performance.
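For context, the classical quickest-change-detection statistic is the CUSUM of log-likelihood ratios, sketched below for a textbook Gaussian mean shift. The ARL “uncertain likelihood” work generalizes this setting to imprecisely known models, which this minimal sketch deliberately does not capture.

```python
# Textbook CUSUM quickest change detection for a Gaussian mean shift;
# illustrative only, not the ARL uncertain-likelihood method.
import numpy as np

def cusum_alarm(x, pre_mean=0.0, post_mean=1.0, sigma=1.0, threshold=8.0):
    stat = 0.0
    for k, xk in enumerate(x):
        # log-likelihood ratio of post-change vs. pre-change Gaussian models
        llr = ((xk - pre_mean) ** 2 - (xk - post_mean) ** 2) / (2 * sigma ** 2)
        stat = max(0.0, stat + llr)          # CUSUM recursion
        if stat >= threshold:
            return k                         # first alarm time
    return None

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100)])  # change at t = 100
print(cusum_alarm(data))
```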
From its research portfolio, it was also apparent that ARL was aware of research trends in the broader scientific community and was funding top people and universities. The core competency is also using strong research methods and methodologies, and there were good examples of both strong theory (mainly on posters) and presentations of specific work that used appropriate demos and experimental methods.
There is research that was presented in the extramural overviews and posters that clearly could connect to intramural projects, and vice versa, particularly in the areas of NLP, human–robot interaction, and decision support systems.
It was unclear, however, whether there was much interaction between these groups (e.g., some posters and presentations did not seem to acknowledge work done by the other groups). There is an opportunity for some of the world-class expertise of those in the Multidisciplinary University Research Initiatives (MURIs) supported in this area to contribute to the specific objectives of many of the intramural projects presented, particularly in the areas of NLP, human–robot interaction, and decision support systems. Leveraging the extramural expertise through deeper connections with the intramural researchers is encouraged.
Generally in science, and more specifically in AI and NLP research, certain terms are often used without specific definitions. Terms like “explanation,” “digital twins,” and “formal models,” for example, are sometimes used inconsistently, and this phenomenon, which is endemic to the field at large, was reflected at ARL in how the terms were used across projects. While ARL cannot be expected to develop its own “vocabulary” that precisely defines these terms, more interaction among the different researchers working on similar problems could help align the vocabulary that ARL researchers use.
A few opportunities are identified below where ARL may either expand on existing research or move into novel areas that may bolster its existing research. First, while it is understood that ARL, due to limitations on resources, cannot possibly engage in research on every aspect of AI and autonomous systems, one critical direction that did not appear to be adequately represented in the ARL portfolio is multi-modal human–computer interaction (HCI).13 ARL collects massive amounts of data for human use, and there is an opportunity, with additional research on processing, to leverage these data for automated decision-making. Multi-modal HCI attempts to represent, capture, and communicate the same concept or idea using multiple interaction media or mechanisms. While NLP is a powerful component of this, multi-modal interactions, as in human–human and human–robot communication, also include mechanisms such as gesture recognition, voice-feature (pitch, tone) recognition, and facial-expression recognition. It is well known that natural languages (NLs) have fundamental difficulties of ambiguity and inadequate vocabulary, which can be ameliorated by leveraging other contextual, possibly non-verbal communication. ARL may in the future wish to consider adding a focus on multi-modal AI to the competency’s portfolio. Such multi-modal knowledge representation and learning could likely help facilitate human–robot interaction in a more accurate, reliable, and trustworthy manner than is possible with NLP alone.14
___________________
13 A. Holzinger, B. Malle, A. Saranti, and B. Pfeifer, 2021, “Towards Multi-Modal Causability with Graph Neural Networks Enabling Information Fusion for Explainable AI,” Information Fusion 71, https://doi.org/10.1016/j.inffus.2021.01.008.
14 C. Li, Z. Gan, Z. Yang, J. Yang, L. Li, L. Wang, and J. Gao, 2023, “Multimodal Foundation Models: From Specialists to General-Purpose Assistants,” arXiv (preprint) 1(2):2, arXiv:2309.10020.
Supporting the goals of the competency, there is work on robotics, especially in the area of language, where “simplified” modeling and simulation was used instead of experimentation with actual robots or with the simulators used by the robotics researchers. The reason for this is a well-known gap between the actual physical competencies of robots, in both sensing and action, and the linguistic capabilities desired by those controlling them. Essentially, the robot needs something that can be translated into specific action sequences, while the human wishes to express a high-level goal. Large language models (LLMs) may help, but LLMs have been shown to need significant improvement in planning applications before they can generate the type of sequences needed.15 Thus, closer cooperation between these two areas could be very beneficial. It is suggested that the competency leadership work more closely with ARL experts at R2C2 to explore how this gap might be filled.
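One hedged way to bridge that gap, sketched below purely for illustration, is to let a language model propose a plan and then validate it against the robot’s fixed primitive-action vocabulary and simple preconditions before execution. The function llm_propose_plan and the primitives are hypothetical placeholders, not a real API or an ARL design.

```python
# Hedged sketch: validate an LLM-proposed plan against fixed primitives
# and simple preconditions before handing it to a robot. All names are placeholders.
PRIMITIVES = {"goto", "pickup", "drop", "scan"}

def llm_propose_plan(goal: str) -> list[str]:
    # placeholder for a call to a large language model
    return ["goto waypoint_a", "scan area", "goto waypoint_b", "drop supplies"]

def validate(plan: list[str]) -> list[str]:
    errors, holding = [], False
    for i, step in enumerate(plan):
        action = step.split()[0]
        if action not in PRIMITIVES:
            errors.append(f"step {i}: unknown primitive '{action}'")
        if action == "drop" and not holding:
            errors.append(f"step {i}: 'drop' without a prior 'pickup'")
        holding = True if action == "pickup" else (False if action == "drop" else holding)
    return errors

plan = llm_propose_plan("resupply the forward position")
print(validate(plan) or "plan accepted")   # errors would be fed back to the model for repair
```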
In terms of adversarial reasoning, the academic AI community has largely been focused on very specific models that are unrelated to the competency’s goals, as well as on either resource-allocation questions (at a lower scale than is needed in many real-world problems) or business-to-business needs where the “adversarial” nature concerns, for example, fair performance in bidding for advertisements on a web page or personalization of price setting. The problems being addressed in this competency area on adversarial reasoning are obviously much more focused on the real and complicated world in which the military must function. Some of the scientific questions that might be most useful to the military information sciences competency are being explored in other parts of the AI world (e.g., distributed-learning work is aiming to scale to significant levels in the presence of limited bandwidth, such as would be needed for autonomous vehicles). See the related references in the footnotes below.16
While the goals for this competency are indeed ones on which the Army should, and must, double down, the competency could also consider casting a wider net on the modern AI front and ensure that more scalable opportunities emerging in these other domains are not missed. Related AI problems may be unintentionally overlooked by the Army because of overly narrow definitions of adversarial reasoning, causal modeling, and the like. Two good examples of this are hybrid utility models17 and the use of graph networks (and graph neural network learning) in representing adversarial problems (see the related references in the footnotes below).18 One area within this competency, NLP understanding, is a good example of where things are being done in a way that casts a wider net on unique areas of scientific research. There has been enormous excitement in the past few years over LLMs, such as ChatGPT, and of course the Army can use some of those results. But those results do not always work for niche applications, such as training manuals that do not have large corpuses of related texts on which to train. ARL is commended for working on the limitations of LLMs for Army applications, and for finding approaches to address those limits. Additionally, LLM paradigms are not the solution to all AI problems; there is still room for symbolic reasoning and for hybrid approaches. ARL is to be commended for work such as neuro-symbolic AI/ML.19 While the overall competency is focused
___________________
15 K. Valmeekam, A. Olmo, S. Sreedharan, and S. Kambhampati, 2022, “Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning About Change),” arXiv (preprint), arXiv:2206.10498.
16 D. Otto et al., 2023, “Distributed Learning Ecosystems: Concepts, Resources, and Repositories,” Springer Nature.
17 Z. Zhao, A. Liu, and L. Xia, 2020, “Learning Mixtures of Random Utility Models with Features from Incomplete Preferences,” arXiv.
18 V. Hassija et al., 2020, “DAGIoV: A Framework for Vehicle to Vehicle Communication Using Directed Acyclic Graph and Game Theory,” IEEE Transactions on Vehicular Technology 69(4); B. Zhan et al., 2019, “Link Prediction in Temporal Networks: Integrating Survival Analysis and Game Theory,” Information Sciences 498; A. Duval and F.D. Malliaros, 2021, “GraphSVX: Shapley Value Explanations for Graph Neural Networks,” Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, 12976, https://doi.org/10.1007/978-3-030-86520-7_19.
19 Unlike end-to-end ML, neuro-symbolic approaches can learn with moderate size data and can learn atomic events using only complex event labels. Their use in this case is to enable the detection and identification of complex events consisting of a collection of coordinated atomic events spread out over time and space and enable complex event processing that can adapt to changing operational environments.
on going from language to performance, this research area allows for exploration of LLMs,20 such as the work exploring training manuals, and other cutting-edge approaches that can fill the “gap” between the huge, data-hungry needs of the commercial language community and the more specific needs that the Army has been exploring for a number of years. For example, several techniques have been shown to potentially improve the capabilities of LLMs, including retrieval-augmented generation, specialized fine-tuning, and various forms of prompt engineering.21,22,23 However, the stated goal of the competency to develop formal models may become less relevant as these new approaches emerge.
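To illustrate the retrieval-augmented generation pattern for the niche-corpus case (e.g., training manuals), a minimal sketch follows. The embed and generate functions are stand-ins for an embedding model and an LLM, not real APIs, and the bag-of-words embedding is a toy placeholder.

```python
# Hedged RAG sketch: retrieve the most relevant manual excerpts and prepend
# them to the prompt. embed() and generate() are illustrative stand-ins.
import numpy as np

def embed(text: str) -> np.ndarray:
    # toy bag-of-words hash; a real system would use a trained text encoder
    v = np.zeros(64)
    for tok in text.lower().split():
        v[hash(tok) % 64] += 1.0
    return v

def generate(prompt: str) -> str:
    return f"[LLM response conditioned on]\n{prompt}"   # stand-in for an LLM call

def answer(question: str, corpus: list[str], k: int = 3) -> str:
    doc_vecs = np.stack([embed(d) for d in corpus])
    q = embed(question)
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    top = [corpus[i] for i in np.argsort(scores)[::-1][:k]]   # cosine-similarity retrieval
    prompt = "Answer using only these manual excerpts:\n" + "\n".join(top) + f"\n\nQuestion: {question}"
    return generate(prompt)
```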
It was clear from the posters and examples presented that the Army’s researchers are aware of the “elephant in the room” (the advent of commercial large language models), and it is important that the work in this competency explore both where that work needs to be improved for real-world use in time-dependent situations and whether it could be applicable as a framework for Army-specific needs (such as training manuals). The poster “Neuro-Symbolic AI/ML for Complex Event Processing” shows intramural research that is starting to look into integrating symbolic and neural AI approaches. Such approaches in NLP are an emerging area of importance in AI research, and ARL might want to explore how the laboratory’s expertise in reasoning under uncertainty, adversarial reasoning, and resource-limited reasoning might be integrated with the work in NLP being pursued separately. For example, an emerging area that might be of interest is data-driven causal inferencing, which was the topic of several recent Defense Advanced Research Projects Agency programs, among other Department of Defense research efforts.
AI has made huge advances in the past few years, but the intramural work in this competency runs the risk of becoming out of date by sticking to methods that are not being explored at the cutting edge. Some of the performers are indeed cutting-edge researchers, but, for example, in a couple of cases panel members knew of newer work, some of it performed by these researchers themselves, that was not reported during the review as part of the Army program. This is further evidence that ARL researchers have a tremendous resource in each other, and management can facilitate more connections between them for greater cross-pollination of their considerable talents.
Finally, in a number of cases, NLP researchers within the projects are looking at interesting phenomena, such as the work on improving tokenization; these are clearly early works that are helping the Army explore the emerging changes in language research wrought by LLMs. However, the particular goal of this work was said to be “improving explainability,” which is not a direct consequence of the specific work. In several other projects, not just in NL, the researchers were clearly trying to tie their work to some high-level overarching theme (like explanation or digital twins) while the work itself was more narrowly focused.
Overall, the presented work in the act smarter research thrust area was clear and compelling. The work is on par with that funded by leading agencies. While several researchers demonstrated good knowledge of other research (e.g., by providing references to related work in their presentations), this was not the case across the board, and some of the presentations would be improved with specific content positioning the project’s research within the state-of-the-art literature. The opportunities
___________________
20 Large language models are systems like ChatGPT, Bard, and other commercial systems based on semi-supervised learning over an extremely large corpus of text.
21 D. Cai, Y. Wang, L. Liu, and S. Shi, 2022 “Recent Advances in Retrieval-Augmented Text Generation,” Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3417–3419.
22 N. Ding et al., 2023, “Parameter-Efficient Fine-Tuning of Large-Scale Pre-Trained Language Models,” Nature Machine Intelligence 5(3):220–235.
23 Ibid.
identified for individual projects in the section toward the end of this chapter provide feedback on some of the presentations, which, it is hoped, will bolster the research portfolio as a whole.
The overarching research questions identified within the project presentations are very good. It was exciting to see that most presenters had published their work in top venues. It is commendable that four of the five planned demos24 “worked” and that the teams had backup plans in place when glitches occurred. Discussion of the individual presentations that make up the broader portfolio appears in the section below on “Opportunities Identified for Individual Projects” and in the general text below, when a finding about a project could be applied more broadly to the core competency or the competency as a whole.
The ARL intramural research portfolio explored ML, multi-agent, and off-road navigation challenges. Given a future in which increasingly autonomous robotic systems will be deployed, the ML-based capture of human preferences concerning decision-making is important, since autonomy will need to be trusted to make decisions, especially in circumstances where it has been tasked with competing objectives. The ML-focused projects are strong in their objectives and apply sound methods to achieve these goals. Recommendations on improving robustness, causality analysis, and explainability are described below. Multi-agent teaming and swarm projects investigated important aspects of distributed coordination and control, though the novelty of some projects was not clear given a lack of connection to the existing literature. Off-road navigation through unstructured, unmapped environments is challenging, and ARL presented both posters and demos with promising results, as detailed below. Projects to improve geospatial maps and apply real-time simultaneous localization and mapping (SLAM) offer complementary, essential information for off-road autonomy.
The primary gaps in the presented ARL research portfolio appeared to be cases in which state-of-the-art methods from the literature were not applied or considered, and cross-project collaboration that does not yet occur. Resilience and robustness are important to almost every effort, though only certain projects formally considered these metrics. Although scalable multi-agent teams and swarm methods were modeled in simulation, the presented two-robot demonstrations could only focus on specific physical or data-sharing interactions. Larger team sizes will be necessary to better understand real-world multi-robot team deployments. The discussion below provides more commentary on these points.
There appear to be several opportunities in cross-project collaboration. For example, maps from the Geospatial Data Integration Server project should be utilized in Terrain-Aware Autonomous Ground Navigation to maximize situational awareness beyond sensor line of sight, minimize the need for real-time SLAM, and facilitate feedback of every deployed robot’s SLAM data set into the Geospatial Data Server. As another example, collaboration between Adaptive Planner Parameter Learning and real-world artificial intelligence of maneuver and mobility (AIMM) off-road autonomy trials would enable capture of human expert data from R2C2 experiments, which in turn would offer improved expert-informed planning autonomy for fielded robots. In general, efforts to better integrate theoretical, computational, and hardware-based projects, which may have different approaches and components, need to be identified and pursued when possible. Coordination is currently done in discipline-specific silos, and ARL is encouraged to integrate experts in different disciplines to learn, plan, and execute full multi-agent missions. There are three traditional disciplines within the act smarter project group: human-machine systems (e.g., human-expert-guided ML), multi-agent coordination (e.g., swarms), and experimental robotic platform development and field testing. Creativity will be enhanced by bringing together ARL personnel with diverse mindsets, for example, human-guided planning experts and cooperative-control mathematics experts.
___________________
24 ARL Presented five demos, which included (1) “AIMM Offroad Autonomy,” (2) “Autonomous Movement in Complex Terrain,” (3) “Resilient and Distributed Multi-Robot Visual SLAM KIMERA-MULTI,” (4) “Energy Redistribution for Air (UAS)—Ground (UGV) Teaming and Battery Management System,” and (5) “Geospatial Data Integration Server.”
Similarly, more coordination of the talents of researchers across the three research thrust areas will be needed to inspire more innovation. For example, the act smarter presentation on planning focused only on motion planning, without consideration of the greater mission and its modeling and planning. This is because there is one “discipline silo” for mission task planning (in the think faster research thrust area) and another for acting (the act smarter thrust area). There are opportunities for mission and tactical (motion) planner integration that will be better realized through these groups working together. Similarly, there is unrealized collaboration potential with the “perception” team working on sensor data processing to “map” an environment (the “M” in SLAM).
Additionally, the demo “Geospatial Data Integration Server” illustrated how large-scale data mining is being used to build and evolve a variety of worldwide maps. This project appears to have a promising trajectory. Traditionally, published geospatial map updates were time-intensive and infrequent, occurring over periods of weeks, months, or years. With such slow updates, new or temporary structures will not be properly mapped. Furthermore, today’s maps of vertical structures such as power lines and communication towers are so inaccurate they cannot be used to assure collision avoidance in low-altitude rotorcraft or unmanned aerial system (UAS) flight. Capturing real-time data from numerous mobile agents in continuously updated geospatial maps will enable better confidence, quality, and accuracy of maps. In contested areas, real-time updates will be able to define the locations and movements of temporary structures as well as static objects. The novelty here is in “crowd sourcing” mobile-agent data in a way that offers rapid updates along with confirmation that the data are consistent between sources. It was surprising that this project appeared to have a small footprint in the ARL research portfolio.
In particular, it appears the geospatial data generated have not yet been integrated into any of the simulators or map products used by the other ARL projects that were reviewed. Applied (6.2) research efforts could be making use of the best maps, and this project would benefit from ingesting as much real-time data as possible. While the quantity of data obtained from deployed Army assets will be much greater, the quality of data from experimental ARL robot platforms might be higher. This project clearly illustrates the importance of close collaboration across the ARL robotics research project portfolio.
ARL may also consider upscaling the multi-agent team sizes. Multi-agent team sizes can often be arbitrarily scaled in theory, and scaled to very large sizes in simulation. However, it is quite challenging to field large-scale robot teams in the wild due to cost and technical support overhead. Nonetheless, because the ARL act smarter research portfolio emphasizes multi-agent teams, it is critical to deploy teams capable of exposing any flaws in theoretical and simulation studies related to team size. Such experiments offer opportunities to identify and document planned and unanticipated interactions in real world testing.
To ensure user acceptance and build truly trustworthy decision-support and robotic systems, additional functionality is required, including explanation capabilities. Autonomous systems are increasingly responsible for independent decision-making. Just as people are often asked to explain their decisions to other people so that everyone understands why a decision is being made and its possible implications, autonomous “agents” (robots, software packages) that make decisions also need to be able to explain their decisions to other agents (human or robotic) for the same reasons. Explanations may be graphical, numerical, NL-based, or a combination. Explanation capabilities are currently understudied at ARL, and centralizing basic research on them could support individual subprojects. For example, a swarm should be able to explain how it is being deceptive with its motions, a planner should be able to explain how a new decision is guided by what an expert previously recommended, and a deployed vehicle that has been out of contact should be able to explain how and why it chose to go silent and how and why it re-established contact.
Achieving robustness is an important scientific topic that needs to be emphasized even more across the ARL research portfolio. The Army frequently exposes robotic platforms to high-risk and adversarial environments, and each robot must be able to succeed, and even thrive, despite failures and fog-of-war conditions. The principles of robust feedback control can help individual robots continue functioning despite system failures. ML can improve robustness to changes in models or environmental
conditions. Multi-agent systems can be made robust with task re-allocation and recovery capabilities should robotic team members be damaged or lost. Diversity can also improve robustness. The panel saw a promising pathway toward this capability with coordinated missions involving combinations of air, ground, and legged robots. There is more to explore in heterogeneous large-scale multi-agent and human-robot team planning and coordination research.
___________________
25 K.W. Wong, R. Ehlers, and H. Kress-Gazit, 2018, “Resilient, Provably-Correct, and High-Level Robot Behaviors,” IEEE Transactions on Robotics 34(4):936–952.
26 K. Leahy et al., 2021, “Scalable and Robust Algorithms for Task-Based Coordination from High-Level Specifications (Scratches),” IEEE Transactions on Robotics 38(4):2516–2535.
27 J. Gregory, 2023, “AI-Enabled Experimental Design for Accelerating the Development of Trustworthy Autonomous Systems,” DEVCOM ARL, September 6.
experimentation) is good. Providing a tool for human decision support is a good goal, since this is a domain where keeping the human in charge makes sense. Ongoing work studies systems that can alert users to wrong or suboptimal human choices about hardware parts, software modules, or their parameters; these are decisions that are each either binary or expressible by a single number, that is, relatively simple choices. The correlation between AI-enabled experimental design, trust, and explainability could be clearer. It appears that causal reasoning28 would be an appropriate approach for this research, especially given the relatively small number of examples from which the system will presumably be learning. The checklist input from human users may not be sufficient to achieve the goals of this research, since the information available from the checklist is quite limited. Additionally, explanation capabilities could be helpful to this study. Overall, this research is on an interesting and often-neglected topic that promises to avoid human system-configuration mistakes and thus result in better-performing systems. The initial results are promising.
___________________
28 The following reference presents examples of developing graphical causal models given small data sets: J. Pearl, 2009, Causality, Cambridge University Press. This book describes causal reasoning techniques and provides an excellent foundation for the field. S. Mueller, A. Li, and J. Pearl, 2021, “Causes of Effects: Learning Individual Responses from Population Data,” arXiv (preprint), arXiv 2104.13730.
sophisticated methods were presented. Future research could, for example, study how to formulate the problem as an optimization problem, how to reason explicitly about the tradeoff between the movement time and the signal strength, and how to determine a series of locations not constrained to the past trajectory of the robot (which also requires predicting the signal strength at unvisited locations). ARL might want to pair applied 6.2 projects, like this one, with complementary 6.1 projects that can focus on research and development of more sophisticated techniques for the applications studied in the 6.2 projects.
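One hedged way to make the suggested optimization framing concrete, with the symbols below introduced purely for illustration rather than drawn from the presented work: let x_1, …, x_T be candidate locations, τ(x_{t−1}, x_t) the traversal time between successive locations, and Ŝ(x) a learned or interpolated predictor of signal strength at unvisited locations. One could then pose

```latex
\min_{x_1,\dots,x_T} \; \sum_{t=1}^{T} \tau(x_{t-1}, x_t) \;-\; \lambda \sum_{t=1}^{T} \hat{S}(x_t)
\qquad \text{subject to } x_t \in \mathcal{X}_{\text{reachable}}(x_{t-1}),
```

where λ ≥ 0 makes the movement-time versus signal-strength tradeoff explicit and the reachability constraint encodes the platform’s mobility limits rather than its past trajectory.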
___________________
29 M.H. Raibert and H.B. Brown, Jr., 1984, “Experiments in Balance with a 2D One-Legged Hopping Machine,” Journal of Dynamic. Systems, Measurement, and Control 106(1):75–81.
capabilities: (1) well-modeled smoke generation and (2) optimization of where and when smoke should be generated. The team is initially focusing on real-world tests with smoke generation to understand, model, and iteratively improve its smoke-generation capability. The first data-collection steps have been completed successfully, but the underlying optimization problem has not yet been tackled. The team’s optimization tool will employ increasingly refined models as they are developed, which demonstrates a nice co-development of hardware and software. This applied research project and feasibility study shows promise.
The look farther research thrust team’s staff and collaborations with the extramural researchers showed strong synergies that were producing relevant, robust data and strong integration of research outcomes into applications. Some of these collaborations produced integrated demos that showed mission and path planning and navigation through complex wooded terrain. The project teams were constructed to complement expertise and were monitored and assessed appropriately to ensure that the outcomes would be utilized. Often organizations send funding to extramural teams and there is no true collaboration or outcome. This was not the case with the extramural work; it was very clear that there was excellent oversight and that long-term relationships were being developed with every team. There was also evidence of strong personnel-growth opportunities and mentoring. For example, the presentation “Heterogeneous LiDAR Sensing” exhibited good mentorship of students from a Historically Black College or University. Further mentoring of early-career professionals would help advance the organization and provide continuity.
Some of the findings of the ARL research on enhancing human performance could be applied back to educational training that synthesizes and leverages the knowledge gained from these projects. This work could have significant broader impacts on the educational community, helping to create best practices in learning for training, information retention, or potentially even assisting differently abled individuals. For example, many projects are looking at how humans interact with robots and at understanding human decision-making, and projects such as “Dialogue Abstract Meaning Representations for Conversational Artificial Intelligence” capture how humans communicate instructions to robots to create more robust, streamlined human/robot cooperation. As more data are collected from humans across a variety of scenarios using virtual reality in simulators, those data can be used to help create educational training for personnel in real-world operational situations or to help provide better training materials that are conducive to human information retention and
understanding. To this end, ARL may find the references in the footnotes below to be useful.30 Another possibility would be to explore the code-generation capabilities of foundational models, although that would still require integrating the generated code (which may be high level) with the robotic operating system (low level); research in this area could be promising.
Many of the project teams could benefit from having a cognitive science expert. While this expertise was not strongly represented in the portfolio, it would be a critical aspect of modeling and understanding human factors in much of the research. Another opportunity to deepen the research portfolio would be to add human–computer/robot interaction researchers who investigate how best to use humans-in-the-loop for data augmentation, learning, model improvement, and new architecture development.
Opportunities beyond publications include recognitions from external organizations, such as IEEE awards31 and Association for the Advancement of Artificial Intelligence awards.32 Much of the work by the teams is leading edge and deserves to be recognized by the professional community. Many researchers seek to collaborate with the leaders in the field, and such recognitions could help connect researchers to the projects for new collaborations. Integrating personnel with these organizations beyond publishing will be important; for instance, participation in the relevant groups within these organizations can be advantageous. For AI and robotics, relevant bodies include the IEEE Systems Council,33 the IEEE Systems, Man, and Cybernetics Society,34 and the IEEE Robotics and Automation Society.35 ARL could also consider nominating its researchers to become senior members and fellows of relevant scientific societies, as appropriate. Reviewing other people’s scholarly work as journal reviewers and participating in initiatives such as mentoring in student contests would connect researchers to upcoming ideas and state-of-the-art work before it is even published. These professional interactions and service as mentors and reviewers are encouraged.
The team that ARL has assembled in the think faster research thrust area is impressive, including MURIs that fund many of the top scientists in the field and internal researchers, many of whom are either well known in their areas or are “up and coming” researchers. Within ARL, the work related to this competency was clearly laid out in the presentations and demos.
The NLP understanding group has some outstanding researchers including the presenter on “Dialogue Abstract Meaning Representations for Conversational Artificial Intelligence,” who was one of the developers of Abstract Meaning Representation (AMR) parsing, a widely used technique in NLP, and other researchers who clearly profit from the strong ARL scientific leadership. The work in AMR was being used for directing robots, but not directly in the bidirectional NLP work. The latter provides an essential future capability for robots deployed in novel environments where there may not be a lot of training data. While it is not consistent with the AMR work, this is an example of where closer cooperation between internal teams might be useful.
Additionally, as previously mentioned, the poster “Neuro-Symbolic AI/ML for Complex Event Processing” shows intramural research that is starting to look into integrating symbolic and neural AI approaches. Looking into those approaches in NLP is an emerging area of importance in AI research, and
___________________
30 A. Lekova, P. Tsvetkova, and A. Andreeva, 2023, “System Software Architecture for Enhancing Human-Robot Interaction by Conversational AI,” International Conference on Information Technologies (InfoTech), https://doi.org/10.1109/InfoTech58664.2023.10266870; D. Pham et al., 2022, “A Case Study of Human-AI Interactions Using Transparent AI-Driven Autonomous Systems for Improved Human-AI Trust Factors,” IEEE 3rd International Conference on Human-Machine Systems (ICHMS), https://doi.org/10.1109/ICHMS56717.2022.9980662.
31 IEEE Awards, IEEE.org/awards.
32 AAAI, “About,” https://aaai.org/about-aaai/aaai-awards, accessed February 4, 2023.
33 IEEE Systems Council, https://ieeesystemscouncil.org/awards, accessed February 4, 2023.
34 IEEE Systems, Man, and Cybernetics Society, https://www.ieeesmc.org/about-smc/awards, accessed February 4, 2023.
35 IEEE Robotics & Automation Society, https://www.ieee-ras.org/awards-recognition, accessed February 4, 2023.
ARL might want to explore how the laboratory’s expertise in reasoning under uncertainty, adversarial reasoning, and resource-limited reasoning might be integrated with the work in NLP that is being pursued separately.
The ARL researchers supporting the act smarter research thrust appear to be well qualified to perform the research they are undertaking. Demos were managed capably, and success in four out of five research-class experiments is commendable. ARL is involving undergraduate students in some of its research (e.g., the poster “Information Recovery from Robotic Systems Under Communication Constraints”), which is commendable. ARL researchers often published their work and attended conferences, exposing them to related work. However, ARL research presentations did not always connect their projects to the related literature. Fundamental projects (6.1) need to clearly describe their novelty, and applied projects (6.2) need to clearly define the specific methods and models being transitioned into simulation and field tests. It is difficult to ascertain a presenter’s expertise when related work and technical details of the approach are scarce. The principal investigator who presented the only act smarter extramural talk demonstrated excellent knowledge of her program’s research thrusts and research portfolio. Extramural posters were less detailed, in part because the presented MURI efforts seemed to be at a more nascent stage.
For the look farther research thrust area, the robot testing facilities appear to be world class; however, the extent to which the facility was being utilized was not clear. For instance, many of the perception algorithms and other projects presented require data acquisition, training, and then testing in a “real environment,” yet for each of the works presented, the reviewers could not tell whether the R2C2 training facility played any role in any aspect of the project. The impact and utilization of the facility could be included in all future project presentations.
One of the most valuable outcomes presented to the review team was the use of the facility to create a public data set from information acquired at R2C2. This work is to be commended, because gathering such robust real-world data sets in outdoor environments is cost prohibitive for external entities and is quite limited in the amount of data that can be collected. Such data are a fundamental requirement for advancing the training and testing of any artificial intelligence system.
The competency seems well resourced in terms of computational power and access to simulation to support the think faster research area. It was less clear whether access to actual robots was available and whether the interaction between this area and the overall robotics work was realistic. For example, some of the NLP demonstrations involved interacting with simulated robots that were not especially realistic. This was useful for the particular linguistic work being demonstrated, but it will not lead to realistic human–robot interaction in the short term unless more work is done with the detailed simulators and/or the actual robots that the Army has available in its research facilities.
For the act smarter research thrust, the R2C2 facility was toured during the review; the tour included an urban mockup, paved areas, and research trailers and buildings. ARL has an appropriate set of natural open and forested environments, and R2C2 offers appropriate test environments for ARL robots. The trailers offered control station environments similar to what Army personnel might find in a deployed operation, and the buildings offered ARL researchers office, laboratory, and storage space appropriate for the robotics platforms and projects the panel reviewed. ARL demonstrations successfully utilized UGV, UAS, and quadruped robots in simulated missions. These robots were well chosen and appropriate for each assigned task. Control stations were well equipped, and large monitors were available for onsite briefings.
Graces Quarters is an exciting facility for robotics research. Other robotics sites across the country have high bays, or military operations in urban terrain (MOUT) scenarios, or wooded terrain, or littoral settings, or open areas for UAV operation. Few sites have all of those together in one location.
It was noted that most robots were off-the-shelf, suggesting a need for more manufacturing capability to support customization of hardware. While the robot chassis used at ARL are mostly off-the-shelf, significant effort is clearly required to mount sensors and electronics; particularly for robots operating in rugged outdoor environments that require cooling and shock mounting, there are few commercial solutions. Additionally, some of the ARL researchers asked for more convenient access to machine shops, which are an important asset for rapid prototyping of experimental vehicles. Such machine shops could be used to mount sensors, build pan/tilt mounts, and add shock-protected enclosures for electronics; additive manufacturing facilities and electronics fabrication capabilities would also be valuable.
In the review of all three research thrusts, few if any of the researchers indicated a need for more computing power, which is surprising given the demands of processing big data. It will be particularly important to consider low-power specialized computing as larger data sets need to be processed, both in real time and as preprocessed large models.
No organization, no matter its size, can cover all areas of military information science. The field is too large, and commercial developments (such as large language models) shift the frontier between what remains a research domain and what is available as commodity software. Within those constraints, both extramural and intramural ARL managers have done a commendable job of sampling the areas where they can best contribute, such as robotics systems, planning in dynamic domains, perception in natural outdoor environments, and interfaces between NLP and intelligent systems.
The science in the look farther research thrust area is on par with that of leading universities. Both the intramural and extramural scientific questions being addressed, such as model parameter explosion, learning in the presence of untrustworthy data, and integrating multi-modal data in relation to human understanding of language, were on target with current science and reflected a good identification of problems and challenges. The team, its staff, and collaborations with the extramural researchers showed strong synergies that were producing relevant, robust data and strong integration of research outcomes into applications. There was also evidence of strong personnel growth opportunities and mentoring.
The researchers consistently demonstrated state-of-the-art methods.36 Still, there are opportunities for novel model development, transfer learning, and quaternion architectures, especially given the observed trend toward increasing model complexity and hyper-parameter optimization.
One major risk to this portfolio is the duplication and independent collection and use of data by different ARL teams. Some teams are scraping data and others are constructing experiments to create custom data sets. This is a labor-intensive process, and it could slow progress because different standards and annotation requirements are being used. Training researchers on scoping and writing data requirements would be helpful. Achieving data fusion is important and cuts across all of this research.
Similarly, the think faster research thrust is doing relatively high-quality work. In some cases, especially the extramural work, it includes leading advances on par with those supported by top funding agencies and research institutions; in others, especially the intramural work, it shows a narrower focus on applications that, while not always groundbreaking, clearly relate to ARL priorities. The researchers are largely aware of research trends in the broader scientific community, although some opportunities to bolster this awareness are listed in the chapter. Additionally, ARL is funding top extramural people and universities. The researchers are using sound research methods and methodologies, and there are no apparent risks to the overall portfolio.
___________________
36 State-of-the-art methods refers to the use of the most innovative and best-performing techniques and technologies in ML.
Overall, the presented work was clear, compelling, and on par with that of leading funding agencies nationally and internationally. The overarching research questions identified within the project presentations are very good and represent tough challenges. Most presenters had published their work in top venues, and ARL is making good choices about which extramural projects to emphasize and which to de-emphasize moving forward. As noted above, the ARL researchers supporting the act smarter research thrust appear well qualified, the demonstrations were managed capably, and R2C2 offers appropriate test environments for ARL robots; however, opportunities to improve these resources are listed in the chapter and below.
ARL may consider scaling up multi-agent team sizes. Because the act smarter research thrust emphasizes multi-agent teams, it is critical to deploy teams large enough to expose any flaws in theoretical and simulation studies related to team size. Such experiments offer opportunities to identify and document planned and unanticipated interactions in real-world testing. Adding resources to purchase, maintain, and deploy larger quantities of robots for multi-agent experiments is suggested. The team could also address challenges in scaling up user interfaces (number of people, number of control stations) to support multi-agent testing.
Opportunities that cut across these three research thrusts include: