Brain-Inspired Computing Research: DARPA HIVE Project for Developing Next-Generation AI Platforms

The following organizations were selected for HIVE project collaboration: Intel Corporation (Santa Clara, California), Qualcomm Intelligent Solutions (San Diego, California), Pacific Northwest National Laboratory (Richland, Washington), Georgia Tech (Atlanta, Georgia), and Northrop Grumman (Falls Church, Virginia). "The HIVE program is an exemplary prototype to forge new R&D pathways that can deliver unprecedented levels of hardware specialization and to build the matching software infrastructure." The new software must be compatible not only with the new hardware but also with existing CPUs and GPUs. In addition, it must support the large body of existing software used in the data science world. Part of the goal is to be able to connect existing software and libraries to the HIVE software framework to make it easier to use.


Introduction
As described by DARPA, the main HIVE goal is the creation of a "graph analytics processor" that exploits graphical representations of relationships in a network more efficiently than traditional data formats and processing techniques. Combined with emerging machine learning and other artificial intelligence techniques that can categorize raw data elements, and by updating the elements in the graph as new data becomes available, a powerful graph analytics processor could discern otherwise hidden causal relationships among the data elements in the graph representations [1]. DARPA suggests such a graph analytics processor might achieve a "thousand-fold improvement" in processing efficiency over today's best processors, enabling the real-time identification of strategically important relationships as they unfold in the field, rather than relying on after-the-fact analyses in data centers. The current software stack includes algorithms exposed via an API, an internal graph representation of the data, and hardware "backends" (GPU, CPU, ASIC).
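The incremental-update idea above can be illustrated with a toy sketch: a graph that grows as new data arrives, with a crude query for relationships that are not stated directly but emerge from shared structure. The class and method names are illustrative, not part of any HIVE API.

```python
from collections import defaultdict

class StreamingGraph:
    """Toy incremental graph: edges are added as new data arrives."""
    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        # Update the graph representation in place as data streams in.
        self.adj[u].add(v)
        self.adj[v].add(u)

    def common_neighbors(self, u, v):
        # A crude proxy for a hidden relationship between u and v:
        # entities linked to both, even if u and v are not linked directly.
        return self.adj[u] & self.adj[v]

g = StreamingGraph()
g.add_edge("A", "C")
g.add_edge("B", "C")
g.add_edge("A", "D")
g.add_edge("B", "D")
print(g.common_neighbors("A", "B"))  # {'C', 'D'}
```

Real graph analytics processors target far richer queries (reachability, centrality, pattern matching), but the key property sketched here is the same: the graph is updated in place rather than rebuilt for after-the-fact batch analysis.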

Basic Targets
In Russia we also promote the proactive development of domestic massively parallel processors. The objectives of the project also include information and analytical work and the development of technical solutions for creating a high-speed element and design base. The basic component of the processor is a tile formed by a 64-thread core coupled with specialized accelerators (SFUs). The massively parallel processor must include a hundred tiles connected by an on-chip network, several on-chip interconnect links, and a PCIe interface to the host processor. The ideology of the massively parallel architecture is similar to the "Colossus" processor (Graphcore) [4], which is focused on machine learning tasks, but the domestic massively parallel systems are hybrid and reconfigurable platforms. We now turn to SoC hardware, although this does not seem to be the focus of this project.
According to the description of this project, a PE is a relatively coarse-grained concept, covering CPUs, GPUs, TPUs, neuromorphic units, DSPs, and hardware accelerators.
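The tile organization described above can be sketched as a simple configuration model. The per-tile thread count (64) and tile count (100) come from the text; the SFU count and link count are placeholder assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Tile:
    threads: int = 64   # 64-thread core per tile (from the text)
    sfus: int = 4       # specialized accelerators per tile (assumed)

@dataclass
class Chip:
    # A hundred tiles on an on-chip network, plus a host interface.
    tiles: list = field(default_factory=lambda: [Tile() for _ in range(100)])
    noc_links: int = 4  # on-chip interconnect links (assumed)
    host_if: str = "PCIe"

    def hardware_threads(self):
        # Total concurrency exposed by the massively parallel array.
        return sum(t.threads for t in self.tiles)

chip = Chip()
print(chip.hardware_threads())  # 6400 threads across 100 tiles
```

Even this back-of-the-envelope model makes the design point visible: throughput comes from thousands of lightweight hardware threads rather than a few wide cores.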

Testing Results
In the field of dedicated processor and accelerator design, the performance competition is fierce; it has even become an international-level scoring contest. High-profile international venues (such as ISSCC) regularly publish the latest "scoring results", such as the performance comparison of a CNN acceleration chip shown in Figure 1. When analyzing a chip architecture, we mainly need to consider the following indicators, as shown in Figure 2.

Synthesis Results (Light Blue Part)
In a preliminary evaluation or sub-assessment, the synthesis tool provides several key summary indicators.
These indicators are based on netlists and therefore do not include routing; they can only be used for an initial chip cost evaluation.
They include the combinational logic resource cost, the non-combinational (sequential) logic resource cost, and the macro cost (such as RAM). Combinational logic cost can also be measured by the number of equivalent gates. The difference is that the combinational logic area is tied to a specific process library, while the equivalent-gate count is independent of the process library.
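The library-independence of the equivalent-gate metric follows from how it is computed: combinational area is normalized to the area of a reference gate (conventionally a 2-input NAND) from the same library. A minimal sketch, with illustrative numbers that do not come from any real process library:

```python
def equivalent_gates(comb_area_um2, nand2_area_um2):
    """Equivalent-gate count: combinational cell area divided by the
    area of a 2-input NAND gate from the same process library. Raw
    area changes with the library; this ratio roughly does not."""
    return comb_area_um2 / nand2_area_um2

# The same design synthesized in two hypothetical libraries:
# areas differ, but the normalized gate count is comparable.
print(equivalent_gates(50_000.0, 1.0))   # 50000.0 equivalent gates
print(equivalent_gates(25_000.0, 0.5))   # 50000.0 equivalent gates
```

This is why equivalent gates are the customary unit for cross-paper, cross-node cost comparisons, whereas square micrometers are not.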

Memory Related (Dark Blue Part)
Memory is often represented as a black box, characterized by the number of memory macros; the capacity, depth, and width of each macro; and whether it is dual-port or single-port (a dual-port macro is roughly twice the size of a single-port one). Small memories (less than 1 KB) or memories with many read and write ports are often implemented as register files. When an even smaller memory or a storage structure with a complex read-write interface is required, a register array is generally built directly.
The difference between the two is that a register file usually has to be generated by a dedicated memory compiler, while a register array is written directly in a hardware description language. The area of a register file is much smaller than that of a register array with equivalent function.
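The rules of thumb above can be collected into a small decision helper. The thresholds and the 2x dual-port factor are taken from the text; the function names and exact cutoffs are illustrative, not a sign-off rule.

```python
def memory_style(capacity_bytes, read_ports, write_ports):
    """Rule-of-thumb selector: very small or many-ported memories are
    written directly in RTL as register arrays; otherwise use a
    compiled macro (register file / SRAM). Thresholds are illustrative."""
    if capacity_bytes < 1024 or read_ports + write_ports > 2:
        return "register array (plain HDL)"
    return "compiled macro (memory compiler)"

def macro_area(single_port_area, dual_port=False):
    # Per the text, a dual-port macro is roughly twice a single-port one.
    return single_port_area * (2.0 if dual_port else 1.0)

print(memory_style(512, 1, 1))                 # register array (plain HDL)
print(memory_style(16_384, 1, 1))              # compiled macro (memory compiler)
print(macro_area(100.0, dual_port=True))       # 200.0
```

In a real flow the choice also depends on timing, power, and the availability of a memory compiler for the target library, but the capacity/port-count heuristic is a useful first cut.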

Interface and Bus Related (Grey Part)
The first concern is whether the types of external interfaces provided by the chip meet the requirements, such as high-speed SerDes, USB, and DDR interfaces. The second is the data transmission bandwidth of each interface and its electrical specification. The bus is located inside the chip, and the main concerns are the bus type (for example, a 2D-mesh network-on-chip or AMBA), the transmission mode (packet switching or circuit switching), and the transmission performance.
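For a 2D-mesh network-on-chip like the one mentioned above, a first-order performance figure is the hop count between tiles. A minimal sketch under the common deterministic XY-routing assumption (route fully along X, then along Y); coordinates and the routing policy are illustrative:

```python
def xy_hops(src, dst):
    """Hop count under deterministic XY routing on a 2D mesh:
    the packet travels along the X dimension first, then Y.
    src and dst are (x, y) tile coordinates."""
    (sx, sy), (dx, dy) = src, dst
    return abs(dx - sx) + abs(dy - sy)

# From corner tile (0, 0) to tile (3, 2) of the mesh:
print(xy_hops((0, 0), (3, 2)))  # 5 hops
```

Multiplying hop count by per-hop router latency gives a rough zero-load latency estimate, which is why mesh dimensions matter as much as link bandwidth.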

Process Node (Cyan Part)
This includes the process library used, the number of wiring layers, the operating voltage, and whether a low-power or high-performance process variant is used.

Conclusion
Similarly, integrating a new user API only requires adding an interface to one of the hardware backends and supporting at least one algorithm.
Ultimately, the overall goal of HIVE is to unify and simplify the process of optimizing the communication between graph software and hardware. Reconfigurable hardware often needs to support multiple operating modes or operating parameters; configurability and programmability ultimately mean adding some redundancy to the original dedicated circuit to improve flexibility.
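The integration argument above is the classic registry pattern: each API and each backend plugs into a shared framework once, instead of every API being ported to every backend pairwise. A minimal sketch; all names here are illustrative and not part of any HIVE software:

```python
# Backends register once with the framework; any algorithm exposed
# through the user API can then run on any registered backend.
BACKENDS = {}

def register_backend(name, run_fn):
    BACKENDS[name] = run_fn

def execute(algorithm, graph, backend="cpu"):
    # The shared internal graph representation decouples the two sides.
    return BACKENDS[backend](algorithm, graph)

register_backend("cpu", lambda alg, g: f"{alg} on CPU over {len(g)} nodes")
register_backend("gpu", lambda alg, g: f"{alg} on GPU over {len(g)} nodes")

print(execute("pagerank", {"A": [], "B": []}, backend="gpu"))
# pagerank on GPU over 2 nodes
```

With m APIs and n backends this reduces the integration work from m x n pairwise ports to m + n adapters, which is precisely the simplification the conclusion describes.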
But the efficiency of "ordinary" processors is not consistent across workloads.
With proper ASIC design, good indicators can be demonstrated for a limited class of "hard" logic algorithms. For a successful breakthrough in the field of microelectronics, it is necessary to start developing a massively parallel processor based on multithreaded cores with specialized accelerators.