    • Overview of the SpiNNaker system architecture

      Furber, Steve B.; Lester, David R.; Plana, Luis A.; Garside, Jim D.; Painkras, Eustace; Temple, Steve; Brown, Andrew D. (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2012)
      SpiNNaker (a contraction of Spiking Neural Network Architecture) is a million-core computing engine whose flagship goal is to be able to simulate the behaviour of aggregates of up to a billion neurons in real time. It consists of an array of ARM9 cores, communicating via packets carried by a custom interconnect fabric. The packets are small (40 or 72 bits), and their transmission is brokered entirely by hardware, giving the overall engine an extremely high bisection bandwidth of over 5 billion packets/s. Three of the principal axioms of parallel machine design -- memory coherence, synchronicity and determinism -- have been discarded in the design without, surprisingly, compromising the ability to perform meaningful computations. A further attribute of the system is the acknowledgement, from the initial design stages, that the sheer size of the implementation will make component failures an inevitable aspect of day-to-day operation, and fault detection and recovery mechanisms have been built into the system at many levels of abstraction. This paper describes the architecture of the machine and outlines the underlying design philosophy; software and applications are to be described in detail elsewhere, and only introduced in passing here as necessary to illuminate the description.
    • Scalable communications for a million-core neural processing architecture

      Patterson, Cameron; Garside, Jim D.; Painkras, Eustace; Temple, Steve; Plana, Luis A.; Navaridas, Javier; Sharp, Thomas; Furber, Steve B. (Elsevier, 2012)
      The design of a new high-performance computing platform to model biological neural networks requires scalable, layered communications in both hardware and software. SpiNNaker's hardware is based upon Multi-Processor System-on-Chips (MPSoCs) with flexible, power-efficient, custom communication between processors and chips. The architecture scales from a single 18-processor chip to over 1 million processors and to simulations of billion-neuron, trillion-synapse models, with tens of trillions of neural spike-event packets conveyed each second. The communication networks and overlying protocols are key to the successful operation of the SpiNNaker architecture, designed together to maximise performance and minimise the power demands of the platform. SpiNNaker is a work in progress, having recently reached a major milestone with the delivery of the first MPSoCs. This paper presents the architectural justification, which is now supported by preliminary measured results of silicon performance, indicating that it is indeed scalable to a million-plus processor system.
    • SpiNNaker: a 1-W 18-core system-on-chip for massively-parallel neural network simulation

      Painkras, Eustace; Plana, Luis A.; Garside, Jim D.; Temple, Steve; Galluppi, Francesco; Patterson, Cameron; Lester, David R.; Brown, Andrew D.; Furber, Steve B.; University of Manchester (IEEE, 2013-08)
      The modelling of large systems of spiking neurons is computationally very demanding in terms of processing power and communication. SpiNNaker - Spiking Neural Network architecture - is a massively parallel computer system designed to provide a cost-effective and flexible simulator for neuroscience experiments. It can model up to a billion neurons and a trillion synapses in biological real time. The basic building block is the SpiNNaker Chip Multiprocessor (CMP), which is a custom-designed globally asynchronous locally synchronous (GALS) system with 18 ARM968 processor nodes residing in synchronous islands, surrounded by a lightweight, packet-switched asynchronous communications infrastructure. In this paper, we review the design requirements for its very demanding target application, the SpiNNaker micro-architecture and its implementation issues. We also evaluate the SpiNNaker CMP, which contains 100 million transistors in a 102-mm² die, provides a peak performance of 3.96 GIPS, and has a peak power consumption of 1 W when all processor cores operate at the nominal frequency of 180 MHz. SpiNNaker chips are fully operational and meet their power and performance requirements.
    • SpiNNaker: a multi-core System-on-Chip for massively-parallel neural net simulation

      Painkras, Eustace; Plana, Luis A.; Garside, Jim D.; Temple, Steve; Davidson, Simon; Pepper, Jeffrey; Clark, David; Patterson, Cameron; Furber, Steve B. (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2012)
      The modelling of large systems of spiking neurons is computationally very demanding in terms of processing power and communication. SpiNNaker is a massively-parallel computer system designed to model up to a billion spiking neurons in real time. The basic block of the machine is the SpiNNaker multicore System-on-Chip, a Globally Asynchronous Locally Synchronous (GALS) system with 18 ARM968 processor nodes residing in synchronous islands, surrounded by a light-weight, packet-switched asynchronous communications infrastructure. The MPSoC contains 100 million transistors in a 102 mm² die, provides a peak performance of 3.96 GIPS and has a power consumption of 1 W at 1.2 V when all processor cores operate at nominal frequency. SpiNNaker chips were delivered in May 2011, were fully operational, and met power and performance requirements.
    • SpiNNaker: design and implementation of a GALS multicore system-on-chip

      Plana, Luis A.; Clark, David; Davidson, Simon; Furber, Steve B.; Garside, Jim D.; Painkras, Eustace; Pepper, Jeffrey; Temple, Steve; Bainbridge, John (ACM, 2011)
      The design and implementation of Globally Asynchronous Locally Synchronous Systems-on-Chip is a challenging activity. The large size and complexity of the systems require the use of Computer-Aided Design (CAD) tools but, unfortunately, most tools do not work adequately with asynchronous circuits. This paper describes the successful design and implementation of SpiNNaker, a GALS multi-core system-on-chip. The process was completed using commercial CAD tools from synthesis to layout. A hierarchical methodology was devised to deal with the asynchronous sections of the system, encapsulating and validating timing assumptions at each level. The crossbar topology combined with a pipelined asynchronous fabric implementation allows the on-chip network to meet the stringent requirements of the system. The implementation methodology constrains the design in a way which allows the tools to complete their tasks successfully. A first test chip, with reduced resources and complexity, was taped out using the proposed methodology. Test chips were received in December 2009 and were fully functional. The methodology had to be modified to cope with the increased complexity of the SpiNNaker SoC. SpiNNaker chips were delivered in May 2011 and were also fully operational, and the interconnect requirements were met.