

In-Memory Computing and Analog Chips for AI
As modern AI models push digital computers to their limits, engineers are looking for ways to make AI computation faster and more efficient. One of the paths leads back to the roots of computing.
Modern AI models are huge. The number of their parameters is measured in billions. All those parameters need to be stored somewhere and that takes a lot of memory.
For instance, the open-source LLaMA model, which consists of 65 billion parameters, demands 120GB of RAM. Even after converting the original 16-bit numbers to 4-bit numbers for weight representation, it still requires 38.5GB of memory. Keep in mind that these figures are for a model with 65 billion parameters. GPT-3, with its 175 billion parameters, requires even more memory. Nvidia's top consumer GPU, the RTX 4090, has 24GB of memory, while their data centre-focused product, the H100, offers 80GB of memory. It's also worth noting that training AI models typically requires several times more memory than inference, because training also stores intermediate activations, which usually add 3-4 times as much memory as the parameters themselves (excluding embeddings).
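To put these numbers in perspective, here is a rough back-of-the-envelope calculation of the memory needed just to hold the weights. It is only a sketch - real runtimes add overhead for activations, buffers and quantization metadata, which is why the figures quoted above differ slightly.

```python
# Rough memory footprint of model weights alone (no activations, optimizer
# state or framework overhead). Parameter counts are the published ones.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

for name, params in [("LLaMA-65B", 65e9), ("GPT-3", 175e9)]:
    for bits in (16, 4):
        print(f"{name} @ {bits}-bit: {weight_memory_gb(params, bits):.1f} GB")

# LLaMA-65B @ 16-bit: 130.0 GB   LLaMA-65B @ 4-bit: 32.5 GB
# GPT-3     @ 16-bit: 350.0 GB   GPT-3     @ 4-bit: 87.5 GB
```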
Due to their size, large neural networks cannot fit into the local memory of CPUs or GPUs, and need to be transferred from external memory such as RAM. However, moving such vast amounts of data between memory and processors pushes current computer architectures to their limits.
The Memory Wall
The digital computers we use today, ranging from smartphones in our pockets to powerful supercomputers, follow the von Neumann architecture. In this architecture, memory (which holds the data and the program to process that data), I/O and processing units (like CPUs or GPUs) are separate units connected with a bus to exchange information between them.
This modular and flexible architecture has served us well for nearly 80 years and has played a crucial role in advancing computing technology to its current state.

However, the biggest strength of the von Neumann architecture - the separation of components - is also the source of its performance bottlenecks. All those components need to exchange information with one another. As long as memory speed matches processor speed, there is no problem - the memory can feed the processor with data as fast as it can process it.
But that’s not what has been happening. The processor's speed has increased at a much faster rate than the memory speed. Over the past two decades, computing power has grown by a factor of 90,000, while memory speed has only increased by a factor of 30. In other words, memory struggles to keep up with feeding data to the processor.
This problem is commonly referred to as the "Memory Wall" and presents a significant challenge for semiconductor and machine learning engineers.
This growing chasm between memory and processor performance is costing time and energy. To illustrate the magnitude of this issue, consider the task of adding two 32-bit numbers retrieved from memory. The processor requires less than 1 pJ of energy to perform this operation. However, fetching those numbers from memory into the processor consumes 2-3 nJ of energy. In terms of energy expenditure, accessing memory is 1000 times more costly than computation.
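A quick sketch of what that gap means at scale, using the rough figures above - about 1 pJ per addition and roughly 2.5 nJ (the midpoint of the quoted range) to fetch the operands from DRAM:

```python
# Energy to add a billion pairs of 32-bit numbers vs. energy to fetch them
# from DRAM, using the approximate figures quoted in the text.

COMPUTE_PJ_PER_ADD = 1.0      # upper bound: "less than 1 pJ" per addition
DRAM_FETCH_PJ = 2500.0        # midpoint of the 2-3 nJ range, in picojoules

n_ops = 1e9
compute_j = n_ops * COMPUTE_PJ_PER_ADD * 1e-12
memory_j = n_ops * DRAM_FETCH_PJ * 1e-12

print(f"compute: {compute_j * 1e3:.1f} mJ, memory: {memory_j * 1e3:.1f} mJ, "
      f"ratio: {memory_j / compute_j:.0f}x")
# -> compute: 1.0 mJ, memory: 2500.0 mJ, ratio: 2500x
# Memory access dominates by roughly three orders of magnitude.
```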
The Memory Wall hits data-heavy applications such as machine learning particularly hard. Large amounts of data need to be moved between memory, CPUs, GPUs and other accelerators to train and run massive neural networks. Today's machine learning models are so big that they cannot fit in a CPU's cache or a GPU's VRAM. Only a small portion can be loaded into local memory at a time and the rest is constantly swapped in and out, which adds large energy and time costs.
Semiconductor engineers are actively working to minimize memory access time and distance. Initially, cache memory was introduced alongside the CPU, with the addition of L1, L2, and L3 caches. More recently, AMD introduced 3D V-Cache technology, which places additional memory on top of the CPU.
Another approach involves physically bringing processing units and memory closer together. Apple, for instance, implemented Unified Memory in their Apple Silicon chips. In this design, the entire system memory (RAM) is placed on the same package as the SoC (System on a Chip). This close integration between memory and computing units enables faster access to memory and ultimately improves overall performance.
This is what engineers are doing to improve the performance of digital computers. But digital computers are not the only computers available.
Enter analog computers
Digital computing operates on bits - discrete values that can only be either 0 or 1. In contrast, analog computers use continuous physical processes and variables, such as electrical current or voltage, for calculations. We will be talking about electronic analog computers in this article but analog computers can be built using mechanical devices or fluid systems.
Analog computers played a significant role in early scientific research and engineering by solving complex mathematical equations and simulating physical systems. They excelled at tackling mathematical problems involving continuous functions like differential equations, integrations, and optimizations.
Even the perceptron, the first artificial neuron, was originally designed to use analog computing.

But analog computers come with some drawbacks.
The biggest problem with analog computers is that they are not precise. Since they operate using physical phenomena, noise and signal degradation creep in. Combined with the inaccuracies inherent in physical components, this makes it challenging to achieve high precision. It also hurts reproducibility - running the same program twice on an analog computer might give slightly different results.
Another problem is that analog computers are often purpose-built for a specific task. If you have an analog computer that can solve one specific equation and you want it to solve another, you have to reconfigure its physical components. Essentially, each new problem requires building a new computer.
Digital computers do not have those problems. They are precise and their results are reproducible. Reprogramming them is a matter of loading new software, not rebuilding the hardware. They also scale better and are cheaper to maintain than analog computers, and storing data is easier with them too.
These factors caused analog computers to fade away as digital computers took over between the 1950s and the 1970s. Today, every computer we interact with on a daily basis is a digital computer.
How analog computers can run neural networks
Recently, researchers and engineers have started to explore again the possibility of using analog computers to run neural networks, as a solution to machine learning's energy and computation problems.
All modern machine learning algorithms, ranging from image recognition to large language models like transformers, heavily rely on vector and matrix operations. These complex operations ultimately boil down to a series of additions and multiplications.
Those two operations, addition and multiplication, can be easily performed on an analog computer. Addition follows from Kirchhoff's Current Law: if we know the currents in two wires, connecting them gives a wire carrying the sum of those currents. Multiplication is similarly straightforward: by Ohm's Law, applying a voltage across a resistor with a known conductance produces a current equal to the product of the voltage and the conductance.
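A toy illustration of the idea, treating voltages, conductances and currents as plain numbers - the product comes from Ohm's Law and the sum from Kirchhoff's Current Law:

```python
# Analog multiply-accumulate in miniature: Ohm's law turns a voltage applied
# across a known conductance into a current, and Kirchhoff's current law sums
# the currents of wires joined at a node.

def multiply(voltage_v: float, conductance_s: float) -> float:
    """Ohm's law: I = V * G."""
    return voltage_v * conductance_s

def add(*currents_a: float) -> float:
    """Kirchhoff's current law: currents flowing into a node add up."""
    return sum(currents_a)

i1 = multiply(2.0, 3.0)   # "2 x 3" -> 6.0
i2 = multiply(4.0, 5.0)   # "4 x 5" -> 20.0
print(add(i1, i2))        # 26.0 - the multiply-accumulate a real circuit would do physically
```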
Now, let's go back to matrix operations and see how we can build an analog circuit to compute them. First, we map the weights of the matrix to conductances and the elements of the input vector to voltages. Then we measure the currents coming out of the grid - these form our output vector.
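Here is a minimal simulation of such a crossbar in code, with made-up conductance and voltage values. The second readout adds a crude 2% device-variation term as a nod to the noise problems discussed earlier; real non-idealities are more complex.

```python
import numpy as np

# Idealized resistive crossbar: each weight is stored as a conductance, the
# input vector is applied as voltages on the rows, and the current collected
# on each column is the dot product of that column's conductances with the
# voltages (Ohm's law per cell, Kirchhoff's current law per column).

G = np.array([[0.2, 0.5, 0.1],
              [0.4, 0.3, 0.7]])   # weight matrix, mapped to conductances
v = np.array([1.0, 0.5, 2.0])     # input vector, mapped to voltages

print(G @ v)                      # ideal column currents: [0.65, 1.95]

# A crude model of device variation: each cell is off by a few percent.
rng = np.random.default_rng(0)
G_real = G * (1 + 0.02 * rng.standard_normal(G.shape))
print(G_real @ v)                 # close to, but not exactly, the ideal result
```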
It is a straightforward and elegant solution. The computations happen in place - there is no distinction between memory and processing unit (hence the names for this technique: in-memory computing, compute-in-memory, computational RAM, at-memory computing and others). No energy is spent moving data to and from memory. The computation happens quickly, in parallel, and uses fewer components. The energy savings are massive - analog chips promise to be 100x more energy efficient than their digital equivalents.
In the example above, we used resistors to store the weights. In real life, more sophisticated components are used to store the weights, such as phase-change memory, resistive RAM, flash memory or electrochemical RAM.
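Unlike an ideal resistor, these memory cells can only be programmed to a limited number of conductance levels, and the programmed value is never exact. A toy model of that constraint is sketched below - the level count and conductance range are invented for illustration, not the specs of any real device.

```python
import numpy as np

# Snap real-valued weights onto a small set of discrete conductance levels,
# mimicking the limited programming resolution of a non-volatile memory cell.
# g_min, g_max and the number of levels are made-up illustrative values.

def program_weights(w, g_min=1e-6, g_max=1e-4, levels=16):
    """Map real-valued weights onto the nearest of `levels` conductance levels."""
    w_norm = (w - w.min()) / (w.max() - w.min())            # rescale to [0, 1]
    step = (g_max - g_min) / (levels - 1)
    return g_min + np.round(w_norm * (levels - 1)) * step   # nearest level

w = np.random.default_rng(1).standard_normal((4, 4))
print(program_weights(w))   # every entry is one of 16 allowed conductances
```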
Applications
Analog AI chips can run neural networks with accuracy comparable to digital computers, but at significantly lower energy consumption. The devices can also be simpler and smaller.
Those characteristics make analog AI chips perfect for edge devices such as smart speakers, security cameras, phones or industrial equipment. On the edge, it is often unnecessary or even undesirable to have a large computer for processing voice commands or performing image recognition. Sending data to the cloud may not be an option due to privacy concerns, network latency, or other constraints. On the edge, the smaller and more efficient the device is, the better.
Analog AI chips can also be used in AI accelerators to speed up all those matrix operations used in machine learning.
The first company to introduce analog AI chips and accelerators was Mythic. In 2021, they launched the Mythic Analog Matrix Processor, available both as a standalone chip and as an M.2 card for integration into existing computers. In late 2022, reports emerged that Mythic was facing financial difficulties. However, in March 2023, the company announced a new round of investment totalling $13M along with a change of CEO. Mythic has raised a total of $164.7M so far.
Other companies in this field include Untether AI, which offers AI chips and PCIe accelerators and has raised $152M. Axelera is another player, promising a range of products including M.2 AI accelerators, PCIe cards, and dedicated AI servers. Axelera has secured $68.7M in funding.
Drawbacks of analog AI chips
Analog chips are not perfect. Designers of these chips must contend with the same challenges as digital chip designers while also addressing the unique difficulties of the analog world.
Analog AI chips are well-suited for inference but not for training AI models. The parameters of a neural network are adjusted using the backpropagation algorithm, and that algorithm requires the precision of a digital computer. In practice, the two work together: the digital computer provides the data, while the analog chip handles the calculations and manages the conversion between digital and analog signals.
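A minimal sketch of that division of labour, assuming a toy linear model: the weights are trained with exact digital arithmetic, and the deployed forward pass is emulated as a noisy analog multiply. The task, the learning rate and the 2% noise figure are all arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 8))   # toy dataset
y = X @ rng.standard_normal(8)      # targets from a hidden "true" weight vector

# Digital phase: training (here, plain gradient descent on squared error)
# runs in full floating-point precision.
w = np.zeros(8)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(X)
    w -= 0.05 * grad

# Analog phase: the trained weights are "programmed" into a chip, so the
# deployed forward pass sees slightly perturbed weights.
def analog_forward(x, w):
    w_on_chip = w * (1 + 0.02 * rng.standard_normal(w.shape))
    return x @ w_on_chip

print("digital output:", X[0] @ w)
print("analog output: ", analog_forward(X[0], w))   # close, but not identical
```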
In the examples we have explored so far, we focused on multiplying input values with a single matrix. However, modern neural networks are deep, consisting of multiple layers represented by different matrices. Implementing deep neural networks in analog chips poses a significant engineering challenge. One approach is to connect multiple chips to represent different layers, requiring efficient analog-to-digital conversion and some level of parallel digital computation between the chips.
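A rough sketch of that kind of pipeline: two crossbar "chips" connected by an ADC stage, modelled here as a crude 8-bit uniform quantizer, with the activation function applied digitally in between. Layer sizes and the 8-bit resolution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 32))   # layer 1 weights, stored on "chip 1"
W2 = rng.standard_normal((8, 16))    # layer 2 weights, stored on "chip 2"
x = rng.standard_normal(32)          # input vector

def adc(v, bits=8):
    """Digitize an analog readout with a uniform quantizer of `bits` resolution."""
    scale = float(np.max(np.abs(v))) or 1.0   # full-scale range of the converter
    levels = 2 ** (bits - 1) - 1
    return np.round(v / scale * levels) / levels * scale

h = np.maximum(adc(W1 @ x), 0.0)   # chip 1: analog MVM, ADC, then digital ReLU
out = adc(W2 @ h)                  # chip 2: analog MVM, then ADC
print(out)
```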
And then there is the need for new software libraries and tools to fully utilise these chips. Currently, companies offering analog AI chips provide their own extensions to popular machine learning frameworks like TensorFlow or PyTorch, or ship their own software development kits (SDKs). With plain TensorFlow or PyTorch, you can stay hardware-agnostic, but when you target one of these analog AI accelerators, you are tying your application to that specific chip.
Analog AI chips and accelerators present an exciting path to faster and more efficient AI computation. They hold the potential to enable powerful machine learning models on smaller edge devices while improving data centre efficiency for inference. However, engineering challenges still need to be addressed before these chips see widespread adoption. If they are, we could see a future where a model with the size and capabilities of GPT-3 fits on a single small chip.
H+ Weekly sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.
A big thank you to my paid subscribers and to my Patreons: whmr, Florian, dux, Eric and Andrew. Thank you for the support!