Continuing improvements in semiconductor density are enabling new classes of System-on-a-Chip (SOC) architectures that combine extensive processing logic and high-density memory. The International Technology Roadmap for Semiconductors [1] predicts that by 2009 a single high-end microprocessor die will contain approximately 84 million logic transistors. Dynamic Random Access Memory (DRAM) density is increasing at an even faster pace, with 2Gbyte DRAM chips expected in the same timeframe. The increased memory and logic density makes it possible to handle large-scale problems on-chip, but new architectures must be developed that can effectively exploit these on-chip resources.
Computing in Memory Architectures (CIMA) offer an SOC strategy that can effectively accelerate many high-performance computing and real-time image/signal/data processing problems.
Our research group has studied CIMA [2] and has identified several key applications that can benefit from CIMA acceleration.
We are developing a CIMA hardware prototype that uses an FPGA and DRAM to create a reconfigurable "Smart" memory module (a SmartDIMM) that is inserted into the high-bandwidth PC-100 DIMM memory module slot of a conventional PC.
Since our design is mapped directly onto the frontside memory bus of a standard PC, we gain several advantages over existing FPGA-based accelerator cards.
The most important of these are reduced latency and increased bandwidth to and from the CPU.
Our SmartDIMM
design provides higher available bandwidth to Main Memory (800MB/sec)
than any existing FPGA-based accelerator, can act as traditional memory when not
in use as Smart Memory, and offers extremely large memory resources (up to
16MByte per chip).
By adopting the PC-100 bus standard, the SmartDIMM design can target the
traditional desktop PC market, as well as the interface-compatible SODIMM
standard for portable/miniaturized applications.
Furthermore, we can support the next-generation DDR (Double Data Rate)
SDRAM standard.
The extensive logic resources of a Virtex FPGA can accommodate large-scale application solutions; with partial reconfiguration of the FPGA, the effective problem solution space can be expanded further.
The proposed SmartDIMM design integrates a large amount of main memory (64MB) and a large FPGA onto a DIMM form-factor PCB that is designed as a completely backwards-compatible replacement for a standard desktop PC-100 memory module.
In order to provide for simultaneous CPU and FPGA access to memory, we
propose a dual-bank memory arrangement where each device has direct access to
one bank at a time (and the other device is correspondingly locked out during
this time).
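The dual-bank arrangement can be illustrated with a small software model. This is only a sketch under stated assumptions, not the SmartDIMM hardware logic: the class `DualBankMemory`, the `Owner` names, the initial bank assignment, and the `swap` operation are all hypothetical. It shows the one property described above, namely that each bank is owned by exactly one device at a time and the other device is locked out until ownership changes.

```python
from enum import Enum


class Owner(Enum):
    CPU = "cpu"
    FPGA = "fpga"


class DualBankMemory:
    """Toy model: two banks, each owned by exactly one device at a time."""

    def __init__(self, bank_size: int) -> None:
        self.banks = [bytearray(bank_size), bytearray(bank_size)]
        # Hypothetical initial assignment: CPU owns bank 0, FPGA owns bank 1.
        self.owner = [Owner.CPU, Owner.FPGA]

    def swap(self) -> None:
        """Exchange ownership of the two banks in a single step."""
        self.owner.reverse()

    def write(self, who: Owner, bank: int, addr: int, data: bytes) -> None:
        self._check(who, bank, addr, len(data))
        self.banks[bank][addr:addr + len(data)] = data

    def read(self, who: Owner, bank: int, addr: int, n: int) -> bytes:
        self._check(who, bank, addr, n)
        return bytes(self.banks[bank][addr:addr + n])

    def _check(self, who: Owner, bank: int, addr: int, n: int) -> None:
        # The non-owning device is locked out of the bank entirely.
        if self.owner[bank] is not who:
            raise PermissionError(f"{who.name} is locked out of bank {bank}")
        if addr < 0 or addr + n > len(self.banks[bank]):
            raise IndexError("access outside bank")
```

In this model the CPU would fill one bank, call `swap()`, and then work on the other bank while the FPGA processes the first. There is no cycle-by-cycle arbitration; ownership only changes at the swap.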
In spite of tight design timing limitations, this approach achieves the highest-bandwidth access to its local block of main memory (800MB/sec), for the largest potential performance increase. The largest drawback to this design is the lack of any facility for signaling the host: all host accesses to the memory card must be facilitated in real time with no wait states, and any signals to the CPU can only be accomplished by setting a semaphore and waiting for the CPU to poll.
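Because the module cannot signal the host, completion has to be discovered by polling. A minimal host-side sketch of that pattern follows. The flag value `SEMAPHORE_DONE`, the `read_flag` callback, and the timeout and interval parameters are assumptions made for illustration; they are not taken from the SmartDIMM control register definition.

```python
import time
from typing import Callable

SEMAPHORE_DONE = 0x01  # hypothetical "result ready" value written by the FPGA


def poll_semaphore(read_flag: Callable[[], int],
                   timeout_s: float = 1.0,
                   interval_s: float = 0.001) -> bool:
    """Poll until read_flag() returns SEMAPHORE_DONE or the timeout expires.

    read_flag stands in for an ordinary memory read of the semaphore
    location; the module itself never interrupts the host.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_flag() == SEMAPHORE_DONE:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The polling interval trades CPU time against notification latency: a shorter interval detects completion sooner but spends more host cycles reading the semaphore.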
Take this link to see several application examples that we are using as the baseline "target" for the SmartDIMM design. These applications are image processing, Java acceleration, and an advanced network interface.
Take this link for a bandwidth comparison between the SmartDIMM design, and other traditional FPGA/Memory accelerator architectures which use the PCI Bus.
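As a rough guide to where the 800MB/sec figure comes from, peak bus bandwidth is the bus width in bytes multiplied by the clock rate, assuming one transfer per clock. The sketch below applies that formula to a 64-bit PC-100 memory bus and, for comparison, to a 32-bit PCI bus at a nominal 33MHz. The choice of PCI variant is an assumption; the linked comparison may use different figures, and sustained bandwidth is lower than these peak values.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in MB/s, assuming one transfer per clock cycle."""
    return bus_width_bits / 8 * clock_mhz


pc100 = peak_bandwidth_mb_s(64, 100.0)  # 64-bit DIMM data bus at 100MHz
pci = peak_bandwidth_mb_s(32, 33.0)     # assumed 32-bit PCI at nominal 33MHz

print(f"PC-100: {pc100:.0f} MB/s")
print(f"PCI:    {pci:.0f} MB/s")
print(f"ratio:  {pc100 / pci:.1f}x")
```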
SmartDIMM Key Features:
+ Highest bandwidth to Main Memory (800MB/sec)
+ Acts as traditional memory when not in use as Smart Memory
+ Largest memory size (16MByte per chip)
- Access to memory by only one of host CPU or FPGA at any given time: no cycle-by-cycle arbitration
- No FPGA-driven communication: must wait for CPU to poll
- Tight timing/signal propagation constraints

Simplified SmartDIMM block diagram
In order to be completely PC-100 (or PC-66) compliant, the SmartDIMM design must observe many timing and operational restrictions.
We have already analyzed the critical timing paths, and have taken PC
memory bus timing measurements at 100MHz and 66MHz on two separate BX-Chipset
motherboards. Here is a numerical
analysis of the "write burst timing" at 100MHz (PC-100) speeds.
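The general shape of such a timing analysis can be sketched as a slack calculation: the clock period must cover the driver's clock-to-output delay, the signal propagation delay, and the receiver's setup time. The function names and all delay values below are placeholders chosen for illustration; they are not the measurements taken on the BX-chipset motherboards.

```python
def clock_period_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def setup_slack_ns(clock_mhz: float,
                   clk_to_out_ns: float,
                   propagation_ns: float,
                   setup_ns: float) -> float:
    """Remaining margin in one clock cycle; negative means a timing violation."""
    used = clk_to_out_ns + propagation_ns + setup_ns
    return clock_period_ns(clock_mhz) - used


# Placeholder delays (ns), NOT measured values.
for mhz in (100.0, 66.0):
    slack = setup_slack_ns(mhz, clk_to_out_ns=3.0, propagation_ns=2.0, setup_ns=2.0)
    print(f"{mhz:.0f}MHz: period {clock_period_ns(mhz):.2f} ns, slack {slack:.2f} ns")
```

With the same delays, the 66MHz bus leaves more margin than the 100MHz bus because its clock period is longer, which is why the 100MHz case is the critical one.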
This link will take you to the current working version of our SmartDIMM Control Register Address Definition (updated 4/20/00).
Here is a link to the previous version of this specification (which defined both active & passive modes). This older version includes CPLD Interface requirements and details of the most recent set of changes and enhancements to the design and control register set.
FACULTY (web page and CV links):
STUDENTS:
Computing In Memory Architectures, Rapid Prototyping, Hardware Acceleration, and other related references by the SmartDIMM team are listed here (with hyperlinks to full text online where available).
PDG Proposal References (are found on this link)
SmartDIMM-related LINKS:
Quote of Interest:
This page was last meddled with: 10/19/04