E-book, English, 244 pages
Gokhale / Graham: Reconfigurable Computing
1st edition, 2006
ISBN: 978-0-387-26106-5
Publisher: Springer US
Format: PDF
Copy protection: PDF watermark
Accelerating Computation with Field-Programmable Gate Arrays
A one-of-a-kind survey of the field of reconfigurable computing:
- Gives a comprehensive introduction to a discipline that offers 10x-100x acceleration of algorithms over microprocessors
- Discusses the impact of reconfigurable hardware on a wide range of applications: signal and image processing, network security, bioinformatics, and supercomputing
- Includes the history of the field as well as recent advances
- Includes an extensive bibliography of primary sources
Authors/Editors
Further information & material
Contents
Acknowledgments
1 An Introduction to Reconfigurable Computing
  1.1 What is RC?
  1.2 RC Architectures
  1.3 How did RC originate?
  1.4 Inside the FPGA
  1.5 Mapping Algorithms to Hardware
  1.6 RC Applications
  1.7 Example: Dot Product
  1.8 Further Reading
2 Reconfigurable Logic Devices
  2.1 Field-Programmable Gate Arrays
    2.1.1 Basic Architecture
      Programmable Logic
      Routing
      Programmable I/O Architectures
    2.1.2 Specialized Function Blocks
      Embedded Memory
      Embedded Arithmetic Logic
      High-Speed Serial I/O
      Embedded Microprocessors
    2.1.3 Programming Architecture
  2.2 Coarse-Grained Reconfigurable Arrays
    2.2.1 Raw
    2.2.2 PipeRench
    2.2.3 RaPiD
    2.2.4 PACT XPP
    2.2.5 MathStar
  2.3 Summary
3 Reconfigurable Computing Systems
  3.1 Parallel Processing on Reconfigurable Computers
    3.1.1 Instruction Level Parallelism
    3.1.2 Task Level Parallelism
  3.2 A Survey of Reconfigurable Computing Systems
    3.2.1 I/O Bus Accelerator
    3.2.2 Massively Parallel FPGA Array
    3.2.3 Reconfigurable Supercomputer
    3.2.4 Reconfigurable Logic Co-processor
  3.3 Summary
4 Languages and Compilation
  4.1 Design Cycle
  4.2 Languages
    4.2.1 Algorithmic RC Languages
      Algorithmic Language Example
    4.2.2 Hardware Description Languages (HDL)
      A VHDL Example
  4.3 High Level Compilation
    4.3.1 Compiler Phases
    4.3.2 Analysis and Optimizations
    4.3.3 Scheduling
  4.4 Low Level Design Flow
    4.4.1 Logic Synthesis
    4.4.2 Technology Mapping
    4.4.3 Logic Placement
    4.4.4 Signal Routing
    4.4.5 Configuration Bitstreams
  4.5 Debugging Reconfigurable Computing Applications
    4.5.1 Basic Needs for Debugging
      Observability
      Controllability
      Execution Control
      Debugging Data Bandwidth
      System Execution Speed
      Instrumentation Costs
      Ease of Use
    4.5.2 Debugging Facilities
    4.5.3 Challenges for RC Application Debugging
  4.6 Summary
5 Digital Signal Processing Applications
  5.1 What is Digital Signal Processing?
  5.2 Why Use Reconfigurable Computing for DSP?
    5.2.1 Reconfigurable Computing's Suitability for DSP
    5.2.2 Comparing DSP Implementation Technologies
  5.3 DSP Application Building Blocks
    5.3.1 Basic Operations and Elements
    5.3.2 Filtering
    5.3.3 Transforms
  5.4 Example DSP Applications
    5.4.1 Beamforming
    5.4.2 Software Radio
  5.5 Summary
6 Image Processing
  6.1 RC for Image and Video Processing
  6.2 Local Neighborhood Functions
    6.2.1 Cellular Arrays for Pixel Parallelism
    6.2.2 Image Pipelines for Instruction-Level Parallelism
  6.3 Convolution
  6.4 Morphology
  6.5 Feature Extraction
  6.6 Automatic Target Recognition
  6.7 Image Matching
  6.8 Evolutionary Image Processing
  6.9 Summary
7 Network Security
  7.1 Cryptographic Applications
    7.1.1 Cryptography Basics
      Symmetric Algorithms
      Block Symmetric Algorithms
      Stream Ciphers
      Asymmetric Algorithms
    7.1.2 RC Cryptographic Algorithm Implementations
  7.2 Network Protocol Security
    7.2.1 RC Network Interface
    7.2.2 Security Protocols
    7.2.3 Network Defense
  7.3 Summary
8 Bioinformatics Applications
  8.1 Introduction
  8.2 Applications
    8.2.1 Genome Assembly
    8.2.2 Content-Based Search
    8.2.3 Genome Comparison
    8.2.4 Molecular Phylogeny
    8.2.5 Pattern Matching
    8.2.6 Protein Domain Databases
  8.3 Dynamic Programming Algorithms
    8.3.1 Alignments
    8.3.2 Dynamic Programming Equations
    8.3.3 Gap Functions
    8.3.4 Systolic DP Computation
    8.3.5 Backtracking
    8.3.6 Modulo Encoding
    8.3.7 FPGA Implementations
  8.4 Seed-Based Heuristics
    8.4.1 Filtering, Heuristics, and Quality Values
    8.4.2 BLAST: a 3-Stages Heuristic
    8.4.3 Seed Indexing
    8.4.4 FPGA Implementations
  8.5 Profiles, HMMs and Language Models
    8.5.1 Position-Dependent Profiles
    8.5.2 Hidden Markov Models
    8.5.3 Language Models
  8.6 Bioinformatics FPGA Accelerators
    8.6.1 Splash
    8.6.2 Perle
    8.6.3 GenStorm
    8.6.4 RDisk
    8.6.5 BioXL/H
    8.6.6 DeCypher
  8.7 Summary
9 Supercomputing Applications
  9.1 Introduction
  9.2 Monte Carlo Simulation of Radiative Heat Transfer
    9.2.1 Algorithm Description
    9.2.2 Hardware Implementation
    9.2.3 Performance
  9.3 Urban Road Traffic Simulation
    9.3.1 CA Traffic Modeling
    9.3.2 Intersections and Global Behavior
    9.3.3 Constructive Approach
    9.3.4 Streaming Approach
  9.4 Summary
References
Index
3 Reconfigurable Computing Systems (p. 37-38)
In this chapter, we will discuss general-purpose computing systems that incorporate FPGAs into the system architecture. While modern FPGAs include processors, memory blocks, and built-in I/O interfaces on-chip, reconfigurable systems, even those with a single FPGA or tiled processor array, also contain off-chip memory and I/O resources. Since reconfigurable computing is concerned with parallel operations at every level of granularity, we will motivate the roles that FPGAs can play by first discussing parallel processing models and how they might use reconfigurable logic. We will then survey the field of reconfigurable processing systems.
3.1 Parallel Processing on Reconfigurable Computers
Reconfigurable computing systems derive high performance by exploiting parallelism at multiple levels of granularity, from instruction-level through task-level parallelism. In this section we introduce the levels of parallelism and discuss the use of reconfigurable hardware at various granularities of parallelization.
3.1.1 Instruction Level Parallelism
The lowest level of granularity we consider is instruction-level parallelism. In conventional microprocessors, instruction-level parallelism is exploited in the micro-architecture of a superscalar processor. By keeping multiple instructions in flight in different stages of completion, the superscalar processor is able to complete more than one instruction per clock cycle.
Very Long Instruction Word (VLIW) processors offer another method for fine-grained parallel operation. A VLIW processor contains multiple function units operating in parallel. In Figure 3.1, the instruction word contains fields for two integer operations, two floating point operations, two memory operations, and a branch. To compile for a superscalar processor, the compiler simply generates a sequential instruction stream, and the processor parallelizes the instruction stream at run time. In contrast, the VLIW processor executes the instruction word generated by the compiler, requiring the compiler to schedule concurrent operations at compile time.
Co-processor parallelism is achieved within a single instruction stream: a customized parallel instruction is performed by the co-processor. Examples of co-processors include MMX/SSE units and vector units. Instructions for the co-processor are integrated into the processor's instruction set. The co-processor shares register files and other internal state with the other arithmetic units, such as the floating-point units, as shown in Figure 3.2.