Digital signal processor

Application introduction

Basic introduction

A digital signal processor (DSP) is a device built from large-scale or very-large-scale integrated circuit chips to carry out specific signal processing tasks. It was developed to meet the demands of high-speed, real-time signal processing. As integrated circuit technology and digital signal processing algorithms have advanced, the ways of implementing digital signal processors have kept changing, and their processing functions have continued to improve and expand.


Digital signal processors are not limited to audio and video. They are widely used in communication and information systems, signal and information processing, automatic control, radar, military, aerospace, medical, household appliance, and many other fields. In the past, general-purpose microprocessors were used for most digital signal processing operations, but they were slow and could hardly meet practical needs. Combining bit-slice microprocessors with fast parallel multipliers was once an effective way to implement digital signal processing, but this approach required many devices, involved complicated logic design and programming, consumed a lot of power, and was expensive. The emergence of digital signal processors solved these problems. A DSP can quickly perform signal acquisition, transformation, filtering, estimation, enhancement, compression, and identification to produce a signal in the form people need.

In a car head unit, the DSP mainly provides specific sound fields or effects, such as theater or jazz modes, and some units can also receive high-definition (HD) radio and satellite radio to deliver the best possible audio-visual experience. The DSP enhances the performance and usability of the head unit, improves audio and video quality, and offers more flexibility and a faster design cycle. As the technology develops, more audio and visual effects are likely to become available, and the head unit will become the information and entertainment center of the car.


Digital signal processors can be divided into programmable and non-programmable types. A non-programmable signal processor uses the flow of the signal processing algorithm as its basic logic structure. It has no control program and can generally perform only one main processing function, so it is also called a dedicated signal processor. Examples include fast Fourier transform processors and digital filters. Although this type of processor has limited functions, it offers higher processing speed. A programmable signal processor can change the function it performs through programming and is more versatile, so it is also called a general-purpose signal processor. As the performance-to-price ratio of general-purpose signal processors keeps improving, they are becoming increasingly common in signal processing.

The programmable signal processors developed so far fall roughly into the following types:

Bit-slice signal processor. Its core is a microprocessor slice with a basic width of 2, 4, or 8 bits, combined with a program control chip, interrupt and DMA control chips, a clock chip, and other components. Using microprogram control and a grouped instruction format, a system with the required word length can be built as needed. Its advantages are fast processing speed and high efficiency. Its disadvantages are high power consumption and a large number of chips.

Single-chip signal processor. It integrates an arithmetic unit, a multiplier, memory, read-only memory (ROM), input and output interfaces, and sometimes even analog-to-digital and digital-to-analog converters on a single chip. It offers fast computation, high precision, low power consumption, and strong versatility. Compared with general-purpose microprocessors, its instruction set and addressing modes are better suited to the operations and data structures common in signal processing.

Very-large-scale integration (VLSI) array processor. This is a signal processor in which a large number of processing units perform the same operation on different data under the control of a single instruction sequence to achieve high-speed computation. It is well suited to signal processing tasks with large amounts of data, heavy computation, and highly repetitive operations. Array processors are often used together with general-purpose computers to form a powerful signal processing system. There are roughly two types of existing array processors: systolic array processors and wavefront array processors. The former uses a single synchronous clock and control mechanism for the entire array, and has the advantages of simple structure, good modularity, and easy expansion. The latter uses independent timing in each unit and a data-driven mechanism, which makes programming and fault-tolerant design more convenient and also improves processing speed.

Development direction

Digital signal processors have developed from the dedicated signal processors of the 1970s into VLSI array processors, and their applications have expanded from the processing of low-frequency signals such as voice and sonar to the processing of large volumes of data such as radar and image signals. Thanks to floating-point arithmetic and parallel processing technology, the processing capability of signal processors has improved greatly. Digital signal processors will continue to develop in two directions: higher processing speed and higher computational precision. In terms of architecture, dataflow structures and artificial neural network structures are likely to become the basic structural models of next-generation digital signal processors.


Algorithm format

There are many DSP algorithms. Most DSP processors use fixed-point arithmetic, in which numbers are represented as integers or as fractions between -1.0 and +1.0. Some processors use floating-point arithmetic, in which data is represented as a mantissa plus an exponent: mantissa × 2^exponent.
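The fractional fixed-point representation described above can be sketched as follows. This is a minimal illustration that assumes the common 16-bit "Q15" convention (15 fractional bits); the function names are hypothetical and do not come from any particular DSP toolchain.

```python
Q15_SCALE = 1 << 15  # 32768: one unit in Q15 corresponds to 2**-15


def saturate16(n: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(-Q15_SCALE, min(Q15_SCALE - 1, n))


def float_to_q15(x: float) -> int:
    """Quantize a value in [-1.0, +1.0) to a 16-bit Q15 integer."""
    return saturate16(int(round(x * Q15_SCALE)))


def q15_to_float(n: int) -> float:
    """Convert a Q15 integer back to a fraction."""
    return n / Q15_SCALE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values.

    The raw product has 30 fractional bits, so it is shifted right
    by 15 to return to Q15, then saturated.
    """
    return saturate16((a * b) >> 15)


half = float_to_q15(0.5)
print(half, q15_to_float(q15_mul(half, half)))
```

Note that +1.0 itself is not representable in this format, so the sketch saturates it to the largest positive value.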

Floating-point arithmetic is more complex than fixed-point arithmetic, but floating-point data provides a large dynamic range, which can be expressed as the ratio of the largest to the smallest representable number. In floating-point DSP applications, design engineers largely do not need to worry about issues such as dynamic range and precision. Floating-point DSPs are easier to program than fixed-point DSPs, but their cost and power consumption are higher.
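As a rough illustration of dynamic range expressed as a ratio of largest to smallest number, the sketch below converts that ratio to decibels. The helper names and the choice of decibels as the unit are assumptions made for this example.

```python
import math


def dynamic_range_db(largest: float, smallest: float) -> float:
    """Dynamic range as the ratio largest/smallest, expressed in dB."""
    return 20.0 * math.log10(largest / smallest)


def fixed_point_dynamic_range_db(bits: int) -> float:
    """Dynamic range of a signed fixed-point word.

    Assumes the largest magnitude is 2**(bits - 1) and the smallest
    nonzero magnitude is 1 (one least significant bit).
    """
    return dynamic_range_db(2 ** (bits - 1), 1)


for width in (16, 20, 24):
    print(width, round(fixed_point_dynamic_range_db(width), 1))
```

Each additional bit doubles the ratio, which adds about 6 dB. A floating-point format gains range from its exponent field instead, which is why its dynamic range is so much larger for the same word width.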

Because of cost and power consumption, fixed-point DSPs are generally used in high-volume products. Programmers and algorithm designers use analysis or simulation to determine the required dynamic range and precision. If ease of development is a priority and the application needs a very wide dynamic range and high precision, a floating-point DSP is worth considering.

Floating-point calculations can also be implemented in software on a fixed-point DSP, but such routines take up a lot of processor time, so they are rarely used. An effective alternative is "block floating point", in which a group of data with the same exponent but different mantissas is processed as a block. Block floating-point processing is usually implemented in software.
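A minimal sketch of the block floating-point idea follows: one exponent is chosen from the largest magnitude in the block, and every value is stored as a 16-bit mantissa relative to that shared exponent. The function names and the 16-bit mantissa width are assumptions made for illustration.

```python
import math


def block_float_encode(block, mantissa_bits=16):
    """Return (mantissas, exponent) with block[i] ~= mantissas[i] * 2**exponent."""
    peak = max(abs(x) for x in block)
    if peak == 0.0:
        return [0] * len(block), 0
    _, e = math.frexp(peak)              # peak = m * 2**e with 0.5 <= m < 1
    exponent = e - (mantissa_bits - 1)   # scale the peak to just below 2**15
    limit = (1 << (mantissa_bits - 1)) - 1
    mantissas = [
        max(-limit - 1, min(limit, round(math.ldexp(x, -exponent))))
        for x in block
    ]
    return mantissas, exponent


def block_float_decode(mantissas, exponent):
    """Rebuild approximate values from the mantissas and shared exponent."""
    return [math.ldexp(m, exponent) for m in mantissas]


mantissas, exponent = block_float_encode([0.5, -0.25, 0.125])
print(mantissas, exponent, block_float_decode(mantissas, exponent))
```

Because the exponent is shared, small values in a block that also contains a large value lose precision. That is the trade-off against per-sample floating point.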

Data width

The word width of all floating-point DSPs is 32 bits, while the word width of fixed-point DSPs is generally 16 bits. There are also 24-bit and 20-bit DSPs, such as Motorola's DSP563XX series and Zoran's ZR3800X series. Because word width is closely related to the physical size of the DSP, the number of pins, and the amount of memory required, it directly affects the cost of the device. The wider the word, the larger the package, the more pins, the greater the memory requirement, and the higher the cost. Provided the design requirements are met, choose a DSP with as small a word width as possible to reduce cost.

When choosing between fixed-point and floating-point, weigh word width against development complexity. For example, by combining instructions, a 16-bit DSP can also implement 32-bit double-precision arithmetic, although double-precision arithmetic is much slower than single-precision arithmetic. If single precision meets most of the computational requirements and only a small amount of code needs double precision, this approach is feasible. If most of the computation requires high precision, choose a processor with a larger word width.
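The idea of building 32-bit arithmetic out of 16-bit operations can be sketched with a two-step addition that propagates a carry from the low word to the high word. This is an illustrative model in Python, not the instruction sequence of any specific DSP; the function names are hypothetical.

```python
MASK16 = 0xFFFF


def split32(value: int):
    """Split an unsigned 32-bit value into (high, low) 16-bit words."""
    return (value >> 16) & MASK16, value & MASK16


def add32_with_16bit_words(a_hi, a_lo, b_hi, b_lo):
    """Add two 32-bit values using only 16-bit word operations.

    Step 1 adds the low words and records the carry.
    Step 2 adds the high words plus that carry (modulo 2**16).
    """
    low_sum = a_lo + b_lo
    carry = low_sum >> 16
    result_lo = low_sum & MASK16
    result_hi = (a_hi + b_hi + carry) & MASK16
    return result_hi, result_lo


hi, lo = add32_with_16bit_words(*split32(0x0001FFFF), *split32(0x00000001))
print(hex((hi << 16) | lo))
```

Even this simple addition takes two word-sized steps instead of one, which is one reason double-precision code runs slower on a 16-bit device.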

Note that in most DSP devices the instruction word and the data word have the same width, but there are exceptions. For example, the ADSP-21XX series from ADI (Analog Devices) has a 16-bit data word and a 24-bit instruction word.

Processing speed

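Whether a processor meets the design requirements depends mainly on whether it meets the speed requirements. There are many ways to measure processor speed. The most basic is to measure the instruction cycle, that is, the time the processor needs to execute its fastest instruction. Taking the reciprocal of the instruction cycle, dividing by one million, and multiplying by the number of instructions executed per cycle gives the peak speed of the processor in millions of instructions per second (MIPS).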
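Peak MIPS is conventionally derived from the instruction cycle time: take its reciprocal, divide by one million, and multiply by the number of instructions issued per cycle. The sketch below works through that arithmetic. The function name and the example cycle times are hypothetical.

```python
def peak_mips(instruction_cycle_ns: float, instructions_per_cycle: int = 1) -> float:
    """Peak speed in millions of instructions per second.

    instruction_cycle_ns: time to execute the fastest instruction, in ns.
    instructions_per_cycle: instructions issued per cycle (more than 1
    for architectures that issue several instructions at once).
    """
    cycles_per_second = 1e9 / instruction_cycle_ns
    return cycles_per_second / 1e6 * instructions_per_cycle


print(peak_mips(20.0))     # single-issue device with a 20 ns cycle
print(peak_mips(5.0, 8))   # hypothetical device issuing 8 instructions per cycle
```

The figure is a peak rate only. It says nothing about how much work each instruction does.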

However, instruction execution time does not indicate the true performance of a processor. Different processors accomplish different amounts of work in a single instruction, so simply comparing instruction execution times cannot fairly distinguish their performance. Some newer DSPs use a very long instruction word (VLIW) architecture, in which multiple instructions execute in a single cycle but each instruction does less work than an instruction on a traditional DSP. Comparing MIPS figures between VLIW and conventional DSP devices can therefore be misleading.

Even comparing MIPS among traditional DSPs is somewhat one-sided. For example, some processors can shift by several bits in a single instruction, while others can shift by only one bit per instruction. Some DSPs can move data in parallel with an unrelated ALU instruction (loading operands while an instruction executes), while others support only parallel data moves related to the ALU instruction being executed. Some newer DSPs allow two MACs to be specified in a single instruction. Therefore, processor performance cannot be determined accurately by comparing MIPS alone.

One way to address these problems is to use a basic operation, rather than an instruction, as the standard for comparing processor performance. The multiply-accumulate (MAC) operation is commonly used, but MAC time alone does not provide enough information to compare DSP performance. On most DSPs a MAC executes in a single instruction cycle, so the MAC time equals the instruction cycle time, and, as mentioned above, some DSPs do more work in a single MAC cycle than others. MAC time also does not reflect the performance of operations such as loops, which are used in all applications.

The most common method is to define a set of standard routines and compare their execution speed on different DSPs. Such a routine may be the "kernel" function of an algorithm, such as an FIR or IIR filter, or it may be all or part of an application, such as a speech encoder. Figure 1 shows the performance of several DSP devices tested using BDTI's tools.
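A kernel routine of the kind used for such comparisons can be sketched as a direct-form FIR filter, where each output sample costs one MAC per filter tap. The code below is an illustrative reference implementation with a simple timing helper. It is not a BDTI benchmark, and the names are hypothetical.

```python
import time


def fir_filter(samples, coefficients):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coefficients):
            if n - k >= 0:
                acc += h * samples[n - k]   # one multiply-accumulate
        output.append(acc)
    return output


def time_routine(routine, *args, repeats=100):
    """Average wall-clock seconds per call over a number of repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        routine(*args)
    return (time.perf_counter() - start) / repeats


print(fir_filter([1.0, 2.0, 3.0], [0.5, 0.5]))
print(time_routine(fir_filter, [0.0] * 256, [0.1] * 16) > 0.0)
```

Running the same routine on each candidate platform gives a comparison that reflects loops and memory access as well as raw MAC speed.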

When comparing the speed of DSP processors, be careful with advertised MOPS (million operations per second) and MFLOPS (million floating-point operations per second) figures, because manufacturers define "operation" differently, so the figures mean different things. For example, some processors can perform a floating-point multiplication and a floating-point addition at the same time, and are therefore advertised with an MFLOPS rating twice their MIPS rating.

Also, when comparing processor clock rates, note that a DSP's input clock may equal its instruction rate or may be two to four times the instruction rate, depending on the processor. In addition, many DSPs have clock multipliers or phase-locked loops that generate a high-frequency on-chip clock from a low-frequency external clock.

Practical application

Voice processing: voice coding, voice synthesis, voice recognition, voice enhancement, voice mail, voice storage, etc.

Image/graphics: two-dimensional and three-dimensional graphics processing, image compression and transmission, image recognition, animation, robot vision, multimedia, electronic maps, image enhancement, etc.

Military: secure communication, radar processing, sonar processing, navigation, global positioning, frequency-hopping radio, search and anti-search, etc.

Instruments and meters: spectrum analysis, function generation, data acquisition, seismic processing, etc.

Automatic control: control, deep space operations, automatic driving, robot control, disk control, etc.

Medical: hearing aids, ultrasound equipment, diagnostic tools, patient monitoring, electrocardiogram, etc.

Household appliances: digital audio, digital TV, video phone, music synthesis, tone control, toys and games, etc.

Examples of biomedical signal processing:

CT: computerized X-ray tomography device. (Hounsfield of the British company EMI, who invented the head CT scanner, won the Nobel Prize.)

CAT: computerized X-ray spatial reconstruction device. It has made possible full-body scans, three-dimensional images of heart activity, imaging of brain tumors and foreign bodies, and image reconstruction of the human torso.

ECG analysis.

Storage management

The performance of a DSP is affected by how well it manages the memory subsystem. As mentioned earlier, the MAC and some other signal processing functions are the basic signal processing capabilities of DSP devices. Fast MAC execution requires reading one instruction word and two data words from memory in each instruction cycle. There are several ways to achieve this, including multi-ported memory (allowing multiple memory accesses in each instruction cycle), separate instruction and data memories (the "Harvard" structure and its derivatives), and an instruction cache (allowing instructions to be read from the cache instead of memory, which frees the memory for data reads). Figures 2 and 3 show the difference between the Harvard memory structure and the von Neumann structure used by many microcontrollers.

Also pay attention to the size of the supported memory space. The main target market for many fixed-point DSPs is embedded systems, in which memory is generally small, so these devices have small to medium on-chip memory (about 4K to 64K words) and a narrow external data bus. In addition, the address bus of most fixed-point DSPs is 16 bits or less, so the external memory space is limited.

Some floating-point DSPs have little or no on-chip memory but a wide external data bus. For example, TI's TMS320C30 has only 6K words of on-chip memory, with one 24-bit external address bus and a second, 13-bit external address bus. ADI's ADSP-21060, by contrast, has 4 Mb of on-chip memory, which can be divided between program memory and data memory in many ways.

When choosing a DSP, you need to choose according to the size of the storage space of the specific application and the requirements for the external bus.

Type characteristics

DSP processors differ greatly from general-purpose processors (GPPs) such as the Intel Pentium or PowerPC. The structure and instructions of DSPs are designed specifically for signal processing, and they have the following characteristics.

Hardware multiplication and accumulation operations (MACs)

To perform multiply-accumulate operations efficiently, as required in signal filtering for example, the processor must be able to multiply efficiently. GPPs were not originally designed for multiplication-heavy workloads. The first major technical improvement that distinguished DSPs from earlier GPPs was the addition of specialized hardware capable of single-cycle multiplication, together with explicit MAC instructions.

Harvard structure

Traditional GPPs use the von Neumann memory structure. In this structure, a single memory space is connected to the processor core through two buses (an address bus and a data bus). This structure cannot meet the requirement that a MAC access memory four times in one instruction cycle. DSPs generally use the Harvard structure, in which there are two memory spaces: program memory and data memory. The processor core is connected to these memory spaces through two sets of buses, allowing two simultaneous memory accesses. This arrangement doubles the memory bandwidth of the processor. Some Harvard designs add a second data memory space and bus to achieve even greater memory bandwidth. Modern high-performance GPPs usually have two on-chip caches, one for data and one for instructions. In theory, this dual on-chip cache and bus arrangement is equivalent to the Harvard structure, but GPPs use control logic to determine which data and instruction words reside in the on-chip cache, a process that is usually invisible to the programmer. In DSPs, by contrast, the programmer can explicitly control which data and instructions are stored in on-chip memory or cache.
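The bandwidth argument can be made concrete with a deliberately simplified model: if each bus can carry one memory access per cycle, the number of cycles a MAC spends on memory is the number of accesses divided by the number of buses, rounded up. Both the model and the function name are assumptions made for illustration, and real devices differ in many details.

```python
def memory_cycles_per_mac(accesses_needed: int, parallel_buses: int) -> int:
    """Cycles needed to complete the memory accesses of one MAC.

    Simplified model: every bus performs exactly one access per cycle,
    and the accesses can be spread freely across the buses.
    """
    return -(-accesses_needed // parallel_buses)  # ceiling division


# One instruction fetch plus two operand reads per MAC:
print(memory_cycles_per_mac(3, 1))  # single shared memory bus
print(memory_cycles_per_mac(3, 2))  # separate program and data buses
print(memory_cycles_per_mac(3, 3))  # program bus plus two data buses
```

Under this model, only the configuration with a second data bus sustains one MAC per cycle, which matches the motivation for adding a second data memory space.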

Zero-overhead loop control

A common feature of DSP algorithms is that most of the processing time is spent executing a small number of instructions inside a relatively small loop. For this reason, most DSP processors have dedicated hardware for zero-overhead loop control. A zero-overhead loop is one in which the processor executes a set of instructions repeatedly without spending time testing the value of the loop counter. The hardware handles both the loop branch and the decrement of the loop counter. Some DSPs also implement high-speed single-instruction loops using a one-instruction cache.
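The benefit of hardware loop control can be estimated with a simple cycle-count model. The cycle costs below (two cycles of counter and branch overhead per iteration in software, one setup cycle for the hardware loop) are hypothetical values chosen for illustration, not figures for any real processor.

```python
def software_loop_cycles(iterations: int, body_cycles: int,
                         overhead_cycles: int = 2) -> int:
    """Cycles when every iteration also pays for the counter decrement
    and conditional branch (assumed cost: overhead_cycles)."""
    return iterations * (body_cycles + overhead_cycles)


def hardware_loop_cycles(iterations: int, body_cycles: int,
                         setup_cycles: int = 1) -> int:
    """Cycles when dedicated hardware updates the counter and branches,
    leaving only a one-time setup cost (assumed: setup_cycles)."""
    return setup_cycles + iterations * body_cycles


# A 64-tap inner loop whose body is a single-cycle MAC:
print(software_loop_cycles(64, 1))
print(hardware_loop_cycles(64, 1))
```

The shorter the loop body, the larger the share of time taken by loop overhead, so tight MAC loops gain the most.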

Special addressing mode

DSPs often contain dedicated address generators that produce the special addressing patterns required by signal processing algorithms, such as circular addressing and bit-reversed addressing. Circular addressing is used for the delay line of the FIR filter algorithm, and bit-reversed addressing is used for the FFT algorithm.
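The two addressing patterns can be modeled in software as follows. On a DSP the address generator produces these addresses in hardware at no extra cost. Here the wrap-around and bit reversal are computed explicitly, and the class and function names are hypothetical.

```python
class CircularBuffer:
    """Fixed-length delay line addressed modulo its length."""

    def __init__(self, length: int):
        self.data = [0.0] * length
        self.index = 0  # position of the next write

    def push(self, sample: float) -> None:
        self.data[self.index] = sample
        self.index = (self.index + 1) % len(self.data)  # wrap around

    def recent(self, k: int) -> float:
        """Sample written k pushes ago (k = 0 is the newest)."""
        return self.data[(self.index - 1 - k) % len(self.data)]


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index, as used to reorder FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


delay = CircularBuffer(3)
for x in (1.0, 2.0, 3.0, 4.0):
    delay.push(x)
print([delay.recent(k) for k in range(3)])
print([bit_reverse(i, 3) for i in range(8)])
```

The circular buffer gives an FIR filter access to the most recent samples without moving any data. The bit-reversed sequence is the order in which an 8-point radix-2 FFT reads or writes its data.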

Predictability of execution time

Most DSP applications have hard real-time requirements: in every case, all processing must be completed within a specified time. This real-time constraint requires the programmer to determine how much time each sample will take, or at least how much time will be used in the worst case. The way a DSP executes a program is transparent to the programmer, so it is easy to predict the execution time of each task. For high-performance GPPs, however, the heavy use of high-speed data and program caches and the dynamic scheduling of programs make execution time complicated and difficult to predict.
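The per-sample time budget can be worked out as simple arithmetic: the number of instruction cycles available for each sample is the instruction rate divided by the sample rate. The function names and the example rates below are assumptions made for illustration.

```python
def cycles_per_sample(instruction_rate_hz: int, sample_rate_hz: int) -> int:
    """Whole instruction cycles available between consecutive samples."""
    return instruction_rate_hz // sample_rate_hz


def meets_deadline(worst_case_cycles: int, instruction_rate_hz: int,
                   sample_rate_hz: int) -> bool:
    """True if the worst-case processing fits in one sample period."""
    return worst_case_cycles <= cycles_per_sample(instruction_rate_hz,
                                                  sample_rate_hz)


# Hypothetical 50 MIPS single-issue device processing 8 kHz audio:
print(cycles_per_sample(50_000_000, 8_000))
print(meets_deadline(6_000, 50_000_000, 8_000))
```

The check is only as good as the worst-case cycle count, which is why predictable execution time matters.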

Abundant peripherals

DSPs have peripherals such as DMA controllers, serial ports, link ports, and timers.

Type characteristics

DSP processors and general-purpose processors (GPPs) such as Intel, Pentium or Power

PC are very different, these differences produce The structure and instructions for DSPs are specifically designed and developed for signal processing, and it has the following characteristics.

·Hardware multiplication and accumulation operations (MACs)

In order to effectively complete multiplication and accumulation operations such as signal filtering, the processor must perform effective multiplication operations. GPPs were not originally designed for heavy multiplication operations. The first major technical improvement that distinguished DSPs from earlier GPPs was the addition of specialized hardware capable of single-cycle multiplication operations and explicit MAC instructions.

·Harvard Structure

Traditional GPPs use Feng. Norman memory structure. In this structure, a memory space is connected to the processor core through two buses (an address bus and a data bus). This structure cannot satisfy that the MAC must perform four operations on the memory in one instruction cycle. Requirements for the second visit. DSPs generally use the Harvard structure. In the Harvard structure, there are two storage spaces: program storage space and data storage space. The processor core is connected to these storage spaces through two sets of buses, allowing two simultaneous accesses to the memory. This arrangement doubles the bandwidth of the processor. In the Harvard structure, sometimes by adding a second data storage space and bus to achieve greater storage bandwidth. Modern high-performance GPPs usually have two on-chip cache memories, one for data and one for instructions. From a theoretical point of view, this dual on-chip cache and bus connection is equivalent to the Harvard structure, but GPPs use control logic to determine which data and instruction words reside in the on-chip cache. This process is usually not for programmers. As you can see, in DSPs, programmers can clearly control which data and instructions are stored in on-chip storage units or caches.

Zero-consumption loop control

The common feature of DSP algorithms: most of the processing time is spent executing a small number of instructions contained in a relatively small loop. Therefore, most DSP processors have dedicated hardware for zero-consumption cycle control. Zero-consumption cycle refers to a cycle in which the processor can execute a set of instructions without spending time testing the value of the cycle counter, and the hardware completes the cycle jump and the attenuation of the cycle counter. Some DSPs also implement high-speed single-instruction loops through a one-instruction cache.

·Special addressing mode

DSPs often contain special address generators, which can generate special addressing required by signal processing algorithms, such as loops Addressing and bit flip addressing. Cyclic addressing corresponds to the pipeline FIR filter algorithm, and bit flip addressing corresponds to the FFT algorithm.

·Predictability of execution time

Most DSP applications have hard real-time requirements: in every case, all processing work must be completed within a specified time. This real-time constraint requires the programmer to determine how much time each sample will take, or at least how much time will be used in the worst case. The way a DSP executes a program is transparent to the programmer, so it is easy to predict the execution time of each task. For high-performance GPPs, however, the extensive use of high-speed data and program caches and the dynamic allocation of programs make execution time complicated and difficult to predict.
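The per-sample time budget is simple arithmetic once the clock rate and sample rate are known. The sketch below uses assumed figures (a 100 MHz clock and a 48 kHz sample rate) and hypothetical function names; it only illustrates the kind of worst-case check a programmer would perform.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Number of whole processor cycles available between two samples."""
    return clock_hz // sample_rate_hz


def meets_deadline(worst_case_cycles, clock_hz, sample_rate_hz):
    """True if the worst-case processing path fits in one sample period."""
    return worst_case_cycles <= cycles_per_sample(clock_hz, sample_rate_hz)


# Assumed example: 100 MHz processor handling 48 kHz audio.
budget = cycles_per_sample(100_000_000, 48_000)
print(budget)  # 2083
print(meets_deadline(2000, 100_000_000, 48_000))  # True
print(meets_deadline(2100, 100_000_000, 48_000))  # False
```

The check is only meaningful if the worst-case cycle count can be trusted, which is the point of the predictability argument above.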

·Abundant peripherals

DSPs provide peripherals such as DMA controllers, serial ports, link ports, and timers.

Evaluation criteria

Performance classification

DSP processors can be divided into three tiers: low-cost, low-end DSPs; low-power mid-range DSPs; and diversified high-end DSPs. Low-cost, low-end DSPs are the most widely used processors in the industry. Products in this range include the ADSP-21xx, TMS320C2xx, and DSP560xx series. Their operating speed is generally 20 to 50 MIPS, and they deliver good DSP performance while keeping energy consumption and memory capacity moderate. Mid-range DSPs are moderately priced processors that improve performance through higher clock frequencies combined with more complex hardware, such as the DSP16xx and TMS320C54x series. Their operating speed is 100 to 150 MIPS, and they are typically used in wireless telecommunication equipment and high-speed modems, which require relatively high processing speed and low energy consumption. High-end DSPs, driven by the demand for ultra-high-speed processing, have begun to diverge into distinct architectures, which are described in the next section. High-end DSPs have clock frequencies above 150 MHz and processing speeds above 1000 MIPS; examples include TI's TMS320C6x series and ADI's TigerSHARC.

Evaluation indicators

There are many indicators for evaluating processor performance. The most commonly used is speed, but energy consumption and memory capacity are also very important, especially in embedded applications. As the number of DSPs grows, it becomes harder for system designers to select the processor that provides the best performance for a given application. In the past, DSP system designers relied on MIPS or similar metrics to get a rough idea of the relative performance of different chips. Unfortunately, with the diversification of processor architectures, traditional measures like MIPS are becoming less and less accurate, because MIPS does not measure how much useful work each instruction performs. Since one characteristic of DSP applications is that most of the processing work is concentrated in a small part of the program (the kernel), DSP processors can be tested and evaluated with benchmark programs built from signal processing kernels. BDTI has developed a set of kernel benchmarks and introduced a composite speed metric, the BDTI score.
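A small calculation shows why a higher MIPS rating does not guarantee a faster kernel. The two processors below are hypothetical, and the instruction counts per filter tap are assumptions chosen only to illustrate the effect; they do not describe any real chip or any BDTI benchmark.

```python
def kernel_time_us(instruction_count, mips):
    """Execution time in microseconds for a kernel of instruction_count
    instructions on a processor rated at `mips` million instructions/s."""
    return instruction_count / mips


TAPS = 64  # assumed FIR length

# Hypothetical processor A: 100 MIPS, needs 2 instructions per tap.
time_a = kernel_time_us(TAPS * 2, mips=100)
# Hypothetical processor B: 80 MIPS, one combined MAC instruction per tap.
time_b = kernel_time_us(TAPS * 1, mips=80)

print(time_b < time_a)  # True
```

Processor B has the lower MIPS rating but finishes the kernel sooner, because each of its instructions does more work. A kernel-based benchmark captures this difference; a raw MIPS figure does not.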

Introduction to the structure


In the past two years, higher DSP performance could no longer be obtained from the traditional structure, so various performance-improving strategies have been proposed. Increasing the clock frequency appears to have limits, so the best approach is to increase parallelism. Parallelism can be increased in two ways: by increasing the number of operations performed by each instruction, or by increasing the number of instructions executed in each instruction cycle. These two approaches have produced a variety of new DSP structures.



Previously, DSP processors used complex, mixed instruction sets that let programmers encode multiple operations in one instruction. Traditionally, a DSP processor issues and executes only one instruction per instruction cycle. This single-issue, complex-instruction approach lets the DSP processor achieve very strong performance without requiring a large amount of memory.

While keeping the DSP structure and the above-mentioned instruction set unchanged, one way to increase the work done by each instruction is to add execution units and data paths. For example, some high-end DSPs have two multipliers instead of one. DSPs that use this method are called "enhanced DSPs", because their structure is similar to that of the previous generation but their performance is greatly improved by the added execution units. The instruction set must also be enhanced so that the programmer can specify more parallel operations in one instruction and take advantage of the additional hardware. Examples of enhanced DSPs are Lucent's DSP16000 and ADI's ADSP-2116x. The advantage of enhanced DSPs is that they are compatible with earlier DSPs and have similar cost and power consumption. The disadvantages are a complex structure, complex instructions, and limited room for further development.



As mentioned earlier, traditional DSP processors use complex mixed instructions and issue and execute only one instruction per instruction cycle. Recently, however, some DSPs have adopted a more RISC-like instruction set and execute multiple instructions per instruction cycle, using a large unified register file. For example, Siemens' Carmel, Philips' TriMedia, and TI's TMS320C62xx processor family all use a very long instruction word (VLIW) structure. The C62xx processor fetches a 256-bit instruction packet each time, parses the packet into eight 32-bit instructions, and then dispatches them to its eight independent execution units. In the best case, the C62xx executes eight instructions simultaneously and reaches a very high MIPS rate (such as 1600 MIPS). The advantages of the VLIW structure are high performance and a regular structure (potentially easy to program and a good compiler target). The disadvantages are high power consumption, code bloat (which requires wide program memory), and new programming and compiling difficulties: instruction scheduling must be tracked, and the pipeline is easily disrupted, which reduces performance.
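The figures quoted for the C62xx are related by simple arithmetic. The sketch below uses hypothetical helper names; the packet width, instruction width, and 1600 MIPS peak come from the text, while the clock rate is only what those figures imply, not a value stated in the source.

```python
def instructions_per_packet(packet_bits, instruction_bits):
    """How many fixed-width instructions fit in one fetch packet."""
    return packet_bits // instruction_bits


def peak_mips(issue_width, clock_mhz):
    """Best-case MIPS when every issue slot is filled on every cycle."""
    return issue_width * clock_mhz


def implied_clock_mhz(mips, issue_width):
    """Clock rate implied by a quoted peak MIPS figure."""
    return mips / issue_width


width = instructions_per_packet(256, 32)
print(width)                           # 8
print(implied_clock_mhz(1600, width))  # 200.0
print(peak_mips(width, 200))           # 1600
```

The peak rate assumes all eight slots hold useful instructions. Any slot the compiler cannot fill lowers the sustained rate, which is one reason peak MIPS overstates real VLIW performance.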








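Two examples of SIMD enhancements to general-purpose processors are the MMX extension to the Pentium and the AltiVec extension to the PowerPC family. SIMD is also used in some high-performance DSP processors. For example, the DSP16000 supports limited SIMD-style operations in its data path, and Analog Devices recently introduced a new generation of its well-known SHARC DSP processor with extended SIMD capability. Because SIMD makes full use of resources such as buses and data paths without changing the basic structure of signal processing algorithms (including image and speech algorithms), it is becoming increasingly common. The difficulty with SIMD is that algorithms and data structures must satisfy the requirements of data-parallel processing: to gain speed, loops often have to be unrolled and the data rearranged. SIMD usually supports only fixed-point operations.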
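The data rearrangement that SIMD requires can be emulated with packed integers. This is a minimal sketch under stated assumptions: two 16-bit lanes packed into one 32-bit word, wraparound (non-saturating) addition, and hypothetical function names. It does not reproduce MMX, AltiVec, or any DSP instruction.

```python
MASK16 = 0xFFFF


def pack2x16(hi, lo):
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack2x16(word):
    """Split a 32-bit word into its two 16-bit lanes (hi, lo)."""
    return (word >> 16) & MASK16, word & MASK16


def simd_add2x16(a, b):
    """Add two packed words lane by lane, wrapping each 16-bit lane."""
    lo = (a + b) & MASK16                   # carry out of the low lane is dropped
    hi = ((a >> 16) + (b >> 16)) & MASK16   # high lane is added independently
    return (hi << 16) | lo


def add_arrays_simd(xs, ys):
    """Add two equal-length lists of 16-bit values two elements at a time."""
    if len(xs) != len(ys) or len(xs) % 2:
        raise ValueError("inputs must have equal, even length")
    out = []
    for i in range(0, len(xs), 2):  # loop unrolled by the lane count
        word = simd_add2x16(pack2x16(xs[i], xs[i + 1]),
                            pack2x16(ys[i], ys[i + 1]))
        out.extend(unpack2x16(word))
    return out


print(add_arrays_simd([1, 2, 3, 0xFFFF], [10, 20, 30, 1]))  # [11, 22, 33, 0]
```

The last element shows that an overflow in one lane does not carry into its neighbour. The even-length check reflects the constraint described above: the data must be arranged to fit the lane structure before any speedup is possible.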



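Many applications require a mix of control-oriented software and DSP software. An obvious example is the digital cellular phone, which has both supervisory and speech processing tasks. In general, microprocessors provide good control performance but poor DSP performance, while dedicated DSP processors are the opposite. Therefore, some microprocessor vendors have recently begun offering DSP-enhanced versions of their microprocessors. Using a single processor for both kinds of software is attractive, because it can potentially simplify the design, save board space, reduce total power consumption, and lower system cost. The methods of combining a DSP and a microprocessor include: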











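VLIW structures, superscalar architectures, and DSP/MCU hybrid processors are the new trends in DSP architecture. VLIW and superscalar structures can achieve very high processing performance. DSP/MCU hybrids can simplify application system design and reduce size and cost. High-performance general-purpose processors (GPPs) have borrowed many structural advantages of DSPs, and their floating-point processing speed is even faster than that of high-end DSPs. High-performance GPPs typically have clock frequencies of 200 to 500 MHz, with superscalar and SIMD structures, single-cycle multiplication, good memory bandwidth, and branch prediction, so GPPs are moving into the DSP field. However, because GPPs lack real-time predictability, make DSP code hard to optimize, have limited DSP tool support, and consume a lot of power, their use in DSP applications is still limited. High-performance GPPs aimed at embedded applications have also been blended with DSPs to form dedicated embedded GPPs, such as Hitachi's SH-DSP, ARM's Piccolo, and Siemens' TriCore. Embedded GPPs retain their original high performance while strengthening DSP capabilities such as real-time predictability and control, which sets them in contrast with dedicated DSP processors.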



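As DSP performance improves, development tools may become more important than processor structure, because only effective development tools allow a processor to be widely used and its performance fully exploited. On-chip debug is the best means of real-time debugging, and it will use a JTAG-compatible debug port. C compiler efficiency remains a focus; the key is how to develop efficient code conveniently and easily. Instruction-set software simulators are becoming more important, and more accurate ones will be developed. Debugging and development tools for multiple types of DSPs will be integrated together. DSP development tools will be a field full of opportunities and challenges.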

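DSP processors show two development trends. The first is that DSPs are used in more and more applications, such as mobile phones and portable audio players. DSPs will integrate more functions, such as A/D conversion and LCD controllers, which will greatly reduce system cost and component count. The second trend is selling the DSP as IP, such as the Carmel and TriCore cores from Infineon Technologies. As EDA tools continue to mature, system design engineers will find it easier to modify DSP cores and add user-specific peripheral circuits, producing more specialized and lower-cost DSP solutions.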


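The variety of DSP processors (DSPs) keeps growing. Besides the four well-known DSP vendors, Texas Instruments, Lucent Technologies, Analog Devices, and Motorola, there are about 80 other DSP vendors. The DSPs they produce are mainly used in special-function devices such as modems, MPEG decoders, and hard disk drives. DSP processors fall into two categories: fixed-point DSPs and floating-point DSPs. Fixed-point DSPs are developing rapidly and come in the most varieties, with processing speeds of 20 to 2000 MIPS. Floating-point DSPs are essentially monopolized by TI and Analog Devices, with processing speeds of 40 to 1000 MFLOPS. DSP performance has settled into low, mid, and high tiers, and the processor structure of high-end products has changed profoundly, showing a trend toward diversification.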
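The practical difference between the two categories shows up in how fractional numbers are multiplied. The sketch below emulates fixed-point arithmetic in the Q15 format, a common 16-bit convention that is assumed here (the text does not name a format); the function names are hypothetical, and rounding plus saturation is one typical policy rather than the behavior of a specific chip.

```python
Q15_ONE = 1 << 15  # scale factor: 1.0 is represented as 32768 (just out of range)


def to_q15(x):
    """Convert a float in [-1, 1) to a saturated Q15 integer."""
    return max(-Q15_ONE, min(Q15_ONE - 1, int(round(x * Q15_ONE))))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 values, rounding and saturating the result."""
    product = (a * b + (1 << 14)) >> 15  # add half an LSB, then rescale
    return max(-Q15_ONE, min(Q15_ONE - 1, product))


half, quarter = to_q15(0.5), to_q15(0.25)
print(half, quarter)                      # 16384 8192
print(from_q15(q15_mul(half, quarter)))   # 0.125
print(q15_mul(-Q15_ONE, -Q15_ONE))        # 32767 (saturated: +1.0 is not representable)
```

On a fixed-point DSP the programmer manages this scaling and saturation explicitly, whereas a floating-point DSP handles the exponent in hardware.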




























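Digital signal processing (DSP) is an emerging discipline that draws on many fields of study and is widely applied in many areas. It has developed around the theory, implementation, and application of digital signal processing. Advances in theory have driven the development of applications, and applications have in turn spurred improvements in theory. Implementation is the bridge between theory and application. Digital signal processing rests on many disciplines as its theoretical foundation, and its scope is extremely broad. In mathematics, for example, calculus, probability and statistics, stochastic processes, and numerical analysis are all basic tools of digital signal processing, and it is also closely related to network theory, signals and systems, control theory, communication theory, and fault diagnosis. Some emerging disciplines, such as artificial intelligence, pattern recognition, and neural networks, are inseparable from digital signal processing. It can be said that digital signal processing takes many classical theoretical systems as its foundation while itself serving as the theoretical foundation for a series of emerging disciplines.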



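There are many books on digital signal processing. The most classic is "Discrete-Time Signal Processing" by Oppenheim of the Massachusetts Institute of Technology; a Chinese translation, 《离散时间信号处理》, is published by Xi'an Jiaotong University.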
