Digital signal processor



Application introduction

Basic introduction

A digital signal processor is a device built from large-scale or very-large-scale integrated circuit chips to carry out signal processing tasks. It evolved to meet the needs of high-speed real-time signal processing. As integrated circuit technology and digital signal processing algorithms have advanced, the implementation of digital signal processors has kept changing, and their processing capabilities have continued to improve and expand.

Application

Digital signal processors are not limited to audio and video; they are widely used in communication and information systems, signal and information processing, automatic control, radar, military systems, aerospace, medicine, household appliances, and many other fields. In the past, general-purpose microprocessors were used for heavy digital signal processing, but they were slow and could not meet practical needs. Combining bit-slice microprocessors with fast parallel multipliers was an effective way to implement digital signal processing, but it required many devices, complicated logic design and programming, high power consumption, and high cost. The digital signal processor solved these problems. A DSP can quickly perform signal acquisition, transformation, filtering, estimation, enhancement, compression, recognition, and other processing to produce signals in the form people need.

In a car head unit, the DSP mainly provides specific sound fields or effects, such as theater or jazz modes, and some units can also receive high-definition (HD) radio and satellite radio, giving the greatest audio-visual enjoyment. The DSP enhances the performance and usability of the head unit, improves audio and video quality, and offers greater flexibility and a faster design cycle. As the technology develops, more auditory and visual effects will likely become available, and the head unit will become the high-tech infotainment center of the car.

Classification

Digital signal processors can be divided into programmable and non-programmable types. A non-programmable signal processor takes the flow of a signal processing algorithm as its basic logic structure; it has no control program and generally performs only one main processing function, so it is also called a dedicated signal processor. Examples include fast Fourier transform processors and digital filters. Although this type of processor has limited functionality, it offers higher processing speed. A programmable signal processor can change its function through programming and is more versatile, so it is also called a general-purpose signal processor. As the price-performance ratio of general-purpose signal processors keeps improving, they are used more and more widely in signal processing.

The programmable signal processors that have been developed are roughly as follows:

Bit-slice processor. It consists mainly of microprocessor slices with basic word lengths of 2, 4, or 8 bits, together with a program control chip, interrupt and DMA control chips, a clock chip, and other components. Using microprogram control and a grouped instruction format, a system of any required word length can be built. Its advantages are fast processing speed and high efficiency; its disadvantages are high power consumption and a large chip count.

Single-chip signal processor. It integrates the arithmetic unit, multiplier, memory, read-only memory (ROM), input/output interfaces, and sometimes even analog-to-digital and digital-to-analog converters on a single chip. It offers fast calculation, high precision, low power consumption, and strong versatility. Compared with a general-purpose microprocessor, its instruction set and addressing modes are better suited to the operations and data structures common in signal processing.

Very-large-scale integration (VLSI) array processor. This is a signal processor that uses a large number of processing units, controlled by a single instruction sequence, to perform the same operation on different data and so obtain high-speed computation. It is well suited to signal processing tasks with large amounts of data, heavy computation, and highly repetitive operations. Array processors are often used together with general-purpose computers to form powerful signal processing systems. Existing array processors fall roughly into two types: systolic array processors and wavefront array processors. The former drives the whole array with a unified synchronous clock and control mechanism and has the advantages of a simple structure, good modularity, and easy expansion. The latter uses independent timing for each unit and a data-driven mechanism, which simplifies programming and fault-tolerant design and also improves processing speed.

Development direction

Digital signal processors have developed from the dedicated signal processors of the 1970s to VLSI array processors, and their applications have expanded from the processing of low-frequency signals such as speech and sonar to the processing of large volumes of data such as radar and images. Thanks to floating-point arithmetic and parallel processing techniques, the processing capability of signal processors has greatly improved. Digital signal processors will continue to develop along two directions: higher processing speed and higher arithmetic precision. Architecturally, the data-flow structure and the artificial neural network structure are likely to become the basic structural models of the next generation of digital signal processors.

Algorithm format

DSPs use a variety of numeric formats. Most DSP processors use fixed-point arithmetic, in which numbers are expressed as integers or as fractions between -1.0 and +1.0. Some processors use floating-point arithmetic, in which data is expressed as a mantissa and an exponent: mantissa × 2^exponent.
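To make the fixed-point format concrete, the following C sketch shows the common Q15 convention, in which a 16-bit word represents a fraction in [-1.0, +1.0). It is a generic illustration, not tied to any particular DSP, and the helper names are invented for this example.

```c
#include <stdint.h>

/* Q15 fixed point: value = raw / 32768.  Helper names are illustrative. */
typedef int16_t q15_t;

static q15_t q15_from_float(float x)
{
    int32_t v = (int32_t)(x * 32768.0f);
    if (v >  32767) v =  32767;           /* saturate at the format limits */
    if (v < -32768) v = -32768;
    return (q15_t)v;
}

static float q15_to_float(q15_t x)
{
    return (float)x / 32768.0f;
}

/* Q15 * Q15 -> Q15: take the full 32-bit product (Q30), shift back to Q15
   (arithmetic shift assumed), and saturate, much as DSP multiply hardware
   typically does on overflow. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    if (p >  32767) p =  32767;
    if (p < -32768) p = -32768;
    return (q15_t)p;
}
```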

Floating-point arithmetic is more complex, but floating-point data provides a much larger dynamic range, which can be expressed as the ratio of the largest to the smallest representable number. With a floating-point DSP, design engineers need not worry much about dynamic range and precision. A floating-point DSP is easier to program than a fixed-point DSP, but it costs more and consumes more power.

Because of cost and power consumption, fixed-point DSPs are generally used in high-volume products. Programmers and algorithm designers determine the required dynamic range and precision through analysis or simulation. If ease of development is important, or if the application requires a very wide dynamic range and high precision, a floating-point DSP should be considered.

Floating-point calculation can also be implemented in software on a fixed-point DSP, but such routines consume a great deal of processor time, so they are rarely used. A more effective approach is "block floating point", in which a group of data sharing one exponent but having different mantissas is processed as a block. Block floating point is usually implemented in software.
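As an illustration of the block floating-point idea described above, here is a minimal C sketch (hypothetical type and function names, 16-bit mantissas and a 64-sample block assumed) that derives one shared exponent for a block of 32-bit samples; ordinary fixed-point arithmetic is then applied to the scaled mantissas.

```c
#include <stdint.h>

/* Block floating point: one shared exponent per block of 16-bit mantissas,
   so value[i] = mant[i] * 2^exp.  Names and block size are illustrative. */
typedef struct {
    int16_t mant[64];
    int     exp;
} bfp_block_t;

static void bfp_from_int32(bfp_block_t *blk, const int32_t *x, int n)  /* n <= 64 */
{
    /* Find the largest magnitude in the block. */
    int64_t maxmag = 1;
    for (int i = 0; i < n; i++) {
        int64_t m = (x[i] < 0) ? -(int64_t)x[i] : (int64_t)x[i];
        if (m > maxmag) maxmag = m;
    }

    /* Smallest shared exponent for which every mantissa fits in 16 bits. */
    int exp = 0;
    while ((maxmag >> exp) > 32767) exp++;

    /* Scale the whole block (arithmetic right shift assumed). */
    for (int i = 0; i < n; i++)
        blk->mant[i] = (int16_t)(x[i] >> exp);
    blk->exp = exp;
}
```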

Data width

All floating-point DSPs have a 32-bit word width, while fixed-point DSPs generally have a 16-bit word width; there are also 24-bit and 20-bit DSPs, such as Motorola's DSP563XX series and Zoran's ZR3800X series. Because word width strongly affects a DSP's package size, pin count, and memory requirements, it directly affects device cost: the wider the word, the larger the package, the more pins, the larger the memory, and the higher the cost. Provided the design requirements are met, choose a DSP with the smallest suitable word width to reduce cost.

When choosing between fixed point and floating point, you can weigh word width against development complexity. For example, by combining instructions, a DSP with a 16-bit word can also implement 32-bit double-precision arithmetic (although double precision is of course much slower than single precision). If single precision meets most of the computation requirements and only a small amount of code needs double precision, this approach is feasible; but if most of the computation requires high precision, a processor with a larger word width is needed.
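The following C sketch illustrates the kind of instruction combination meant above: building a full signed 32 × 32 → 64-bit multiplication out of 16 × 16-bit products, the way a 16-bit DSP synthesizes double precision. It is a generic illustration, not the instruction sequence of any particular device.

```c
#include <stdint.h>

/* a*b = hh*2^32 + (hl + lh)*2^16 + ll, built from four 16x16 products. */
int64_t mul32_from_16bit_products(int32_t a, int32_t b)
{
    int32_t  ah = a >> 16;                 /* signed high half (arithmetic shift assumed) */
    uint32_t al = (uint32_t)a & 0xFFFFu;   /* unsigned low half                           */
    int32_t  bh = b >> 16;
    uint32_t bl = (uint32_t)b & 0xFFFFu;

    int64_t hh = (int64_t)ah * bh;         /* partial products */
    int64_t hl = (int64_t)ah * (int64_t)bl;
    int64_t lh = (int64_t)bh * (int64_t)al;
    int64_t ll = (int64_t)al * (int64_t)bl;

    return (hh << 32) + ((hl + lh) << 16) + ll;
}
```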

Note that in most DSP devices the instruction word and data word have the same width, but there are exceptions. For example, in the ADSP-21XX series from ADI (Analog Devices), the data word is 16 bits and the instruction word is 24 bits.

Processing speed

Whether a processor meets the design requirements depends above all on whether it meets the speed requirement. There are many ways to measure a processor's speed; the most basic is the instruction cycle, the time the processor takes to execute its fastest instruction. The reciprocal of the instruction cycle, divided by one million and multiplied by the number of instructions executed per cycle, gives the processor's peak rate in millions of instructions per second (MIPS).

However, instruction execution time alone does not indicate a processor's true performance. Different processors accomplish different amounts of work in a single instruction, so simply comparing instruction execution times does not fairly distinguish performance. Some newer DSPs use a very long instruction word (VLIW) architecture, in which multiple instructions are issued in a single cycle but each instruction does less work than a traditional DSP instruction. Therefore, when comparing VLIW devices with conventional DSP devices, comparing MIPS figures can be misleading.

Even between traditional DSPs, comparing MIPS is somewhat one-sided. For example, some processors can shift several bits in a single instruction, while others can shift only one bit per instruction; some DSPs can perform data moves unrelated to the ALU instruction being executed (loading operands while an instruction executes), while others support only data moves related to the executing ALU instruction; and some newer DSPs allow two MACs to be specified in a single instruction. A processor's performance therefore cannot be judged accurately just by comparing MIPS.

One way around this problem is to use a basic operation (rather than an instruction) as a yardstick for comparing processors. The MAC operation is commonly used, but MAC time alone does not provide enough information to compare DSP performance: in most DSPs a MAC executes in a single instruction cycle, so the MAC time simply equals the instruction cycle time, and as noted above some DSPs do more work per MAC cycle than others. MAC time also says nothing about behavior such as loop overhead, which matters in every application.

The most common method is to define a set of standard benchmark routines and compare their execution speed on different DSPs. A routine may be the "core" function of an algorithm, such as an FIR or IIR filter, or a whole application or part of one, such as a speech coder.
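As an example of such a "core" routine, the C function below is a plain direct-form FIR filter; benchmark kernels take exactly this shape, since the inner loop is one multiply-accumulate per tap. The function name and the use of float data are illustrative choices, not taken from any particular benchmark suite.

```c
#include <stddef.h>

/* Direct-form FIR filter: y[i] = sum over k of h[k] * x[i-k].
   Samples before x[0] are treated as zero. */
void fir_filter(const float *x, float *y, size_t n,
                const float *h, size_t taps)
{
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps && k <= i; k++)
            acc += h[k] * x[i - k];        /* the MAC inner loop */
        y[i] = acc;
    }
}
```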

When comparing the speed of DSP processors, be careful with advertised MOPS (millions of operations per second) and MFLOPS (millions of floating-point operations per second) figures, because different manufacturers understand "operation" differently and the figures mean different things. For example, some processors can perform a floating-point multiply and a floating-point add at the same time, and so advertise an MFLOPS rating twice their MIPS rating.

Second, when comparing processor clock rates, note that a DSP's input clock may equal its instruction rate or may be two to four times the instruction rate, and this varies between processors. In addition, many DSPs include clock multipliers or phase-locked loops, so an external low-frequency clock can be used to generate the high-frequency clock needed on chip.

Practical application

Voice processing: voice coding, voice synthesis, voice recognition, voice enhancement, voice mail, voice storage, etc.

Image/graphics: two-dimensional and three-dimensional graphics processing, image compression and transmission, image recognition, animation, robot vision, multimedia, electronic maps, image enhancement, etc.

Military: secure communication, radar processing, sonar processing, navigation, global positioning, frequency-hopping radio, search and counter-search, etc.

Instruments and meters: spectrum analysis, function generation, data acquisition, seismic processing, etc.

Automatic control: guidance and control, deep-space operations, autonomous driving, robot control, disk control, etc.

Medical: hearing aids, ultrasound equipment, diagnostic tools, patient monitoring, electrocardiogram, etc.

Household appliances: digital audio, digital TV, video phone, music synthesis, tone control, toys and games, etc.

Examples of biomedical signal processing:

CT: computerized X-ray tomography. (Hounsfield of the British company EMI, who invented head CT, won the Nobel Prize for it.)

CAT: computerized X-ray spatial reconstruction, which has made possible whole-body scans, three-dimensional imaging of heart activity, imaging of brain tumors and foreign bodies, and image reconstruction of the human torso.

ECG analysis.

Storage management

A DSP's performance is affected by its memory subsystem. As mentioned earlier, the MAC and a few other operations are the basic signal processing capabilities of a DSP device. Fast MAC execution requires reading one instruction word and two data words from memory in every instruction cycle. There are several ways to achieve this: multi-ported memory (allowing several memory accesses per instruction cycle), separate instruction and data memories (the "Harvard" structure and its derivatives), and instruction caches (allowing instructions to be fetched from the cache instead of from memory, freeing the memory for data accesses).

Also pay attention to the size of the supported memory space. The main target market for many fixed-point DSPs is embedded systems, where memory is usually small, so these DSPs have small-to-medium on-chip memory (roughly 4K to 64K words) and a narrow external data bus. In addition, the address bus of most fixed-point DSPs is 16 bits or less, so external memory space is limited.

Some floating-point DSPs have little or no on-chip memory but a wide external data bus. For example, TI's TMS320C30 has only 6K words of on-chip memory, a 24-bit external bus, and a 13-bit external address bus, while ADI's ADSP-21060 has 4 Mbit of on-chip memory that can be divided between program and data memory in several ways.

When choosing a DSP, base the choice on the memory requirements of the specific application and its demands on the external bus.

Type characteristics

DSP processors differ greatly from general-purpose processors (GPPs) such as the Intel Pentium or the PowerPC. The architecture and instruction set of a DSP are designed and developed specifically for signal processing, giving it the following characteristics.

Hardware multiplication and accumulation operations (MACs)

To perform multiply-accumulate tasks such as signal filtering efficiently, the processor must multiply efficiently. GPPs were not originally designed for heavy multiplication workloads; the first major technical improvement that distinguished DSPs from earlier GPPs was the addition of specialized hardware capable of single-cycle multiplication, together with explicit MAC instructions.

Harvard structure

Traditional GPPs use a von Neumann memory structure, in which a single memory space is connected to the processor core through one pair of buses (an address bus and a data bus). This structure cannot satisfy a MAC's requirement to access memory four times in one instruction cycle. DSPs generally use the Harvard structure, which has two storage spaces: program memory and data memory. The processor core connects to these spaces through two sets of buses, allowing two memory accesses at the same time and doubling the processor's memory bandwidth. Sometimes a second data memory space and bus are added to obtain even greater bandwidth. Modern high-performance GPPs usually have two on-chip caches, one for data and one for instructions; in theory this dual-cache-and-bus arrangement is equivalent to the Harvard structure, but in a GPP control logic decides which data and instruction words reside in the cache, and this process is generally invisible to the programmer. In a DSP, by contrast, the programmer can explicitly control which data and instructions are stored in on-chip memory or caches.

Zero-overhead loop control

A common feature of DSP algorithms is that most of the processing time is spent executing a small number of instructions inside relatively small loops. Most DSP processors therefore have dedicated hardware for zero-overhead loop control. A zero-overhead loop is one in which the processor executes a block of instructions without spending any cycles testing the loop counter; the hardware handles the loop branch and the decrement of the loop counter. Some DSPs also implement fast single-instruction loops through a one-instruction cache.

Special addressing mode

DSPs often contain dedicated address generators that produce the special addressing patterns required by signal processing algorithms, such as circular addressing and bit-reversed addressing. Circular addressing supports streaming FIR filter implementations, and bit-reversed addressing supports the FFT algorithm.
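For readers unfamiliar with these addressing modes, the C sketch below shows what the address generator computes: a modulo index update for a circular delay line and a bit-reversal of the index for FFT reordering. On a real DSP both are produced by hardware with no extra cycles; the function names here are illustrative only.

```c
#include <stdint.h>
#include <stddef.h>

/* Circular (modulo) index update for a delay line of length N. */
static size_t circular_next(size_t idx, size_t N)
{
    idx++;
    return (idx == N) ? 0 : idx;
}

/* Bit-reversed index for an N = 2^log2n point radix-2 FFT: reverse the
   low log2n bits of idx, giving the shuffled data order. */
static uint32_t bit_reverse(uint32_t idx, unsigned log2n)
{
    uint32_t r = 0;
    for (unsigned b = 0; b < log2n; b++) {
        r = (r << 1) | (idx & 1u);
        idx >>= 1;
    }
    return r;
}
```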

Predictability of execution time

Most DSP applications have hard real-time requirements: in every case, all processing must be finished within a specified time. This real-time constraint requires the programmer to determine exactly how long processing each sample will take, or at least how long it will take in the worst case. The way a DSP executes a program is transparent to the programmer, so the execution time of each task is easy to predict. For high-performance GPPs, however, the heavy use of high-speed data and instruction caches and dynamic program scheduling make execution time complicated and difficult to predict.

Abundant peripherals

DSPs have peripherals such as DMA, serial ports, Link ports, and timers.

Evaluation criteria

Performance classification

DSP processors can be divided into three performance classes: low-cost low-end DSPs, low-power mid-range DSPs, and diverse high-end DSPs. Low-cost low-end DSPs are the most widely used processors in the industry; products in this class include the ADSP-21xx, TMS320C2xx, and DSP560xx series, which generally run at 20 to 50 MIPS and provide good DSP performance while keeping power consumption and memory size modest. Moderately priced DSPs improve performance through higher clock frequencies and more complex hardware, forming the mid-range, such as the DSP16xx and TMS320C54x series; they run at 100 to 150 MIPS and are typically used in wireless telecommunication equipment and high-speed modems, where relatively high processing speed and low power consumption are required. High-end DSPs, driven by the demand for very high processing speed, have begun to diversify in architecture, as described in the next section. High-end DSPs have clock frequencies above 150 MHz and processing speeds above 1000 MIPS; examples are TI's TMS320C6x series and ADI's TigerSHARC.

Evaluation indicators

There are many metrics for evaluating processor performance. The most commonly used is speed, but power consumption and memory capacity are also very important, especially in embedded applications. With the growing number of DSPs, it has become harder for system designers to pick the processor that gives the best performance for a given application. In the past, DSP system designers relied on MIPS or similar figures for a rough idea of the relative performance of different chips. Unfortunately, as processor architectures diversify, traditional measures like MIPS are becoming less meaningful and less accurate, because MIPS does not actually measure useful work. Since a characteristic of DSP applications is that most of the processing is concentrated in a small part of the program (the kernel), DSP processors can be tested and evaluated with benchmark programs drawn from signal processing. BDTI has defined a set of such kernels and registered a new composite speed metric, the BDTI score.

Introduction to the structure

Overview

In the past few years, the higher performance demanded of DSP processors could no longer be obtained from traditional architectures, so various strategies for raising performance have been proposed. Increasing the clock frequency has its limits; the better route is to increase parallelism. Parallelism can be increased in two ways: by increasing the number of operations performed by each instruction, or by increasing the number of instructions issued in each instruction cycle. These two approaches to parallelism have produced a variety of new DSP architectures.

Enhanced DSPs

Traditionally, DSP processors have used complex, compound instruction sets that let the programmer encode several operations in a single instruction, and they issue and execute only one instruction per instruction cycle. This single-issue, complex-instruction approach lets a DSP achieve very strong performance without needing a large amount of program memory.

One way to increase the work done per instruction, while keeping the overall DSP structure and instruction-set style unchanged, is to add execution units and data paths. For example, some high-end DSPs have two multipliers instead of one. DSPs that take this approach are called enhanced DSPs, because their architecture resembles the previous generation but their performance is greatly increased by the extra execution units. Of course, the instruction set must also be extended so that the programmer can specify more parallel operations in one instruction and exploit the additional hardware. Examples of enhanced DSPs are Lucent's DSP16000 and ADI's ADSP-2116x. Enhanced DSPs have the advantages of compatibility and of cost and power consumption similar to earlier DSPs; their disadvantages are a complex structure, complex instructions, and limited room for further development.

VLIW structure

As mentioned earlier, traditional DSP processors use complex compound instructions and issue and execute only one instruction per instruction cycle. Recently, however, some DSPs have adopted a simpler, more RISC-like instruction set, execute multiple instructions in one instruction cycle, and use a large, unified register file. For example, Siemens' Carmel, Philips' TriMedia, and TI's TMS320C62xx processor family all use a very long instruction word (VLIW) structure. The C62xx processor fetches a 256-bit instruction packet at a time, parses it into eight 32-bit instructions, and dispatches them to its eight independent execution units. In the best case the C62xx executes eight instructions simultaneously and reaches a very high MIPS rate (for example 1600 MIPS). The advantages of the VLIW structure are high performance and a regular structure (potentially easy to program and a good compiler target). Its disadvantages are high power consumption, code expansion (wide program memory is required), and new programming and compiling difficulties (instruction scheduling must be tracked, and it is easy to disturb the pipeline and degrade performance).

Superscalar architecture

Like VLIW processors, superscalar architectures issue and execute multiple instructions in parallel. Unlike VLIW processors, however, a superscalar architecture does not explicitly specify which instructions are to execute in parallel; it uses dynamic instruction scheduling, deciding which instructions execute simultaneously according to the processor's available resources, data dependencies, and other factors. Superscalar architectures have long been used in high-performance general-purpose processors such as the Pentium and the PowerPC. Recently, ZSP developed the first commercial superscalar DSP, the ZSP164xx. The advantages of the superscalar structure are a large jump in performance, a regular structure, and no obvious growth in code size. The disadvantages are very high power consumption and the difficulty of optimizing code, because instruction scheduling is dynamic.

SIMD structure

A single-instruction, multiple-data (SIMD) processor splits long input data words into several shorter pieces and operates on them in parallel under a single instruction, which raises throughput on large volumes of decomposable data. This technique can greatly speed up the vector operations used heavily in multimedia and signal processing, such as coordinate transformations and rotations.

Two examples of SIMD enhancements in general-purpose processors are the MMX extension of the Pentium and the AltiVec extension of the PowerPC family. SIMD is also used in some high-performance DSP processors: the DSP16000, for example, supports limited SIMD-style operations in its data path, and Analog Devices recently introduced a new generation of its well-known SHARC DSP with extended SIMD capability. Because SIMD makes full use of buses, data paths, and other resources without changing the basic structure of signal processing algorithms (including image and speech algorithms), it is being used more and more widely. The difficulties with SIMD are that the algorithm and data structures must suit parallel data processing; to obtain the speedup, loops often have to be unrolled and data rearranged. SIMD usually supports only fixed-point arithmetic.
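The essence of the SIMD idea, splitting a long word into shorter lanes processed by a single operation, can be sketched in portable C as below. Real SIMD instructions also saturate each lane and offer many more operations, so this is only an illustration; the function name is invented for this example.

```c
#include <stdint.h>

/* Treat one 32-bit word as two independent 16-bit lanes and add both lanes
   with one 32-bit operation.  This sketch wraps around instead of saturating. */
static uint32_t add16x2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;                        /* low lane  */
    uint32_t hi = ((a & 0xFFFF0000u) + (b & 0xFFFF0000u))
                  & 0xFFFF0000u;                                /* high lane */
    return hi | lo;   /* masking keeps carries from crossing between lanes */
}
```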

Hybrid DSP/microcontroller structure

Many applications need a mixture of control-oriented software and DSP software; an obvious example is the digital cellular phone, which involves both supervisory control and voice processing. In general, microprocessors provide good control performance but poor DSP performance, while dedicated DSP processors have exactly the opposite characteristics. Some microprocessor vendors have therefore begun to offer DSP-enhanced versions of their microprocessors. Handling both kinds of software with a single processor is attractive, because it can potentially simplify the design, save board space, reduce total power consumption, and lower system cost. DSPs and microprocessors can be combined in several ways:

·Integrating several processors on a single chip, such as the Motorola DSP5665x

·Using the DSP as a coprocessor, such as the ARM Piccolo

·Grafting a DSP core onto an existing microprocessor, such as the SH-DSP

·Integrating a microcontroller with an existing DSP, such as the TMS320C27xx

·An entirely new design, such as the TriCore

As the demand for DSP capability grows, DSP processor architectures are being designed in new and innovative ways, with DSPs, MCUs, and CPUs borrowing one another's architectural strengths.

Development trends

Overview

The trends in DSP processor development are architectural diversification, single-chip integration and user customization, more complete development tools, and more comprehensive and more specialized evaluation systems.

Trends

VLIW structures, superscalar architectures, and hybrid DSP/MCU processors are the new currents in DSP architecture. VLIW and superscalar structures can achieve very high processing performance, and DSP/MCU hybrids can simplify application system design and reduce size and cost. High-performance general-purpose processors (GPPs) have borrowed many of the architectural strengths of DSPs, and their floating-point speed is even higher than that of high-end DSPs. High-performance GPPs typically run at clock frequencies of 200 to 500 MHz and offer superscalar and SIMD structures, single-cycle multiplication, good memory bandwidth, and branch prediction, so GPPs are entering the DSP field. However, because GPPs lack real-time predictability, DSP code is hard to optimize for them, DSP tool support is limited, and power consumption is high, their use in DSP applications is still limited. High-performance GPPs aimed at embedded systems are instead being combined with DSPs to form dedicated embedded GPPs, such as Hitachi's SH-DSP, ARM's Piccolo, and Siemens' TriCore. Embedded GPPs keep their original high performance while strengthening real-time predictability, control capability, and other DSP-oriented features, in contrast to dedicated DSP processors.

In terms of integration, processor cores and rapid user customization are important. We can expect to see, and see become popular, user-customizable DSPs, building-block DSPs, programmable integer DSPs, DSP-oriented field-programmable gate arrays (FPGAs), more specialized DSPs, multimedia DSPs, and so on. Even more encouraging, future DSP processors will integrate a DSP core, a microcontroller, RAM and ROM, serial ports, analog-to-digital and digital-to-analog converters, user-defined digital circuits, user-defined analog circuits, and more, so a DSP processing system will generally no longer be a large system built from several printed-circuit boards (a signal conditioning board, an A/D board, a D/A board, an interface and timing board, and so on).

Because DSP architectures are diversifying, performance testing will become more difficult; metrics such as MIPS, MOPS, MFLOPS, and BOPS will reflect DSP performance less and less accurately, so finer and more specialized evaluation standards are needed. For a specific application, the results of certain individual functional tests may matter more.

As DSP performance rises, development tools may become more important than processor architecture, because only effective tools allow a processor to be widely used and its performance fully exploited. On-chip debug is the best means of real-time debugging and will use a JTAG-compatible debug port. The efficiency of C compilers remains a focus; the key question is how to develop efficient code conveniently. Instruction-set simulators are becoming more important, and more accurate simulators will be developed. Development and debugging tools for multiple types of DSP will be integrated together. DSP development tools will be a field full of opportunities and challenges.

There are two trends in DSP processors. One is that DSP applications keep multiplying, for example in mobile phones and portable audio players; DSPs will integrate more functions, such as A/D converters and LCD controllers, greatly reducing system cost and component count. The other is selling the DSP as IP, such as Infineon's Carmel and TriCore cores. As EDA tools mature, system design engineers will find it easier to modify a DSP core and add user-specific peripheral circuits to obtain more specialized, lower-cost DSP solutions.

Digital signal processing (DSP), which in the 1980s was still studied by only a few, has since the 1990s become one of the most commonly used engineering terms. Processors are so widely applied because manufacturing technology has advanced to the point where processor cost has fallen low enough for consumer products and other cost-sensitive systems, while processing speed has risen high enough to satisfy most high-speed real-time signal processing needs. The growing use of DSP processors in products has in turn spurred the rapid development of faster, cheaper, lower-power DSP processors.

The variety of DSP processors (DSPs) keeps growing. Besides the four well-known DSP vendors, Texas Instruments, Lucent Technologies, Analog Devices, and Motorola, there are roughly 80 other DSP vendors. Their DSPs are mainly used in devices with specific functions, such as modems, MPEG decoders, and hard disk drives. DSP processors fall into two broad classes: fixed-point DSPs and floating-point DSPs. Fixed-point DSPs are developing rapidly and come in the most varieties, with processing speeds of 20 to 2000 MIPS. Floating-point DSPs are essentially dominated by TI and ADI, with processing speeds of 40 to 1000 MFLOPS. DSP performance now spans low, mid, and high tiers, and in the high-end products the processor architecture has changed profoundly and become diverse.

Selection guide

DSP processors are used in a very wide range of applications, but in practice no single processor can meet all or even most needs. When choosing a processor, design engineers must weigh performance, cost, integration, ease of development, power consumption, and other factors.

DSP devices can be divided into two classes by design requirements. The first class targets inexpensive, high-volume embedded systems, such as mobile phones, disk drives (where the DSP performs servo motor control), and portable digital audio players. In these applications, price and integration are the most important considerations; for portable battery-powered equipment, power consumption is also critical. Although these applications usually require developing custom application software to run on the DSP and supporting peripheral hardware, ease of development remains a secondary factor, because volume production spreads the development cost and lowers the cost per unit.

The other class comprises applications that process large amounts of data with complex algorithms, such as sonar and seismic exploration, which also need DSP devices. Such equipment is usually produced in small volumes, has demanding algorithms, and is large and complex, so design engineers tend to choose DSPs with the best performance, ease of development, and multiprocessor support. Sometimes engineers prefer to build the system on an off-the-shelf development board rather than design the hardware and software from scratch, and to develop the application software with existing function libraries.

In an actual design, the DSP should be chosen to suit the specific application. Different DSPs have different characteristics and suit different applications; the following points can guide the choice.

Ease of DSP development

Different applications place different demands on ease of development. For research and prototyping, the system tools generally must make development easy. But if a company is developing its next-generation mobile phone, cost is the most important factor; as long as the final product cost is reduced, developers are usually willing to accept tedious development and complex development tools (unless, of course, this greatly delays time to market).

Factors to consider when choosing a DSP therefore include software development tools (assembler, linker, simulator, debugger, compiler, code libraries, and real-time operating system), hardware tools (development boards and emulators), and higher-level tools (for example, block-diagram-based code generation environments).

A common question when choosing a DSP device is how the programming will be done. Design engineers generally choose assembly language, a high-level language (such as C or Ada), or a combination of the two. Most DSP programs are written in assembly; because compiler-generated assembly code is generally not fully optimized, programs must be optimized by hand to reduce code size and streamline the flow, further speeding up execution. Such work is worthwhile for consumer electronics, since code optimization can compensate for limited DSP performance.

Design engineers who use high-level-language compilers will find that compilers for floating-point DSPs perform better than those for fixed-point DSPs, for several reasons. First, most high-level languages do not natively support fractional arithmetic. Second, floating-point processors generally have more regular instruction sets with fewer restrictions, which compilers handle better. Third, floating-point processors support larger memories and can provide enough space, since compiler-generated code is generally larger than hand-written code.

Whether programming in a high-level language or in assembly, attention must be paid to debugging and hardware emulation tools, because a large share of the development time is spent there. Almost all vendors provide instruction-set simulators, which are very helpful for debugging software before the hardware is finished. If a high-level language is used, it is important to evaluate the capabilities of the high-level-language debugger, including whether it can work together with the simulator and/or the hardware emulator.

Most DSP vendors provide hardware emulation tools, and many processors have on-chip debug/emulation functions accessed through a serial interface conforming to the IEEE 1149.1 JTAG standard. This serial interface allows scan-based emulation: the programmer loads breakpoints through the interface and then, when the processor reaches a breakpoint, scans the internal registers to examine and modify their contents.

Many vendors offer ready-made DSP development boards. Before the hardware is completed, a development board can be used to run and debug the software in real time, which improves the manufacturability of the final product. For some small-volume systems, the development board can even serve as the final product board.

Multiprocessor support

In some applications with very heavy computation, multiple DSP processors are often required. In this case, multiprocessor interconnection and interconnect performance (communication traffic, overhead, and latency between processors) become important considerations. For example, ADI's ADSP-2106x series provides dedicated hardware that simplifies multiprocessor system design.

Power management and power consumption

DSP devices are increasingly used in portable products, where power consumption is an important consideration, so DSP vendors try to build power management into their products and lower operating voltages to reduce system power. Power management features found in some DSP devices include:

a. Reduced operating voltage: many vendors offer low-voltage versions (3.3 V, 2.5 V, or 1.8 V) that, at the same clock rate, consume far less power than equivalent 5 V parts.

b. "Sleep" or "idle" modes: most processors can shut off the clock to parts of the chip to reduce power. In some cases a non-maskable interrupt can bring the processor out of sleep mode; in others, only certain designated external interrupts can wake it. Some processors offer several sleep modes with different power savings and wake-up latencies.

c. Programmable clock dividers: some DSPs allow the processor clock to be changed under software control, so that the lowest clock frequency sufficient for a particular task can be used to reduce power.

d. Peripheral control: some DSP devices allow the program to shut down peripheral circuits that the system is not using.

Whatever the power management features, achieving a truly power-efficient design is difficult, because a DSP's power consumption varies with the instructions it executes. Most vendors quote typical or maximum power figures; TI is an exception, providing detailed application data on power consumption when executing different instructions and in different configurations.

Cost factors

Provided the design requirements are met, use the lowest-cost DSP possible, even if it is harder to program and less flexible. Within a processor family, the cheaper processors have fewer features, smaller on-chip memory, and lower performance than the more expensive ones.

Packaging also affects price. For example, PQFP and TQFP packages are much cheaper than PGA packages.

Keep two things in mind about cost. First, processor prices keep falling. Second, price depends on volume: the unit price for 10,000 parts may be much lower than for 1,000 parts.

Current status in China and abroad

Digital signal processing (DSP) is an emerging discipline that draws on many fields and is widely applied in many areas. It has developed around the theory, implementation, and applications of digital signal processing: theoretical advances drive applications, applications in turn advance the theory, and implementation is the bridge between theory and application. Digital signal processing rests on many disciplines and touches an extremely wide range of subjects. In mathematics, calculus, probability and statistics, stochastic processes, and numerical analysis are its basic tools, and it is closely related to network theory, signals and systems, control theory, communication theory, and fault diagnosis. Emerging fields such as artificial intelligence, pattern recognition, and neural networks are also inseparable from digital signal processing. One can say that digital signal processing takes many classical theories as its foundation while itself serving as the theoretical basis for a series of new disciplines.

Signal processing techniques have long been used to transform or generate analog or digital signals, the most frequent application being filtering. Digital signal processing techniques are also widely applied in fields ranging from digital communications, speech, audio, and biomedical signal processing to instrumentation and robotics. Digital signal processing has matured into an established technology and has gradually replaced traditional analog signal processing systems in many application areas.

The world's three largest DSP chip manufacturers are Texas Instruments, Analog Devices, and Motorola.

There are many books on digital signal processing; the classic is Discrete-Time Signal Processing by Oppenheim of MIT, of which a Chinese translation, 《离散时间信号处理》, has been published by Xi'an Jiaotong University.
