
DSP C Code Optimization Summary, Part 1

(2014-04-23 09:45:03)
Category: DSP

(Reposted from http://wenku.baidu.com/link?url=L1IPCJpDt7FLJENvDnOL7h1suKbpCeohNnR_jIHVSRk-qbfZSizhPBYeghj1lYqAa3LLgY65T_a0soTcKGObK6Q_LYmSBsefj5EEcG4XIdC)

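Code Optimization Summary

Before optimizing a piece of code, first find the part of the program that consumes the most time. Optimizing the most time-consuming section yields the largest gains.

In general, the most time-consuming code sits inside a loop body, and that loop runs a very large number of iterations.

Below are some concrete optimization measures: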

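1. Avoid conditional statements inside loop bodies, especially inner loops: too many branches in an inner loop break the execution pipeline.

2. Choosing between while and for loops: if the iteration count is known, use a for statement.

3. Avoid system function calls inside loop bodies, because system calls take a lot of time.

4. If a function is very short and is called very often, make it an inline function to eliminate the function-call overhead.

5. With nested loops, put the loop with the most iterations innermost.

6. Replace multiplication with shifts (this works for multiplication by powers of two).

7. Avoid math functions inside loop bodies. Power (pow), division and modulo operations are especially expensive and should be implemented some other way.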
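Items 6 and 7 can be illustrated with a small sketch. It assumes a hypothetical circular buffer whose length BUF_LEN is a power of two, so that `% BUF_LEN` can become `& BUF_MASK` and `* 4` can become `<< 2`. Both substitutions are only valid for unsigned operands and power-of-two constants, and an optimizing compiler often makes them itself when the constant is visible.

```c
/* Hypothetical circular-buffer length; must be a power of two for the mask trick. */
#define BUF_LEN  8u
#define BUF_MASK (BUF_LEN - 1u)

/* Straightforward version: '%' and '*' inside the loop. */
unsigned int sum_wrapped_mod(const unsigned int *buf, unsigned int start, unsigned int n)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < n; i++)
        sum += buf[(start + i) % BUF_LEN] * 4u;
    return sum;
}

/* Same result: '% BUF_LEN' becomes '& BUF_MASK' and '* 4u' becomes '<< 2'.
   Valid only because BUF_LEN and the multiplier are powers of two and all
   operands are unsigned. */
unsigned int sum_wrapped_mask(const unsigned int *buf, unsigned int start, unsigned int n)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < n; i++)
        sum += buf[(start + i) & BUF_MASK] << 2;
    return sum;
}
```

With buf = {1, ..., 8}, start = 6 and n = 5, both functions read elements 6, 7, 0, 1, 2 (wrapping around) and return the same value.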

 

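These points are only a reference for optimization. Broadly, program optimization falls into two categories: algorithmic optimization and program-structure optimization. To achieve large gains you should start with the algorithm, but structural optimization is still essential, because it lets us fully exploit the CPU's performance.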

Optimizing DSP C Code

I: Code Optimization

A: Data Type Optimization

1) Integers

Use unsigned int instead of int if it is known that the value will never be negative.

Integer arithmetic is much faster than floating-point arithmetic, and some processors handle unsigned integer arithmetic considerably faster than signed arithmetic. On such processors, the best declaration for an int variable in a tight loop is: unsigned int variable_name.

2) Division and Modulo

On standard processors, a 32-bit division takes 20-140 cycles to execute, depending on the numerator (x) and the denominator (y).

Use unsigned division where possible, by making sure that at least one of the operands is unsigned, as this is faster than signed division.
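A minimal sketch of why unsigned division is cheaper, for the common case of dividing by a power of two (the function names are illustrative). Unsigned x / 4 is exactly x >> 2, but C's signed division rounds toward zero while an arithmetic shift rounds toward negative infinity, so for negative x the compiler has to add a fix-up. sdiv4_shift_fixed spells out that fix-up and assumes >> on a negative int is an arithmetic shift, which C leaves implementation-defined (GCC documents it as sign-extending).

```c
/* Unsigned division by 4: a single logical shift gives the same answer. */
unsigned int udiv4(unsigned int x)       { return x / 4u; }
unsigned int udiv4_shift(unsigned int x) { return x >> 2; }

/* Signed division truncates toward zero: -7 / 4 == -1. */
int sdiv4(int x) { return x / 4; }

/* What the compiler must effectively emit for signed x / 4: bias negative
   values by (4 - 1) before shifting. Assumes an arithmetic right shift for
   negative values (implementation-defined in C). */
int sdiv4_shift_fixed(int x) { return (x < 0 ? x + 3 : x) >> 2; }
```

The extra compare-and-add in the signed case is the overhead the text refers to; declaring the operand unsigned removes it.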

3) Global Variables

Global variables are generally not kept in registers, because any function call or store through a pointer might modify them. The compiler therefore cannot cache the value of a global variable in a register, which results in extra loads and stores whenever globals are used. So avoid using global variables inside critical loops.

If a function uses global variables heavily, it is beneficial to copy those globals into local variables so that they can be assigned to registers. This is valid only if none of the functions it calls uses those global variables. The function will then execute faster.
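A sketch of the copy-to-local technique, using hypothetical globals g_gain and g_offset. In the first version the store to out[i] might alias a global (both are int), so the compiler may reload the globals on every iteration; the second reads them once.

```c
/* Hypothetical globals used by a gain/offset routine. */
int g_gain   = 3;
int g_offset = 10;

/* Globals read directly: out[i] could alias g_gain or g_offset, so the
   compiler may have to reload them after every store. */
void apply_gain_global(int *out, const int *in, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] * g_gain + g_offset;
}

/* Globals copied into locals once; the locals can stay in registers for the
   whole loop. Correct only because nothing in the loop modifies the globals. */
void apply_gain_local(int *out, const int *in, int n)
{
    const int gain   = g_gain;
    const int offset = g_offset;
    for (int i = 0; i < n; i++)
        out[i] = in[i] * gain + offset;
}
```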

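4) Pointer Parameters

If a function takes a pointer parameter, the value it points to is used many times during the call, and none of those uses changes the pointed-to variable, then copy the pointed-to value into a local variable inside the function. This saves the time the function would otherwise spend re-reading the value through the pointer on every use.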
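A minimal sketch with hypothetical function names. Because dst[i] could point at the same int as scale, the compiler must reload *scale after each store in the first version; the second reads it once. That change is only correct when the caller guarantees the buffers do not overlap.

```c
/* '*scale' is read on every iteration; dst[i] might alias *scale, so the
   compiler has to reload it after each store. */
void scale_ptr(int *dst, const int *src, const int *scale, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* Pointed-to value copied into a local once. This changes behaviour if dst
   really does overlap *scale, so use it only when that cannot happen. */
void scale_local(int *dst, const int *src, const int *scale, int n)
{
    const int s = *scale;
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * s;
}
```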

5) Local Variables

Where possible, avoid using char and short as local variables. For the types char and short, the compiler needs to reduce the local variable to 8 or 16 bits after each assignment. This is called sign-extension for signed variables and zero-extension for unsigned variables. On a 32-bit processor without dedicated extend instructions, it is implemented by shifting the register left by 24 or 16 bits, followed by a signed or unsigned shift right by the same amount, taking two instructions (zero-extension of an unsigned char takes one instruction).

  These shifts can be avoided by using int and unsigned int for local variables. This is particularly important for calculations which first load data into local variables and then process the data inside the local variables. Even if data is input and output as 8- or 16-bit quantities, it is worth considering processing them as 32-bit quantities.
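A sketch of the "process as 32-bit, store as 8-bit" advice, using a hypothetical 8-bit checksum. Both versions return the same value because unsigned arithmetic wraps modulo 2^32, and 256 divides 2^32, so truncating once at the end equals truncating after every addition.

```c
#include <stddef.h>

/* 8-bit accumulator: on a 32-bit target the compiler may have to
   re-truncate 'sum' to 8 bits after every addition. */
unsigned char checksum_u8(const unsigned char *data, size_t len)
{
    unsigned char sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (unsigned char)(sum + data[i]);
    return sum;
}

/* 32-bit accumulator: input and output stay 8-bit, but the work is done in
   a full register and truncated only once, at the end. */
unsigned char checksum_u32(const unsigned char *data, size_t len)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return (unsigned char)sum;
}
```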

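B: Pointer Optimization

1) Pointer Chains

Pointer chains are frequently used to access information in structures, and each link in a chain is a separate memory load. Restructuring the code so that the same chain is not walked repeatedly makes it faster.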

For example, a common code sequence is:

typedef struct { int x, y, z; } Point3;
typedef struct { Point3 *pos, *direction; } Object;

void InitPos1(Object *p) {
    p->pos->x = 0;
    p->pos->y = 0;
    p->pos->z = 0;
}

This code must reload p->pos for each assignment, because the compiler does not know that p->pos->x is not an alias for p->pos. A better approach would cache p->pos in a local variable: 

void InitPos2(Object *p) {
    Point3 *pos = p->pos;
    pos->x = 0;
    pos->y = 0;
    pos->z = 0;
}

 

C: Loops

1) Loop Termination Conditions

A loop termination condition written without care can cause significant overhead. Prefer count-down-to-zero loops and simple termination conditions; simple conditions take less time to evaluate.
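A minimal sketch of a count-down-to-zero loop next to the usual count-up form. The likely benefit, assumed here rather than measured, is that on many processors the decrement itself sets the condition flags (or a decrement-and-branch instruction exists), so the separate compare against n disappears. The count-down version visits elements in reverse, which does not matter for an integer sum.

```c
/* Count-up loop: each iteration compares i against n. */
int sum_up(const int *a, unsigned int n)
{
    int s = 0;
    for (unsigned int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Count-down-to-zero loop: the exit test is a simple comparison with zero. */
int sum_down(const int *a, unsigned int n)
{
    int s = 0;
    for (unsigned int i = n; i != 0; i--)
        s += a[i - 1];
    return s;
}
```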

2) Loop Jamming (Fusion)

Never use two loops where one will suffice. However, if the loop body does a lot of work, it might not fit into the processor's instruction cache. In that case, two separate loops may actually be faster, because each one can run entirely from the cache.
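A sketch of loop jamming with hypothetical function names: two passes that compute the sum and the maximum of the same array, and the fused single pass. Both assume n >= 1.

```c
/* Two passes over the same array: loop overhead is paid twice and every
   element is loaded twice. Assumes n >= 1. */
void sum_and_max_two_loops(const int *a, int n, int *sum, int *max)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    int m = a[0];
    for (int i = 1; i < n; i++)
        m = (a[i] > m) ? a[i] : m;
    *sum = s;
    *max = m;
}

/* Fused: one pass, one load per element. Assumes n >= 1. */
void sum_and_max_fused(const int *a, int n, int *sum, int *max)
{
    int s = a[0];
    int m = a[0];
    for (int i = 1; i < n; i++) {
        s += a[i];
        m = (a[i] > m) ? a[i] : m;
    }
    *sum = s;
    *max = m;
}
```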

3) Function Calls

Functions always have a certain performance overhead when they are called. Not only does the program counter have to change, but in-use variables have to be pushed onto a stack and new variables allocated. Care must be taken, though, to keep the program readable and its size manageable. If a function is often called from within a loop, it may be possible to move that loop inside the function to cut down the overhead of calling it repeatedly.
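A sketch of moving the loop into the function, using a hypothetical clipping routine. The first form pays one call and return per sample (unless the compiler inlines clip_sample); the second pays one call per block.

```c
/* Hypothetical per-sample routine, called once per element. */
int clip_sample(int x, int lo, int hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Caller's loop: one function call per sample. */
void clip_caller_loop(int *buf, int n, int lo, int hi)
{
    for (int i = 0; i < n; i++)
        buf[i] = clip_sample(buf[i], lo, hi);
}

/* Loop moved inside the function: one call for the whole block. */
void clip_block(int *buf, int n, int lo, int hi)
{
    for (int i = 0; i < n; i++) {
        int x = buf[i];
        buf[i] = (x < lo) ? lo : (x > hi) ? hi : x;
    }
}
```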

4) Loop Unrolling

  Small loops can be unrolled for higher performance, with the disadvantage of increased code size. When a loop is unrolled, a loop counter needs to be updated less often and fewer branches are executed. If the loop iterates only a few times, it can be fully unrolled, so that the loop overhead completely disappears.
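A sketch of unrolling by four, using a hypothetical dot-product routine. The main loop updates the counter and branches once per four elements, and a tail loop handles the n % 4 leftovers. The separate accumulators are an extra choice made here: they give the processor independent additions to overlap, and for integers the reordered sum gives the same result.

```c
/* Dot product unrolled by four, with a remainder loop for n % 4 elements. */
int dot_unrolled4(const int *a, const int *b, int n)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {        /* one counter update per 4 elements */
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)                 /* remaining 0-3 elements */
        s0 += a[i] * b[i];
    return s0 + s1 + s2 + s3;
}
```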

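D: Function Design

It pays to keep functions small and simple. This lets the compiler apply better optimizations, such as register allocation, more effectively.

1) Function Call Overhead

Function call overhead is relatively small: compared with the execution time of the called function, it usually accounts for only a small fraction. However, there is a limit on how many arguments can be passed in registers. Arguments passed in registers generally have to be integer-like types (char, shorts, ints and floats all take one word) or structures of up to four words. If the limit is four arguments (as in the ARM procedure call standard), the fifth and subsequent arguments are passed on the stack, so calling such a function requires reading argument values from the stack, which costs time and increases the call overhead.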

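2) Minimizing Argument-Passing Cost

To minimize argument-passing cost, keep functions small and give them four or fewer arguments; this ensures the function does not use the stack to pass arguments.

@1 If a function needs more than four arguments, it is best if the function does a lot of work, so that the cost of passing arguments on the stack is worthwhile;

@2 Pass a pointer to a structure instead of passing the structure itself;

@3 Group related arguments into a structure and pass a pointer to it. This reduces the number of arguments and makes the function more robust.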
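A sketch of point @3, using a hypothetical mixing routine. With six scalar arguments, a convention that passes only four arguments in registers would put the last two on the stack; grouping them into a struct and passing a pointer reduces the call to a single register argument.

```c
/* Six scalar arguments: under a four-register convention, gain and shift
   would be passed on the stack. */
int mix6(int a, int b, int c, int d, int gain, int shift)
{
    return ((a + b + c + d) * gain) >> shift;
}

/* Related parameters grouped into one struct, passed by pointer. */
typedef struct {
    int a, b, c, d;
    int gain;
    int shift;
} MixParams;

int mix_params(const MixParams *p)
{
    return ((p->a + p->b + p->c + p->d) * p->gain) >> p->shift;
}
```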

3) Inline Functions

Declaring a function with the keyword __inline causes each call to it to be replaced by the function body instead of a normal call. This results in faster code, but it increases code size, particularly if the inline function is large and used often.

There are several advantages to using inline functions: there is no function-call overhead, and argument evaluation costs less. The biggest disadvantage is that code size grows if the function is used in many places; how much depends on the size of the function and the number of call sites. It is wise to inline only a few critical functions. Note that, when done wisely, inlining may even decrease code size: a call usually takes a few instructions, and the optimized inlined code may translate to even fewer instructions.
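A sketch of a good inlining candidate: a tiny saturation helper called once per element in a hot loop. It uses standard C99 `static inline` instead of the compiler-specific `__inline` keyword, on the assumption that the target compiler supports C99; in standard C, `inline` is only a hint and the compiler decides whether to substitute the body.

```c
/* Tiny, frequently called helper: clamp an int to the 16-bit range. */
static inline int saturate16(int x)
{
    if (x > 32767)  return 32767;
    if (x < -32768) return -32768;
    return x;
}

/* Hot loop: with inlining, no call/return is executed per element. */
void add_saturate16(short *dst, const short *a, const short *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (short)saturate16(a[i] + b[i]);   /* a[i] + b[i] is computed as int */
}
```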

 

E: Summary of Key Points

i) Avoid using ++ and -- etc. within loop expressions, e.g. while (n--) {}, as this can sometimes be harder to optimize.

ii) Minimize the use of global variables. 

iii) Declare anything within a file (external to functions) as static, unless it is intended to be global. 

iv) Use word-size variables where you can, instead of char, short, double, bit fields, etc., as the machine works with word-size values more efficiently.

v) Don't use recursion. Recursion can be elegant and neat, but it creates many more function calls, which can be a large overhead. (Really?)

vi) Avoid the sqrt() square root function in loops: calculating square roots is very CPU intensive.

vii) Single-dimension arrays can be faster than multi-dimension arrays.

viii) Compilers can often optimize a whole file, so avoid splitting closely related functions into separate files: the compiler will do better if it can see them together.

ix) Single precision math may be faster than double precision; there is often a compiler switch for this.

x) Floating-point multiplication is often faster than division: use val * 0.5 instead of val / 2.0.

xi) Addition may be quicker than multiplication: use val + val + val instead of val * 3. Similarly, puts() is quicker than printf(), although less flexible.

xii) Use #define macros instead of commonly used tiny functions.

Sometimes the bulk of CPU usage can be traced to a small external function being called thousands of times in a tight loop. Replacing it with a macro that does the same job removes the overhead of all those function calls and allows the compiler to optimize more aggressively.

xiii) Turn compiler optimization on! The compiler will be able to optimize at a much lower level than can be done in the source code, and perform optimizations specific to the target processor.
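A sketch of point xii, using a hypothetical SQUARE macro in a tight loop. The parameter and the whole expansion are parenthesized so operator precedence cannot change the result. The caveat, noted in the comment, is that a macro argument may be evaluated more than once, which a `static inline` function avoids.

```c
/* Function-like macro replacing a tiny function. SQUARE(x + 1) expands to
   ((x + 1) * (x + 1)) thanks to the parentheses. Caveat: the argument is
   evaluated twice, so SQUARE(a[i++]) would be undefined behaviour. */
#define SQUARE(v) ((v) * (v))

int sum_of_squares(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += SQUARE(a[i]);     /* no call overhead: expanded in place */
    return s;
}
```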

 

II: Compiler Optimization

During compiler optimization, the compiler generates machine instructions from the C code and then makes multiple passes over that code looking for improvements. It updates the machine code with each improvement and keeps analyzing until no further enhancements can be made. The compiler tools can perform many optimizations that improve execution speed and reduce the size of C programs, such as simplifying loops, software pipelining, rearranging statements and expressions, and allocating variables to registers.
