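DSP-C Code Optimization Summary 1
(2014-04-23 09:45:03) Category: DSP
Code optimization summary
Generally, the most time-consuming part of a program sits inside a loop body, and that loop runs for a very large number of iterations.
Below are some specific optimization measures: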
1.
2.
3.
4.
5.
6.
7.
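The items above are only reference points for optimizing a program. Broadly, program optimization divides into algorithmic optimization and program-structure optimization, and large gains have to start with the algorithm. Structural optimization is still essential, though, because it lets us fully exploit the CPU's performance.
Optimizing C Code on a DSP
I: Code Optimization
A: Data Type Optimization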
1)
Use unsigned int instead of int if the value is known never to be negative. Integer arithmetic is much faster than floating-point arithmetic, and some processors handle unsigned integer arithmetic considerably faster than signed arithmetic, so the best type for an integer variable in a tight loop is unsigned int.
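A minimal sketch of the advice above, assuming a simple array-summing loop; the function name `sum_samples` is hypothetical. The loop counter and the length are declared `unsigned int` because they can never be negative. Whether this is measurably faster depends on the target processor and compiler.

```c
/* Hypothetical example: sum n samples with an unsigned loop counter.
 * The counter and the length can never be negative, so both are
 * declared unsigned int, as recommended above. */
int sum_samples(const int *samples, unsigned int n)
{
    int total = 0;
    for (unsigned int i = 0; i < n; i++) {
        total += samples[i];
    }
    return total;
}
```

The accumulator `total` stays signed because the samples themselves may be negative; only quantities that are genuinely non-negative are made unsigned.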
2)
On typical processors, a 32-bit division takes 20-140 cycles to execute, depending on the numerator (x) and denominator (y).
Prefer unsigned division by making sure at least one of the operands is unsigned, as this is faster than signed division.
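One way to see why unsigned division can be cheaper, sketched with hypothetical function names: for an unsigned operand, dividing by a power of two such as 8 is exactly a right shift, which compilers typically emit as a single instruction. For a signed operand, C division truncates toward zero (C99), so the compiler has to add a correction step for negative values before shifting. The exact code generated depends on the compiler and target.

```c
/* Unsigned: x / 8 equals x >> 3 for every value of x,
 * so the compiler can emit a single shift. */
unsigned int udiv8(unsigned int x)
{
    return x / 8u;
}

/* Signed: C truncates toward zero, so -9 / 8 == -1, whereas an
 * arithmetic right shift rounds toward minus infinity and gives -2.
 * The compiler therefore needs extra instructions to fix up
 * negative values before it can use a shift. */
int sdiv8(int x)
{
    return x / 8;
}
```

For divisors that are not constants, the same principle applies at the library level: many runtimes provide separate unsigned and signed division routines, and the unsigned one avoids sign handling.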
3) Global variables
Global variables are never allocated to registers, so the compiler cannot cache a global's value in a register, which results in
extra loads and stores whenever globals are used. Avoid using global variables inside critical loops.
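The usual workaround, sketched here under assumptions: copy the global into a local variable before a hot loop so that it can live in a register. The global `g_gain` and the function `scale_buffer` are hypothetical. The transformation is only valid if nothing in the loop modifies the global; the compiler cannot prove that on its own, because a store to `buf[i]` might alias `g_gain`, which is exactly why it reloads the global on every iteration.

```c
/* Hypothetical global gain factor, e.g. configured elsewhere at start-up. */
int g_gain = 3;

/* Scales buf in place. The global is copied into a local variable once,
 * so the compiler can keep it in a register for the whole loop instead
 * of reloading it from memory after every store to buf[i]. */
void scale_buffer(int *buf, unsigned int n)
{
    int gain = g_gain;          /* single load from memory */
    for (unsigned int i = 0; i < n; i++) {
        buf[i] *= gain;         /* uses the register copy */
    }
}
```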
4)
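If a function takes a pointer parameter, and the value it points to is used many times during the call without ever being modified, copy that pointed-to value into a local variable inside the function. This saves the time spent re-reading the value through the pointer on every use.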
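A minimal sketch of this technique; `add_offset` and its parameters are hypothetical. Because `buf` and `offset` are both `int` pointers, the compiler must otherwise assume that writing `buf[i]` could change `*offset`, and reload it on each iteration.

```c
/* Hypothetical example: add *offset to every element of buf.
 * *offset is read on every iteration but never modified, so it is
 * copied into a local variable once, before the loop. */
void add_offset(int *buf, unsigned int n, const int *offset)
{
    int off = *offset;          /* read through the pointer only once */
    for (unsigned int i = 0; i < n; i++) {
        buf[i] += off;
    }
}
```

Note that if a caller did pass a pointer into `buf` itself as `offset`, this version would behave differently from one that dereferences `offset` inside the loop; the local copy is only correct when the pointed-to value truly does not change during the call.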
5)
B: Pointer Optimization
1)
For example, a common code sequence is:
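typedef struct { int x, y, z; } Point3;
typedef struct { Point3 *pos, *direction; } Object;
void InitPos1(Object *p) {
    /* p->pos is reloaded from memory before each store */
    p->pos->x = 0;
    p->pos->y = 0;
    p->pos->z = 0;
}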
This code must reload p->pos for each assignment, because the compiler does not know that p->pos->x is not an alias for p->pos. A better approach caches p->pos in a local variable:
void InitPos2(Object *p) {
    Point3 *pos = p->pos;   /* load p->pos once */
    pos->x = 0;
    pos->y = 0;
    pos->z = 0;
}
C: Loops
1)
2)
3)
4)
D: Function Design
Keeping functions small and simple pays off: it allows the compiler to apply better optimizations, such as register allocation, more effectively.
1)
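The overhead of a function call is fairly small; relative to the execution time of the called function, it accounts for only a small fraction. However, there is an upper limit on how many arguments can be passed in registers. Arguments passed in registers generally have to be integer-compatible types (char, shorts, ints and floats all take one word) or structures occupying no more than four words. If the limit is four arguments, the fifth and any later arguments are placed on the stack, so calling such a function means reading those arguments from the stack, which costs time and increases the call overhead.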
2)
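Minimize the cost of passing arguments: keep functions small and give them four or fewer arguments, so that no arguments have to be passed on the stack.
@1 If a function needs more than four arguments, it should do a substantial amount of work, so that the cost of passing arguments on the stack is worthwhile;
@2 Pass pointers to structures as arguments instead of passing the structures themselves;
@3 Group related arguments into a structure and pass a pointer to it; this reduces the number of arguments and makes the function more robust.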
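A sketch of @2 and @3, assuming a calling convention that passes at most four arguments in registers. The type `FilterParams` and the function `apply_filter` are hypothetical. Written with separate arguments, `apply_filter(sample, gain, offset, min, max)` would take five, and the fifth would go on the stack; grouping the settings into a structure reduces the call to two register arguments.

```c
/* Hypothetical filter settings that would otherwise be passed as
 * four separate arguments. */
typedef struct {
    int gain;
    int offset;
    int min;
    int max;
} FilterParams;

/* Applies gain and offset to one sample, then clamps the result to
 * [min, max]. Only two arguments are passed: the sample and a pointer
 * to the settings, so both fit in registers. */
int apply_filter(int sample, const FilterParams *p)
{
    int y = sample * p->gain + p->offset;
    if (y < p->min) {
        y = p->min;
    }
    if (y > p->max) {
        y = p->max;
    }
    return y;
}
```

Passing `const FilterParams *` rather than the structure by value (@2) also avoids copying the whole structure at every call.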
3)
Declaring a function with the __inline keyword causes each call to it to be replaced by the function's body instead of a normal call. This produces faster code but increases code size, particularly if the inline function is large and called often.
E: Summary of Key Points
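`__inline` is a compiler-specific keyword; the sketch below uses the standard C99 `static inline` instead, which asks (but does not force) the compiler to substitute the body at each call site. The names `clamp16` and `saturate_buffer` are hypothetical, and the sketch assumes a 32-bit int.

```c
/* Saturate a value to the signed 16-bit range.
 * Declared static inline so the compiler may paste the body into
 * each caller instead of emitting a call and return. */
static inline int clamp16(int x)
{
    if (x > 32767) {
        return 32767;
    }
    if (x < -32768) {
        return -32768;
    }
    return x;
}

/* Calls the small helper once per element: a typical case where
 * inlining removes per-call overhead inside a hot loop. */
void saturate_buffer(int *buf, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        buf[i] = clamp16(buf[i]);
    }
}
```

A helper this small is a good inlining candidate; a large function called from many places is not, for the code-size reason given above.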
i) Avoid using ++, -- and similar operators inside loop expressions, e.g. while (n--) {}, as this can sometimes be harder to optimize.
ii) Minimize the use of global variables.
iii) Declare anything at file scope (outside functions) as static, unless it is intended to be global.
iv) Use word-sized variables where possible instead of char, short, double, bit fields, etc., since the machine handles them more efficiently.
v) Avoid recursion. Recursion can be elegant and neat, but it creates many more function calls, which can add a large overhead. (Really?)
vi) Avoid calling the sqrt() square root function in loops; calculating square roots is very CPU-intensive.
vii) Single-dimension arrays are faster than multi-dimension arrays.
viii) Compilers can often optimize a whole file at once, so avoid splitting closely related functions into separate files; the compiler does better when it can see them together.
ix) Single-precision math may be faster than double-precision math; there is often a compiler switch for this.
x) Floating-point multiplication is often faster than division: use val * 0.5 instead of val / 2.0.
xi) Addition is quicker than multiplication: use val + val + val instead of val * 3. Likewise, puts() is quicker than printf(), although less flexible.
xii) Use #define macros instead of commonly used tiny functions. Sometimes the bulk of CPU usage can be traced to a small external function called thousands of times in a tight loop; replacing it with a macro that does the same job removes the overhead of all those calls and lets the compiler optimize more aggressively.
xiii) Turn compiler optimization on! The compiler will be able to optimize at a much lower level than can be done in the source code, and perform optimizations specific to the target processor.
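Several of the tips above can be sketched together in C. All macro and function names here (`SQUARE`, `count_within_radius`, `sum_matrix`, `matrix_at`, `halve_all`) are hypothetical, and whether each change actually speeds things up depends on the compiler and target, so measure before and after.

```c
#include <stddef.h>

/* xii) Tiny helper written as a macro instead of a function.
 * Caution: the argument is evaluated twice, so never pass an
 * expression with side effects such as SQUARE(i++). */
#define SQUARE(v) ((v) * (v))

/* vi) Count points within radius r of the origin without sqrt():
 * compare squared distances instead of distances. */
size_t count_within_radius(const int *xs, const int *ys, size_t n, int r)
{
    size_t count = 0;
    int r2 = SQUARE(r);                 /* computed once, outside the loop */
    for (size_t i = 0; i < n; i++) {
        if (SQUARE(xs[i]) + SQUARE(ys[i]) <= r2) {
            count++;
        }
    }
    return count;
}

/* vii) A rows x cols matrix stored in a one-dimensional array;
 * element (row, col) lives at index row * cols + col. */
int matrix_at(const int *m, size_t cols, size_t row, size_t col)
{
    return m[row * cols + col];
}

/* vii) Summing the whole matrix is then a single linear pass. */
long sum_matrix(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t i = 0; i < rows * cols; i++) {
        total += m[i];
    }
    return total;
}

/* x) and ix) Halve every value by multiplying by 0.5f rather than
 * dividing by 2.0f, in single precision. For 0.5 the result is
 * identical; for other divisors, multiplying by a reciprocal can
 * differ from true division in the last bit. */
void halve_all(float *vals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        vals[i] *= 0.5f;
    }
}
```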
II: Compiler Optimization
For compiler optimization, the compiler generates machine instructions from the C code and then makes multiple passes over the code looking for improvements. It updates the machine code with each improvement and keeps analyzing the code until no further enhancements can be made. The compiler tools can perform many optimizations that improve execution speed and reduce the size of C programs, such as simplifying loops, software pipelining, rearranging statements and expressions, and allocating variables to registers.