DSC Reference Code Analysis_wuqiseu

http://blog.sina.com.cn/u/2160998997

DSC Reference Code Analysis

(2013-12-09 13:53:24)

Category: computerVision

Code in the encoder:

Command-line options:

static cmdarg_t cmd_args[] = {

// The array arguments have to be first:

{ IVARG, NULL, "RC_OFFSET", "-rcofs", 0, 15}, // RC offset values

{ IVARG, NULL, "RC_MINQP", "-rcmqp", 0, 15}, // Min QP values

{ IVARG, NULL, "RC_MAXQP", "-rcmxqp", 0, 15}, // Max QP values

{ IVARG, NULL, "RC_BUF_THRESH", "-rcbt", 0, 14}, // RC buffer threshold

{ PARG, &rcModelSize, "RC_MODEL_SIZE", "-rms", 0, 0}, // RC model size

{ FARG, &bitsPerPixel, "BITS_PER_PIXEL", "-bpp", 0, 0}, // bits per pixel

{ PARG, &bitsPerComponent, "BITS_PER_COMPONENT", "-bpc", 0, 0}, // bits per component

{ PARG, &enable422, "ENABLE_422", "-e422", 0, 0}, // enable_422

{ PARG, &bpEnable, "BLOCK_PRED_ENABLE", "-bpe", 0, 0}, // Block prediction range

{ PARG, &lineBufferBpc, "LINE_BUFFER_BPC", "-lbpc", 0, 0}, // Line buffer storage bits/component

{ PARG, &sliceWidth, "SLICE_WIDTH", "-nsh", 0, 0}, // Slice width (0=pic width)

{ PARG, &sliceHeight, "SLICE_HEIGHT", "-nsv", 0, 0}, // slice height (0=pic height)

{ PARG, &firstLineBpgOfs, "FIRST_LINE_BPG_OFFSET", "-flbo", 0, 0}, // Additional bpp budget for 1st line

{ PARG, &initialFullnessOfs, "INITIAL_FULLNESS_OFFSET", "-ifo", 0, 0}, // Initial fullness offset

{ PARG, &initialDelay, "INITIAL_DELAY", "-id", 0, 0}, // Initial delay (in pixel time units) from encode start to xmit start

{ PARG, &useYuvInput, "USE_YUV_INPUT", "-uyi", 0, 0}, // Use YUV input (convert if necessary)

{ PARG, &rbSwap, "SWAP_R_AND_B", "-rbswp", 0, 0}, // Swap red & blue components

{ PARG, &rbSwapOut, "SWAP_R_AND_B_OUT", "-rbswpo", 0, 0}, // Swap red & blue components

{ PARG, &function, "FUNCTION", "-do", 0, 0}, // 0=encode/decode, 1=encode, 2=decode

{ IARG, &dpxBugsOverride, "DPX_BUGS_OVERRIDE", "-dpxbugs", 0, 0}, // Sets the DPX bugs mode (else autodetect)

{ PARG, &dpxPadLineEnds, "DPX_PAD_LINE_ENDS", "-dpxpad", 0, 0}, // Pad line ends for DPX output

{ PARG, &dpxWriteBSwap, "DPX_WRITE_BSWAP", "-dpxwbs", 0, 0}, // Pad line ends for DPX output

{ PARG, &enableVbr, "VBR_ENABLE", "-vbr", 0, 0}, // 1=disable stuffing bits (on/off VBR)

{ PARG, &muxWordSize, "MUX_WORD_SIZE", "-mws", 0, 0}, // mux word size if SSM enabled

{ PARG, &tgtOffsetHi, "RC_TGT_OFFSET_HI", "-thi", 0, 0}, // Target hi

{ PARG, &tgtOffsetLo, "RC_TGT_OFFSET_LO", "-tlo", 0, 0}, // Target lo

{ PARG, &rcEdgeFactor, "RC_EDGE_FACTOR", "-ef", 0, 0}, // Edge factor

{ PARG, &quantIncrLimit0, "RC_QUANT_INCR_LIMIT0", "-qli0", 0, 0}, // Quant limit incr 0

{ PARG, &quantIncrLimit1, "RC_QUANT_INCR_LIMIT1", "-qli1", 0, 0}, // Quant limit incr 1

{ PARG, &flatnessMinQp, "FLATNESS_MIN_QP", "-fmin", 0, 0}, // Flatness min QP

{ PARG, &flatnessMaxQp, "FLATNESS_MAX_QP", "-fmax", 0, 0}, // Flatness max QP

{ PARG, &flatnessDetThresh, "FLATNESS_DET_THRESH", "-fdt", 0, 0}, // Flatness detect threshold

{ PARG, &muxingMode, "MUXING_MODE", "-mm", 0, 0}, // Multiplexing mode

{NARG, &help, "", "-help" , 0, 0}, // video format

{SARG, filepath, "INCLUDE", "-F" , 0, 0}, // Cconfig file

{SARG, option, "", "-O" , 0, 0}, // key/value pair

{SARG, fn_i, "SRC_LIST", "" , 0, 0}, // Input file name

{SARG, fn_o, "OUT_DIR", "" , 0, 0}, // Output file name

{SARG, fn_log, "LOG_FILENAME", "" , 0, 0}, // Log file name

{PARG, NULL, "", "", 0, 0 }

};

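Depending on the input file type, the matching read function is called to load the pixels of one frame:

pic_t *ip;

dpx_read(infname, &ip, dpxBugsOverride)

ppm_read(infname, &ip)

The picture data that was read is stored in the following data structure: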

typedef enum format_e {FRAME, TOP, BOTTOM, UNDEFINED_FORMAT} format_t;

typedef enum color_e {RGB, YUV_SD, YUV_HD, UNDEFINED_COLOR} color_t;

typedef enum chroma_e {YUV_420, YUV_422, YUV_444, YUV_4444, UNDEFINED_CHROMA} chroma_t;

typedef struct yuv_s {

int **y;

int **u;

int **v;

int **a;

int **m;

} yuv_t;

typedef struct rgb_s {

int **r;

int **g;

int **b;

int **a;

int **m;

} rgb_t;

typedef struct pic_s {

format_t format;

color_t color;

chroma_t chroma;

int alpha; // 0: not used, 1: used (same resolution as Y)

int w;

int h;

int bits;

int ar1; // aspect ratio (h)

int ar2; // aspect ratio (w)

int frm_no; // frame number in seq

int seq_len; // num images in sequence

float framerate;

int interlaced; // prog(0) or int(1) content

union data_u {

rgb_t rgb;

yuv_t yuv;

} data;

} pic_t;

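Several command-line options affect how the input image is read:

{ PARG, &rbSwap, "SWAP_R_AND_B", "-rbswp", 0, 0}, // Swap red & blue components

This value indicates that the R and B components of the RGB input must be swapped.

{ PARG, &bitsPerComponent, "BITS_PER_COMPONENT", "-bpc", 0, 0}, // bits per component

This is the bpc the DSC encoder is configured for. If the bpc of the input image differs, the samples have to be converted. For example, with bpc=9 and 8-bit RGB input, each RGB sample is shifted left by 1 bit to become 9 bits. Conversely, if the required bpc is smaller than that of the input image, the samples are reduced accordingly.

void convertbits(pic_t *p, int newbits)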
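As a rough illustration of the bit-depth adjustment described above, the sketch below rescales a single sample by shifting. The helper name `convert_sample_bits` is hypothetical. The real `convertbits()` works on a whole `pic_t`, and the post does not show whether it rounds when reducing depth, so plain truncation is assumed here.

```c
#include <assert.h>

/* Hypothetical helper: rescale one sample from oldbits to newbits.
 * Widening shifts left (8-bit 200 becomes 9-bit 400); narrowing shifts
 * right and simply truncates, which is an assumption of this sketch. */
static int convert_sample_bits(int sample, int oldbits, int newbits)
{
    assert(oldbits > 0 && newbits > 0);
    if (newbits >= oldbits)
        return sample << (newbits - oldbits);
    return sample >> (oldbits - newbits);
}
```

With bpc=9 and an 8-bit input, `convert_sample_bits(v, 8, 9)` reproduces the 1-bit left shift mentioned in the text.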

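{ PARG, &enable422, "ENABLE_422", "-e422", 0, 0}, // enable_422

Whether YUV422 is enabled. If it is not, YUV422 input has to be converted to YUV444 before processing: yuv_422_444(ip, ip2); Conversely, if YUV422 is required as input, YUV444 has to be converted to YUV422 first: yuv_444_422(ip, ip2);

{ PARG, &useYuvInput, "USE_YUV_INPUT", "-uyi", 0, 0}, // Use YUV input (convert if necessary)

Whether only YUV may be used as input. If so, RGB has to be converted to YUV before processing: rgb2yuv(ip, ip2); Conversely, if only RGB may be used as input and the input is YUV, it has to be converted to RGB first: yuv2rgb(ip, ip2);

The following checks are performed:

{ PARG, &sliceWidth, "SLICE_WIDTH", "-nsh", 0, 0}, // Slice width (0=pic width)

For YUV422 input, both the picture width and the slice width must be multiples of 2, because chroma has only half as many samples and the width must divide evenly. If this is not the case:

If the picture width does not satisfy this, the picture is cropped:

ip->w--;

If the slice width does not satisfy this, it is adjusted:

sliceWidth++;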
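The two adjustments above can be sketched as one small function. The name `fix_422_widths` is hypothetical; the sketch only mirrors the `ip->w--` and `sliceWidth++` statements quoted in the text and assumes they are applied when the respective width is odd.

```c
#include <assert.h>

/* Hypothetical helper mirroring the two adjustments for 4:2:2 input:
 * an odd picture width is cropped by one pixel (ip->w--), and an odd
 * slice width is rounded up by one (sliceWidth++). */
static void fix_422_widths(int *pic_width, int *slice_width)
{
    assert(pic_width && slice_width);
    if (*pic_width % 2)
        (*pic_width)--;     /* crop the last column */
    if (*slice_width % 2)
        (*slice_width)++;   /* round the slice width up to even */
}
```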

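Summary: at the very start of the DSC encoder, the input image (RGB444/YUV422/YUV444) is converted to the required input format according to the configuration or the command-line arguments.

For example, the requirement may be bpc = 8, yuv422.

If the input does not match, it is converted to the required format.

2. Input cfg file

The cfg input on the encoder side: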

/// Configuration for a single RC model range

typedef struct dsc_range_cfg_s {

int range_min_qp; ///< Min QP allowed for this range

int range_max_qp; ///< Max QP allowed for this range

int range_bpg_offset; ///< Bits/group offset to apply to target for this group

} dsc_range_cfg_t;

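Some of the values in this configuration struct are determined by external command-line input. In any case, the information in this struct is the input the engine needs. In hardware, most of it is turned into register settings.

Comparing with the PPS definition in DSC, it is obvious that most of these cfg values end up in the PPS. In other words, the cfg is the source from which the encoder generates the PPS.

The encoder also uses this struct to generate the PPS:

void write_pps(unsigned char *buf, dsc_cfg_t *dsc_cfg);

Below is an example cfg for 8bpp/8bpc: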

3. RC buffer allocation and handling

The code allocates the actual RC buffer according to the rc_model_size setting.

The detailed code is as follows:

Code explanation:

// Compute rate buffer size for auto mode

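// Compute the number of groups per slice line from the slice width; the last group may be partial

groupsPerLine = (dsc_codec.slice_width + 2) / 3;

// dsc_codec.rc_model_size is the RC model size set in the cfg, typically 8 kbits

// initialDelay also comes from the cfg:

// Initial delay (in pixel time units) from encode start to xmit start,

// i.e. the init_xmit_delay time, typically 512 pixel times => 512/3 = about 170 group times.

// initialFullnessOfs is the required initial fullness of the RC buffer.

// dsc_codec.rc_model_size - initialFullnessOfs is the space still free in the RC buffer.

// (int)ceil(initialDelay * bitsPerPixel * 3) is the number of bits, computed from bpp, that the RC buffer holds before transmission starts.

// groupsPerLine * firstLineBpgOfs is the number of extra bits allocated to the first line of the slice; added to the term above, it gives the number of valid bits, computed from bpp, that the encoder RC buffer may hold during the initial delay.

// rbsMin = rc_model_size + (initial RC buffer fullness computed from delay and bpp - preset fullness). rc_model_size is a theoretical RC buffer size that corresponds to the initial fullness in the cfg, but in practice the fullness reached within init_xmit_delay may be larger or smaller than this preset fullness, so rc_model_size has to be adjusted to obtain the actual physical RC buffer size.

// rbsMin = minimumRateBufferSize, the minimum RC buffer size.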

rbsMin = (int)(dsc_codec.rc_model_size - initialFullnessOfs

+ ((int)ceil(initialDelay * bitsPerPixel * 3))

+ groupsPerLine * firstLineBpgOfs);

// The HRD delay equals init_xmit_delay + init_dec_delay; it is also the time needed to drain the rate buffer completely when it is full.

hrdDelay = (int)(ceil(rbsMin / (3.0 * bitsPerPixel)));

// rcb_bits is the rate buffer size in bits: rbsMin rounded up to an integer multiple of ceil(bpp*3), since hrdDelay itself is obtained with ceil().

dsc_codec.rcb_bits = hrdDelay * ((int)(ceil(bitsPerPixel * 3)));

// decode delay = hrdDelay - init_xmit_delay

dsc_codec.initial_dec_delay = hrdDelay - dsc_codec.initial_enc_delay;

Summary: the actual rate buffer size is a small adjustment of rc_model_size based on the delay and bpp. The computed minimum RC buffer requirement may be larger or smaller than rc_model_size.
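To make the arithmetic above concrete, here is a minimal self-contained sketch of the same computation. The struct `rc_buf_sizes_t`, the function `compute_rc_buffer`, and the helper `ceil_pos` are names invented for this example, and the inputs in the usage note (initial fullness offset 6144, first-line offset 12, slice width 1920) are illustrative assumptions rather than values taken from the post.

```c
#include <assert.h>

typedef struct {
    int rbs_min;            /* minimum rate buffer size in bits */
    int hrd_delay;          /* total delay in group times */
    int rcb_bits;           /* rate buffer size actually allocated */
    int initial_dec_delay;  /* hrd_delay minus the encoder (xmit) delay */
} rc_buf_sizes_t;

/* ceil() for non-negative values, written out to avoid needing libm */
static int ceil_pos(double x)
{
    int i = (int)x;
    return (x > (double)i) ? i + 1 : i;
}

/* Hypothetical wrapper around the formulas quoted above.
 * initial_delay is taken in group times, as the *3 factor implies. */
static rc_buf_sizes_t compute_rc_buffer(int rc_model_size, int initial_fullness_ofs,
                                        int initial_delay, double bits_per_pixel,
                                        int slice_width, int first_line_bpg_ofs)
{
    rc_buf_sizes_t r;
    int groups_per_line = (slice_width + 2) / 3;

    assert(bits_per_pixel > 0.0 && slice_width > 0);
    r.rbs_min = rc_model_size - initial_fullness_ofs
              + ceil_pos(initial_delay * bits_per_pixel * 3)
              + groups_per_line * first_line_bpg_ofs;
    r.hrd_delay = ceil_pos(r.rbs_min / (3.0 * bits_per_pixel));
    r.rcb_bits = r.hrd_delay * ceil_pos(bits_per_pixel * 3);
    r.initial_dec_delay = r.hrd_delay - initial_delay;
    return r;
}
```

With rc_model_size = 8192, an initial delay of 170 group times and 8 bpp, the assumed inputs give rbsMin = 2048 + 4080 + 7680 = 13808 bits, which is rounded up to 576 group times, i.e. 13824 bits. In this example the physical buffer is larger than rc_model_size.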

3. Computing some RC parameters in the PPS

init_scale_value: the initial value of rcXformScale at the start of a slice

initial value for rcXformScale used at the beginning of a slice.

scale_decrement_interval is computed as follows:

dsc_codec.rc_model_size = rcModelSize;

dsc_codec.initial_fullness_ofs = initialFullnessOfs;

// This is the initial value recommended by the spec: an initial slope derived from the proportion of free buffer space

dsc_codec.initial_scale_value = 8 * dsc_codec.rc_model_size / (dsc_codec.rc_model_size - dsc_codec.initial_fullness_ofs);

// number of groups per slice line

groupsPerLine = (dsc_codec.slice_width + 2) / 3;

//At the beginning of a slice, the rcXformScale factor decreases by 1 every

//scale_decrement_interval groups until it reaches unity scaling

dsc_codec.scale_decrement_interval = groupsPerLine / (dsc_codec.initial_scale_value - 8);
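A small worked example of the two formulas above, written as a standalone function. The names `rc_scale_t` and `compute_initial_scale` are hypothetical, and the guard for a scale value of exactly 8 (which would divide by zero) is an addition of this sketch, not something shown in the post.

```c
#include <assert.h>

typedef struct {
    int initial_scale_value;
    int scale_decrement_interval;
} rc_scale_t;

/* Hypothetical helper: 8 corresponds to unity scaling (3 fractional bits). */
static rc_scale_t compute_initial_scale(int rc_model_size, int initial_fullness_ofs,
                                        int slice_width)
{
    rc_scale_t s;
    int groups_per_line = (slice_width + 2) / 3;

    assert(rc_model_size > initial_fullness_ofs);
    s.initial_scale_value = 8 * rc_model_size / (rc_model_size - initial_fullness_ofs);
    /* Assumption of this sketch: return 0 when the scale already is unity. */
    s.scale_decrement_interval = (s.initial_scale_value > 8)
        ? groups_per_line / (s.initial_scale_value - 8)
        : 0;
    return s;
}
```

For rc_model_size = 8192, an assumed initial_fullness_ofs of 6144 and a 1920-pixel slice, this gives initial_scale_value = 32 and scale_decrement_interval = 640 / 24 = 26 groups.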

numExtraMuxBits: covers all bits of the 3 components.

The number of bits left over at the end of a slice because of SSM (SSM works in units of mux words). In other words, whatever does not fill a whole mux word is left over (in the balance FIFO).

final_offset:

target_bpgx16 = dsc_codec.bits_per_pixel * 3; // target bits per group (x16 fixed point)

// initial_enc_delay has been converted to group times

// final_value is the free space left over after the slice has been encoded

// spec:

// final_offset = rc_model_size - initial_xmit_delay * bits_per_pixel + numExtraMuxBits

final_value = dsc_codec.rc_model_size - ((dsc_codec.initial_enc_delay * target_bpgx16 + 8)>>4)

+ num_extra_mux_bits; // held in the balance FIFO, not in the RC buffer

dsc_codec.final_offset = final_value;
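The final_offset expression can be checked with a few numbers. The sketch below assumes that `bits_per_pixel` is stored in 1/16-bit units (so 8 bpp is 128), which is inferred from the `x16` suffix and the `>>4` in the quoted code. The function name and the value 246 used for num_extra_mux_bits in the usage note are illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical helper reproducing the final_value expression above.
 * bits_per_pixel_x16: bpp in 1/16-bit units (assumed), e.g. 128 for 8 bpp.
 * initial_enc_delay:  encoder delay in group times. */
static int compute_final_offset(int rc_model_size, int initial_enc_delay,
                                int bits_per_pixel_x16, int num_extra_mux_bits)
{
    int target_bpgx16 = bits_per_pixel_x16 * 3;  /* bits per group, x16 */

    assert(bits_per_pixel_x16 > 0 && initial_enc_delay >= 0);
    return rc_model_size
         - ((initial_enc_delay * target_bpgx16 + 8) >> 4)  /* +8 rounds to nearest */
         + num_extra_mux_bits;
}
```

With rc_model_size = 8192, a delay of 170 groups and 8 bpp, the middle term is (170 * 384 + 8) >> 4 = 4080 bits, so an assumed num_extra_mux_bits of 246 gives a final_offset of 4358.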

scale_increment_interval:

This value specifies the number of group times between incrementing the rcXformScale factor at the end of a slice.

// the proportion of free buffer space in the last line of the slice

first_line_bpg_offset

nfl_bpg_offset

slice_bpg_offset:

#define OFFSET_FRACTIONAL_BITS 11

first_line_bpg_ofs is the number of extra bits/group allocated to the first line of the slice. The other lines therefore have to be allocated correspondingly fewer bits, spread evenly over every group of those lines.

slice_bpg_offset: This value specifies the number of bits (including fractional bits) that are de-allocated for each group in order to enforce the slice constraint

It is the free space in the RC buffer, subject to the fullness requirement, divided by the number of groups in the whole slice.

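4. Encoding flow

At the top level, the algorithm model encodes slice by slice. Slices are processed in raster-scan order.

Each call to DSC_Encode() encodes one slice. xstart/ystart give the coordinates of the top-left corner of the slice being processed.

4.1 Slice encoding

If the input is RGB, the current slice is first converted to YCoCg:

The loop inside a slice is as follows:

while ( !done ) {

// Loop over the components first; within each component, loop over the groups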

for ( CType = 0; CType

groups_in_slice_process();

// exit once the current slice has been fully processed

if ( vIndex >= dsc_cfg->slice_height )

done = 1;

}

Data structures: based on the component/group organization, the following structures are used

#define MAX_UNITS_PER_GROUP 3 // number of components

#define SAMPLES_PER_UNIT 3 // number of pixels in a group

int quantizedResidual[MAX_UNITS_PER_GROUP][SAMPLES_PER_UNIT]; ///< Quantized residuals for current group

For example: dsc_state->quantizedResidual[CType][sampModCnt] = QuantizeResidual(err_raw, qlevel);

Line-by-line analysis of the code:

1. RGB to YCoCg conversion of the slice

2. Initialization and data preparation

First, the slice line buffers are allocated (malloc) and padded:

#define PADDING_LEFT 5 // Pixels to pad line arrays to the left

#define PADDING_RIGHT 5 // Pixels to pad line arrays to the right

Three slice line buffers are allocated for each component (previous line, current line, and the original image line):

int *prevLine[NUM_COMPONENTS]; ///< Previous line reconstructed samples

int *currLine[NUM_COMPONENTS]; ///< Current line reconstructed samples

int *origLine[NUM_COMPONENTS]; ///< Current line original samples (for encoder)

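The initial value is 255 for luma (bpc=8); for YCoCg, Co and Cg have 9 bits, so they are initialized to 510.

Question: why does the slice line buffer need 5 pixels of padding? Note that the slice width already accounts for the partial group.

3. Reading the original line

The original slice line to be processed is read from the input picture and stored in origLine[component].

4. Updating the ICH buffer

In steady state, if the previous group was coded in ICH mode or in P-mode, the ICH buffer may need updating: if the previous group was ICH, the MRU entry is updated; if it was P-mode, the LRU entry is replaced.

The function here only handles the case where the previous group was ICH, i.e. the MRU update. The selected entry is moved to the MRU position (buffer index 0).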

// Insert as most recent

dsc_state->history.pixels[CType][0] = recon[CType];

dsc_state->history.valid[0] = 1;
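The move-to-MRU step can be sketched as follows. This is not the reference model's function: the type `ich_history_t`, the function `ich_update_mru`, and the history depth of 32 are assumptions made for illustration. Only the final two assignments correspond to the lines quoted above.

```c
#include <assert.h>

#define ICH_SIZE 32        /* assumed history depth */
#define NUM_COMPONENTS 3

typedef struct {
    int pixels[NUM_COMPONENTS][ICH_SIZE];
    int valid[ICH_SIZE];
} ich_history_t;

/* Move entry `hit` to index 0 (most recently used); the entries that
 * were at 0..hit-1 each shift one position towards the LRU end. */
static void ich_update_mru(ich_history_t *h, int hit)
{
    int recon[NUM_COMPONENTS];
    int c, j;

    assert(h && hit >= 0 && hit < ICH_SIZE);
    for (c = 0; c < NUM_COMPONENTS; c++)
        recon[c] = h->pixels[c][hit];          /* remember the selected pixel */
    for (j = hit; j > 0; j--) {
        for (c = 0; c < NUM_COMPONENTS; c++)
            h->pixels[c][j] = h->pixels[c][j - 1];
        h->valid[j] = h->valid[j - 1];
    }
    for (c = 0; c < NUM_COMPONENTS; c++)
        h->pixels[c][0] = recon[c];            /* insert as most recent */
    h->valid[0] = 1;
}
```

Entries beyond the hit position are left untouched, which is what distinguishes the MRU update from the LRU replacement used after a P-mode group.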

The code inside the component loop is as follows:

for ( CType = 0; CType

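5. P-mode prediction

For MMAP/BP, some computations are done one line ahead, including the BPVector computation and the MMAP/BP mode decision.

PRED_TYPE *prevLinePred; ///< BP/MMAP selection decision buffer (since model calculates BP offset one line ahead of time)

This data structure stores the precomputed mode decision results.

typedef enum { PT_MAP=0, PT_LEFT, PT_BLOCK } PRED_TYPE; // PT_LEFT is the JPEG-style MAP; DSC does not use it.

The prediction for the BP/MMAP modes in the first mode-decision step is obtained from the previous line's computation.

The second mode-decision step then checks the size to decide whether MPP is used:

The residual is computed from pred_x obtained above and orig, then quantized to give the quantized err.

MPP residual and its adjustment:

The err in MPP mode is computed. As mentioned before, if the quantized MPP residual needs more than max_bpc - qpLevel bits, it has to be adjusted so that its coded size does not exceed that limit. The MPP residual computed directly from pred and orig is therefore not necessarily final; it may still need adjusting.

Two functions derive the required size from the residual:

As described earlier, groups are coded with a VLC made of prefix + suffix, where the suffix codes the residual directly. The number of bits actually needed therefore follows from the range of the residual values.

// size when coded directly (signed)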
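The idea that the suffix size follows from the residual's range can be shown with a short sketch. The function names below are hypothetical, and two details are assumptions of this example rather than statements from the post: residuals are stored in two's complement, and a residual of 0 needs 0 bits.

```c
#include <assert.h>

/* Bits needed to hold `residual` in two's complement:
 * n bits cover the range [-2^(n-1), 2^(n-1) - 1]. */
static int residual_size(int residual)
{
    int size = 1;

    assert(residual > -(1 << 20) && residual < (1 << 20));
    if (residual == 0)
        return 0;                       /* assumption: zero costs no bits */
    while (residual < -(1 << (size - 1)) || residual > (1 << (size - 1)) - 1)
        size++;
    return size;
}

/* The suffix size of a unit is the largest size among its 3 samples. */
static int max_residual_size(const int residuals[3])
{
    int i, max_size = 0;

    for (i = 0; i < 3; i++) {
        int s = residual_size(residuals[i]);
        if (s > max_size)
            max_size = s;
    }
    return max_size;
}
```

For example, the residuals {0, -1, 3} need 0, 1 and 3 bits respectively, so all three samples of that unit would be coded with 3 bits each.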

The process above yields the quantized residuals for MMAP/BP/MPP:

dsc_state->quantizedResidual[CType][sampModCnt]

dsc_state->quantizedResidualMid[CType][sampModCnt]

These two arrays are defined as:

int quantizedResidual[MAX_UNITS_PER_GROUP][SAMPLES_PER_UNIT]; ///< Quantized residuals for current group

int quantizedResidualMid[MAX_UNITS_PER_GROUP][SAMPLES_PER_UNIT]; ///< Quantized residuals assuming midpoint prediction for current group

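After IQ-RECON the reconstructed pixels (MMAP/BP/MPP) are obtained. The BP/MMAP reconstructed value recon_x and the maximum MPP error maxErr/SAD are computed:

The error/SAD values are stored in the following data structures: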

int maxError[MAX_UNITS_PER_GROUP]; ///< Max error for each component using DPCM(BP/MMAP)

int maxMidError[MAX_UNITS_PER_GROUP]; ///< Max error for each component using MPP

int maxIchError[MAX_UNITS_PER_GROUP]; ///< Max error for each component using ICH

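The reconstructed data is stored in the corresponding buffers. As the buffer definitions show, 5 pixels of padding are provided on both the left and the right, so the first real pixel is stored at index 5: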

int *prevLine[NUM_COMPONENTS]; ///< Previous line reconstructed samples
int *currLine[NUM_COMPONENTS]; ///< Current line reconstructed samples
int *origLine[NUM_COMPONENTS]; ///< Current line original samples (for encoder)

// line buffers have padding to left and right
lbufWidth = dsc_cfg->slice_width + PADDING_LEFT + PADDING_RIGHT; // pixels to left and right

All three line buffers above are allocated with lbufWidth.

int hSkew = PADDING_LEFT; // there are hSkew fake pixels to the left of first real pixel

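The MMAP/BP reconstructed data is stored in the corresponding currLine buffer and also in the output buffer.

So recon_x is the reconstructed data for MMAP/BP mode, not for MPP/ICH.

Question: why is it stored right away? When is the mode decision made?

Summary: everything so far happens inside the for ( CType = 0; CType loop (the outer loop being while ( !done ), which is a per-pixel loop). In this loop the maxErr/SAD for MMAP/BP and MPP modes is computed for each component, and the MMAP/BP reconstructed value is computed and stored in the corresponding line buffer and output buffer.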

Flatness checking:

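Flatness works on supergroups: flatness checking can start once a complete supergroup has been collected.

So what counts as a supergroup?

The code shows that whenever a full group of pixels has been collected, or the current pixel is the last pixel of the slice line (a partial group), it checks whether a supergroup (4 groups) is complete. This means that a partial group is also counted as a group.

According to the spec, the first group of a slice is not part of a supergroup:

But according to how groupCount is handled in the code

The counters are used as follows:

groupCount: the number of groups processed so far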

initial: int group_count = 0;

Flatness checking();

VLCGroup( dsc_cfg, dsc_state, &cmpr_buf)

group_count++;

dsc_state->groupCount = group_count;

So group_count++ happens after flatness checking.

hIndex: the position of the pixel being processed.

It is initialized to 0 at the start of a slice

flatness checking

...

hIndex ++;

So, like groupCount, it is incremented only after flatness checking.

sampModCnt:

The count of the pixel being processed within the group; it wraps back to 0 after reaching 3.
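How the three counters interact over one slice line can be sketched with a simplified loop. This is an illustration only: `count_groups_in_line` is a hypothetical function, and the real model does far more work at the point marked by the comment.

```c
#include <assert.h>

#define PIXELS_PER_GROUP 3

/* Walk one slice line pixel by pixel and count the groups that are closed.
 * A group closes when 3 pixels have been collected or when the last pixel
 * of the line is reached, so a partial group counts as a group too. */
static int count_groups_in_line(int slice_width)
{
    int hIndex, sampModCnt = 0, group_count = 0;

    assert(slice_width > 0);
    for (hIndex = 0; hIndex < slice_width; hIndex++) {
        sampModCnt++;
        if (sampModCnt == PIXELS_PER_GROUP || hIndex == slice_width - 1) {
            /* flatness checking and VLC of the group would run here,
             * before the group counter is advanced */
            group_count++;
            sampModCnt = 0;             /* wrap back to 0 for the next group */
        }
    }
    return group_count;
}
```

The result always equals (slice_width + 2) / 3, the groupsPerLine expression used in the rate-control setup.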

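The DSC spec states that flatness is checked only when masterQP lies within the range required for flatness, so masterQP is checked first. This masterQP is the QP of the current group.

Several variables are related to flatness: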

int firstFlat; ///< If -1, none of the 4 group set are flat, otherwise indicates which group is the one where flatness starts

int prevFirstFlat; ///< If -1, none of the 4 group set are flat, otherwise indicates which group is the one where flatness starts

int prevIsFlat; ///< Flag indicating the previous group is flat

int flatnessType; ///< 0 = somewhat flat; 1 = very flat

int prevFlatnessType; ///< The flatnessType coded with the previous group

Next, the 4 groups of the supergroup are checked in a loop to obtain the flatness type of each group:

for (i=0; i

flatness_type = IsOrigFlatHIndex(dsc_cfg, dsc_state, hIndex + (i+1)*PIXELS_PER_GROUP);

}

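Note that the group being processed for flatness is different from the group the current pixel belongs to (groupCount):

As the code below shows:

the first group of the slice is in fact counted as part of a supergroup,

groupCount starts from group 0 of the slice, %4 == 3

Selecting masterQP:

It is the QP of the 2nd group to the left of the current supergroup (see the figure below). The masterQp value that is used is the one that is used for the 2nd group to the left of the supergroup that is being tested. This matches the spec's description.

From the code: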

int masterQp; ///< QP used for the current group

int prevMasterQp; ///< QP used for the previous group

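flatness_det_thresh = 2 <<(bpc-8);

Because flatness is based on the original pixels rather than the reconstructed values, only the data in the original line buffer is needed. The rest follows the algorithm in the DSC spec: two flatness checks are made, and flatness check 2 is run only if flatness check 1 fails. The result is one of three outcomes: very flat / somewhat flat / not flat.

Two cases deserve attention:

1. A supergroup can span lines. If the group to be processed lies on the next line, it is by definition not flat and no check is needed.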

// If group starts past the end of the slice, it can't be flat

if (hIndex+1 >= dsc_cfg->slice_width)

return (0);

2. As shown here, if the current supergroup crosses the end of the line, padding is needed.

p = dsc_state->origLine[CType][PADDING_LEFT + MIN(dsc_cfg->slice_width-1, hIndex+i)]

As shown above, the maximum padding length is 3 pixels (flatness check 2 needs the group that follows the current group).
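The clamped read above effectively repeats the last real pixel of the line. The sketch below uses the same indexing to compute the max-minus-min range of a window of original samples, which is the quantity a flatness test looks at. The function `orig_range` is hypothetical, and the actual flatness checks compare such ranges against QP-dependent thresholds that the post does not show.

```c
#include <assert.h>

#define PADDING_LEFT 5
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Max-minus-min of `count` original samples starting at hIndex.
 * Indices past the slice end reuse the last real pixel, exactly as in
 * origLine[PADDING_LEFT + MIN(slice_width-1, hIndex+i)]. */
static int orig_range(const int *origLine, int slice_width, int hIndex, int count)
{
    int i, vmin, vmax;

    assert(origLine && slice_width > 0 && count > 0 && hIndex >= 0);
    vmin = vmax = origLine[PADDING_LEFT + MIN(slice_width - 1, hIndex)];
    for (i = 1; i < count; i++) {
        int p = origLine[PADDING_LEFT + MIN(slice_width - 1, hIndex + i)];
        if (p < vmin) vmin = p;
        if (p > vmax) vmax = p;
    }
    return vmax - vmin;
}
```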

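The figure below describes the complete flatness processing:

For the rest, see the previous blog post.

The detailed code follows: DSC_Algorithm()

The code shows that flatness adjusts, based on the QP of the current group, the QP to be used by the next group.

The order of execution is as follows:

From the order above, the QPs produced by RC and flatness are both for the next group. This is reasonable: the QP of the next group is determined from how the current group was encoded.

int masterQp; ///< QP used for the current group

This is the QP actually in use for the current group.

int prevMasterQp; ///< QP used for the previous group

The flatness-adjusted QP is set as the masterQP of the next group before that group starts encoding: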

dsc_state->masterQp = qp;

MMAP/BP search and mode decision

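The spec shows that the MMAP/BP mode decision and the BPVector are both derived from the reconstructed data of the previous line and do not depend on the current line, so this step can be done ahead of time. The code does exactly that: at each line switch it makes all MMAP/BP decisions for the line and stores them in the following data structure:

typedef enum { PT_MAP=0 (MMAP), PT_LEFT (simplified MMAP that needs no top neighbor), PT_BLOCK (BP) } PRED_TYPE;

The first line of a slice cannot use BP mode, and MMAP is replaced by the simplified PT_LEFT:

PRED_TYPE *prevLinePred; ///< BP selection decision buffer (since model calculates BP offset one line ahead of time)

The code is as follows:

if ( hIndex >= dsc_cfg->slice_width ) marks the moment of a line switch.

Because the MMAP/BP mode decision applies to the whole group (luma and chroma use the same mode), luma and chroma should be considered together when making it. The code above contains a component loop for the mode decision: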

for (CType = 0; CType < NUM_COMPONENTS; ++CType){

BlockPredSearch( dsc_cfg, dsc_state, CType, currLine, mod_hIndex, currLine[CType][mod_hIndex+5] );

}

The core code is BlockPredSearch():

#define BP_EDGE_COUNT 9

#define BP_EDGE_STRENGTH 32

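hOffset = 0 means this is the first group of the slice line. The BP decision has to check for edges; if edgeCount is greater than 9, BP should not be used.

A variable stores the BP/MMAP mode and the BPVector:
typedef enum { PT_MAP=0, PT_LEFT, PT_BLOCK } PRED_TYPE;

In DSC, a value greater than the enum values denotes BP mode together with a BP vector.

The BP/MMAP and BPVector search: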

pred_x

= SamplePredict( dsc_state, currLine[CType], currLine[CType], hOffset, (PRED_TYPE)(i+PT_BLOCK), 0, CType );

(PRED_TYPE)(i+PT_BLOCK) denotes BPVector+1

The following code, for example, is BP mode:

#define BP_RANGE 10

#define BP_SIZE 3

#define PRED_BLK_SIZE 3

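predErr[ctype][bprange] above stores:

the SAD accumulated between the current group (hPos) and the recon values of the 3 pixels starting at the BPV (going from right to left).

This is SAD3X1; each component stores its own.

Each SAD3X1 requires clamping. According to the spec, each diff must be clamped to 6 bits.

Clamp operation:

Each per-pixel error value obtained above is first clamped to 6 bits: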

modifedAbsDiff = MIN(absDiff >> (maxBpc-7), 0x3F);

and only then accumulated into SAD3X1.
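The clamp-then-accumulate order can be written out as a small sketch. The function names are hypothetical; only the expression `MIN(absDiff >> (maxBpc-7), 0x3F)` is taken from the code quoted above.

```c
#include <assert.h>
#include <stdlib.h>

#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Per-pixel error: scale the absolute difference down according to the
 * bit depth, then clamp it to 6 bits (0x3F). */
static int modified_abs_diff(int a, int b, int maxBpc)
{
    int absDiff = abs(a - b);

    assert(maxBpc >= 7);
    return MIN(absDiff >> (maxBpc - 7), 0x3F);
}

/* SAD3X1 for one component: the sum of the 3 clamped per-pixel errors. */
static int sad3x1(const int cur[3], const int ref[3], int maxBpc)
{
    int i, sad = 0;

    for (i = 0; i < 3; i++)
        sad += modified_abs_diff(cur[i], ref[i], maxBpc);
    return sad;
}
```

Because each term is at most 63, a single-component SAD3X1 is at most 189, and the sum over three components is at most 567, which is why the later clamp to 511 is needed.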

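candidateVector's (-1, -3, -4, -5, -6, -7, -8, -9, and -10): there is no -2 among the BP vectors.

The code here computes the prediction for 10 BPVs (including -2, which is discarded later); in fact only 9 are used.

The SAD is a 9-pixel SAD9x1 computed from the reconstructed pixels of the previous line, used for a BPV search similar to an MV search:

Current reference pixels: the current group plus the 6 pixels to its left, starting at hPos - 6.

The 9 pixels starting at the BPV form the other side of the SAD.

In the figure above the current 9 pixels are -8 to 0; assuming BPV = -10 (the largest), the other 9 pixels are -18 to -10.

The SAD is taken between these two sets of 9 pixels.

If any pixel involved lies beyond the slice boundary, padding is used, both left padding and right padding.

The code below shows this: MAX is applied, so the left side needs padding, and the amount of padding can be large (>10):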

offset = (int)predType - (int)PT_BLOCK;

p = currLine[MAX(h_offset_array_idx - 1 - offset,0)];

int* currLine, // reconstructed samples for current line

The BP predictor is used to predict all three components from the pixel referred to by the block prediction vector:

P[hPos] = recon[hPos + bpVector]; // take the recon value that the BPV points to

One question remains open: how is the best BPVector found? By the 9-pixel search over the 9 BPVs:

The actual code:

#define BP_SIZE 3

int lastErr[NUM_COMPONENTS][BP_SIZE][BP_RANGE]; ///< 3-pixel SAD's for each of the past 3 3-pixel-wide prediction blocks for each BP offset

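cursamp = (hOffset/PRED_BLK_SIZE) % BP_SIZE; // Index among the 3 group slots: the 9 pixels span 3 groups (the current group and the groups to its left), so 3 group slots are needed.

Because the BPV applies to both luma and chroma, the cost has to include both. That is the purpose of ctype<2.

The above computes the 3-pixel SAD.

if ( pixel_mod_cnt == PRED_BLK_SIZE - 1 ): SAD9X1 is computed only at the last pixel of the group.

Steps for computing the 9-pixel SAD9x1:

a. First accumulate the SAD3X1 of the 3 components over the 3 pixels.

The SAD over the 3 pixels and 3 components is accumulated and clamped to 511.

Because the largest diff is clamped to 6 bits, the sum over the 3 components of 1 pixel fits in 8 bits, and the sum over 3 pixels and 3 components fits in 10 bits. It is clamped to 9 bits.

b. Add the SAD3X1 of the 2 groups to the left of the current group. cursamp covers 3 groups (left-2, left-1, current group).

This gives SAD9X1.

Bit-width analysis:

Since SAD3X1 has 9 bits, SAD9X1 has 11 bits. DSC requires it to be reduced to 8 bits, hence the right shift by 3 bits.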

bp_sads[i] >>= 3; // SAD is truncated to 8-bit for comparison

This finally yields SAD9X1: the sum over 9 pixels of the 3-component SAD3X1 values, reduced to 8 bits.

This matches what the spec prescribes.

int bp_sads[BP_RANGE];

bpSad[candidateVector] = MIN(511, sad3x1_0[candidateVector] + sad3x1_1[candidateVector] +

sad3x1_2[candidateVector]);

MMAP and BP mode decision, as specified by the spec:

First find the BPVector with the smallest SAD:

bpcount only starts counting once hPos > 9, so the first few groups can never select BP.
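The search for the vector with the smallest SAD9X1 can be sketched as below. This is a simplified illustration, not the reference model's code: `best_bp_vector` is a hypothetical function, the input is assumed to already hold the 3-pixel SADs summed over components and clamped to 511, array index v is taken to stand for vector -(v+1), and ties are resolved in favour of the vector closest to the current pixel, which is an assumption of this sketch.

```c
#include <assert.h>

#define BP_RANGE 10
#define BP_SIZE 3

/* lastErr[g][v]: 3-pixel SAD of group slot g for candidate index v.
 * Returns the best vector (a negative offset) and writes its SAD9X1. */
static int best_bp_vector(int lastErr[BP_SIZE][BP_RANGE], int *min_sad)
{
    int v, best_vec = 0, best_sad = -1;

    assert(lastErr && min_sad);
    for (v = 0; v < BP_RANGE; v++) {
        int sad;
        if (v == 1)
            continue;                   /* vector -2 is not a candidate */
        sad = (lastErr[0][v] + lastErr[1][v] + lastErr[2][v]) >> 3;  /* 11 -> 8 bits */
        if (best_sad < 0 || sad < best_sad) {
            best_sad = sad;
            best_vec = -(v + 1);
        }
    }
    *min_sad = best_sad;
    return best_vec;
}
```

The reference model then still has to decide between BP and MMAP using the edge and count conditions described in the text; this sketch covers only the minimum search.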

LastEdgeCount:

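There is also lastEdgeCount, which uses the recon values of the top neighbor to detect the number of consecutive edges. It indicates how many pixels were non-edge before an edge was detected.

VLC unit: VLCGroup()

1. Determining forceMPP

The spec defines it as follows: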

The reason forceMpp is needed is to ensure that the encoder rate buffer has enough bits to allow the chunk to end on a byte boundary.

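Two cases for forceMPP:

a. The main one checks bufferFullness: if the fullness is too low, more bits have to be output, and forceMPP is used.

b. The other is when maxSize > maxBPC - qplevel in the current group, in which case MPP is needed as well. This is used in the mode decision and is not called forceMPP; the mode decision simply selects MPP.

The code is as follows: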

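Substream & slice multiplexing code

1. substream multiplexing

Data is fetched in units of mux words from the balance FIFO into the RC buffer; the order of requests is determined by the virtual decoder.

VLCGroup():

The code above shows that after VLC the generated bits are fed into the encoder balance FIFO. VLC_Unit() calls AddBits() directly to add them (prefix + suffix = total_size):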

void AddBits(dsc_cfg_t *dsc_cfg, dsc_state_t *dsc_state, int CType, int d, int nbits)

{

fifo_put_bits(&(dsc_state->encShifter[CType]), d, nbits);

dsc_state->numBits += nbits;

}

For substream multiplexing:

if (dsc_state->groupCount > dsc_cfg->mux_word_size + MAX_SE_SIZE - 1)

ProcessGroupEnc(dsc_cfg, dsc_state, *byte_out_p);

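Substream multiplexing starts once the number of encoded groups exceeds dsc_cfg->mux_word_size + MAX_SE_SIZE - 1; that is, at least this many groups must have been processed before the first substream output. Why???

The actual processing is as follows:

The virtual decoder is used here to decide whether a request is issued and in what order.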

if (dsc_state->shifter[i].fullness < max_se_size[i])

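The DSC spec requires that whenever a request is issued, the mux word data is already available. The C code checks the fullness of the balance FIFO to see whether it holds a mux word of data, and takes the mux word out.

The code above also handles the case where the mux word is not ready and zeros have to be filled in, but under normal conditions the DSC spec does not allow this.

It is allowed only for the final requests, e.g. when the last data really does not fill a mux word and padding is needed for output.

Function AddBits: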

// Write bits to output bitstream

if (prefix_value == max_pfx_size) // Trailing "1" can be omitted

AddBits(dsc_cfg, dsc_state, CType, 0, max_pfx_size);

else

AddBits(dsc_cfg, dsc_state, CType, 1, prefix_value+1);

AddBits(dsc_cfg, dsc_state, CType, 0, max_pfx_size);

outputs max_pfx_size zeros to the balance FIFO.

AddBits(dsc_cfg, dsc_state, CType, 1, prefix_value+1)

outputs the value 1 in prefix_value+1 bits (prefix_value zeros followed by a single 1) to the balance FIFO.
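The two prefix cases can be illustrated with a stand-in that writes the bits as characters instead of pushing them into a FIFO. The function `write_prefix` is hypothetical and exists only to make the bit patterns visible. It is not part of the reference model, where `AddBits()` does the actual output.

```c
#include <assert.h>
#include <string.h>

/* Writes the unary prefix as '0'/'1' characters into `out` (which must hold
 * at least max_pfx_size + 1 chars) and returns the number of bits written. */
static int write_prefix(char *out, int prefix_value, int max_pfx_size)
{
    int n;

    assert(out && prefix_value >= 0 && prefix_value <= max_pfx_size);
    if (prefix_value == max_pfx_size) {
        n = max_pfx_size;               /* all zeros, trailing 1 omitted */
        memset(out, '0', (size_t)n);
    } else {
        n = prefix_value + 1;           /* prefix_value zeros, then a 1 */
        memset(out, '0', (size_t)prefix_value);
        out[prefix_value] = '1';
    }
    out[n] = '\0';
    return n;
}
```

With max_pfx_size = 5, a prefix_value of 2 is written as 001 (3 bits), while a prefix_value of 5 is written as 00000 (5 bits), saving the terminating 1.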

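After slice encoding there is a check: the final RC buffer fullness must not exceed this threshold:

slice multiplexing:

Output to the decoder is in units of chunks (the encoder side simulates this by writing to a file).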

int codec_main(int argc, char *argv[])

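The loop above shows that encoding proceeds slice by slice. But as for the RC buffer allocation below,

As shown above, after encoding the mux word data is written into the RC buffer and finally output to the decoder as chunks.

The encoder side outputs chunks by writing to a file. Since the display scans line by line, the data has to be sent line by line. The chunks produced by DSC must likewise be interleaved per line (even with multiple slices per line), so chunks are ultimately transmitted per picture line.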

Rate Control

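According to the DSC spec, the RC buffer may start outputting bits to the decoder only after waiting for init_xmit_delay. The code above implements this: once the pixel time exceeds this delay, chunk data in the RC buffer starts to be output.

b. linear transform

The RC fullness is updated from the two sizes sent by the VLC of the previous group, rcGroupSize and codedGroupSize. A linear transform then converts the fullness into rcModelBufferFullness.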


#define RC_SCALE_BINARY_POINT 3

throttle_offset = dsc_state->rcXformOffset;

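The spec defines rcXformOffset as follows:

The code implements it as follows:

That is, an initial rcXform value is received from outside, and rc_model_size is subtracted from it before the linear transform. The result should be negative, because DSC defines rcModelBufferFullness as a negative value (0 to -rc_model_size).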
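A minimal sketch of such a linear transform is given below. The post does not quote the transform itself, so the formula (scale times the offset-adjusted fullness, shifted right by RC_SCALE_BINARY_POINT) is an assumption based on the names that do appear above, and `rc_model_fullness` is a hypothetical function. The offset passed in is assumed to already have rc_model_size subtracted.

```c
#include <assert.h>

#define RC_SCALE_BINARY_POINT 3

/* Assumed form: (rcXformScale * (bufferFullness + offset)) >> 3, where a
 * scale of 8 means unity. Floor division is spelled out for negative values
 * because >> on a negative int is implementation-defined in C. */
static int rc_model_fullness(int bufferFullness, int rcXformScale, int rcXformOffset)
{
    int scaled = rcXformScale * (bufferFullness + rcXformOffset);

    assert(rcXformScale >= 0);
    if (scaled >= 0)
        return scaled >> RC_SCALE_BINARY_POINT;
    return -((-scaled + (1 << RC_SCALE_BINARY_POINT) - 1) >> RC_SCALE_BINARY_POINT);
}
```

With rc_model_size = 8192 and an assumed initial offset of 6144, the adjusted offset is -2048. An empty buffer at the initial scale of 32 then maps to -8192, the bottom of the 0 to -rc_model_size range, and a fullness of 2048 maps to 0.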

Partial groups do not use ICH mode

Although the spec does not state this, the reference code clearly does so.

Buffer sizes in the encoder

1. RC buffer

The buf above is the RC buffer. There is one RC buffer per slice.

buf[4][slice group size].
