加载中…
个人资料
  • 博客等级:
  • 博客积分:
  • 博客访问:
  • 关注人气:
  • 获赠金笔:0支
  • 赠出金笔:0支
  • 荣誉徽章:
正文 字体大小:

算法-二叉树2-霍夫曼编码

(2017-08-21 09:52:22)
分类: 数据结构与算法

一、基本概念

霍夫曼编码是一种古老但是及其有效的数据压缩方法。在计算机中,每一个字符在没有压缩时都占有一个字节(unicode为两个字节),每个字符都有相同的位数(8位或16位)。但在实际使用时,高频字符和低频字符使用相同固定长度,必然会增加总的字节长度,因为我们可以降低高频字符所占位数,容忍低频字符位数较多的情况。

霍夫曼编码就是基于上面的思路,将高频字符使用位数较小的编码表示,而低频字符则使用相对较长的位数。需要注意的是,压缩后的高频字符不能是其他字符的前缀,因为这样在串行读入整个消息时无法区分是高频字符还是其他字符的一部分。例如如果用01表示E0111101表示X,那么无法判断。

 

二、霍夫曼编码——霍夫曼树

那么如何进行霍夫曼编码呢?霍夫曼利用二叉树构建了霍夫曼树,算法如下:

1.       霍夫曼树节点

首先,霍夫曼树的节点包含两种信息,字符权重(出现频率)和字符本身:

 

public class HuffmanNode implements Comparable{
   

   
private int weight;
   

   
private String charater;

    public
HuffmanNode leftNode;
    public
HuffmanNode rightNode;

    public int
getWeight() {
       
return weight;
   
}

   
public void setWeight(int weight) {
       
this.weight = weight;
   
}

   
public String getCharater() {
       
return charater;
   
}

   
public void setCharater(String charater) {
        
this.charater = charater;
   
}



   
public HuffmanNode getLeftNode() {
       
return leftNode;
   
}

   
public void setLeftNode(HuffmanNode leftNode) {
       
this.leftNode = leftNode;
   
}

   
public HuffmanNode getRightNode() {
       
return rightNode;
   
}

   
public void setRightNode(HuffmanNode rightNode) {
       
this.rightNode = rightNode;
   
}

   
public HuffmanNode(int weight, String charater) {
       
this.weight = weight;
        this
.charater = charater;
   
}

   
public HuffmanNode() {
    }

   
@Override
   
public boolean equals(Object o) {
       
if (this == o) return true;
        if
(o == null || getClass() != o.getClass()) return false;

       
HuffmanNode node = (HuffmanNode) o;

        return
charater != null ? charater.equals(node.charater) : node.charater == null;

   
}

   
@Override
   
public int hashCode() {
       
return charater != null ? charater.hashCode() : 0;
   
}

   
@Override
   
public int compareTo(Object o) {
        HuffmanNode node = (HuffmanNode) o
;
        if
(this.weight > node.getWeight()) {
           
return 1;
       
}
       
else if(this.weight < node.getWeight()) {
           
return -1;
       
}
       
else {
           
return 0;
       
}
    }
}

 

2.       构建霍夫曼树

算法如下:

1)  统计信息中各个字符出现的频率,然后分别创建霍夫曼节点,设置权重和字符,作为一个单节点树;

2)  将这些树插入一个优先队列中,根据权重由小到大,频率越小优先级越高。

3)  从有限队列中取出优先级最高两个树,并将其从队列中删除,创建一个它们共同的根节点组成一棵新树,然后插入到优先队列中。其中,根节点的权重为取出两棵树权重之和。根节点的左子节点为取出两棵树中优先级最高的树,由子节点为取出两棵树中优先级次高的树。

4)重复3)中过程,直到队列中只有一棵树,这棵树就是这条信息的霍夫曼树

 

http://s15/mw690/002i2qf8zy7dFprk3j85e&690

private HuffmanNode root;

private
Map, Integer> frequencyMap = new HashMap<>();

public
Map, Integer> getFrequencyMap() {
   
return frequencyMap;
}

public void setFrequencyMap(String rawMessage) {
   
for(int i = 0; i < rawMessage.length(); i++) {
        String character = String.valueOf(rawMessage.charAt(i))
;
        if
(null == frequencyMap.get(character)) {
           
frequencyMap.put(character, 1);
       
}
       
else {
           
int count = frequencyMap.get(character);
           
frequencyMap.put(character, ++count);
       
}
    }
}
public void buildHuffmanTree() {
    PriorityQueue queue = initHuffmanQueue()
;
    while
(queue.size() > 1) {
        addHuffmanQueue(queue)
;
   
}
   
root = queue.poll();
}
private PriorityQueue initHuffmanQueue() {
    PriorityQueue result =
new PriorityQueue<>();
    for
(Map.Entry, Integer> entry :  frequencyMap.entrySet()) {
        HuffmanNode node =
new HuffmanNode();
       
node.setCharater(entry.getKey());
       
node.setWeight(entry.getValue());
       
result.add(node);
   
}
   
return result;
}

private void addHuffmanQueue(PriorityQueue queue) {
    HuffmanNode first = queue.poll()
;
   
HuffmanNode second = queue.poll();
   
HuffmanNode node = new HuffmanNode();
   
node.leftNode = first;
   
node.rightNode = second;
   
node.setWeight(first.getWeight() + second.getWeight());
   
queue.add(node);
}
 
 

 

3.       霍夫曼编码表

霍夫曼树构建好了我们就可以得到一张编码表,也就是字符到霍夫曼编码的映射关系,具体实现如下:

 

private static String ZERO = "0";
private static
String ONE = "1";

private
Map, String> encodeTable = new HashMap<>();
 
public Map, String> getEncodeTable() {
   
if(encodeTable.isEmpty()) {
        generateEncodeTable(
root, "");
   
}
   
return encodeTable;
}

private void generateEncodeTable(HuffmanNode node, String binaryNum) {
   
if(node.leftNode == null && node.rightNode == null) {
        encodeTable.put(node.getCharater()
, binaryNum);
        return;
   
}
    generateEncodeTable(node.
leftNode, binaryNum + ZERO);
   
generateEncodeTable(node.rightNode, binaryNum + ONE);
}

 

 

4.       编码

编码的过程就是将信息转换成霍夫曼编码,具体实现如下:

 

public String encode(String rawMessage) {
    String result =
"";
   
Map, String> encodeTable = getEncodeTable();
    for
(int i = 0; i < rawMessage.length(); i++) {
        String character = String.valueOf(rawMessage.charAt(i))
;
       
result += encodeTable.get(character);
   
}
   
return result;
}

 

 

 

三、霍夫曼解码

下面会详细阐述如何进行简单的解码。假设我们有信息“SUSIE SAYS IT IS EASY”的霍夫曼编码表如下:

" " -> "00"

"A" -> "1110"

"S" -> "10"

"T" -> "0110"

"U" -> "0111"

"E" -> "1111"

"Y" -> "010"

"I" -> "110"

霍夫曼树如下:

http://s16/mw690/002i2qf8zy7dFpsssZFdf&690

对于给定的压缩码10011110110111100101110010100011001100011010001111111010010

从树根开始,如果1则走向右子树,如果是0走左子树,直到找到叶节点(包含字符)。然后回到树根,重复上面过程直到压缩码被解压完成。

下图为上述过程的一个例子:


实现如下:

 http://s15/mw690/002i2qf8zy7dFpuORHM5e&690


public String decode(String encoded) {
    String result =
"";
   
HuffmanNode currentNode = root;
    for
(int i = 0; i < encoded.length(); i++) {
       
if(currentNode.leftNode == null && currentNode.rightNode == null){
            result = result + currentNode.getCharater()
;
           
currentNode = root;
       
}
        String binaryNum = String.valueOf(encoded.charAt(i))
;
        if
(ZERO.equals(binaryNum)) {
            currentNode = currentNode.
leftNode;
       
}
       
else {
            currentNode = currentNode.
rightNode;
       
}
    }
   
if(currentNode.leftNode == null && currentNode.rightNode == null){
        result = result + currentNode.getCharater()
;
   
}
   
return result;
}

 

四、全部实现代码

 


public class HuffmanTree {
   
private static String ZERO = "0";
    private static
String ONE = "1";

    private
HuffmanNode root;

    private
Map, Integer> frequencyMap = new HashMap<>();

    private
Map, String> encodeTable = new HashMap<>();

    public
Map, Integer> getFrequencyMap() {
       
return frequencyMap;
   
}

   
public void setFrequencyMap(String rawMessage) {
       
for(int i = 0; i < rawMessage.length(); i++) {
            String character = String.valueOf(rawMessage.charAt(i))
;
            if
(null == frequencyMap.get(character)) {
               
frequencyMap.put(character, 1);
           
}
           
else {
               
int count = frequencyMap.get(character);
               
frequencyMap.put(character, ++count);
           
}
        }
    }

   
public void buildHuffmanTree() {
        PriorityQueue queue = initHuffmanQueue()
;
        while
(queue.size() > 1) {
            addHuffmanQueue(queue)
;
       
}
       
root = queue.poll();
   
}

   
public String decode(String encoded) {
        String result =
"";
       
HuffmanNode currentNode = root;
        for
(int i = 0; i < encoded.length(); i++) {
           
if(currentNode.leftNode == null && currentNode.rightNode == null){
                result = result + currentNode.getCharater()
;
               
currentNode = root;
           
}
            String binaryNum = String.valueOf(encoded.charAt(i))
;
            if
(ZERO.equals(binaryNum)) {
                currentNode = currentNode.
leftNode;
           
}
           
else {
                currentNode = currentNode.
rightNode;
           
}
        }
        
if(currentNode.leftNode == null && currentNode.rightNode == null){
            result = result + currentNode.getCharater()
;
       
}
       
return result;
   
}

   
public String encode(String rawMessage) {
        String result =
"";
       
Map, String> encodeTable = getEncodeTable();
        for
(int i = 0; i < rawMessage.length(); i++) {
            String character = String.valueOf(rawMessage.charAt(i))
;
           
result += encodeTable.get(character);
       
}
       
return result;
   
}

   
public Map, String> getEncodeTable() {
       
if(encodeTable.isEmpty()) {
            generateEncodeTable(
root, "");
       
}
       
return encodeTable;
   
}

   
private void generateEncodeTable(HuffmanNode node, String binaryNum) {
       
if(node.leftNode == null && node.rightNode == null) {
           
encodeTable.put(node.getCharater(), binaryNum);
            return;
       
}
        generateEncodeTable(node.
leftNode, binaryNum + ZERO);
       
generateEncodeTable(node.rightNode, binaryNum + ONE);
   
}

   
private PriorityQueue initHuffmanQueue() {
        PriorityQueue result =
new PriorityQueue<>();
        for
(Map.Entry, Integer> entry :  frequencyMap.entrySet()) {
            HuffmanNode node =
new HuffmanNode();
           
node.setCharater(entry.getKey());
           
node.setWeight(entry.getValue());
           
result.add(node);
       
}
       
return result;
   
}

   
private void addHuffmanQueue(PriorityQueue queue) {
        HuffmanNode first = queue.poll()
;
       
HuffmanNode second = queue.poll();
       
HuffmanNode node = new HuffmanNode();
       
node.leftNode = first;
       
node.rightNode = second;
       
node.setWeight(first.getWeight() + second.getWeight());
       
queue.add(node);
   
}

   
public HuffmanNode getRoot() {
       
return root;
   
}

   
public void setRoot(HuffmanNode root) {
       
this.root = root;
   
}

   
public void inOrder(HuffmanNode node) {
       
if(node == null) {
           
return;
       
}
        inOrder(node.
leftNode);
       
System.out.println(node.getCharater() + " weight:" + node.getWeight());
       
inOrder(node.rightNode);
   
}
}

 


http://blog.csdn.net/BigLeo/article/details/41243779
http://blog.csdn.net/BigLeo/article/details/41243779

0

阅读 收藏 喜欢 打印举报/Report
  

新浪BLOG意见反馈留言板 欢迎批评指正

新浪简介 | About Sina | 广告服务 | 联系我们 | 招聘信息 | 网站律师 | SINA English | 产品答疑

新浪公司 版权所有