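Huffman coding is a variable-length coding (VLC) scheme. Huffman proposed the method in 1952: it builds prefix-free codewords (no codeword is the prefix of another) purely from the probability with which each symbol occurs, so that the average codeword length is the shortest achievable by a symbol-by-symbol prefix code. For this reason it is sometimes called optimal coding, and usually simply Huffman coding.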
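The construction described above can be sketched on a toy alphabet. This is a minimal illustration, not code from the original post: the four symbols and their probabilities are made up, and the variable names (`codes`, `groups`, `w`) are hypothetical. It repeatedly merges the two least probable nodes, prepending `0` to the codewords under one node and `1` to those under the other.

```matlab
% Toy Huffman construction: repeatedly merge the two least probable nodes.
% The probabilities below are made up for illustration.
p = [0.4 0.3 0.2 0.1];            % probabilities of symbols 1..4
n = numel(p);
codes = repmat({''}, 1, n);       % codeword ('0'/'1' characters) per symbol
groups = num2cell(1:n);           % symbols contained in each active node
w = p;                            % weight of each active node

while numel(w) > 1
    [w, order] = sort(w);         % the two lightest nodes come first
    groups = groups(order);
    for s = groups{1}             % prepend '0' to every symbol in the lightest node
        codes{s} = ['0' codes{s}];
    end
    for s = groups{2}             % prepend '1' to every symbol in the second node
        codes{s} = ['1' codes{s}];
    end
    % merge the two nodes into a single node carrying their combined weight
    groups = [{[groups{1} groups{2}]} groups(3:end)];
    w = [w(1) + w(2), w(3:end)];
end

codelen = cellfun(@numel, codes); % codeword length per symbol
avglen = sum(p .* codelen);       % expected number of bits per symbol
for s = 1:n
    fprintf('symbol %d  p=%.1f  code=%s\n', s, p(s), codes{s});
end
fprintf('average length = %.2f bits/symbol\n', avglen);
```

For these probabilities the codeword lengths come out as 1, 2, 3 and 3 bits, giving an average of 1.9 bits per symbol against 2 bits for a fixed-length code. The exact bit patterns depend on how ties between equal weights are broken, but the lengths do not.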
function [zipped,info] = norm2huff(vector)

% ensure the input is a uint8 vector
if ~isa(vector,'uint8'),
    error('input argument must be a uint8 vector')
end

% vector as a row
vector = vector(:)';

% frequency of each byte value (the helper "frequency" is not shown in this post)
f = frequency(vector);

% symbols present in the vector
simbols = find(f~=0); % first value is 1 not 0!!!
f = f(simbols);

% sort by frequency
[f,sortindex] = sort(f);
simbols = simbols(sortindex);

% generate the codewords as the 52 bits of a double
len = length(simbols);
simbols_index = num2cell(1:len);
codeword_tmp = cell(len,1);
while length(f)>1,
    index1 = simbols_index{1};
    index2 = simbols_index{2};
    codeword_tmp(index1) = addnode(codeword_tmp(index1),uint8(0));
    codeword_tmp(index2) = addnode(codeword_tmp(index2),uint8(1));
    f = [sum(f(1:2)) f(3:end)];
    simbols_index = [{[index1 index2]} simbols_index(3:end)];
    % re-sort so that the two nodes with the lowest frequency come first
    [f,sortindex] = sort(f);
    simbols_index = simbols_index(sortindex);
end

% arrange the cell array so that symbol <-> codeword correspond
codeword = cell(256,1);
codeword(simbols) = codeword_tmp;

% calculate the full string length
len = 0;
for index=1:length(vector),
    len = len+length(codeword{double(vector(index))+1});
end

% create the full 0/1 sequence
string = repmat(uint8(0),1,len);
pointer = 1;
for index=1:length(vector),
    code = codeword{double(vector(index))+1};
    len = length(code);
    string(pointer+(0:len-1)) = code;
    pointer = pointer+len;
end

% add padding zeros up to a whole number of bytes
% (pad is in 1..8: a full byte of zeros is appended when len is already a multiple of 8)
len = length(string);
pad = 8-mod(len,8);
if pad>0,
    string = [string uint8(zeros(1,pad))];
end

% now keep only the useful codewords
codeword = codeword(simbols);
codelen = zeros(size(codeword));
weights = 2.^(0:51);
maxcodelen = 0;
for index = 1:length(codeword),
    len = length(codeword{index});
    if len>maxcodelen,
        maxcodelen = len;
    end
    if len>0,
        code = sum(weights(codeword{index}==1));
        code = bitset(code,len+1); % marker bit just above the codeword records its length
        codeword{index} = code;
        codelen(index) = len;
    end
end
codeword = [codeword{:}];

% calculate the zipped vector
cols = length(string)/8;
string = reshape(string,8,cols);
weights = 2.^(0:7);
zipped = uint8(weights*double(string));

% store the codes in a sparse matrix
huffcodes = sparse(1,1); % init sparse matrix
for index = 1:numel(codeword),
    huffcodes(codeword(index),1) = simbols(index);
end

% create the info structure
info.pad = pad;
info.huffcodes = huffcodes;
info.ratio = cols./length(vector);
info.length = length(vector);
info.maxcodelen = maxcodelen;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function codeword_new = addnode(codeword_old,item)
% prepend one bit to every codeword in the cell array
codeword_new = cell(size(codeword_old));
for index = 1:length(codeword_old),
    codeword_new{index} = [item codeword_old{index}];
end
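The padding and byte-packing step at the end of `norm2huff` is easy to misread, so here is a minimal standalone sketch of it together with the inverse operation. The bit sequence is an arbitrary example, and the names `bitgrid`, `unpacked` and `recovered` are hypothetical; only the `pad` rule and the `weights` product follow the listing. The first bit of each group of eight becomes the least significant bit of the byte.

```matlab
% Pack a 0/1 sequence into bytes the way norm2huff does (first bit = LSB),
% then unpack it again. The bit sequence is an arbitrary illustrative example.
bits = uint8([1 0 1 1 0 0 1 0 1 1 1]);      % 11 bits, not a multiple of 8

pad = 8 - mod(numel(bits), 8);              % same rule as the listing: pad is in 1..8
padded = [bits zeros(1, pad, 'uint8')];     % append the padding zeros

cols = numel(padded) / 8;                   % number of output bytes
bitgrid = reshape(padded, 8, cols);         % one byte per column
weights = 2.^(0:7);                         % row k carries weight 2^(k-1)
zipped = uint8(weights * double(bitgrid));  % 1 x cols vector of bytes

% Unpack: read the bits of every byte back, then drop the padding.
unpacked = zeros(8, cols, 'uint8');
for k = 1:8
    unpacked(k, :) = bitget(zipped, k);     % bit k of every byte
end
recovered = unpacked(:).';
recovered = recovered(1:end - pad);

disp(zipped);
disp(isequal(recovered, bits));
```

A decoder needs the value stored in `info.pad` to know how many trailing bits to discard, which is why the listing saves it alongside the code table.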
From: https://www.cnblogs.com/51matlab/p/17038868.html