基于静态huffman编码的压缩
名词解释:哈夫曼编码(HuffmanCoding)是一种编码方式,哈夫曼编码是可变字长编码(VLC)的一种。该方法依据字符出现概率来构造异字头的平均长度最短的码字,有时称之为最佳编码,一般就叫作Huffman编码。 实现过程: 1.计算每个字符在字符串中出现的频率作为构建h
名词解释:哈夫曼编码(Huffman Coding)是一种编码方式,哈夫曼编码是可变字长编码(VLC)的一种。该方法依据字符出现概率来构造异字头的平均长度最短的码字,有时称之为最佳编码,一般就叫作Huffman编码。
实现过程:
1.计算每个字符在字符串中出现的频率作为构建huffman树的权重
2.构建huffman树
3.建立每个字符对应的编码表
4.重建字符串编码,既压缩字符串
5.解压时根据先前的huffman树和字符位长度还原字符串
<?php /** 基于静态huffman编码的压缩[PHP语言实现] author: lajabs email: aGl0dHlvQGdtYWlsLmNvbQ== 本文以PHP作为描述语言较详细讲解huffman树原理及应用 因保证程序可读性,故不做优化. */ class huffman { /** * 压缩入口 * $str:待压缩的字符串 */ public function encode($str) { $len=strlen($str); //计算每个字符权重值(出现的频度)<这边可以做成概率表> for($i=0;$i<$len;$i++) $array[ord($str{$i})]++; $HuffmanArray=array(); asort($array); /** * 构造huffman树,时间复杂度O(nlogn) * 选择两个使用频率较小<字符在字符串中出现的次数>的结点合并生成出一个树 */ while ($item1 = each($array)) { $item2 = each($array); //构建huffman树 $this->creat_tree($item1,$item2,$array,$HuffmanArray); //反复排序<优化这步可在构造树时用插入排序算法完成> asort($array); } $HuffmanArray=array_shift($HuffmanArray); //构建编码表<这步可优化为构建树时一同生成> $tab=null; $code_tab=$this->creat_tab($HuffmanArray,$tab); //压缩&转换整个字符串为二进制表达式 $binary=null; for($i=0;$i<$len;$i++) $binary.=$tab[ord($str{$i})]; //转化为压缩后的字符串 $code=$this->encode_bin($binary); //静态huffman编码算法压缩后需保留huffman树 return array('tree'=>$HuffmanArray,'len'=>strlen($binary),'code'=>$code); } /** * 解压缩入口 * $huffman:解压所使用的huffman树 * $str:被压缩的字符 * $blen:压缩前的位长度 */ public function decode($huffman,$str,$blen) { $len=strlen($str); $binary=null; //将编码解为二进制表达式 for($i=0;$i<$len;$i++) $binary.=str_pad(base_convert(ord($str{$i}),10,2),8,'0',STR_PAD_LEFT); //去除补码 $binary=substr($binary,0,$blen); //从hufman树中配比相应的编码 return $this->decode_tree($binary,$huffman,$huffman); } /** * 将压缩后的二进制表达式再转为字符串 * $binary:二进制表达式字串 */ private function encode_bin($binary) { $len=strlen($binary); //二进制转字符需要整8位,不足8位补0 $blen=$len+8-$len%8; $binary=str_pad($binary,$blen,'0'); $encode=null; //每8位转为一个字符 for($i=7;$i<$blen;$i+=8) { $frag=substr($binary,$i-7,8); $encode.=chr(base_convert($frag,2,10)); } return $encode; } /** * 构造huffman树,使用贪婪算法选择最小的两个元素作为树的子节点 * $item1:权重最小的元素1 * $item2:权重次小的元素2 * $array:所有字符出现次数表<权重表> * $HuffmanArray:保存生成的huffman树结构 */ private function creat_tree($item1,$item2,&$array,&$HuffmanArray) { list($k,$v)=$item1; list($k2,$v2)=$item2; //假设当前树的左右节点为空节点 $c1=$k; $c2=$k2; //判断两个元素若为树则直接作为节点并入主树 if(isset($HuffmanArray[$k2])) { $c2=$HuffmanArray[$k2]; unset($HuffmanArray[$k2]); } if(isset($HuffmanArray[$k])) { $c1=$HuffmanArray[$k]; unset($HuffmanArray[$k]); } //设置树结点权值 $array[$k2]=$v+$v2; //合并节点后删除元素 unset($array[$k]); //合并到huffman树中 $HuffmanArray[$k2]=array(0=>$c1,1=>$c2); } /** * 广度优先遍历树,得到所有原字符对应的二进制表达式<01010...> * $tree:已经构建好的huffman树 * $tab:编码表,保存所有字符对应的编码 * $a0:左遍历树的路径<11010...> * $a1:右遍历树的路径 */ private function creat_tab($tree,&$tab,$a0=null,$a1=null) { if($tree==null) return; //遍历左右子树 foreach($tree as $node=>$ctree) { if(is_array($ctree)) { //判断未到达叶子节点时再向下遍历 $this->creat_tab($ctree,$tab,$a0.$node,$a1.$node); } else { //遍历到叶子节点<原字符ascii码>时的所有路径,既二进制表达式,下同 $tab[$ctree]=${'a'.$node}.$node; } } } /** * 使用进制表达式深度优先遍历树,0为左子树,1为右子树,而到根节点,即为二进制表达式所指向的原字符 * $binary:二进制表达式字串 * $huffman:huffman树 * $tree:当前所遍历的子树 * $i:指向二进制表达式字串的<指针> * $code:解码后的字符串 */ private function decode_tree($binary,$huffman,$tree,$i=0,$code=null) { $lr=$binary{$i}; //遍历完成 if($lr==null) return $code; //判断是否到根节点,根节点既为二进制表达式对应的原字符ascii码 if(is_array($tree[$lr])) { //继续向下遍历子树 return $this->decode_tree($binary,$huffman,$tree[$lr],$i+1,$code); } else { //将二进制表达式解码为原字符 $code.=chr($tree[$lr]); return $this->decode_tree($binary,$huffman,$huffman,$i+1,$code); } } } ?>
$str=' In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". '; $huffman=new huffman(); $obj=$huffman->encode($str); echo '压缩前的编码长度:',strlen($str),"\n"; echo '压缩后的编码:',"\n"; var_dump($obj['code']); echo '解压后的字符:',$huffman->decode($obj['tree'],$obj['code'],$obj['len']);
压缩前的编码长度:587压缩后的编码:string(330) "sp閉h颚?6鵞+王d挓吷s霒zk洚磗脎|t?*?;娳9蹴??>楏4O3 5 F凣rRuJ解压后的字符:In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

With memory compression on Windows 11, your device will choke even with a limited amount of RAM. In this article, we will show you how to enable or disable memory compression on Windows 11. What is memory compression? Memory compression is a feature that compresses data before writing it to RAM, thus providing more storage space on it. Of course, more data stored in physical memory translates into faster system operation and better overall performance. This feature is enabled by default in Windows 11, but if it's somehow not active, you can disable or re-enable it. How to enable memory compression in Windows 11? Click the search bar, type powershell, and click

I found that the compressed package downloaded from a download website will be larger than the original compressed package after decompression. The difference is tens of Kb for a small one and several dozen Mb for a large one. If it is uploaded to a cloud disk or paid space, it does not matter if the file is small. , if there are many files, the storage cost will be greatly increased. I studied it specifically and can learn from it if necessary. Compression level: 9-Extreme compression Dictionary size: 256 or 384, the more compressed the dictionary, the slower it is. The compression rate difference is larger before 256MB, and there is no difference in compression rate after 384MB. Word size: maximum 273 Parameters: f=BCJ2, test and add parameter compression rate will be higher

Large language models (LLMs) have the ability to generate smooth and coherent text, bringing new prospects to areas such as artificial intelligence conversation and creative writing. However, LLM also has some key limitations. First, their knowledge is limited to patterns recognized from training data, lacking a true understanding of the world. Second, reasoning skills are limited and cannot make logical inferences or fuse facts from multiple data sources. When faced with more complex and open-ended questions, LLM's answers may become absurd or contradictory, known as "illusions." Therefore, although LLM is very useful in some aspects, it still has certain limitations when dealing with complex problems and real-world situations. In order to bridge these gaps, retrieval-augmented generation (RAG) systems have emerged in recent years. The core idea is

Many friends need to record screens for office work or transfer files, but sometimes the problem of files that are too large causes a lot of trouble. The following is a solution to the problem of files that are too large, let’s take a look. What to do if the win10 screen recording file is too large: 1. Download the software Format Factory to compress the file. Download address >> 2. Enter the main page and click the "Video-MP4" option. 3. Click "Add File" on the conversion format page and select the MP4 file to be compressed. 4. Click "Output Configuration" on the page to compress the file according to the output quality. 5. Select "Low Quality and Size" from the drop-down configuration list and click "OK". 6. Click "OK" to complete the import of video files. 7. Click "Start" to start the conversion. 8. After completion, you can

Common encoding methods include ASCII encoding, Unicode encoding, UTF-8 encoding, UTF-16 encoding, GBK encoding, etc. Detailed introduction: 1. ASCII encoding is the earliest character encoding standard, using 7-bit binary numbers to represent 128 characters, including English letters, numbers, punctuation marks, control characters, etc.; 2. Unicode encoding is a method used to represent all characters in the world The standard encoding method of characters, which assigns a unique digital code point to each character; 3. UTF-8 encoding, etc.

In-depth analysis of the role and usage of the static keyword in C language. In C language, static is a very important keyword, which can be used in the definition of functions, variables and data types. Using the static keyword can change the link attributes, scope and life cycle of the object. Let’s analyze the role and usage of the static keyword in C language in detail. Static variables and functions: Variables defined using the static keyword inside a function are called static variables, which have a global life cycle

Office workers use wps software very frequently at work. Sometimes they input multiple files a day and then send them to the leader or to a designated location. So how does wps software compress a folder and package it for sending? The editor below will teach you. This operation step. First, organize the files and folders you want to send into the same folder. If you have a lot of files, it's a good idea to name each file so it's easier to identify when sending. Second step, this time click on this large folder and then right-click. Select "Add to archive". Step 3: At this time, the software will automatically help us package our files. Select "Compress to XX.zip". This zip is the packaging format, and then click Compress Now.

Packaging and compression are commonly used operations in Linux, but many users tend to confuse the two concepts. This article will discuss the differences between packaging and compression in Linux systems in detail, and use specific code examples to help readers better understand. First, we need to clarify the difference between packaging and compression. Packaging is the combination of multiple files or directories into a single file, often used to organize, archive, or transfer files. Compression is to compress one or more files through an algorithm to reduce the size of the file, save storage space or speed up the transmission.
