A Detailed Guide to Implementing the Bitmap Data Structure in Python

WBOY

Jun 16, 2016, 08:45 AM

bitmap data structure

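The bitmap is a widely used data structure, for example in Bloom filters and for sorting integers that contain no duplicates. A bitmap is usually built on top of an array: each array element can be viewed as a run of binary digits, and all the elements together form one larger set of bits. This article treats each array element as a 32-bit signed integer, so 31 bits per element are usable. (Python's own int type is signed and arbitrary-precision; the 31-bit figure is a convention adopted here, not a limit imposed by the language.)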

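How the bitmap works

A bitmap operates on individual bits. For example, if a Python array holds four 32-bit signed integers, a total of 4 * 31 = 124 bits are usable. To operate on bit 90, first work out which array element that bit falls in, then its bit index inside that element, and then perform the operation.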
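As a minimal sketch of the lookup just described, divmod yields both the array index and the bit index in one step. The names BITS_PER_ELEM and locate are illustrative and do not come from the article, and bit numbers are assumed to count from 0.

```python
BITS_PER_ELEM = 31  # usable bits per element, as assumed in this article

def locate(num):
    """Return (element index, bit index) for bit number `num`."""
    return divmod(num, BITS_PER_ELEM)

total_bits = 4 * BITS_PER_ELEM       # capacity of a 4-element array
elem_index, bit_index = locate(90)
print(total_bits)                    # 124
print(elem_index, bit_index)         # 2 28
```

Bit 90 therefore lives in the third element (index 2), at bit 28 of that element.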

[Figure: bit layout of a 32-bit integer]

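The figure above shows a 32-bit integer. Because it is treated as a signed type, the highest bit is the sign bit and the bitmap cannot use it. The left side is the high-order end and the right side is the low-order end; the lowest bit is bit 0.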

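Initializing the bitmap

The bitmap has to be initialized first. Take the integer 90: since a single integer offers only 31 usable bits, dividing 90 by 31 and rounding up gives the number of array elements required. The code is as follows: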

#!/usr/bin/env python
#coding: utf8

class Bitmap(object):
    def __init__(self, max):
        # Number of 31-bit elements needed: max / 31, rounded up
        self.size = (max + 31 - 1) // 31

if __name__ == '__main__':
    bitmap = Bitmap(90)
    print('%d elements are needed.' % bitmap.size)

$ python bitmap.py
3 elements are needed.
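The rounding-up idiom used for the array size can be cross-checked against math.ceil. The sketch below is only an illustration; the helper name elems_needed is an assumption and is not part of the article's class.

```python
import math

def elems_needed(max_value, bits=31):
    """Ceiling of max_value / bits, using integer arithmetic only."""
    return (max_value + bits - 1) // bits

# The integer idiom agrees with math.ceil for every value tried here.
for n in (1, 30, 31, 32, 90, 879):
    assert elems_needed(n) == math.ceil(n / 31)

print(elems_needed(90))   # 3
```

Staying in integer arithmetic avoids any reliance on floating-point division, which matters once the values get very large.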

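Computing the index in the array

Computing the index in the array works just like computing the array size earlier. The only change is that the calculation is applied to whichever integer needs to be stored rather than to the maximum. There is one difference, though: the array index is rounded down instead of up, so the calculation moves into a calcElemIndex method that supports both rounding directions. The code becomes: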

#!/usr/bin/env python
#coding: utf8

class Bitmap(object):
    def __init__(self, max):
        self.size = self.calcElemIndex(max, True)
        self.array = [0 for i in range(self.size)]

    def calcElemIndex(self, num, up=False):
        '''Round up if up is True, otherwise round down.'''
        if up:
            return (num + 31 - 1) // 31  # round up
        return num // 31  # round down

if __name__ == '__main__':
    bitmap = Bitmap(90)
    print('The array needs %d elements.' % bitmap.size)
    print('47 should be stored in array element %d.' % bitmap.calcElemIndex(47))

$ python bitmap.py
The array needs 3 elements.
47 should be stored in array element 1.

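Knowing the largest integer in advance is therefore important; otherwise the array that gets created may be too small to hold some of the data.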
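One edge case is worth checking, offered here as an observation rather than something the article states: when the largest value is an exact multiple of 31, rounding max / 31 up yields one element too few for the value max itself, because the values 0 through max occupy max + 1 bits. The sketch below uses hypothetical helper names and compares the article's sizing with sizing from max + 1.

```python
BITS = 31

def size_from_max(max_value):
    """Sizing as in the article: ceil(max / 31)."""
    return (max_value + BITS - 1) // BITS

def size_from_count(max_value):
    """Hypothetical alternative: ceil((max + 1) / 31), one bit per value 0..max."""
    return (max_value + 1 + BITS - 1) // BITS

for max_value in (90, 93):
    needed_index = max_value // BITS   # element that would hold max_value
    print(max_value,
          needed_index < size_from_max(max_value),
          needed_index < size_from_count(max_value))
# 90 True True
# 93 False True
```

With a maximum of 93, the article's sizing gives 3 elements (indices 0 to 2) while 93 itself maps to index 3, so storing it would raise an IndexError. The examples in this article (90, 87, 879) are not multiples of 31 and are unaffected.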

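Computing the bit index within an array element

The bit index within an array element comes from a modulo operation: take the integer to be stored modulo 31. The code becomes: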

#!/usr/bin/env python
#coding: utf8

class Bitmap(object):
    def __init__(self, max):
        self.size = self.calcElemIndex(max, True)
        self.array = [0 for i in range(self.size)]

    def calcElemIndex(self, num, up=False):
        '''Round up if up is True, otherwise round down.'''
        if up:
            return (num + 31 - 1) // 31  # round up
        return num // 31  # round down

    def calcBitIndex(self, num):
        # Position of num inside its 31-bit element
        return num % 31

if __name__ == '__main__':
    bitmap = Bitmap(90)
    print('The array needs %d elements.' % bitmap.size)
    print('47 should be stored in array element %d.' % bitmap.calcElemIndex(47))
    print('47 should be stored in array element %d, at bit %d.' % (bitmap.calcElemIndex(47), bitmap.calcBitIndex(47),))

Don't forget that bit positions are counted from 0.
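Because both indices start at 0, the stored number can be recovered as element index * 31 + bit index. The following round-trip check is a small sketch with illustrative function names (to_position, from_position) that are not part of the article's class.

```python
BITS = 31

def to_position(num):
    """Map a number to (element index, bit index)."""
    return num // BITS, num % BITS

def from_position(elem_index, bit_index):
    """Inverse mapping: recover the number from its position."""
    return elem_index * BITS + bit_index

for num in (0, 30, 31, 47, 90):
    elem_index, bit_index = to_position(num)
    assert from_position(elem_index, bit_index) == num
    print(num, elem_index, bit_index)
# 0 0 0
# 30 0 30
# 31 1 0
# 47 1 16
# 90 2 28
```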

Setting a bit to 1

Every bit is 0 by default; setting a bit to 1 means that a value has been stored at that position. The code becomes:

#!/usr/bin/env python
#coding: utf8

class Bitmap(object):
    def __init__(self, max):
        self.size = self.calcElemIndex(max, True)
        self.array = [0 for i in range(self.size)]

    def calcElemIndex(self, num, up=False):
        '''Round up if up is True, otherwise round down.'''
        if up:
            return (num + 31 - 1) // 31  # round up
        return num // 31  # round down

    def calcBitIndex(self, num):
        return num % 31

    def set(self, num):
        elemIndex = self.calcElemIndex(num)
        byteIndex = self.calcBitIndex(num)
        elem      = self.array[elemIndex]
        # OR with a one-bit mask turns the target bit on
        self.array[elemIndex] = elem | (1 << byteIndex)

if __name__ == '__main__':
    bitmap = Bitmap(90)
    bitmap.set(0)
    print(bitmap.array)

Because counting starts at bit 0, storing the number 0 means setting bit 0 to 1.
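To make the effect of the OR mask visible, here is a minimal standalone sketch that works on a single integer and prints it in binary. The function name set_bit is illustrative only.

```python
def set_bit(elem, bit_index):
    """Return elem with the given bit forced to 1."""
    return elem | (1 << bit_index)

elem = 0
elem = set_bit(elem, 0)            # store the number 0
elem = set_bit(elem, 3)            # store the number 3
print(bin(elem))                   # 0b1001
print(set_bit(elem, 3) == elem)    # True
```

Setting a bit that is already 1 leaves the element unchanged, which is why a bitmap cannot record duplicates.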

Clearing a bit to 0

Clearing a bit to 0 discards the value previously stored there. The code is as follows:

#!/usr/bin/env python
#coding: utf8

class Bitmap(object):
    def __init__(self, max):
        self.size = self.calcElemIndex(max, True)
        self.array = [0 for i in range(self.size)]

    def calcElemIndex(self, num, up=False):
        '''Round up if up is True, otherwise round down.'''
        if up:
            return (num + 31 - 1) // 31  # round up
        return num // 31  # round down

    def calcBitIndex(self, num):
        return num % 31

    def set(self, num):
        elemIndex = self.calcElemIndex(num)
        byteIndex = self.calcBitIndex(num)
        elem      = self.array[elemIndex]
        self.array[elemIndex] = elem | (1 << byteIndex)

    def clean(self, i):
        elemIndex = self.calcElemIndex(i)
        byteIndex = self.calcBitIndex(i)
        elem      = self.array[elemIndex]
        # AND with the inverted mask turns only the target bit off
        self.array[elemIndex] = elem & (~(1 << byteIndex))

if __name__ == '__main__':
    bitmap = Bitmap(87)
    bitmap.set(0)
    bitmap.set(34)
    print(bitmap.array)
    bitmap.clean(0)
    print(bitmap.array)
    bitmap.clean(34)
    print(bitmap.array)

Clearing a bit and setting a bit are inverse operations.
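A small standalone sketch of the clearing mask follows; the name clear_bit is illustrative. In Python, ~(1 << k) is a negative number because integers behave as if they carried infinitely many sign bits, yet ANDing with it still clears only bit k.

```python
def clear_bit(elem, bit_index):
    """Return elem with the given bit forced to 0."""
    return elem & ~(1 << bit_index)

elem = 0b1001
print(~(1 << 3))                     # -9
print(bin(clear_bit(elem, 3)))       # 0b1
print(clear_bit(elem, 2) == elem)    # True
```

Clearing a bit that is already 0 is a no-op, mirroring the behaviour of setting a bit that is already 1.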

Testing whether a bit is 1

Testing whether a bit is 1 is how previously stored data is read back. The code is as follows:

#!/usr/bin/env python
#coding: utf8

class Bitmap(object):
    def __init__(self, max):
        self.size = self.calcElemIndex(max, True)
        self.array = [0 for i in range(self.size)]

    def calcElemIndex(self, num, up=False):
        '''Round up if up is True, otherwise round down.'''
        if up:
            return (num + 31 - 1) // 31  # round up
        return num // 31  # round down

    def calcBitIndex(self, num):
        return num % 31

    def set(self, num):
        elemIndex = self.calcElemIndex(num)
        byteIndex = self.calcBitIndex(num)
        elem      = self.array[elemIndex]
        self.array[elemIndex] = elem | (1 << byteIndex)

    def clean(self, i):
        elemIndex = self.calcElemIndex(i)
        byteIndex = self.calcBitIndex(i)
        elem      = self.array[elemIndex]
        self.array[elemIndex] = elem & (~(1 << byteIndex))

    def test(self, i):
        elemIndex = self.calcElemIndex(i)
        byteIndex = self.calcBitIndex(i)
        # A non-zero result means the target bit is set
        if self.array[elemIndex] & (1 << byteIndex):
            return True
        return False

if __name__ == '__main__':
    bitmap = Bitmap(90)
    bitmap.set(0)
    print(bitmap.array)
    print(bitmap.test(0))
    bitmap.set(1)
    print(bitmap.test(1))
    print(bitmap.test(2))
    bitmap.clean(1)
    print(bitmap.test(1))

$ python bitmap.py
[1, 0, 0]
True
True
False
False

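Next, let's use the bitmap to sort an array that contains no duplicates. Given an unordered array of non-negative integers whose largest element is 879, sort it in ascending order. The code is as follows: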

#!/usr/bin/env python
#coding: utf8

class Bitmap(object):
    def __init__(self, max):
        self.size = self.calcElemIndex(max, True)
        self.array = [0 for i in range(self.size)]

    def calcElemIndex(self, num, up=False):
        '''Round up if up is True, otherwise round down.'''
        if up:
            return (num + 31 - 1) // 31  # round up
        return num // 31  # round down

    def calcBitIndex(self, num):
        return num % 31

    def set(self, num):
        elemIndex = self.calcElemIndex(num)
        byteIndex = self.calcBitIndex(num)
        elem      = self.array[elemIndex]
        self.array[elemIndex] = elem | (1 << byteIndex)

    def clean(self, i):
        elemIndex = self.calcElemIndex(i)
        byteIndex = self.calcBitIndex(i)
        elem      = self.array[elemIndex]
        self.array[elemIndex] = elem & (~(1 << byteIndex))

    def test(self, i):
        elemIndex = self.calcElemIndex(i)
        byteIndex = self.calcBitIndex(i)
        if self.array[elemIndex] & (1 << byteIndex):
            return True
        return False

if __name__ == '__main__':
    MAX = 879
    suffle_array = [45, 2, 78, 35, 67, 90, 879, 0, 340, 123, 46]
    result       = []
    bitmap = Bitmap(MAX)
    # Mark every number that is present
    for num in suffle_array:
        bitmap.set(num)

    # Walk the bits in ascending order and collect the marked ones
    for i in range(MAX + 1):
        if bitmap.test(i):
            result.append(i)

    print('Original array: %s' % suffle_array)
    print('Sorted array: %s' % result)
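As a cross-check, the same idea can be condensed into a single function and compared with Python's built-in sorted. This is a sketch rather than the article's class: the name bitmap_sort is illustrative, and the array is sized as largest // bits + 1, which is an assumption that differs slightly from the rounding used in the class.

```python
def bitmap_sort(nums, bits=31):
    """Sort distinct non-negative integers by marking one bit per value."""
    if not nums:
        return []
    largest = max(nums)
    array = [0] * (largest // bits + 1)
    for num in nums:
        array[num // bits] |= 1 << (num % bits)
    # Scanning bit numbers in ascending order yields the sorted values.
    return [i for i in range(largest + 1)
            if array[i // bits] & (1 << (i % bits))]

data = [45, 2, 78, 35, 67, 90, 879, 0, 340, 123, 46]
print(bitmap_sort(data))
# [0, 2, 35, 45, 46, 67, 78, 90, 123, 340, 879]
assert bitmap_sort(data) == sorted(data)
```

The running time grows with the largest value rather than with the number of elements, so the approach pays off mainly when the values are dense in their range. Duplicate inputs would collapse into a single output value.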

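With the bitmap in place, sorting with it is straightforward. Other languages can implement a bitmap in the same way. Note, however, that statically typed languages such as C or Golang let you declare unsigned integers directly, so all 32 bits are usable; in that case simply change every 31 in the code above to 32.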
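To illustrate the effect of switching from 31 to 32 bits per element, the sketch below parameterizes the element width. The class name WordBitmap and its bits parameter are hypothetical and not part of the article. Python integers are arbitrary-precision, so bit 31 can be set here without an unsigned type; in C the element would be declared as an unsigned 32-bit integer instead.

```python
class WordBitmap(object):
    """Hypothetical variant of Bitmap with a configurable bits-per-element."""

    def __init__(self, max_value, bits=32):
        self.bits = bits
        # One bit per value 0..max_value
        self.array = [0] * (max_value // bits + 1)

    def set(self, num):
        self.array[num // self.bits] |= 1 << (num % self.bits)

    def test(self, num):
        return bool(self.array[num // self.bits] & (1 << (num % self.bits)))

bm = WordBitmap(90, bits=32)
bm.set(31)                         # highest bit of element 0
bm.set(32)                         # lowest bit of element 1
print(bm.array)                    # [2147483648, 1, 0]
print(bm.test(31), bm.test(33))    # True False
```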
