golang has no in. Golang does not provide a similar Python operator in, nor does it provide such a standard library function like other languages, such as in_array in PHP. Reasons: 1. The implementation of the in function is very simple and unnecessary; 2. In different scenarios, we also need to analyze which method to implement based on the actual situation, rather than a fixed method.
The operating environment of this tutorial: Windows 7 system, GO version 1.18, Dell G3 computer.
I saw a question on Zhihu before: Why doesn’t Golang have the same function as in in Python? So, I searched for this question and found that many people still have such questions.
Let’s talk about this topic today.
in is a very commonly used function. It may also be called contains in some languages. Although the expressions in different languages are different, they are basically there. But unfortunately, Go does not. It does not provide a similar Python operator in, nor does it provide such a standard library function like other languages, such as in_array in PHP.
Go’s philosophy is that less is more. I think maybe the Go team feels this is a feature that is not practical to implement.
Why do you say it is insignificant? If you want to implement it yourself, how should you do it?
There are three implementation methods that I can think of, one is traversal, the other is sort’s binary search, and the third is map’s key index.
Traversal
Traversal should be the simplest implementation method that we can easily think of.
The example is as follows:
func InIntSlice(haystack []int, needle int) bool { for _, e := range haystack { if e == needle { return true } } return false }
The above demonstrates how to find an example of whether a specified int exists in a []int type variable. Isn’t it very simple? From this, we can also feel why I Say it's trivial to implement.
This example has a flaw, it only supports a single type. If you want to support general in functionality like interpreted languages, you need to use reflection.
The code is as follows:
func In(haystack interface{}, needle interface{}) (bool, error) { sVal := reflect.ValueOf(haystack) kind := sVal.Kind() if kind == reflect.Slice || kind == reflect.Array { for i := 0; i < sVal.Len(); i++ { if sVal.Index(i).Interface() == needle { return true, nil } } return false, nil } return false, ErrUnSupportHaystack }
In order to be more versatile, the input parameters haystack and needle of the In function are both interface{} types.
Let’s briefly talk about the benefits of all input parameters being interface{}. There are two main points, as follows:
First, haystack is of interface{} type, so The types supported by in are not limited to slice, but also include array. We see that the function internally performs type checking on haystack through reflection, and supports slices and arrays. If it is other types, an error will be prompted. Adding new type support, such as map, is actually very simple. But this method is not recommended, because the effect of in can be achieved through the syntax of _, ok := m[k].
Second, haystack is interface{}, then []interface{} also meets the requirements, and needle is interface{}. In this way, we can achieve the same effect as an interpreted language.
How to understand? A direct example is as follows:
gotin.In([]interface{}{1, "two", 3}, "two")
haystack is []interface{}{1, "two", 3}, and needle is interface{}, and the value at this time is "two". It seems that in an interpreted language, elements can be of any type and do not have to have the same effect. In this way, we can use it as we please.
But one thing to note is that there is such a piece of code in the implementation of the In function:
if sVal.Index(i).Interface() == needle { ... }
Not all types in Go can be compared using ==. If the element contains slice or map, An error may be reported.
Binary search
There is a disadvantage in traversing to confirm whether an element exists, that is, if the array or slice contains a large amount of data, such as 1,000,000 pieces of data, that is, one million. The worst case scenario is that we have to traverse 1,000,000 times to confirm, and the time complexity is On.
Is there any way to reduce the number of traversals?
The method that naturally comes to mind is binary search, whose time complexity is log2(n). But this algorithm has a prerequisite and relies on an ordered sequence.
So, the first problem we need to solve is to order the sequence. Go's standard library already provides this function under the sort package.
The sample code is as follows:
fmt.Println(sort.SortInts([]int{4, 2, 5, 1, 6}))
For []int, the function we use is SortInts. If it is other types of slices, sort also provides related functions, such as []string through SortStrings Sort.
After sorting, you can perform a binary search. Fortunately, Go also provides this function. The corresponding function of the []int type is SearchInts.
Let’s briefly introduce this function, first look at the definition:
func SearchInts(a []int, x int) int
The input parameters are easy to understand, search for x from slice a. The key point is to talk about the return value, which is crucial for us to confirm whether the element exists later. The meaning of the return value is to return the position of the element in the slice. If the element does not exist, it will return the position at which the element should be inserted while keeping the slice in order.
For example, the sequence is as follows:
1 2 6 8 9 11
Assume that x is 6, and after searching, it will be found that its position is at index 2; if x is 7, it is found that the element does not exist, and if it is inserted into the sequence , will be placed between 6 and 8, and the index position is 3, so the return value is 3.
Code test:
fmt.Println(sort.SearchInts([]int{1, 2, 6, 8, 9, 11}, 6)) // 2 fmt.Println(sort.SearchInts([]int{1, 2, 6, 8, 9, 11}, 7)) // 3
如果判断元素是否在序列中,只要判断返回位置上的值是否和查找的值相同即可。
但还有另外一种情况,如果插入元素位于序列最后,例如元素值为 12,插入位置即为序列的长度 6。如果直接查找 6 位置上的元素就可能发生越界的情况。那怎么办呢?其实判断返回是否小于切片长度即可,小于则说明元素不在切片序列中。
完整的实现代码如下:
func SortInIntSlice(haystack []int, needle int) bool { sort.Ints(haystack) index := sort.SearchInts(haystack, needle) return index < len(haystack) && haystack[index] == needle }
但这还有个问题,对于无序的场景,如果每次查询都要经过一次排序并不划算。最好能实现一次排序,稍微修改下代码。
func InIntSliceSortedFunc(haystack []int) func(int) bool { sort.Ints(haystack) return func(needle int) bool { index := sort.SearchInts(haystack, needle) return index < len(haystack) && haystack[index] == needle } }
上面的实现,我们通过调用 InIntSliceSortedFunc 对 haystack 切片排序,并返回一个可多次使用的函数。
使用案例如下:
in := gotin.InIntSliceSortedFunc(haystack) for i := 0; i<maxNeedle; i++ { if in(i) { fmt.Printf("%d is in %v", i, haystack) } }
二分查找的方式有什么不足呢?
我想到的重要一点,要实现二分查找,元素必须是可排序的,如 int,string,float 类型。而对于结构体、切片、数组、映射等类型,使用起来就不是那么方便,当然,如果要用,也是可以的,不过需要我们进行一些适当扩展,按指定标准排序,比如结构的某个成员。
到此,二分查找的 in 实现就介绍完毕了。
map key
本节介绍 map key 方式。它的算法复杂度是 O1,无论数据量多大,查询性能始终不变。它主要依赖的是 Go 中的 map 数据类型,通过 hash map 直接检查 key 是否存在,算法大家应该都比较熟悉,通过 key 可直接映射到索引位置。
我们常会用到这个方法。
_, ok := m[k] if ok { fmt.Println("Found") }
那么它和 in 如何结合呢?一个案例就说明白了这个问题。
假设,我们有一个 []int 类型变量,如下:
s := []int{1, 2, 3}
为了使用 map 的能力检查某个元素是否存在,可以将 s 转化 map[int]struct{}。
m := map[interface{}]struct{}{ 1: struct{}{}, 2: struct{}{}, 3: struct{}{}, 4: struct{}{}, }
如果检查某个元素是否存在,只需要通过如下写法即可确定:
k := 4 if _, ok := m[k]; ok { fmt.Printf("%d is found\n", k) }
是不是非常简单?
补充一点,关于这里为什么使用 struct{},可以阅读我之前写的一篇关于 Go 中如何使用 set 的文章。
按照这个思路,实现函数如下:
func MapKeyInIntSlice(haystack []int, needle int) bool { set := make(map[int]struct{}) for _ , e := range haystack { set[e] = struct{}{} } _, ok := set[needle] return ok }
实现起来不难,但和二分查找有着同样的问题,开始要做数据处理,将 slice 转化为 map。如果是每次数据相同,稍微修改下它的实现。
func InIntSliceMapKeyFunc(haystack []int) func(int) bool { set := make(map[int]struct{}) for _ , e := range haystack { set[e] = struct{}{} } return func(needle int) bool { _, ok := set[needle] return ok } }
对于相同的数据,它会返回一个可多次使用的 in 函数,一个使用案例如下:
in := gotin.InIntSliceMapKeyFunc(haystack) for i := 0; i<maxNeedle; i++ { if in(i) { fmt.Printf("%d is in %v", i, haystack) } }
对比前两种算法,这种方式的处理效率最高,非常适合于大数据的处理。接下来的性能测试,我们将会看到效果。
性能
介绍完所有方式,我们来实际对比下每种算法的性能。测试源码位于 gotin_test.go 文件中。
基准测试主要是从数据量大小考察不同算法的性能,本文中选择了三个量级的测试样本数据,分别是 10、1000、1000000。
为便于测试,首先定义了一个用于生成 haystack 和 needle 样本数据的函数。
代码如下:
func randomHaystackAndNeedle(size int) ([]int, int){ haystack := make([]int, size) for i := 0; i<size ; i++{ haystack[i] = rand.Int() } return haystack, rand.Int() }
输入参数是 size,通过 http://rand.Int() 随机生成切片大小为 size 的 haystack 和 1 个 needle。在基准测试用例中,引入这个随机函数生成数据即可。
举个例子,如下:
func BenchmarkIn_10(b *testing.B) { haystack, needle := randomHaystackAndNeedle(10) b.ResetTimer() for i := 0; i < b.N; i++ { _, _ = gotin.In(haystack, needle) } }
首先,通过 randomHaystackAndNeedle 随机生成了一个含有 10 个元素的切片。因为生成样本数据的时间不应该计入到基准测试中,我们使用 b.ResetTimer() 重置了时间。
其次,压测函数是按照 Test+函数名+样本数据量 规则编写,如案例中 BenchmarkIn_10,表示测试 In 函数,样本数据量为 10。如果我们要用 1000 数据量测试 InIntSlice,压测函数名为 BenchmarkInIntSlice_1000。
测试开始吧!简单说下我的笔记本配置,Mac Pro 15 版,16G 内存,512 SSD,4 核 8 线程的 CPU。
测试所有函数在数据量在 10 的情况下的表现。
$ go test -run=none -bench=10$ -benchmem
匹配所有以 10 结尾的压测函数。
测试结果:
goos: darwin goarch: amd64 pkg: github.com/poloxue/gotin BenchmarkIn_10-8 3000000 501 ns/op 112 B/op 11 allocs/op BenchmarkInIntSlice_10-8 200000000 7.47 ns/op 0 B/op 0 allocs/op BenchmarkInIntSliceSortedFunc_10-8 100000000 22.3 ns/op 0 B/op 0 allocs/op BenchmarkSortInIntSlice_10-8 10000000 162 ns/op 32 B/op 1 allocs/op BenchmarkInIntSliceMapKeyFunc_10-8 100000000 17.7 ns/op 0 B/op 0 allocs/op BenchmarkMapKeyInIntSlice_10-8 3000000 513 ns/op 163 B/op 1 allocs/op PASS ok github.com/poloxue/gotin 13.162s
表现最好的并非 SortedFunc 和 MapKeyFunc,而是最简单的针对单类型的遍历查询,平均耗时 7.47ns/op,当然,另外两种方式表现也不错,分别是 22.3ns/op 和 17.7ns/op。
表现最差的是 In、SortIn(每次重复排序) 和 MapKeyIn(每次重复创建 map)两种方式,平均耗时分别为 501ns/op 和 513ns/op。
测试所有函数在数据量在 1000 的情况下的表现。
$ go test -run=none -bench=1000$ -benchmem
测试结果:
goos: darwin goarch: amd64 pkg: github.com/poloxue/gotin BenchmarkIn_1000-8 30000 45074 ns/op 8032 B/op 1001 allocs/op BenchmarkInIntSlice_1000-8 5000000 313 ns/op 0 B/op 0 allocs/op BenchmarkInIntSliceSortedFunc_1000-8 30000000 44.0 ns/op 0 B/op 0 allocs/op BenchmarkSortInIntSlice_1000-8 20000 65401 ns/op 32 B/op 1 allocs/op BenchmarkInIntSliceMapKeyFunc_1000-8 100000000 17.6 ns/op 0 B/op 0 allocs/op BenchmarkMapKeyInIntSlice_1000-8 20000 82761 ns/op 47798 B/op 65 allocs/op PASS ok github.com/poloxue/gotin 11.312s
表现前三依然是 InIntSlice、InIntSliceSortedFunc 和 InIntSliceMapKeyFunc,但这次顺序发生了变化,MapKeyFunc 表现最好,17.6 ns/op,与数据量 10 的时候相比基本无变化。再次验证了前文的说法。
同样的,数据量 1000000 的时候。
$ go test -run=none -bench=1000000$ -benchmem
测试结果如下:
goos: darwin goarch: amd64 pkg: github.com/poloxue/gotin BenchmarkIn_1000000-8 30 46099678 ns/op 8000098 B/op 1000001 allocs/op BenchmarkInIntSlice_1000000-8 3000 424623 ns/op 0 B/op 0 allocs/op BenchmarkInIntSliceSortedFunc_1000000-8 20000000 72.8 ns/op 0 B/op 0 allocs/op BenchmarkSortInIntSlice_1000000-8 10 138873420 ns/op 32 B/op 1 allocs/op BenchmarkInIntSliceMapKeyFunc_1000000-8 100000000 16.5 ns/op 0 B/op 0 allocs/op BenchmarkMapKeyInIntSlice_1000000-8 10 156215889 ns/op 49824225 B/op 38313 allocs/op PASS ok github.com/poloxue/gotin 15.178s
MapKeyFunc 依然表现最好,每次操作用时 17.2 ns,Sort 次之,而 InIntSlice 呈现线性增加的趋势。一般情况下,如果不是对性能要特殊要求,数据量特别大的场景,针对单类型的遍历已经有非常好的性能了。
从测试结果可以看出,反射实现的通用 In 函数每次执行需要进行大量的内存分配,方便的同时,也是以牺牲性能为代价的。
总结
本文通过一个问题引出主题,为什么 Go 中没有类似 Python 的 In 方法。我认为,一方面是实现非常简单,没有必要。除此以外,另一方面,在不同场景下,我们还需要根据实际情况分析用哪种方式实现,而不是一种固定的方式。
接着,我们介绍了 In 实现的三种方式,并分析了各自的优劣。通过性能分析测试,我们能得出大致的结论,什么方式适合什么场景,但总体还是不能说足够细致,有兴趣的朋友可以继续研究下。
The above is the detailed content of Does golang have in. For more information, please follow other related articles on the PHP Chinese website!