[Redis storage structure design] Storing coordinate points and their multi-dimensional click counts
伊谢尔伦 2017-04-22 08:59:44

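I have a requirement.

We need to record page click data, which an upstream system pushes into Redis.

How the upstream pushes data into Redis is transparent to us;

we only need to care about how the data is stored in Redis.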


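Query requirements:

  1. Query the total number of clicks on a given page on a given day, i.e. total valid clicks + total invalid clicks

  2. Query the total valid clicks and total invalid clicks on a given page at a given resolution on a given day

  3. Query all coordinate points and their click counts on a given page at a given resolution on a given day

  4. Box selection query (equivalent to a range query): query the total valid clicks and total invalid clicks of the coordinate points within a given range (e.g. 100<x<1000, 30<y<600) on a given page at a given resolution on a given day,
    together with the valid and invalid click counts for each dimension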

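Explanation of the requirements:

Valid and invalid clicks: we can distinguish them with 0 and 1 when storing. How the front end defines valid or invalid is transparent to us.

Resolution: there are three, distinguished by width, e.g. 1380, 1190, 1000. In the current implementation, having the resolution lets us split the zsets into smaller pieces: without it there might be about 100,000 zset keys in total, whereas with it a single query for one resolution touches at most about 30,000 zset keys.

Box selection: the user drags the mouse from top-left to bottom-right to draw a box on the page, and we query the data for all points inside that box (e.g. 100<x<1000, 30<y<600).

Dimensions: the region of the user who clicked the point and the browser they used.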

Current implementation

Points pushed from upstream are processed before being stored in Redis.
Both x and y go through

Math.ceil(realx / 4.0) * 4;
Math.ceil(realy / 4.0) * 4;

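that is, each coordinate is rounded up to the next multiple of 4, so a cluster of neighboring points is stored in Redis as one point.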
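The rounding step above can be sketched in Python as follows. This is only an illustration of the same arithmetic; the names `snap`, `snap_point`, and `GRID` are hypothetical and do not come from the original post.

```python
import math

GRID = 4  # bucket size in pixels, taken from the snippet in the post


def snap(real: float, grid: int = GRID) -> int:
    """Round a raw coordinate up to the next multiple of `grid`."""
    return math.ceil(real / grid) * grid


def snap_point(real_x: float, real_y: float) -> tuple:
    """Apply the same rounding to both axes of a click position."""
    return snap(real_x), snap(real_y)
```

Under this rounding, raw clicks at (21, 477), (22, 478), and (24, 480) all land on the same stored point, which is what reduces the number of per-point keys.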

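Four zsets are used to meet the requirements.

One zset records the data for a given page at a given resolution on a given day
key: date_pageid_resolution; member: validOrInvalid_browser_region
score: number of clicks
Example key: 20140908_0001_1000
member: 0_1_1, where 0 means an invalid click, the first 1 is QQ Browser in the browser table, and the second 1 is Shanghai in the region table
score: 10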

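The data for each coordinate point is recorded in its own zset
key: date_pageid_resolution_x_y
member: validOrInvalid_browser_region
score: number of clicks
Example key: 20140908_0001_1000_23_478
member: 0_1_2, where 0 means an invalid click, 1 is QQ Browser in the browser table, and 2 is Beijing in the region table
score: 12
This can be read as: the point at (23,478), on 20140908, on the page with pageid 0001,
at resolution 1000, received 12 invalid clicks from users in Beijing using QQ Browser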
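To make the key and member layout concrete, here is a small sketch that decodes one per-point entry back into a record. The function names and the dictionary field names are hypothetical; only the underscore-separated layout comes from the post.

```python
def parse_point_key(key: str) -> dict:
    """Split a per-point key of the form date_pageid_resolution_x_y."""
    date, page_id, resolution, x, y = key.split("_")
    return {
        "date": date,
        "page_id": page_id,
        "resolution": int(resolution),
        "x": int(x),
        "y": int(y),
    }


def parse_member(member: str) -> dict:
    """Split a member of the form validOrInvalid_browser_region."""
    valid, browser_id, region_id = (int(part) for part in member.split("_"))
    return {"valid": valid, "browser_id": browser_id, "region_id": region_id}


def describe(key: str, member: str, score: float) -> dict:
    """Combine key, member, and score into one flat record."""
    return {**parse_point_key(key), **parse_member(member), "clicks": int(score)}
```

For the example above, `describe("20140908_0001_1000_23_478", "0_1_2", 12)` yields a record with x=23, y=478, valid=0, browser_id=1, region_id=2, and clicks=12.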


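Two zsets assist with range queries

zrangebyscore is used to get the key sets matching the x range and the y range separately (e.g. 100<x<1000, 30<y<600),

and their intersection gives the set of keys that actually need to be queried.

Helper zset for y
key: date_pageid_resolution_y, e.g. 20140908_0001_1000_y
member: date_pageid_resolution_x_y, e.g. 20140908_0001_1000_23_478
score: the y coordinate, e.g. 478

Helper zset for x
key: date_pageid_resolution_x, e.g. 20140908_0001_1000_x
member: date_pageid_resolution_x_y, e.g. 20140908_0001_1000_23_478
score: the x coordinate, e.g. 23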
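As a rough summary of the four zsets, the sketch below lists the Redis writes that a single click would imply under this layout. The post says the upstream write path is transparent, so this is an assumption about what ends up in Redis, not a description of the real writer; `write_ops` is a hypothetical helper that only builds command tuples and does not talk to a server.

```python
def write_ops(date, page_id, resolution, x, y, valid, browser_id, region_id):
    """Return the commands one click implies, as (command, key, number, member)."""
    base = f"{date}_{page_id}_{resolution}"          # per-page zset key
    point = f"{base}_{x}_{y}"                        # per-point zset key
    member = f"{valid}_{browser_id}_{region_id}"     # dimension member
    return [
        ("ZINCRBY", base, 1, member),      # page-level click count
        ("ZINCRBY", point, 1, member),     # point-level click count
        ("ZADD", f"{base}_x", x, point),   # x helper: score is the x coordinate
        ("ZADD", f"{base}_y", y, point),   # y helper: score is the y coordinate
    ]
```

The argument order follows the Redis commands themselves: `ZINCRBY key increment member` and `ZADD key score member`.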


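Problems with the current implementation

Queries are too slow

For example, to fetch all the points for a given page at a given resolution on a given day in one go,
I may need to query tens of thousands of keys at once, e.g. keys("20140908_0001_1000_*");
After obtaining the key set, I still have to call zrange(key) to get the members under each key, and then call
zscore(key,member) to get the score for each key and member.

As you can see, these operations run serially and are hard to parallelize.

Temporary workaround: run the query as an asynchronous task and cache the result to speed up reads, but this may still trigger Redis slow queries.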


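Box selection
Example: query the range 100<x<1000, 30<y<600

Call zrangeByScore(key, 100, 1000) on the x helper zset and zrangeByScore(key, 30, 600) on the y helper zset

to get the key sets matching the x range and the y range, then intersect them to get the final set of keys to query.

After obtaining the key set, call zrange(key) to get the members under each key,

then call zscore(key,member) to get the score for each key and member.

Drawbacks: because the query range is arbitrary, results cannot be cached, and when the range is large, i.e. when there are many keys, the query is very slow. As with fetching all coordinate points above, the operations run serially and are hard to parallelize, and they may trigger Redis slow queries.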
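The box selection flow described above can be sketched with plain dictionaries standing in for the zsets, so it runs without a Redis server. All names here are hypothetical. One detail worth noting: the real ZRANGEBYSCORE treats its bounds as inclusive unless they are prefixed with `(`, while this stand-in uses the strict inequalities written in the post.

```python
from collections import defaultdict


def range_by_score(index: dict, lo: float, hi: float) -> set:
    """Stand-in for an exclusive ZRANGEBYSCORE: members with lo < score < hi."""
    return {member for member, score in index.items() if lo < score < hi}


def box_select(x_index, y_index, point_zsets, x_range, y_range) -> dict:
    """Intersect the x and y matches, then total clicks by valid/invalid flag."""
    keys = range_by_score(x_index, *x_range) & range_by_score(y_index, *y_range)
    totals = defaultdict(int)
    for key in keys:
        # point_zsets[key] maps member -> click count, like one per-point zset
        for member, clicks in point_zsets[key].items():
            flag = member.split("_")[0]  # "0" = invalid in the post's examples
            totals[flag] += clicks
    return dict(totals)
```

The loop over `keys` is where the cost concentrates: every matching point is a separate zset, so a large box means one read per point.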


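I would like to know whether anyone has a better optimization strategy for my current implementation,
or a better design for these query requirements.
This is my first post as a newcomer; thanks to @暗雨西喧 for the reminder about formatting.
Any advice is welcome.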


reply all(2)
PHPzhong

"That is, when there are many keys, the query speed is very slow"

When you say queries are slow with many keys, do you mean the final step, querying the zsets that hold the actual click data?

How many resolutions will there be? If that is uncertain, you can leave the resolution out of the zset key and put it in the value instead. That cuts the number of keys considerably. If a search condition includes a resolution, fetch the values first and then filter by resolution; that should be very fast.

On box selection, where the range is variable:
Box selection query (equivalent to a range query): for a given day, page, and resolution,
query the total valid clicks and total invalid clicks of the coordinate points in a given range (e.g. 100<x<1000, 30<y<600).

This amounts to letting users draw an arbitrary search area by hand. Could you change the condition so that the whole image is cut into 10 parts (or 100, or 10,000), each a square, and the user can only select squares rather than draw freely? That way the data in each square can be summarized in advance.
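A minimal sketch of the fixed-grid idea, assuming a hypothetical tile edge of 100 pixels and click records given as `(x, y, valid_flag, clicks)` tuples with 0 = invalid and 1 = valid. None of these names or values come from the thread; they are only there to make the pre-aggregation concrete.

```python
from collections import defaultdict

TILE = 100  # hypothetical tile edge in pixels


def tile_of(x: int, y: int, tile: int = TILE) -> tuple:
    """Map a coordinate to the (column, row) of the square that contains it."""
    return x // tile, y // tile


def build_tile_totals(points, tile: int = TILE) -> dict:
    """Pre-aggregate clicks per square: tile -> (invalid_total, valid_total)."""
    totals = defaultdict(lambda: [0, 0])
    for x, y, valid_flag, clicks in points:
        totals[tile_of(x, y, tile)][valid_flag] += clicks
    return {t: tuple(v) for t, v in totals.items()}


def query_tiles(totals: dict, tiles) -> tuple:
    """Answer a selection of whole squares by summing the stored totals."""
    invalid = sum(totals.get(t, (0, 0))[0] for t in tiles)
    valid = sum(totals.get(t, (0, 0))[1] for t in tiles)
    return invalid, valid
```

With this shape, the cost of a selection depends on the number of squares chosen rather than on the number of points inside them, at the price of coarser selections.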

I'll stop here for now; see whether this helps. If you still need to optimize, please revise the query description in the question. Some parts can be guessed, but I can't tell whether that is what you meant, so please give simpler examples, write them out in detail, and format the post; as it stands it is tiring to read.


I am writing this part separately. It responds to the question after you revised it.

First of all, you are not using the essence of a zset, which is that it keeps members automatically sorted and indexed by score. It also seems you did not understand what I meant above by putting the resolution in the value, so let me give an example.

A zset records the data of a certain page and a certain resolution on a certain day
The key is date_pageid_resolution and the member is: valid OR invalid_browser_region
score is the number of clicks
Example: key : 20140908_0001_1000
member: 0_1_1 0 corresponds to invalid clicks, 1 corresponds to QQ browser in the browser table, 1 corresponds to Shanghai in the region table
score:10

Suppose there are 3 resolutions: A, B, C
Saving the key as you said will look like this
20140908_0001_A
20140908_0001_B
20140908_0001_C
The storage method I am talking about is
key:20140908_0001
member:valid OR invalid_browser_region_number of clicks
score:resolution

Searching this way, you only need to fetch page 0001 for 20140908 (just 1 key), then do a range query on the score for resolution A and read its members. This is not pleasant to use, though, since the resolution is not an expressive score, and using a zset this way has problems.

The above is just an example; don't actually do it this way. There is a better approach: after you revised the question and I understood the requirements, I came up with a new design.

zset:data set
key:date-page-resolution
score: coordinates (think about turning x and y into a number)
member: browser-region-number of valid clicks-number of invalid clicks
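The reply does not say how x and y should be turned into one number. One possible choice, shown here purely as an assumption, is a row-major encoding `score = y * STRIDE + x`, where `STRIDE` is a hypothetical constant larger than any x coordinate. Redis stores scores as double-precision floats, which represent integers exactly up to 2^53, so values of this size stay exact.

```python
STRIDE = 10_000  # hypothetical: must exceed the largest x coordinate


def encode(x: int, y: int, stride: int = STRIDE) -> int:
    """Pack (x, y) into a single sortable number, row by row."""
    return y * stride + x


def decode(score: float, stride: int = STRIDE) -> tuple:
    """Recover (x, y) from an encoded score."""
    y, x = divmod(int(score), stride)
    return x, y


def row_intervals(x_lo, x_hi, y_lo, y_hi, step=4, stride=STRIDE) -> list:
    """Inclusive score intervals, one per stored row, covering a box.

    `step` is the spacing between stored rows (4 in the original post).
    """
    first_row = -(-y_lo // step) * step  # round y_lo up to a multiple of step
    return [
        (encode(x_lo, y, stride), encode(x_hi, y, stride))
        for y in range(first_row, y_hi + 1, step)
    ]
```

With this encoding a box is not one contiguous score range: it becomes one range query per row, so a tall box still needs many range calls against the single data set key. Other encodings trade this off differently.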

If the date can become a selectable range, an extra set is needed to store the dates. Call it the date set:
key:page
score:date
member:data set key
The purpose of the date set is to index the data set keys. Your approach of using keys() is very slow because it scans the whole keyspace. Your example is for a single day, so as I understand it there may be no date range, in which case the date set is unnecessary. Similarly, if there are too many resolutions to enumerate, you can build a set of keys for them in the same way.

As for your two coordinate zsets, I did not look at them closely. Think carefully about how you use zsets there.

You gave 4 query examples below

A Query the number of all clicks on a certain page on a certain day, that is, the total number of valid clicks + the total number of invalid clicks

B Query the total number of valid clicks and the total number of invalid clicks on a certain page and a certain resolution on a certain day

C Query all the coordinate points and number of clicks on a certain page and a certain resolution on a certain day

D Box selection query (equivalent to range query) Query the total number of valid clicks and the total number of invalid clicks at a certain range (such as 100<x<1000,30<y<600) coordinate points on a certain day at a certain resolution on a certain page

A: You said there are 3 resolutions, so append each of the 3 resolutions to the key and fetch everything with range 0 -1:
20150415-page1-1380,20150415-page1-1190,20150415-page1-1000

B: This one is easy. Query a single key and fetch everything with range 0 -1:
20150415-page1-1380

C: Same as the first two; they already return the coordinates as well, you just did not display them.

D: Use your coordinate sets to get the keys, then query the data set by coordinate range.
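Answers A and B come down to reading every entry of a data set key and adding up the counts carried in the members. The sketch below assumes the member layout from the reply, `browser-region-valid-invalid`, and takes `(member, score)` pairs as input, which is the shape a full-range read with scores would return. The function name and the sample values are hypothetical.

```python
def sum_clicks(entries) -> tuple:
    """Total (valid, invalid) clicks over (member, score) pairs of one data set."""
    valid = invalid = 0
    for member, _score in entries:
        # member layout assumed from the reply: browser-region-valid-invalid
        _browser, _region, v, i = member.split("-")
        valid += int(v)
        invalid += int(i)
    return valid, invalid
```

For query B this runs over one key; for query A it runs over the three resolution keys and the results are added. One caveat: zset members are unique within a key, so two points with the same browser, region, and counts would collide under this layout. Including the coordinate in the member would avoid that, but the reply does not specify it.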

After finishing, I noticed a small problem while checking for typos. It seems you need to record valid and invalid counts per browser and per region? If not, the member in the data set can simply record the valid and invalid counts. If so, the design has to take into account how many browsers and regions there are, which your question does not seem to describe.

Ty80

My understanding of Redis may differ from the asker's. The way I see it, the requirements above could be met by:

recording logs, then transferring the data via ETL,

and finally making it available for querying.
