Hadoop Pig Uv
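There are several ways to compute UV (unique visitors). Rough approaches count by IP address or by an ID issued by the server; Baidu, for example, appears to use something like BDUSS or BAIDUID.
Computing UV for mobile users, however, seems to be more complicated than on PC, thanks to the many off-brand device manufacturers, the variety of platforms, and factors such as user permissions.
On some Android versions, for example, reading the device's MAC address or IMEI requires the user's authorization, and the same can happen with the MAC address and OpenUDID on iOS.
The criterion for UV therefore differs by platform: Android uses mac + imei, while iOS uses mac + openudid.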
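As a small illustration of the per-platform rule, the sketch below picks the identifier pair for a record. The field names (`mac`, `imei`, `openudid`), the platform strings, and the dict-shaped record are assumptions for illustration; the text only states which identifiers each platform uses.

```python
# Hypothetical mapping from platform to the identifier fields used for UV.
ID_FIELDS = {
    "android": ("mac", "imei"),
    "ios": ("mac", "openudid"),
}


def identity(platform, record):
    """Return the identifier pair for this platform, with None for missing values."""
    fields = ID_FIELDS[platform.lower()]
    # Treat both absent keys and empty strings as missing.
    return tuple(record.get(field) or None for field in fields)


print(identity("android", {"mac": "A", "imei": "1"}))
print(identity("ios", {"mac": "", "openudid": "x"}))
```

Normalizing empty strings to `None` here keeps the later "has MAC / has IMEI" checks uniform.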
Taking the Android platform as an example, the records fall into four cases, as in the following sample:
MAC | IMEI |
A | 1 |
(empty) | 1 |
A | (empty) |
B | 2 |
C | (empty) |
(empty) | 3 |
D | 4 |
From the example above, the following cases arise:
1. MAC present, IMEI present
2. MAC present, IMEI missing
3. MAC missing, IMEI present
4. MAC missing, IMEI missing
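The four cases can be expressed as a small classifier. This is only a sketch: the sample rows assume that letters in the table are MAC values and digits are IMEI values, and that a missing value is represented as `None`.

```python
def classify(mac, imei):
    """Return the case number (1 to 4) from the list above."""
    has_mac = bool(mac)
    has_imei = bool(imei)
    if has_mac and has_imei:
        return 1
    if has_mac:
        return 2
    if has_imei:
        return 3
    return 4


# Sample rows as (mac, imei), under the reading of the table described above.
rows = [("A", "1"), (None, "1"), ("A", None), ("B", "2"),
        ("C", None), (None, "3"), ("D", "4")]
cases = [classify(mac, imei) for mac, imei in rows]
print(cases)  # [1, 3, 2, 1, 2, 3, 1]
```

Case 4 (neither identifier) does not occur in the sample rows, but the classifier still covers it.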
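The method for computing UV is simple. The basic idea is:
1. Take the records that have both a MAC and an IMEI as set A
2. Take the records with a non-empty MAC as set B
3. Take the records with an empty MAC as set C
4. B LEFT JOIN A BY mac gives set D
5. FILTER D BY imei IS NULL (the imei column coming from A, i.e. rows with no match in A) gives set E
6. C LEFT JOIN A BY imei gives set F
7. FILTER F BY mac IS NULL (the mac column coming from A, i.e. rows with no match in A) gives set G
8. UNIQUESET = UNION G, E, A
Following the steps above, the Pig implementation is: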
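<code>
-- UVSET is assumed to be loaded already, with at least the fields mac and imei.
A = FILTER UVSET BY (mac is not null) AND (imei is not null);
B = FILTER UVSET BY (mac is not null);
C = FILTER UVSET BY (mac is null);
-- Records with a MAC that does not appear in A.
D = JOIN B BY mac LEFT OUTER, A BY mac;
E = FILTER D BY (A::mac is null);
E1 = FOREACH E GENERATE B::mac AS mac, B::imei AS imei;
-- Records without a MAC whose IMEI does not appear in A.
F = JOIN C BY imei LEFT OUTER, A BY imei;
G = FILTER F BY (A::imei is null);
G1 = FOREACH G GENERATE C::mac AS mac, C::imei AS imei;
-- Project A to the same two columns, then de-duplicate: UNION keeps duplicate rows.
A1 = FOREACH A GENERATE mac, imei;
ALLSET = UNION G1, E1, A1;
UNIQUESET = DISTINCT ALLSET;
TMPSET = GROUP UNIQUESET ALL;
-- COUNT skips tuples whose first field (mac) is null; COUNT_STAR counts every tuple.
OUTRES = FOREACH TMPSET GENERATE COUNT_STAR(UNIQUESET);
DUMP OUTRES;
</code>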
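To sanity-check the logic on a small input, steps 1 to 8 can be mirrored in plain Python. This is a sketch under assumptions, not part of the original script: records are `(mac, imei)` tuples with `None` for a missing value, Python sets stand in for the relations (so duplicate rows collapse), and the sample rows read the letters in the table as MAC values and the digits as IMEI values.

```python
def count_uv(records):
    """Count unique visitors from (mac, imei) pairs, following steps 1 to 8."""
    records = set(records)  # duplicate rows collapse here
    a = {r for r in records if r[0] is not None and r[1] is not None}
    b = {r for r in records if r[0] is not None}
    c = {r for r in records if r[0] is None}

    macs_in_a = {mac for mac, _ in a}
    imeis_in_a = {imei for _, imei in a}

    # Steps 4 and 5: rows of B whose MAC has no match in A.
    e = {r for r in b if r[0] not in macs_in_a}
    # Steps 6 and 7: rows of C whose IMEI has no match in A
    # (a missing IMEI never matches, as with a null join key).
    g = {r for r in c if r[1] not in imeis_in_a}

    # Step 8: union of the three sets.
    return len(a | e | g)


sample = [("A", "1"), (None, "1"), ("A", None), ("B", "2"),
          ("C", None), (None, "3"), ("D", "4")]
print(count_uv(sample))  # 5
```

On the sample, the MAC-only row `A` and the IMEI-only row `1` are absorbed by the full record `(A, 1)`, leaving five visitors: `(A, 1)`, `(B, 2)`, `(D, 4)`, `C`, and `3`.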
Original source: Hadoop Pig Uv. Thanks to the original author for sharing.
