Implementing Simple Linear Regression in PHP: A Data Research Tool


WBOY

Jun 13, 2016, 10:33 AM


Concepts

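The basic goal behind simple linear regression modeling is to find the best-fitting straight line through a two-dimensional plane of paired X and Y values (that is, X and Y measurements). Once the line has been found using the least-squares method, various statistical tests can be performed to determine how well the line fits the observed deviations in the Y values.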

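The linear equation (y = mx + b) has two parameters that must be estimated from the supplied X and Y data: the slope (m) and the y-intercept (b). Once these two parameters have been estimated, you can plug observed X values into the linear equation and examine the predicted Y values it produces.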

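To estimate m and b with the least-squares method, you look for the estimates of m and b that minimize, over all X values, the difference between the observed and predicted Y values. The difference between an observed and a predicted value is called the error, y_i - (m*x_i + b). If you square each error and then sum these squared residuals, the result is a number called the sum of squared prediction errors. Finding the best-fitting line by least squares means finding the estimates of m and b that minimize this sum.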
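The quantity being minimized can be sketched in a few lines of PHP. The function name sumSquaredErrors() and the sample data are illustrative assumptions, not part of the SimpleLinearRegression class.

```php
<?php
// Sum of squared prediction errors for a candidate line y = m*x + b.
// $x and $y are parallel arrays of observations (made-up sample data).
function sumSquaredErrors(array $x, array $y, float $m, float $b): float
{
    $sse = 0.0;
    foreach ($x as $i => $xi) {
        $error = $y[$i] - ($m * $xi + $b); // observed minus predicted
        $sse  += $error * $error;
    }
    return $sse;
}

$x = [1, 2, 3, 4];
$y = [3, 5, 7, 9]; // lies exactly on y = 2x + 1

echo sumSquaredErrors($x, $y, 2.0, 1.0), "\n"; // 0  (perfect fit)
echo sumSquaredErrors($x, $y, 1.0, 1.0), "\n"; // 30 (poorer fit)
```

A smaller result means the candidate line sits closer to the data, which is the criterion both estimation approaches described next try to satisfy.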

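There are two basic ways to find estimates of m and b that satisfy the least-squares criterion. The first is a numerical search procedure that tries different values of m and b, evaluates them, and settles on the estimates that yield the smallest sum of squared errors. The second is to use calculus to derive equations for estimating m and b. I will not go into the calculus involved in deriving these equations, but I do use the resulting analytic equations in the SimpleLinearRegression class to find the least-squares estimates of m and b (see the getSlope() and getYIntercept() methods in the SimpleLinearRegression class).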
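As a rough illustration of the second approach, the standard closed-form estimates are m = Sxy / Sxx and b = mean(Y) - m * mean(X), where Sxy is the sum of cross-products of deviations and Sxx is the sum of squared X deviations. The helper fitLine() below is a hypothetical sketch under that assumption; it is not the actual code of getSlope() or getYIntercept().

```php
<?php
// Closed-form least-squares estimates for y = m*x + b.
// fitLine() is a hypothetical helper, not the SimpleLinearRegression API.
function fitLine(array $x, array $y): array
{
    $n     = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0; // sum of cross-products of deviations
    $sxx = 0.0; // sum of squared deviations of X
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }

    $m = $sxy / $sxx;          // slope
    $b = $meanY - $m * $meanX; // y-intercept
    return ['slope' => $m, 'intercept' => $b];
}

$fit = fitLine([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
printf("y = %.2fx + %.2f\n", $fit['slope'], $fit['intercept']); // y = 0.60x + 2.20
```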

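Having equations that yield the least-squares estimates of m and b does not mean that plugging these parameters into the linear equation produces a line that fits the data well. The next step in the simple linear regression process is to determine whether the remaining prediction error is acceptable.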

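A statistical decision procedure can be used to evaluate the alternative hypothesis that a straight line fits the data. The procedure is based on computing the T statistic and using a probability function to find the probability of observing a value that large by chance. As mentioned in Part 1, the SimpleLinearRegression class generates a number of summary values, and one of the most important is the T statistic, which measures how well the linear equation fits the data. If the fit is good, the T statistic tends to be large; if the T value is small, you should replace your linear equation with a default model that assumes the mean of the Y values is the best predictor (because the mean of a set of values is often a useful predictor of the next observation).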

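To test whether the T statistic is large enough to abandon the mean of the Y values as the best predictor, you compute the probability of obtaining that T statistic by chance. If the probability is low, you can reject the null hypothesis that the mean is the best predictor and, correspondingly, be confident that the simple linear model fits the data well. (For more on computing the probability of a T statistic, see Part 1.)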
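The sketch below computes a T statistic for a fitted line, assuming the usual definition for simple regression: the slope estimate divided by its standard error, with n - 2 degrees of freedom. The helper slopeTStatistic() and the sample data are hypothetical; the article's class may compute the value differently.

```php
<?php
// T statistic for the slope: estimate divided by its standard error.
// slopeTStatistic() is a hypothetical helper for illustration only.
function slopeTStatistic(array $x, array $y): array
{
    $n     = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $m = $sxy / $sxx;
    $b = $meanY - $m * $meanX;

    // Residual sum of squares around the fitted line.
    $sse = 0.0;
    foreach ($x as $i => $xi) {
        $sse += ($y[$i] - ($m * $xi + $b)) ** 2;
    }

    $df     = $n - 2;                    // two parameters were estimated
    $stdErr = sqrt(($sse / $df) / $sxx); // standard error of the slope
    return ['t' => $m / $stdErr, 'df' => $df];
}

$result = slopeTStatistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
printf("t = %.4f with %d degrees of freedom\n", $result['t'], $result['df']);
```

The resulting T value and degrees of freedom are the inputs a T probability function needs in order to decide whether the null hypothesis can be rejected.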

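Back to the statistical decision procedure: it tells you when to reject the null hypothesis, but it does not tell you whether to accept the alternative hypothesis. In a research setting, the linear-model alternative hypothesis has to be established through both theoretical and statistical arguments.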

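The data research tool you will build implements the statistical decision procedure for linear models (the T test) and provides summary data that can be used to construct the theoretical and statistical arguments needed to establish a linear model. The tool can be classified as a decision-support tool that lets knowledge workers study patterns in small to medium-sized data sets.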

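From a learning perspective, simple linear regression modeling is worth studying because it is the gateway to understanding more advanced forms of statistical modeling. For example, many of the core concepts in simple linear regression lay a solid foundation for understanding Multiple Regression, Factor Analysis, and Time Series.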

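Simple linear regression is also a versatile modeling technique. By transforming the raw data (usually with a logarithmic or power transformation), you can use it to model curvilinear data. These transformations linearize the data so that simple linear regression can be applied. The resulting linear model is expressed as a linear formula in terms of the transformed values.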
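As one hedged example of such a transformation, a power curve y = a * x^k becomes the straight line log(y) = log(a) + k * log(x) once both variables are log-transformed. The helper fitPowerCurve() and the data below are assumptions made for illustration.

```php
<?php
// Fit y = a * x^k by running simple linear regression on log(x) and log(y).
// fitPowerCurve() is a hypothetical helper; all x and y must be positive.
function fitPowerCurve(array $x, array $y): array
{
    $logX = array_map('log', $x);
    $logY = array_map('log', $y);

    $n     = count($logX);
    $meanX = array_sum($logX) / $n;
    $meanY = array_sum($logY) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($logX as $i => $lx) {
        $sxy += ($lx - $meanX) * ($logY[$i] - $meanY);
        $sxx += ($lx - $meanX) ** 2;
    }

    $slope     = $sxy / $sxx;              // estimates the exponent k
    $intercept = $meanY - $slope * $meanX; // estimates log(a)
    return ['a' => exp($intercept), 'k' => $slope];
}

// Sample data generated from y = 2 * x^2.
$curve = fitPowerCurve([1, 2, 4, 8], [2, 8, 32, 128]);
printf("y = %.2f * x^%.2f\n", $curve['a'], $curve['k']); // y = 2.00 * x^2.00
```

Note that the fitted slope and intercept describe the transformed values, so the intercept has to be exponentiated to recover a on the original scale.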

Probability functions

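In the previous article, I sidestepped the problem of implementing probability functions in PHP by handing the probability calculation off to R. I was not entirely satisfied with that solution, so I began looking into what it would take to develop PHP-based probability functions.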

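I started searching the web for information and code. One source of both was the probability functions in the book Numerical Recipes in C. I re-implemented some of that probability function code (the gammln.c and betai.c functions) in PHP, but I was still not satisfied with the results. Compared with some other implementations, it seemed to require more code. I also needed inverse probability functions.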

Fortunately, I stumbled upon John Pezzullo's Interactive Statistical Calculation. John's site on probability distribution functions has all the functions I needed, implemented in JavaScript for ease of learning.

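I ported the Student T and Fisher F functions to PHP. I changed the API slightly to follow Java naming conventions and embedded all the functions in a class named Distribution. A nice feature of this implementation is the doCommonMath method, which all the functions in the library reuse. The other tests that I did not bother to implement (the normal test and the chi-square test) also use the doCommonMath method.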

Another aspect of the port is worth noting. In JavaScript, you can assign dynamically determined values to instance variables, for example:

            var PiD2 = pi() / 2


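You cannot do this in PHP; only simple constant values can be assigned to instance variables. Hopefully this shortcoming will be addressed in PHP5.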
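Two common workarounds are sketched here: use a built-in constant such as M_PI_2 as the property default, or assign computed values in the constructor. The class name DistributionConstants and its properties are hypothetical and do not appear in the original port; the constructor syntax shown requires PHP 5 or later.

```php
<?php
// Hypothetical class showing two ways to obtain "computed" property values.
class DistributionConstants
{
    // A property default must be a constant expression, so a function call
    // such as pi() / 2 is not accepted here; the built-in M_PI_2 constant is.
    public $piD2 = M_PI_2;

    public $twoPi;

    public function __construct()
    {
        // Values that need a function call can be assigned at construction time.
        $this->twoPi = 2 * pi();
    }
}

$c = new DistributionConstants();
printf("%.4f %.4f\n", $c->piD2, $c->twoPi); // 1.5708 6.2832
```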

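Note that the code in Listing 1 does not define instance variables, because in the JavaScript version they are assigned values dynamically.

Listing 1. Implementing the probability functions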

            <?php
            // Distribution.php
            // Copyright John Pezzullo
            // Released under same terms as PHP.
            // PHP Port and OOfying by Paul Meagher
            class Distribution {
                // Series summation reused by all of the distribution functions.
                function doCommonMath($q, $i, $j, $b) {
                    $zz = 1;
                    $z  = $zz;
                    $k  = $i;
                    while ($k <= $j) {
                        $zz = $zz * $q * $k / ($k - $b);
                        $z  = $z + $zz;
                        $k  = $k + 2;
                    }
                    return $z;
                }
                // Probability for |t| with $df degrees of freedom.
                function getStudentT($t, $df) {
                    $t  = abs($t);
                    $w  = $t / sqrt($df);
                    $th = atan($w);
                    if ($df == 1) {
                        return 1 - $th / (pi() / 2);
                    }
                    $sth = sin($th);
                    $cth = cos($th);
                    if (($df % 2) == 1) {
                        // Odd degrees of freedom.
                        return 1 - ($th + $sth * $cth * $this->doCommonMath($cth * $cth, 2, $df - 3, -1))
                            / (pi() / 2);
                    } else {
                        // Even degrees of freedom.
                        return 1 - $sth * $this->doCommonMath($cth * $cth, 1, $df - 3, -1);
                    }
                }
                // Searches for the t whose probability is $p by repeatedly
                // halving the step $dv; t = 1/v - 1 maps v in (0, 1) onto t > 0.
                function getInverseStudentT($p, $df) {
                    $v  = 0.5;
                    $dv = 0.5;
                    $t  = 0;
                    while ($dv > 1e-6) {
                        $t  = (1 / $v) - 1;
                        $dv = $dv / 2;
                        if ($this->getStudentT($t, $df) > $p) {
                            $v = $v - $dv;
                        } else {
                            $v = $v + $dv;
                        }
                    }
                    return $t;
                }
                function getFisherF($f, $n1, $n2) {
                    // implemented but not shown
                }
                function getInverseFisherF($p, $n1, $n2) {
                    // implemented but not shown
                }
            }
            ?>
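The search strategy used by getInverseStudentT can be checked in isolation for the one case with a simple closed form, df = 1, where the probability is 1 - atan(t) / (pi / 2) and the exact inverse is tan((1 - p) * pi / 2). The standalone functions studentTOneDf() and inverseByBisection() below are illustrative stand-ins, not members of the Distribution class.

```php
<?php
// T probability for df = 1, matching the $df == 1 branch of Listing 1.
function studentTOneDf(float $t): float
{
    return 1 - atan(abs($t)) / (M_PI / 2);
}

// Bisection over v in (0, 1), with t = 1/v - 1 mapping v onto t > 0.
function inverseByBisection(callable $prob, float $p): float
{
    $v  = 0.5;
    $dv = 0.5;
    $t  = 0.0;
    while ($dv > 1e-6) {
        $t  = 1 / $v - 1;
        $dv = $dv / 2;
        if ($prob($t) > $p) {
            $v -= $dv; // probability too high: need a larger t, so a smaller v
        } else {
            $v += $dv;
        }
    }
    return $t;
}

$t     = inverseByBisection('studentTOneDf', 0.05);
$exact = tan(0.95 * M_PI / 2); // analytic inverse for df = 1

printf("bisection: %.2f, exact: %.2f\n", $t, $exact);
```

Because the step is halved until it drops below 1e-6, the two values should agree to a few decimal places; a tighter threshold would trade more iterations for more precision.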


Output methods

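Now that the probability functions are implemented in PHP, the only remaining challenge in developing a PHP-based data research tool is designing methods for displaying the analysis results.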

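A simple solution is to display the values of all the instance variables on screen as needed. I did this in the first article when displaying the linear equation, T value, and T probability for the Burnout Study. Being able to access specific values for specific purposes is helpful, and SimpleLinearRegression supports this kind of use.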

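Another approach, however, is to group the parts of the output systematically. If you study the output of the major statistical packages used for regression analysis, you will find that they tend to group their output in the same way: a Summary Table, an Analysis of Variance table, a Parameter Estimate table, and an R Value. Similarly, I created output methods with the following names: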

showSummaryTable()
