The Importance of Math Libraries for Implementing Simple Linear Regression in PHP

Jul 20, 2016 am 11:17 AM

Compared to other open source languages such as Perl and Python, the PHP community lacks a strong effort to develop math libraries.

One reason may be that a large number of mature mathematical tools already exist, which may discourage the community from developing its own PHP tools. For example, I have worked with a powerful tool, the S System, which has an impressive set of statistical libraries, was designed specifically for analyzing data sets, and won an ACM award in 1998 for its language design. If S or its open source cousin R is just a shell_exec call away, why go to the trouble of implementing the same statistical computing functionality in PHP? For more information about the S System, its ACM award, or R, see the related references.

Isn’t this a waste of developer energy? If the motivation for developing a PHP math library is to save developer effort and use the best tool for the job, then PHP's current course makes sense.

On the other hand, pedagogical motivations may encourage the development of PHP math libraries. For about 10% of people, mathematics is an interesting subject to explore. For those who are also proficient in PHP, developing a PHP math library can reinforce the process of learning math. In other words, don't just read the chapter about T-tests; also implement a class that calculates the corresponding intermediate values and displays them in a standard format.

Through explanation and example, I hope to demonstrate that developing a PHP math library is not a difficult task and may represent an interesting technical and learning challenge. In this article, I provide an example PHP math class called SimpleLinearRegression that demonstrates a general approach to developing PHP math libraries. Let's start by discussing the general principles that guided me in developing the SimpleLinearRegression class.

Guiding Principles

I used six general principles to guide the development of the SimpleLinearRegression class.

Create a class for each analysis model.
Use backward chaining to develop classes.
Expect a large number of getters.
Store intermediate results.
Prefer a verbose API.
Perfection is not the goal.

Let’s examine each of these guidelines in more detail.

Create a class for each analysis model

Each major analysis test or procedure should have a PHP class with the same name as the test or procedure. This class contains input functions, functions for calculating intermediate and summary values, and output functions that display the intermediate and summary values on screen in text or graphic format.

Use backward chaining to develop classes

In mathematical programming, the coding target is usually the standard output values that an analysis procedure (such as MultipleRegression, TimeSeries, or ChiSquared) is expected to produce. From a problem-solving perspective, this means you can use backward chaining to develop the methods of a math class.

For example, the summary output screen displays one or more summary statistics. These summary statistics depend on intermediate statistics, which may in turn depend on deeper intermediate statistics, and so on. This backward-chaining approach to development leads to the next principle.

Expect a large number of getters

Most of the work in developing a math class involves calculating intermediate and summary values. In practice, this means you shouldn't be surprised if your class contains many getter methods that calculate intermediate and summary values.

Store intermediate results

Storing intermediate calculation results within a result object allows you to use the intermediate results as input for subsequent calculations. This principle is implemented in the S language design. In the current context, this principle is implemented by selecting instance variables to represent calculated intermediate values and summary results.
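The principles discussed so far (backward chaining, many getters, stored intermediate results) can be illustrated with a much smaller class. This is only a sketch: the DescriptiveStats class and its member names are hypothetical and are not part of SimpleLinearRegression. The target value is the sample variance, and working backward from it shows that the sum of squared deviations and the mean are needed first.

```php
<?php
// Hypothetical illustration; not part of the SimpleLinearRegression class.
class DescriptiveStats
{
    public $Values = array();
    public $n;
    public $Mean;
    public $SumSquaredDeviations;
    public $Variance;

    public function __construct(array $Values)
    {
        if (count($Values) < 2) {
            throw new InvalidArgumentException("At least 2 values are required.");
        }
        $this->Values = $Values;
        $this->n      = count($Values);
        // Working backward from the target (Variance), each getter relies
        // on the value stored by the call before it.
        $this->Mean                 = $this->getMean();
        $this->SumSquaredDeviations = $this->getSumSquaredDeviations();
        $this->Variance             = $this->getVariance();
    }

    public function getMean()
    {
        return array_sum($this->Values) / $this->n;
    }

    public function getSumSquaredDeviations()
    {
        $sum = 0.0;
        foreach ($this->Values as $value) {
            $sum += pow($value - $this->Mean, 2);
        }
        return $sum;
    }

    public function getVariance()
    {
        // Sample variance, with n - 1 in the denominator.
        return $this->SumSquaredDeviations / ($this->n - 1);
    }
}

$stats = new DescriptiveStats(array(2, 4, 4, 4, 5, 5, 7, 9));
printf("%01.2f\n", $stats->Variance);
```

Because every intermediate value is kept as a property, a caller can inspect $stats->Mean or $stats->SumSquaredDeviations without recomputing anything.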

Prefer a verbose API

When developing a naming scheme for the member functions and instance variables of the SimpleLinearRegression class, I found that longer names (such as getSumSquaredError instead of getYY2) make it easier to understand what a function does and what a variable means.

I haven’t given up on abbreviated names entirely; however, when I use an abbreviated form of a name, I have to try to provide a comment that fully explains the meaning of the name. My take is this: highly abbreviated naming schemes are common in mathematical programming, but they make it more difficult to understand and prove that a certain mathematical routine is correct than it need be.

Perfection is not the goal

The goal of this coding exercise is not necessarily to develop a highly optimized and rigorous math engine for PHP. In the early stages, emphasis should be placed on learning to implement meaningful analytical tests and solving difficult problems in this area.

Instance variables

When modeling a statistical test or procedure, you need to decide which instance variables to declare.

The selection of instance variables can be determined by accounting for the intermediate and summary values generated by the analysis process. Each intermediate and summary value can have a corresponding instance variable, with the variable's value as an object property.

I used this analysis to determine which variables to declare for the SimpleLinearRegression class in Listing 1. Similar analysis can be performed on MultipleRegression, ANOVA, or TimeSeries procedures.

Listing 1. Instance variables of the SimpleLinearRegression class

<?php
// Copyright 2003, Paul Meagher
// Distributed under GPL
class SimpleLinearRegression {
var $n;
var $X = array();
var $Y = array();
var $ConfInt;
var $Alpha;
var $XMean;
var $YMean;
var $SumXX;
var $SumXY;
var $SumYY;
var $Slope;
var $YInt;
var $PredictedY = array();
var $Error = array();
var $SquaredError = array();
var $TotalError;
var $SumError;
var $SumSquaredError;
var $ErrorVariance;
var $StdErr;
var $SlopeStdErr;
var $SlopeTVal; // T value of Slope
var $YIntStdErr;
var $YIntTVal; // T value for Y Intercept
var $R;
var $RSquared;
var $DF; // Degrees of Freedom
var $SlopeProb; // Probability of Slope Estimate
var $YIntProb; // Probability of Y Intercept Estimate
var $AlphaTVal; // T Value for given alpha setting
var $ConfIntOfSlope;
var $RPath = "/usr/local/bin/R"; // Your path here

var $format = "%01.2f"; // Used for formatting output

}
?>

Constructor

The constructor method of the SimpleLinearRegression class accepts an X vector and a Y vector, each with the same number of values. You can also pass a confidence interval for your expected Y values; it defaults to 95%.

The constructor starts by verifying that the data is in a form suitable for processing. Once the input vectors pass the "equal size" and "more than one value" tests, the core part of the algorithm is executed.

Performing this task involves calculating the intermediate and summary values of the statistical procedure through a series of getter methods. The return value of each method call is assigned to an instance variable of the class. Storing the results this way ensures that intermediate and summary values are available to calling routines in chained calculations. You can also display these results by calling the output method of the class, as described in Listing 2.

Listing 2. Calling class output method

<?php
// Copyright 2003, Paul Meagher
// Distributed under GPL
function SimpleLinearRegression($X, $Y, $ConfidenceInterval="95") {
$numX = count($X);
$numY = count($Y);
if ($numX != $numY) {
    die("Error: Size of X and Y vectors must be the same.");
}
if ($numX <= 1) {
    die("Error: Size of input array must be at least 2.");
}

$this->n               = $numX;
$this->X               = $X;
$this->Y               = $Y;

$this->ConfInt         = $ConfidenceInterval;
$this->Alpha           = (1 + ($this->ConfInt / 100) ) / 2;
$this->XMean           = $this->getMean($this->X);
$this->YMean           = $this->getMean($this->Y);
$this->SumXX           = $this->getSumXX();
$this->SumYY           = $this->getSumYY();
$this->SumXY           = $this->getSumXY();
$this->Slope           = $this->getSlope();
$this->YInt            = $this->getYInt();
$this->PredictedY      = $this->getPredictedY();
$this->Error           = $this->getError();
$this->SquaredError    = $this->getSquaredError();
$this->SumError        = $this->getSumError();
$this->TotalError      = $this->getTotalError();
$this->SumSquaredError = $this->getSumSquaredError();
$this->ErrorVariance   = $this->getErrorVariance();
$this->StdErr          = $this->getStdErr();
$this->SlopeStdErr     = $this->getSlopeStdErr();
$this->YIntStdErr      = $this->getYIntStdErr();
$this->SlopeTVal       = $this->getSlopeTVal();
$this->YIntTVal        = $this->getYIntTVal();
$this->R               = $this->getR();
$this->RSquared        = $this->getRSquared();
$this->DF              = $this->getDF();
$this->SlopeProb       = $this->getStudentProb($this->SlopeTVal, $this->DF);
$this->YIntProb        = $this->getStudentProb($this->YIntTVal, $this->DF);
$this->AlphaTVal       = $this->getInverseStudentProb($this->Alpha, $this->DF);
$this->ConfIntOfSlope = $this->getConfIntOfSlope();
return true;
}
?>
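Listing 2 uses a PHP 4-style constructor (a method named after the class), which PHP 8 no longer treats as a constructor. The following is a minimal sketch of the same input checks written with __construct and exceptions instead of die(). The class name RegressionInput is hypothetical and only covers the validation and Alpha steps.

```php
<?php
// Hypothetical sketch: the input checks from Listing 2, written with
// __construct and exceptions. RegressionInput is an illustrative name.
class RegressionInput
{
    public $n;
    public $X;
    public $Y;
    public $ConfInt;
    public $Alpha;

    public function __construct(array $X, array $Y, $ConfidenceInterval = 95)
    {
        if (count($X) != count($Y)) {
            throw new InvalidArgumentException("Size of X and Y vectors must be the same.");
        }
        if (count($X) <= 1) {
            throw new InvalidArgumentException("Size of input array must be at least 2.");
        }
        $this->n       = count($X);
        $this->X       = $X;
        $this->Y       = $Y;
        $this->ConfInt = $ConfidenceInterval;
        // For a 95% interval this gives 0.975, the quantile that Listing 2
        // later passes to getInverseStudentProb().
        $this->Alpha   = (1 + ($this->ConfInt / 100)) / 2;
    }
}

$input = new RegressionInput(array(1, 2, 3), array(2, 4, 5));
echo $input->Alpha, "\n";
```

Throwing an exception lets the calling script decide how to report bad input, whereas die() ends the request immediately.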

Method names and their order were derived through a combination of backward chaining and reference to an undergraduate statistics textbook, which explains step by step how to calculate the intermediate values. I derived each method name by prefixing the name of the intermediate value to be calculated with "get".
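The bodies of the getter methods are not listed in this article. As a rough sketch of what the first few might compute, the following standalone functions use the textbook least-squares formulas: the slope is the sum of cross products of deviations divided by the sum of squared X deviations, and the intercept is the Y mean minus the slope times the X mean. The function names are hypothetical, and the assumption that SumXX and SumXY are sums of deviations from the means is mine, not the author's.

```php
<?php
// Hypothetical standalone versions of the first few getters, assuming the
// textbook least-squares formulas. Not taken from the original class.

function mean(array $values)
{
    return array_sum($values) / count($values);
}

// Sum of products of deviations from the means; passing the same array
// twice yields the sum of squared deviations.
function sumOfProducts(array $X, array $Y)
{
    $xMean = mean($X);
    $yMean = mean($Y);
    $sum   = 0.0;
    foreach ($X as $i => $x) {
        $sum += ($x - $xMean) * ($Y[$i] - $yMean);
    }
    return $sum;
}

function leastSquaresFit(array $X, array $Y)
{
    $sumXX = sumOfProducts($X, $X);
    $sumXY = sumOfProducts($X, $Y);
    // If every X value is identical, $sumXX is 0 and no line can be fitted.
    $slope = $sumXY / $sumXX;
    $yInt  = mean($Y) - $slope * mean($X);
    return array('Slope' => $slope, 'YInt' => $yInt);
}

// Points that lie exactly on y = 1 + 2x.
$fit = leastSquaresFit(array(1, 2, 3, 4), array(3, 5, 7, 9));
printf("Slope: %01.2f, YInt: %01.2f\n", $fit['Slope'], $fit['YInt']);
```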

Fit the model to the data

The SimpleLinearRegression procedure produces a straight line fitted to the data, where the line has the following standard equation:

y = b + mx

In PHP, this equation looks similar to Listing 3:

Listing 3. PHP equations to fit the model to the data

$PredictedY[$i] = $YIntercept + $Slope * $X[$i]

The SimpleLinearRegression class uses the least squares criterion to derive estimates of the Y-intercept (Y Intercept) and slope (Slope) parameters. These estimated parameters are used to construct a linear equation (see Listing 3) that models the relationship between the X and Y values.

Using the derived linear equation, you can get the predicted Y value corresponding to each X value. If the linear equation fits the data well, then the observed and predicted values of Y tend to be consistent.
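To make the comparison between observed and predicted values concrete, here is a small sketch that applies the equation from Listing 3 to every X value and collects the errors. The function name getResidualSummary is hypothetical, and defining the error as observed minus predicted is an assumption, since the article does not show how the class computes it.

```php
<?php
// Hypothetical helper: predicted values and errors for a fitted line.
// Assumes error = observed Y - predicted Y.
function getResidualSummary(array $X, array $Y, $YInt, $Slope)
{
    $PredictedY   = array();
    $Error        = array();
    $SquaredError = array();
    foreach ($X as $i => $x) {
        $PredictedY[$i]   = $YInt + $Slope * $x;
        $Error[$i]        = $Y[$i] - $PredictedY[$i];
        $SquaredError[$i] = pow($Error[$i], 2);
    }
    return array(
        'PredictedY'      => $PredictedY,
        'Error'           => $Error,
        'SumError'        => array_sum($Error),
        'SumSquaredError' => array_sum($SquaredError),
    );
}

// Least-squares line for X = (1, 2, 3), Y = (2, 4, 5): y = 2/3 + 1.5x.
$summary = getResidualSummary(array(1, 2, 3), array(2, 4, 5), 2 / 3, 1.5);
printf("Sum of squared errors: %01.4f\n", $summary['SumSquaredError']);
```

For a least-squares line the errors sum to zero (up to rounding), so the sum of squared errors is the more informative measure of how far the observations fall from the line.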

How to determine whether the fit is good

The SimpleLinearRegression class generates quite a few summary values. An important one is the T statistic, which measures how well a linear equation fits the data. If the fit is very good, the T statistic tends to be large. If the T statistic is small, the linear equation should be replaced with a default model that assumes the mean of the Y values is the best predictor (the mean of a set of values is often a useful predictor of the next observation, which makes it a sensible default).

To test whether the T statistic is large enough to reject the mean of the Y values as the best predictor, you need to calculate the probability of obtaining that T statistic by chance. If this probability is low, you can reject the null hypothesis that the mean is the best predictor and, accordingly, be confident that the simple linear model fits the data well.
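The article does not show how the T statistic itself is computed. A common textbook route, sketched below under that assumption, is: error variance = sum of squared errors / (n - 2), standard error of the slope = sqrt(error variance / sum of squared X deviations), and T = slope / standard error, with n - 2 degrees of freedom. The function name slopeTValue is hypothetical.

```php
<?php
// Hypothetical sketch of the slope T statistic using textbook formulas.
function slopeTValue(array $X, array $Y)
{
    $n     = count($X);
    $xMean = array_sum($X) / $n;
    $yMean = array_sum($Y) / $n;

    $sumXX = 0.0;
    $sumXY = 0.0;
    foreach ($X as $i => $x) {
        $sumXX += pow($x - $xMean, 2);
        $sumXY += ($x - $xMean) * ($Y[$i] - $yMean);
    }
    $slope = $sumXY / $sumXX;
    $yInt  = $yMean - $slope * $xMean;

    // Sum of squared differences between observed and predicted Y.
    $sumSquaredError = 0.0;
    foreach ($X as $i => $x) {
        $sumSquaredError += pow($Y[$i] - ($yInt + $slope * $x), 2);
    }

    $df            = $n - 2;
    $errorVariance = $sumSquaredError / $df;
    $slopeStdErr   = sqrt($errorVariance / $sumXX);

    return array(
        'Slope'       => $slope,
        'SlopeStdErr' => $slopeStdErr,
        'SlopeTVal'   => $slope / $slopeStdErr,
        'DF'          => $df,
    );
}

$result = slopeTValue(array(1, 2, 3, 4, 5), array(2, 4, 5, 4, 5));
printf("T: %01.2f (DF = %d)\n", $result['SlopeTVal'], $result['DF']);
```

A perfectly linear data set has zero error variance, so the standard error is 0 and the division fails; a fuller implementation would need to handle that case.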

So how do you calculate the probability of a T statistic value?

Calculate the probability of the T statistic

Since PHP lacks mathematical routines for calculating the probability of T statistic values, I decided to hand this task off to the statistical computing package R (see www.r-project.org in Resources) to obtain the necessary values. I also want to draw attention to this package because:

R provides many ideas that PHP developers might emulate in PHP math libraries.
With R, it is possible to check whether the values obtained from a PHP math library agree with those obtained from a mature, freely available open source statistical package.

The code in Listing 4 demonstrates how easy it is to hand a calculation off to R.

Listing 4. Handing off to the R statistical package to get a value

<?php
// Copyright 2003, Paul Meagher
// Distributed under GPL
class SimpleLinearRegression {

    var $RPath = "/usr/local/bin/R"; // Your path here

    // Pipe dt(T, df) to R and return the value that R prints.
    function getStudentProb($T, $df) {
        $Probability = 0.0;
        $cmd = "echo 'dt($T, $df)' | $this->RPath --slave";
        $result = shell_exec($cmd);
        // R prints a line such as "[1] 0.05"; the second token is the value.
        list($LineNumber, $Probability) = explode(" ", trim($result));
        return $Probability;
    }

    // Pipe qt(alpha, df) to R and return the value that R prints.
    function getInverseStudentProb($alpha, $df) {
        $InverseProbability = 0.0;
        $cmd = "echo 'qt($alpha, $df)' | $this->RPath --slave";
        $result = shell_exec($cmd);
        list($LineNumber, $InverseProbability) = explode(" ", trim($result));
        return $InverseProbability;
    }
}
?>

Note that the path to the R executable is set once and used in both functions. The first function returns the value associated with the T statistic under Student's T distribution, while the second, inverse function computes the T statistic corresponding to the given alpha setting. Strictly speaking, R's dt() returns the density of the T distribution at T; the cumulative probability is given by pt(). The getStudentProb method is used to evaluate the fit of the linear model; the getInverseStudentProb method returns an intermediate value, which is used to calculate the confidence interval for each predicted Y value.
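Both functions in Listing 4 interpolate their arguments directly into a shell command and assume that R answers with a single "[1] value" line. The following is a hedged sketch of a more defensive hand-off: it restricts the R function name to a short whitelist, casts the numeric arguments, quotes the shell arguments with escapeshellarg(), and returns null when the output cannot be parsed. The helper names (buildRCommand, parseRScalar, evaluateInR) are hypothetical.

```php
<?php
// Hypothetical helpers for calling R more defensively than Listing 4.

function buildRCommand($expression, $RPath = "/usr/local/bin/R")
{
    // escapeshellarg() quotes each argument so the shell treats it as data.
    return "echo " . escapeshellarg($expression) . " | "
         . escapeshellarg($RPath) . " --slave";
}

// Extract the number from R output such as "[1] 4.5e-06".
function parseRScalar($output)
{
    if (preg_match('/^\[1\]\s+(\S+)/', trim((string) $output), $matches)) {
        return (float) $matches[1];
    }
    return null; // R missing, or output in an unexpected form
}

function evaluateInR($functionName, $value, $df, $RPath = "/usr/local/bin/R")
{
    // Only allow the Student's T functions this class needs.
    if (!in_array($functionName, array('dt', 'pt', 'qt'), true)) {
        throw new InvalidArgumentException("Unsupported R function.");
    }
    $expression = sprintf("%s(%s, %d)", $functionName, (float) $value, (int) $df);
    return parseRScalar(shell_exec(buildRCommand($expression, $RPath)));
}

// Parsing can be exercised without R installed.
var_dump(parseRScalar("[1] 4.5e-06\n"));
```

Calling evaluateInR('qt', 0.975, 23) would then play the role of getInverseStudentProb, provided R is installed at the given path.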

Due to limited space, I cannot describe every function in this class in detail, so if you want to understand the terminology and steps involved in simple linear regression analysis, I encourage you to consult an undergraduate statistics textbook.

Burnout study

To demonstrate how to use this class, I can use data from a study of burnout in social services. Michael Leiter and Kimberly Ann Meechan studied the relationship between a measure of burnout called the Exhaustion Index and an independent variable called Concentration. Concentration refers to the proportion of a person's social contacts that come from their work environment.

To study the relationship between the Exhaustion Index values and the Concentration values for the individuals in their sample, load these values into appropriately named arrays and instantiate the class with these arrays. After instantiating the class, display some of the summary values it generates to evaluate how well the linear model fits the data.

Listing 5 shows the script that loads the data and displays summary values:

Listing 5. Script to load data and display summary values

<?php
// BurnoutStudy.php
// Copyright 2003, Paul Meagher
// Distributed under GPL
include "SimpleLinearRegression.php";
// Load data from burnout study
// NOTE: only the first 12 of the 25 Concentration values survive in this
// copy of the listing; the remaining values are missing.
$Concentration = array(20,60,38,88,79,87,
                       68,12,35,70,80,92,
                       /* ... */);
$ExhaustionIndex = array(100,525,300,980,310,900,
                         410,296,120,501,920,810,
                         506,493,892,527,600,855,
                         709,791,718,684,141,400,970);
$slr = new SimpleLinearRegression($Concentration, $ExhaustionIndex);
$YInt = sprintf($slr->format, $slr->YInt);
$Slope = sprintf($slr->format, $slr->Slope);
$SlopeTVal = sprintf($slr->format, $slr->SlopeTVal);
$SlopeProb = sprintf("%01.6f", $slr->SlopeProb);
?>

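Equation: <?php echo "Exhaustion = $YInt + ($Slope * Concentration)"; ?>
T: <?php echo $SlopeTVal; ?>
Prob > T: <?php echo $SlopeProb; ?>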

Running this script through a web browser produces the following output:

Equation: Exhaustion = -29.50 + (8.87 * Concentration)

T: 6.03

Prob > T: 0.000005

The last row of this table indicates that the probability of obtaining such a large value of T by chance is very low. It can be concluded that a simple linear model has better predictive power than simply using the mean of the exhaustion values.

Knowing the concentration of someone’s workplace contacts can be used to predict the level of burnout they may be experiencing. The equation tells us that for every 1-unit increase in the Concentration value, the Exhaustion value of a person in the social services field increases by almost 9 units (8.87). This is further evidence that, to reduce potential burnout, individuals in social services should consider making friends outside of their workplace.
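As a worked example of using the fitted equation for prediction, the sketch below plugs a Concentration value into the reported equation. The function name predictExhaustion and the input value of 50 are illustrative choices; the coefficients are the rounded values from the output above.

```php
<?php
// Hypothetical helper using the rounded coefficients reported above:
// Exhaustion = -29.50 + (8.87 * Concentration)
function predictExhaustion($concentration, $YInt = -29.50, $Slope = 8.87)
{
    return $YInt + $Slope * $concentration;
}

// A person whose Concentration value is 50.
printf("%01.2f\n", predictExhaustion(50)); // prints 414.00

// Each additional unit of Concentration adds the slope, 8.87.
printf("%01.2f\n", predictExhaustion(51) - predictExhaustion(50)); // prints 8.87
```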

This is just a rough sketch of what these results might mean. To fully explore the implications of this data set, you may want to study the data in more detail to make sure this is the correct interpretation. In the next article I will discuss what other analyses should be performed.

What did you learn?

For one, you don’t have to be a rocket scientist to develop meaningful PHP-based math packages. By adhering to standard object-oriented techniques and explicitly adopting a backward-chaining problem-solving approach, some relatively basic statistical procedures can be implemented fairly easily in PHP.

From a teaching standpoint, I think this exercise is very useful, if only because it requires you to think about statistical tests or routines at higher and lower levels of abstraction. In other words, a great way to supplement your learning of a statistical test or procedure is to implement it as an algorithm.

Implementing statistical tests often requires going beyond the scope of the given information and creatively solving and discovering problems. It is also a good way to discover gaps in knowledge about a subject.

On the downside, you will find that PHP lacks built-in means for working with sampling distributions, which are necessary to implement most statistical tests. You need to hand the processing off to R to get these values, but you may not have the time or interest to install R. Native PHP implementations of some common probability functions would solve this problem.
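As one possible direction for such a native implementation, here is a sketch that computes Student's T probabilities in plain PHP. It evaluates the T density with a log-gamma helper for half-integer arguments (built from the recurrence Γ(x + 1) = xΓ(x)) and integrates the density with Simpson's rule. All function names are hypothetical, and this is a simple numerical approach rather than the method used by R. Note that it returns a tail probability, whereas R's dt() returns a density, so its results are not expected to match the output of Listing 4 exactly.

```php
<?php
// Hypothetical native sketch of Student's T probabilities.

// log Gamma(k / 2) for a positive integer k, using Gamma(1/2) = sqrt(pi),
// Gamma(1) = 1 and the recurrence Gamma(x + 1) = x * Gamma(x).
function logGammaOfHalf($k)
{
    $x   = ($k % 2 == 0) ? 1.0 : 0.5;
    $log = ($k % 2 == 0) ? 0.0 : 0.5 * log(M_PI);
    while ($x < $k / 2) {
        $log += log($x);
        $x   += 1.0;
    }
    return $log;
}

// Density of Student's T distribution with integer degrees of freedom.
function studentDensity($t, $df)
{
    $logCoef = logGammaOfHalf($df + 1) - logGammaOfHalf($df)
             - 0.5 * log($df * M_PI);
    return exp($logCoef - (($df + 1) / 2) * log(1 + $t * $t / $df));
}

// P(T <= t), integrating the density over [0, |t|] with Simpson's rule.
// $steps must be even.
function studentCdf($t, $df, $steps = 2000)
{
    $upper = abs($t);
    $h     = $upper / $steps;
    $sum   = studentDensity(0, $df) + studentDensity($upper, $df);
    for ($i = 1; $i < $steps; $i++) {
        $sum += (($i % 2 == 0) ? 2 : 4) * studentDensity($i * $h, $df);
    }
    $area = $sum * $h / 3;
    return ($t >= 0) ? 0.5 + $area : 0.5 - $area;
}

// Probability of a T value at least this extreme in either direction.
function twoSidedProbability($t, $df)
{
    return 2 * (1 - studentCdf(abs($t), $df));
}

printf("%01.4f\n", studentCdf(1, 1)); // prints 0.7500
```

With 1 degree of freedom the T distribution is the Cauchy distribution, whose cumulative probability at 1 is exactly 0.75, which gives a convenient check. Values for other degrees of freedom can be compared against R's pt().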

Another problem: this class generates many intermediate and summary values, but the summary output doesn't actually take advantage of this. I've provided some unwieldy output, but it's neither sufficient nor well organized so that you can adequately interpret the results of the analysis. Actually, I have absolutely no idea how I can integrate the output method into this class. This needs to be addressed.

Finally, to understand the data, it’s not just about looking at the summary values. You also need to understand how individual data points are distributed. One of the best ways to do this is to graph your data. Again, I don't know much about this, but if you want to use this class to analyze real data, you need to solve this problem.
