Are Git commits diffs, snapshots, or history?


PHPz

Feb 19, 2024 am 11:39 AM


It's easy for me to understand how Git commits are implemented, but it's hard for me to understand how other people think about commits. So I asked some people on Mastodon.

How do you think about Git commits?

I ran a very unscientific poll asking people how they think of Git commits: as a snapshot, a diff, or a list of all previous commits? (Of course, it's reasonable to think of a commit as all three, but I was curious about people's main way of thinking about them.)

The results:

51% diff
42% snapshot
4% history of all previous commits
3% "other"

I was surprised by how evenly split the diff and snapshot options were. People also made some interesting but contradictory points, like
"In my opinion the commit is a diff, but I think it's actually implemented as a snapshot" and
"In my opinion, the commit is a snapshot, but I think it's actually implemented as a diff". We'll talk more about how commits are actually implemented later.

Before we go any further: what do we mean by "a diff" or "a snapshot"?

What is a diff?

The "diff" I'm talking about is probably pretty obvious: it's what you get when you run git show COMMIT_ID. For example, here's a typo fix in the rbspy project:

diff --git a/src/ui/summary.rs b/src/ui/summary.rs
index 5c4ff9c..3ce9b3b 100644
--- a/src/ui/summary.rs
+++ b/src/ui/summary.rs
@@ -160,7 +160,7 @@ mod tests {
 ";
         let mut buf: Vec<u8> = Vec::new();
-        stats.write(&mut buf).expect("Callgrind write failed");
+        stats.write(&mut buf).expect("summary write failed");
         let actual = String::from_utf8(buf).expect("summary output not utf8");
         assert_eq!(actual, expected, "Unexpected summary output");
     }

You can see it on GitHub: https://github.com/rbspy/rbspy/commit/24ad81d2439f9e63dd91cc1126ca1bb5d3a4da5b

What is a snapshot?

What I mean by "snapshot" is "all the files you get when you run git checkout COMMIT_ID".

Git usually calls the list of files in a commit a "tree" (as in "directory tree"). You can see all the files in the commit above on GitHub:

https://github.com/rbspy/rbspy/tree/24ad81d2439f9e63dd91cc1126ca1bb5d3a4da5b (it is /tree/ instead of /commit/)

Is "how Git is implemented" really the right way to explain it?

The advice I hear most often about learning Git is probably "Just learn how Git represents things internally, and everything will become clearer." I obviously really like this perspective (if you've spent some time reading this blog, you'll know that I like thinking about how things are implemented internally).

But as a way of teaching Git, it hasn't been as successful as I'd hoped! Normally I excitedly start explaining "Okay, so a Git commit is a snapshot, it has a pointer to its parent commit, then a branch is a pointer to a commit, and then...", but the people I'm trying to help tell me that they didn't really find this explanation useful and that they still don't get it. So I've been looking at other options.

But let’s talk about the internal implementation first.

How Git represents commits internally: snapshots

Internally, Git represents commits as snapshots (it stores a "tree" containing the current version of every file). I've written about this in "In a Git repository, where do your files live?", but here's a very quick overview of the internal format.

Here's how a commit is represented:

$ git cat-file -p 24ad81d2439f9e63dd91cc1126ca1bb5d3a4da5b
tree e197a79bef523842c91ee06fa19a51446975ec35
parent 26707359cdf0c2db66eb1216bf7ff00eac782f65
author Adam Jensen 1672104452 -0500
committer Adam Jensen 1672104890 -0500

Fix typo in expectation message

And, when we view this tree object, we see a list of every file/subdirectory under the root of the repository in this commit:

$ git cat-file -p e197a79bef523842c91ee06fa19a51446975ec35
040000 tree 2fcc102acd27df8f24ddc3867b6756ac554b33ef    .cargo
040000 tree 7714769e97c483edb052ea14e7500735c04713eb    .github
100644 blob ebb410eb8266a8d6fbde8a9ffaf5db54a5fc979a    .gitignore
100644 blob fa1edfb73ce93054fe32d4eb35a5c4bee68c5bf5    ARCHITECTURE.md
100644 blob 9c1883ee31f4fa8b6546a7226754cfc84ada5726    CODE_OF_CONDUCT.md
100644 blob 9fac1017cb65883554f821914fac3fb713008a34    CONTRIBUTORS.md
100644 blob b009175dbcbc186fb8066344c0e899c3104f43e5    Cargo.lock
100644 blob 94b87cd2940697288e4f18530c5933f3110b405b    Cargo.toml

This means that checking out a Git commit is always fast: it's just as easy for Git to check out yesterday's commit as it is to check out a commit from a million commits ago. Git never needs to reapply 10,000 diffs to determine the current state, because commits are never stored as diffs from the previous commit.
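As a rough illustration of the snapshot idea, here is a minimal Python sketch. It uses Git's object hashing rule (SHA-1 over a small header plus the contents, as in a default SHA-1 repository), but the `snapshot` helper and the two-file repository are hypothetical, and a flat dictionary is only a stand-in for Git's real nested tree objects.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: SHA-1 over '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """A toy 'tree': map every path to the ID of that file's full contents."""
    return {path: git_object_id("blob", data) for path, data in files.items()}


# Two hypothetical versions of a tiny repository.
v1 = snapshot({"README.md": b"hello\n", "main.py": b"print('hi')\n"})
v2 = snapshot({"README.md": b"hello\n", "main.py": b"print('hello')\n"})

# The unchanged file has the same ID in both snapshots, so it only needs
# to be stored once even though each snapshot lists every file.
print(v1["README.md"] == v2["README.md"])  # True
print(v1["main.py"] == v2["main.py"])      # False
```

Each snapshot names every file, so checking one out does not depend on how many commits came before it.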

Snapshots are compressed using packfiles

I just said that Git commits are snapshots, but when someone says "In my opinion, the commit is a snapshot, but I think it's actually implemented as a diff", that's actually kind of true too! Git commits aren't represented as the kind of diff you might be used to (they're not stored on disk as a diff from the previous commit), but the basic intuition is right: if you edit a 10,000-line file 500 times, it would be very inefficient to store 500 full copies of that file.

Git does have a way to store files as differences (deltas). This is called a "packfile", and Git will periodically run garbage collection to compress your data into packfiles to save disk space. When you git clone a repository, Git also compresses the data.

I don't have enough space here to fully explain how packfiles work (Aditya Mukerjee's "Unpacking Git packfiles" is my favorite article explaining how they work). However, I can briefly summarize my understanding of how deltas work and how they differ from diffs here:

Something weird actually happens when you look at a diff

What actually happens when we run git show SOME_COMMIT to see the diff for a commit is a bit counterintuitive. My understanding is:

Git looks in the packfiles and applies the deltas to reconstruct the tree for this commit and for its parent commit.

Git diffs the two directory trees (the tree of the current commit and the tree of the parent commit). This is usually fast because almost all files are exactly the same, so Git can just compare the hashes of the identical files and do nothing almost all of the time.

Finally, Git shows the diff.

So Git converts the deltas into snapshots and then computes a diff. It feels a little weird because it starts with something like a diff and ends up with another thing like a diff, but deltas and diffs are actually completely different, so it makes sense.
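The "compare hashes, then diff only what changed" step can be sketched in Python. This is a simplified illustration rather than Git's actual code: the `objects` store, `store`, and `diff_trees` are hypothetical names, the file contents are made up, and the standard library's `difflib` stands in for Git's own diff algorithm.

```python
import difflib
import hashlib

objects: dict[str, bytes] = {}  # toy object store: ID -> full file contents


def store(data: bytes) -> str:
    """Save a blob and return its Git-style ID."""
    oid = hashlib.sha1(f"blob {len(data)}\0".encode() + data).hexdigest()
    objects[oid] = data
    return oid


def diff_trees(parent: dict[str, str], commit: dict[str, str]) -> list[str]:
    """Diff two snapshots, each a mapping of path -> blob ID."""
    lines: list[str] = []
    for path in sorted(parent.keys() | commit.keys()):
        old_id, new_id = parent.get(path), commit.get(path)
        if old_id == new_id:
            continue  # same hash, same contents: nothing to read or compare
        old = objects[old_id].decode().splitlines() if old_id else []
        new = objects[new_id].decode().splitlines() if new_id else []
        lines += difflib.unified_diff(old, new, f"a/{path}", f"b/{path}", lineterm="")
    return lines


# Hypothetical two-file repository; only summary.rs changes between snapshots.
parent = {
    "Cargo.toml": store(b"[package]\n"),
    "summary.rs": store(b'expect("Callgrind write failed");\n'),
}
commit = {
    "Cargo.toml": parent["Cargo.toml"],
    "summary.rs": store(b'expect("summary write failed");\n'),
}

for line in diff_trees(parent, commit):
    print(line)
```

Cargo.toml never gets opened here, because its ID is the same in both snapshots. Only summary.rs is read and compared line by line.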

That said, I think of Git as storing commits as snapshots, and of packfiles as just an implementation detail that saves disk space and speeds up cloning. I've never actually needed to know how packfiles work, but it does help me understand how Git can store commits as snapshots without using too much disk space.

A "wrong" understanding of Git: commits are diffs

I think a fairly common "wrong" understanding of Git is that commits are stored as diffs from the previous commit.

This understanding is of course wrong (in reality, commits are stored as snapshots, and diffs are computed from those snapshots), but it seems very useful and it makes sense to me! It gets a little weird when you think about merge commits, but maybe we could say that a merge commit is just the diff relative to its first parent.

I think this misunderstanding is sometimes very useful, and it doesn't seem to cause problems in day-to-day Git use. I really like that it makes the thing we use most (diffs) the most basic element. It's very intuitive to me.

I've also been thinking about some other useful but "wrong" understandings of Git, such as:

Commit messages can be edited (actually they can't: you just make a copy of the commit with a new message, and the old commit still exists)
Commits can be moved to a different base (similarly, they are copied)
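One way to see why these operations copy rather than edit: a commit's ID is a hash of its contents, including its message and its parent. The Python sketch below uses the tree and parent IDs from the commit shown earlier, but it leaves out the author and committer lines, so the IDs it prints will not match real Git. The `commit_id` helper, the reworded message, and `other_base` are hypothetical.

```python
import hashlib


def commit_id(tree: str, parent: str, message: str) -> str:
    """Git-style ID: SHA-1 over 'commit <size>\\0' plus the commit's text."""
    # Real commits also carry author and committer lines; they are omitted
    # here to keep the sketch short, so these IDs will not match real Git.
    body = f"tree {tree}\nparent {parent}\n\n{message}\n".encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


tree = "e197a79bef523842c91ee06fa19a51446975ec35"
parent = "26707359cdf0c2db66eb1216bf7ff00eac782f65"
other_base = "0" * 40  # hypothetical ID of a different parent commit

original = commit_id(tree, parent, "Fix typo in expectation message")
reworded = commit_id(tree, parent, "Fix typo in summary test")  # like an amend
rebased = commit_id(tree, other_base, "Fix typo in expectation message")

print(original != reworded)  # True: new message, so a new commit
print(original != rebased)   # True: new parent, so a new commit
```

Since the ID depends on every field, there is no way to change a message or a parent in place. The result is always a different commit, and the original stays where it was. (In a real rebase the tree usually changes as well.)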

I think there is a range of "wrong" understandings of Git that make perfect sense, are largely supported by the Git user interface, and do not cause problems in most cases. But it can get confusing when you want to undo a change or something goes wrong.

Some advantages of treating commits as diffs

Even if I know commits are snapshots in Git, I probably treat them as diffs most of the time because:

Most of the time I'm focused on the changes I'm making: if I just change one line of code, obviously I'm mainly thinking about that line rather than the current state of the entire codebase
You see a diff when you click on a commit on GitHub or use git show, so it's just what I'm used to seeing
I use rebase a lot, and that's all about reapplying diffs

Some advantages of treating commits as snapshots

But I also sometimes think of commits as snapshots because:

Git is often confused by file moves: sometimes I move a file and edit it, and Git doesn't recognize that it was moved and instead shows it as "old.py deleted, new.py added". This is because Git only stores snapshots, so when it says "moved old.py -> new.py", it's just guessing because the contents of old.py and new.py are similar.
It's much easier to understand what git checkout COMMIT_ID is doing this way (the idea of reapplying 10,000 commits stresses me out)
Merge commits look more like snapshots to me, since the merge commit can actually be anything (it's just a new snapshot!). This helps me understand why you can make arbitrary changes when resolving a merge conflict, and why it's important to be careful when resolving conflicts.
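The rename guessing in the first point can be sketched in Python. This is only an approximation of the idea: Git computes its similarity score differently, `difflib.SequenceMatcher` is a stand-in, and `guess_renames` and the file contents are hypothetical. The 0.5 threshold mirrors Git's default 50% similarity cutoff for rename detection.

```python
from difflib import SequenceMatcher


def guess_renames(
    deleted: dict[str, str], added: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, str]]:
    """Pair each deleted path with its most similar added path, if similar enough."""
    renames = []
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in added.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames.append((old_path, best_path))
    return renames


# Comparing two snapshots only tells us which paths vanished and which appeared.
deleted = {"old.py": "def main():\n    print('hi')\n    return 0\n"}
added = {
    "new.py": "def main():\n    print('hello')\n    return 0\n",
    "notes.txt": "unrelated text about something else entirely\n",
}

print(guess_renames(deleted, added))
```

If the file is edited heavily enough while being moved, the score drops below the threshold and the guess fails, which is when you see "old.py deleted, new.py added".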

Some other ways to think about commits

Some of the Mastodon replies also mentioned:

"Extra" out-of-band information about the commit, such as an email, a GitHub pull request, or a conversation you had with a colleague
Thinking of a "diff" as a "before state + after state"
And, of course, many people think about commits differently depending on the situation

Some other words people use when talking about commits that may be less ambiguous:

"Revision" (seems more like a snapshot)
"Patch" (looks more like a diff)

That’s it!

I have a hard time understanding the different ways people think about Git. What's especially tricky is that, although "wrong" understandings are often very useful, people are warned so strongly against "wrong" mental models that they're reluctant to share their "wrong" ideas, for fear that some Git explainer will show up and tell them why they're wrong. (These Git explainers usually mean well, but it can have a negative effect regardless.)

But I learned a lot! I'm still not entirely sure how to talk about commits, but we'll figure it out eventually.

Thanks to Marco Rogers, Marie Flanagan, and everyone on Mastodon for discussing Git commits with me.
