NHacker Next
Resistance Against Git Merge Hell (2015) (tugberkugurlu.com)
pjc50 4 days ago [-]
This is a very short advert for rebasing your personal branch.

People seem to have very strong opinions about this. I don't much, either way; I have to use the gerrit flow at work, which mandates a lot of rebasing, and in general I prefer not having a commit which only records a merge. Some people seem to want highly detailed tracking of what an individual developer has done in their personal checkout. I will only note that the act of observing changes what is observed.

klyrs 16 hours ago [-]
I'm an ADHD weirdo who will mix six issues in my personal branch, and use rebase, cherry-pick, and in a pinch, difftool to carve out clean patches when something is good and ready. If I were to give people an honest history of what was actually going on in my repo, and I used merge and revert to really keep it all when I cut a patch, it would be a forensic nightmare.

There is some benefit to my tendency to do experimental work in a branch where I've started to work on something else. In that mudpit, I find serendipitous solutions to multiple problems, and I find incompatibilities between desired changes. And when multiple issues have overlapping changes, there's usually an optimal ordering of which to address first -- and that's not always obvious.

I greatly prefer to do all the forensic work up-front, to make a clean history with a cogent story in the commit message. When I see other people's messy histories with every merge/revert, I need to do that forensic work every time I go back in history; without the benefit of recent first-person experience.

juped 15 hours ago [-]
Same.

But there's a difference between a few separate things here:

- assembling your own chaotic actual work into cogent commits using interactive rebase and friends (but not moving the base, e.g. git rebase -i --keep-base): good and necessary. people doing reviews should also review commit history, not just a megadiff of all changes.

- rebasing in the sense of "moving the base upon which your work is based", which for unfortunate historical reasons uses the same command: this is rarely ever useful, despite how much people seem to enjoy doing it. replace this with a no-op in nearly all cases; and in the few cases where you might want to (the upstream you want to integrate with changed incompatibly) merging a recent version tag into your topic is better.

- rebasing in the sense of "as a maintainer adding completed work to an integration branch like master, rebasing the commits atop the integration branch rather than simply merging": this is actively harmful (as is "squash merging", which is this _plus_ destroying the commits crafted in point 1) and is also unfortunately what most people seem to mean.
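
A minimal sketch of the difference between the first two cases, assuming Git 2.24 or newer (for `--keep-base`) and a throwaway repository. The branch names, file names, and the `commit_file` helper are made up for illustration:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file base.txt 0 "base"
git checkout -q -b topic
commit_file topic.txt 1 "topic: part 1"
commit_file topic.txt 2 "topic: part 2"
git checkout -q main
commit_file main.txt 1 "unrelated work on main"
git checkout -q topic

fork_point=$(git merge-base main topic)

# Case 1: tidy the topic in place. GIT_SEQUENCE_EDITOR=true accepts the
# todo list unchanged; a real session would reword or squash here.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --keep-base main
after_keep_base=$(git merge-base main topic)

# Case 2: move the base. The topic is replayed on top of main's tip.
git rebase -q main
after_move=$(git merge-base main topic)
main_tip=$(git rev-parse main)

echo "fork point kept by --keep-base: $([ "$fork_point" = "$after_keep_base" ] && echo yes || echo no)"
echo "fork point moved by plain rebase: $([ "$after_move" = "$main_tip" ] && echo yes || echo no)"
```

In the first case the topic still forks from the same commit on main; in the second it has been replanted on main's tip and now contains the unrelated work.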

em-bee 5 hours ago [-]
why is merging upstream changes into my branch better than rebasing my branch on top of those changes?

and for the last one, how is the maintainers rebasing my branch onto master any different from me doing it before submitting my changes? and why is it actively harmful? (i agree with squash merge, that's not useful. in a project that does that i have resorted to carefully crafting my commits such that each commit can be a separate PR so they can't be squashed)

juped 5 hours ago [-]
The dichotomy isn't between merging upstream changes into your branch vs. rebasing your branch on top of them. It's between doing one of those things and doing nothing, and doing nothing is the better (and easier!) option here.

(When I mention it above, it's for a case like "the branch is several months old, two versions have been released in that time, and I'm revisiting it to finish it up now", and in that case you just merge the latest version tag (tag, not branch) before continuing. It's a very specific thing.)

The main reason I see in practice that people do something, rather than nothing, is because someone is insisting on clicking the button on Github's web interface to merge a topic, which insists the merge be trivial. Faced with this sort of (bad) maintainership, the topic author has to choose between rebasing the topic, losing some useful graph properties (independence of unrelated branches) in the process, or merging upstream just before upstream merges them, which is incredibly stupid-looking (and very annoying when reading history, nothing is in the place you look for it).

Usually they choose the former, or have it chosen for them. And now the branch, topologically, includes a bunch of unrelated things, and if you have any sort of triaging integration, QA, maintenance versions, or just want to read the graph to see what topic depends on what other topic, this is broken on this no-longer-independent topic. (In very bad cases, you might need to acquire a lock on a human, precisely what the whole system is meant to avoid.)

The best thing is just to do nothing. If you're using Git as originally intended, your branch became patches anyway, it doesn't have a history. (As the maintainer receiving them, you just apply the patches to the latest version tag, pretty much, unless it's in series with something else, or it's a bugfix (always apply a bugfix commit directly onto the commit that introduced the bug. it can then be merged into any maintenance branch or other branch containing the bug. the word "cherry-pick" need never cross your keyboard.)) But a bad webapp broke things for a lot of people, and it was worked around in various ways.
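
The bugfix parenthetical can be sketched like this. It is only an illustration in a scratch repository; the branch names (`maint`, `fix-lib`), the files, and the `commit_file` helper are assumptions, not anything from a real project:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file app.txt v1 "base"
commit_file lib.txt bug "Add lib (introduces the bug)"
buggy=$(git rev-parse HEAD)
commit_file app.txt v2 "Release work"
git branch maint                        # maintenance branch that contains the bug
commit_file app.txt v3 "More work on main"

# Start the fix directly on the commit that introduced the bug.
git checkout -q -b fix-lib "$buggy"
commit_file lib.txt fixed "Fix lib"
fix=$(git rev-parse HEAD)

# The same fix commit is merged wherever the bug exists; no cherry-pick.
contains=""
for branch in main maint; do
  git checkout -q "$branch"
  git merge -q --no-edit fix-lib
  if git merge-base --is-ancestor "$fix" "$branch"; then
    contains="$contains $branch"
  fi
done
lib_on_maint=$(cat lib.txt)
echo "branches containing the fix commit:$contains"
```

Because both branches contain the identical commit, `git branch --contains` on the fix answers "where has this been applied?" without comparing patches.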

em-bee 4 hours ago [-]
i don't understand why the difference between tag and branch matters? why should i merge a tag and not the branch head?

as a maintainer, why should i have to make the effort of merging a contribution myself, instead of asking the contributor to make sure that their changes can be cleanly merged?

i agree that merging upstream before they merge me seems odd, but that suggests that rebasing is the only way to go. or how else can i ensure that my code can be merged cleanly without creating extra work for the maintainer?

juped 3 hours ago [-]
Integrating things is 100% of a maintainer's job. As someone who wrote one topic, you don't have any insight into the arbitrarily many other topics that coincidentally were written at a similar time to yours; as the maintainer who integrates all submitted topics, you have read and are familiar with all of them.

As a topic author, you have no such responsibility to deal with arbitrary other topics you had nothing to do with; this holds in an opportunistic open source contribution, in a workplace, anywhere. There's someone who reads all the patches and has a working knowledge of them: the maintainer. And if they ask you to do their job for them (which you will likely do poorly, not being the maintainer), they are a _bad_ maintainer. (And if software encourages this it's bad software; I'm not attributing this to maintainer malice or something, most people are just taking their cues from common bad software like Github's website.)

> why should i merge a tag and not the branch head?

Because a branch head is an arbitrary collection of things and a version is a specific supported collection of things.

em-bee 1 hours ago [-]
i am going to have to disagree to an extent. while on the face of it this makes sense, realities are often different.

as a FOSS contributor, making the work of a maintainer easier can mean the difference between my submission being accepted or rejected. if i want to push a change upstream it is therefore in my interest to not rely on the maintainer to do that work. especially not if that maintainer is a volunteer. not every contributor is going to do that, nor should every contributor do that. only those that have sufficient experience to work at the level of the maintainer, but simply are not the designated maintainer because it is not their project.

the same holds true at work. i expect everyone in a team to be familiar with the whole project and be capable of doing the work of the maintainer, at least everyone at senior level, whether they end up doing it or not. having a single person be responsible for all merges is an idea that i do not agree with. among other reasons it causes the designated maintainer to become a gatekeeper, preventing others from developing that skill and stepping up to do that kind of work. whether it is practical for the maintainer to do all the integration then is a matter of their workload.

the suggestion that only the designated maintainer can do a good job at merging is even somewhat patronizing.

on tag vs branch, this seems to me to only make sense if we are developing against a release branch. a development branch often does not have release tags. the "releases" in a development branch are the merges of new feature branches. if each feature is merged on completion then there should not be any work in progress commits anyways. even if rebasing is used, those commits will be pushed all at once, so HEAD is always at a completed feature.

i think that these are simply different approaches to development, and it can't be said that one is better than the other. it comes down to preference. i favor an environment where everyone can step up to contribute to the extent of their experience and knowledge over an environment where my capacity is limited by my role.

being told that i am not capable of doing something because it is not part of my role is not something i want to hear ever.

juped 1 hours ago [-]
I think you're having a strong emotional reaction to something unclear, which is outside the scope of what I'm willing to discuss further. Thanks for the comment chain, however, I do appreciate it.
em-bee 1 hours ago [-]
that's a pity because i think that is the area where i could learn something new. but thank you for bearing with me so far.
klyrs 15 hours ago [-]
Yeah, I have a love/hate relationship with squash. On one hand, I think everybody misses an autocommit hook once in a while, and in Python it's not uncommon to have "black" commits interspersed throughout it all. Love because it's convenient to squash all that noise with a single click; hate because I want a surgical squash where tooling/typo commits are automatically squashed into the parent.
juped 15 hours ago [-]
commit --fixup=HEAD~1 (or any form of reference to the commit to fix up); rebase -i --autosquash later

my "reroll" alias is "reroll = rebase --interactive --keep-base --rebase-merges"; but the --rebase-merges is less useful to most people (I do maintainer operations as well, and it's sometimes useful there)

autosquash is on in the config, though i should really add it to the alias for redundancy, it's an alias and length doesn't matter.
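
A small sketch of that fixup workflow in a scratch repository. The file names and the `commit_file` helper are hypothetical, and `GIT_SEQUENCE_EDITOR=true` is used only so the example runs without opening an editor:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file base.txt 0 "base"
commit_file a.txt "one" "Add a"
commit_file b.txt "two" "Add b"

# A late typo fix that belongs in "Add a" (currently HEAD~1).
echo "one, fixed" > a.txt
git commit -q -a --fixup=HEAD~1         # creates a commit titled "fixup! Add a"

# --autosquash moves the fixup next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

commit_count=$(git rev-list --count HEAD)
second_subject=$(git log -1 --format=%s HEAD~1)
a_content=$(cat a.txt)
git log --format=%s
```

The history ends up as three commits (base, Add a, Add b), with the typo fix absorbed into "Add a".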

wobblyasp 4 days ago [-]
Clean is such a weasel word. Does it matter if your visual tooling produces a series of squiggly lines?

The information is all there. You can grep it to find specific commits.

This isn't a hill worth dying on with your team if it's their MO already.

IshKebab 4 days ago [-]
The information is all there but the point is that it's an unreadable mess.
stevage 18 hours ago [-]
To me it is weird to talk about readability when not talking about a specific tool. Can a given tool not make a history more readable by filtering out merge commits?
ClumsyPilot 4 days ago [-]
Exactly - it’s subjective - detailed vs clean - if you want every detail of reality, it’s not clean, it’s messy. Ever seen inside of an engine? It’s greasy, dirty, etc. a clean engine doesn’t run.

This is putting lipstick on a pig

Which can be fine, but let’s not pretend there is a perfect approach, needs differ

INTPenis 4 days ago [-]
I was guilty of merge hell until I learned about rebase.

I think it's a natural instinct to want to protect your production environment so I'm sure others have made the same mistake as I did. Main is my default branch, I don't want anything pushed into the default branch to result in a deployment to production.

So naturally I create a new branch called production. The relationship makes sense to me because we develop in main, might even deploy to staging from main, but it's not until we feel we're ready that we merge main with production. And to a user of git this requires a manual step where you explicitly specify the word production, git checkout production.

But that resulted in merge hell until I learned I could just rebase main onto production.

And the command structure is even exactly the same, standing in production branch I either do git merge main, or git rebase main.

This comment was a message to my younger self.

GrantMoyer 4 days ago [-]
In my experience, if there are no conflicts to resolve, then merge and rebase have near equivalent UX. Just a single command and git handles the rest.

However, if there are conflicts, I've always found them to be far easier to resolve during a merge than a rebase. That's not to say it's always easy, but at least with merge you only need to resolve them once. With rebase, you resolve the conflict, then have to resolve it again when git tries to apply the next change, and so on. Sometimes you even need to resolve conflicts backwards if you're reordering commits. Much of this can be mitigated with git rerere, but if you're averse to merge, you probably don't want to learn about rerere either.
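
For readers who have not met rerere: a small sketch, assuming a scratch repository and a deliberately trivial conflict, of how a recorded resolution is replayed the second time the same conflict appears. The file and branch names are invented for the example:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"
git config rerere.enabled true          # record and reuse conflict resolutions

echo "line" > f.txt && git add f.txt && git commit -q -m "base"
git checkout -q -b topic
echo "topic" > f.txt && git commit -q -a -m "topic change"
git checkout -q main
echo "main" > f.txt && git commit -q -a -m "main change"
git checkout -q topic

# First attempt: the merge conflicts, we resolve by hand and commit.
git merge -q main >/dev/null 2>&1 || true
echo "resolved" > f.txt
git add f.txt
git commit -q -m "Merge main into topic"   # rerere records the resolution here

# Throw the merge away and hit the same conflict again.
git reset -q --hard HEAD~1
git merge -q main >/dev/null 2>&1 || true
content=$(cat f.txt)                       # rerere has already rewritten the file
echo "working tree after second conflict: $content"

# The file still has to be staged and committed to finish the merge.
git add f.txt && git commit -q --no-edit
```

With `rerere.enabled` set, the same replay applies when a rebase runs into a conflict that Git has already seen resolved.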

m4lvin 18 hours ago [-]
Thanks for making me look up rerere which I first thought would be a typo but actually seems like a really useful thing :-)

https://git-scm.com/book/en/v2/Git-Tools-Rerere

pjc50 4 days ago [-]
As a promoter of the rebase faction .. that's one case where I think "merge" is correct. Since you're not changing production without going through main (I hope), every merge should just go straight through ("fast forward"?)

    git checkout production
    git merge --ff-only main   # fails instead of merging if production cannot fast-forward to main
    git push
(at which point CI auto-deploys 'production')

(edit: have you got "rebase main onto production" the wrong way round?)

GrantMoyer 4 days ago [-]
I think they must have been cherry picking changes into production. Git merge might choke on the duplicate changes, but git rebase will usually see the change from each cherry pick is already applied and skip it.
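
A sketch of that behaviour, assuming a scratch repository with one cherry-picked hotfix; all names and the `commit_file` helper are hypothetical:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file base.txt 0 "base"
git branch production
commit_file a.txt 1 "feature A"
commit_file b.txt 1 "hotfix B"
hotfix=$(git rev-parse HEAD)

# production receives its own copy of the hotfix, so the branches diverge.
git checkout -q production
git cherry-pick "$hotfix" >/dev/null
diverged=$(git rev-list --count main..production)

# By default rebase compares patches, notices the hotfix is already in main,
# and drops the duplicate instead of replaying it.
git rebase -q main
extra_after=$(git rev-list --count main..production)
production_tip=$(git rev-parse production)
main_tip=$(git rev-parse main)
echo "commits only on production: before=$diverged after=$extra_after"
```

After the rebase production points at exactly the same commit as main. Recent Git versions print a warning about the skipped commit while doing so.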
01HNNWZ0MV43FF 18 hours ago [-]
Your comment says 6 minutes ago but when I go to reply it says three days... And I do remember reading it three days ago

Does Hacker News fudge timestamps on comments when it boosts a post?

Uvix 17 hours ago [-]
Production shouldn't have its own branch - it shouldn't even be a new build at all. The same binaries that were deployed to staging should be redeployed to production once you approved. Otherwise, your "staging" environment has the wrong name... and the testing there doesn't represent what goes to production.
toast0 17 hours ago [-]
In a system with production branches and staging, you build for staging from the production branch.

When you're in the release process, the branch won't match what's currently deployed, but that's ok. The point of a production branch is not to indicate what is on production at this instant, but to be a record of what was deployed to production or at least was intended to be, for changes that get canceled before deployment.

Uvix 16 hours ago [-]
Building for staging from the production branch would be fine. Grandparent explicitly said they build for staging from the dev branch, though.
toast0 16 hours ago [-]
Oh yeah. I missed that. Sounds like they use their staging for dev work too then. Which is like ok, but confusing.
hiddew 15 hours ago [-]
Why not? A branch is a pointer to a commit, so the commit built for staging can be the same binary that will be deployed to production, if the production branch is updated to point to the same commit as staging was pointing to. No rebuild necessary because it is the same commit.
Uvix 14 hours ago [-]
If you have fully reproducible builds, and if it's from the same commit, then it's probably okay. With multiple branches in play, and staging built from a different branch than production, the latter doesn't hold.
jakjak123 16 hours ago [-]
Good point. That is my conclusion as well. It's been a while since I worked on deployments, but that is how we set it up: deploying is just a different pipeline on the main branch that runs the same exact build artifact in production.
jakjak123 16 hours ago [-]
It's not a hill I'm dying on, but when you rebase the default branch, it's gonna create conflicts for every single other team member. It's annoying to cause so much noise for every team member. Now I run with pull.rebase set, but that is not the default. YMMV.
pandemic_region 4 days ago [-]
> But that resulted in merge hell until I learned I could just rebase main onto production.

Did you mean rebase production onto main rather? Or do I have my branches and trees mixed up.

dools 4 days ago [-]
I think the commands are:

git checkout production

git rebase main

that gets everything from main into production.

pandemic_region 4 days ago [-]
Yes, but rebase is like rebranch. So you take a branch from a tree, and plant it on top of a new tree. So, you rebranch (rebase) production branch on top of the main branch.

git rebase (current branch onto) main

jakjak123 16 hours ago [-]
This would rewrite the history for production. Not what you want to do
LegionMammal978 15 hours ago [-]
The point, as I understand it, is that production has no meaningful history separate from main's history: it acts as a bare pointer, and rebasing it just fast-forwards it to later points in main's history.
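
To illustrate in a scratch repository (branch and file names invented, `commit_file` a hypothetical helper): when production has no commits of its own, `git rebase main` and `git merge --ff-only main` land on the same commit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file base.txt 0 "base"
git branch production                   # production is just a pointer into main's history
commit_file one.txt 1 "feature 1"
commit_file two.txt 1 "feature 2"
main_tip=$(git rev-parse main)
start=$(git rev-parse production)

git checkout -q production
git rebase -q main                      # nothing to replay, so this fast-forwards
after_rebase=$(git rev-parse production)

git reset -q --hard "$start"            # put the pointer back and try the merge form
git merge -q --ff-only main
after_merge=$(git rev-parse production)

echo "rebase: $after_rebase"
echo "merge:  $after_merge"
```

The two only differ once production has diverged: `--ff-only` then refuses, while rebase rewrites production's own commits.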
giantg2 16 hours ago [-]
Most merge issues can be avoided with proper architecture and planning. If your files are so large that you have multiple devs working in the same file at the same time on a regular basis, then it's probably time to get more modular.

I honestly hate Git compared to SVN. The Git tool seems technically better. However, the easier aspects of the tool seem to have promoted less planning, more stepping on each other's toes, and what I see as non-value added work (merges, rebasing, retesting, etc).

srvaroa 15 hours ago [-]
If a pizza team struggles to work in small independent changes, and ends up having to deal with long lived branches, merge festivals etc. then the problem is really not in the tool chain.
giantg2 15 hours ago [-]
Yeah, it just seems that the tool chain has made it easier to fall into those poor practices.

But I probably just work for a really shitty company.

hardlianotion 4 days ago [-]
Oh. I wanted this to be about the London Tube Map.
madeofpalk 4 days ago [-]
Of course, "What went wrong with the Tube Map?" from Jay Foreman

https://www.youtube.com/watch?v=jaEhvWXmLyk

hardlianotion 4 days ago [-]
Much better.
jfengel 16 hours ago [-]
The Tube could definitely stand for some rebasing. Its history is rather convoluted.

Also removal of code smells. And actual smells.

/I love the Tube. So damned useful.

Aethelwulf 4 days ago [-]
Do people really look at their git commit history like this? Why?
Snild 4 days ago [-]
Yes. To figure out what happened, and which path it took to get into my repo. Or to see how branches have diverged.

I also recommend it to git newbies, as a tool to understand the state of their repo when something has gone wrong (e.g. they did a bad merge or rebase).

kaffekaka 4 days ago [-]
But with rebase, the history does not show what happened. It shows the changes to the code yes, but not how they came into being.
neallindsay 4 days ago [-]
People often bring up this objection, but I don't want a complete history—I want an easy-to-understand history.

If I make a variable in commit A and think of a better name for it in commit C, why wouldn't I use rebase to squash C into A? Some sense of purity of history?

Or more directly related to the rebase vs. merge debate, if I fix an issue across three functions, and in the meantime someone has removed one of those functions on the main branch, rebasing eliminates the "history" of me fixing that removed function and I think that's good. It makes my commit simpler.

Rebase can certainly be used to simplify the history too much, but that will always be a judgement call. That shouldn't keep us from editing our branches in ways that are clarifying instead of confusing.

We never capture the full complexity of what we went through writing code in our source control. It would be bad if we did.

kaashif 17 hours ago [-]
I think a lot of people are anti rebase in general, but doing that on your own is fine IMO. I have a different issue.

The issue is with rebasing multiple commits onto master instead of merging - the intermediate commits were never code that anyone ever actually wrote or tested, so any issues with them stem purely from the fakeness of the history.

If you have commit A then you write a branch A -> B -> C while someone else writes A -> D and they merge first, rebasing to get A -> D -> B' -> C' means B' was never something you wrote or tested. This code never existed on anyone's machine or had CI run on it before the rebase.

Does it really make sense to run CI on all of the new intermediate commits that rebase invented? What if some of those fail, are you really going to go through and fix tests for fake intermediate commits?

The solution for me has always been to squash. Inventing fake history is totally pointless and counterproductive. If you want cleanliness and bisectability via destroying the "real" history, just go ahead and really destroy it, don't invent a fake, possibly broken history.
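
If one did want to exercise the intermediate commits that a rebase creates, `git rebase --exec` runs a command after each replayed commit. A sketch under assumptions: the scratch repository, the file names, the `commit_file` helper, and the stand-in check (`test -f d.txt`, where a real project would run its test suite) are all illustrative:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: write one file and commit it.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file a.txt 1 "A"
git checkout -q -b topic
commit_file b.txt 1 "B"
commit_file c.txt 1 "C"
git checkout -q main
commit_file d.txt 1 "D"                 # someone else merged first
git checkout -q topic

# Replay B and C on top of D, running the check after each new commit.
# The log lives outside the repository so the work tree stays clean.
tested=$(mktemp)
git rebase -q --exec "test -f d.txt && git log -1 --format=%s >> '$tested'" main

tested_count=$(wc -l < "$tested" | tr -d ' ')
echo "commits checked during the rebase:"
cat "$tested"
```

If the command fails for some commit, the rebase stops there so that commit can be fixed or the rebase aborted.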

LegionMammal978 15 hours ago [-]
I think the Rust project strikes a pretty good compromise for this issue: rebased linear branches with conflict-free merges onto master. When you make a PR, the CI sees whether your branch can be merged onto master without causing a conflict; if not, it directs you to rebase your branch onto the latest version of master. This check is repeated for all open PRs whenever the master branch is updated.

Once it's satisfied that your branch can be merged, it runs a subset of the tests, and throws an error if they fail. This way, even if you do rebase your branch, its latest commit will still be tested. (Having intermediate commits pass tests is encouraged but not required.) Finally, it regularly takes groups of 8 or so accepted PRs, tries merging them all in sequence, and runs the full test suite on the result. If it succeeds, the merge commits are pushed to master; if not, a human operator gets it to try again without the offending PR.

By your terminology, I suppose this would count as running CI on all the "invented" commits, and forcing PR authors to fix all their tests. But in practice, it's not too odious, since most PRs don't conflict (unless you're touching half the codebase), and any test failures from a non-conflicting change will get caught by the merge step.

dllthomas 4 days ago [-]
An easy to understand history that is correct about the relevant details... but "David didn't happen to run the formatter before his WIP commit that time" isn't ever going to be a relevant detail.
jen20 4 days ago [-]
That is fine. No-one in 5 years cares that a dev did 10 “fix test”, “ci format”, “fix misc derp”. Changes should be single units, such that bisect works cleanly to find the resulting bugs.
Snild 4 days ago [-]
It shows where you ended up, and with the help of `git reflog`, you can also show where you came from (in the same graph!).
nicoburns 4 days ago [-]
If you only rebase feature branches before merging to main/master/trunk then you still get most of the history.
arrowsmith 18 hours ago [-]
Why would I care how the changes came into being?
thomasfromcdnjs 4 days ago [-]
I've been using git since the first year it came out, I've never looked at it visually.
kaashif 18 hours ago [-]
I've been using git for ~15 years and I've also never looked at it visually, except by accident. And yes, this does include working on big repos with lots of other people. Maybe we're the weird ones.
kstrauser 17 hours ago [-]
You can look at it visually?
IshKebab 4 days ago [-]
Same reason anyone visualises any data. It makes it easier and quicker to understand (unless it is a mess due to not rebasing & too fine commit granularity).
sschueller 4 days ago [-]
The map? Sometimes it helps to get a visual cue (at least for me) but most of the time I don't need it.
secondcoming 4 days ago [-]
I've asked this before. I think it's down to flaws in development practices elsewhere.
planede 4 days ago [-]
problem statement: git history, as normally presented, is hard to follow and contains too much noise.

article's proposed solution: simplify the git history by destroying information that is irrelevant. That's what rebase is.

The problem is that what is or isn't relevant depends on context. I think the right way to go about it is to simplify the presentation and otherwise improve tooling to get information out of the git history. git already has some ways to filter the history, but it lacks a feature-rich query language like Mercurial's. git GUIs should also step their game up.

robertlagrant 4 days ago [-]
I agree. I'd prefer a messy history but have a way to just see MRs into master/main as the main sequence of changes. Then be able to zoom in further if necessary to see how an MR was arrived at.
planede 4 days ago [-]
> I'd prefer a messy history but have a way to just see MRs into master/main as the main sequence of changes.

git log --first-parent gets you there.

> Then be able to zoom in further if necessary to see how an MR was arrived at.

Yeah, an interactive UI would be nice for this, maybe there are some, but I really only use the CLI.
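
A quick sketch of what `--first-parent` does, assuming a scratch repository where two made-up topics (`alpha`, `beta`) are merged with `--no-ff`:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"

echo 0 > base.txt && git add base.txt && git commit -q -m "base"

for topic in alpha beta; do
  git checkout -q -b "$topic" main
  echo 1 > "$topic.txt" && git add "$topic.txt" && git commit -q -m "$topic: wip"
  echo 2 > "$topic.txt" && git commit -q -a -m "$topic: fix typo"
  git checkout -q main
  git merge -q --no-ff -m "Merge $topic" "$topic"
done

all_commits=$(git rev-list --count main)
mainline=$(git rev-list --count --first-parent main)
newest=$(git log -1 --first-parent --format=%s main)

echo "all commits: $all_commits, first-parent only: $mainline"
git log --first-parent --format=%s main
```

The first-parent view lists only "Merge beta", "Merge alpha", and "base"; dropping the flag, or running `git log` on a merge's second parent, shows the wip commits behind each merge.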

breckenedge 4 days ago [-]
Surprised to not see a mention of the --no-merges flag.
adityaathalye 17 hours ago [-]
Rebasing-only is a manual way to "linearise history". Whereas, a git log is a query target. And the `git-log` tool is one's friend. I bash-alias these for quick use:

  git log --oneline # `gl`, to see "linear" history with branch annotations
  git log --oneline --merges # `glm`, to see merge commits only
  git log --oneline --graph # `glg` see the train tracks too, if I need to
And I know that git log has me covered, if/when I need to narrow / slice history more.
stevage 18 hours ago [-]
So why can't the tools that display history just filter out all the merge commits? I don't really understand why this requires merges to be done differently at the time.
jFriedensreich 15 hours ago [-]
To be able to work with large repos i found a few more important strategies:

- in your git client: enable "only follow first parent" for the base branch which only shows merge commits and only open merge commit history as needed/ when relevant, i'm only aware of fork.app doing this nicely but should be an option in more git clients

- hide all branches by default and only show your base branch and your own work, just let go of caring what others are doing, the only way to interact with other branches should be a review system at which point you can selectively pull in the relevant commits as needed to try locally. (exception is maybe a technical manager or team lead who should of course care somewhat if branches get abandoned or not cleaned up properly, but this is a completely separate workflow)

- Look at Sapling's notion of public vs draft commits. This is a game changer and works just as well for git, it is just not supported by the tooling in the same way. Don't try to argue for and adopt workflows that are the same for both; they are completely different phases of work with completely different needs. In a nutshell: a) public commits are the ones that were pushed to a branch that is shared with anyone else. This may be main, a feature branch worked on by multiple colleagues, etc. These commits are considered immutable, never amended or rebased, so the only option to bring them up to date with another branch is a merge. b) BUT every commit that is not part of one of these branches is considered a "draft". These are never merged into, always amended or modified, and in a mutable draft state. It is considered rude to expose your internal working struggles, like merging in master 100 times a day to avoid big merge conflict buildup, or reverting changes, in work submitted for review. You submit a clean stack of pull requests, one per reviewable batch of changes

- rebasing and amending has a few other big advantages. One of them is that syncing to the base branch does not pollute history and can be done as often as possible without a downside. This allows keeping all work in sync without buildup of big, hard to resolve conflicts. There is also tooling to do this for all your open work at once and to resolve conflicts smarter (e.g. git rerere)

- merges should be set to allow only squash + rebase as this works the same for devs working with clean commits as well as devs keeping their commit history in their branches. this way at least the base branch is mostly clean

Charon77 16 hours ago [-]
There's this handy git config 'pull.rebase' in case you miss it.

A lot of merge commits are created when you pull and merge instead of rebasing locally.

https://git-scm.com/book/en/v2/Git-Branching-Rebasing
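
For reference, a sketch of setting it. This uses a scratch repository and a per-repository setting so it is safe to run anywhere; adding `--global` would apply it to every repository for the current user:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q

# Make a plain `git pull` rebase local commits instead of creating a merge commit.
git config pull.rebase true

pull_rebase=$(git config --get pull.rebase)
echo "pull.rebase=$pull_rebase"
```

The same effect is available for a single invocation with `git pull --rebase`.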

tilsammans 4 days ago [-]
This doesn't look dirty or confusing to me at all.
gsliepen 4 days ago [-]
Ideally, your main branch is always in a known good state (every commit results in compilable code that passes the tests you have so far), which makes it bisectable. Keeping all commits (including broken ones and subsequent fixes) in topic branches and then merging them without some squashing and rebasing gets in the way of that.
dllthomas 4 days ago [-]
> Keeping all commits (including broken ones and subsequent fixes) in topic branches and then merging them without some squashing and rebasing gets in the way of that.

It seems to me that, with respect to bisecting, a merge workflow with git bisect --first-parent is equivalent to a squash workflow with a bare git bisect. Am I missing some way in which that's not the case?

gsliepen 4 days ago [-]
You might not want to squash everything into one commit before merging, you can still have multiple commits in one (fast-forward) merge, as long as each of them is in a good state. This is made relatively easy by using the `--autosquash` feature of `git rebase`.

One issue I've seen a few times is that some commit in the middle of a topic branch is the problem, but if you didn't rebase it then that commit itself would look fine on top of the topic branch's parent. However, after merging it's now also on top of other commits, and the interaction with those was the problem. That makes it very hard to find such a problem. Rebasing the history of the topic branch before merging will make finding it much easier.

dllthomas 2 days ago [-]
Ah, I wasn't trying to say those were the only two options, or that either was what you were suggesting. I also typically prefer other things.

I just find that "always squash the entire branch" is a common reaction to "history is messy" and I wanted to surface that (per my understanding) it doesn't actually improve the situation (vis-a-vis bisect in particular, assuming you're passing the correct arguments for your situation) over merging (no-ff, I neglected to specify...) branches where some of the commits do not build.

chx 4 days ago [-]
git bisect is extremely, extremely powerful.

I once found a near security hole with it, caused by the dropping of a BC layer which no one would've expected to have that effect. And because it was all deleted code, tracking down the origin of the bug any other way would have been nearly impossible.

larsnystrom 4 days ago [-]
The more I work with git, the more I wish there was a rebase (including squash/fixup) which kept the original commits, but hides them. I’m not sure how that would work in practice, but there is value in keeping all change history, and there is also value in having a readable commit history, but git does not let you do both.
GrantMoyer 4 days ago [-]
That's effectively what merge does. If you want to think of your branch as one linear history, then git merge creates a single commit in your branch which represents a collection of commits from some other space (and there's no rule that you need to use the default merge commit message). Then you use `git log --first-parent` to view your branch's simple linear history.
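A rough sketch of that idea in a scratch repository (branch name, file names and messages are invented for the example): the topic branch keeps its messy commits, the merge commit carries the summary, and `--first-parent` shows only the linear view.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Messy work on a topic branch.
git checkout -q -b topic
for step in 1 2 3; do
    echo "attempt $step" > feature.txt
    git add feature.txt
    git commit -q -m "WIP: attempt $step"
done

# One merge commit with a hand-written message, no fast-forward.
git checkout -q "$main"
git merge -q --no-ff -m "Add the feature" topic

echo "Linear view:"
git log --oneline --first-parent
echo "Full history:"
git log --oneline
```

The linear view should list two commits (the initial commit and the merge), while the full history still contains all three WIP commits.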
atq2119 16 hours ago [-]
Not really, for two reasons.

First, merge only ever allows you to arrive at a single commit, so it's strictly less powerful than rebase. With rebase, you could start with a sequence of three commits, "A", "B", and "squash! A", and turn that into two commits, "A'" and "B'". Merge doesn't let you do that.

Second, sometimes a merge commit really is semantically useful and you want it to be shown as a merge with two parents. There is no canonical way to distinguish this kind of merge from the kind of merge you seem to be thinking of.

Personally, I think the way to resolve this would be to have optional "squash/cherry-pick parent" metadata on a commit, so that commits that result from a cherry-pick or a rebase can point back to the original commit(s) in a more structured way (remember that a rebase is really just a sequence of cherry-picks). This metadata could also be used to preserve `git commit --amend` version history. Augment it with a "reverse diff" bit and it can be used to track reverts as well.

OJFord 4 days ago [-]
Work with it a bit more, discover reflog, and you'll find that's exactly what happens (until gc) ;)

It could be a bit more visible somehow though, I get the sentiment. Maybe it's more of an add-on to git's role though, at least without plenty else also becoming more visible/GUI-like too.

dieortin 4 days ago [-]
If it’s only saved until gc then it isn’t something you can rely on
dllthomas 4 days ago [-]
It's saved while it's in the reflog, and then saved until gc. How long things stay in the reflog is configurable.

The bigger deal is that things are never shared simply for being in the reflog - which is probably correct for its intended use but doesn't really fit what's asked for up thread.
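To make the reflog point concrete, here is a small sketch in a scratch repository. The 180-day values are arbitrary picks for the example; `gc.reflogExpire` and `gc.reflogExpireUnreachable` are the settings that control how long entries are kept.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Original message"
old=$(git rev-parse HEAD)

# Rewrite the commit; the branch now points at a new commit object.
git commit -q --amend -m "Rewritten message"

# The pre-rewrite commit is still recorded in the local reflog.
git reflog -n 2
previous=$(git rev-parse 'HEAD@{1}')
echo "old commit still readable: $(git cat-file -t "$old")"

# Keep reflog entries around longer in this repository (example values).
git config gc.reflogExpire "180 days"
git config gc.reflogExpireUnreachable "180 days"
```

None of this travels with `git push` or `git clone`: the reflog is per-repository, which is the limitation raised above.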

tome 4 days ago [-]
> I wish there was a rebase (including squash/fixup) which kept the original commits, but hides them

That's exactly what rebase does.

(OJFord said that too, but buried the lede slightly, so I thought it worth saying in a single sentence.)

IshKebab 4 days ago [-]
I definitely agree with this. If you merge to this extent then you've basically given up on having an understandable history. Rebase is much better where possible.

I think a lot of the disagreement about this is really people talking about different things. Some people say "always squash; nobody cares about the trivial typo fix commits and whatnot" and other people say "never squash; you lose important history" and really they're both right... you should squash when it's not important to preserve the history. Obviously people are going to disagree about when that is but in my experience if a PR is big enough that you think it should be more than one commit then it's too big, unless it's a big feature branch that has been worked on by multiple authors.

Similarly with rebase vs merge, if it's a small single author PR then definitely rebase. For big feature branches you may want to use merges though I would still suggest rebase is better. You just need to make sure everyone is using the safety flags when they force push.
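The comment above does not name the safety flags; assuming `git push --force-with-lease` is what is meant, this sketch shows what it protects against. The names Alice and Bob, the file names and the local bare "remote" are all made up for the demonstration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Alice publishes the first commit.
git clone -q remote.git alice 2>/dev/null
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"
cd ..

# Bob clones, adds a commit on the same branch and pushes it.
git clone -q remote.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Add Bob's file"
bob_commit=$(git rev-parse HEAD)
git push -q origin "$branch"
cd ..

# Alice has not fetched; she rewrites her commit and force pushes.
cd alice
git commit -q --amend -m "Add notes (reworded)"
if git push -q --force-with-lease origin "$branch" 2>/dev/null; then
    result="pushed"
else
    result="rejected"
fi
echo "force-with-lease push was $result"
```

A plain `--force` at the same point would have silently discarded Bob's commit; with the lease, the push is refused because the remote branch is no longer where Alice last saw it.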

peanut-walrus 4 days ago [-]
You squash merge when stuff gets added to master (or any other shared/long-lived branch) and delete the development branch, nobody has absolutely any interest in what happened on your development branch. There you go, clean commit history if you care about that sort of thing.

Rebase workflows are awful and unintuitive. Leave the rebasing to the git wizards who actually know what they're doing, in no circumstances should this be part of your day-to-day work.

lolinder 4 days ago [-]
> Leave the rebasing to the git wizards who actually know what they're doing, in no circumstances should this be part of your day-to-day work.

This sounds like the refrain of someone who doesn't want to actually learn how one of the most fundamental tools of their profession works. I realize that git wasn't the optimal choice for the industry to settle on, but it's what we picked, and simply avoiding a feature that a majority (65%) of your peers use to at least some degree [0] will hamper your professional development.

Learn git. It's not pretty, but it's what we've got, and it's not going anywhere anytime soon.

[0] https://jvns.ca/blog/2024/03/28/git-poll-results/

peanut-walrus 4 days ago [-]
That poll shows exactly the problem with rebase workflows. 41% say they mostly rebase but 48% do not know that merge/rebase conflicts have swapped order of local/incoming changes. Granted, the percentages leave just enough room for those 41% not to overlap with the 48% at all, but how likely is that?

The reality is that most people work on their private branches, alone. In that case it makes almost no difference if you rebase or merge. In almost any other scenario, merging still works as expected, while rebasing without understanding git will almost definitely lead to losing work and spending an absurd amount of time resolving conflicts. Why would you want to inflict that on yourself? Just use the approach that always works instead.

Learning Git really isn't high on the priority list for most developers, as they know what they need to use to get stuff done. The complexity gets really high really fast, so it's quite understandable why most people treat Git like DNS or other infrastructure - it's there, I know the basics to get stuff done and if anything goes wrong I ask an expert to take a look. And guess what? There is NOTHING wrong with that.

nemetroid 4 days ago [-]
> 48% do not know that merge/rebase conflicts have swapped order of local/incoming changes

I'm well aware of the differing meaning of HEAD in merge and rebase, and if I had to think about it, I would probably realize that it makes sense to always display HEAD first, and that, as a consequence, the order would be swapped.

But I would definitely have answered "no" to the question as written.

happytoexplain 4 days ago [-]
> nobody has absolutely any interest in what happened on your development branch

I absolutely do. Every lead I've ever worked with absolutely does. If you merge an unnecessarily big commit to master, you are potentially making life very difficult in the future.

If you make a meandering series of commits during development, squash them for the PR. But please, please do not squash the entire feature upon merge.

Also, I'm not sure how rebase could be confusing except in the case where there are multiple commits with big conflicts, but that's rare, and you can make exceptions if the developer is really that unsure with git in those rare cases (or ask for help).

CamouflagedKiwi 4 days ago [-]
I disagree. It's a nice property that every commit in master is (conceptually) fully buildable & all tests passed. I've not found much benefit from having heaps of tiny commits and trying to work out their state; also cherry-picking (if needed) is simple with squash commits, but extremely difficult without.
lolinder 4 days ago [-]
That's why they said this:

> If you make a meandering series of commits during development, squash them for the PR. But please, please do not squash the entire feature upon merge.

I typically turn my PRs into a series of buildable and testable commits. I also put a lot of work into making those commits tell a useful story to the reviewer and to anyone doing git blame later, and squashing them all into one commit undoes that work.

dllthomas 4 days ago [-]
Buildable and testable, but notably sometimes not formatted or linted, when (atypically but not rarely) the semantic change is much easier to understand separated from the superficial changes it motivates in the context of the formatting/linting rules of the project.
lolinder 3 days ago [-]
Yep. And in a similar vein, not squashing allows you to separate a file move from changes to it so that git can track the move.
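A small sketch of the move-then-edit split, with invented file names, in a scratch repository. Git records no rename metadata; it detects renames by content similarity, so a commit that only moves the file is trivially recognized, and `git log --follow` can then trace the file across the move even after a later commit rewrites its contents.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > old_name.txt
git add old_name.txt
git commit -q -m "Add file"

# Commit 1 of 2: the move only, contents untouched.
git mv old_name.txt new_name.txt
git commit -q -m "Move old_name.txt to new_name.txt"

# Commit 2 of 2: the actual change, here a full rewrite.
printf 'rewritten %s\n' 1 2 3 4 5 6 7 8 9 10 > new_name.txt
git commit -q -am "Rewrite contents"

echo "History following the rename:"
git log --follow --oneline -- new_name.txt
```

Had the move and the rewrite been squashed into one commit, the old and new contents would share nothing, and rename detection would most likely see a deletion plus an unrelated new file.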
nemetroid 4 days ago [-]
Of course all commits should be buildable and pass tests. There’s a scale between ”heaps of tiny commits” and ”squash everything”.
dllthomas 4 days ago [-]
All commits that will be shared. There's nothing wrong with a local WIP commit when you want to plant a flag to compare your next changes against.
63stack 4 days ago [-]
Why are you interested in looking at individual commits? What can you possibly learn from that that you can't get from looking at the entire diff the PR is introducing?

Does your CI test each individual commit? Afaik most of them only test the top commit. How do you know/enforce that all the inbetween commits also build/pass tests?

How do you know how many commits to revert, if you need to revert the feature? Instead of reverting 1, now you have to revert N where N is not recorded anywhere.

Snild 4 days ago [-]
> Why are you interested in looking at individual commits?

Because they can explain their individual rationales, while still making the most sense to merge all together.

> What can you possibly learn from that that you can't get from looking at the entire diff the PR is introducing?

Ease of review (both before merge, and in the future when wondering why something was done a certain way). Saves me as a reviewer from having to guess which parts of the commit are meant to do what.

This, of course, means that every commit needs to be a reasonable change in itself -- fixup commits done while developing should be squashed into the original change with a local rebase (these are the "meandering" commits your parent post mentioned).

> How do you know/enforce that all the inbetween commits also build/pass tests?

I'm no CI expert, but I would hope that most systems allow this as an option.

> How do you know how many commits to revert, if you need to revert the feature? Instead of reverting 1, now you have to revert N where N is not recorded anywhere.

It's recorded in the merge, assuming you always make a merge commit.

Otherwise, since each commit is actually its own logical change, you figure it out the same way as you would figure it out in the "squash PR" model -- bisect to find it, then see if reverting it helps.
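For the "it's recorded in the merge" case, a minimal sketch (scratch repository, invented file names): when the feature came in through a merge commit, reverting that single commit against its first parent undoes all N commits at once.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "readme" > README
git add README
git commit -q -m "Initial commit"
main=$(git symbolic-ref --short HEAD)

# A feature made of two logical commits.
git checkout -q -b feature
echo "part 1" > part1.txt
git add part1.txt
git commit -q -m "Feature: part 1"
echo "part 2" > part2.txt
git add part2.txt
git commit -q -m "Feature: part 2"

git checkout -q "$main"
git merge -q --no-ff -m "Merge feature" feature

# Undo the whole feature: -m 1 means "keep the first parent's side".
git revert --no-edit -m 1 HEAD > /dev/null

ls
```

After the revert only `README` should remain in the working tree, and the mainline gains one extra commit rather than N.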

loloquwowndueo 4 days ago [-]
A well-crafted commit history for a merge conveys useful information and makes reviewers’ jobs easier by splitting a large change set into smaller logical chunks that are easier to reason about.

If the extra “noise” bothers you, you can use `git log --first-parent`.

shruggedatlas 4 days ago [-]
This wouldn't affect what the reviewer sees because the squash commit only happens after the PR is complete and is being merged to master
bvrmn 4 days ago [-]
In more than 15 years of Git usage I've never seen a well-crafted commit history for a single PR. It's either a series of well-crafted PRs with separate reviews or a brain dumpster fire of multiple commits trying to make it work.
loloquwowndueo 3 days ago [-]
Well I have seen them. It’s your anecdata vs my anecdata :)
bvrmn 3 days ago [-]
If you have public GH contributions you could show examples of PRs with stories told by commits and my anecdata would be invalid :)
Faaak 4 days ago [-]
Why would you squash merge if you have two different atomic commits? It makes bisecting and reverting a pain... I'd advocate for a rebase instead.
baryphonic 4 days ago [-]
> Rebase workflows are awful and unintuitive. Leave the rebasing to the git wizards who actually know what they're doing, in no circumstances should this be part of your day-to-day work.

Hard disagree. I hardly consider myself a "rebase wizard," but I've been a near-exclusive practitioner of rebase workflows since I can remember. I find rebasing much more intuitive than workflows with merge commits. Squash merges are fine, but with proper intuition, they appear like a special case of rebase.

In my experience, the resistance to rebasing comes down to fears about "rewriting history" and false intuitions about how git works. I usually allay the former by pointing out that squash merges - which almost everyone approves of - also rewrite history. The latter issue seems to arise from arrows in popular git visualizations pointing in the wrong direction, e.g. in Gitflow.[0] In git, the child commit points to its parent, because each node is immutable. The git data structures are extremely simple (hence why git is so named), consisting of blobs, trees, commits, tags and references. Once you understand how these work in practice, rebase becomes intuitive.[1][2]
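The "child points to its parent" and immutability points can be checked directly with `git cat-file`. This is only a sketch in a scratch repository, with made-up file names and messages.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
first=$(git rev-parse HEAD)

echo "two" > file.txt
git commit -q -am "Second commit"

# A commit object names its tree and its parent(s), never its children.
git cat-file -p HEAD
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')

# "Changing" a commit creates a new object; the old one stays as it was.
before=$(git rev-parse HEAD)
git commit -q --amend -m "Second commit, reworded"
after=$(git rev-parse HEAD)

echo "parent recorded in second commit: $parent"
echo "id before amend: $before"
echo "id after amend:  $after"
```

Seen this way, a rebase is the same thing repeated: new commit objects pointing at new parents, with the originals left untouched until they are garbage collected.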

IMO, the only thing unintuitive about git is the CLI. Translating the graph operation I want into the commands is sometimes a challenge. Maybe that makes me a (frustrated) wizard after all?

[0] https://nvie.com/img/git-model@2x.png

[1] https://speakerdeck.com/pbhogan/power-your-workflow-with-git...

[2] https://eagain.net/articles/git-for-computer-scientists/

tichiian 4 days ago [-]
> In my experience, the resistance to rebasing comes down to fears about "rewriting history" [...]

There is 'git bisect' which only gives you detailed answers if your commit history is as fine-grained as possible while still being compilable and testable. Also, changing commit IDs on branches that are visible to others is a problem.

Which is why my approach is as follows: Work on private branches, one per feature/fix/..., then do interactive rebase to create a compilable and testable patch series out of those onto a for-review branch. While developing, rebasing onto whatever public branch you want to merge with next is of course OK and necessary. When you think the feature in the for-review branch is ready, do a merge-request into the public target (mostly devel) branch as usual, and either merge or rebase, whatever you like best.

But: Whatever branches are public are only ever merged. Never squash-merged, never rebased, never force-pushed, never filter-branched (except maybe after a court order). Because all commits on those branches are necessary to trace what people were doing with the code. All those commits and their relations are necessary for git-blame, bisect, sloccount and other things. Any commit ID there could wind up in some test binary, release ID or stuff, and you absolutely truly need those later on. And while a simple rebase might keep some of the necessary details intact (the other branch will still be there after all), only a plain merge will also preserve all the relations between the branches properly.

bvrmn 4 days ago [-]
Squash and merge is another source of beginner confusion. Now they have two different histories (main and the local feature branch) with the same content. Often they want to base follow-up work on the feature branch (stale history) instead of starting from main, and they get potential conflicts.
madeofpalk 4 days ago [-]
The beauty of this is that the rebase wizards are free to rebase all they want in their feature branches, but everything is squash-merged into main and no one has to know.
lolinder 4 days ago [-]
This is terrible for "rebase wizards" because about 80% of the reason why I rebase is to make my commits useful when someone inevitably needs to do a git blame in 5 years to understand the history of this code. A squashed commit can tell them "this was part of this feature", but a well-crafted history plus a merge commit with the PR name and number can tell them that and show what specific code changes were related to one another in one atomic step towards a feature.
tome 4 days ago [-]
But one reason that rebase wizards might curate a rebased branch of small commits is exactly so that the small commit structure remains on merge to main. That way they can track down any problems with it more easily in the future.

(Ironically, I've found that this style of development makes it less likely for bugs to be introduced in the first place.)

echelon_musk 4 days ago [-]
> end up with git commit history which looks like London tube map

I've saved you a click. TFA has nothing to do with TFL beyond this line.

fargle 16 hours ago [-]
the way i look at it is this: we actually use git for at least two different but related things

- a public change history (e.g. commit history of master, dev-1.0, whatever)

- a personal change history, especially useful when prototyping a fix or feature on some independent branch.

you might even call this independent branch "master" but it's your personal version of master in a clone on your laptop. it's a time machine, you can go backward/forward. you can make a change and then reverse course. you can commit every work-in-progress that compiles as you work to get it running or passing tests. whatever you like. you can pull or merge or rebase as you wish.

when submitting it to the public "master" branch, you are effectively saying "i want to commit this package of changes on top of the master branch."

do you really want to see "fhgiry pulled from master into fix-splork-feature on date x" and https://xkcd.com/1597/ in your public history of master? every branch you merged from or rebased on and every failed attempt and spelling mistake and WIP and addressing of review comments?

or do we want to see a series of commits that just record the end result in a series of nice little patches that simply add finished changes like: "splork: fix canoe paddle explosion issue by decreasing default gamma"

the common ubiquitous guidance to "never rewrite history" is nearly always valid for the public history.

converting a messy personal history into a neat series of patches/changes/commits that apply cleanly onto the latest public master should not fall under that same guidance. i'd say it's closer to "never rewrite public history"

juped 15 hours ago [-]
What a disjointed writeup of a terrible idea. We only use git _because_ its history is a graph rather than a sequence, freeing us from the necessity to ever acquire locks on fellow humans. There's a reason git.git never does anything like this, despite the history being completely constructed by Junio Hamano (who applies patches with git am, rather than taking a set of commits from someone's remote as is typical on Github). A clean history full of useful information looks like git.git's history: nonlinear.
cess11 16 hours ago [-]
I have never seen something like the example consisting of pretty much only merges, and most places where I've worked didn't use rebase or squash or whatever it's called. It has also been uncommon with 'fix' and 'did a thing' commit messages, usually people type in what they actually did, except in one code base but that was someone who really didn't enjoy software development.