Edward Thomson

Git with Unity

March 22, 2018  •  11:54 PM
This is a follow-up to my post earlier this week introducing the correct .gitattributes settings for line endings; it dives a little bit more deeply into some of the configuration that you might be interested in if you're getting started building games in Unity.

Around the VSTS team, we've got a lot of gamers. A couple of people on the team used to develop commercial games, or even Xbox itself. Some people build games in their spare time, as hobby projects. And of course some don't want to develop games, but still love to play them.

And me? I'm actually none of those. I get excited every time there's a new Mario platformer, but otherwise, I don't really play games. But I wanted to try my hand at hacking on them, so I decided to grab Unity and give it a spin.

So far I haven't actually built anything useful, but I did start to understand how Unity fits in with Git.

.gitignore

The .gitignore file is a metadata file that controls how Git operates on your repository. Files listed in .gitignore will be — like the name implies — ignored. They won't show up in git status and they won't be added to the repository.

It's important to make sure that you .gitignore your build output directories, any cache data and temporary files or directories that your tools make. And Unity actually has a lot of these, including Build directories and a cache directory, Library.

Thankfully, you don't have to know what all these directories are. You can instead go to the gitignore repository, which contains crowd-sourced best practices that you can just drop into your repository. There's a .gitignore file customized for each type of project you might encounter. So you can just grab the Unity .gitignore, drop it into your repository, and go.

Easier still: if you use a Git hosting provider like GitHub or Visual Studio Team Services, you can select one of these .gitignore files when you create the repository. We have the same crowd-sourced .gitignore files ready to go, so that when you create a repository they're there for you to get started with.

Unity .gitignore in VSTS

.gitattributes

The other metadata file that Git uses is called .gitattributes. You might be familiar with .gitattributes because you use it to control how Git handles line endings for your file.

You do use .gitattributes to configure your line endings, don't you?

But .gitattributes is more than just line endings - you can also configure how files are merged when two people change the same file in two different branches.

This means that you can set up Unity's "Smart Merge" functionality. By default, Git is totally unaware of the type of content that you're checking in. If a file is changed in two different branches, it will try to merge the file just by looking at the lines, without understanding them.

But Unity includes a semantic merge tool that understands the actual contents of the scene files, so it can help deal with merging them. You just need to configure .gitattributes to use it.

You can add these lines to your .gitattributes:

*.anim merge=unityyamlmerge eol=lf
*.asset merge=unityyamlmerge eol=lf
*.controller merge=unityyamlmerge eol=lf
*.mat merge=unityyamlmerge eol=lf
*.meta merge=unityyamlmerge eol=lf
*.physicsMaterial merge=unityyamlmerge eol=lf
*.physicsMaterial2D merge=unityyamlmerge eol=lf
*.prefab merge=unityyamlmerge eol=lf
*.unity merge=unityyamlmerge eol=lf

Git LFS

One of the great things about Git is that it's a distributed version control system. That means that you get an entire copy of the repository from the server. That means not just all of the files in the current version of the branch that you're interested in, but all the branches, and all the history that you've ever checked in.

This means that you can work completely disconnected from your server: you can run git log or git blame to analyze the changes that have been made, even if you're on an airplane1.

But it's problematic when you're checking in large files. If you have assets like images, audio or movies, Git starts to choke. And it's not even the size of the assets themselves as much as the history that's problematic.

If you have a 100K PNG, then that's not so bad. The problem is that you've changed that 100K ping a dozen times. Now you've got 1.2 MB in history that you have to download every time you run git clone. And that's just one file. So it adds up very quicky.

Git LFS helps here: it's the Large File Storage extension to Git.

Instead of storing every copy of these assets in the repository directly, Git LFS stores this data in a separate location, the large file storage area. In the repository, it just checks in a little stub file, the "git-lfs pointer file", that lets Git LFS know where it can get the data when it needs it.

So when you clone the repository, you don't download all those assets, just the tiny (128 byte) git-lfs pointer files. When git needs the files, to write them to your working directory, Git LFS will download them from the server and put them on disk. It's a nice hybrid system between a totally distributed version control system, and a centralized system.

You can download git-lfs - or, if you use Git for Windows, it's already included. It's easy to set up, you just add some more lines to your .gitattributes file to make sure that your textures and artwork are handled by LFS:

*.jpg filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text

*.wav filter=lfs diff=lfs merge=lfs -text
*.ogg filter=lfs diff=lfs merge=lfs -text
*.mp3 filter=lfs diff=lfs merge=lfs -text

*.mp4 filter=lfs diff=lfs merge=lfs -text
*.mov filter=lfs diff=lfs merge=lfs -text

*.fbx filter=lfs diff=lfs merge=lfs -text
*.blend filter=lfs diff=lfs merge=lfs -text
*.obj filter=lfs diff=lfs merge=lfs -text

Locking

Git LFS 2.0 introduces the ability to put advisory locks on files. This is critical if you're working with multiple artists. Otherwise, two people might start working on the same image. When they go to merge their branches, they'll realize that they've both done this work, and they have a merge conflict.

Unfortunately, there's no "smart merge" for images. They'll have to figure out how to resolve this manually - probably losing one of the other's work.

The new locking functionality does require additional support on the server. Both GitHub and Visual Studio Team Services offer locking, so if you're using one of those services, you can just run:

git lfs lock file.png

to lock a file. When you've finished editing, and want to unlock it, you can run:

git lfs unlock file.png

I'm excited to continue playing with Unity for building games. But to be completely honest, I'm even more excited to use a totally new tool with Git. There's a lot of new functionality here, and I'm looking forward to learning how the Git community can help make Unity developers even more productive with version control.

  1. I'm old enough to remember when airplanes didn't have Wifi. Back in those bad old days, the version control nerds used to talk about "working on an airplane" meaning "working without being able to talk to your version control server". 

Git for Windows: Line Endings

March 20, 2018  •  5:04 PM

If you’re on a team of Windows developers - or more importantly, on a cross-platform development team - one of the things that comes up constantly is line endings. Your line ending settings can be the difference between development productivity and constant frustration.

The key to dealing with line endings is to make sure your configuration is committed to the repository, using .gitattributes. For most people, this is as simple as creating a file named .gitattributes at the root of your repository that contains one line:

* text=auto

With this set, Windows users will have text files converted from Windows style line endings (\r\n) to Unix style line endings (\n) when they’re added to the repository.

If you're bored already, you can probably stop reading right now. For most developers - in most repositories - this is all you need to know.

Why not core.autocrlf?

Originally, Git for Windows introduced a different approach for line endings that you may have seen: core.autocrlf. This is a similar approach to the attributes mechanism: the idea is that a Windows user will set a Git configuration option core.autocrlf=true and their line endings will be converted to Unix style line endings when they add files to the repository.

The difference between these two options is subtle, but critical: the .gitattributes is set in the repository, so its shared with everybody. But core.autocrlf is set in the local Git configuration. That means that everybody has to remember to set it, and set it identically.

The first, best option you have to get this right is when you’re installing Git for Windows:

Git for Windows Installer: core.autocrlf

You probably want the first option, but you’d be forgiven if you didn’t know that the first time you ran the installer.

The problem with core.autocrlf is that if some people have it set to true and some don’t, you’ll get a mix of line endings in your repository. And that’s not good - because his setting doesn’t just tell Git what you want it to do with files going in to your repository. It also tells Git what you’ve already done, and what the line endings look like on the files that are already checked in.

This is why one of the most common symptoms of a line ending configuration problem is seeing “phantom changes”: running git status tells you that you’ve changed a file, but running git diff doesn’t show you any changes. How can that be? Line endings.

Phantom Changes

Imagine that some file is checked in to your repository with Windows-style line endings. For some reason, somebody hadn't set core.autocrlf=true when they added the file. You, on the other hand, being a diligent Git for Windows user, did set that option.

When you run git status, git will look at that file to decide whether you've made any changes to it. When it compares what's on disk to what's in your repository, it will convert the line endings on-disk from Windows-style style to Unix-style in the repository. Since the existing file in the repository had Windows-style line endings, and you expect them to be Unix style, git will determine that the file is different. (It is, byte for byte, different.)

By using .gitattributes, you ensure that these settings exist at the repository level, instead of leaving it up to individual users to understand to configure correctly. This means there’s no opportunity for misconfiguration by an individual user.

Of course, the best time to set this up is at the very moment you create your repository, before you add any files. Doing it after the fact means that you may still have some files added with the wrong configuration.

Over time, these files will be updated as you edit them. You can try to renormalize files, updating the line endings, but doing so will cause annoying merge conflicts for anybody who created a branch before the renormalization.

What About Binaries?

Generally speaking, git is pretty good at detecting whether a file is a binary or not. If it decides that a file is a binary, then it will refuse to convert line endings. But it's still good practice to configure git not to convert line endings for your binary files.

You can remove the text attribute from files that you don't want to have line ending conversions. For example, if you have PNGs in your repository, your .gitattributes might look like this:

* text=auto
*.png -text

Of course, there are more advanced settings in your .gitattributes that can be applied. These are especially useful in particular development scenarios. We'll dive deeper into some of those - like using Unity - in the next blog post.

Git Security: Further Reading

February 21, 2018  •  7:31 PM

In my talk at Git Merge today, I mentioned that there were some other difficulties handling security issues in Git. Here's some more information:

Talking about Git at Barcelona .NET Core: 6 March 2018

February 21, 2018  •  7:31 PM

I'm excited to be speaking at the Barcelona .NET Core group on Tuesday, 6 March, 2018. I'll be talking about How Microsoft "Got Git", adopting the Git version control system in Visual Studio and VSTS, and "Scripting Git", how to build .NET applications that interact with Git repositories using LibGit2Sharp.

Please join me! Especially if you're already in town for the Git Merge conference happening on the 7th and 8th of March. And if you're on the fence about coming to Git Merge, I can offer you 15% off your ticket, just because you seem like a nice person. When you register, just use the code:

GITCOMMUNITY15

I hope to see you there!

Merge vs Rebase: Do They Produce the Same Result?

December 21, 2017  •  10:29 PM

I get asked quite a lot whether I recommend a merge-based workflow, or one where people rebase onto master. But to be quite honest, I couldn't possibly care less. Your workflow is your workflow after all, it's up to your team to work in the way that's most productive for you. For some teams that's merging, for some teams that's rebasing… n the end, the code gets integrated and the end result is the same either way, whether you merge or rebase it, right?

Right?

If you're a rebase fan, you've probably run into cases where you get conflicts during a rebase that you wouldn't get during a merge. But that's not very interesting… is there a case where merge and rebase both finish and produce a result, but a different tree?

Is git-merge guaranteed to produce the same results as git-rebase?

No!

It's actually not a guarantee; in fact, you can create two branches that merge differently than they rebase. To avoid any spoilers, I've hidden the details in case you want to think about this on your own. 🤔 Click "expand" below to see the details.

Click to expand…

You can follow along with this GitHub repository.

Two branches

Imagine that you have two branches, one is master, and the other is the unimaginatively named branch branch. They're both based off a common ancestor 0d7088f. Further, imagine that your branch has two commits based off that common ancestor:

Ancestor 0d7088f branch 3f3ca4f branch 09d3ac4
One One One
Two 2 Two
Three Three Three
Four Four Four
Five Five Five
Six Six Six
Seven Seven 7
Eight Eight Eight

Finally, imagine that your master branch has a single commit based off the common ancestor:

Ancestor 0d7088f master f2e864b
One One
Two 2
Three Three
Four Four
Five Five
Six Six
Seven Seven
Eight Eight

What happens when you try to merge or rebase these?

Merge

When Git merges two branches, it only look at the tip commit in each branch, and compares them to their common ancestor. It does not look at any intermediate commits. In the above example, when we merge branch into master, the algorithm looks at the changes made in branch by comparing commit 09d3ac4 to the common ancestor commit 0d7088f. It also looks at the changes made in master by comparing commit f2e864b to the common ancestor commit.

The merge algorithm compares each line1 in the common ancestor, comparing it to the file in branch and the file in master. If the line is unchanged in all branches, then there's no problem - that line is brought into the merge result. In this example, line 1 in unchanged in both branches, so line 1 of the merge result will be One.

If a line is changed in only one branch, then that change is brought forward into the merge result. In this example, line 7 is changed only in branch. So in the resulting merge, line 7 will have the contents from branch, which is the digit 7. Also, line 2 is changed only in master, so in the merge result it will be the digit 2.

Merge Result
One
2
Three
Four
Five
Six
7
Eight

Remember that merge only looks at the tip commits, so comparing the common ancestor to branch, line two appears unchanged, since the ancestor and tip are identical.

Rebase

Rebase works a bit differently - instead of doing a three-way merge between the tip commits on each branch, it tries to replay the commits on one branch onto another. In the above example, if we want to rebase branch onto master, then Git will create a patch for each commit on branch and apply those patches onto master.2

When you rebase, Git will switch you to the master branch, checking out f2e864b. Then Git will apply the differences between the common ancestor and the first commit on branch. In this example, the patch between the common ancestor and the branch changes line two from Two to 2. But that's already the value of the file in master. So there's nothing to do, and the patch for 3f3ca4f applies cleanly.

Then a patch for the second commit on the branch is applied: it changes like two back to the text representation, and changes line seven to a digit. So the rebase result is:

Rebase Result
One
Two
Three
Four
Five
Six
7
Eight

So rebase preserves the changes in the branch while merge preserved the changes in master.

Conclusion

Generally these sorts of changes will cause a conflict instead of different results. It was key that in branch we changed the contents of line 2 back to the contents in the common ancestor. That allowed the merge engine to consider that the line in branch was unchanged.

Merge Result Rebase Result
One One
2 Two
Three Three
Four Four
Five Five
Six Six
7 7
Eight Eight

So… is this a problem?

It might seem concerning that this comes up when there was an apparent revert of your changes. Logically, both the branch and the master branches changed line two, but then branch changed it back. So although this seems rather derived, it's not that unlikely.

But whether you prefer a merge workflow or a rebase workflow, you should be careful of your integration and following good development practices:

  1. Code review, ideally using pull requests, so that your team members have visibility into changes before they're integrated into master.

  2. Continuous integration builds and tests, as part of your integration workflow. Ideally, with build policies to ensure that builds succeed and tests pass.

So make sure to do proper code reviews, which keep this an interesting difference instead of an actual problem in your workflow.

  1. Strictly speaking, the merge engine doesn't actually look at lines, it looks at groups of lines, or "hunks". But it's easier to reason about individual lines for this example. 

  2. By default, rebase will create and then apply patches, but when invoked with git rebase --merge then it will cherry-pick the changes. This uses the merge engine instead of patch application, but in this example, the results are the same.