My workflow for “SVN to Git and back”

Are you a happy user of SVN? Fine. You probably don’t need my recipe.

You use Git wherever possible, but sometimes you need to resort to SVN, as not everybody has yet seen the beauty of the Git workflow? 🙂 Good. Maybe this workflow fits yours.

We’ve been starting to use the Object RDF Mapper SuRF in our team and are happy with its overall support. There are, however, a few missing cases and some issues here and there we’d like to fix. Now, SuRF uses Google Code for hosting, which is perfect for many projects. I myself host two of my projects there. But it’s showing its age a little, and Github is the place to go for all my new projects.

So while working a bit on SuRF, which is hosted on SVN, I’d like to use the beauty and ease of Git on Github without making it impossible to keep the fork in sync.

If you come from the SVN world, you might understand “fork” differently from those with a Git background. Forking with the former is generally a negative process, as a project splits up and people who probably worked together before now go their own ways. With Git, however, forking and merging back is so easy that everybody forks right away and then creates a merge request (a Github feature) to sync the changes back to “upstream”, the main author’s source.

So now I have a personal Git account where I can move freely and also share my changes with the world. No need to have my patches hidden away separately in a bug tracker. Also, in general, once the main project loses momentum, others are readily available to take over. This is especially useful with smaller projects, I’d say.

Synchronizing Git from SVN

There are several articles out there on how to set up a bridge between SVN and Git. I found http://www.fnokd.com/2008/08/20/mirroring-svn-repository-to-github/ to be very helpful.

Using git-svn, the author sets up a repository on his server and creates a “vendor” branch to be updated with changes from SVN. A cron job then regularly checks the SVN repository for changes and pushes them to the Github Git repository. Your own changes go to the master branch, which can from time to time have the “vendor” branch merged in via “git merge origin/vendor”.

Getting changes back to SVN

In my case, as I will play around a bit to test different ideas, not all changes should go straight back to SVN. Also, as I’m not sure what policy upstream applies to changes, I am not interested in pushing changes upstream directly. Finally, I’ve already set up a project in the past that tried to run Git and SVN both ways, which resulted in a not-so-nice Git history due to the incompatibilities between the two versioning systems.

Instead I will focus on extracting patches to go upstream.

However, nothing is as easy as it seems, and Git doesn’t produce diffs consumable by SVN. There is a short script at http://gist.github.com/582239 called git-svn-diff which converts the diff between the last SVN revision and the version given to the script. As this script needs the git-svn history locally, I simplified the script and documented the workflow for everybody to use. You’ll find the script under http://github.com/cburgmer/surf/blob/master/git-svn-diff.sh.

The following steps will download (“clone”) the repository, create a new branch based on the SVN mirror and cherry-pick the Git commit we want to create the diff for. Not too simple, but doable:

$ git clone git://github.com/cburgmer/surf.git
$ cd surf
$ git checkout -b mydiff origin/surfrdf
$ git cherry-pick COMMIT_SHA1_ID
$ wget https://github.com/cburgmer/surf/raw/master/git-svn-diff.sh
$ sh git-svn-diff.sh > /tmp/my_patch.diff

This patch can then be applied to SVN from the source directory:

$ patch -p0 < /tmp/my_patch.diff

Enjoy.

Update: Fixed an issue with selecting the right branch

PIP requirements.txt and setup.py

PIP is Python’s next-generation package installer, following in the footsteps of easy_install. It is pretty handy in conjunction with virtualenv, which you probably should be using if you develop Python modules. Virtualenv creates isolated Python environments you can install packages into without needing your system’s (or distribution’s) packaging service (e.g. “aptitude”) and, more importantly, it lets you keep around different versions of the same software.

A neat feature of PIP is the ability to describe all dependencies in a simple text file called “requirements.txt”. With PIP you can install from almost everywhere: the Python Package Index (PyPI), an arbitrary download site, SVN, Git and other versioning systems. This file comes in handy when setting up a virtual environment with virtualenv. A simple “pip install -r requirements.txt” will download and install all dependencies described there.
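For illustration, a minimal requirements.txt might look like this (the package name and URLs are made-up examples, not pdfserver’s actual dependencies):

```
# plain packages, resolved via PyPI
simplejson>=2.0
# an editable checkout from a versioning system
-e git+git://github.com/example/foo.git#egg=foo
# an extra download location
-f http://example.com/downloads/
```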

Complementary to that, “setup.py”, which is used by distutils and setuptools to package Python modules, already handles dependencies through keywords to the setup call named “install_requires” (as well as “dependency_links” and “extras_require”).

For the project pdfserver I wanted to use both worlds for a good reason: if somebody installs the package via “pip install pdfserver”, all dependencies should be fulfilled during the install through setup.py. But as the software is meant to be deployed on a server, many users will probably download and extract the source package and then create a virtualenv around it using “pip install -r requirements.txt”.

As I found no simple solution that answered my needs, I extended pdfserver’s setup.py to parse the dependencies given in a requirements.txt file. The file is parsed twice: first to extract all dependency names, and then again for the URLs of packages not found on PyPI. What it doesn’t do so far is parse the versioning information.

Here’s the basic code:

import re
from setuptools import setup

def parse_requirements(file_name):
    requirements = []
    for line in open(file_name, 'r').read().split('\n'):
        if re.match(r'(\s*#)|(\s*$)', line):
            continue
        if re.match(r'\s*-e\s+', line):
            requirements.append(re.sub(r'\s*-e\s+.*#egg=(.*)$', r'\1', line))
        elif re.match(r'\s*-f\s+', line):
            pass
        else:
            requirements.append(line)
    return requirements

def parse_dependency_links(file_name):
    dependency_links = []
    for line in open(file_name, 'r').read().split('\n'):
        if re.match(r'\s*-[ef]\s+', line):
            dependency_links.append(re.sub(r'\s*-[ef]\s+', '', line))
    return dependency_links

setup(
    install_requires = parse_requirements('requirements.txt'),
    dependency_links = parse_dependency_links('requirements.txt'),
    ...
)

You can find the full setup.py here.

With this setup all dependencies go to requirements.txt and setup.py will automatically pick them up.
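To make concrete what these helpers produce, here is a standalone sketch that applies the same parsing logic to a small, made-up requirements file (the loop below inlines the logic of parse_requirements and parse_dependency_links so it runs on its own):

```python
# Standalone sketch of what the parsing helpers produce for a small,
# illustrative requirements file (names and URLs are made-up examples).
import re

lines = """\
# a comment
simplejson>=2.0
-e git+git://github.com/example/foo.git#egg=foo
-f http://example.com/downloads/
""".splitlines()

requirements = []
dependency_links = []
for line in lines:
    if re.match(r'(\s*#)|(\s*$)', line):       # skip comments and blank lines
        continue
    if re.match(r'\s*-e\s+', line):            # editable VCS requirement
        requirements.append(re.sub(r'\s*-e\s+.*#egg=(.*)$', r'\1', line))
        dependency_links.append(re.sub(r'\s*-e\s+', '', line))
    elif re.match(r'\s*-f\s+', line):          # extra download location
        dependency_links.append(re.sub(r'\s*-f\s+', '', line))
    else:                                      # plain PyPI requirement
        requirements.append(line)

print(requirements)       # ['simplejson>=2.0', 'foo']
print(dependency_links)   # ['git+git://github.com/example/foo.git#egg=foo',
                          #  'http://example.com/downloads/']
```

The package name for the “-e” line comes from the “#egg=” fragment, while the full URL ends up in dependency_links so setuptools knows where to fetch it from.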

Distutils2 will make setup.py obsolete, or so I read. Let’s see how dependencies will be handled then. For now this works for me.

Update: Distribute in version 0.6.14 cannot install from distributed version control systems (DVCS). So URLs like git+git://… won’t work with “python setup.py install”. Pip though can still install packages built with “python setup.py sdist” (even though with a bug: http://bitbucket.org/ianb/pip/issue/194/dvcs-support-for-uppercase-package-na…