Python in Docker
I've been playing around a lot lately with Docker containers.
I know, I know; there've been a lot of blog posts about Docker lately, and the whole thing is practically a religious war by now. But that's not what I want to talk about. You like Docker? That's fine. You hate it? That's fine, too.
This article isn't about the merits of docker-ization or anything like that. I'm not going to try to convince you to adopt a dockerized infrastructure or play up the benefits you'd get by doing so (dev-prod parity! self-healing infrastructure! a cure for male-pattern baldness!). My only goal here is to explain how I built Python containers that are conducive to a rapid development cycle. Do you use Python in Docker? Read on.
Basically, this post is about the volumization problem. Take the following Dockerfile:
This Dockerfile does a lot of things right: it's clean and simple, it copies the requirements before the project to help with layer caching, and it should be easy to understand for... well, pretty much everyone.
Do you see what it does wrong?
Woah, why didn't that second test fix it?
The problem here is that Docker implicitly adds a build step to an app that otherwise wouldn't need one. This increases development time horrendously -- what used to be a simple alternation between coding and running tests now includes a third, sometimes long, build step.
Well, how do we fix that?
There are a few options here, pretty much all of which involve volumes. With the -v flag of docker run, we can mount any folder on our host machine to any folder in a container.
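As a rough sketch of what such a mount looks like, the snippet below only assembles a docker run command line in Python; it does not start a container. The image name my-cool-project and the pytest command are placeholders I made up for illustration, and /src is the container path used later in this post.

```python
from pathlib import Path


def docker_run_argv(image, host_dir, container_dir, command):
    """Build the argv for `docker run` with host_dir mounted at container_dir."""
    # -v takes "host_path:container_path"; an absolute host path is the safest form.
    host_path = Path(host_dir).resolve()
    return ["docker", "run", "--rm", "-v", f"{host_path}:{container_dir}", image, *command]


# Hypothetical image name and test command, purely for illustration.
argv = docker_run_argv("my-cool-project", ".", "/src", ["python", "-m", "pytest"])
print(" ".join(argv))
```

Passing the resulting list to subprocess.run(argv) would start the container with the current directory visible at /src, so edits on the host show up inside it without a rebuild.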
Perfect! Marvelous! We're done here, everything works! Good night everyone, hope you enjoyed this post.
Still here? Well, if you're still here, that probably means you ran into the same issue I did and are looking for a real solution. What's this additional problem?
Let's pretend you have the following test:
Pretty good test, eh?
Now, if you have a test like this, even with the above volumization fix, you will still need to rebuild the container before you can get a passing test.
Why is this?
Well, the answer lies in our Dockerfile. When we do a pip install <folder>, we're basically taking a snapshot of the current state of that directory and copying it to a more system-wide location. When we change the original directory, either manually within the container or by doing a volume mount, this system-wide install is not updated to reflect the changes.
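To make the copy semantics concrete, here is a small simulation using only the standard library. It does not call pip; it stands in for the install step with a plain file copy, and the names demo_module and GREETING are invented for the example.

```python
import importlib.util
import shutil
import tempfile
from pathlib import Path


def load_greeting(module_file):
    """Import a module straight from a file path and return its GREETING."""
    spec = importlib.util.spec_from_file_location("demo_module", module_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.GREETING


def simulate_copy_install():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")             # stands in for the project directory
        site = Path(tmp, "site-packages")  # stands in for the system-wide location
        src.mkdir()
        site.mkdir()

        (src / "demo_module.py").write_text('GREETING = "old"\n')
        # "Install": copy a snapshot of the source into site-packages.
        shutil.copy(src / "demo_module.py", site / "demo_module.py")

        # Edit the source afterwards, as a volume mount or manual change would.
        (src / "demo_module.py").write_text('GREETING = "new"\n')

        return load_greeting(site / "demo_module.py"), load_greeting(src / "demo_module.py")


installed, source = simulate_copy_install()
print(installed, source)  # prints: old new
```

The installed copy keeps reporting the old value until the copy step runs again, which in our case means rebuilding the container.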
In order to have the installed version of a package updated when we make any changes, pip provides "editable" mode, enabled with the -e flag.
So what if we add the -e flag to our pip install /src line in the Dockerfile?
Well, it turns out there's a problem with this approach: the "editable" flag creates a "my-cool-project.egg-info" file in your project's directory that is basically used to make that system-wide installation I talked about earlier into a symlink to this directory.
See the problem yet?
When you volume-mount your host machine's project directory over the one in the container, it won't have an egg-info file. Instead, you'll end up with this:
I crawled through the pip documentation for a way to fix this; as far as I can tell there is no solution to this. Only the "editable" flag solves our problem, but it will only ever create an "egg-info" in the project's directory.
Now here's our saving grace: at its core, Python package management has a... easy_install... each of these projects has contributed a bit towards making the state of Python package management into what it is today. How about if we go backwards a little bit?
pip install -e <folder> is just a wrapper around running python setup.py develop from inside that folder, with a bit of added goodness. What if we were to call this directly?
Aha! This looks like it works almost exactly the same as pip's editable mode, except for one crucial difference: rather than creating the "egg-info" in the project directory, it creates it in the current directory.
What if we install our project from a directory that won't be mounted over? It turns out this works perfectly; the only catch is that we need to update $PYTHONPATH to deal with our fuckery.
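The reason $PYTHONPATH helps can be shown without Docker or setuptools at all. The sketch below is a standard-library illustration only: it puts a throwaway project directory on PYTHONPATH and imports from it in a fresh interpreter before and after an edit. The names demo_module and GREETING are made up for the example.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def read_greeting(search_dir):
    """Start a fresh interpreter with search_dir on PYTHONPATH and import the module."""
    env = dict(os.environ, PYTHONPATH=str(search_dir))
    result = subprocess.run(
        # -B avoids writing .pyc files, so nothing stale can be cached between runs.
        [sys.executable, "-B", "-c", "import demo_module; print(demo_module.GREETING)"],
        env=env,
        cwd=str(search_dir.parent),  # run from a directory that does not contain the module
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp, "src")  # stands in for the mounted project directory
    project.mkdir()
    module_file = project / "demo_module.py"

    module_file.write_text('GREETING = "old"\n')
    before = read_greeting(project)

    # Edit the source in place; no reinstall or rebuild happens in between.
    module_file.write_text('GREETING = "new"\n')
    after = read_greeting(project)

print(before, after)  # prints: old new
```

Because the interpreter resolves the import from the live directory rather than from a copied snapshot, whatever is currently in that directory (including a volume mount) is what gets imported.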
So what does our Dockerfile look like now?
Note there are two changes here: we add our project directory to $PYTHONPATH so that the "egg-info" file which will be created in site-packages has a way of finding it, and we use python setup.py develop rather than pip in editable mode. Note that you can use any directory here; I just used the global site-packages for consistency with other packages.
Now, we can either run our container without a volume mount and get the standard behaviour we would expect out of a Docker container, or we can mount our project directory to /src and gain the ability to do rapid development without needing to rebuild the container!
Happy hacking, folks!