Python in Docker
I've been playing around a lot lately with Docker containers.
I know, I know; there've been a lot of blog posts about Docker lately, the whole thing is practically a religious war by now. But that's not what I want to talk about. You like Docker? That's fine. You hate it? That's fine, too.
This article isn't about the merits of docker-ization or anything like that. I'm not going to try to convince you to adopt a dockerized infrastructure or play up the benefits you'd get by doing so (dev-prod parity! self-healing infrastructure! a cure for male-pattern baldness!). My only goal here is to explain how I built Python containers that are conducive to a rapid development cycle. Do you use Python in Docker? Read on.
Basically, this post is about the volumization problem. Take the following Dockerfile:
    FROM python:3.5.2-alpine

    COPY requirements.txt /src/requirements.txt
    RUN pip install --no-cache-dir -r /src/requirements.txt

    COPY . /src
    RUN pip install /src

    CMD my-cool-project
This Dockerfile does a lot of things right: it's clean and simple, it copies the requirements before the project to help with layer caching, and it should be easy to understand for... well, pretty much everyone.
Do you see what it does wrong?
    $ docker build -t my-cool-image .
    $ docker run --rm -it my-cool-image py.test /src
    # ... 1 test failure ...
    $ echo "fix my app" >> my-app.py
    $ docker run --rm -it my-cool-image py.test /src
    # ... 1 test failure ...
Whoa, why didn't that second test run pass after the fix?
The problem here is that Docker implicitly adds a build step to an app that otherwise wouldn't need one. This increases development time horrendously -- what used to be a simple alternation between coding and running tests now includes a third, sometimes long, build step.
Well, how do we fix that?
There are a few options here, pretty much all of which involve volumes. With the -v flag, we can mount any folder on our host machine onto any folder in a container.
    $ docker build -t my-cool-image .
    $ docker run --rm -it -v $(pwd):/src my-cool-image py.test /src
    # ... 1 test failure ...
    $ echo "fix app" >> my-app.py
    $ docker run --rm -it -v $(pwd):/src my-cool-image py.test /src
    # ... all tests pass ...
Perfect! Marvelous! We're done here, everything works! Good night everyone, hope you enjoyed this post.
Still here? Well, if you're still here, that probably means you ran into the same issue I did and are looking for a real solution. What's this additional problem?
Let's pretend you have the following test:
    def test_everything():
        import my_cool_project

        assert my_cool_project.does_everything()
Pretty good test, eh?
Now, if you have a test like this, even with the above volumization fix, you still will need to re-build the container before you can get a passing test.
Why is this?
Well, the answer lies in our Dockerfile. When we do a pip install <folder>, we're basically making a copy of the current state of that directory and copying it to a more system-wide location. When we change the original directory, either manually within the container or by doing a volume mount, this system-wide install is not updated to reflect the changes.
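To make that copy behaviour concrete, here's a small simulation in plain Python. It doesn't call pip at all: shutil.copy stands in for the install step, and the module name demo_pkg_copy and both directories are made up for the illustration. Treat it as a sketch of the idea rather than of what pip does internally.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def demo_copy_install() -> str:
    """Simulate a copy-style install and return the value the importer sees."""
    root = Path(tempfile.mkdtemp())
    project = root / "project"     # stands in for /src
    site = root / "site-packages"  # stands in for the system-wide location
    project.mkdir()
    site.mkdir()

    # Hypothetical one-file module, used only for this illustration.
    source = project / "demo_pkg_copy.py"
    source.write_text("VALUE = 'old'\n")

    # "Install": copy the current state of the project.
    shutil.copy(str(source), str(site / source.name))

    # Edit the original *after* the install, like fixing a bug on the host.
    source.write_text("VALUE = 'new'\n")

    # Only the install location is importable, as in the container.
    sys.path.insert(0, str(site))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_pkg_copy")
        return module.VALUE
    finally:
        sys.path.remove(str(site))
        sys.modules.pop("demo_pkg_copy", None)
        shutil.rmtree(str(root))


if __name__ == "__main__":
    print(demo_copy_install())  # prints: old
```

The import still sees 'old' because it reads the copy, not the file we edited. That's the same reason the volume mount alone doesn't help once the test imports the installed package.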
In order to have the installed version of a package updated when we make any changes, pip provides "editable" mode:
    $ pip install --help | grep editable
      -e, --editable <path/url>   Install a project in editable mode (i.e. setuptools "develop mode") from a local project path or a VCS url.
So what if we add the -e flag to our pip install /src line in the Dockerfile?
Well, it turns out there's a problem with this approach: the "editable" flag creates a "my-cool-project.egg-info" file in your project's directory that is basically used to make that system-wide installation I talked about earlier into a symlink to this directory.
See the problem yet?
When you volume-mount your host machine's project directory over the one in the container, it won't have an egg-info file. Instead, you'll end up with this:
    Traceback (most recent call last):
      File "/usr/local/bin/my-cool-project", line 5, in <module>
        from pkg_resources import load_entry_point
      File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2927, in <module>
        @_call_aside
      File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2913, in _call_aside
        f(*args, **kwargs)
      File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2940, in _initialize_master_working_set
        working_set = WorkingSet._build_master()
      File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 635, in _build_master
        ws.require(__requires__)
      File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 943, in require
        needed = self.resolve(parse_requirements(requirements))
      File "/usr/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 829, in resolve
        raise DistributionNotFound(req, requirers)
    pkg_resources.DistributionNotFound: The 'my-cool-project' distribution was not found and is required by the application
I crawled through the pip documentation for a way to fix this, and as far as I can tell there isn't one. Only the "editable" flag solves our problem, but it will only ever create an "egg-info" in the project's directory.
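If you want to check for this failure mode without waiting for the entry point to blow up, you can ask Python whether it can see the distribution's metadata at all. The sketch below is an assumption on my part rather than part of the original setup: it uses importlib.metadata, which only exists in the standard library from Python 3.8 onwards, so it won't run on the 3.5 image used in this post (that one relies on pkg_resources, as the traceback shows). The helper name is made up.

```python
from importlib import metadata  # standard library since Python 3.8


def distribution_is_visible(name: str) -> bool:
    """Return True if metadata for the distribution `name` can be found."""
    try:
        metadata.distribution(name)
    except metadata.PackageNotFoundError:
        # Same situation the DistributionNotFound traceback reports:
        # the code may be importable, but the metadata is missing.
        return False
    return True


if __name__ == "__main__":
    # 'my-cool-project' is the distribution name used throughout this post.
    print(distribution_is_visible("my-cool-project"))
```

Running this inside the container with and without the volume mount should tell you whether the mount is what hides the metadata.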
Now here's our saving grace: at its core, Python package management has a... easy_install... each of these projects has contributed a bit towards making the state of Python package management into what it is today. How about if we go backwards a little bit?
pip install -e <folder> is just a wrapper around calling cd <folder> && python setup.py develop, with a bit of added goodness. What if we were to call this directly?
    $ python setup.py --help-commands | grep develop
      develop   install package in 'development mode' to the current working directory
Aha! This looks like it works almost exactly the same as pip's editable mode, except for one crucial difference: rather than creating the "egg-info" in the project directory, it creates it in the current directory.
What if we install our project from a directory that won't be mounted over? It turns out this works perfectly, the only catch being we need to update the $PYTHONPATH to deal with our fuckery.
So what does our Dockerfile look like now?
    FROM python:3.5.2-alpine

    ENV PYTHONPATH=/src

    COPY requirements.txt /src/requirements.txt
    RUN pip install --no-cache-dir -r /src/requirements.txt

    COPY . /src
    RUN cd /usr/local/lib/python3.5/site-packages && python /src/setup.py develop

    CMD my-cool-project
Note there are two changes here: we add $PYTHONPATH so that the "egg-info" file which will be created in site-packages has a way of finding our project directory, and we use python setup.py develop rather than pip in editable mode. Note that you can use any directory here, I just used the global site-packages for consistency with other packages.
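The $PYTHONPATH half of that is easy to try outside Docker. The sketch below starts a fresh interpreter twice with PYTHONPATH pointing at a temporary directory that stands in for /src, editing the module in between. The module name demo_pkg_path and the helper names are made up for the example, and bytecode caching is switched off so a stale cache can't muddy the result.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_pythonpath(project_dir: Path, code: str) -> str:
    """Run `code` in a fresh interpreter with PYTHONPATH set to project_dir."""
    env = dict(
        os.environ,
        PYTHONPATH=str(project_dir),
        PYTHONDONTWRITEBYTECODE="1",  # avoid any stale .pyc between runs
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


def demo_pythonpath() -> list:
    """Import a module via PYTHONPATH before and after editing it."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)                    # stands in for /src
        module = project / "demo_pkg_path.py"  # hypothetical module name
        code = "import demo_pkg_path; print(demo_pkg_path.VALUE)"

        module.write_text("VALUE = 'before edit'\n")
        first = run_with_pythonpath(project, code)

        module.write_text("VALUE = 'after edit'\n")
        second = run_with_pythonpath(project, code)
    return [first, second]


if __name__ == "__main__":
    print(demo_pythonpath())  # prints: ['before edit', 'after edit']
```

Because the import resolves straight to the directory on PYTHONPATH, there's no copy to go stale. Each new process (each docker run, in our case) picks up whatever is in the directory at that moment.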
Now, we can either run our container without a volume mount and get the standard behaviour we would expect out of a Docker container or we can mount our project directory to /src and gain the ability to do rapid development without needing to rebuild the container!
    $ docker run --rm -it -v $(pwd):/src my-cool-image py.test /src
    ======================= test session starts ================================
    platform linux -- Python 3.5.2, pytest-3.0.4, py-1.4.31, pluggy-0.4.0
    rootdir: /src, inifile:
    collected 1 items

    tests/test_all.py F

    ============================= FAILURES =====================================
    ____________________________ test_thing ____________________________________

        def test_thing():
            import my_cool_project
    >       assert my_cool_project.does_everything()
    E       assert False
    E        +  where False = <function does_everything at 0x7f1c09a0e8c8>()
    E        +    where <function does_everything at 0x7f1c09a0e8c8> = <module 'my_cool_project' from '/src/my_cool_project/__init__.py'>.does_everything

    tests/test_all.py:3: AssertionError
    ====================== 1 failed in 0.02 seconds ============================
    $ nvim my_cool_project/__init__.py
    $ docker run --rm -it -v $(pwd):/src my-cool-image py.test /src
    ======================= test session starts ================================
    platform linux -- Python 3.5.2, pytest-3.0.4, py-1.4.31, pluggy-0.4.0
    rootdir: /src, inifile:
    collected 1 items

    tests/test_all.py .

    ====================== 1 passed in 0.01 seconds ============================
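The detail to look for in that output is the module path: my_cool_project was loaded from /src/my_cool_project/__init__.py, not from a copy in site-packages. If you'd rather check that directly than wait for a failing assertion to print it, something like the following should do. The helper names are mine, and it only uses importlib.util.find_spec from the standard library.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def loaded_from(name: str, directory: str) -> bool:
    """Return True if module `name` resolves to a file under `directory`."""
    origin = module_origin(name)
    prefix = directory.rstrip("/") + "/"
    return origin is not None and origin.startswith(prefix)


if __name__ == "__main__":
    # Inside the container from this post you would expect True here
    # when the develop-mode install and PYTHONPATH are in place.
    print(loaded_from("my_cool_project", "/src"))
```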
Happy hacking, folks!