Baseimage-docker, fat containers and “treating containers as VMs” (phusion.nl)
68 points by specto on Jan 20, 2015 | 12 comments


I used baseimage-docker for a while but wasn't satisfied with the image size (around 300mb with nodejs installed), so I switched to debian as the base and removed stuff I didn't need such as syslog and openssh. That alone brought it down to 140mb.

However, seeing as other people have managed to distribute their applications in images as small as 7mb (e.g. progrium/logspout), I decided to create one based on busybox with s6 as the process supervisor, with help from this article [1]. I'm now pretty happy with my 33mb nodejs environment :)

[1]: http://blog.tutum.co/2014/12/02/docker-and-s6-my-new-favorit...


Sounds somewhat similar to what docker-nano was doing, see e.g. nano/node.js: https://registry.hub.docker.com/u/nano/node.js/

docker-nano is based on buildroot, which is a kind of automated way of doing that.


I really like this idea, but it really only works for stuff you can build statically.

I'm wondering if there's a middle ground: for example, a busybox container which gives you python and pip, or a workflow which lets you install a deb and all its dependencies into a container, without the container itself needing to have all the apt machinery and other bootstrap detritus on board.


You can bundle busybox with opkg and use that to install python 2.7 and pip. I tried it out myself (using progrium/busybox [1] as a base) and the image comes to 27mb, not bad.

[1]: https://github.com/progrium/busybox


Ah, that's fantastic. Thanks for the pointer!


Sounds very interesting, do you have an image in the docker hub?


I'll get around to putting up a GitHub repo soon, but for now you can pull gigablah/busybox-node. You can either override the entrypoint or daemonize it and use docker exec.
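
For anyone who wants to try it, here is a rough sketch of those two options. The container name is a placeholder, and I'm assuming the image ships /bin/sh since it is busybox-based; adjust if it doesn't.

```shell
#!/bin/sh
# Image name taken from the comment above; everything else is illustrative.
IMAGE="gigablah/busybox-node"

# Option 1: bypass the image's entrypoint and start an interactive shell.
shell_with_entrypoint_override() {
    docker run --rm -it --entrypoint /bin/sh "$IMAGE"
}

# Option 2: run the container detached with its normal entrypoint,
# then open a shell inside the running container.
# $1 is a container name of your choosing (hypothetical).
shell_with_exec() {
    name=$1
    docker run -d --name "$name" "$IMAGE" && docker exec -it "$name" /bin/sh
}
```

Calling shell_with_exec with a name such as "node-dev" leaves the container running after you exit the shell, so you would need to stop and remove it yourself.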


I don't use phusion/baseimage, but I'm not against it existing. I would, however, like to clarify a couple of things in your two blog posts today:

1. "we are the most popular third party image on the Docker Registry". This is true based on the # of stars, but that can be misleading when you look at the actual number of pulls. Don't get me wrong, phusion/baseimage is popular with about 230k pulls, but if you look around there are dozens of images with millions of downloads, so your claims are a bit misleading.

2. phusion/baseimage inherits from the official ubuntu:14.04 image. It adds a lot of things, and starts a number of services by default, so it is absolutely a "fat" VM-like container. I'm not against this at all, but I will point out that the Dockerfile Best Practices article (http://docs.docker.com/articles/dockerfile_best-practices/) is explicit that a container should kick off one process. If you look at how the highly curated official repositories function, none of them run more than one process or utilize a supervisor. Docker gives you the freedom to do whatever you want, but calling phusion/baseimage a "correct" way to do Docker conflicts with the official documentation. I totally get that phusion/baseimage has been around for a long time, and provides solutions to some common problems which may make adoption easier for some with legacy apps, but I would refrain from claiming that your solution "gets everything right". By all means, use whatever works, just be aware that the best practices are clearly outlined on docs.docker.com, not the Phusion blog.


I'm almost done migrating all of my images over to baseimage. Most things I'm running come from debian packages, and are designed to run in a full unix environment as a service, so it's a lot easier to just make /etc/my_init.d/00_myservice.sh which sets up permissions for mounted volumes and calls "service myservice start", then lets the OS handle the rest.
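
A minimal sketch of what such a startup script might look like. baseimage-docker runs executable scripts from /etc/my_init.d/ at boot; the service name, user, and volume path here are all placeholders, and the "start" argument is my assumption about how the service gets launched.

```shell
#!/bin/sh
# Hypothetical /etc/my_init.d/00_myservice.sh; all names are placeholders.

# Give the service user ownership of a mounted volume, then start the
# service through its regular init script.
#   $1: path of the mounted volume
#   $2: user the service runs as
start_myservice() {
    data_dir=$1
    svc_user=$2
    # Stop here if the ownership fix fails, so the service does not start
    # against a volume it cannot write to.
    chown -R "$svc_user" "$data_dir" || return 1
    service myservice start
}

# In the real script the function would be called at the bottom, e.g.:
# start_myservice /var/lib/myservice myservice
```

The script has to be marked executable in the image, otherwise it will not be run at startup.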

The image size doesn't bother me. Storage is cheap, and I can easily clear out the old cruft using "docker rmi $(docker images -q -f dangling=true)". Why should I care about 200mb in the age of multi-terabyte drives?
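
One caveat with that one-liner: when there are no dangling images, docker rmi gets called with no arguments and exits with an error. A slightly more defensive sketch (the function name is made up):

```shell
#!/bin/sh
# Remove dangling images, skipping the rmi call when there are none.
cleanup_dangling() {
    ids=$(docker images -q -f dangling=true)
    if [ -z "$ids" ]; then
        echo "no dangling images to remove"
        return 0
    fi
    # Intentionally unquoted so that each image ID becomes its own argument.
    docker rmi $ids
}
```

This is handy in a cron job or build script, where a non-zero exit from an empty cleanup would otherwise look like a failure.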


Storage is cheap and plentiful, yes. Can't always say the same for bandwidth or transfer.


If people really cared about that, then VMs wouldn't be so popular, and Docker wouldn't be either. Everybody would be using shared libraries in order to optimize away duplication as much as possible, instead of statically linking things in order to make deployment easier.


Yes, because it's totally impossible for the popularity of VMs and Docker to be driven by people who don't have that constraint.



