38 kilobits = 4.7 kB, but yes. Here's the complete genome of the pandemic strain...

Florin_Andrei · on Oct 20, 2014

There's something fascinating about looking at that code, even though I can't decompile it just by reading the characters.

P.S.: It's 37918 bits, about 4.6 kB

trhway · on Oct 20, 2014

The notes on the decoded pieces are themselves fascinating:

/note="immunosuppressive region; other site"

...

/note="transmembrane anchor; transmembrane region"

dnautics · on Oct 20, 2014

It's even more fascinating (biochemist here). I think the immunosuppressive region is a small protein called sGP which ebola causes the cell to produce a ton of. It's not attached to ebola virions, so in a way it's a DDoS on the immune system: it creates a foreign component that distracts the immune system from going after ebola.

trhway · on Oct 20, 2014

following the doc link for

524..2671 /gene="NP" /note="Ebola nucleoprotein; Region: Ebola_NP; pfam05505" /db_xref="CDD:147601"

"http://www.ncbi.nlm.nih.gov/pubmed/9657001?dopt=Abstract"

"DNA vaccines expressing the envelope glycoprotein (GP) or nucleocapsid protein (NP) genes of Ebola virus were evaluated in adult, immunocompetent mice. The vaccines were delivered into the skin by particle bombardment of DNA-coated gold beads with the Powderject-XR gene gun. Both vaccines elicited antibody responses as measured by ELISA and elicited cytotoxic T cell responses as measured by chromium release assays. From one to four vaccinations with 0.5 microgram of the GP DNA vaccine resulted in a dose-dependent protection from Ebola virus challenge. Maximal protection (78% survival) was achieved after four vaccinations. "

The article is from 1998. I wonder why there is still no vaccine. And this one is on the virus structure (the GP envelope) with an attached antibody from a human survivor: http://www-als.lbl.gov/index.php/contact/163-structure-of-th...

micro_cam · on Oct 21, 2014

There are a few vaccines that work in animals, they just haven't been funded to go through the trials needed to be widely used in humans. There is a good interview on Montana public radio with one of the leading researchers that touches on this:

http://mtpr.org/post/getting-closer-ebola-vaccine

(Rocky Mountain Lab in Montana is where a large portion of the ebola research in the nation, if not the world, happens, and MTPR has gotten some really good interviews as a result.)

hga · on Oct 21, 2014

Errr, right now one vaccine is being tested on humans in Phase I trials: http://www.niaid.nih.gov/news/newsreleases/2014/Pages/EbolaV...

I can remember seeing pictures of some Malians getting it.

Florin_Andrei · on Oct 20, 2014

> a small protein called sGP which ebola causes the cell to produce a ton of

Is that why people die? Cells make too much of that stuff?

cloakandswagger · on Oct 20, 2014

Correct. Victims' bodies are tricked into generating large amounts of cells. This causes the hemorrhaging which is the ultimate cause of death.

Fun fact: The victim's body continues producing cells for several hours after death. In rare cases this (in conjunction with post-death bloating) causes the cadaver to "explode"

throwaway5752 · on Oct 20, 2014

That is false: http://en.wikipedia.org/wiki/Ebola_virus_disease#Pathophysio...

pdeuchler · on Oct 21, 2014

What if the noncoding dna sequences are comments and the genome is just extremely well documented FOSS?

sopooneo · on Oct 21, 2014

I bet they're more like commented out sections of old code, now so far out of date as to be incomprehensible, but perhaps a key to past approaches.

CatMtKing · on Oct 20, 2014

UCSC has the sequenced strains loaded in their genome browser, too, if it interests you

http://genome.ucsc.edu/cgi-bin/hgTracks?db=eboVir3

ddlatham · on Oct 20, 2014

Oops, thanks for the bits to bytes correction.

It is simply incredible to me that it could be so small (in an information sense).

This article (gzipped) is 22KB, more than 4 times as much information.

rads · on Oct 20, 2014

Yeah, but don't forget the complexity of the hardware you need to run ebola versus a website. Our bodies provide most of the implementation details.

ddlatham · on Oct 20, 2014

Absolutely, but that's true of most information as well. For example, the information in the article is relative to the context of understanding our language and the body of assumed knowledge and references of a reader of the New Yorker.

My understanding of biology is very limited. I've heard how physically small microbes are in every article out there. But never how information-small they are. Fascinating from a software developer's perspective.

I wonder what code golf for a virus would look like.

ObviousScience · on Oct 21, 2014

Like viroids, which make a virus look huge by comparison:

> Viroids are plant pathogens that consist of a short stretch (a few hundred nucleobases) of highly complementary, circular, single-stranded RNA. Viroid genomes are extremely small in size, ranging from 246 to 467 nucleotides (nt), and consisting of fewer than 10,000 atoms. In comparison, the genome of the smallest known viruses capable of causing an infection by themselves are around 2,000 nucleobases in size. The human pathogen hepatitis D virus is similar to viroids.

> Viroid RNA does not code for any protein. Their replication mechanism uses RNA polymerase II, a host cell enzyme normally associated with synthesis of messenger RNA from DNA, which instead catalyzes "rolling circle" synthesis of new RNA using the viroid's RNA as template. Some viroids are ribozymes, having catalytic properties which allow self-cleavage and ligation of unit-size genomes from larger replication intermediates.

Source: http://en.wikipedia.org/wiki/Viroid

In case you were wondering what a ribozyme is:

> A ribozyme (ribonucleic acid enzyme) is an RNA molecule that is capable of catalyzing specific biochemical reactions, similar to the action of protein enzymes.

Source: http://en.wikipedia.org/wiki/Ribozyme

The action of ribozymes led to the RNA world hypothesis, as the mechanism for how you could have a simple system from which DNA and proteins can come as later optimizations on particular aspects. Some ribozymes are able to go as far as catalyzing the building of their own RNA structure in the right environments (albeit with limited success so far).

Florin_Andrei · on Oct 20, 2014

Right. It's so context-dependent that the "hardware" of a different animal may react very differently to it, perhaps even ignoring it altogether.

The genetic information is really the 4.7 kB of "code" PLUS the entire information already contained in the cellular hardware reading it. One doesn't make sense without the other.

At the very bottom, the whole thing depends on the laws of quantum mechanics in this universe, governing the minute details of molecular interaction. That, too, should be considered to go into the "code". Make a tiny change to the Planck constant, and the Zaire ebolavirus code will do something very different.

JoeAltmaier · on Oct 20, 2014

Thank you for saying that! It's so often repeated that our DNA contains the entire program for a human being. That's patently false. The cellular machinery provides almost all of the OS; DNA is just a script.

I liken DNA to a paper tape containing one of two punches: MAN or MOUSE. Feed it into a bio-replicator and get a man or a mouse. Does the paper tape define the man? Of course not.

ghkbrew · on Oct 20, 2014

> I liken DNA to a paper tape containing one of two punches: MAN or MOUSE

I don't think that really captures it. Yes, it requires external machinery to actually do anything, but DNA is much more information dense and carries much more of an exact definition of the organism to be produced.

Personally, I prefer the analogy of compiler source code. Sure, it can't do anything on its own. But it defines how a working external system (another compiler or a functional cellular environment) can produce a second, possibly different system.

JoeAltmaier · on Oct 20, 2014

Yet the dynamic biochemistry of the cell is orders of magnitude larger and more complex than DNA. So it's larger than a paper tape, sure, but the comparison is pretty good really.

1457389 · on Oct 21, 2014

Paper tapes can't catalyze their own creation and modification. DNA can with the addition of ribonucleotides.

Retric · on Oct 20, 2014

Mitochondria for example are (mostly?) independent of your DNA.

cauterized · on Oct 21, 2014

And yet the paper tape of a plant or animal genome (as opposed to a virus) also contains the instructions for the OS and the bio-replicator, which is part of what's so fascinating about it.

JoeAltmaier · on Oct 21, 2014

I don't think that's accurate at all. The DNA has no effect on the cellular soup (the RNA etc.) that is the bioreactor. That you got from some ancestral Eve. It changes perhaps over time, like anything else, through random chance. But it's independent of the DNA, which is a tiny part of the whole.

narrator · on Oct 20, 2014

The interpreter is ridiculously complicated in order to make up for the conciseness of the programming language.

dnautics · on Oct 20, 2014

You could make that argument for certain hardware optimizations, too (like SSE, GPU, PPU, etc). A computer is not a raw Turing machine.

stonogo · on Oct 22, 2014

No, the article does not contain more information. It contains more data, but when dealing with genomics you must keep in mind the fact that various codons translate to various amino acids, which chain into proteins, and each of the proteins serves various functions depending on its shape... and proteins can assume different shapes, which changes their effects. This is the importance of protein-folding research.

You hit near it when you gzipped the article -- consider a genome to be an incredibly compressed format, able to explode into a truly stunning amount of information, stored in a relative paucity of raw data.

makaed · on Oct 21, 2014

How did you get 38 kilobits?

thelamest · on Oct 21, 2014

18959 units of 4 possible values (nucleotides) is equal to 37918 units of 2 possible values aka bits.
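To spell out that arithmetic, here is a rough Python sketch. The 18959 figure is the nucleotide count quoted above; the 2-bit mapping in `CODE`, the helper names `bits_for_sequence` and `pack`, and the choice to write the bases as A/C/G/T are all assumptions made for illustration, not any standard encoding.

```python
import math

GENOME_LENGTH_NT = 18959  # nucleotide count quoted in the thread


def bits_for_sequence(length_nt: int, alphabet_size: int = 4) -> int:
    """Bits needed to store length_nt symbols drawn from alphabet_size values."""
    bits_per_symbol = math.ceil(math.log2(alphabet_size))  # 4 values -> 2 bits
    return length_nt * bits_per_symbol


# Hypothetical 2-bit code; which base gets which pattern is arbitrary.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # A short final chunk is left-aligned and padded with zero bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


total_bits = bits_for_sequence(GENOME_LENGTH_NT)
total_bytes = math.ceil(total_bits / 8)
print(total_bits, total_bytes)  # 37918 4740
```

So the raw sequence fits in about 4.7 kB when packed at 2 bits per base, before any compression and ignoring any annotation.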

on Oct 20, 2014

[dead]

rwallace · on Oct 20, 2014

In fairness, as things now stand, any nutcase who wanted to obtain Ebola with evil intent could do it more easily by hopping on a plane to West Africa than by reconstituting it from its genome. The threat is from the natural epidemic currently running wild, and making all relevant information freely available might help the search for cures and vaccines.

tedks · on Oct 21, 2014

It takes a serious amount of resources to search for a cure or vaccine; they could just as easily have made it available on request without limiting the number of people who could work on it.

The real threat is in 10/20/100 years in the future, where someone with a desktop bioprinter decides to fuck up a subway station.

mbreese · on Oct 21, 2014

Requests require committees. Committees require meetings. Meetings take time.

The more red tape you put up, the harder it is for people to get to work on this.

If someone has the ability to print out a working copy of ebola in 10/20/100 years, someone else will have the ability to print out working antibodies. I'd be less concerned about some potential future risk and more concerned with getting out of the way of people who are working on Ebola research right now.

Plus, if some nut job wanted to print out some Ebola with a hypothetical bioprinter, they'd probably end up infecting themselves as well.

tedks · on Oct 21, 2014

> Requests require committees. Committees require meetings. Meetings take time.

This is totally false. Plenty of research material is "distributed on request" where the request is just an email and the validation is just checking that the email comes from an academic domain.

It doesn't really matter if someone can print out antibodies (hypothetically). That won't help the people already killed by the ebola.

jonknee · on Oct 20, 2014

Probably not. If you're sophisticated and maniacal enough to create and deliver a bio weapon you probably don't need the genome source dump from GitHub. There are trillions of copies of Ebola in the meatspace after all.