Hacker News: tfm's comments

Is that the actual URL? Seems to be missing /s

Marginally related: on cs.CL the other day was "Punny Captions: Witty Wordplay in Image Descriptions"[0]. A mashup of these two projects would bring us that much closer to the dream of Social Media In A Box.

[0] https://arxiv.org/abs/1704.08224


It's the URL provided in the paper (first-page bottom left)


The idea is to reduce the amount of redundant copying of characters: you end up doing a few more concatenations in the outer loop, but the concatenations in the inner loop are of short strings.

Importantly, if you remove the restriction of the input list being "exactly 256 items", then the method is still quadratic.

A linear-time algorithm for this would copy each input character exactly once, which is effectively what the method based on array.tostring() does.

The chunk size of 16 is not as significant as the technique of constructing and concatenating chunks, although it is optimal for input length 256. In general I think you'd want a chunk size of about the square root of the expected input length, to minimise the number of copied characters.

EDIT: maths

Concatenating strings of length M and N is linear in O(M+N), because that's how many characters you're copying.

Number of characters copied if you construct a string of length N by concatenating one character at a time

  = (0+1) + (1+1) + (2+1) + (3+1) + ... + ((N-1)+1)
  = (N+1)*N / 2
Number of characters copied if you construct a string of length N by concatenating a chunk of length 16 each time

  = (0+16) + (16+16) + (32+16) + ... + ((N-16)+16)
  = 16*(0+1) + 16*(1+1) + 16*(2+1) + ...
  = 16 * (N/16 +1)*(N/16) / 2
         ^^^^^^^^^^^^^^^
This is where the "technique" comes in: the algorithm is still quadratic, but you've effectively shrunk the constant factor out the front (here by a factor of 16).

Note that you also have the cost of constructing the chunks each time, which becomes the dominant cost if you have too many chunks.

In general, if you have a length kM string which you construct from k chunks of length M, the number of characters copied

  = M * (k+1)*k / 2  +  k * (M+1)*M / 2
... which (rounding and integer constraints aside) is minimised for M = k, i.e. when the chunk size is the square root of the input length. Hence, for input length 256 we take chunk size 16.
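The copy counts above can be sketched directly (illustrative only; the function names are my own, and these count character copies rather than timing anything):

```python
def copies_char_by_char(n):
    # Appending one character at a time: the i-th concatenation
    # copies the existing prefix of length i plus the new character.
    return sum(i + 1 for i in range(n))          # = n*(n+1)//2

def copies_chunked(n, m):
    # Build k = n//m chunks of length m char-by-char, then
    # concatenate each finished chunk onto the growing result.
    k = n // m
    outer = sum(i * m + m for i in range(k))     # = m*(k+1)*k//2
    inner = k * sum(i + 1 for i in range(m))     # = k*(m+1)*m//2
    return outer + inner

print(copies_char_by_char(256))   # 32896
print(copies_chunked(256, 16))    # 4352
```

For N = 256 the chunked version copies 4352 characters against 32896 for the naive loop, and sweeping m over the divisors of 256 shows the minimum at m = 16, matching the square-root argument.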


I see, that makes sense about reducing the constant. Interesting. Thanks for the great explanation.


Yes, but with the caveat that you'll be running interpreted .py using a cross-compiled Python executable, rather than Python compiled to Android bytecode. Kivy itself (the project that spawned p4a) is a full cross-platform library for UI development; for any heavy computational lifting you'd want to farm it out to a C API library and import it the usual way, modulo some p4a recipe incantations.


Fortunately the domain name is rather less ambiguous and primed me for the weak pun. I'm ... glad(?) that the wordplay survived the marketeering sessions and civil service vetting.

BTW: dodecahedron is the solid object, dodecagon is the 12-sided polygon. D&D style dice as currency would be pretty awesome but it might make it difficult to resolve dilemmas using pocket change.


The bit that really stood out for me: """ He was never once bored. He wasn’t sure, he said, that he even understood the concept of boredom. """

It almost seems (also? or as a consequence?) that he lacked a certain kind of curiosity, whether as a pre-existing character trait or as something that developed from having to periodically use all his wiles to obtain the means of survival while surveilling the cabins.


If we're limiting ourselves to deductive reasoning, then yes – the facts as stated do not give enough information to deduce that Greg must be white.

If instead we use abductive inference, we might seek the simplest and most likely explanation given our universe of observations. Sherlock Holmes was a big fan of abduction!

Much of real-world reasoning is abductive to a greater or lesser extent. There is a well-known joke about some motley band of engineers, logicians, mathematicians, statisticians, etc etc catching a train through the Highlands. They see a black sheep, the engineer says "look, all sheep in Scotland are black!", the statistician says "no, you can't say that – just that MOST sheep in Scotland are black", another says "no, we can only say that at least ONE sheep is black", another says "no, it's only black on at least one side", then the one you're stuck next to at the party says "you're all wrong, we can only say that at least one sheep in Scotland is black on at least one side at least some of the time". The last statement is fully deductive; the rest of them are abductive, and more-or-less useful.


This is why I think the ability to ask good questions is a better indication of understanding and intelligence than the ability to generate answers.

As a gauge for how far we are from AI, consider what sort of modelling capacity is required before an AI can ask, when presented with such a sequence: "What country is the swan from?" or, even more impressively: "Do you know where this took place and what country the swan's parents were from?" For the first question it would then abduce a colour. Same for the second, but perhaps it could include probabilities based on the estimated number of each colour and the genetics of swan colour.

This post is a rotation meant to provide a better sense of scale for the problem at hand.


> This is why I think the ability to ask good questions is a better indication of understanding and intelligence than the ability to generate answers.

Certainly! Synthesis rather than reformatting (or, more commonly, regurgitation). Analysis and abduction are more than just "put it in your own words". More useful too.

There is something of a rush on at the moment to generate chat-bots to replace FAQs. Every Slack/Fleep/Blern/Crank channel appears to have five or six memoisation bots. Seems to be largely a solved problem!

When we can start having bots that can be sensibly interrogated for a summary (or even a "hey, you've been away for several hours: here's the key points"), we can finally abandon the chatrooms and let the generative bots flood them with abductive content, and the precis bots can then ping you every couple of weeks when something important comes up.


I would hypothesize abductive reasoning works better for collectives which accept mistakes as one means of learning. For today's AI, it might be better to ask for a bit of context from your observers before making conclusions.

"Am I in the United States around the first part of the 21st century?"

"Yes."

"Oh, how unfortunate - now I have to ask another question or you may think I'm not sentient."


In this particular case it could be argued that Google's unofficial "don't be evil" motto explicitly framed this as a moral issue. Tongue-in-cheek, to be sure, but certainly an appeal to happy squishy emotions.

That said, RMS's stuff is showing up in a few posts today, and framing as a moral issue is certainly a large part of his standard toolkit. Considering any situation from an ethical viewpoint tends to make my eyes roll and skin crawl (ooh, the moral dimensions of public hygiene? yes please!), but appealing to hearts and minds is pretty basic old-school rhetorical technique.

If nothing else, coming from a technical background, it's occasionally useful to come at the problems from a different, squishier, direction.


Lena handled it very gracefully. She gives a very quick synopsis of what her part of the talk would have been around the 41 minute mark.

The whole presentation was plagued by gremlins; the substance begins with Sönke Iversen's part of the talk around 13:45.


Inasmuch as there is blame to be apportioned in this case, it's due to JavaScript / ECMAScript having broad definitions of acceptable variable names, and (arguably[0]) the fact that browser JS implementations will generally accept arbitrary 8-bit data within multiline comments, rather than the strict Unicode code units specified by ECMAScript.

JPEG comments exist for the same reason that EXIF tags exist – it's handy to store metadata alongside the image data, it gets copied around when the file gets copied, the tags can be transferred if the image gets re-encoded. There are enough error recovery mechanisms built into browsers that one could likely make a polyglot by just abusing the data segment, maybe even while crafting a legitimate standards-compliant JPEG.
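As a sketch of how little is needed: a COM segment (marker FF FE, followed by a two-byte length that counts itself plus the payload) can simply be spliced in right after the SOI marker. The function name here is my own invention, not from any library:

```python
def insert_jpeg_comment(jpeg_bytes: bytes, comment: bytes) -> bytes:
    # JPEG streams begin with the SOI marker (FF D8); a COM segment
    # (FF FE) may appear immediately afterwards. Its two-byte,
    # big-endian length field includes the length bytes themselves.
    assert jpeg_bytes[:2] == b"\xff\xd8", "not a JPEG (missing SOI)"
    length = len(comment) + 2
    assert length <= 0xFFFF, "comment too long for a single segment"
    com = b"\xff\xfe" + length.to_bytes(2, "big") + comment
    return jpeg_bytes[:2] + com + jpeg_bytes[2:]
```

A polyglot would put something like `=1;/*` in the comment and close the JS comment in a later segment; the point is only that arbitrary bytes ride along in a segment that image decoders dutifully skip.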

Ultimately, bytes are bytes! Interpreting them with a variety of content types can give a variety of results, so keep it in mind.

[0] Resynchronisation / recovery from bit errors is one of the explicit motivations behind the design of Unicode encodings, so the browsers get a pass from me on this one. It's almost certainly possible to craft a suitable JPEG using legitimate code points anyway.


Alas, there's no guarantee that the order of key/value pairs within an object will be preserved (not even within JS anymore!). But that's okay, we can always reformat the ordered data as an array or write a custom parser ... oh.
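In Python, for instance, you can at least capture the pair order explicitly at parse time instead of trusting the object type (a sketch; the input document is made up):

```python
import json

text = '{"b": 1, "a": 2, "c": 3}'

# A plain parse returns a dict; insertion order happens to survive in
# modern Pythons, but the JSON spec calls objects unordered, so other
# tools are free to reorder (or drop duplicates) as they please.
obj = json.loads(text)

# If the order carries meaning, capture the raw pairs as a list:
pairs = json.loads(text, object_pairs_hook=list)
print(pairs)   # [('b', 1), ('a', 2), ('c', 3)]
```

`object_pairs_hook` also lets you detect duplicate keys, which a dict would silently collapse.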

I don't see any reason to include parsing directives in JSON either, but it's a wild world out there and people do all sorts of strange things. Seems that a few of those folks were working at Yahoo and made the mistake of letting Doug see their code when JSON was in prototype phase, so no JSON comments for anyone. Whoops!

