One of the classic motivating examples (which is addressed in the 1ML paper) goes like this:
Both trees and hash tables can be used to implement map-like data structures. You could, in ML, have an abstract signature for Map types that gets implemented sometimes by a concrete TreeMap implementation and sometimes by a concrete HashMap implementation. HashMap would be a better choice if the expected number of entries is larger, and the TreeMap would be better if the expected number of entries is smaller, so what we'd like to do is write something like
module Map = if size > threshold then HashMap else TreeMap
so that we choose what concrete implementation we want at runtime—but we can't really do that, because the language we use to talk about modules in ML is distinct from the language we use to talk about values. That is to say, the size > threshold part can't coexist in the same expression with the HashMap and TreeMap part. Some MLs have added the ability to wrap modules in values, so you can write it this in OCaml:
but it's a bit awkward because of the explicit moving-back-and-forth between value-level and module-level, and the interaction between the two languages has some rough edges (which the paper explains more thoroughly, if you're interested.)
The motivation for 1ML is that we'd like to use the same language to talk about both modules and values. That way, we could write the first, simpler definition without having to worry about the fact that we're manipulating distinct 'things'. Of course, there are other tradeoffs involved in the 1ML solution, but it's an interesting, compelling experiment.
how does that end up working for typechecking? does the Map module end up having all the properties of the intersection of these elements?
The Map example seems pretty easily solvable via Haskell through typeclasses or the more standard OOP-y languages through interfaces, but am I missing something? Does SML not have the tools for this right now?
There are very much ways of expressing that computation in SML—or in Haskell or other languages—but not in the language of modules. Haskell typeclasses would have a similar problem. Consider the following pseudocode:
class Map m where {- some implementation -}
instance Map HashMap where {- ... -}
instance Map TreeMap where {- ... -}
mkMap :: Int -> {- ??? -}
mkMap size | size > threshold = newHashMap
| otherwise = newTreeMap
What is the type of mkMap? It won't actually type-check in Haskell. It can't be (Map m) => m, because it needs to have a concrete type. We could do some kind of type-level work, relect the size variable into the type level, and have a type family that decides whether the resulting type is a HashMap or a TreeMap—but that has the same problem as the OCaml example, where we're using a different language (the language of type-level computation) to talk about something that'd be simple with value-level programming.
One place where you could write this easily is in a dynamically typed language:
That is exactly why this kind of research is interesting: we'd like to capture the flexibility of idioms that dynamic languages can sometimes afford us, with the extra safety and security guarantees of a static type system!
That type means something slightly different: (Map m) => Int -> m doesn't mean that it returns some Map type, it means it returns any Map type. We could write a function of that signature, but that would imply that the caller/context would choose what type of Map it returns (and so it would not depend on the integer argument.)
Think about what it means if a type variable has no typeclass constraint: the type t isn't just a specific unknown type, it's literally any type. Similarly, C t => t isn't a specific unknown instance of a typeclass, it's any instance of that typeclass. If we tell Haskell that we're returning an instance C t => t and then try to give some concrete type, Haskell won't allow it, because it wants that type to encompass any possible type that implements C, not just the one we happened to use.
One way of making the above code work in Haskell is to use an existential type, which expresses that we're talking about some (not any) map type, but we'd have to wrap it inside a different constructor:
-- this expresses that a SomeMap constructor contains some kind
-- if Map, but we can't tell which one:
data SomeMap = SomeMap (forall m. Map m => m)
-- We can now express this by hiding the choice of Map inside the
-- SomeMap type:
mkMap :: Int -> SomeMap
mkMap size | size > threshold = SomeMap newHashMap
| otheriwse = SomeMap newTreeMap
which is, again, a solution with some advantages and some disadvantages.
We could in fact write mkMap in Idris with very little change, and give it a proper type:
-- This might not be the best/cleanest way of writing it, as my
-- Idris is a little rusty, but this will work:
mkMap : (size: Int) -> (if size > threshold then HashMap else TreeMap)
mkMap size with (size > threshold)
| True = newHashMap
| False = newTreeMap
So yes, although again, other tradeoffs are made to allow this kind of radical expressiveness.
Both trees and hash tables can be used to implement map-like data structures. You could, in ML, have an abstract signature for Map types that gets implemented sometimes by a concrete TreeMap implementation and sometimes by a concrete HashMap implementation. HashMap would be a better choice if the expected number of entries is larger, and the TreeMap would be better if the expected number of entries is smaller, so what we'd like to do is write something like
so that we choose what concrete implementation we want at runtime—but we can't really do that, because the language we use to talk about modules in ML is distinct from the language we use to talk about values. That is to say, the size > threshold part can't coexist in the same expression with the HashMap and TreeMap part. Some MLs have added the ability to wrap modules in values, so you can write it this in OCaml: but it's a bit awkward because of the explicit moving-back-and-forth between value-level and module-level, and the interaction between the two languages has some rough edges (which the paper explains more thoroughly, if you're interested.)The motivation for 1ML is that we'd like to use the same language to talk about both modules and values. That way, we could write the first, simpler definition without having to worry about the fact that we're manipulating distinct 'things'. Of course, there are other tradeoffs involved in the 1ML solution, but it's an interesting, compelling experiment.