
Hey all - I'm the release manager for Spark 1.1. Happy to answer any questions about Spark or this release.


Good news about the PySpark input format improvements. Does that also cover reading complex Parquet datatypes into SchemaRDDs with their native datatypes? When can we get a Databricks Cloud account (I'm already on the waiting list)?


Yeah, you can load Parquet data directly into SchemaRDDs in 1.1 and get the type conversion, including nested types. The long-term plan for all of our storage integration is to go through the SchemaRDD API, since it's a standard type description and we expect many data sources to integrate there.
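
Roughly what that looks like against the 1.1 API (a minimal sketch; the file path and query are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("ParquetExample"))
    val sqlContext = new SQLContext(sc)

    // Load a Parquet file directly into a SchemaRDD; column types,
    // including nested structs and arrays, map to Spark SQL types.
    val people = sqlContext.parquetFile("people.parquet")
    people.registerTempTable("people")

    // Query it with Spark SQL like any other SchemaRDD.
    val teens = sqlContext.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
    teens.collect().foreach(println)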

Re: databricks cloud - shoot me an e-mail and I'll see if I can help. Right now demand exceeds supply for us on accounts, but I can try!


Doesn't SchemaRDD already support Parquet? Although it'd be great if it supported CSVs too.


There's work in progress to support importing CSV data as SchemaRDDs:

https://issues.apache.org/jira/browse/SPARK-2360

https://github.com/apache/spark/pull/1351
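
In the meantime you can get the same effect by hand: parse the CSV into case classes and use the implicit RDD-to-SchemaRDD conversion from the 1.1 API (a sketch; the file name and columns are made up):

    // Hypothetical two-column CSV: name,age
    case class Person(name: String, age: Int)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Person] -> SchemaRDD

    val people = sc.textFile("people.csv")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))

    people.registerTempTable("people")  // now queryable with Spark SQL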


Any plans to allow GraphX to work with Spark Streaming DStreams?


You can call GraphX algorithms right now from within the Streaming API, for instance by computing a graph on a windowed view of the data.
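
For example, something like this rebuilds a graph and reruns PageRank on each window (a rough sketch; the socket source and "srcId,dstId" edge format are made up):

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))

    // Edges arrive as "srcId,dstId" lines on a socket (hypothetical source).
    val edges = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(src, dst) = line.split(",")
      Edge(src.toLong, dst.toLong, 1)
    }

    // Every 10s, rebuild the graph from the last 60s of edges and rank it.
    edges.window(Seconds(60), Seconds(10)).foreachRDD { rdd =>
      if (rdd.count() > 0) {
        val graph = Graph.fromEdges(rdd, defaultValue = 1.0)
        val ranks = graph.pageRank(0.001).vertices
        ranks.top(5)(Ordering.by(_._2)).foreach(println)
      }
    }

    ssc.start()
    ssc.awaitTermination()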

Online graph algorithms aren't there yet (which is probably what you mean). We just started adding online MLlib algorithms, so that's the main focus for now.



