
Hey all - I'm the release manager for Spark 1.1. Happy to answer any questions about Spark or this release.


Good news about the PySpark input format improvements. Does that also cover reading complex Parquet datatypes into SchemaRDDs with their native datatypes? When can we get a Databricks Cloud account (I'm already on the waiting list)?


Yeah, you can load Parquet data directly into SchemaRDDs in 1.1 and get the type conversion, including nested types. The long-term plan for all of our storage integration is to go through the SchemaRDD API, since it's a standard type description and we expect many data sources to integrate there.
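
Roughly what that looks like against the 1.1 API (a minimal sketch; the file path and query are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("ParquetExample"))
    val sqlContext = new SQLContext(sc)

    // Load a Parquet file directly into a SchemaRDD; column types,
    // including nested structs and arrays, map to Spark SQL types.
    val people = sqlContext.parquetFile("people.parquet")
    people.registerTempTable("people")

    // Query it with Spark SQL like any other SchemaRDD.
    val teens = sqlContext.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
    teens.collect().foreach(println)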

Re: databricks cloud - shoot me an e-mail and I'll see if I can help. Right now demand exceeds supply for us on accounts, but I can try!


Doesn't SchemaRDD already support Parquet? Although it'd be great if it supported CSVs too.


There's work in progress to support importing CSV data as SchemaRDDs:

https://issues.apache.org/jira/browse/SPARK-2360

https://github.com/apache/spark/pull/1351
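
In the meantime you can get the same effect by hand: parse the CSV into case classes and use the implicit RDD-to-SchemaRDD conversion from the 1.1 API (a sketch; the file name and columns are made up):

    // Hypothetical two-column CSV: name,age
    case class Person(name: String, age: Int)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Person] -> SchemaRDD

    val people = sc.textFile("people.csv")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))

    people.registerTempTable("people")  // now queryable with Spark SQL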


Any plans to allow GraphX to work with Spark Streaming DStreams?


You can call GraphX algorithms right now from within the Streaming API, for instance by computing a graph on a windowed view of the data.
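
For example, something like this rebuilds a graph and reruns PageRank on each window (a rough sketch; the socket source and "srcId,dstId" edge format are made up):

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))

    // Edges arrive as "srcId,dstId" lines on a socket (hypothetical source).
    val edges = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(src, dst) = line.split(",")
      Edge(src.toLong, dst.toLong, 1)
    }

    // Every 10s, rebuild the graph from the last 60s of edges and rank it.
    edges.window(Seconds(60), Seconds(10)).foreachRDD { rdd =>
      if (rdd.count() > 0) {
        val graph = Graph.fromEdges(rdd, defaultValue = 1.0)
        val ranks = graph.pageRank(0.001).vertices
        ranks.top(5)(Ordering.by(_._2)).foreach(println)
      }
    }

    ssc.start()
    ssc.awaitTermination()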

Online graph algorithms aren't there yet (which is probably what you mean). We just started adding online MLlib algorithms, so that's the main focus for now.



