XGBoost4J: Portable Distributed XGBoost in Flink

tqchen
Hi Flink Community:
    I am sending this email to let you know that we just released XGBoost4J, which also runs on Flink. In short, XGBoost is a machine learning package that is used by more than half of the winning solutions in machine learning challenges and is already widely used in industry. The distributed version scales to billions of examples (10x faster than spark.mllib in our experiment) with fewer resources (see http://arxiv.org/abs/1603.02754).

    See our blog post for more details: http://dmlc.ml/2016/03/14/xgboost4j-portable-distributed-xgboost-in-spark-flink-and-dataflow.html. We would love to have you try it out and help us make it better.

Cheers

Re: XGBoost4J: Portable Distributed XGBoost in Flink

Till Rohrmann
Great to hear Tianqi :-) I will try it out.

Cheers,
Till

On Tue, Mar 15, 2016 at 12:41 AM, Tianqi Chen <[hidden email]> wrote the announcement above.

Re: XGBoost4J: Portable Distributed XGBoost in Flink

Christophe Salperwyck
In reply to this post by tqchen
Hi,

The paper compares the performance between your XGBoost and the Spark MLlib version. It would be nice to see how it scales when using Spark or Flink as an engine and also compare it to your native distributed version (with rabit, right?).

If you have some charts, they are welcome :-)

BTW, where did you submit this paper (if not confidential of course)?

Thanks!
Christophe



Re: XGBoost4J: Portable Distributed XGBoost in Flink

tqchen
Everything uses Rabit as the communication engine (including the Flink and Spark versions). Note that this is not a re-implementation, but an effort to make one version available on all platforms, so the performance should be the same. I did confirm this on the Spark version, but have not yet done so on Flink.