Flink first project

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink first project

Georg Heiler
New to flink I would like to do a small project to get a better feeling for flink. I am thinking of getting some stats from several REST api (i.e. Bitcoin course values from different exchanges) and comparing prices over different exchanges in real time.

Are there already some REST api sources for flink as a sample to get started implementing a custom REST source?

I was thinking about using https://github.com/timmolter/XChange to connect to several exchanges. E.g. to make a single api call by hand would look similar to

val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
  CertHelper.trustAllCerts()
  val poloniex = ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
  val dataService = poloniex.getMarketDataService

  generic(dataService)
  raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
System.out.println(dataService.getTicker(currencyPair))

How would be a proper way to make this available as a flink source? I have seen https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java but new to flink am a bit unsure how to proceed.

Regards,
Georg
Reply | Threaded
Open this post in threaded view
|

Re: Flink first project

Jörn Franke
I would use flume to import these sources to HDFS and then use flink or Hadoop or whatever to process them. While it is possible to do it in flink, you do not want that your processing fails because the web service is not available etc.
Via flume which is suitable for this kind of tasks it is more controlled and reliable.

On 23. Apr 2017, at 18:02, Georg Heiler <[hidden email]> wrote:

New to flink I would like to do a small project to get a better feeling for flink. I am thinking of getting some stats from several REST api (i.e. Bitcoin course values from different exchanges) and comparing prices over different exchanges in real time.

Are there already some REST api sources for flink as a sample to get started implementing a custom REST source?

I was thinking about using https://github.com/timmolter/XChange to connect to several exchanges. E.g. to make a single api call by hand would look similar to

val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
  CertHelper.trustAllCerts()
  val poloniex = ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
  val dataService = poloniex.getMarketDataService

  generic(dataService)
  raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
System.out.println(dataService.getTicker(currencyPair))

How would be a proper way to make this available as a flink source? I have seen https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java but new to flink am a bit unsure how to proceed.

Regards,
Georg
Reply | Threaded
Open this post in threaded view
|

Re: Flink first project

Georg Heiler
So you would suggest flume over a custom akka-source from bahir?

Jörn Franke <[hidden email]> schrieb am So., 23. Apr. 2017 um 18:59 Uhr:
I would use flume to import these sources to HDFS and then use flink or Hadoop or whatever to process them. While it is possible to do it in flink, you do not want that your processing fails because the web service is not available etc.
Via flume which is suitable for this kind of tasks it is more controlled and reliable.

On 23. Apr 2017, at 18:02, Georg Heiler <[hidden email]> wrote:

New to flink I would like to do a small project to get a better feeling for flink. I am thinking of getting some stats from several REST api (i.e. Bitcoin course values from different exchanges) and comparing prices over different exchanges in real time.

Are there already some REST api sources for flink as a sample to get started implementing a custom REST source?

I was thinking about using https://github.com/timmolter/XChange to connect to several exchanges. E.g. to make a single api call by hand would look similar to

val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
  CertHelper.trustAllCerts()
  val poloniex = ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
  val dataService = poloniex.getMarketDataService

  generic(dataService)
  raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
System.out.println(dataService.getTicker(currencyPair))

How would be a proper way to make this available as a flink source? I have seen https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java but new to flink am a bit unsure how to proceed.

Regards,
Georg
Reply | Threaded
Open this post in threaded view
|

Re: Flink first project

Georg Heiler
Wouldn't adding flume -> Kafka -> flink also introduce additional latency?

Georg Heiler <[hidden email]> schrieb am So., 23. Apr. 2017 um 20:23 Uhr:
So you would suggest flume over a custom akka-source from bahir?

Jörn Franke <[hidden email]> schrieb am So., 23. Apr. 2017 um 18:59 Uhr:
I would use flume to import these sources to HDFS and then use flink or Hadoop or whatever to process them. While it is possible to do it in flink, you do not want that your processing fails because the web service is not available etc.
Via flume which is suitable for this kind of tasks it is more controlled and reliable.

On 23. Apr 2017, at 18:02, Georg Heiler <[hidden email]> wrote:

New to flink I would like to do a small project to get a better feeling for flink. I am thinking of getting some stats from several REST api (i.e. Bitcoin course values from different exchanges) and comparing prices over different exchanges in real time.

Are there already some REST api sources for flink as a sample to get started implementing a custom REST source?

I was thinking about using https://github.com/timmolter/XChange to connect to several exchanges. E.g. to make a single api call by hand would look similar to

val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
  CertHelper.trustAllCerts()
  val poloniex = ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
  val dataService = poloniex.getMarketDataService

  generic(dataService)
  raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
System.out.println(dataService.getTicker(currencyPair))

How would be a proper way to make this available as a flink source? I have seen https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java but new to flink am a bit unsure how to proceed.

Regards,
Georg
Reply | Threaded
Open this post in threaded view
|

Re: Flink first project

Tzu-Li (Gordon) Tai
Hi Georg,

Simply from the aspect of a Flink source that listens to a REST endpoint for input data, there should be quite a variety of options to do that. The Akka streaming source from Bahir should also serve this purpose well. It would also be quite straightforward to implement one yourself.

On the other hand, what Jörn was suggesting was that you would want to first persist the incoming data from the REST endpoint to a repayable storage / queue, and your Flink job reads from that replayable storage / queue.
The reason for this is that Flink’s checkpointing mechanism for exactly-once guarantee relies on a replayable source (see [1]), and since a REST endpoint is not replayable, you’ll not be able to benefit from the fault-tolerance guarantees provided by Flink. The most popular source used with Flink for exactly-once, currently, is Kafka [2]. The only extra latency compared to just fetching REST endpoint, in this setup, is writing to the intermediate Kafka topic.

Of course, if you’re just testing around and just getting to know Flink, this setup isn’t necessary.
You can just start off with a source such as the Flink Akka connector in Bahir, and start writing your first Flink job right away :)

Cheers,
Gordon

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html

On 24 April 2017 at 4:02:14 PM, Georg Heiler ([hidden email]) wrote:

Wouldn't adding flume -> Kafka -> flink also introduce additional latency?

Georg Heiler <[hidden email]> schrieb am So., 23. Apr. 2017 um 20:23 Uhr:
So you would suggest flume over a custom akka-source from bahir?

Jörn Franke <[hidden email]> schrieb am So., 23. Apr. 2017 um 18:59 Uhr:
I would use flume to import these sources to HDFS and then use flink or Hadoop or whatever to process them. While it is possible to do it in flink, you do not want that your processing fails because the web service is not available etc.
Via flume which is suitable for this kind of tasks it is more controlled and reliable.

On 23. Apr 2017, at 18:02, Georg Heiler <[hidden email]> wrote:

New to flink I would like to do a small project to get a better feeling for flink. I am thinking of getting some stats from several REST api (i.e. Bitcoin course values from different exchanges) and comparing prices over different exchanges in real time.

Are there already some REST api sources for flink as a sample to get started implementing a custom REST source?

I was thinking about using https://github.com/timmolter/XChange to connect to several exchanges. E.g. to make a single api call by hand would look similar to

val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
  CertHelper.trustAllCerts()
  val poloniex = ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
  val dataService = poloniex.getMarketDataService

  generic(dataService)
  raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
System.out.println(dataService.getTicker(currencyPair))

How would be a proper way to make this available as a flink source? I have seen https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java but new to flink am a bit unsure how to proceed.

Regards,
Georg
Reply | Threaded
Open this post in threaded view
|

Re: Flink first project

Georg Heiler
Thanks for the overview. I think I will use akka streams and pipe the result to kafka, then move on with flink.
Tzu-Li (Gordon) Tai <[hidden email]> schrieb am Do. 27. Apr. 2017 um 18:37:
Hi Georg,

Simply from the aspect of a Flink source that listens to a REST endpoint for input data, there should be quite a variety of options to do that. The Akka streaming source from Bahir should also serve this purpose well. It would also be quite straightforward to implement one yourself.

On the other hand, what Jörn was suggesting was that you would want to first persist the incoming data from the REST endpoint to a repayable storage / queue, and your Flink job reads from that replayable storage / queue.
The reason for this is that Flink’s checkpointing mechanism for exactly-once guarantee relies on a replayable source (see [1]), and since a REST endpoint is not replayable, you’ll not be able to benefit from the fault-tolerance guarantees provided by Flink. The most popular source used with Flink for exactly-once, currently, is Kafka [2]. The only extra latency compared to just fetching REST endpoint, in this setup, is writing to the intermediate Kafka topic.

Of course, if you’re just testing around and just getting to know Flink, this setup isn’t necessary.
You can just start off with a source such as the Flink Akka connector in Bahir, and start writing your first Flink job right away :)

Cheers,
Gordon

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html

On 24 April 2017 at 4:02:14 PM, Georg Heiler ([hidden email]) wrote:

Wouldn't adding flume -> Kafka -> flink also introduce additional latency?

Georg Heiler <[hidden email]> schrieb am So., 23. Apr. 2017 um 20:23 Uhr:
So you would suggest flume over a custom akka-source from bahir?

Jörn Franke <[hidden email]> schrieb am So., 23. Apr. 2017 um 18:59 Uhr:
I would use flume to import these sources to HDFS and then use flink or Hadoop or whatever to process them. While it is possible to do it in flink, you do not want that your processing fails because the web service is not available etc.
Via flume which is suitable for this kind of tasks it is more controlled and reliable.

On 23. Apr 2017, at 18:02, Georg Heiler <[hidden email]> wrote:

New to flink I would like to do a small project to get a better feeling for flink. I am thinking of getting some stats from several REST api (i.e. Bitcoin course values from different exchanges) and comparing prices over different exchanges in real time.

Are there already some REST api sources for flink as a sample to get started implementing a custom REST source?

I was thinking about using https://github.com/timmolter/XChange to connect to several exchanges. E.g. to make a single api call by hand would look similar to

val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
  CertHelper.trustAllCerts()
  val poloniex = ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
  val dataService = poloniex.getMarketDataService

  generic(dataService)
  raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
System.out.println(dataService.getTicker(currencyPair))

How would be a proper way to make this available as a flink source? I have seen https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java but new to flink am a bit unsure how to proceed.

Regards,
Georg