Hi to all,
is there a reason why the 0.7.0 hbase addon is not deployed on Maven Central? Thanks in advance, Flavio
Hi, No, there is no reason for that. It actually seems like something went wrong while releasing Flink 0.7.0. I'll deploy the missing artifacts. On Thu, Oct 30, 2014 at 9:26 AM, Flavio Pompermaier <[hidden email]> wrote:
Ok thanks! I was trying to run a MapReduce-style Flink job using an HBase dataset but I wasn't able to run it locally. The example in the addons just specifies a plan but it does not say how to test it. On Oct 30, 2014 11:58 PM, "Robert Metzger" <[hidden email]> wrote:
Okay, I've deployed the missing artifacts to Maven Central. It will take some hours until they are synchronized. The example in the "flink-hbase" module is still using the old Java API, but you should be able to use the HBase input format like this:

ExecutionEnvironment ee = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Record> t = ee.createInput(new MyTableInputFormat());

I think the Flink HBase module is not very well-tested, so it's likely that you'll find issues while using it. On Thu, Oct 30, 2014 at 4:10 PM, Flavio Pompermaier <[hidden email]> wrote:
We are trying to connect to HBase 0.98 so we'll probably contribute to the HBase addon :)
Is there a count API for DataSet? What is the fastest way to run a count on a DataSet? Best, Flavio On Fri, Oct 31, 2014 at 6:19 AM, Robert Metzger <[hidden email]> wrote:
Hi Flavio, right now there is no dedicated count operator in the API. You can use the work-around of appending a 1 to each element and summing up (see the WordCount example [1]). This is also what a dedicated count operator would do internally. It would be awesome to get some contributions for the HBase addon :-) Best, Fabian 2014-10-31 9:46 GMT+01:00 Flavio Pompermaier <[hidden email]>:
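[Editor's note: a minimal sketch of that map-to-1-and-sum work-around in the Flink Java API. The package names follow later Flink releases and `fromElements` stands in for any real input source, so treat this as an illustration rather than exact 0.7.0 code.]

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CountExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> data = env.fromElements("a", "b", "c");

        // Map every element to 1L, then sum the ones: the single result
        // record is the count of the dataset.
        DataSet<Long> count = data
            .map(new MapFunction<String, Long>() {
                @Override
                public Long map(String value) { return 1L; }
            })
            .reduce(new ReduceFunction<Long>() {
                @Override
                public Long reduce(Long a, Long b) { return a + b; }
            });

        // Depending on the Flink version, print() may require a separate
        // env.execute() call to trigger the job.
        count.print();
    }
}
```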
I think that a count operator is very useful for people wanting to run a Hello World with Flink; it's always the first test I do (and with Spark that is very easy). Best, Flavio On Fri, Oct 31, 2014 at 9:57 AM, Fabian Hueske <[hidden email]> wrote:
Flavio Pompermaier Phone: +(39) 0461 283 702
Agreed 100%. I created a JIRA for this: https://issues.apache.org/jira/browse/FLINK-1200 Flavio, would you like to give it a go? Otherwise I will assign it to myself. On Fri, Oct 31, 2014 at 10:12 AM, Flavio Pompermaier <[hidden email]> wrote:
For this I don't have time; we're working on upgrading the addon to the HBase 0.98 APIs (and it's already working :)). However, we should discuss how to properly manage the versions of HBase and its Hadoop dependencies. Best, Flavio On Fri, Oct 31, 2014 at 11:32 AM, Kostas Tzoumas <[hidden email]> wrote:
I was wrong. This feature is actually coming up and tracked here: https://issues.apache.org/jira/browse/FLINK-758 On Fri, Oct 31, 2014 at 1:14 PM, Flavio Pompermaier <[hidden email]> wrote:
Is this feature far from being released?
On Fri, Oct 31, 2014 at 1:51 PM, Kostas Tzoumas <[hidden email]> wrote:
The current implementation of the HBase splitting policy cannot deal with regions splitting during job execution. Do you think it is possible to overcome this issue? On Fri, Oct 31, 2014 at 2:22 PM, Flavio Pompermaier <[hidden email]> wrote:
My pull request seems to build correctly now, except for one case (PROFILE="-Dhadoop.profile=2 -Dhadoop.version=2.2.0") where Travis kills the job during the tests, saying:
No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself. The build has been terminated. Can someone help me finalize this PR? I also removed some classes that I think are obsolete now (i.e. GenericTableOutputFormat, HBaseUtil and HBaseDataSink). On Fri, Oct 31, 2014 at 5:04 PM, Flavio Pompermaier <[hidden email]> wrote:
Hi Flavio! Here are a few comments:

- Concerning the count operator: I think we can hack this in a very simple way. It would be good to spend a few thought cycles on keeping the API consistent, though. Flink does not pull data back to the client as eagerly as Spark does, but rather leaves it in the cluster. That has paid off in various situations. Let me draft a proposal in the next days on how to include such operations. I think we can have this very soon.

- Concerning the region splitting: Can you elaborate a little bit on that and give a few more details about the problem? In general, the input splitting in Flink happens when the job is started, and the splits are dynamically assigned to the sources as the job runs. You can customize all that behavior by overriding the two methods "createInputSplits" and "getInputSplitAssigner" in the input format.

- Concerning the pull request: There are sometimes build stalls on Travis that no one has encountered outside Travis so far. I'm not exactly sure what causes them, but if that happens for one build and the others work, I would consider the pull request passed.

Greetings, Stephan On Sat, Nov 1, 2014 at 2:03 PM, Flavio Pompermaier <[hidden email]> wrote:
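[Editor's note: a sketch of what overriding those two methods could look like. Class names, generics, and signatures are illustrative and follow later flink-hbase releases; `MyTableInputFormat` is hypothetical and is left abstract because the addon's TableInputFormat also declares scan/mapping methods that a concrete subclass must implement.]

```java
import java.io.IOException;

import org.apache.flink.addons.hbase.TableInputFormat;
import org.apache.flink.addons.hbase.TableInputSplit;
import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
import org.apache.flink.core.io.InputSplitAssigner;

public abstract class MyTableInputFormat extends TableInputFormat {

    @Override
    public TableInputSplit[] createInputSplits(int minNumSplits) throws IOException {
        // Called once when the job is started, typically producing one split
        // per HBase region as it exists at that moment. Regions that split
        // after this point are not re-enumerated.
        return super.createInputSplits(minNumSplits);
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] splits) {
        // Splits are handed out lazily to source tasks while the job runs;
        // a locatable assigner prefers giving a split to a task running on
        // the same host as the region server that holds the data.
        return new LocatableInputSplitAssigner(splits);
    }
}
```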
In reply to this post by Flavio Pompermaier
Hi Flavio, we frequently get this type of error on Travis. It does not indicate that there is something wrong with your PR. We can just start the build again and it will probably finish. I'm not so familiar with HBase but would suggest continuing the discussion on the dev mailing list. Best, Fabian On Saturday, November 1, 2014, Flavio Pompermaier wrote:
In reply to this post by Stephan Ewen
See the answers inline. On Nov 1, 2014 8:19 PM, "Stephan Ewen" <[hidden email]> wrote:

Ok

> - Concerning the Region Splitting: Can you elaborate a little bit on that and give a few more details about the problem? In general, the input splitting in Flink happens when the job is started and the splits are dynamically assigned to the sources as the job runs. You can customize all that behavior by overwriting the two methods "createInputSplits" and "getInputSplitAssigner" in the input format.

I just wanted to know if and how region splitting is handled. Can you explain to me in detail how Flink and HBase work together? What is not fully clear to me is when computation is done by the region servers and when data starts flowing to a Flink worker (which in my test job is only my PC), and how to better interpret the important logged info to understand whether my job is performing well.

> - Concerning the pull request: There are sometimes build stalls on Travis that no one has encountered outside Travis so far. Not exactly sure what causes them, but if that happens for one build and the others work, I would consider the pull request passed.

It would be great to contribute. I just forgot to mention that one can specify a different HBase version at compile time using -Dhbase.version=0.98.xxxx
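[Editor's note: for reference, such a build invocation might look like the following. The hbase.version value is only a placeholder; use whichever 0.98.x release matches your cluster, and combine it with the Hadoop profile flags shown earlier in the thread if needed.]

```shell
# Build and install the Flink modules against a specific HBase release.
# 0.98.6.1-hadoop2 is a placeholder version, not a recommendation.
mvn clean install -DskipTests \
    -Dhadoop.profile=2 -Dhadoop.version=2.2.0 \
    -Dhbase.version=0.98.6.1-hadoop2
```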