Re: Error while reading binary file

Posted by Fabian Hueske-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Error-while-reading-binary-file-tp4759p4789.html

SerializedInputFormat extends BinaryInputFormat, which expects a special block-wise encoding and certain metadata fields.
It is not suited to reading arbitrary binary files, such as a file with 64 short values.
I suggest implementing a custom input format based on FileInputFormat.
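
As a starting point, the record-reading loop that such a format needs can be sketched with plain JDK classes. This is only a sketch under assumptions: it does not use any Flink API, the names ShortFileReader and readShorts are made up for illustration, and it assumes the shorts were written big-endian (the byte order of DataOutputStream.writeShort). In a real format, similar logic would go into the format's own record-reading methods.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink.
public class ShortFileReader {

    /**
     * Reads big-endian 16-bit values until the stream is exhausted.
     * A trailing odd byte, if present, is silently ignored.
     */
    public static List<Short> readShorts(InputStream in) throws IOException {
        List<Short> values = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            try {
                values.add(data.readShort());
            } catch (EOFException endOfFile) {
                // End of file is the normal termination condition here,
                // not an error as in the stack trace from the thread.
                return values;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Write 64 short values to a temporary file, then read them back.
        Path file = Files.createTempFile("sample", ".bin");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (int i = 0; i < 64; i++) {
                out.writeShort(i);
            }
        }
        try (InputStream in = Files.newInputStream(file)) {
            System.out.println(readShorts(in).size()); // prints 64
        }
        Files.delete(file);
    }
}
```

The point of the sketch is that a raw file of shorts carries no block headers or metadata, so the reader has to decide for itself where the data ends.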

Best, Fabian

2016-02-08 22:05 GMT+01:00 Saliya Ekanayake <[hidden email]>:
Thank you, Fabian. It solved the compilation error, but at runtime I get an end-of-file exception. I've put up sample code with data on GitHub at https://github.com/esaliya/flinkit. The data file is a binary file containing 64 short values.


02/08/2016 16:01:19 CHAIN DataSource (at main(WordCount.java:25) (org.apache.flink.api.common.io.SerializedInputFormat)) -> FlatMap (count())(4/8) switched to FAILED 
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.flink.core.memory.InputViewDataInputStreamWrapper.readShort(InputViewDataInputStreamWrapper.java:92)
at org.apache.flink.types.ShortValue.read(ShortValue.java:88)
at org.apache.flink.api.common.io.SerializedInputFormat.deserialize(SerializedInputFormat.java:37)
at org.apache.flink.api.common.io.SerializedInputFormat.deserialize(SerializedInputFormat.java:31)
at org.apache.flink.api.common.io.BinaryInputFormat.nextRecord(BinaryInputFormat.java:274)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:169)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:745)

On Mon, Feb 8, 2016 at 3:50 PM, Fabian Hueske <[hidden email]> wrote:
Hi,

please try to replace
DataSet<ShortValue> ds = env.createInput(sif);
by
DataSet<ShortValue> ds = env.createInput(sif, ValueTypeInfo.SHORT_VALUE_TYPE_INFO);
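
For background on why the explicit TypeInformation is needed: the likely cause is Java type erasure, since an object created as new SerializedInputFormat<ShortValue>() does not carry its type argument at runtime. The sketch below illustrates this with plain JDK reflection. Format is a hypothetical stand-in class, not a Flink class, and the sketch makes no claim about how Flink's type extraction is actually implemented.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class ErasureDemo {

    /** Hypothetical stand-in for a generic input format; not a Flink class. */
    public static class Format<T> { }

    /** Returns the generic superclass of the object's runtime class. */
    public static Type genericSuperclassOf(Object o) {
        return o.getClass().getGenericSuperclass();
    }

    public static void main(String[] args) {
        // Direct instantiation: the type argument Short is erased.
        Format<Short> plain = new Format<>();
        // Anonymous subclass: the type argument is recorded in the class file.
        Format<Short> anonymous = new Format<Short>() { };

        System.out.println(genericSuperclassOf(plain) instanceof ParameterizedType);     // prints false
        System.out.println(genericSuperclassOf(anonymous) instanceof ParameterizedType); // prints true
    }
}
```

With direct instantiation there is nothing for reflection to find, so the produced type has to be supplied explicitly, as in the createInput call above.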

Best, Fabian

2016-02-08 19:33 GMT+01:00 Saliya Ekanayake <[hidden email]>:
Till,

I am still having trouble getting this to work. Here's my code (https://github.com/esaliya/flinkit):

String binaryFile = "src/main/resources/sample.bin";
SerializedInputFormat<ShortValue> sif = new SerializedInputFormat<>();
sif.setFilePath(binaryFile);
DataSet<ShortValue> ds = env.createInput(sif);
System.out.println(ds.count());

I still get the same error, shown below:

Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: The type returned by the input format could not be automatically determined. Please specify the TypeInformation of the produced type explicitly by using the 'createInput(InputFormat, TypeInformation)' method instead.
at org.apache.flink.api.java.ExecutionEnvironment.createInput(ExecutionEnvironment.java:511)
at org.saliya.flinkit.WordCount.main(WordCount.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)


On Mon, Feb 8, 2016 at 5:42 AM, Till Rohrmann <[hidden email]> wrote:

Hi Saliya,

to set the file path for the SerializedInputFormat, you first have to create it and then explicitly call setFilePath:

final SerializedInputFormat<Record> inputFormat = new SerializedInputFormat<Record>();
inputFormat.setFilePath(PATH_TO_FILE);

env.createInput(inputFormat, myTypeInfo);

Cheers,
Till


On Mon, Feb 8, 2016 at 7:00 AM, Saliya Ekanayake <[hidden email]> wrote:
Hi,

I was trying to read a simple binary file using SerializedInputFormat, as suggested in a different thread, but encountered the following error. I tried to do what the exception suggests, but even though createInput() returns a DataSet object, I couldn't find how to specify which file to read.

Any help is appreciated. The file I am trying to read is a simple binary file containing Java short values. Is there an example of reading binary files available?

Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: The type returned by the input format could not be automatically determined. Please specify the TypeInformation of the produced type explicitly by using the 'createInput(InputFormat, TypeInformation)' method instead.

Thank you,
Saliya


--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org



