Building Flink on VirtualBox VM failing

Juha Mynttinen-2

Hey,

I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.

The command I'm using:

apache-maven-3.2.5/bin/mvn clean verify

The output:

[INFO] Flink : Tests ...................................... FAILURE [14:38 min]
[INFO] Flink : Streaming Scala ............................ SKIPPED
[INFO] Flink : Connectors : HCatalog ...................... SKIPPED
[INFO] Flink : Connectors : Base .......................... SKIPPED
[INFO] Flink : Connectors : Files ......................... SKIPPED
[INFO] Flink : Table : .................................... SKIPPED
[INFO] Flink : Table : Common ............................. SKIPPED
[INFO] Flink : Table : API Java ........................... SKIPPED
[INFO] Flink : Table : API Java bridge .................... SKIPPED
[INFO] Flink : Table : API Scala .......................... SKIPPED
[INFO] Flink : Table : API Scala bridge ................... SKIPPED
[INFO] Flink : Table : SQL Parser ......................... SKIPPED
[INFO] Flink : Libraries : ................................ SKIPPED
[INFO] Flink : Libraries : CEP ............................ SKIPPED
[INFO] Flink : Table : Planner ............................ SKIPPED
[INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
[INFO] Flink : Table : Runtime Blink ...................... SKIPPED
[INFO] Flink : Table : Planner Blink ...................... SKIPPED
[INFO] Flink : Metrics : JMX .............................. SKIPPED
[INFO] Flink : Formats : .................................. SKIPPED
[INFO] Flink : Formats : Json ............................. SKIPPED
[INFO] Flink : Connectors : Kafka base .................... SKIPPED
[INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
[INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
[INFO] Flink : Connectors : HBase base .................... SKIPPED
[INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
[INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
[INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
[INFO] Flink : Formats : Orc .............................. SKIPPED
[INFO] Flink : Formats : Orc nohive ....................... SKIPPED
[INFO] Flink : Formats : Avro ............................. SKIPPED
[INFO] Flink : Formats : Parquet .......................... SKIPPED
[INFO] Flink : Formats : Csv .............................. SKIPPED
[INFO] Flink : Connectors : Hive .......................... SKIPPED
[INFO] Flink : Connectors : JDBC .......................... SKIPPED
[INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
[INFO] Flink : Connectors : Twitter ....................... SKIPPED
[INFO] Flink : Connectors : Nifi .......................... SKIPPED
[INFO] Flink : Connectors : Cassandra ..................... SKIPPED
[INFO] Flink : Connectors : Filesystem .................... SKIPPED
[INFO] Flink : Connectors : Kafka ......................... SKIPPED
[INFO] Flink : Connectors : Google PubSub ................. SKIPPED
[INFO] Flink : Connectors : Kinesis ....................... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
[INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
[INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
[INFO] Flink : Formats : Sequence file .................... SKIPPED
[INFO] Flink : Formats : Compress ......................... SKIPPED
[INFO] Flink : Formats : SQL Orc .......................... SKIPPED
[INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
[INFO] Flink : Formats : SQL Avro ......................... SKIPPED
[INFO] Flink : Examples : Streaming ....................... SKIPPED
[INFO] Flink : Examples : Table ........................... SKIPPED
[INFO] Flink : Examples : Build Helper : .................. SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED
[INFO] Flink : Container .................................. SKIPPED
[INFO] Flink : Queryable state : Runtime .................. SKIPPED
[INFO] Flink : Mesos ...................................... SKIPPED
[INFO] Flink : Kubernetes ................................. SKIPPED
[INFO] Flink : Yarn ....................................... SKIPPED
[INFO] Flink : Libraries : Gelly .......................... SKIPPED
[INFO] Flink : Libraries : Gelly scala .................... SKIPPED
[INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
[INFO] Flink : External resources : ....................... SKIPPED
[INFO] Flink : External resources : GPU ................... SKIPPED
[INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
[INFO] Flink : Metrics : Graphite ......................... SKIPPED
[INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
[INFO] Flink : Metrics : Prometheus ....................... SKIPPED
[INFO] Flink : Metrics : StatsD ........................... SKIPPED
[INFO] Flink : Metrics : Datadog .......................... SKIPPED
[INFO] Flink : Metrics : Slf4j ............................ SKIPPED
[INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
[INFO] Flink : Table : Uber ............................... SKIPPED
[INFO] Flink : Table : Uber Blink ......................... SKIPPED
[INFO] Flink : Python ..................................... SKIPPED
[INFO] Flink : Table : SQL Client ......................... SKIPPED
[INFO] Flink : Libraries : State processor API ............ SKIPPED
[INFO] Flink : ML : ....................................... SKIPPED
[INFO] Flink : ML : API ................................... SKIPPED
[INFO] Flink : ML : Lib ................................... SKIPPED
[INFO] Flink : ML : Uber .................................. SKIPPED
[INFO] Flink : Scala shell ................................ SKIPPED
[INFO] Flink : Dist ....................................... SKIPPED
[INFO] Flink : Yarn Tests ................................. SKIPPED
[INFO] Flink : E2E Tests : ................................ SKIPPED
[INFO] Flink : E2E Tests : CLI ............................ SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading lib-package SKIPPED
[INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
[INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
[INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
[INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
[INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
[INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
[INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
[INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
[INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
[INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
[INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
[INFO] Flink : Quickstart : ............................... SKIPPED
[INFO] Flink : Quickstart : Java .......................... SKIPPED
[INFO] Flink : Quickstart : Scala ......................... SKIPPED
[INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
[INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
[INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
[INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
[INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
[INFO] Flink : E2E Tests : State evolution ................ SKIPPED
[INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
[INFO] Flink : E2E Tests : Common ......................... SKIPPED
[INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
[INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
[INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
[INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
[INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
[INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
[INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
[INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
[INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
[INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
[INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
[INFO] Flink : E2E Tests : Python ......................... SKIPPED
[INFO] Flink : E2E Tests : HBase .......................... SKIPPED
[INFO] Flink : State backends : Heap spillable ............ SKIPPED
[INFO] Flink : Contrib : .................................. SKIPPED
[INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
[INFO] Flink : FileSystems : Tests ........................ SKIPPED
[INFO] Flink : Docs ....................................... SKIPPED
[INFO] Flink : Walkthrough : .............................. SKIPPED
[INFO] Flink : Walkthrough : Common ....................... SKIPPED
[INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
[INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:49 min
[INFO] Finished at: 2020-10-19T18:24:46+03:00
[INFO] Final Memory: 179M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests

The jvmdump-files look like this:

# Created at 2020-10-19T18:14:22.869
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets mentioning "The forked VM terminated without properly saying goodbye", but I don't see how they could explain the issue I'm witnessing.

I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages. 
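
A side note on the "Process Exit Code: 137" line in the output above: by the usual shell convention, an exit status above 128 means the process was terminated by signal number (status - 128). The sketch below decodes such a status. The helper name `describe_exit_code` is made up for illustration, and the signal names come from the platform running the script (Linux is assumed here).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the common shell convention
    that 128 + N means "terminated by signal N"."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited normally with status {code}"


# 137 - 128 = 9, which is SIGKILL on Linux.
print(describe_exit_code(137))
```

SIGKILL cannot be caught by the JVM, which would be consistent with the fork disappearing "without properly saying goodbye". It does not by itself say who sent the signal.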

The crash can be reproduced.

Regards,
Juha

Re: Building Flink on VirtualBox VM failing

r_khachatryan

Hey,

One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log in your VM for the related messages.
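
A minimal sketch of how that check could be scripted, assuming the kernel reports the kill with a line containing "Killed process <pid> (<name>)". The exact wording differs between kernel versions, so the pattern may need adjusting. The function name `find_oom_kills` and the sample lines are hypothetical and only illustrate the shape of the data.

```python
import re
from typing import Iterable, List, Tuple

# Assumed message shape; adjust if your kernel words it differently.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every line that reports a killed process."""
    hits = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Hypothetical sample lines, not taken from a real log.
sample = [
    "kernel: java invoked oom-killer: order=0, oom_score_adj=0",
    "kernel: Out of memory: Killed process 12345 (java)",
]
print(find_oom_kills(sample))
```

To run it against the real log, pass an open file instead of the sample list, for example `with open("/var/log/kern.log") as f: print(find_oom_kills(f))`. Reading that file may require elevated privileges.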

Regards,
Roman


[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests
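
A note on the `Process Exit Code: 137` lines in the output above: by the usual POSIX shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9. A minimal Python sketch of that arithmetic follows; the helper name `describe_exit_code` is made up for illustration and is not part of Maven or Surefire.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status (128 + N means 'killed by signal N')."""
    if code > 128:
        signum = code - 128
        try:
            # Signal names are platform dependent; on Linux, 9 is SIGKILL.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited on its own with status {code}"


print(describe_exit_code(137))
```

SIGKILL cannot be caught or handled by the JVM, which fits a fork that disappears "without properly saying goodbye". The exit code alone does not say who sent the signal.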

The jvmdump-files look like this:

# Created at 2020-10-19T18:14:22.869
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets mentioning "The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm witnessing....

I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages. 

The crash can be reproduced.

Regards,
Juha


Re: Building Flink on VirtualBox VM failing

Juha Mynttinen-2
Hey,

Good hint about /var/log/kern.log. This time I can see this:

Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=270024,uid=1000
Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The next question is why this happens. I'll try to dig deeper.
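
To put the kernel message in perspective, the sketch below parses the `Out of memory: Killed process` line and compares the resident size with the `-Xmx2048m` heap cap from the Surefire command line in the first message. It assumes, without confirmation from the logs, that pid 270024 was one of the forked test JVMs started with that same flag; `parse_oom_line` is a hypothetical helper.

```python
import re

# Kernel log line copied from the message above.
LINE = ("Out of memory: Killed process 270024 (java) total-vm:9841596kB, "
        "anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 "
        "pgtables:11780kB oom_score_adj:0")

XMX_MIB = 2048  # assumption: the killed JVM ran with -Xmx2048m like the fork in the first message


def parse_oom_line(line: str) -> dict:
    """Extract the pid and every kB-valued field, converting sizes to MiB."""
    pid = int(re.search(r"Killed process (\d+)", line).group(1))
    sizes = {key: int(kb) / 1024 for key, kb in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": pid, **sizes}


info = parse_oom_line(LINE)
excess = info["anon-rss"] - XMX_MIB
print(f"pid {info['pid']}: anon-rss {info['anon-rss']:.0f} MiB, "
      f"{excess:.0f} MiB above the {XMX_MIB} MiB heap cap")
```

If the assumption holds, the process held well over 2 GiB beyond its maximum heap, which would point at memory outside the Java heap (for example native or direct memory, thread stacks, metaspace) rather than at heap growth. The log alone does not show which.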

About the CPU load: I have five CPUs. In theory it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (forked by Maven) are doing, each of them has a large number of threads competing for CPU. Here's an example from 'top -H':

top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                            
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre                                                                                                                    
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java                                                                                                                                
 254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java                                                                                                                                
 255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java                                                                                                                                
 255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java                                                                                                                                
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre                                                                                                                    
 253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre                                                                                                                    
 254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java                                                                                                                                
 254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java                                                                                                                                
 254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre                                                                                                                    
 255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre                                                                                                                    
 255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre                                                                                                                    
 255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre                                                                                                                    
 254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre                                                                                                                    
 253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java                                                                                                                                
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre                                                                                                                    
 255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre                                                                                                                    
 254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre                                                                                                                    
 254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre                                                                                                                    
 255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre                                                                                                                    
 253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre                                                                                                                    
 255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre                                                                                                                    
 254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre                                                                                                                    
 254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre                                                                                                                    
 254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre                                                                                                                    
 254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto                                                                                                                    
                                                            
The JIT compiler threads consume quite a lot of CPU, leaving less for the actual test code. With htop I can also see the garbage collection threads eating CPU. This doesn't seem right. I think it would make sense to run the tests with less parallelism to make better use of the CPUs. Having far more runnable threads than CPUs slows things down rather than speeding them up.

However, AFAIK high CPU load by itself shouldn't trigger the OOM killer?
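
CPU load does not consume memory, but running several forks in parallel multiplies the memory demand. A rough budget check using only numbers quoted in this thread is sketched below. It treats `-Xmx` as a per-fork ceiling (heap is only committed as it is used), ignores everything else running in the VM, and `max_forks_within` is a hypothetical helper.

```python
FORKS = 5          # five forked test JVMs, as described in the message
XMX_MIB = 2048     # -Xmx2048m from the Surefire fork command line
RAM_MIB = 7961.6   # "MiB Mem :   7961,6 total" from top
SWAP_MIB = 2048.0  # "MiB Swap:   2048,0 total" from top
KILLED_RSS_MIB = 4820380 / 1024  # anon-rss of the killed process in kern.log


def max_forks_within(budget_mib: float, per_fork_mib: float) -> int:
    """How many forks of the given size fit into the memory budget."""
    return int(budget_mib // per_fork_mib)


heap_ceiling = FORKS * XMX_MIB
print(f"Heap ceiling for {FORKS} forks: {heap_ceiling} MiB "
      f"(RAM {RAM_MIB} MiB, RAM + swap {RAM_MIB + SWAP_MIB} MiB)")
print(f"Forks that fit in RAM at the heap cap: {max_forks_within(RAM_MIB, XMX_MIB)}")
print(f"Forks that fit in RAM at the killed process's size: "
      f"{max_forks_within(RAM_MIB, KILLED_RSS_MIB)}")
```

Under these assumptions, five forks can ask for more heap than the VM has RAM and swap combined, before counting any off-heap memory. Surefire's `forkCount` parameter controls the number of forks; how the Flink build sets it is not shown in this thread, so the project's pom.xml is the place to check.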

Regards,
Juha




On Mon, Oct 19, 2020 at 20:48, Khachatryan Roman (<[hidden email]>) wrote:
Hey,

One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log in your VM for the related messages.

Regards,
Roman


[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests

The jvmdump-files look like this:

# Created at 2020-10-19T18:14:22.869
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets with " The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm witnessing....

I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages. 

The crash can be reproduced.

Regards,
Juha

Re: Building Flink on VirtualBox VM failing

r_khachatryan
Thanks for sharing this,
I think the OOM killer's activity points to high memory pressure (it simply kills the process with the highest memory-consumption score).
High CPU usage can only be a consequence of that, i.e. of constant GC.

Currently, tests do not run in parallel, but high memory usage can be caused by the nature of the test (e.g. running Flink with high parallelism).
So I think the best way to deal with this is to use a VM with more memory.

Regards,
Roman
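
To put rough numbers on the memory pressure, here is a minimal sketch using only figures quoted elsewhere in this thread: the -Xmx2048m from the surefire command line, the five Java processes seen in 'top -H', and the memory totals from the 'top' header. It assumes all five forks are alive at the same time, which the thread does not confirm, and it counts Java heap only.

```python
# Rough memory budget for the forked test JVMs, using figures quoted in this thread.
# Assumption: five forks run concurrently, each started with -Xmx2048m.

FORKS = 5                 # five Java processes, as reported from 'top -H'
MAX_HEAP_MIB = 2048       # -Xmx2048m from the surefire command line
TOTAL_RAM_MIB = 7961.6    # "MiB Mem : 7961,6 total"
SWAP_MIB = 2048.0         # "MiB Swap: 2048,0 total"


def worst_case_heap_mib(forks, max_heap_mib):
    """Upper bound on Java heap alone; metaspace, thread stacks and
    native/off-heap allocations come on top of this."""
    return forks * max_heap_mib


heap = worst_case_heap_mib(FORKS, MAX_HEAP_MIB)
print(f"worst-case heap: {heap} MiB")
print(f"RAM: {TOTAL_RAM_MIB} MiB, RAM + swap: {TOTAL_RAM_MIB + SWAP_MIB} MiB")
print("heap alone can exceed RAM:", heap > TOTAL_RAM_MIB)
```

Under these assumptions the heap limits alone add up to more than the VM's physical memory, so either fewer concurrent forks or more VM memory would relieve the pressure.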


On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <[hidden email]> wrote:
Hey,

Good hint about /var/log/kern.log. This time I can see this:

Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=270024,uid=1000
Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The next question is why does this happen.... I'll try to dig deeper.
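
As a starting point for digging deeper, a small sketch that pulls the numbers out of the "Out of memory: Killed process" line above. The regular expression is written against this one log line only and is an assumption about the format; other kernel versions may word the message differently.

```python
import re

# Line copied from the kern.log excerpt above.
LINE = ("Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: "
        "Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, "
        "file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0")

# Pattern fitted to the line above (assumption: same wording on your kernel).
PATTERN = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name and memory figures (in MiB), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "total_vm_mib": int(m["total_vm"]) / 1024,
        "anon_rss_mib": int(m["anon_rss"]) / 1024,
    }


info = parse_oom_kill(LINE)
print(info)
print("resident memory above a 2048 MiB heap limit:", info["anon_rss_mib"] > 2048)
```

If the killed process was one of the test JVMs forked with -Xmx2048m (the log line does not say which Java process it was), its resident memory of roughly 4.6 GiB would mean that a large share of it was outside the Java heap.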

About the CPU load: I have five CPUs. Theoretically it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (that mvn forks) are doing, each of them has a large number of threads competing for CPU. Here's an example from 'top -H':

  top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                            
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre                                                                                                                    
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java                                                                                                                                
 254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java                                                                                                                                
 255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java                                                                                                                                
 255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java                                                                                                                                
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre                                                                                                                    
 253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre                                                                                                                    
 254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java                                                                                                                                
 254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java                                                                                                                                
 254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre                                                                                                                    
 255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre                                                                                                                    
 255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre                                                                                                                    
 255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre                                                                                                                    
 254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre                                                                                                                    
 253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java                                                                                                                                
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre                                                                                                                    
 255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre                                                                                                                    
 254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre                                                                                                                    
 254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre                                                                                                                    
 255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre                                                                                                                    
 253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre                                                                                                                    
 255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre                                                                                                                    
 254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre                                                                                                                    
 254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre                                                                                                                    
 254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre                                                                                                                    
 254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto                                                                                                                    
                                                            
It can be seen that the JIT-related threads consume quite a lot of CPU, leaving less CPU for the actual test code. Using htop I can also see the garbage-collection threads eating CPU. This doesn't seem right. I think it'd make sense to run the tests with less parallelism to make better use of the CPUs. Having far more threads wanting CPU than there are CPUs slows things down rather than speeding them up.
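
To quantify the oversubscription, a minimal sketch that reads the load averages from the 'top' header above and divides them by the five CPUs. It assumes the comma is a decimal separator (as in the output above), and on Linux the load average also counts tasks in uninterruptible sleep, so it is only a rough proxy for CPU demand.

```python
import re

# Header line copied from the 'top -H' output above.
HEADER = "top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81"
CPUS = 5  # number of CPUs in the VM, as stated in the message


def load_averages(header):
    """Extract the 1, 5 and 15 minute load averages (comma as decimal separator)."""
    tail = header.split("load average:", 1)[1]
    return [float(v.replace(",", ".")) for v in re.findall(r"\d+,\d+", tail)]


for window, load in zip(("1 min", "5 min", "15 min"), load_averages(HEADER)):
    print(f"{window}: load {load:.2f}, {load / CPUS:.2f} per CPU")
```

A value well above 1 per CPU means more tasks want to run than there are CPUs to run them on.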

However, AFAIK high CPU load shouldn't trigger the OOM killer?

Regards,
Juha




On Mon, Oct 19, 2020 at 8:48 PM Khachatryan Roman <[hidden email]> wrote:
Hey,

One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log in your VM for the related messages.

Regards,
Roman
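
The "Process Exit Code: 137" in the Maven output is consistent with this. The fork is started through /bin/sh -c, and shells conventionally report 128 plus the signal number when a child is terminated by a signal. A minimal sketch of decoding such a code (the helper name is hypothetical, and signal names come from the platform the script runs on):

```python
import signal


def describe_exit_code(code):
    """Return the signal name for a shell-style exit code (128 + signal number),
    or None if the code does not correspond to a known signal."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return None
    return None


print(describe_exit_code(137))
```

On Linux, 137 - 128 = 9 maps to SIGKILL, which is the signal the kernel's OOM killer sends. SIGKILL can also come from other sources, so the kernel log is still needed to confirm the cause.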


On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <[hidden email]> wrote:

Hey,

I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.

The command I'm using:

apache-maven-3.2.5/bin/mvn clean verify

The output:

[INFO] Flink : Tests ...................................... FAILURE [14:38 min]
[INFO] Flink : Streaming Scala ............................ SKIPPED
[INFO] Flink : Connectors : HCatalog ...................... SKIPPED
[INFO] Flink : Connectors : Base .......................... SKIPPED
[INFO] Flink : Connectors : Files ......................... SKIPPED
[INFO] Flink : Table : .................................... SKIPPED
[INFO] Flink : Table : Common ............................. SKIPPED
[INFO] Flink : Table : API Java ........................... SKIPPED
[INFO] Flink : Table : API Java bridge .................... SKIPPED
[INFO] Flink : Table : API Scala .......................... SKIPPED
[INFO] Flink : Table : API Scala bridge ................... SKIPPED
[INFO] Flink : Table : SQL Parser ......................... SKIPPED
[INFO] Flink : Libraries : ................................ SKIPPED
[INFO] Flink : Libraries : CEP ............................ SKIPPED
[INFO] Flink : Table : Planner ............................ SKIPPED
[INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
[INFO] Flink : Table : Runtime Blink ...................... SKIPPED
[INFO] Flink : Table : Planner Blink ...................... SKIPPED
[INFO] Flink : Metrics : JMX .............................. SKIPPED
[INFO] Flink : Formats : .................................. SKIPPED
[INFO] Flink : Formats : Json ............................. SKIPPED
[INFO] Flink : Connectors : Kafka base .................... SKIPPED
[INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
[INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
[INFO] Flink : Connectors : HBase base .................... SKIPPED
[INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
[INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
[INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
[INFO] Flink : Formats : Orc .............................. SKIPPED
[INFO] Flink : Formats : Orc nohive ....................... SKIPPED
[INFO] Flink : Formats : Avro ............................. SKIPPED
[INFO] Flink : Formats : Parquet .......................... SKIPPED
[INFO] Flink : Formats : Csv .............................. SKIPPED
[INFO] Flink : Connectors : Hive .......................... SKIPPED
[INFO] Flink : Connectors : JDBC .......................... SKIPPED
[INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
[INFO] Flink : Connectors : Twitter ....................... SKIPPED
[INFO] Flink : Connectors : Nifi .......................... SKIPPED
[INFO] Flink : Connectors : Cassandra ..................... SKIPPED
[INFO] Flink : Connectors : Filesystem .................... SKIPPED
[INFO] Flink : Connectors : Kafka ......................... SKIPPED
[INFO] Flink : Connectors : Google PubSub ................. SKIPPED
[INFO] Flink : Connectors : Kinesis ....................... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
[INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
[INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
[INFO] Flink : Formats : Sequence file .................... SKIPPED
[INFO] Flink : Formats : Compress ......................... SKIPPED
[INFO] Flink : Formats : SQL Orc .......................... SKIPPED
[INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
[INFO] Flink : Formats : SQL Avro ......................... SKIPPED
[INFO] Flink : Examples : Streaming ....................... SKIPPED
[INFO] Flink : Examples : Table ........................... SKIPPED
[INFO] Flink : Examples : Build Helper : .................. SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED
[INFO] Flink : Container .................................. SKIPPED
[INFO] Flink : Queryable state : Runtime .................. SKIPPED
[INFO] Flink : Mesos ...................................... SKIPPED
[INFO] Flink : Kubernetes ................................. SKIPPED
[INFO] Flink : Yarn ....................................... SKIPPED
[INFO] Flink : Libraries : Gelly .......................... SKIPPED
[INFO] Flink : Libraries : Gelly scala .................... SKIPPED
[INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
[INFO] Flink : External resources : ....................... SKIPPED
[INFO] Flink : External resources : GPU ................... SKIPPED
[INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
[INFO] Flink : Metrics : Graphite ......................... SKIPPED
[INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
[INFO] Flink : Metrics : Prometheus ....................... SKIPPED
[INFO] Flink : Metrics : StatsD ........................... SKIPPED
[INFO] Flink : Metrics : Datadog .......................... SKIPPED
[INFO] Flink : Metrics : Slf4j ............................ SKIPPED
[INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
[INFO] Flink : Table : Uber ............................... SKIPPED
[INFO] Flink : Table : Uber Blink ......................... SKIPPED
[INFO] Flink : Python ..................................... SKIPPED
[INFO] Flink : Table : SQL Client ......................... SKIPPED
[INFO] Flink : Libraries : State processor API ............ SKIPPED
[INFO] Flink : ML : ....................................... SKIPPED
[INFO] Flink : ML : API ................................... SKIPPED
[INFO] Flink : ML : Lib ................................... SKIPPED
[INFO] Flink : ML : Uber .................................. SKIPPED
[INFO] Flink : Scala shell ................................ SKIPPED
[INFO] Flink : Dist ....................................... SKIPPED
[INFO] Flink : Yarn Tests ................................. SKIPPED
[INFO] Flink : E2E Tests : ................................ SKIPPED
[INFO] Flink : E2E Tests : CLI ............................ SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading lib-package SKIPPED
[INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
[INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
[INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
[INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
[INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
[INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
[INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
[INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
[INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
[INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
[INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
[INFO] Flink : Quickstart : ............................... SKIPPED
[INFO] Flink : Quickstart : Java .......................... SKIPPED
[INFO] Flink : Quickstart : Scala ......................... SKIPPED
[INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
[INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
[INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
[INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
[INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
[INFO] Flink : E2E Tests : State evolution ................ SKIPPED
[INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
[INFO] Flink : E2E Tests : Common ......................... SKIPPED
[INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
[INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
[INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
[INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
[INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
[INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
[INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
[INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
[INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
[INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
[INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
[INFO] Flink : E2E Tests : Python ......................... SKIPPED
[INFO] Flink : E2E Tests : HBase .......................... SKIPPED
[INFO] Flink : State backends : Heap spillable ............ SKIPPED
[INFO] Flink : Contrib : .................................. SKIPPED
[INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
[INFO] Flink : FileSystems : Tests ........................ SKIPPED
[INFO] Flink : Docs ....................................... SKIPPED
[INFO] Flink : Walkthrough : .............................. SKIPPED
[INFO] Flink : Walkthrough : Common ....................... SKIPPED
[INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
[INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:49 min
[INFO] Finished at: 2020-10-19T18:24:46+03:00
[INFO] Final Memory: 179M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests
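
A side note on "Process Exit Code: 137" in the output above: the fork is started through /bin/sh, and a shell typically reports 128 + N when a process is terminated by signal N. 137 would then correspond to signal 9 (SIGKILL), which suggests the fork was killed from outside rather than exiting on its own. A minimal Python sketch of that decoding, assuming a Linux-like signal numbering (the helper name describe_exit_code is made up for illustration):

```python
import signal

def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit code; values above 128 usually mean 'killed by signal (code - 128)'."""
    if code > 128:
        signum = code - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            # Signal number not known on this platform.
            return f"terminated by unknown signal {signum}"
    return f"exited with status {code}"

print(describe_exit_code(137))
```

The exit code alone does not say who sent the signal; that has to come from other logs.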

The jvmdump-files look like this:

# Created at 2020-10-19T18:14:22.869
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets mentioning "The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm witnessing.

I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages.

The crash can be reproduced.

Regards,
Juha

Re: Building Flink on VirtualBox VM failing

Juha Mynttinen-2
Hey,

> Currently, tests do not run in parallel  

I don't think this is true, at least not 100%. In 'top' it's clearly visible that there are multiple JVMs. If the tests aren't running in parallel, what are these doing? The main pom.xml has configuration for the 'maven-surefire-plugin' plug-in.

I'm not a Maven expert, but here is how it looks to me. https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html says "The other possibility for parallel test execution is setting the parameter forkCount to a value higher than 1". I think that's what is happening in Flink:

<forkCount>${flink.forkCount}</forkCount>

And

<flink.forkCount>1C</flink.forkCount>

This means there will be 1 * count_of_cpus forks.
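
To make the "1C" arithmetic concrete, here is a rough Python sketch of how a forkCount value with a "C" suffix could be resolved to a number of forks. The function name resolve_fork_count is hypothetical, and the truncation of fractional results is an assumption about Surefire's behaviour, not something stated in this thread.

```python
import os

def resolve_fork_count(value: str, cpus: int) -> int:
    """Resolve a forkCount string such as '2' or '1C' to a number of forks."""
    value = value.strip()
    if value.upper().endswith("C"):
        multiplier = float(value[:-1])
        if multiplier <= 0:
            return 0
        # Assumption: fractional results are truncated, but never below 1 fork.
        return max(int(multiplier * cpus), 1)
    return int(value)

print(resolve_fork_count("1C", cpus=5))                    # 5 forks on the 5-CPU VM from this thread
print(resolve_fork_count("1C", cpus=os.cpu_count() or 1))  # value for the machine running this sketch
```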

And this one:

<argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} -XX:+UseG1GC</argLine>

In my case, I have 5 CPUs, so 5 forks. Since each fork gets a max heap of 2048 MB, the worst-case memory requirement is roughly CPU count * 2048 MB. I have 8 GB of memory, which is less than 5 * 2048 MB = 10240 MB.

This could be better. I think a computer with RAM < count_of_cpus * 2048 MB is a completely valid setup; take e.g. an AMD Ryzen 3900X with 12 cores and put 16 GB of RAM in it. At the very least, the memory & CPU requirements should be documented?

If the tests really need 2 GB of heap, then maybe the forkCount should be based on the available RAM rather than the available cores, e.g. floor(RAM / 2GB)? I don't know if that's doable in Maven.
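
The mismatch is easy to check with the numbers from this thread. The Python sketch below only illustrates the arithmetic and counts just the maximum heap per fork; the function name ram_based_fork_count is made up. Since flink.forkCount appears as a regular Maven property in the pom snippet above, overriding it with -Dflink.forkCount=<n> on the command line would presumably work, though that is an untested assumption.

```python
import math

HEAP_PER_FORK_MB = 2048   # from -Xmx2048m in the argLine
CPUS = 5                  # CPUs in the VM
RAM_MB = 7961.6           # "MiB Mem : 7961,6 total" as reported by top in this thread

def ram_based_fork_count(ram_mb: float, heap_per_fork_mb: int, cpus: int) -> int:
    """floor(RAM / heap per fork), capped at the CPU count and never below 1."""
    return max(1, min(cpus, math.floor(ram_mb / heap_per_fork_mb)))

worst_case_heap_mb = CPUS * HEAP_PER_FORK_MB
print(worst_case_heap_mb)                                      # 10240
print(worst_case_heap_mb > RAM_MB)                             # True
print(ram_based_fork_count(RAM_MB, HEAP_PER_FORK_MB, CPUS))    # 3
print(ram_based_fork_count(16 * 1024, HEAP_PER_FORK_MB, 12))   # 8 (12 cores, 16 GB)
```

Capping at the CPU count keeps the result from exceeding what 1C would give on machines with plenty of RAM.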

I think an easy and non-intrusive improvement would be to change '-Xms256m' to '-Xms2048m' (ms to match mx) so that the JVM would allocate the full 2048 MB right away when it starts. If there's not enough memory, the tests would fail immediately (the JVM couldn't start). The tests would probably fail anyway (as in my case), so it's better to fail fast.

Regards,
Juha








On Tue, Oct 20, 2020 at 11:16, Khachatryan Roman (<[hidden email]>) wrote:
Thanks for sharing this,
I think the activity of the OOM killer means high memory pressure (it just kills the process with the highest memory consumption score).
High CPU usage is probably just a consequence of that, caused by constant GC.

Currently, tests do not run in parallel, but high memory usage can be caused by the nature of the test (e.g. running Flink with high parallelism).
So I think the best way to deal with this is to use a VM with more memory.

Regards,
Roman


On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <[hidden email]> wrote:
Hey,

Good hint about /var/log/kern.log. This time I can see this:

Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=270024,uid=1000
Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
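
One detail worth pulling out of these lines: the killed java process had anon-rss:4820380kB, i.e. roughly 4.6 GiB resident, which is more than twice the -Xmx2048m limit. -Xmx only caps the Java heap, so the remainder would have to be non-heap memory (metaspace, thread stacks, direct or native buffers); the log does not say which. A small Python sketch for extracting the numbers, where the regular expression and the helper name parse_oom_kill are illustrative and only matched against the line pasted above:

```python
import re

KERN_LINE = (
    "Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) "
    "total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 "
    "pgtables:11780kB oom_score_adj:0"
)

PATTERN = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)

def parse_oom_kill(line: str):
    """Return (pid, process name, anon-rss in MiB) from an 'Out of memory: Killed process' line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return int(m["pid"]), m["name"], int(m["anon_rss_kb"]) / 1024

pid, name, rss_mib = parse_oom_kill(KERN_LINE)
print(pid, name, round(rss_mib, 1))  # 270024 java 4707.4
print(rss_mib > 2048)                # True: resident memory exceeds the configured max heap
```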

The next question is why this happens. I'll try to dig deeper.

About the CPU load: I have five CPUs. Theoretically it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (that Maven forks) are doing, each of those processes has a large number of threads wanting to use the CPU. Here's an example from 'top -H':

top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                            
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre                                                                                                                    
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java                                                                                                                                
 254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java                                                                                                                                
 255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java                                                                                                                                
 255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java                                                                                                                                
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre                                                                                                                    
 253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre                                                                                                                    
 254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java                                                                                                                                
 254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java                                                                                                                                
 254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre                                                                                                                    
 255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre                                                                                                                    
 255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre                                                                                                                    
 255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre                                                                                                                    
 254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre                                                                                                                    
 253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java                                                                                                                                
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre                                                                                                                    
 255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre                                                                                                                    
 254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre                                                                                                                    
 254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre                                                                                                                    
 255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre                                                                                                                    
 253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre                                                                                                                    
 255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre                                                                                                                    
 254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre                                                                                                                    
 254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre                                                                                                                    
 254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre                                                                                                                    
 254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto                                                                                                                    
                                                            
It can be seen that the JIT-related threads consume quite a lot of CPU, leaving less CPU available to the actual test code. Using htop I can also see the garbage-collection threads eating CPU. This doesn't seem right. I think it'd make sense to run the tests with less parallelism to utilize the CPUs better. Having far more threads wanting CPU than there are CPUs slows things down rather than speeding them up.
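
To put the load figures in proportion: on Linux the load average counts tasks that are running or waiting to run (plus those in uninterruptible sleep), so dividing it by the CPU count gives a rough "tasks per CPU" number. A Python sketch using the header line pasted above; it assumes the comma decimal separator shown in this output, and the helper name load_per_cpu is made up:

```python
import re

TOP_HEADER = "top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81"
CPUS = 5

def load_per_cpu(header: str, cpus: int):
    """Parse the 1/5/15-minute load averages (comma as decimal separator) and divide each by the CPU count."""
    tail = header.split("load average:")[1]
    loads = [float(x.replace(",", ".")) for x in re.findall(r"\d+,\d+", tail)]
    return [round(load / cpus, 2) for load in loads]

print(load_per_cpu(TOP_HEADER, CPUS))  # [3.4, 2.57, 1.76]
```

A one-minute value of about 3.4 per CPU means that, on average, several tasks were competing for each core.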

However, AFAIK high CPU load alone shouldn't trigger the OOM killer?

Regards,
Juha




On Mon, Oct 19, 2020 at 20:48, Khachatryan Roman (<[hidden email]>) wrote:
Hey,

One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log in your VM for the related messages.

Regards,
Roman


On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <[hidden email]> wrote:

Hey,

I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.

The command I'm using:

apache-maven-3.2.5/bin/mvn clean verify

The output:

[INFO] Flink : Tests ...................................... FAILURE [14:38 min]
[INFO] Flink : Streaming Scala ............................ SKIPPED
[INFO] Flink : Connectors : HCatalog ...................... SKIPPED
[INFO] Flink : Connectors : Base .......................... SKIPPED
[INFO] Flink : Connectors : Files ......................... SKIPPED
[INFO] Flink : Table : .................................... SKIPPED
[INFO] Flink : Table : Common ............................. SKIPPED
[INFO] Flink : Table : API Java ........................... SKIPPED
[INFO] Flink : Table : API Java bridge .................... SKIPPED
[INFO] Flink : Table : API Scala .......................... SKIPPED
[INFO] Flink : Table : API Scala bridge ................... SKIPPED
[INFO] Flink : Table : SQL Parser ......................... SKIPPED
[INFO] Flink : Libraries : ................................ SKIPPED
[INFO] Flink : Libraries : CEP ............................ SKIPPED
[INFO] Flink : Table : Planner ............................ SKIPPED
[INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
[INFO] Flink : Table : Runtime Blink ...................... SKIPPED
[INFO] Flink : Table : Planner Blink ...................... SKIPPED
[INFO] Flink : Metrics : JMX .............................. SKIPPED
[INFO] Flink : Formats : .................................. SKIPPED
[INFO] Flink : Formats : Json ............................. SKIPPED
[INFO] Flink : Connectors : Kafka base .................... SKIPPED
[INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
[INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
[INFO] Flink : Connectors : HBase base .................... SKIPPED
[INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
[INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
[INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
[INFO] Flink : Formats : Orc .............................. SKIPPED
[INFO] Flink : Formats : Orc nohive ....................... SKIPPED
[INFO] Flink : Formats : Avro ............................. SKIPPED
[INFO] Flink : Formats : Parquet .......................... SKIPPED
[INFO] Flink : Formats : Csv .............................. SKIPPED
[INFO] Flink : Connectors : Hive .......................... SKIPPED
[INFO] Flink : Connectors : JDBC .......................... SKIPPED
[INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
[INFO] Flink : Connectors : Twitter ....................... SKIPPED
[INFO] Flink : Connectors : Nifi .......................... SKIPPED
[INFO] Flink : Connectors : Cassandra ..................... SKIPPED
[INFO] Flink : Connectors : Filesystem .................... SKIPPED
[INFO] Flink : Connectors : Kafka ......................... SKIPPED
[INFO] Flink : Connectors : Google PubSub ................. SKIPPED
[INFO] Flink : Connectors : Kinesis ....................... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
[INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
[INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
[INFO] Flink : Formats : Sequence file .................... SKIPPED
[INFO] Flink : Formats : Compress ......................... SKIPPED
[INFO] Flink : Formats : SQL Orc .......................... SKIPPED
[INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
[INFO] Flink : Formats : SQL Avro ......................... SKIPPED
[INFO] Flink : Examples : Streaming ....................... SKIPPED
[INFO] Flink : Examples : Table ........................... SKIPPED
[INFO] Flink : Examples : Build Helper : .................. SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED
[INFO] Flink : Container .................................. SKIPPED
[INFO] Flink : Queryable state : Runtime .................. SKIPPED
[INFO] Flink : Mesos ...................................... SKIPPED
[INFO] Flink : Kubernetes ................................. SKIPPED
[INFO] Flink : Yarn ....................................... SKIPPED
[INFO] Flink : Libraries : Gelly .......................... SKIPPED
[INFO] Flink : Libraries : Gelly scala .................... SKIPPED
[INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
[INFO] Flink : External resources : ....................... SKIPPED
[INFO] Flink : External resources : GPU ................... SKIPPED
[INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
[INFO] Flink : Metrics : Graphite ......................... SKIPPED
[INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
[INFO] Flink : Metrics : Prometheus ....................... SKIPPED
[INFO] Flink : Metrics : StatsD ........................... SKIPPED
[INFO] Flink : Metrics : Datadog .......................... SKIPPED
[INFO] Flink : Metrics : Slf4j ............................ SKIPPED
[INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
[INFO] Flink : Table : Uber ............................... SKIPPED
[INFO] Flink : Table : Uber Blink ......................... SKIPPED
[INFO] Flink : Python ..................................... SKIPPED
[INFO] Flink : Table : SQL Client ......................... SKIPPED
[INFO] Flink : Libraries : State processor API ............ SKIPPED
[INFO] Flink : ML : ....................................... SKIPPED
[INFO] Flink : ML : API ................................... SKIPPED
[INFO] Flink : ML : Lib ................................... SKIPPED
[INFO] Flink : ML : Uber .................................. SKIPPED
[INFO] Flink : Scala shell ................................ SKIPPED
[INFO] Flink : Dist ....................................... SKIPPED
[INFO] Flink : Yarn Tests ................................. SKIPPED
[INFO] Flink : E2E Tests : ................................ SKIPPED
[INFO] Flink : E2E Tests : CLI ............................ SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading lib-package SKIPPED
[INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
[INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
[INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
[INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
[INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
[INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
[INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
[INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
[INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
[INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
[INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
[INFO] Flink : Quickstart : ............................... SKIPPED
[INFO] Flink : Quickstart : Java .......................... SKIPPED
[INFO] Flink : Quickstart : Scala ......................... SKIPPED
[INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
[INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
[INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
[INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
[INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
[INFO] Flink : E2E Tests : State evolution ................ SKIPPED
[INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
[INFO] Flink : E2E Tests : Common ......................... SKIPPED
[INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
[INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
[INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
[INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
[INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
[INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
[INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
[INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
[INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
[INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
[INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
[INFO] Flink : E2E Tests : Python ......................... SKIPPED
[INFO] Flink : E2E Tests : HBase .......................... SKIPPED
[INFO] Flink : State backends : Heap spillable ............ SKIPPED
[INFO] Flink : Contrib : .................................. SKIPPED
[INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
[INFO] Flink : FileSystems : Tests ........................ SKIPPED
[INFO] Flink : Docs ....................................... SKIPPED
[INFO] Flink : Walkthrough : .............................. SKIPPED
[INFO] Flink : Walkthrough : Common ....................... SKIPPED
[INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
[INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:49 min
[INFO] Finished at: 2020-10-19T18:24:46+03:00
[INFO] Final Memory: 179M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests

The jvmdump-files look like this:

# Created at 2020-10-19T18:14:22.869
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets mentioning "The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm seeing.

I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages. 

The crash can be reproduced.

Regards,
Juha

Re: Building Flink on VirtualBox VM failing

r_khachatryan
I think you are right and I like the idea of failing the build fast.
However, when I tried this approach on my local machine, it didn't help: the build didn't crash (probably because of memory overcommit).
Did you try this approach in your VM?

Regards,
Roman


On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <[hidden email]> wrote:
Hey,

> Currently, tests do not run in parallel  

I don't think this is true, at least not 100%. In 'top' it's clearly visible that there are multiple JVMs. If they aren't running tests in parallel, what are they doing? The main pom.xml has configuration for the 'maven-surefire-plugin' plug-in.

I'm not a Maven expert, but it looks to me like this: in https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html it says "The other possibility for parallel test execution is setting the parameter forkCount to a value higher than 1". I think that's happening in Flink:

<forkCount>${flink.forkCount}</forkCount>

And

<flink.forkCount>1C</flink.forkCount>

This means there will be 1 * count_of_cpus forks.

And this one:

<argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} -XX:+UseG1GC</argLine>

In my case, I have 5 CPUs, so 5 forks. Since each fork gets a max heap of 2048 MB, the worst-case memory requirement is roughly CPU count * 2048 MB. I have 8 GB of memory, which is less than the worst case of 5 * 2048 MB = 10240 MB.
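
The arithmetic above can be sketched as follows. This is only a back-of-the-envelope check under stated assumptions: `worst_case_heap_mib` is a hypothetical helper, "8GB" is taken as 8192 MiB, and only the -Xmx heap is counted, so the real footprint of each JVM (metaspace, thread stacks, native memory) will be higher.

```python
def worst_case_heap_mib(fork_count, xmx_mib):
    """Upper bound on the combined Java heap if every fork fills its -Xmx."""
    return fork_count * xmx_mib


if __name__ == "__main__":
    forks = 5        # forkCount 1C on a 5-CPU VM
    xmx_mib = 2048   # from -Xmx2048m in the argLine
    ram_mib = 8192   # assumption: "8GB" taken as 8 * 1024 MiB

    demand = worst_case_heap_mib(forks, xmx_mib)
    print("worst-case heap: %d MiB, RAM: %d MiB" % (demand, ram_mib))
    if demand > ram_mib:
        print("shortfall: %d MiB" % (demand - ram_mib))
```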

This could be better. I think a computer with RAM < count_of_cpus * 2048 MB is a completely valid setup; take e.g. an AMD Ryzen 3900X with 12 cores and 16 GB of RAM. Shouldn't the memory and CPU requirements at least be documented?

If the tests really need 2 GB of heap, then maybe the forkCount should be based on the available RAM rather than the available cores, e.g. floor(RAM / 2GB)? I don't know if that's doable in Maven.
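
As a rough illustration of the floor(RAM / 2GB) idea, here is a minimal sketch. `memory_aware_fork_count` is a hypothetical helper, not something Maven or Surefire provides, and capping the result at the CPU count is an added assumption so that the value never exceeds what 1C would give.

```python
def memory_aware_fork_count(ram_mib, cpus, xmx_mib=2048):
    """Forks that fit into RAM at xmx_mib each, capped at the CPU count.

    Always returns at least 1 so that the build can still run.
    """
    by_memory = ram_mib // xmx_mib
    return max(1, min(cpus, by_memory))


if __name__ == "__main__":
    # The VM from this thread: 5 CPUs, "8GB" taken as 8192 MiB.
    print(memory_aware_fork_count(8192, 5))
    # The 12-core, 16 GB example.
    print(memory_aware_fork_count(16384, 12))
```

The computed number could then be passed to the build by overriding the flink.forkCount property on the command line (-Dflink.forkCount=...), since Maven lets user properties override POM properties. The result is still optimistic, because a JVM needs more memory than its heap alone.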

I think an easy and non-intrusive improvement would be to change '-Xms256m' to '-Xms2048m' (so that -Xms matches -Xmx), so the JVM would allocate the full 2048 MB as soon as it starts. If there isn't enough memory, the tests would fail immediately (the JVM couldn't start). The tests would probably fail anyway (as in my case), so it's better to fail fast.

Regards,
Juha








On Tue, Oct 20, 2020 at 11:16 AM Khachatryan Roman <[hidden email]> wrote:
Thanks for sharing this,
I think the OOM killer's activity indicates high memory pressure (it simply kills the process with the highest memory-consumption score).
High CPU usage can only be a consequence of that, namely constant GC.

Currently, tests do not run in parallel, but high memory usage can be caused by the nature of the test (e.g. running Flink with high parallelism).
So I think the best way to deal with this is to use a VM with more memory.

Regards,
Roman


On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <[hidden email]> wrote:
Hey,

Good hint about /var/log/kern.log. This time I can see this:

Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=270024,uid=1000
Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
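
For repeated runs it can help to pull the interesting numbers out of such kernel messages automatically. The following is a minimal sketch that parses the "Out of memory: Killed process" line quoted above; the regular expression is written against that one sample, so treating it as valid for other kernel versions is an assumption, and `parse_oom_kill` is a hypothetical helper name.

```python
import re

# Pattern modelled on the single sample line from this thread.
KILL_LINE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name and memory figures (in kB), or None."""
    match = KILL_LINE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_kb": int(match.group("total_vm")),
        "anon_rss_kb": int(match.group("anon_rss")),
    }


if __name__ == "__main__":
    sample = (
        "Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: "
        "Killed process 270024 (java) total-vm:9841596kB, "
        "anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 "
        "pgtables:11780kB oom_score_adj:0"
    )
    info = parse_oom_kill(sample)
    xmx_kb = 2048 * 1024  # -Xmx2048m expressed in kB
    print(info["pid"], info["name"], info["anon_rss_kb"], "kB resident")
    print("resident memory exceeds -Xmx:", info["anon_rss_kb"] > xmx_kb)
```

For the sample line, the killed java process had about 4.6 GiB of anonymous resident memory, well above its 2048 MB heap limit. This suggests that the heap setting alone underestimates what a fork can consume.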

The next question is why this happens. I'll try to dig deeper.

About the CPU load: I have five CPUs. Theoretically it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (that mvn forks) are doing, each of them has a large number of threads wanting to use CPU. Here's an example from 'top -H':

top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java
 254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java
 255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java
 255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre
 253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre
 254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java
 254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java
 254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre
 255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre
 255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre
 255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre
 254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre
 253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre
 255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre
 254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre
 254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre
 255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre
 253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre
 255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre
 254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre
 254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre
 254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre
 254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto

It can be seen that the JIT-related threads consume quite a lot of CPU, leaving less CPU available to the actual test code. Using htop I can also see the garbage-collection threads eating CPU. This doesn't seem right. I think it would make sense to run the tests with less parallelism to better utilize the CPUs. Having far more threads wanting CPU than there are CPUs slows things down rather than speeding them up.
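
To put the load averages from the 'top' header into perspective, they can be divided by the number of CPUs. The sketch below parses the header line quoted above (note the comma as decimal separator, which comes from the locale) and normalizes it for the 5 CPUs mentioned in this thread. `parse_load_averages` is a hypothetical helper, and load per CPU is only a rough indicator of oversubscription.

```python
import re


def parse_load_averages(top_header):
    """Extract the 1, 5 and 15 minute load averages from a 'top' header.

    Accepts both '.' and ',' as the decimal separator.
    """
    tail = top_header.split("load average:", 1)[1]
    return [float(value.replace(",", "."))
            for value in re.findall(r"\d+[.,]\d+", tail)]


if __name__ == "__main__":
    header = ("top - 09:42:03 up 29 min,  1 user,  "
              "load average: 17,00, 12,86, 8,81")
    cpus = 5  # number of CPUs in the VM, as stated in the thread
    for load in parse_load_averages(header):
        print("load %.2f -> %.2f per CPU" % (load, load / cpus))
```

A one-minute load of 17 on 5 CPUs means that, on average, more than three tasks were running or waiting per CPU. That matches the impression of far more threads wanting CPU than there are cores.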

However, AFAIK high CPU load shouldn't trigger the OOM killer?

Regards,
Juha




On Mon, Oct 19, 2020 at 8:48 PM Khachatryan Roman <[hidden email]> wrote:
Hey,

One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log in your VM for the related messages.

Regards,
Roman


On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <[hidden email]> wrote:

        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets mentioning "The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm witnessing....

I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages. 

The crash can be reproduced.

Regards,
Juha


Re: Building Flink on VirtualBox VM failing

Juha Mynttinen-2
Hi,

You're right. I thought about this too after writing the last comment: on Linux, for example, the kernel overcommits memory allocations by default, so this approach doesn't work (it doesn't make the JVM crash right when it starts).

I dug a little deeper. It seems that for CI environments there are specific compilation scripts, such as https://github.com/apache/flink/blob/master/tools/ci/compile.sh#L45, that explicitly set flink.forkCount and flink.forkCountTestPackage to lower than (?) default values. But for anybody compiling Flink locally, mvn uses the default values, which might not work, as in my case.

I think a good goal would be that a developer can just git clone Flink and build it following simple instructions. Preferably there would be zero setup needed, just a simple command to run. The current situation is that building Flink is "simple": just run a specific mvn command. This simplicity comes at the price that things can break in unexpected ways:

1) There are things building Flink expects but doesn't check (https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/building.html#build-flink):
 * The correct Maven version
 * A suitable Java version
2) There's this issue with the count of CPU cores vs. available memory.

Case 1) is documented, case 2) is not.

Fix options

a)

Document case 2) and explain how to set flink.forkCountTestPackage (if needed). Something like "Flink tests run in parallel JVMs, each taking up to 2GB of heap. By default there are as many JVMs as there are CPU cores. If your machine doesn't have at least 2GB of RAM per core, the tests can fail. You can set the number of JVMs to a lower value using the Maven property flink.forkCountTestPackage."

b)

Create a Linux-specific Maven wrapper script for local execution too. The wrapper script could download the correct Maven version, check the Java version, calculate the max number of forks, etc. A quick way to calculate the max fork count (MemTotal is reported in kB, and 2097152 kB is 2 GiB):

expr `cat /proc/meminfo | grep MemTotal | awk '{print $2}'` / 2097152
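
The same calculation can be sketched in Python, which makes it easier to also cap the result at the CPU count and never go below one fork. This is only an illustration under assumptions: the function names are made up, the sample MemTotal value corresponds to the 7961,6 MiB shown in the 'top' output quoted further down, and whether both flink.forkCount and flink.forkCountTestPackage need to be overridden for a local build is a guess based on what the CI script sets.

```python
import re

HEAP_PER_FORK_KB = 2 * 1024 * 1024  # 2 GiB expressed in kB, matching -Xmx2048m


def max_fork_count(meminfo_text, cpu_count):
    """Return floor(MemTotal / 2 GiB), capped at cpu_count and at least 1."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    by_memory = int(match.group(1)) // HEAP_PER_FORK_KB
    return max(1, min(by_memory, cpu_count))


def maven_fork_args(forks):
    """Build -D overrides for the two properties named in this thread (hypothetical helper)."""
    return [f"-Dflink.forkCount={forks}", f"-Dflink.forkCountTestPackage={forks}"]


# Sample /proc/meminfo excerpt; on a real machine read the file instead.
sample = "MemTotal:        8152678 kB\nMemFree:         1653043 kB\n"
forks = max_fork_count(sample, cpu_count=5)
print(" ".join(["mvn", "clean", "verify", *maven_fork_args(forks)]))
```

On a real Linux machine the inputs would be open("/proc/meminfo").read() and os.cpu_count(). Note that 2 GiB per fork only covers the maximum heap, so the result is an upper bound rather than a safe value.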

Regards,
Juha





On Tue, Oct 20, 2020 at 21:23, Khachatryan Roman (<[hidden email]>) wrote:
I think you are right and I like the idea of failing the build fast.
However, when I tried this approach on my local machine, it didn't help: the build didn't crash (probably because of overcommit).
Did you try this approach in your VM?

Regards,
Roman


On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <[hidden email]> wrote:
Hey,

> Currently, tests do not run in parallel  

I don't think this is true, at least not 100%. In 'top' it's clearly visible that there are multiple JVMs. If tests are not running in parallel, what are these doing? In the main pom.xml there's configuration for the plug-in 'maven-surefire-plugin'.

I'm not a Maven expert, but it looks to me like this: in https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html it says "The other possibility for parallel test execution is setting the parameter forkCount to a value higher than 1". I think that's happening in Flink:

<forkCount>${flink.forkCount}</forkCount>

And

<flink.forkCount>1C</flink.forkCount>

This means there's gonna be 1 * count_of_cpus forks.

And this one:

<argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} -XX:+UseG1GC</argLine>

In my case, I have 5 CPUs, so 5 forks. I think what happens now is that, since each fork gets a max heap of 2048 MB, there's effectively a memory requirement of CPU count * 2048 MB. In my case, I have 8GB of memory, which is less than the max of 5 * 2048 MB = 10240 MB.
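
The arithmetic above can be written out as a small Python sketch. It is illustrative only: the helper names are invented, and the rule for resolving a value such as "1C" (multiplier times core count, rounded down, at least one fork) is an assumption based on the surefire documentation rather than something verified against the plugin source.

```python
import re


def fork_count(spec, cpus):
    """Resolve a surefire-style forkCount such as '1C' or '2' to a number of forks."""
    spec = spec.strip()
    if spec.upper().endswith("C"):
        multiplier = float(spec[:-1])
        # Assumed rule: multiplier * cores, rounded down, but at least one fork.
        return max(1, int(multiplier * cpus)) if multiplier > 0 else 0
    return int(spec)


def max_heap_mib(arg_line):
    """Extract the -Xmx value from a JVM argument line, in MiB."""
    match = re.search(r"-Xmx(\d+)([mMgG])", arg_line)
    if match is None:
        raise ValueError("no -Xmx option with an m/g suffix found")
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * 1024 if unit == "g" else value


arg_line = "-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} -XX:+UseG1GC"
forks = fork_count("1C", cpus=5)
worst_case_mib = forks * max_heap_mib(arg_line)
print(f"{forks} forks x {max_heap_mib(arg_line)} MiB = {worst_case_mib} MiB of heap")
```

With 5 CPUs this gives a worst-case heap demand of 10240 MiB, which exceeds the 8 GB available in the VM even before counting non-heap memory.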

This could be better. I think a computer with RAM < count_of_cpus * 2048 MB is a completely valid one: take e.g. an AMD Ryzen 3900X with 12 cores and put 16 GB of RAM in it. At least the memory & CPU requirements should be documented?

If the tests really need 2GB of heap, then maybe the forkCount should be based on the available RAM rather than the available cores, e.g. floor(RAM / 2GB)? I don't know if that's doable in Maven.

I think an easy and non-intrusive improvement would be to change '-Xms256m' to '-Xms2048m' (ms to match mx) so that the JVM would allocate 2048 MB right away (when it starts). If there's not enough memory, the tests would fail immediately (the JVM couldn't start). The tests would probably fail anyway (as in my case), so it's better to fail fast.

Regards,
Juha








On Tue, Oct 20, 2020 at 11:16, Khachatryan Roman (<[hidden email]>) wrote:
Thanks for sharing this,
I think the activity of the OOM killer means high memory pressure (it just kills the process with the highest memory consumption score).
High CPU usage would only be a consequence of it, due to constant GC.

Currently, tests do not run in parallel, but high memory usage can be caused by the nature of the test (e.g. running Flink with high parallelism).
So I think the best way to deal with this is to use a VM with more memory.

Regards,
Roman


On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <[hidden email]> wrote:
Hey,

Good hint about /var/log/kern.log. This time I can see this:

Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=270024,uid=1000
Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
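
For anyone who wants to scan kern.log for such entries, here is a minimal Python sketch that pulls the PID, process name and resident memory out of an "Out of memory: Killed process" line. The regular expression is written against the exact line format shown above and may need adjusting for other kernel versions; the function name is hypothetical.

```python
import re

OOM_KILL = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return (pid, name, anon_rss_mib) for an OOM kill line, or None if it doesn't match."""
    match = OOM_KILL.search(line)
    if match is None:
        return None
    return int(match["pid"]), match["name"], int(match["anon_rss"]) / 1024


# The kernel log line quoted above, split only for readability.
line = (
    "Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process "
    "270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, "
    "shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0"
)
pid, name, rss_mib = parse_oom_kill(line)
print(f"pid={pid} name={name} anon-rss={rss_mib:.0f} MiB")
```

The anon-rss value in this log entry is well above the 2048 MB max heap, which suggests (without proving it) that memory outside the Java heap contributes to the pressure as well.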

The next question is why this happens. I'll try to dig deeper.

About the CPU load: I have five CPUs. Theoretically it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (that mvn forks) are doing, I can see that each of those processes has a large number of threads wanting to use CPU. Here's an example from 'top -H':

  top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                            
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre                                                                                                                    
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java                                                                                                                                
 254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java                                                                                                                                
 255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java                                                                                                                                
 255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java                                                                                                                                
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre                                                                                                                    
 253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre                                                                                                                    
 254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java                                                                                                                                
 254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java                                                                                                                                
 254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre                                                                                                                    
 255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre                                                                                                                    
 255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre                                                                                                                    
 255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre                                                                                                                    
 254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre                                                                                                                    
 253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java                                                                                                                                
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre                                                                                                                    
 255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre                                                                                                                    
 254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre                                                                                                                    
 254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre                                                                                                                    
 255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre                                                                                                                    
 253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre                                                                                                                    
 255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre                                                                                                                    
 254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre                                                                                                                    
 254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre                                                                                                                    
 254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre                                                                                                                    
 254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto                                                                                                                    
                                                            
It can be seen that the JIT-related threads consume quite a lot of CPU, essentially leaving less CPU available to the actual test code. By using htop I can also see the garbage-collection-related threads eating CPU. This doesn't seem right. I think it'd make sense to run the tests with less parallelism to better utilize the CPUs. Having many more threads wanting CPU than there are CPUs slows things down rather than speeding them up.
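
To put a number on the oversubscription, the load average from the 'top' header can be divided by the CPU count. On Linux the load average roughly counts tasks that are runnable or in uninterruptible sleep, so a value well above the CPU count means tasks are queuing. The sketch below is only an illustration; it assumes the decimal-comma format seen in the output above, and the function name is made up.

```python
def load_averages(top_header):
    """Parse the three load averages from a 'top' header line that uses decimal commas."""
    raw = top_header.split("load average:")[1]
    # Values are separated by ", " while the decimal separator is a bare ",".
    return [float(part.replace(",", ".")) for part in raw.strip().split(", ")]


header = "top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81"
one_min, five_min, fifteen_min = load_averages(header)
cpus = 5  # CPU count reported in this thread
print(f"1-minute load per CPU: {one_min / cpus:.1f}")
```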

However, AFAIK high CPU load shouldn't trigger OOM-killer?

Regards,
Juha




On Mon, Oct 19, 2020 at 20:48, Khachatryan Roman (<[hidden email]>) wrote:
Hey,

One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log in your VM for the related messages.

Regards,
Roman


        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets mentioning "The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm seeing.

I wonder if the issue is related to the VM running "too hot": 'top' shows very high load averages.

The crash can be reproduced.

Regards,
Juha


Re: Building Flink on VirtualBox VM failing

Juha Mynttinen-2
Hmm.

Even with the fork counts set to 1, the build fails.

I wonder why there seem to be five of these JVM crashes. There should be only one JVM at a time, and shouldn't Maven fail after the first failure?

~/apache-maven-3.2.5/bin/mvn -Dflink.forkCount=1 -Dflink.forkCountTestPackage=1 clean verify

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:13 h
[INFO] Finished at: 2020-10-21T12:26:16+03:00
[INFO] Final Memory: 205M/704M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp surefire_11744637775482284170691tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp surefire_11744637775482284170691tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests



flink-tests/target/surefire-reports/2020-10-21T11-13-24_791-jvmRun1.dump

# Created at 2020-10-21T12:03:51.559
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-21T12:03:51.560
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 935338
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)



sudo less -n /var/log/kern.log
......
Oct 21 12:21:57 ubuntu kernel: [24024.569633] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=1220764,uid=1000
Oct 21 12:21:57 ubuntu kernel: [24024.569804] Out of memory: Killed process 1220764 (java) total-vm:8514092kB, anon-rss:4116292kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:9136kB oom_score_adj:0
Oct 21 12:21:57 ubuntu kernel: [24024.685821] oom_reaper: reaped process 1220764 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
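
As a rough illustration of how these kernel messages can be read, the Python sketch below pulls the PID and memory figures out of an "Out of memory: Killed process" line. The regular expression and the names OOM_KILL, parse_oom_kills and SAMPLE are assumptions made for this example; they only cover the line format shown above, and other kernel versions may format the message differently.

```python
import re

# Matches the kernel's "Out of memory: Killed process ..." line in the format
# shown in the log above (assumption: other kernels may word it differently).
OOM_KILL = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kills(log_text):
    """Return one dict per OOM kill found in the given kernel log text."""
    kills = []
    for match in OOM_KILL.finditer(log_text):
        kills.append({
            "pid": int(match["pid"]),
            "name": match["name"],
            "total_vm_mib": int(match["total_vm_kb"]) / 1024,
            "anon_rss_mib": int(match["anon_rss_kb"]) / 1024,
        })
    return kills


# The kill message from the log above, copied verbatim.
SAMPLE = (
    "Oct 21 12:21:57 ubuntu kernel: [24024.569804] Out of memory: Killed process "
    "1220764 (java) total-vm:8514092kB, anon-rss:4116292kB, file-rss:0kB, "
    "shmem-rss:0kB, UID:1000 pgtables:9136kB oom_score_adj:0"
)

for kill in parse_oom_kills(SAMPLE):
    print(f"pid {kill['pid']} ({kill['name']}): anon-rss {kill['anon_rss_mib']:.0f} MiB")
```

For the line above this reports about 4020 MiB of anonymous resident memory, roughly twice the -Xmx2048m heap limit. -Xmx caps only the Java heap, so native allocations (metaspace, thread stacks, direct buffers, native libraries) come on top of it.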

Regards,
Juha

On Wed, Oct 21, 2020 at 10:04 AM Juha Mynttinen <[hidden email]> wrote:
Hi,

You're right. I also thought about this after writing my last comment: on Linux, for example, the kernel overcommits memory allocations by default, so this approach doesn't work (it doesn't make the JVM crash right when it starts).

I dug a little deeper. It seems that for CI environments there are specific compilation scripts, such as https://github.com/apache/flink/blob/master/tools/ci/compile.sh#L45, that explicitly set flink.forkCount and flink.forkCountTestPackage to values lower than (?) the defaults. But for anybody compiling Flink locally, mvn uses the default values, which might not work, as in my case.

I think a good goal would be that a developer can just git clone Flink and build it following simple instructions. Preferably there would be zero setup needed, just a simple command to run. The current situation is that building Flink is "simple": just run a specific mvn command. This simplicity comes at the price that things can break in unexpected ways:

1) There are things the Flink build expects but doesn't check (https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/building.html#build-flink):
 * The correct Maven version
 * A suitable Java version
2) There's this issue with the count of CPU cores vs. the available memory.

Case 1) is documented; case 2) is not.

Fix options

a)

Document case 2) and explain how to set flink.forkCountTestPackage (if needed). Something like: "Flink tests run in parallel JVMs, each of which can use up to 2 GB of heap. By default there are as many JVMs as there are CPU cores. If your machine doesn't have at least 2 GB of RAM per core, the tests can fail. You can lower the number of JVMs by setting the Maven property flink.forkCountTestPackage to a lower value."

b)

Create a Linux-specific Maven wrapper script for local execution too. The wrapper script could download the correct Maven version, check the Java version, calculate the max number of forks, etc. A quick way to calculate the max fork count:

expr `cat /proc/meminfo | grep MemTotal | awk '{print $2}'` / 2097152
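
The same calculation as a small Python sketch, for anyone who prefers something testable. It assumes MemTotal is reported in kB, as in /proc/meminfo on Linux. The names max_fork_count, read_meminfo and SAMPLE_MEMINFO are made up for this example, and the sample value only approximates the 7961,6 MiB that top reports elsewhere in this thread. Unlike the expr one-liner, the sketch never returns less than 1.

```python
import re

# 2 GiB expressed in kB: the same 2097152 divisor as in the expr one-liner.
KIB_PER_FORK = 2 * 1024 * 1024


def max_fork_count(meminfo_text, kib_per_fork=KIB_PER_FORK):
    """Return floor(MemTotal / kib_per_fork), but never less than 1."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return max(1, int(match.group(1)) // kib_per_fork)


def read_meminfo(path="/proc/meminfo"):
    """Read the meminfo file (Linux only)."""
    with open(path) as f:
        return f.read()


# Hypothetical sample, roughly matching a VM with about 8 GB of RAM.
SAMPLE_MEMINFO = "MemTotal:        8152678 kB\nMemFree:         1653043 kB\n"

print(max_fork_count(SAMPLE_MEMINFO))
```

On a real machine, max_fork_count(read_meminfo()) would give the value to pass as -Dflink.forkCount and -Dflink.forkCountTestPackage. This accounts only for the 2 GB heap limit per fork; as the kernel log shows, a single fork can use considerably more memory than its heap limit.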

Regards,
Juha





On Tue, Oct 20, 2020 at 9:23 PM Khachatryan Roman <[hidden email]> wrote:
I think you are right and I like the idea of failing the build fast.
However, when I tried this approach on my local machine it didn't help: the build didn't crash (probably because of overcommit).
Did you try this approach in your VM?

Regards,
Roman


On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <[hidden email]> wrote:
Hey,

> Currently, tests do not run in parallel  

I don't think this is true, at least not 100%. In 'top' it's clearly visible that there are multiple JVMs. If they're not running tests in parallel, what are they doing? In the main pom.xml there's configuration for the plug-in 'maven-surefire-plugin'.

I'm not a Maven expert, but it looks to me like this: in https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html it says "The other possibility for parallel test execution is setting the parameter forkCount to a value higher than 1". I think that's happening in Flink:

<forkCount>${flink.forkCount}</forkCount>

And

<flink.forkCount>1C</flink.forkCount>

This means there's gonna be 1 * count_of_cpus forks.

And this one:

<argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} -XX:+UseG1GC</argLine>

In my case, I have 5 CPUs, so 5 forks. Since each fork can get up to 2048 MB of heap, there is effectively a memory requirement of CPU count * 2048 MB. I have 8 GB of memory, which is less than the maximum of 5 * 2048 MB = 10240 MB.

This could be better. I think a computer with RAM < count_of_cpus * 2048 MB is completely valid: take e.g. an AMD Ryzen 3900X with 12 cores and put 16 GB of RAM in it. At the least, the memory and CPU requirements should be documented?
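
To make the arithmetic concrete, here is a minimal Python sketch of the worst-case heap demand under the settings quoted in this message (forkCount=1C and -Xmx2048m). The function names worst_case_heap_mb and heap_fits_in_ram are hypothetical, and the sketch assumes one forked JVM per core and ignores all non-heap memory.

```python
def worst_case_heap_mb(cores, xmx_mb=2048):
    """Upper bound on total Java heap if every fork grows to its -Xmx limit.

    Assumes forkCount=1C, i.e. one forked JVM per core.
    """
    return cores * xmx_mb


def heap_fits_in_ram(cores, ram_mb, xmx_mb=2048):
    """True if the combined heap limits of all forks fit into physical RAM."""
    return worst_case_heap_mb(cores, xmx_mb) <= ram_mb


# The two machines mentioned in this message: (cores, RAM in MB).
for cores, ram_mb in [(5, 8 * 1024), (12, 16 * 1024)]:
    need = worst_case_heap_mb(cores)
    verdict = "fits" if heap_fits_in_ram(cores, ram_mb) else "does not fit"
    print(f"{cores} cores: up to {need} MB of heap vs {ram_mb} MB of RAM -> {verdict}")
```

Neither machine can hold the worst case: 5 cores need up to 10240 MB against 8192 MB, and 12 cores need up to 24576 MB against 16384 MB.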

If the tests really need 2 GB of heap, then maybe the forkCount should be based on the available RAM rather than the available cores, e.g. floor(RAM / 2GB)? I don't know if that's doable in Maven.

I think an easy and non-intrusive improvement would be to change '-Xms256m' to '-Xms2048m' (ms to match mx) so that the JVM would allocate 2048 MB right away (when it starts). If there's not enough memory, the tests would fail immediately (the JVM couldn't start). The tests would probably fail anyway (as in my case), so better to fail fast.

Regards,
Juha








On Tue, Oct 20, 2020 at 11:16 AM Khachatryan Roman <[hidden email]> wrote:
Thanks for sharing this,
I think the OOM killer's activity indicates high memory pressure (it simply kills the process with the highest memory-consumption score).
The high CPU usage may simply be a consequence of it, caused by constant GC.

Currently, tests do not run in parallel, but high memory usage can be caused by the nature of the test (e.g. running Flink with high parallelism).
So I think the best way to deal with this is to use a VM with more memory.

Regards,
Roman


On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <[hidden email]> wrote:
Hey,

Good hint about /var/log/kern.log. This time I can see this:

Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=270024,uid=1000
Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The next question is why this happens. I'll try to dig deeper.

About the CPU load: I have five CPUs. Theoretically it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (that mvn forks) are doing, I can see that each of them has a large number of threads wanting to use CPU. Here's an example from 'top -H':

top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                            
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre                                                                                                                    
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java                                                                                                                                
 254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java                                                                                                                                
 255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java                                                                                                                                
 255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java                                                                                                                                
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre                                                                                                                    
 253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre                                                                                                                    
 254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java                                                                                                                                
 254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java                                                                                                                                
 254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre                                                                                                                    
 255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre                                                                                                                    
 255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre                                                                                                                    
 255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre                                                                                                                    
 254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre                                                                                                                    
 253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java                                                                                                                                
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre                                                                                                                    
 255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre                                                                                                                    
 254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre                                                                                                                    
 254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre                                                                                                                    
 255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre                                                                                                                    
 253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre                                                                                                                    
 255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre                                                                                                                    
 254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre                                                                                                                    
 254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre                                                                                                                    
 254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre                                                                                                                    
 254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto                                                                                                                    
                                                            
It can be seen that the JIT-related threads consume quite a lot of CPU, leaving less CPU available for the actual test code. Using htop, I can also see the garbage-collection threads eating CPU. This doesn't seem right. I think it would make sense to run the tests with less parallelism to utilize the CPUs better. Having far more threads competing for CPU than there are CPUs slows things down rather than speeding them up.
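
One way to quantify this is to sum the %CPU column of the 'top -H' rows per thread name. The Python sketch below does that for a few rows copied from the output above. The function name cpu_by_thread_name and the constant SAMPLE_ROWS are assumptions made for this example, and the parser expects the 12-column layout and the comma decimal separator shown here.

```python
from collections import defaultdict


def cpu_by_thread_name(top_rows):
    """Sum the %CPU column of 'top -H' rows, grouped by the COMMAND column."""
    totals = defaultdict(float)
    for row in top_rows.splitlines():
        # 11 splits keep a COMMAND containing spaces ("C2 CompilerThre") intact.
        fields = row.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip the header and any blank or malformed lines
        cpu = float(fields[8].replace(",", "."))  # this locale prints "20,9"
        totals[fields[11].strip()] += cpu
    return dict(totals)


# Header plus four rows copied from the output above (trailing spaces trimmed).
SAMPLE_ROWS = """\
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre
"""

totals = cpu_by_thread_name(SAMPLE_ROWS)
for name, cpu in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16} {cpu:5.1f}")
```

Feeding it the full output instead of the four sample rows would show how much of the CPU time goes to the C1/C2 compiler threads compared with the threads named java.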

However, AFAIK high CPU load shouldn't trigger the OOM killer?

Regards,
Juha




On Mon, Oct 19, 2020 at 8:48 PM Khachatryan Roman <[hidden email]> wrote:
Hey,

One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log in your VM for the related messages.
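
The "Process Exit Code: 137" reported by Surefire fits this explanation. By shell convention, a process terminated by signal N is reported with exit status 128 + N, and 137 - 128 = 9 is SIGKILL, the signal the OOM killer sends. The Python sketch below decodes such a status; the function name describe_exit_code is made up for this example, and the signal name lookup assumes a POSIX system.

```python
import signal


def describe_exit_code(code):
    """Describe a shell-style exit status; values above 128 mean 'killed by signal'."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number has no name on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

SIGKILL alone does not prove an OOM kill, since anything can send it; the kernel log is what confirms the cause.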

Regards,
Roman


On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <[hidden email]> wrote:

Hey,

I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.

The command I'm using:

apache-maven-3.2.5/bin/mvn clean verify

The output:

[INFO] Flink : Tests ...................................... FAILURE [14:38 min]
[INFO] Flink : Streaming Scala ............................ SKIPPED
[INFO] Flink : Connectors : HCatalog ...................... SKIPPED
[INFO] Flink : Connectors : Base .......................... SKIPPED
[INFO] Flink : Connectors : Files ......................... SKIPPED
[INFO] Flink : Table : .................................... SKIPPED
[INFO] Flink : Table : Common ............................. SKIPPED
[INFO] Flink : Table : API Java ........................... SKIPPED
[INFO] Flink : Table : API Java bridge .................... SKIPPED
[INFO] Flink : Table : API Scala .......................... SKIPPED
[INFO] Flink : Table : API Scala bridge ................... SKIPPED
[INFO] Flink : Table : SQL Parser ......................... SKIPPED
[INFO] Flink : Libraries : ................................ SKIPPED
[INFO] Flink : Libraries : CEP ............................ SKIPPED
[INFO] Flink : Table : Planner ............................ SKIPPED
[INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
[INFO] Flink : Table : Runtime Blink ...................... SKIPPED
[INFO] Flink : Table : Planner Blink ...................... SKIPPED
[INFO] Flink : Metrics : JMX .............................. SKIPPED
[INFO] Flink : Formats : .................................. SKIPPED
[INFO] Flink : Formats : Json ............................. SKIPPED
[INFO] Flink : Connectors : Kafka base .................... SKIPPED
[INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
[INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
[INFO] Flink : Connectors : HBase base .................... SKIPPED
[INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
[INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
[INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
[INFO] Flink : Formats : Orc .............................. SKIPPED
[INFO] Flink : Formats : Orc nohive ....................... SKIPPED
[INFO] Flink : Formats : Avro ............................. SKIPPED
[INFO] Flink : Formats : Parquet .......................... SKIPPED
[INFO] Flink : Formats : Csv .............................. SKIPPED
[INFO] Flink : Connectors : Hive .......................... SKIPPED
[INFO] Flink : Connectors : JDBC .......................... SKIPPED
[INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
[INFO] Flink : Connectors : Twitter ....................... SKIPPED
[INFO] Flink : Connectors : Nifi .......................... SKIPPED
[INFO] Flink : Connectors : Cassandra ..................... SKIPPED
[INFO] Flink : Connectors : Filesystem .................... SKIPPED
[INFO] Flink : Connectors : Kafka ......................... SKIPPED
[INFO] Flink : Connectors : Google PubSub ................. SKIPPED
[INFO] Flink : Connectors : Kinesis ....................... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
[INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
[INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
[INFO] Flink : Formats : Sequence file .................... SKIPPED
[INFO] Flink : Formats : Compress ......................... SKIPPED
[INFO] Flink : Formats : SQL Orc .......................... SKIPPED
[INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
[INFO] Flink : Formats : SQL Avro ......................... SKIPPED
[INFO] Flink : Examples : Streaming ....................... SKIPPED
[INFO] Flink : Examples : Table ........................... SKIPPED
[INFO] Flink : Examples : Build Helper : .................. SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED
[INFO] Flink : Container .................................. SKIPPED
[INFO] Flink : Queryable state : Runtime .................. SKIPPED
[INFO] Flink : Mesos ...................................... SKIPPED
[INFO] Flink : Kubernetes ................................. SKIPPED
[INFO] Flink : Yarn ....................................... SKIPPED
[INFO] Flink : Libraries : Gelly .......................... SKIPPED
[INFO] Flink : Libraries : Gelly scala .................... SKIPPED
[INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
[INFO] Flink : External resources : ....................... SKIPPED
[INFO] Flink : External resources : GPU ................... SKIPPED
[INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
[INFO] Flink : Metrics : Graphite ......................... SKIPPED
[INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
[INFO] Flink : Metrics : Prometheus ....................... SKIPPED
[INFO] Flink : Metrics : StatsD ........................... SKIPPED
[INFO] Flink : Metrics : Datadog .......................... SKIPPED
[INFO] Flink : Metrics : Slf4j ............................ SKIPPED
[INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
[INFO] Flink : Table : Uber ............................... SKIPPED
[INFO] Flink : Table : Uber Blink ......................... SKIPPED
[INFO] Flink : Python ..................................... SKIPPED
[INFO] Flink : Table : SQL Client ......................... SKIPPED
[INFO] Flink : Libraries : State processor API ............ SKIPPED
[INFO] Flink : ML : ....................................... SKIPPED
[INFO] Flink : ML : API ................................... SKIPPED
[INFO] Flink : ML : Lib ................................... SKIPPED
[INFO] Flink : ML : Uber .................................. SKIPPED
[INFO] Flink : Scala shell ................................ SKIPPED
[INFO] Flink : Dist ....................................... SKIPPED
[INFO] Flink : Yarn Tests ................................. SKIPPED
[INFO] Flink : E2E Tests : ................................ SKIPPED
[INFO] Flink : E2E Tests : CLI ............................ SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading lib-package SKIPPED
[INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
[INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
[INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
[INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
[INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
[INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
[INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
[INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
[INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
[INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
[INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
[INFO] Flink : Quickstart : ............................... SKIPPED
[INFO] Flink : Quickstart : Java .......................... SKIPPED
[INFO] Flink : Quickstart : Scala ......................... SKIPPED
[INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
[INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
[INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
[INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
[INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
[INFO] Flink : E2E Tests : State evolution ................ SKIPPED
[INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
[INFO] Flink : E2E Tests : Common ......................... SKIPPED
[INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
[INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
[INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
[INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
[INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
[INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
[INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
[INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
[INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
[INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
[INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
[INFO] Flink : E2E Tests : Python ......................... SKIPPED
[INFO] Flink : E2E Tests : HBase .......................... SKIPPED
[INFO] Flink : State backends : Heap spillable ............ SKIPPED
[INFO] Flink : Contrib : .................................. SKIPPED
[INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
[INFO] Flink : FileSystems : Tests ........................ SKIPPED
[INFO] Flink : Docs ....................................... SKIPPED
[INFO] Flink : Walkthrough : .............................. SKIPPED
[INFO] Flink : Walkthrough : Common ....................... SKIPPED
[INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
[INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:49 min
[INFO] Finished at: 2020-10-19T18:24:46+03:00
[INFO] Final Memory: 179M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests

The jvmdump-files look like this:

# Created at 2020-10-19T18:14:22.869
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets mentioning "The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm witnessing.

I wonder whether the issue is related to the VM running "too hot": 'top' shows very high load averages.

The crash can be reproduced.
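
One detail in the output above that may help narrow this down is "Process Exit Code: 137". By the usual Unix convention, a status above 128 means the process was terminated by signal number (status - 128). A minimal sketch of that decoding follows; the helper name describe_exit_code is made up for illustration and is not part of Maven or Surefire:

```python
import signal


def describe_exit_code(code):
    """Split a shell-style exit status into (signal number, signal name).

    Returns (None, None) for statuses of 128 or below, which do not follow
    the 128 + N convention for death by signal N.
    """
    if code <= 128:
        return None, None
    signum = code - 128
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = None  # signal number not known on this platform
    return signum, name


# Exit code taken from the Surefire output in this post.
signum, name = describe_exit_code(137)
print(signum, name)
```

On Linux this reports signal 9, SIGKILL. That is consistent with the forked JVM being killed from outside (for example by the kernel's OOM killer) rather than crashing or calling System.exit, but on its own it does not prove which.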

Regards,
Juha


Re: Building Flink on VirtualBox VM failing

Juha Mynttinen-2
I'm running the tests again, now with four cores (previously five) and 12 GB of RAM (previously 8 GB). The OOM killer still hits me.

The command I'm running is:

mvn -Dflink.forkCount=1 -Dflink.forkCountTestPackage=1 clean verify

[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:17 h
[INFO] Finished at: 2020-10-23T15:36:50+03:00
[INFO] Final Memory: 180M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests

With both fork counts set to 1, only the parent JVM and one forked JVM should be running on the VM, so there should be plenty of RAM available.

From /var/log/kern.log:


Oct 23 15:26:42 ubuntu kernel: [23021.120464] Tasks state (memory values in pages):
Oct 23 15:26:42 ubuntu kernel: [23021.120464] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
....
Oct 23 15:26:42 ubuntu kernel: [23021.120574] [ 460994]  1000 460994  3319485  2440960 22024192        0             0 java
Oct 23 15:26:42 ubuntu kernel: [23021.120575] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=460994,uid=1000
Oct 23 15:26:42 ubuntu kernel: [23021.120669] Out of memory: Killed process 460994 (java) total-vm:13277940kB, anon-rss:9763848kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:21508kB oom_score_adj:0
Oct 23 15:26:42 ubuntu kernel: [23021.406205] oom_reaper: reaped process 460994 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

It seems very odd to me that the process uses 13277940 kB of virtual memory and 9763848 kB of anon-rss, given that the forked JVM is started with -Xmx2048m. Or maybe I'm reading something wrong.
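
For what it's worth, the numbers in the kernel log are at least consistent with each other. Below is a minimal sketch of the arithmetic. It assumes 4 KiB pages (the usual x86-64 default; the log does not state the page size) and takes the 2048 MiB heap cap from the -Xmx2048m in the Surefire command line. The helper name parse_oom_line is made up for illustration:

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, not stated in the log

# "Killed process" line copied from the kern.log excerpt above.
OOM_LINE = (
    "Out of memory: Killed process 460994 (java) "
    "total-vm:13277940kB, anon-rss:9763848kB, file-rss:0kB, "
    "shmem-rss:0kB, UID:1000 pgtables:21508kB oom_score_adj:0"
)


def parse_oom_line(line):
    """Return every 'name:<number>kB' field of the line as {name: kB}."""
    return {key: int(value) for key, value in re.findall(r"([\w-]+):(\d+)kB", line)}


fields = parse_oom_line(OOM_LINE)

# The task table lists total_vm and rss in pages; convert them to kB.
total_vm_kb = 3319485 * PAGE_KB
rss_kb = 2440960 * PAGE_KB

# Heap cap from -Xmx2048m, in kB.
heap_cap_kb = 2048 * 1024

# Resident memory that cannot be Java heap, even if the heap is fully resident.
outside_heap_kb = fields["anon-rss"] - heap_cap_kb

print("total_vm from pages:", total_vm_kb, "kB; reported:", fields["total-vm"], "kB")
print("rss from pages:", rss_kb, "kB; reported anon-rss:", fields["anon-rss"], "kB")
print("resident beyond the heap cap: %.1f GiB" % (outside_heap_kb / 1024 / 1024))
```

So the page counts and the kB figures agree (the rss values differ by 8 kB), and the reading seems right. -Xmx only caps the Java heap, so at least roughly 7.3 GiB of the resident memory must come from outside the heap (native allocations, direct buffers, metaspace, thread stacks and so on).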

Regards,
Juha

On Wed, 21 Oct 2020 at 12:54, Juha Mynttinen (<[hidden email]>) wrote:
Hmm

Even with the fork counts set to 1, things fail.

I wonder why there seem to be five of these JVM crashes. There should be only one JVM at a time. And shouldn't Maven stop after the first failure?
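
One way to check how many distinct crashes there really are is to group the "Crashed tests:" blocks by the surefirebooter jar named in the preceding "Command was" line, on the assumption that each forked JVM gets its own jar. A rough sketch follows; the function name distinct_crashes is made up, and SAMPLE is abbreviated from the output of this run, with paths and arguments replaced by "...":

```python
import re


def distinct_crashes(log_text):
    """Map each surefirebooter jar id to the set of tests reported as crashed for it."""
    crashes = {}
    current_jar = None
    expect_test = False
    for raw in log_text.splitlines():
        line = raw.replace("[ERROR]", "", 1).strip()
        jar = re.search(r"surefirebooter(\d+)\.jar", line)
        if jar:
            # A "Command was" line starts a new crash block.
            current_jar = jar.group(1)
            expect_test = False
        elif line == "Crashed tests:":
            expect_test = True
        elif expect_test and current_jar and line:
            # The line right after "Crashed tests:" names the test class.
            crashes.setdefault(current_jar, set()).add(line)
            expect_test = False
    return crashes


# Abbreviated excerpt (assumption: paths and arguments shortened with "...").
SAMPLE = """\
[ERROR] Command was /bin/sh -c cd ... -jar .../surefirebooter1427858994096305293.jar ...
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] Command was /bin/sh -c cd ... -jar .../surefirebooter10864064660296194510.jar ...
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] Command was /bin/sh -c cd ... -jar .../surefirebooter1427858994096305293.jar ...
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
"""

for jar_id, tests in distinct_crashes(SAMPLE).items():
    print(jar_id, sorted(tests))
```

Read this way, the output of this run shows only two distinct jars, one per crashed test class, so the five blocks look like two forked JVMs that crashed one after the other, with the same message printed again as the exception message and its cause.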

~/apache-maven-3.2.5/bin/mvn -Dflink.forkCount=1 -Dflink.forkCountTestPackage=1 clean verify

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:13 h
[INFO] Finished at: 2020-10-21T12:26:16+03:00
[INFO] Final Memory: 205M/704M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp surefire_11744637775482284170691tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp surefire_11744637775482284170691tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests



flink-tests/target/surefire-reports/2020-10-21T11-13-24_791-jvmRun1.dump

# Created at 2020-10-21T12:03:51.559
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-21T12:03:51.560
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 935338
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)



sudo less -n /var/log/kern.log
......
Oct 21 12:21:57 ubuntu kernel: [24024.569633] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=1220764,uid=1000
Oct 21 12:21:57 ubuntu kernel: [24024.569804] Out of memory: Killed process 1220764 (java) total-vm:8514092kB, anon-rss:4116292kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:9136kB oom_score_adj:0
Oct 21 12:21:57 ubuntu kernel: [24024.685821] oom_reaper: reaped process 1220764 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Regards,
Juha

On Wed, Oct 21, 2020 at 10:04 AM Juha Mynttinen <[hidden email]> wrote:
Hi,

You're right, I thought about this also after writing the last comment. On Linux, for example, the kernel overcommits memory allocations by default, so this approach doesn't work (it doesn't make the JVM crash right when it starts).

I dug a little deeper. It seems that for CI environments there are specific compilation scripts such as https://github.com/apache/flink/blob/master/tools/ci/compile.sh#L45 that explicitly set flink.forkCount and flink.forkCountTestPackage to lower than (?) default values. But for anybody compiling Flink locally, mvn uses the default values, which might not work, as in my case.

I think a good goal would be that a developer can just git clone Flink and build it following simple instructions. Preferably there would be zero setup needed, just a simple command to run. The current situation is that building Flink is "simple", just run a specific mvn command. This simplicity comes with the price that things can break in unexpected ways:

1) There are things building Flink expects but doesn't check (https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/building.html#build-flink):
 * The correct Maven version
 * A suitable Java version
2) There's this issue with the count of CPU cores vs. available memory.

Case 1) is documented; case 2) is not.

Fix options

a)

Document case 2) and instruct how to set flink.forkCountTestPackage (if needed). Something like "Flink tests are run on parallel JVMs, each taking up to 2 GB of heap. By default there are as many JVMs as there are CPU cores. If your machine doesn't have at least 2 GB of RAM per core, the tests can fail. You can set the count of JVMs to a lower value using the Maven property flink.forkCountTestPackage".

b)

Create a Linux-specific Maven wrapper script for local execution too. The wrapper script could download the correct Maven version, check the Java version, calculate the max number of forks etc. A quick way to calculate the max fork count:

expr `cat /proc/meminfo | grep MemTotal | awk '{print $2}'` / 2097152
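As a rough sketch, the one-liner above could be turned into a small helper that also caps the result at the CPU count and never goes below one fork. The function names max_forks and fork_args are hypothetical, and the 2 GiB per fork figure is an assumption taken from the -Xmx2048m setting; a forked JVM can use more than its heap size, so the result is an upper bound rather than a safe value. The two property names are the ones mentioned in this thread.

```shell
# Hypothetical helper: estimate a fork count from total memory (in kB) and CPU count.
# Assumes 2 GiB (2097152 kB) per forked JVM, matching -Xmx2048m.
max_forks() {
    mem_kb=$1
    cpus=$2
    forks=$((mem_kb / 2097152))
    # Never use more forks than CPUs.
    if [ "$forks" -gt "$cpus" ]; then
        forks=$cpus
    fi
    # Always allow at least one fork.
    if [ "$forks" -lt 1 ]; then
        forks=1
    fi
    echo "$forks"
}

# Hypothetical helper: print (not run) the Maven properties for a given fork count.
fork_args() {
    echo "-Dflink.forkCount=$1 -Dflink.forkCountTestPackage=$1"
}
```

On Linux this could be used as forks=$(max_forks "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(nproc)") followed by mvn clean verify $(fork_args "$forks").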

Regards,
Juha





On Tue, Oct 20, 2020 at 9:23 PM Khachatryan Roman <[hidden email]> wrote:
I think you are right and I like the idea of failing the build fast.
However, when I tried this approach on my local machine it didn't help: the build didn't crash (probably because of overcommit).
Did you try this approach in your VM?

Regards,
Roman


On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <[hidden email]> wrote:
Hey,

> Currently, tests do not run in parallel  

I don't think this is true, at least not 100%. In 'top' it's clearly visible that there are multiple JVMs. If the tests aren't running in parallel, what are these doing? In the main pom.xml there's configuration for the plug-in 'maven-surefire-plugin'.

I'm not a Maven expert, but it looks to me like this: in https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html it says "The other possibility for parallel test execution is setting the parameter forkCount to a value higher than 1". I think that's happening in Flink:

<forkCount>${flink.forkCount}</forkCount>

And

<flink.forkCount>1C</flink.forkCount>

This means there's gonna be 1 * count_of_cpus forks.

And this one:

<argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} -XX:+UseG1GC</argLine>

In my case, I have 5 CPUs, so 5 forks. I think what now happens is that since each fork gets a max heap of 2048 MB, there's effectively a memory requirement of CPU count * 2048 MB. I have 8 GB of memory, which is less than the 5 * 2048 MB the forks can use at most.
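The arithmetic can be checked with a few lines of shell. This is only a sketch: the fork count and heap size come from this message, and the total of about 7961 MiB is the figure reported by top elsewhere in this thread. It counts only the maximum Java heap, so the real footprint per JVM is higher.

```shell
# Values from the thread: 5 forks, -Xmx2048m, about 7961 MiB of RAM in the VM.
forks=5
heap_mb=2048
total_mb=7961

# Worst case if every fork fills its heap at the same time.
need_mb=$((forks * heap_mb))
echo "worst-case heap: ${need_mb} MiB, total RAM: ${total_mb} MiB"

if [ "$need_mb" -gt "$total_mb" ]; then
    echo "heap alone can exceed RAM by $((need_mb - total_mb)) MiB"
fi
```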

This could be better. I think a computer with RAM < count_of_cpus * 2048 MB is a completely valid setup; take e.g. an AMD Ryzen 3900X with 12 cores and put 16 GB of RAM there. At least the memory & CPU requirements should be documented?

If the tests really need 2GB of heap, then maybe the forkCount should be based on the available RAM rather than available cores, e.g. floor(RAM / 2GB)? I don't know if that's doable in Maven.

I think an easy and non-intrusive improvement would be to change '-Xms256m' to '-Xms2048m' (ms to match mx) so that the JVM would allocate 2048 MB right away when it starts. If there's not enough memory, the tests would fail immediately (the JVM couldn't start). The tests would probably fail anyway (my case), so better to fail fast.

Regards,
Juha








On Tue, Oct 20, 2020 at 11:16 AM Khachatryan Roman <[hidden email]> wrote:
Thanks for sharing this,
I think the activity of the OOM killer means high memory pressure (it just kills the process with the highest memory consumption score).
High CPU usage can only be a consequence of it, caused by constant GC.

Currently, tests do not run in parallel, but high memory usage can be caused by the nature of the test (e.g. running Flink with high parallelism).
So I think the best way to deal with this is to use a VM with more memory.

Regards,
Roman


On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <[hidden email]> wrote:
Hey,

Good hint about /var/log/kern.log. This time I can see this:

Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=270024,uid=1000
Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The next question is why does this happen.... I'll try to dig deeper.

About the CPU load: I have five CPUs. Theoretically it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (that mvn forks) are doing, each of those processes has a large number of threads wanting to use CPU. Here's an example from 'top -H':

  top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                            
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre                                                                                                                    
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java                                                                                                                                
 254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java                                                                                                                                
 255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java                                                                                                                                
 255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java                                                                                                                                
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre                                                                                                                    
 253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre                                                                                                                    
 254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java                                                                                                                                
 254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java                                                                                                                                
 254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre                                                                                                                    
 255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre                                                                                                                    
 255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre                                                                                                                    
 255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre                                                                                                                    
 254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre                                                                                                                    
 253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java                                                                                                                                
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre                                                                                                                    
 255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre                                                                                                                    
 254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre                                                                                                                    
 254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre                                                                                                                    
 255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre                                                                                                                    
 253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre                                                                                                                    
 255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre                                                                                                                    
 254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre                                                                                                                    
 254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre                                                                                                                    
 254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre                                                                                                                    
 254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto                                                                                                                    
                                                            
It can be seen that the JIT-related threads consume quite a lot of CPU, leaving less CPU available to the actual test code. Using htop I can also see the garbage-collection threads eating CPU. This doesn't seem right. I think it'd make sense to run the tests with less parallelism to better utilize the CPUs. Having many more threads wanting CPU than there are cores slows things down rather than speeding them up.

However, AFAIK high CPU load shouldn't trigger OOM-killer?

Regards,
Juha




On Mon, Oct 19, 2020 at 8:48 PM Khachatryan Roman <[hidden email]> wrote:
Hey,

One reason could be that a resource-intensive test was killed by oom killer. You can inspect /var/log/kern.log for the related messages in your VM.
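A minimal sketch of how that inspection might look: the helper name oom_lines is hypothetical, and the patterns are simply the message prefixes that appear in the kernel log excerpts quoted in this thread (other kernel versions may word them differently).

```shell
# Hypothetical helper: keep only OOM-killer related lines from a kernel log read on stdin.
oom_lines() {
    grep -i -E 'oom-kill|out of memory|oom_reaper'
}

# Example (not run here, needs read access to the log):
#   sudo cat /var/log/kern.log | oom_lines
```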

Regards,
Roman


On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <[hidden email]> wrote:

Hey,

I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.

The command I'm using:

apache-maven-3.2.5/bin/mvn clean verify

The output:

[INFO] Flink : Tests ...................................... FAILURE [14:38 min]
[INFO] Flink : Streaming Scala ............................ SKIPPED
[INFO] Flink : Connectors : HCatalog ...................... SKIPPED
[INFO] Flink : Connectors : Base .......................... SKIPPED
[INFO] Flink : Connectors : Files ......................... SKIPPED
[INFO] Flink : Table : .................................... SKIPPED
[INFO] Flink : Table : Common ............................. SKIPPED
[INFO] Flink : Table : API Java ........................... SKIPPED
[INFO] Flink : Table : API Java bridge .................... SKIPPED
[INFO] Flink : Table : API Scala .......................... SKIPPED
[INFO] Flink : Table : API Scala bridge ................... SKIPPED
[INFO] Flink : Table : SQL Parser ......................... SKIPPED
[INFO] Flink : Libraries : ................................ SKIPPED
[INFO] Flink : Libraries : CEP ............................ SKIPPED
[INFO] Flink : Table : Planner ............................ SKIPPED
[INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
[INFO] Flink : Table : Runtime Blink ...................... SKIPPED
[INFO] Flink : Table : Planner Blink ...................... SKIPPED
[INFO] Flink : Metrics : JMX .............................. SKIPPED
[INFO] Flink : Formats : .................................. SKIPPED
[INFO] Flink : Formats : Json ............................. SKIPPED
[INFO] Flink : Connectors : Kafka base .................... SKIPPED
[INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
[INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
[INFO] Flink : Connectors : HBase base .................... SKIPPED
[INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
[INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
[INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
[INFO] Flink : Formats : Orc .............................. SKIPPED
[INFO] Flink : Formats : Orc nohive ....................... SKIPPED
[INFO] Flink : Formats : Avro ............................. SKIPPED
[INFO] Flink : Formats : Parquet .......................... SKIPPED
[INFO] Flink : Formats : Csv .............................. SKIPPED
[INFO] Flink : Connectors : Hive .......................... SKIPPED
[INFO] Flink : Connectors : JDBC .......................... SKIPPED
[INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
[INFO] Flink : Connectors : Twitter ....................... SKIPPED
[INFO] Flink : Connectors : Nifi .......................... SKIPPED
[INFO] Flink : Connectors : Cassandra ..................... SKIPPED
[INFO] Flink : Connectors : Filesystem .................... SKIPPED
[INFO] Flink : Connectors : Kafka ......................... SKIPPED
[INFO] Flink : Connectors : Google PubSub ................. SKIPPED
[INFO] Flink : Connectors : Kinesis ....................... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
[INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
[INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
[INFO] Flink : Formats : Sequence file .................... SKIPPED
[INFO] Flink : Formats : Compress ......................... SKIPPED
[INFO] Flink : Formats : SQL Orc .......................... SKIPPED
[INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
[INFO] Flink : Formats : SQL Avro ......................... SKIPPED
[INFO] Flink : Examples : Streaming ....................... SKIPPED
[INFO] Flink : Examples : Table ........................... SKIPPED
[INFO] Flink : Examples : Build Helper : .................. SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED
[INFO] Flink : Container .................................. SKIPPED
[INFO] Flink : Queryable state : Runtime .................. SKIPPED
[INFO] Flink : Mesos ...................................... SKIPPED
[INFO] Flink : Kubernetes ................................. SKIPPED
[INFO] Flink : Yarn ....................................... SKIPPED
[INFO] Flink : Libraries : Gelly .......................... SKIPPED
[INFO] Flink : Libraries : Gelly scala .................... SKIPPED
[INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
[INFO] Flink : External resources : ....................... SKIPPED
[INFO] Flink : External resources : GPU ................... SKIPPED
[INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
[INFO] Flink : Metrics : Graphite ......................... SKIPPED
[INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
[INFO] Flink : Metrics : Prometheus ....................... SKIPPED
[INFO] Flink : Metrics : StatsD ........................... SKIPPED
[INFO] Flink : Metrics : Datadog .......................... SKIPPED
[INFO] Flink : Metrics : Slf4j ............................ SKIPPED
[INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
[INFO] Flink : Table : Uber ............................... SKIPPED
[INFO] Flink : Table : Uber Blink ......................... SKIPPED
[INFO] Flink : Python ..................................... SKIPPED
[INFO] Flink : Table : SQL Client ......................... SKIPPED
[INFO] Flink : Libraries : State processor API ............ SKIPPED
[INFO] Flink : ML : ....................................... SKIPPED
[INFO] Flink : ML : API ................................... SKIPPED
[INFO] Flink : ML : Lib ................................... SKIPPED
[INFO] Flink : ML : Uber .................................. SKIPPED
[INFO] Flink : Scala shell ................................ SKIPPED
[INFO] Flink : Dist ....................................... SKIPPED
[INFO] Flink : Yarn Tests ................................. SKIPPED
[INFO] Flink : E2E Tests : ................................ SKIPPED
[INFO] Flink : E2E Tests : CLI ............................ SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading lib-package SKIPPED
[INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
[INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
[INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
[INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
[INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
[INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
[INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
[INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
[INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
[INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
[INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
[INFO] Flink : Quickstart : ............................... SKIPPED
[INFO] Flink : Quickstart : Java .......................... SKIPPED
[INFO] Flink : Quickstart : Scala ......................... SKIPPED
[INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
[INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
[INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
[INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
[INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
[INFO] Flink : E2E Tests : State evolution ................ SKIPPED
[INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
[INFO] Flink : E2E Tests : Common ......................... SKIPPED
[INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
[INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
[INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
[INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
[INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
[INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
[INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
[INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
[INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
[INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
[INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
[INFO] Flink : E2E Tests : Python ......................... SKIPPED
[INFO] Flink : E2E Tests : HBase .......................... SKIPPED
[INFO] Flink : State backends : Heap spillable ............ SKIPPED
[INFO] Flink : Contrib : .................................. SKIPPED
[INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
[INFO] Flink : FileSystems : Tests ........................ SKIPPED
[INFO] Flink : Docs ....................................... SKIPPED
[INFO] Flink : Walkthrough : .............................. SKIPPED
[INFO] Flink : Walkthrough : Common ....................... SKIPPED
[INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
[INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:49 min
[INFO] Finished at: 2020-10-19T18:24:46+03:00
[INFO] Final Memory: 179M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests

The jvmdump-files look like this:

# Created at 2020-10-19T18:14:22.869
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets with " The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm witnessing....

I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages. 

The crash can be reproduced.
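
One detail in the output above narrows things down: by shell convention, an exit code above 128 means the process was terminated by signal number (code - 128). A small sketch for decoding that; signal_from_exit_code is a hypothetical helper name, not something from Maven or Surefire:

```shell
# signal_from_exit_code CODE: print the name of the terminating signal
# if CODE is above 128, otherwise print nothing.
signal_from_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    kill -l $(( code - 128 ))
  fi
}

signal_from_exit_code 137   # prints: KILL
```

137 - 128 = 9, i.e. SIGKILL, so the forked JVM was killed from outside rather than exiting on its own.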

Regards,
Juha


Re: Building Flink on VirtualBox VM failing

r_khachatryan
The values printed by the OOM killer do indeed seem strange. But from the line above, the memory usage seems fine: rss=2440960.
Running the given command, I see only one forked process.
Probably this is an issue with the OOM killer running in a VM on a Windows host. Can you try with the OOM killer disabled?

Regards,
Roman


On Fri, Oct 23, 2020 at 3:02 PM Juha Mynttinen <[hidden email]> wrote:
I'm trying again running the tests, now I have four cores (previously five) and 12 GB RAM (previously 8 GB). I'm still hit by the OOM killer.

The command I'm running is:

mvn -Dflink.forkCount=1 -Dflink.forkCountTestPackage=1 clean verify

[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:17 h
[INFO] Finished at: 2020-10-23T15:36:50+03:00
[INFO] Final Memory: 180M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests

This means there should be only the parent JVM + the forked JVM running on the VM. There should be a lot of RAM available.

/var/log/kern.log


Oct 23 15:26:42 ubuntu kernel: [23021.120464] Tasks state (memory values in pages):
Oct 23 15:26:42 ubuntu kernel: [23021.120464] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
....
Oct 23 15:26:42 ubuntu kernel: [23021.120574] [ 460994]  1000 460994  3319485  2440960 22024192        0             0 java
Oct 23 15:26:42 ubuntu kernel: [23021.120575] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=460994,uid=1000
Oct 23 15:26:42 ubuntu kernel: [23021.120669] Out of memory: Killed process 460994 (java) total-vm:13277940kB, anon-rss:9763848kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:21508kB oom_score_adj:0
Oct 23 15:26:42 ubuntu kernel: [23021.406205] oom_reaper: reaped process 460994 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

It seems very odd to me that the process takes 13277940kB of virtual mem and 9763848kB of anon-rss. Or maybe I'm reading something wrong.
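
A quick sanity check on those numbers: the task table header says its memory values are in pages, so total_vm and rss have to be multiplied by the page size before comparing them with the kB figures in the kill message. A minimal sketch, assuming a 4 KiB page size (typical for x86-64, but an assumption here; `getconf PAGESIZE` reports the actual value). pages_to_kb is a hypothetical helper name:

```shell
# pages_to_kb PAGES [PAGE_SIZE_BYTES]: convert a page count to kB.
# If no page size is given, ask the system via getconf.
pages_to_kb() {
  pages=$1
  page_size=${2:-$(getconf PAGESIZE)}
  echo $(( pages * page_size / 1024 ))
}

pages_to_kb 3319485 4096   # total_vm column -> 13277940
pages_to_kb 2440960 4096   # rss column      -> 9763840
```

With 4 KiB pages, total_vm matches the total-vm:13277940kB in the kill message exactly, and rss comes out within a few pages of the reported anon-rss. So rss=2440960 is roughly 9.3 GiB of resident memory, not 2.4 GB.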

r,
Juha

On Wed, Oct 21, 2020 at 12:54, Juha Mynttinen (<[hidden email]>) wrote:
Hmm

Even when setting the fork counts to 1, things fail.

I wonder why there seem to be five of these JVM crashes. There should be one JVM at a time. And shouldn't Maven fail after the first failure?

~/apache-maven-3.2.5/bin/mvn -Dflink.forkCount=1 -Dflink.forkCountTestPackage=1 clean verify

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:13 h
[INFO] Finished at: 2020-10-21T12:26:16+03:00
[INFO] Final Memory: 205M/704M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp surefire_11744637775482284170691tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp surefire_11744637775482284170691tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp surefire_11923880479826081497266tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests



flink-tests/target/surefire-reports/2020-10-21T11-13-24_791-jvmRun1.dump

# Created at 2020-10-21T12:03:51.559
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-21T12:03:51.560
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 935338
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)



sudo less -n /var/log/kern.log
......
Oct 21 12:21:57 ubuntu kernel: [24024.569633] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=1220764,uid=1000
Oct 21 12:21:57 ubuntu kernel: [24024.569804] Out of memory: Killed process 1220764 (java) total-vm:8514092kB, anon-rss:4116292kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:9136kB oom_score_adj:0
Oct 21 12:21:57 ubuntu kernel: [24024.685821] oom_reaper: reaped process 1220764 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
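
To pull just the kill events out of a long kern.log, something like the following could be used. This is a sketch only: oom_kills is a hypothetical helper, and the pattern assumes the "Out of memory: Killed process" message format shown above, which may differ between kernel versions.

```shell
# oom_kills: read kernel log lines on stdin and print "pid name anon_rss_kB"
# for every line in the "Out of memory: Killed process ..." format shown above.
oom_kills() {
  sed -n 's/.*Out of memory: Killed process \([0-9]*\) (\([^)]*\)).*anon-rss:\([0-9]*\)kB.*/\1 \2 \3/p'
}

# Sample input copied from the log excerpt above.
sample='Oct 21 12:21:57 ubuntu kernel: [24024.569804] Out of memory: Killed process 1220764 (java) total-vm:8514092kB, anon-rss:4116292kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:9136kB oom_score_adj:0'
printf '%s\n' "$sample" | oom_kills   # prints: 1220764 java 4116292
```

On the VM, the same function could be fed the real log, e.g. `sudo cat /var/log/kern.log | oom_kills`.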

Regards,
Juha

On Wed, Oct 21, 2020 at 10:04, Juha Mynttinen (<[hidden email]>) wrote:
Hi,

You're right, I also thought about this after writing the last comment: on Linux, for example, the kernel overcommits memory allocations by default, so this approach doesn't work (it doesn't make the JVM crash right when it starts).

I dug a little deeper. It seems that for CI environments there are specific compilation scripts such as https://github.com/apache/flink/blob/master/tools/ci/compile.sh#L45 that explicitly set flink.forkCount and flink.forkCountTestPackage to lower than (?) default values. But for anybody compiling Flink locally, mvn uses the default values, which might not work, as in my case.

I think a good goal would be that a developer can just git clone Flink and build it following simple instructions. Preferably there would be zero setup needed, just a simple command to run. The current situation is that building Flink is "simple", just run a specific mvn command. This simplicity comes with the price that things can break in unexpected ways:

1) There are things building Flink expects but doesn't check (https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/building.html#build-flink):
 * The correct Maven version
 * A suitable Java version
2) There's this issue with the count of CPU cores vs. available memory.

Case 1) is documented; case 2) is not.

Fix options

a)

Document case 2) and instruct how to set flink.forkCountTestPackage (if needed). Something like "Flink tests are run on parallel JVMs, each taking 2GB of RAM. There are by default as many JVMs as there are physical cores. If your machine doesn't have at least 2GB * count of cores of RAM, the tests can fail. You can set the count of JVMs using Maven property flink.forkCountTestPackage to a lower value".

b)

Create a Linux-specific Maven wrapper script for local execution too. The wrapper script could download the correct Maven version, check the Java version, calculate the max number of forks etc. A quick way to calculate the max fork count:

expr `cat /proc/meminfo | grep MemTotal | awk '{print $2}'` / 2097152
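
A slightly fuller sketch of what such a wrapper could compute, assuming 2 GB per fork as above and additionally capping the result at the core count (the cap and the max_forks helper name are assumptions for illustration; only the flink.forkCount and flink.forkCountTestPackage properties come from the Flink build):

```shell
# max_forks MEM_KB CORES: print min(CORES, floor(MEM_KB / 2 GB)), but at least 1.
max_forks() {
  mem_kb=$1
  cores=$2
  by_mem=$(( mem_kb / 2097152 ))   # 2097152 kB = 2 GB per forked JVM
  if [ "$by_mem" -lt 1 ]; then
    by_mem=1
  fi
  if [ "$by_mem" -lt "$cores" ]; then
    echo "$by_mem"
  else
    echo "$cores"
  fi
}

# Linux only: read total memory and core count, then print the suggested command.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  forks=$(max_forks "$mem_kb" "$(nproc)")
  echo "mvn -Dflink.forkCount=$forks -Dflink.forkCountTestPackage=$forks clean verify"
fi
```

This only prints the command instead of running it, so the result can be checked first. It budgets for heap only, so it is likely still optimistic for tests that use a lot of native memory.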

Regards,
Juha





On Tue, Oct 20, 2020 at 21:23, Khachatryan Roman (<[hidden email]>) wrote:
I think you are right and I like the idea of failing the build fast.
However, when trying this approach on my local machine it didn't help: the build didn't crash (probably, because of overcommit).
Did you try this approach in your VM?

Regards,
Roman


On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <[hidden email]> wrote:
Hey,

> Currently, tests do not run in parallel  

I don't think this is true, at least not 100%. In 'top' it's clearly visible that there are multiple JVMs. If not running tests in parallel, what are these doing? In the main pom.xml there's configuration for the plug-in 'maven-surefire-plugin'.

I'm not a Maven expert, but it looks to me like this: in https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html it says "The other possibility for parallel test execution is setting the parameter forkCount to a value higher than 1". I think that's happening in Flink:

<forkCount>${flink.forkCount}</forkCount>

And

<flink.forkCount>1C</flink.forkCount>

This means there will be 1 * count_of_cpus forks (the 'C' suffix multiplies the value by the number of available CPU cores).

And this one:

<argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} -XX:+UseG1GC</argLine>

In my case, I have 5 CPUs, so 5 forks. I think what happens now is that, since each fork gets a maximum heap of 2048 MB, there's effectively a memory requirement of CPU count * 2048 MB. In my case, I have 8 GB of memory, which is less than the maximum of 5 * 2048 MB = 10 GB.
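
To make the arithmetic concrete, here is a small sketch. The function name worst_case_heap_mb is only for illustration, and it counts just the maximum heap from -Xmx; a JVM also uses memory outside the heap, so the real footprint can be higher.

```shell
# Worst-case total heap in MB: number of forks * per-fork -Xmx in MB.
worst_case_heap_mb() {
    echo $(( $1 * $2 ))
}

worst_case_heap_mb 5 2048    # 5 forks with -Xmx2048m: prints 10240, more than 8 GB (8192 MB)
```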

This could be better. I think a computer with RAM < count_of_cpus * 2048 MB is completely valid; take e.g. an AMD Ryzen 3900X with 12 cores and put 16 GB of RAM there. At the very least, the memory and CPU requirements should be documented?

If the tests really need 2 GB of heap, then maybe the forkCount should be based on the available RAM rather than the available cores, e.g. floor(RAM / 2 GB)? I don't know if that's doable in Maven.

I think an easy and non-intrusive improvement would be to change '-Xms256m' to '-Xms2048m' (Xms to match Xmx) so that the JVM would allocate 2048 MB right away when it starts. If there's not enough memory, the tests would fail immediately (the JVM couldn't start). The tests would probably fail anyway (as in my case), so it's better to fail fast.

Regards,
Juha








On Tue, Oct 20, 2020 at 11:16 AM Khachatryan Roman <[hidden email]> wrote:
Thanks for sharing this.
I think the OOM killer's activity means high memory pressure (it simply kills the process with the highest memory-consumption score).
High CPU usage can only be a consequence of that, caused by constant GC.

Currently, tests do not run in parallel, but high memory usage can be caused by the nature of the tests (e.g. running Flink with high parallelism).
So I think the best way to deal with this is to use a VM with more memory.

Regards,
Roman


On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <[hidden email]> wrote:
Hey,

Good hint about /var/log/kern.log. This time I can see this:

Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[hidden email],task=java,pid=270024,uid=1000
Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
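
Entries like these can be pulled out of the log with a short filter. This is a minimal sketch that assumes the message format shown above; the function name oom_killed_pids is made up, and other kernel versions may word the message differently.

```shell
# Print the PIDs of processes killed by the OOM killer, one per line,
# from a kernel log file (default: /var/log/kern.log).
# Assumes the "Out of memory: Killed process <pid>" wording seen in this log.
oom_killed_pids() {
    sed -n 's/.*Out of memory: Killed process \([0-9][0-9]*\).*/\1/p' "${1:-/var/log/kern.log}"
}

# Example (reading the log may require extra permissions):
#   oom_killed_pids /var/log/kern.log
```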

The next question is why this happens. I'll try to dig deeper.

About the CPU load: I have five CPUs. Theoretically it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (that Maven forks) are doing, it can be seen that each of those processes has a large number of threads wanting to use CPU. Here's an example from 'top -H':

  top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                            
 254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre                                                                                                                    
 255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java                                                                                                                                
 254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java                                                                                                                                
 255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java                                                                                                                                
 255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java                                                                                                                                
 254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre                                                                                                                    
 253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre                                                                                                                    
 254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java                                                                                                                                
 254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java                                                                                                                                
 254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre                                                                                                                    
 255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre                                                                                                                    
 255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre                                                                                                                    
 255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre                                                                                                                    
 254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre                                                                                                                    
 253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java                                                                                                                                
 255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre                                                                                                                    
 255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre                                                                                                                    
 254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre                                                                                                                    
 254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre                                                                                                                    
 255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre                                                                                                                    
 253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre                                                                                                                    
 255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre                                                                                                                    
 254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre                                                                                                                    
 254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre                                                                                                                    
 254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre                                                                                                                    
 254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto                                                                                                                    
                                                            
It can be seen that the JIT-related threads consume quite a lot of CPU, leaving less CPU available to the actual test code. Using htop I can also see the garbage-collection threads eating CPU. This doesn't seem right. I think it'd make sense to run the tests with lower parallelism to utilize the CPUs better. Having far more threads wanting CPU than there are CPUs slows things down rather than speeding them up.

However, AFAIK high CPU load shouldn't trigger OOM-killer?

Regards,
Juha




On Mon, Oct 19, 2020 at 8:48 PM Khachatryan Roman <[hidden email]> wrote:
Hey,

One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log in your VM for the related messages.

Regards,
Roman


On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <[hidden email]> wrote:

Hey,

I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.

The command I'm using:

apache-maven-3.2.5/bin/mvn clean verify

The output:

[INFO] Flink : Tests ...................................... FAILURE [14:38 min]
[INFO] Flink : Streaming Scala ............................ SKIPPED
[INFO] Flink : Connectors : HCatalog ...................... SKIPPED
[INFO] Flink : Connectors : Base .......................... SKIPPED
[INFO] Flink : Connectors : Files ......................... SKIPPED
[INFO] Flink : Table : .................................... SKIPPED
[INFO] Flink : Table : Common ............................. SKIPPED
[INFO] Flink : Table : API Java ........................... SKIPPED
[INFO] Flink : Table : API Java bridge .................... SKIPPED
[INFO] Flink : Table : API Scala .......................... SKIPPED
[INFO] Flink : Table : API Scala bridge ................... SKIPPED
[INFO] Flink : Table : SQL Parser ......................... SKIPPED
[INFO] Flink : Libraries : ................................ SKIPPED
[INFO] Flink : Libraries : CEP ............................ SKIPPED
[INFO] Flink : Table : Planner ............................ SKIPPED
[INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
[INFO] Flink : Table : Runtime Blink ...................... SKIPPED
[INFO] Flink : Table : Planner Blink ...................... SKIPPED
[INFO] Flink : Metrics : JMX .............................. SKIPPED
[INFO] Flink : Formats : .................................. SKIPPED
[INFO] Flink : Formats : Json ............................. SKIPPED
[INFO] Flink : Connectors : Kafka base .................... SKIPPED
[INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
[INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
[INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
[INFO] Flink : Connectors : HBase base .................... SKIPPED
[INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
[INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
[INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
[INFO] Flink : Formats : Orc .............................. SKIPPED
[INFO] Flink : Formats : Orc nohive ....................... SKIPPED
[INFO] Flink : Formats : Avro ............................. SKIPPED
[INFO] Flink : Formats : Parquet .......................... SKIPPED
[INFO] Flink : Formats : Csv .............................. SKIPPED
[INFO] Flink : Connectors : Hive .......................... SKIPPED
[INFO] Flink : Connectors : JDBC .......................... SKIPPED
[INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
[INFO] Flink : Connectors : Twitter ....................... SKIPPED
[INFO] Flink : Connectors : Nifi .......................... SKIPPED
[INFO] Flink : Connectors : Cassandra ..................... SKIPPED
[INFO] Flink : Connectors : Filesystem .................... SKIPPED
[INFO] Flink : Connectors : Kafka ......................... SKIPPED
[INFO] Flink : Connectors : Google PubSub ................. SKIPPED
[INFO] Flink : Connectors : Kinesis ....................... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
[INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
[INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
[INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
[INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
[INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
[INFO] Flink : Formats : Sequence file .................... SKIPPED
[INFO] Flink : Formats : Compress ......................... SKIPPED
[INFO] Flink : Formats : SQL Orc .......................... SKIPPED
[INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
[INFO] Flink : Formats : SQL Avro ......................... SKIPPED
[INFO] Flink : Examples : Streaming ....................... SKIPPED
[INFO] Flink : Examples : Table ........................... SKIPPED
[INFO] Flink : Examples : Build Helper : .................. SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED
[INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED
[INFO] Flink : Container .................................. SKIPPED
[INFO] Flink : Queryable state : Runtime .................. SKIPPED
[INFO] Flink : Mesos ...................................... SKIPPED
[INFO] Flink : Kubernetes ................................. SKIPPED
[INFO] Flink : Yarn ....................................... SKIPPED
[INFO] Flink : Libraries : Gelly .......................... SKIPPED
[INFO] Flink : Libraries : Gelly scala .................... SKIPPED
[INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
[INFO] Flink : External resources : ....................... SKIPPED
[INFO] Flink : External resources : GPU ................... SKIPPED
[INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
[INFO] Flink : Metrics : Graphite ......................... SKIPPED
[INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
[INFO] Flink : Metrics : Prometheus ....................... SKIPPED
[INFO] Flink : Metrics : StatsD ........................... SKIPPED
[INFO] Flink : Metrics : Datadog .......................... SKIPPED
[INFO] Flink : Metrics : Slf4j ............................ SKIPPED
[INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
[INFO] Flink : Table : Uber ............................... SKIPPED
[INFO] Flink : Table : Uber Blink ......................... SKIPPED
[INFO] Flink : Python ..................................... SKIPPED
[INFO] Flink : Table : SQL Client ......................... SKIPPED
[INFO] Flink : Libraries : State processor API ............ SKIPPED
[INFO] Flink : ML : ....................................... SKIPPED
[INFO] Flink : ML : API ................................... SKIPPED
[INFO] Flink : ML : Lib ................................... SKIPPED
[INFO] Flink : ML : Uber .................................. SKIPPED
[INFO] Flink : Scala shell ................................ SKIPPED
[INFO] Flink : Dist ....................................... SKIPPED
[INFO] Flink : Yarn Tests ................................. SKIPPED
[INFO] Flink : E2E Tests : ................................ SKIPPED
[INFO] Flink : E2E Tests : CLI ............................ SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
[INFO] Flink : E2E Tests : Parent Child classloading lib-package SKIPPED
[INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
[INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
[INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
[INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
[INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
[INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
[INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
[INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
[INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
[INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
[INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
[INFO] Flink : Quickstart : ............................... SKIPPED
[INFO] Flink : Quickstart : Java .......................... SKIPPED
[INFO] Flink : Quickstart : Scala ......................... SKIPPED
[INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
[INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
[INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
[INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
[INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
[INFO] Flink : E2E Tests : State evolution ................ SKIPPED
[INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
[INFO] Flink : E2E Tests : Common ......................... SKIPPED
[INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
[INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
[INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
[INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
[INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
[INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
[INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
[INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
[INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
[INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
[INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
[INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
[INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
[INFO] Flink : E2E Tests : Python ......................... SKIPPED
[INFO] Flink : E2E Tests : HBase .......................... SKIPPED
[INFO] Flink : State backends : Heap spillable ............ SKIPPED
[INFO] Flink : Contrib : .................................. SKIPPED
[INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
[INFO] Flink : FileSystems : Tests ........................ SKIPPED
[INFO] Flink : Docs ....................................... SKIPPED
[INFO] Flink : Walkthrough : .............................. SKIPPED
[INFO] Flink : Walkthrough : Common ....................... SKIPPED
[INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
[INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:49 min
[INFO] Finished at: 2020-10-19T18:24:46+03:00
[INFO] Final Memory: 179M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/apache-flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=3 -XX:+UseG1GC -jar /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar /home/juha/git/apache-flink/flink-tests/target/surefire 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp surefire_122313349068739873924160tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-tests
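
A note on "Process Exit Code: 137" in the output above: by shell convention an exit status above 128 means the process was terminated by a signal, and 137 - 128 = 9 is SIGKILL. So the fork did not exit on its own; something killed it from the outside. A small sketch for decoding such a status (the function name is made up):

```shell
# Print the signal name for an exit status above 128 (status = 128 + signal number).
signal_from_exit_status() {
    if [ "$1" -gt 128 ]; then
        kill -l "$(( $1 - 128 ))"
    else
        echo "not a signal exit: $1" >&2
        return 1
    fi
}

signal_from_exit_status 137    # prints KILL
```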

The jvmdump-files look like this:

# Created at 2020-10-19T18:14:22.869
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.Reader.read(Reader.java:189)
        at java.base/java.util.Scanner.readInput(Scanner.java:882)
        at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
        at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
        at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
        at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


# Created at 2020-10-19T18:14:22.870
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 898133
        at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
        at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)


I found some JIRA tickets mentioning "The forked VM terminated without properly saying goodbye":


I don't see how these could explain the issue I'm witnessing....

I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages. 

The crash can be reproduced.

Regards,
Juha