
Building Spark session fails with Hadoop 3 because of Hive Shims #321

Open
mikitakandratsiuk opened this issue May 23, 2020 · 2 comments
mikitakandratsiuk commented May 23, 2020

Using version 2.4.5_0.14.0
There is an issue during creation of the SparkSession object. DataFrameSuiteBase builds the session with enableHiveSupport by default. This pulls in the org.spark-project.hive:hive-exec:1.2.1.spark2 library (specifically the Hive Shims), which is not compatible with Hadoop 3 and fails with an "Unrecognized Hadoop major version number" error.

This makes spark-testing-base unusable with Hadoop 3, even when the project does not need Hive at all.
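If Hive is not needed, recent spark-testing-base versions let a suite opt out of Hive support instead of excluding dependencies. Assuming your version exposes the enableHiveSupport flag on DataFrameSuiteBase (check your version's sources; this override is an assumption), a minimal sketch:

```scala
import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class NoHiveSpec extends FunSuite with DataFrameSuiteBase {
  // Assumption: this flag exists in your spark-testing-base version;
  // when false, the test SparkSession is built without enableHiveSupport(),
  // so the Hive Shims (and the Hadoop version check) are never loaded.
  override protected implicit def enableHiveSupport: Boolean = false

  test("runs without Hive shims") {
    assert(spark.range(3).count() === 3L)
  }
}
```

If the flag is not available in your version, the dependency-exclusion workaround described below is the alternative.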

The stack trace is below:

An exception or error caused a run to abort.
java.lang.ExceptionInInitializerError
	at com.holdenkarau.spark.testing.DataFrameSuiteBaseLike$class.newBuilder$1(DataFrameSuiteBase.scala:84)
	at com.holdenkarau.spark.testing.DataFrameSuiteBaseLike$class.sqlBeforeAllTestCases(DataFrameSuiteBase.scala:114)
	at xxx.xxx.xxx.xxx.spark_kafka.ApplicationTests.com$holdenkarau$spark$testing$DataFrameSuiteBase$$super$sqlBeforeAllTestCases(ApplicationTests.scala:14)
	at com.holdenkarau.spark.testing.DataFrameSuiteBase$class.beforeAll(DataFrameSuiteBase.scala:43)
	at xxx.xxx.xxx.xxx.spark_kafka.ApplicationTests.beforeAll(ApplicationTests.scala:14)
	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212)
	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
	at xxx.xxx.xxx.xxx.spark_kafka.ApplicationTests.run(ApplicationTests.scala:14)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1320)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1314)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:972)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:971)
	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
	at org.scalatest.tools.Runner$.run(Runner.scala:798)
	at org.scalatest.tools.Runner.run(Runner.scala)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:133)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:27)
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.2
	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
	at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
	at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
	... 21 more
@mavericksid
@mikitakandratsiuk, were you able to find any solution for this?

@mikitakandratsiuk
Author

@mavericksid, it's been a long time since I raised this issue, so I don't really remember. Here is the comment I found in my code; I hope it helps:

I excluded org.spark-project.hive entirely from the spark-testing-base dependency and substituted it with org.apache.hive:hive-exec:3.1.2 and org.apache.hive:hive-metastore:3.1.2.

build.sbt - libraryDependencies:

"com.holdenkarau" %% "spark-testing-base" % s"${sparkVersion}_0.14.0" % Test
   // exclude Hive (especially Hive Shims) because of "IllegalArgumentException: Unrecognized Hadoop major version number: 3.2.1" error (add real Hive dependency below instead)
   // the reason is that Hive dependency is added by Spark-Hive 2.4.5, where Hive doesn't support Hadoop 3
   excludeAll ExclusionRule("org.spark-project.hive")

"org.apache.hive" % "hive-metastore" % "3.1.2" % Test,
"org.apache.hive" % "hive-exec" % "3.1.2" % Test,
