spark-dependencies job is failing #405

Closed · jkandasa opened this issue May 10, 2019 · 22 comments · Fixed by jaegertracing/spark-dependencies#66
Labels
bug (Something isn't working) · Elasticsearch (The issues related to Elasticsearch storage)

Comments

@jkandasa (Member)

Setup:

  • Installed the elasticsearch cluster manually; authentication is not enabled
  • jaeger-operator schedules the spark-dependencies job, but it fails

Error:

19/05/10 03:59:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: failure to login
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:700)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2391)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2391)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2391)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:185)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:172)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
	at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
	at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
	at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
	at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:675)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2391)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2391)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2391)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:185)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:172)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)

	at javax.security.auth.login.LoginContext.invoke(LoginContext.java:856)
	at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
	at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:675)
	... 11 more
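
A side note on the trace above: the "invalid null input: name" from UnixPrincipal means the JVM could not map the container's UID to a user name, which typically happens when the pod runs under an arbitrary UID that has no /etc/passwd entry. A minimal sketch to reproduce that condition locally, assuming Docker and the public image (the UID 12345 is arbitrary and overrides the image's default user):

# Run a shell instead of the default java command, under a passwd-less UID:
docker run --rm --user 12345 jaegertracing/spark-dependencies sh -c 'id; whoami'
# whoami prints something like "whoami: unknown uid 12345", matching the failure mode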

CR file:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaegerqe
spec:
  ingress:
    security: none
  strategy: production
  collector:
    replicas: 1
    image: jaegertracing/jaeger-collector:1.11
    resources:
      requests:
        cpu: "1"
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
    options:
      log-level: info
      metrics-backend: prometheus
      collector:
        num-workers: 1
        queue-size: 100000
      es:
        bulk:
          size: 524288
          workers: 3
          flush-interval: 50ms
        tags-as-fields:
          all: false
  query:
    replicas: 1
    image: jaegertracing/jaeger-query:1.11
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
    options:
      log-level: info
      metrics-backend: prometheus
      query:
        port: 16686
  agent:
    strategy: sidecar
    image: jaegertracing/jaeger-agent:1.11
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
    options:
      log-level: info
      metrics-backend: prometheus
      processor:
        jaeger-compact:
          server-queue-size: 100000
          workers: 1000
  storage:
    type: elasticsearch
    esIndexCleaner:
      enabled: true
    dependencies:
      enabled: true
    options:
      es:
        server-urls: http://elasticsearch:9200
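
For completeness, a sketch of applying a CR like this and confirming that the operator generated the dependencies CronJob (the file name is hypothetical):

# Apply the CR and list the CronJobs created from it; with dependencies and
# esIndexCleaner enabled, expect a <name>-spark-dependencies and a
# <name>-es-index-cleaner entry.
kubectl apply -f jaegerqe.yaml
kubectl get cronjobs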

Elasticsearch access with curl:

$ curl http://elasticsearch:9200
{
  "name" : "elasticsearch-0",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "9-kQC9gMSvuuREL2iOVtfA",
  "version" : {
    "number" : "5.6.10",
    "build_hash" : "b727a60",
    "build_date" : "2018-06-06T15:48:34.860Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
@pavolloffay (Member)

Can you provide the output of oc describe for the dependency job?

@pavolloffay added the bug (Something isn't working) label on May 10, 2019
@jkandasa (Member Author)

$ oc describe pod jaegerqe-spark-dependencies-1557460500-6bdll
Name:               jaegerqe-spark-dependencies-1557460500-6bdll
Namespace:          jaeger-pipeline
Priority:           0
PriorityClassName:  <none>
Node:               private.redhat.com/10.16.23.52
Start Time:         Fri, 10 May 2019 09:29:20 +0530
Labels:             controller-uid=6219b499-72d7-11e9-999b-ecf4bbc844d4
                    job-name=jaegerqe-spark-dependencies-1557460500
Annotations:        openshift.io/scc=restricted
                    prometheus.io/scrape=false
                    sidecar.istio.io/inject=false
Status:             Failed
IP:                 10.129.1.202
Controlled By:      Job/jaegerqe-spark-dependencies-1557460500
Containers:
  jaegerqe-spark-dependencies:
    Container ID:   docker://0958de4c11bdca1b792ddae3cb08cf512fe72af050159a533092c1dc65f5937c
    Image:          jaegertracing/spark-dependencies
    Image ID:       docker-pullable://docker.io/jaegertracing/spark-dependencies@sha256:f30f15a137cabbbc916ba0359813109ef97269f47c59708a5aae22ba34fd0600
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 10 May 2019 09:29:28 +0530
      Finished:     Fri, 10 May 2019 09:29:31 +0530
    Ready:          False
    Restart Count:  0
    Environment:
      STORAGE:              elasticsearch
      ES_NODES:             http://elasticsearch:9200
      ES_CLIENT_NODE_ONLY:  false
      ES_NODES_WAN_ONLY:    false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s5vzv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-s5vzv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-s5vzv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:          <none>

@paulcreasey

I have the same issue when running via kubectl, as detailed in jaeger-kubernetes:

kubectl run jaeger-spark-dependencies --env="STORAGE=elasticsearch" --env="ES_NODES=elasticsearch:9200" --env="ES_USERNAME=changeme" --env="ES_PASSWORD=changeme" --restart=Never --image=jaegertracing/spark-dependencies

@jpkrohling added the Elasticsearch (The issues related to Elasticsearch storage) label on May 24, 2019
@pavolloffay (Member)

pavolloffay commented May 28, 2019

@jkandasa I am not able to reproduce your issue. Could you please try updating the spark-dependencies image to the latest tag, whose ID is currently:

"Id": "sha256:0caa3733c48044f805fc6e5cd488cb5232f3b615288657d5881ae1651018c2b5"

I have created this CR:

# setup an elasticsearch with `make es`
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
    dependencies:
      enabled: true
      schedule: "*/1 * * * *"

and deployed ES via make es on minikube. All jobs finished fine.
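
For reference, a sketch of how one can check those per-minute runs (the job suffix is a Unix timestamp, so exact names will differ):

# Jobs spawned by the CronJob are named <jaeger-name>-spark-dependencies-<timestamp>:
kubectl get jobs | grep spark-dependencies
# Read the log of one run; a clean exit means the dependency links were written:
kubectl logs job/simple-prod-spark-dependencies-1559226240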

@jkandasa (Member Author)

@pavolloffay ok, I will check with this image and update here. Thank you!

@jkandasa (Member Author)

@pavolloffay I tried to pull the latest image on a different machine; it reports the same image that I used.

# docker pull jaegertracing/spark-dependencies
Using default tag: latest
Trying to pull repository docker.io/jaegertracing/spark-dependencies ... 
sha256:f30f15a137cabbbc916ba0359813109ef97269f47c59708a5aae22ba34fd0600: Pulling from docker.io/jaegertracing/spark-dependencies
8e3ba11ec2a2: Already exists 
311ad0da4533: Already exists 
df312c74ce16: Already exists 
76ca6384b055: Pull complete 
f0c67d33e0c5: Pull complete 
d127513e8f06: Pull complete 
e39d9c592ae5: Pull complete 
2a07e2290e01: Pull complete 
a30230bccbcd: Pull complete 
525ca86d8530: Pull complete 
5cf9c400b76d: Pull complete 
de9bfa3b30af: Pull complete 
Digest: sha256:f30f15a137cabbbc916ba0359813109ef97269f47c59708a5aae22ba34fd0600
Status: Downloaded newer image for docker.io/jaegertracing/spark-dependencies:latest

Could you please confirm the repository?

@pavolloffay (Member)

The repository and image are correct. Earlier I was referring to the Id field, which is not the image SHA. Your image SHA is shown below under RepoDigests:

[
    {
        "Id": "sha256:0caa3733c48044f805fc6e5cd488cb5232f3b615288657d5881ae1651018c2b5",
        "RepoTags": [
            "jaegertracing/spark-dependencies:latest"
        ],
        "RepoDigests": [
            "jaegertracing/spark-dependencies@sha256:f30f15a137cabbbc916ba0359813109ef97269f47c59708a5aae22ba34fd0600"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2019-03-12T13:04:42.152445344Z",
        "Container": "d9a53df298be872eaa12b08658d88e60f8b3693aeca82a4caf4815877c644602",
        "ContainerConfig": {
            "Hostname": "d9a53df298be",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/java-1.8-openjdk/jre/bin:/usr/lib/jvm/java-1.8-openjdk/bin",
                "LANG=C.UTF-8",
                "JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk",
                "JAVA_VERSION=8u171",
                "JAVA_ALPINE_VERSION=8.171.11-r0",
                "APP_HOME=/app/"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "#(nop) ",
                "CMD [\"/bin/sh\" \"-c\" \"java ${JAVA_OPTS} -jar jaeger-spark-dependencies/target/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar\"]"
            ],
            "ArgsEscaped": true,
            "Image": "sha256:cf680204fea8eff63ffdcb6f153211a9086d1bdc3ce2fb426e3a438f55dae4ba",
            "Volumes": null,
            "WorkingDir": "/app",
            "Entrypoint": null,
            "OnBuild": [],
            "Labels": {}
        },
        "DockerVersion": "18.03.1-ee-3",
        "Author": "Pavol Loffay <ploffay@redhat.com>",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/java-1.8-openjdk/jre/bin:/usr/lib/jvm/java-1.8-openjdk/bin",
                "LANG=C.UTF-8",
                "JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk",
                "JAVA_VERSION=8u171",
                "JAVA_ALPINE_VERSION=8.171.11-r0",
                "APP_HOME=/app/"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "java ${JAVA_OPTS} -jar jaeger-spark-dependencies/target/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar"
            ],
            "ArgsEscaped": true,
            "Image": "sha256:cf680204fea8eff63ffdcb6f153211a9086d1bdc3ce2fb426e3a438f55dae4ba",
            "Volumes": null,
            "WorkingDir": "/app/",
            "Entrypoint": null,
            "OnBuild": [],
            "Labels": null
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 189362222,
        "VirtualSize": 189362222,
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/f57a89e187b066105ef727a636e44f28fd8b9fb9a91ef15825ea877e4c11d587/diff:/var/lib/docker/overlay2/8a0b58e693d208e9d620df84dd83b98f022951e161ecc9abc479f2c788ac86d0/diff:/var/lib/docker/overlay2/9b4183089dd5393e3f935e295212c37798dc336e8ba8b23d8788aa586f675062/diff:/var/lib/docker/overlay2/254592bb96f84d51bcdf4038c22a928abbea067dc9bbaa3e8bdea7e26b378c1c/diff:/var/lib/docker/overlay2/89b8b094af5f14ea0706055964219169fc0b7ed20d06c7d5fff6a7adaf741866/diff:/var/lib/docker/overlay2/7ba0aca00517b9d7655f6cce137de0fc9a0d794068e8de9af5e41ace6c35ccc7/diff:/var/lib/docker/overlay2/2630a377342cc6c8a6ce232f0314af718ee284fe0aa3ffa2f1854bc8ed015673/diff:/var/lib/docker/overlay2/ff351c93cbe35ca7c0ae49272f769f9683002758b5e84cc58c44d16971625db2/diff:/var/lib/docker/overlay2/4527e8381d12d49ec8490db9cb0007457b17343e7a75a8ac3db5fc9a77fe67b2/diff:/var/lib/docker/overlay2/b6306b4c4095b35987d7a1c225b1a0579d4103d3400f4559417bee506968b4c3/diff:/var/lib/docker/overlay2/99b25302927d0259b12d44830122eaa779d1488e6038db9fff8cb333189d42c3/diff",
                "MergedDir": "/var/lib/docker/overlay2/1232dbe23b2104830331a96ee55555b48f16786ca34378d851d97c52c1c45c82/merged",
                "UpperDir": "/var/lib/docker/overlay2/1232dbe23b2104830331a96ee55555b48f16786ca34378d851d97c52c1c45c82/diff",
                "WorkDir": "/var/lib/docker/overlay2/1232dbe23b2104830331a96ee55555b48f16786ca34378d851d97c52c1c45c82/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:73046094a9b835e443af1a9d736fcfc11a994107500e474d0abf399499ed280c",
                "sha256:298c3bb2664fb7f8514ecdde8b76c0ca95c7c7b82eaa326a7e9661e017488164",
                "sha256:93351e248e6ec58df222fb8b12690ba552273ff307712e78251d3635e3aefedd",
                "sha256:da60f2303a3aac5d07b1356fec04489d08bfcb220fe752ac0cfd26869e1ab585",
                "sha256:0b0da55da16720af289ac3b1c254b24cb47ad296561ed3c9c1fafd3a5950dad5",
                "sha256:25e211c96707f5b0d1ee3966de8ecd50c26d639e4b6cf51aaf1da7d2df45990c",
                "sha256:fb1d9b8c72501978e9f8747a48306ddb285c81ed9d1293312798989f7df2b14a",
                "sha256:4826b3d06809042974869cee6786386ee604b44b7b81d134b794c9589cf8a39d",
                "sha256:231b5823abe77c7b6a0a476b833fe1c3c9e9a8b19ca0370f3b631c7a830c8709",
                "sha256:0866fcd5fd4ecc6a0c1348b7b307ef1e9002420fcb47de0f39fee240afa07de7",
                "sha256:f821541527b604f6c126777df2f46d11ae777ed5df3d5f32458af952101f0b53",
                "sha256:34c2d23d2627ec7d43778a2363e6e7b7f684ade91fe3bc2101f3874630061136"
            ]
        },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]

@pavolloffay (Member)

@jkandasa could you provide exact information about the ES installation? What version is it, where did you download it, and how is it deployed and configured?

@jkandasa (Member Author)

jkandasa commented May 30, 2019

@pavolloffay I tried again with the operator-provisioned ES cluster and I see the same issue.

spark-dependencies log: jaegerqe-spark-dependencies-1559226240-5tmxj.log

oc describe output for spark-dependencies: jaegerqe-spark-dependencies-1559226240-5tmxj.txt
oc describe output for elasticsearch: elasticsearch-cdm-lat4zs2m-1-5d9ccd4d4c-x9hd4.txt

CR file: crfile.yaml.txt

@pavolloffay (Member)

I tried again with the operator-provisioned ES cluster and I see the same issue.

The self-provisioned ES cluster does not work; it's not supported at the moment.

@pavolloffay (Member)

Fixed in jaegertracing/spark-dependencies#66

@marceloamaral
marceloamaral commented Dec 10, 2019

I am having problems deploying the jaeger-spark-dependencies service in my OpenShift cluster: the service cannot connect to the elasticsearch service. Therefore:
1 - How do I configure it to connect to elasticsearch using TLS certificates? The documentation only shows the option for using a password.
2 - Is there a way to configure the operator to automatically configure Jaeger with spark-dependencies?
3 - Do I need to configure something else in the Jaeger UI to show the dependencies (it is disabled by default), or will it automatically be enabled after the spark jobs execute?

I used the following command to deploy the spark-dependencies service:
oc run jaeger-spark-dependencies --env="STORAGE=elasticsearch" --env="ES_NODES=https://elasticsearch:9200" --restart=Never --image=jaegertracing/spark-dependencies -n istio-system

The log from the jaeger-spark-dependencies container is:

19/12/10 02:41:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/10 02:41:02 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2019-12-10T00:00Z, reading from jaeger-span-2019-12-10 index, result storing to jaeger-dependencies-2019-12-10
19/12/10 02:41:03 ERROR NetworkClient: Node [elasticsearch:9200] failed (org.apache.commons.httpclient.NoHttpResponseException: The server elasticsearch failed to respond); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
        at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:220)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:79)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:78)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:285)
        at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.groupBy(RDD.scala:690)
        at org.apache.spark.api.java.JavaRDDLike$class.groupBy(JavaRDDLike.scala:243)
        at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:224)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:203)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[elasticsearch:9200]]
        at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:152)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:424)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
        at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
        at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:735)
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
        ... 33 more

Then I tried deploying the spark-dependencies service with ES_CLIENT_NODE_ONLY enabled:
oc run jaeger-spark-dependencies --env="STORAGE=elasticsearch" --env="ES_CLIENT_NODE_ONLY=true" --env="ES_NODES=https://elasticsearch:9200" --restart=Never --image=jaegertracing/spark-dependencies -n istio-system

And I got this:

 oc logs -n istio-system jaeger-spark-dependencies
19/12/10 02:51:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/10 02:51:11 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2019-12-10T00:00Z, reading from jaeger-span-2019-12-10 index, result storing to jaeger-dependencies-2019-12-10
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Client-only nodes cannot be enabled when running in WAN mode
        at org.elasticsearch.hadoop.util.Assert.isTrue(Assert.java:60)
        at org.elasticsearch.hadoop.rest.InitializationUtils.validateSettings(InitializationUtils.java:237)
        at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:218)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:79)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:78)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:285)
        at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.groupBy(RDD.scala:690)
        at org.apache.spark.api.java.JavaRDDLike$class.groupBy(JavaRDDLike.scala:243)
        at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:224)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:203)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
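
Reading the two failures together: the first run could not reach https://elasticsearch:9200, and the error itself suggests the es.nodes.wan.only setting, while the second shows that client-node-only mode and WAN mode are mutually exclusive, so the two must not be combined. In this image those settings map to the ES_NODES_WAN_ONLY and ES_CLIENT_NODE_ONLY environment variables (visible in the pod descriptions above). A hedged variant of the command enabling only WAN mode would be the following; note it addresses node discovery only, not the TLS question, which is answered in the next reply:

oc run jaeger-spark-dependencies --env="STORAGE=elasticsearch" --env="ES_NODES_WAN_ONLY=true" --env="ES_NODES=https://elasticsearch:9200" --restart=Never --image=jaegertracing/spark-dependencies -n istio-system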

@pavolloffay (Member)

1 - How do I configure it to connect to elasticsearch using TLS certificates? The documentation only shows the option for using a password.

Spark dependencies do not support TLS at the moment; see #294.

@peacecoder (Contributor)

Hi @pavolloffay,

This issue still appears in my k8s cluster. Here is the error log.

I think I am using a version with your fix, since the log starts with "Container ENTRYPOINT failed to add passwd entry for anonymous UID":

Container ENTRYPOINT failed to add passwd entry for anonymous UID
19/12/12 23:56:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: failure to login
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:822)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2464)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:292)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:188)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:175)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
        at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
        at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
        at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
        at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:797)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2464)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:292)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:188)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:175)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)

        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:856)
        at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
        at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:797)
        ... 12 more

and here is the pod description:

Name:         jaeger-streaming-spark-dependencies-1576194900-8bp4k
Start Time:   Fri, 13 Dec 2019 07:56:01 +0800
Labels:       controller-uid=5c93f4c3-2c3f-4047-b0d2-10f8d3ec0821
              job-name=jaeger-streaming-spark-dependencies-1576194900
Annotations:  cni.projectcalico.org/podIP: 172.23.9.2/32
              linkerd.io/inject: disabled
              prometheus.io/scrape: false
              sidecar.istio.io/inject: false
Status:       Failed
IP:           172.23.9.2
IPs:
  IP:           172.23.9.2
Controlled By:  Job/jaeger-streaming-spark-dependencies-1576194900
Containers:
  jaeger-streaming-spark-dependencies:
    Container ID:   docker://4a85cdcce4597e8a6318c00dca27f2f7be5a38db6b8c7c4816c89757940c041f
    Image:          harbor:5000/public/spark-dependencies:1.13
    Image ID:       docker-pullable://habor:5000/public/spark-dependencies@sha256:1471ae11fc911afaeced0ac2d4c1dfad7fc9ddd914aa0392b2b38e5829216f0a
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 13 Dec 2019 07:56:02 +0800
      Finished:     Fri, 13 Dec 2019 07:56:03 +0800
    Ready:          False
    Restart Count:  0
    Environment:
      STORAGE:              elasticsearch
      ES_NODES:             http://elasticsearch-ingest.test-inf.svc.cluster.local:9200
      ES_CLIENT_NODE_ONLY:  false
      ES_NODES_WAN_ONLY:    false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l7tn6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-l7tn6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-l7tn6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From                         Message
  ----    ------     ----       ----                         -------
  Normal  Scheduled  <unknown>  default-scheduler            Successfully assigned dev-inf/jaeger-streaming-spark-dependencies-1576194900-8bp4k to d-k8s-20.novalocal
  Normal  Pulled     42m        kubelet, d-k8s-20.novalocal  Container image "harbor:5000/public/spark-dependencies:1.13" already present on machine
  Normal  Created    42m        kubelet, d-k8s-20.novalocal  Created container jaeger-streaming-spark-dependencies
  Normal  Started    42m        kubelet, d-k8s-20.novalocal  Started container jaeger-streaming-spark-dependencies

Here is my ES cluster info:

{
  "name" : "elasticsearch-ingest-0",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "***",
  "version" : {
    "number" : "6.8.2",
    "build_flavor" : "oss",
    "build_type" : "docker",
    "build_hash" : "b506955",
    "build_date" : "2019-07-24T15:24:41.545295Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Here is the spark cronjob info:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: "2019-08-23T04:50:39Z"
  labels:
    app: jaeger
    app.kubernetes.io/component: cronjob-spark-dependencies
    app.kubernetes.io/instance: jaeger-streaming
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: jaeger-streaming-spark-dependencies
    app.kubernetes.io/part-of: jaeger
  name: jaeger-streaming-spark-dependencies
  namespace: dev-inf
  ownerReferences:
  - apiVersion: jaegertracing.io/v1
    controller: true
    kind: Jaeger
    name: jaeger-streaming
    uid: bdddb233-c470-11e9-94ca-fa163e6c834f
  resourceVersion: "62012690"
  selfLink: /apis/batch/v1beta1/namespaces/dev-inf/cronjobs/jaeger-streaming-spark-dependencies
  uid: 8de07051-c561-11e9-8784-fa163e33752d
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      parallelism: 1
      template:
        metadata:
          annotations:
            linkerd.io/inject: disabled
            prometheus.io/scrape: "false"
            sidecar.istio.io/inject: "false"
          creationTimestamp: null
        spec:
          containers:
          - env:
            - name: STORAGE
              value: elasticsearch
            - name: ES_NODES
              value: http://elasticsearch:9200
            - name: ES_CLIENT_NODE_ONLY
              value: "false"
            - name: ES_NODES_WAN_ONLY
              value: "false"
            image: harbor:5000/public/spark-dependencies:1.13
            imagePullPolicy: IfNotPresent
            name: jaeger-streaming-spark-dependencies
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: 55 23 * * *
  successfulJobsHistoryLimit: 0
  suspend: false
status:
  lastScheduleTime: "2019-12-12T23:55:00Z"

@pavolloffay (Member)

Could you please pull the latest spark-dependencies image?

The latest seems to be f11a998536968188685d83b27d67c3079ba47f3a533a83119ae1a0304382cc01

@peacecoder (Contributor)

peacecoder commented Dec 17, 2019

@pavolloffay thanks,
I will try to pull the latest image to replace the old one. But the problem is, I have 2 clusters; one went wrong, while the other one is OK.

@pavolloffay (Member)

That seems weird; there might be some inconsistency then, such as using different images between the clusters.
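
A quick way to rule that out is to compare the image ID each cluster actually has (a sketch; substitute your private registry tag such as harbor:5000/public/spark-dependencies:1.13):

# On a node of each cluster, print the local image ID and digest, then diff the outputs:
docker pull jaegertracing/spark-dependencies:latest
docker inspect -f '{{.Id}} {{.RepoDigests}}' jaegertracing/spark-dependencies:latest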

@peacecoder (Contributor)

That seems weird; there might be some inconsistency then, such as using different images between the clusters.

Thanks, it seems everything works.

@chgl (Contributor)

chgl commented Nov 23, 2020

I'm currently running into this as well (v1.20). I suspect it's because I override the job pod's securityContext's runAsUser. Using the latest tag (docker.io/jaegertracing/spark-dependencies@sha256:3dc11f9a8a2fc6aff2f6cbd002de194f89bb1beb9c2e79834693cc722a3bd84a).

Should I create a new issue for this? In general, I'm struggling quite a bit to get the operator to run on a cluster with restrictive PSPs.

  securityContext:
    fsGroup: 1
    runAsGroup: 999 
    runAsNonRoot: true
    runAsUser: 999 # didn't work. With 185, i.e. the Dockerfile's default user, the same issue occurs (https://github.com/jaegertracing/spark-dependencies/blob/master/Dockerfile)
    supplementalGroups:
      - 1

Logs:


2020-11-23T00:55:10.301076455+01:00 stdout F Container ENTRYPOINT failed to add passwd entry for anonymous UID
2020-11-23T00:55:14.203030982+01:00 stderr F 20/11/22 23:55:14 INFO CassandraDependenciesJob: Running Dependencies job for 2020-11-22T00:00Z: 1606003200000000 ≤ Span.timestamp 1606089599999999
2020-11-23T00:55:16.206148265+01:00 stderr F 20/11/22 23:55:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-11-23T00:55:17.211830949+01:00 stderr F Exception in thread "main" java.io.IOException: failure to login
2020-11-23T00:55:17.211897487+01:00 stderr F 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:822)
2020-11-23T00:55:17.211908146+01:00 stderr F 	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
2020-11-23T00:55:17.211933688+01:00 stderr F 	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
2020-11-23T00:55:17.21193974+01:00 stderr F 	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
2020-11-23T00:55:17.211970554+01:00 stderr F 	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
2020-11-23T00:55:17.212009612+01:00 stderr F 	at scala.Option.getOrElse(Option.scala:121)
2020-11-23T00:55:17.212015827+01:00 stderr F 	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2464)
2020-11-23T00:55:17.212052417+01:00 stderr F 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:292)
2020-11-23T00:55:17.212057112+01:00 stderr F 	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
2020-11-23T00:55:17.212062105+01:00 stderr F 	at io.jaegertracing.spark.dependencies.cassandra.CassandraDependenciesJob.run(CassandraDependenciesJob.java:162)
2020-11-23T00:55:17.212093684+01:00 stderr F 	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:60)
2020-11-23T00:55:17.212123519+01:00 stderr F 	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
2020-11-23T00:55:17.212265536+01:00 stderr F Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
2020-11-23T00:55:17.212278587+01:00 stderr F 	at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
2020-11-23T00:55:17.212283613+01:00 stderr F 	at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
2020-11-23T00:55:17.212288259+01:00 stderr F 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-11-23T00:55:17.21229272+01:00 stderr F 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-11-23T00:55:17.212296642+01:00 stderr F 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-11-23T00:55:17.212301377+01:00 stderr F 	at java.lang.reflect.Method.invoke(Method.java:498)
2020-11-23T00:55:17.212305252+01:00 stderr F 	at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
2020-11-23T00:55:17.212314044+01:00 stderr F 	at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
2020-11-23T00:55:17.212318187+01:00 stderr F 	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
2020-11-23T00:55:17.212322086+01:00 stderr F 	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
2020-11-23T00:55:17.21232649+01:00 stderr F 	at java.security.AccessController.doPrivileged(Native Method)
2020-11-23T00:55:17.212330284+01:00 stderr F 	at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
2020-11-23T00:55:17.212334143+01:00 stderr F 	at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
2020-11-23T00:55:17.212338067+01:00 stderr F 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:797)
2020-11-23T00:55:17.212342703+01:00 stderr F 	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
2020-11-23T00:55:17.212358339+01:00 stderr F 	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
2020-11-23T00:55:17.212362346+01:00 stderr F 	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
2020-11-23T00:55:17.212365912+01:00 stderr F 	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
2020-11-23T00:55:17.212370321+01:00 stderr F 	at scala.Option.getOrElse(Option.scala:121)
2020-11-23T00:55:17.212374284+01:00 stderr F 	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2464)
2020-11-23T00:55:17.212377615+01:00 stderr F 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:292)
2020-11-23T00:55:17.212381974+01:00 stderr F 	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
2020-11-23T00:55:17.212385937+01:00 stderr F 	at io.jaegertracing.spark.dependencies.cassandra.CassandraDependenciesJob.run(CassandraDependenciesJob.java:162)
2020-11-23T00:55:17.212389863+01:00 stderr F 	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:60)
2020-11-23T00:55:17.21239357+01:00 stderr F 	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
2020-11-23T00:55:17.212396952+01:00 stderr F
2020-11-23T00:55:17.212433172+01:00 stderr F 	at javax.security.auth.login.LoginContext.invoke(LoginContext.java:856)
2020-11-23T00:55:17.212439181+01:00 stderr F 	at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
2020-11-23T00:55:17.212443317+01:00 stderr F 	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
2020-11-23T00:55:17.212447069+01:00 stderr F 	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
2020-11-23T00:55:17.2124509+01:00 stderr F 	at java.security.AccessController.doPrivileged(Native Method)
2020-11-23T00:55:17.212454718+01:00 stderr F 	at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
2020-11-23T00:55:17.212458688+01:00 stderr F 	at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
2020-11-23T00:55:17.212462542+01:00 stderr F 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:797)
2020-11-23T00:55:17.212466545+01:00 stderr F 	... 11 more

@nightscape
nightscape commented Mar 25, 2021

Not sure if this helps anybody, but I had the same problem in a different context, while creating a custom Spark image for use in Kubernetes.
The incantation that works for me is the following:
Dockerfile (note especially the auth required line):

FROM bitnami/spark:3.1.1
ENV TINI_VERSION v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /usr/bin/tini
USER root
RUN chmod +x /usr/bin/tini && echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && chmod ugo+rwx -R /opt/bitnami/spark
USER 1001
ADD entrypoint.sh /opt/
ENTRYPOINT ["/opt/bitnami/scripts/spark/entrypoint.sh", "/opt/entrypoint.sh"]

entrypoint.sh

#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# echo commands to the terminal output
set -ex

# Check whether there is a passwd entry for the container UID
myuid=$(id -u)
mygid=$(id -g)
# turn off -e for getent because it will return error code in anonymous uid case
set +e
uidentry=$(getent passwd $myuid)
set -e

# If there is no passwd entry for the container UID, attempt to create one
if [ -z "$uidentry" ] ; then
    if [ -w /etc/passwd ] ; then
        echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
    else
        echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
    fi
fi

SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt

if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
  SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
fi

if [ "$PYSPARK_MAJOR_PYTHON_VERSION" == "2" ]; then
    pyv="$(python -V 2>&1)"
    export PYTHON_VERSION="${pyv:7}"
    export PYSPARK_PYTHON="python"
    export PYSPARK_DRIVER_PYTHON="python"
elif [ "$PYSPARK_MAJOR_PYTHON_VERSION" == "3" ]; then
    pyv3="$(python3 -V 2>&1)"
    export PYTHON_VERSION="${pyv3:7}"
    export PYSPARK_PYTHON="python3"
    export PYSPARK_DRIVER_PYTHON="python3"
fi

# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s.
if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then
  export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
fi

if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
  SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
fi

case "$1" in
  driver)
    shift 1
    CMD=(
      "$SPARK_HOME/bin/spark-submit"
      --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
      --deploy-mode client
      "$@"
    )
    ;;
  executor)
    shift 1
    CMD=(
      ${JAVA_HOME}/bin/java
      "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
      -Xms$SPARK_EXECUTOR_MEMORY
      -Xmx$SPARK_EXECUTOR_MEMORY
      -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
      org.apache.spark.executor.CoarseGrainedExecutorBackend
      --driver-url $SPARK_DRIVER_URL
      --executor-id $SPARK_EXECUTOR_ID
      --cores $SPARK_EXECUTOR_CORES
      --app-id $SPARK_APPLICATION_ID
      --hostname $SPARK_EXECUTOR_POD_IP
    )
    ;;

  *)
    echo "Non-spark-on-k8s command provided, proceeding in pass-through mode..."
    CMD=("$@")
    ;;
esac

# Execute the container CMD under tini for better hygiene
exec /usr/bin/tini -s -- "${CMD[@]}"
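
A small smoke test for the two files above (a sketch assuming entrypoint.sh is executable and both files sit in the build context; the tag and UID are arbitrary):

# Build the patched image:
docker build -t my-spark:3.1.1 .
# Run with an arbitrary UID and the root group; the group-writable /etc/passwd from the
# Dockerfile lets the entrypoint append an entry, so the user lookup now succeeds:
docker run --rm --user 12345:0 my-spark:3.1.1 sh -c 'getent passwd "$(id -u)"'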

@flodetan
flodetan commented Sep 16, 2022

Hi, I encountered the same issue.

The docker image I use:

DEV_JMP /home/ubuntu $ docker images |grep spark
jaegertracing/spark-dependencies                             latest                                   e76604ab86a7        15 months ago       294MB

When I run the following command, there is no problem:

 docker run -e JAVA_OPTS="-Xms1G -Xmx20G"  --env STORAGE=elasticsearch --memory="2g" --memory-swap="5g" --cpus="2.0" --env ES_NODES=http://10.10.10.10:9200 --env ES_TIME_RANGE=5m jaegertracing/spark-dependencies > 3 &

But when I run it another way (same image version), it gets this error:

Exception in thread "main" java.io.IOException: failure to login
	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:822)
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2464)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:292)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:225)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:212)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
	at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)

My environment variables:

            STORAGE: 'elasticsearch',
            ES_NODES: 'http://10.10.10.10:9200',
            ES_TIME_RANGE: '5m',
            JAVA_OPTS: '-Xms1G -Xmx2G'
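
Given the rest of this thread, the likely difference between the two invocations is the UID: the plain docker run executes as the image's built-in user, which has a passwd entry, while the other environment appears to run the container under an arbitrary UID that does not. A hedged repro with the same image:

# Fails the same way ("failure to login ... invalid null input: name") because
# UID 12345 has no passwd entry inside the container:
docker run --rm --user 12345 --env STORAGE=elasticsearch --env ES_NODES=http://10.10.10.10:9200 --env ES_TIME_RANGE=5m jaegertracing/spark-dependencies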
