I’m seeing issues running Elasticsearch on a GlusterFS volume. I’ve configured Convoy with GlusterFS and the volume is mounted, and all seems well.
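For context, the Elasticsearch data directory is mounted through the convoy-gluster volume driver, roughly along these lines (the volume name, image tag, and the plain docker run form below are placeholders rather than the exact stack definition):

```
# Rough sketch of the mount: a named volume handled by the convoy-gluster
# driver is mapped onto the Elasticsearch data path. "es-data" and the image
# tag are placeholders.
docker run -d \
  --volume-driver=convoy-gluster \
  -v es-data:/usr/share/elasticsearch/data \
  elasticsearch:2.3
```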
When Elasticsearch starts up, I see this:
```
4/3/2016 8:53:55 AM[2016-04-03 12:53:55,083][WARN ][cluster.action.shard ] [Nicole St. Croix] [.kibana][0] received shard failed for target shard [[.kibana][0], node[MlquNusiR2O-9Lc5x2dLeQ], [P], v[1], s[INITIALIZING], a[id=j_vQF1yPRMWPQR56c3Th_w], unassigned_info[[reason=INDEX_CREATED], at[2016-04-03T12:53:50.855Z]]], indexUUID [_4mgkEHzRxawb_2dE3vihA], message [failed recovery], failure [IndexShardRecoveryException[failed recovery]; nested: AlreadyClosedException[Underlying file changed by an external force at 2016-04-03T12:53:54.102783Z, (lock=NativeFSLock(path=/usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/.kibana/0/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],ctime=2016-04-03T12:53:54.102783Z))]; ]
4/3/2016 8:53:55 AM[.kibana][[.kibana][0]] IndexShardRecoveryException[failed recovery]; nested: AlreadyClosedException[Underlying file changed by an external force at 2016-04-03T12:53:54.102783Z, (lock=NativeFSLock(path=/usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/.kibana/0/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],ctime=2016-04-03T12:53:54.102783Z))];
4/3/2016 8:53:55 AM at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:179)
```
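As far as I can tell, Lucene’s NativeFSLock records the lock file’s creation/change time when it takes write.lock and re-checks it later; the “Underlying file changed by an external force” message means that timestamp no longer matches, i.e. something below Elasticsearch is touching the file’s metadata. A quick (hypothetical) way to see whether GlusterFS is shifting that metadata is to watch the ctime of the lock file from the log above while Elasticsearch is the only writer:

```
# Run this a few times while Elasticsearch is the only thing using the volume;
# if ctime keeps changing on its own, the filesystem is altering the metadata
# Lucene checks. GNU stat assumed; path copied from the log above.
stat -c '%n ctime=%z' \
  /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/.kibana/0/index/write.lock
```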
I’ve seen recommendations to enable “cluster.consistent-metadata” on the Gluster volume, but it seems that isn’t possible with the version of GlusterFS currently shipped with Rancher.
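For anyone who does have shell access to the Gluster peers (which the Rancher convoy-gluster stack doesn’t seem to expose), the setting itself would be applied with something like this; the volume name is a placeholder:

```
# Sketch only: assumes the gluster CLI is available on a storage node and the
# installed GlusterFS version supports the option. "my_volume" is a placeholder.
gluster volume set my_volume cluster.consistent-metadata on
gluster volume info my_volume   # the reconfigured option should show up here
```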
Any thoughts on this? I’m guessing I’ll just have to forgo this in favor of sidekicks.
I have other volume mounts on the Convoy storage that work great, logs and configs for example. But with Elasticsearch, something about how often it reads and writes seems to conflict with Convoy/GlusterFS.
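If it comes to that, the fallback would be keeping the write-heavy data path on local disk and leaving only the low-churn pieces on the shared volume, something like this (paths, volume name, and image tag are placeholders):

```
# Hypothetical fallback: local disk for the index data, convoy-gluster only for
# logs. Note that --volume-driver applies to the named volume ("es-logs"), not
# to the host-path bind mount.
docker run -d \
  --volume-driver=convoy-gluster \
  -v /mnt/es-data:/usr/share/elasticsearch/data \
  -v es-logs:/usr/share/elasticsearch/logs \
  elasticsearch:2.3
```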
I am seeing the same issue. Running the ELK stack without a shared volume through Convoy-Gluster works correctly, but with one I hit the same errors. Have you found a solution to this, @bonovoxly?