How to keep data consistent between replicas?


#1

Here is how the controller handles an I/O error when a write to a replica fails.

    func (c *Controller) handleErrorNoLock(err error) error {
    	if bErr, ok := err.(*BackendError); ok {
    		if len(bErr.Errors) > 0 {
    			for address, replicaErr := range bErr.Errors {
    				logrus.Errorf("Setting replica %s to ERR due to: %v", address, replicaErr)
    				c.setReplicaModeNoLock(address, types.ERR)
    			}
    			// if we still have a good replica, do not return error
    			for _, r := range c.replicas {
    				if r.Mode == types.RW {
    					logrus.Errorf("Ignoring error because %s is mode RW: %v", r.Address, err)
    					err = nil
    					break
    				}
    			}
    		}
    	}
    	if err != nil {
    		logrus.Errorf("I/O error: %v", err)
    	}
    	return err
    }

A failed replica is set to ERR mode. What happens once it is in ERR mode? Will it be rebuilt?
If we have two replicas and both return an error, their data may differ. How is this case handled?


#2

    func (r *Replica) WriteAt(buf []byte, offset int64) (int, error) {
    	if r.readOnly {
    		return 0, fmt.Errorf("Can not write on read-only replica")
    	}

    	r.RLock()
    	r.info.Dirty = true
    	c, err := r.volume.WriteAt(buf, offset)
    	r.RUnlock()
    	if err != nil {
    		return c, err
    	}
    	if err := r.increaseRevisionCounter(); err != nil {
    		return c, err
    	}
    	return c, nil
    }

After the data is written to disk, the revision counter is incremented and then persisted to disk.
If the disk fails partway through the data write, the result is a partial write with no revision counter update.
The next time the replicas start up, this check runs:

    revisionCounters := make(map[string]int64)
    for _, r := range c.replicas {
    	counter, err := c.backend.GetRevisionCounter(r.Address)
    	if err != nil {
    		return err
    	}
    	if counter > expectedRevision {
    		expectedRevision = counter
    	}
    	revisionCounters[r.Address] = counter
    }

    for address, counter := range revisionCounters {
    	if counter != expectedRevision {
    		logrus.Errorf("Revision conflict detected! Expect %v, got %v in replica %v. Mark as ERR",
    			expectedRevision, counter, address)
    		c.setReplicaModeNoLock(address, types.ERR)
    	}
    }

In this situation, since the revision counter keeps its old value, we cannot tell whether the data is complete. Bad things will happen. Is that correct?


#3

Any replica in the ERR state is disconnected from the controller. We are not going to use that replica again (except for salvage). Longhorn Manager will see the error and then start rebuilding a new replica.

If all the replicas are in the ERR state, the manager puts the volume in the faulted state. The user can manually select some replicas and restore the volume to the detached state. On the next attach, the controller compares the revision counters of every replica, keeps the ones with the highest counter, and discards the others automatically.
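
The counter comparison on the next attach could look roughly like the following sketch. `splitByRevision` is a hypothetical helper written for this illustration, not a function from the Longhorn code base, and the addresses are made up. It takes counters that have already been read from each replica and separates the replicas holding the highest counter from those to discard.

```go
package main

import (
	"fmt"
	"sort"
)

// splitByRevision returns the addresses whose counter equals the highest
// counter seen (keep) and all remaining addresses (discard), both sorted.
func splitByRevision(counters map[string]int64) (keep, discard []string) {
	var highest int64
	first := true
	for _, c := range counters {
		if first || c > highest {
			highest = c
			first = false
		}
	}
	for addr, c := range counters {
		if c == highest {
			keep = append(keep, addr)
		} else {
			discard = append(discard, addr)
		}
	}
	// Map iteration order is random in Go, so sort for stable output.
	sort.Strings(keep)
	sort.Strings(discard)
	return keep, discard
}

func main() {
	keep, discard := splitByRevision(map[string]int64{
		"tcp://replica-a": 41,
		"tcp://replica-b": 42,
		"tcp://replica-c": 42,
	})
	fmt.Println("keep:", keep)
	fmt.Println("discard:", discard)
}
```

Replicas that share the highest counter are all kept here. As discussed in #2, an equal counter does not by itself prove that their data is identical.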