QuEP 4: reducing abstraction penalty by means of disposable arrays

Luigi Ballabio

Abstract

Writing vectorial operation such as v = L.applyTo(u) carries an abstraction penalty due to the creation and copy of the returned array. Optimization of the above expression would cause it to be written as L.applyTo(u,v). This would move towards a functional rather than object-oriented style of programming and would decrease the readability of the code since it is not immediately obvious which of u and v are the input and output array.

An implementation of a "disposable" array is proposed which reduces the abstraction penalty by eliminating unnecessary copying of the returned array. This allows vectorial operations and more generally functions returning an array to be written in the more natural style.

Current implementation

The application of a tridiagonal operator to an array is currently implemented as shown in the following code.

Array TridiagonalOperator::applyTo(const Array& v) const {
    // preconditions - omitted for brevity

    // 1. allocate space for the result array
    int N = size();
    Array result(N);

    // matricial product - omitted for brevity

    // boundary conditions are applied - omitted for brevity

    // 2. the array is returned by copy
    return result;
}

The above method is then used, i.e., in an explicit finite-difference scheme as following:

// an initial condition is chosen
Array v = ...;

// the scheme is used
for (int i=0; i<timeSteps; i++) {
    // the operator is applied
    // 3. the returned array is copied back into the input
    v = L.applyTo(v);
}

At each step of the for loop, 1, 2 and 3 are executed. This results in:

  1. the allocation of the local array result of size N;
  2. the allocation of an array of size N to be returned and the deep copy of result into the latter;
  3. the deep copy of the returned array into v;
adding up to two allocations and two deep copies of an N-sized array.

Before implementing the proposal in this QuEP, the above can be already improved by making the result array a mutable data member of the operator and have applyTo return it by const reference. This would result in:

  1. no allocation necessary since result already exists;
  2. no allocation and deep copy since a const reference is returned;
  3. the deep copy of the returned array into v;
adding up to a single deep copy of an N-sized array.

Proposed implementation

A disposable or temporary array class can be implemented which inherits from Array and redefines its copy semantics. A copy constructor and assignment operator would be added to Array which copy a disposable array with the same semantics, namely, the inner pointer to the allocated storage space and the corresponding informations are simply swapped between the array and the disposable array. After such operation the former will contain the data previously stored in the disposable array, while the latter will contain the data previously stored in the array which will be no longer used. This inexpensive pointer-copy operation will hereafter be referred to as shallow copy, as opposed to the usual deep copy in which each element of the source array is copied to the corresponding element of the target.

A draft implementation of such TempArray can be written as follows.

class Array {
    // This is needed for TempArray to access the Array data members.
    // Alternatively, they could be declared protected instead
    // of private.
    friend class TempArray;
  public:
    // constructors
    Array(const Array& a) { /* deep copy as usual */ }
    Array(const TempArray& a) {
        // shallow copy
        std::swap(pointer_,a.pointer_);
        std::swap(n_,a.n_);
        std::swap(bufferSize)_,a.bufferSize_);
    }
    // assignment operators
    Array& operator=(const Array& a) { /* deep copy as usual */ }
    Array& operator=(const TempArray& a) { /* shallow copy as above */ }
    // etc.
};

class TempArray : public Array {
  public:
    // constructors
    TempArray(const TempArray& a) { /* shallow copy */ }
    TempArray(const Array& a) { /* deep copy - a is not disposable! */ }
    // assignment operators
    TempArray& operator=(const TempArray& a) { /* shallow copy */ }
    TempArray& operator=(const Array& a) { /* deep copy */ }
    // allow for resizing after having been invalidated by a copy
    void resize(size_t n);
};

The tridiagonal operator application could now be written as:

TempArray TridiagonalOperator::applyTo(const Array& v) const {
    // 1. allocate space for the result array
    int N = size();
    TempArray result(N);

    // matricial product - omitted for brevity
    // boundary conditions are applied - omitted for brevity

    // 2. the disposable array is returned by (shallow) copy
    return result;
}

// an initial condition is chosen
Array v = ...;

// the scheme is used
for (int i=0; i<timeSteps; i++) {
    // 3. the returned array is (shallowly) copied back into the input
    v = L.applyTo(v);
}

This would result in:

  1. the allocation of the local disposable array result;
  2. no allocation and deep copy since a shallow copy is performed upon return;
  3. the shallow copy of the returned array into v;
adding up to a single allocation of an N-sized array.

Moreover, the use of the disposable array can be combined with the previously introduced technique of storing the result array as a mutable data member. The implementation of applyTo would be in this case:

TempArray& TridiagonalOperator::applyTo(const Array& v) const {
    // 1. make sure that the disposable array is of the right size
    //    since it could have been invalidated by a copy
    int N = size();
    result_.resize(N);

    // matricial product - omitted for brevity
    // boundary conditions are applied - omitted for brevity

    // 2. the disposable array is returned by reference
    return result;
}

// an initial condition is chosen
Array v = ...;

// the scheme is used
for (int i=0; i<timeSteps; i++) {
    // 3. the returned array is (shallowly) copied back into the input
    v = L.applyTo(v);
}

This would result in:

  1. no allocation of result_ except at the first step only: at each subsequent step result_ contains the storage previously contained in v which is already of size N, and resize() is smart enough to reuse the existing storage if large enough;
  2. no allocation and deep copy since the result is returned by reference;
  3. the shallow copy of the returned array into v;
adding up to no allocations and no deep copies. How about that?

Sobering thoughts and a question

The above seems perfectly fine on paper even if not really cuspy. I implemented it and had it working in my previous life as a physicists. The only care to be taken while coding is to use disposable arrays as such---i.e., as temporary objects to be used for storing values to be immediately passed to a "real" array and not as persistent storage.

However, unlikely as it seems to me, I am not entirely sure that it could not be broken by some optimizing compiler trying to do things too smart for its own good. Does anyone know something of the inner working of such beasts? Could (s)he give any informed advice of this?

Finally, I would not advise to run around waving one's hands in the air and shouting in joy at the prospect of huge performance gains. The disposable array trick merely takes away some abstraction penalty from functions returning an array. It won't make the implementation of the corresponding algorithms any faster, and therefore the performance gain will be small if returning the array by value was not a large part of the total function time.

Conclusion

A disposable array implementation was proposed which can in principle remove the abstraction penalty imposed by returning an array from a function.

This can allow one to give such function a more natural interface than the one requesting the output array to be previously allocated and passed to the function.

Feedback

Feedback on the above proposal should be posted to the QuantLib-dev mailing list.