Valgrind invalid read/write when using the LAPACK from C++11

Post here if you have a question about LAPACK or ScaLAPACK algorithm or data format

Valgrind invalid read/write when using the LAPACK from C++11

Postby ejspeiro » Mon May 12, 2014 3:22 pm

Dear all,

I am using Valgrind's memcheck to supervise the memory manipulation of my code. Yet, it is reporting a very strange invalid access from the LAPACK.

The following code depicts a minimally working example, which can be compiled as follows:

Code: Select all
g++ -std=c++11 -c -g -Wall mwe.cc
gfortran -o mwe mwe.o $(HOME)/Libraries/lapack-3.4.1/liblapack.a
$(HOME)/Libraries/BLAS/libblas.a -lstdc++


The code creates a generator vector, its related Vandermonde matrix, and then performs a QR factorization of it. Once the factorization completes, we extract and print the Q matrix.

Code: Select all
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cstring>

#define VERBOSE       1       // Should I verbose the execution?

using namespace std;

extern "C" {

  // External routines from the LAPACK:
  // LAPACK can be installed from either:
  // 1. http://www.netlib.org/lapack/
  // 2. https://software.intel.com/en-us/non-commercial-software-development

  // Double precision General QR Factorization to compute the null-space:
  // Theory: http://www.netlib.org/lapack/lug/node69.html
  // http://www.netlib.no/netlib/lapack/double/dgeqrf.f
  void dgeqrf_(int *m,
               int *n,
               double *a,
               int *lda,
               double *tau,
               double *work,
               int *lwork,
               int *info);

  // dormqr_ overwrites the general real M-by-N matrix C with
  //
  //                 SIDE = 'L'     SIDE = 'R'
  // TRANS = 'N':      Q * C          C * Q
  // TRANS = 'T':      Q**T * C       C * Q**T
  //
  // where Q is a real orthogonal matrix defined as the product of k
  // elementary reflectors
  //
  //       Q = H(1) H(2) . . . H(k)
  //
  // as returned by DGEQRF. Q is of order M if SIDE = 'L' and of order N
  // if SIDE = 'R'.
  // http://www.netlib.no/netlib/lapack/double/dormqr.f
  void dormqr_(char *side,
               char *trans,
               int *m,
               int *n,
               int *k,
               double *a,
               int *lda,
               double *tau,
               double *c,
               int *ldc,
               double *work,
               int *lwork,
               int *info);
}

// Used to compute a 1D index (idx) from a 2D set of indexes.
inline int idx(const int ii, const int offset, const int jj) {

  return ii*offset + jj;
}

// Generates an identity matrix:
double* Identity(int num_rows, int num_cols, bool transpose) {

  double* II {};  // Output identity matrix.

  if (num_rows < 1 || num_cols < 1) {
    return nullptr;
  }
  try {
    II = new double[num_rows*num_cols];
    memset (II, 0, sizeof(II[0])*num_rows*num_cols);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  if (transpose) {
    for (auto ii = 0; ii < num_rows; ii++) {
      for (auto jj = 0; jj < num_cols; jj++) {
        II[idx(jj,num_rows,ii)] = (ii == jj)? 1.0: 0.0;
      }
    }
  } else {
    for (auto ii = 0; ii < num_rows; ii++) {
      for (auto jj = 0; jj < num_cols; jj++) {
        II[idx(ii,num_cols,jj)] = (ii == jj)? 1.0: 0.0;
      }
    }
  }

  return II;
}

// Transposes a matrix:
double* Transpose(double *AA, int rr, int cc) {

  double* AT {};  // Output transpose matrix.

  if (rr <= 0 || cc <= 0 || AA == nullptr) {
    return nullptr;
  }
  try {
    AT = new double[rr*cc];
    memset(AT, 0, sizeof(AT[0])*(rr*cc));
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  for (auto ii = 0; ii < rr; ii++) {
    for (auto jj = 0; jj < cc; jj++) {
      AT[idx(jj,rr,ii)] = AA[idx(ii,cc,jj)];
    }
  }

  return AT;
}

// Generates a Vandermonde matrix with a given generator vector gen:
double* VandermondeOrDie(const double *gen,      // Given generator vector.
                         const int length_gen,   // How long generator vector?
                         const int num_rows,     // How many rows?
                         const int num_cols,     // How many columns?
                         const bool transpose) { // Should create transposed?

  double* TT {};  // Output Vandermonde matrix.

  if (length_gen < 1 || num_rows < 1 || num_cols < 1 || length_gen > num_cols) {
    return nullptr;
  }

  try {
    TT = new double[num_rows*num_cols];
    memset (TT, 0, sizeof(TT[0])*num_rows*num_cols);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  if (transpose) {
    for (auto ii = 0; ii < num_rows; ii++) {
      for (auto jj = 0; jj < num_cols; jj++) {
        TT[idx(jj,num_rows,ii)] = pow(gen[jj], (double) ii);
      }
    }
  } else {
    for (auto ii = 0; ii < num_rows; ii++) {
      for (auto jj = 0; jj < num_cols; jj++) {
        TT[idx(ii,num_cols,jj)] = pow(gen[jj], (double) ii);
      }
    }
  }

  return TT;
}

int main () {

  // Input:
  int kk {8}; // Order of numerical accuracy of the operator.

  // Variables:
  char side {'L'};  // Side for overwriting Q.
  char trans {'N'}; // Should it be transposed?

  double* gg {};    // Spatial coordinates for the boundary points.
  double* AA {};    // Boundary Vandermonde to get the null-space.
  double* tau {};   // The scalar factors of elementary reflectors for QR.
  double* work {};  // Work array for dgels_ solver and dgeqrf_ QR factor.
  double* QQ {};    // Q matrix from qr(AA) containing AA's null-space.

  int info {0};         // Information regarding solution of the system.
  int num_bndy_approxs; // Required boundary points for uniform order accuracy.
  int AA_num_rows;      // Number rows for the boundary Vandermonde matrices.
  int AA_num_cols;      // Number cols for the boundary Vandermonde matrices.
  int AA_ld;            // Leading dimension of the boundary Vandermonde matrix.
  int lwork {-1};       // Length of work array for the dgels_ solver.
  int QQ_ld;            // Leading dimension of the Q = qr(AA) matrix.

  // Compute amount of boundary points required to achieve uniform accuracy:
  num_bndy_approxs = (int) (3.0*((double) kk)/2.0);

  // Generator vector for the next approximation:
  try {
    gg = new double[num_bndy_approxs];
    memset(gg, 0, sizeof(gg[0])*num_bndy_approxs);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  gg[0] = -1.0/2.0;
  for (auto ii = 1; ii < num_bndy_approxs; ii++) {
    gg[ii] = gg[ii - 1] + 1.0;
  }

  // Before solving this and any other systems for the boundary matrices, we
  // must obtain the null-space from the first matrix. For this we will follow
  // recommendations given in:
  //
  // http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=5&t=4506
  //
  // We will compute the QR Factorization of the transpose, as in the
  // following (MATLAB) pseudo-code:
  //
  // [Q,R] = qr(A'); % Full QR as defined in
  // % http://www.stanford.edu/class/ee263/notes/qr_matlab.pdf
  // null-space = Q(:, last (kk/2 - 1) columns of Q);
  //
  // However, given the nature of the Vandermonde matrices we will compute--
  // we will compute dim_null of them--they will all posses the same
  // null-space. Therefore, we impose the convention of computing the
  // null-space of the first Vandermonde matrix only (for the west boundary).

  // Vandermonde matrix (using boundary coordinates as the generator):
  // We compute it NOT transposed, which is the ACTUAL transpose for
  // LAPACK's QR factor.
  AA_num_rows = kk + 1;
  AA_num_cols = num_bndy_approxs;
  AA = VandermondeOrDie(gg, num_bndy_approxs,
                        AA_num_rows, AA_num_cols, false);
  if (AA == nullptr) {
    cerr << "ERROR constructing matrix at line " << __LINE__ + 2 << endl;
    cerr << "Exiting..." << endl;
    return EXIT_FAILURE;
  }

  // Prepare to factorize: allocate and inquire for the value of lwork:
  try {
    tau = new double[1];
    memset(tau, 0, sizeof(tau[0])*1);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  try {
    work = new double[1];
    memset(work, 0, sizeof(work[0])*1);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  AA_ld = AA_num_cols;
  info = 0;

  dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info);
  lwork = (int) work[0];

  // Once we know lwork, we can actually invoke the factorization:
  if (tau != nullptr) {
    delete[] tau;
    tau = nullptr;
  }
  try {
    tau = new double[AA_num_cols];
    memset(tau, 0, sizeof(tau[0])*AA_num_cols);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
  try {
    work = new double[lwork];
    memset(work, 0, sizeof(work[0])*(lwork));
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info);
  if (!info) {
    if (VERBOSE) {
      cout << "QR factorization successfully completed!" << endl << endl;
    }
  } else {
    cerr << "Something went wrong factoring! info = " << info << endl;
  }

  // Generate the real matrix Q with orthonormal columns. This has to be done
  // separately since the actual output of dgeqrf_(AA) represents the
  // orthogonal matrix Q as a product of AA_num_cols elementary Householder
  // reflectors:

  QQ_ld = AA_num_cols;
  try {
    QQ = new double[QQ_ld*QQ_ld];
    memset(QQ, 0, sizeof(QQ[0])*(QQ_ld*QQ_ld));
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  // QQ must be initialized to the identity matrix:
  for (auto ii = 0; ii < QQ_ld; ii++) {
    for (auto jj = 0; jj < QQ_ld; jj++) {
      if (ii == jj) {
        QQ[idx(ii,QQ_ld,jj)] = 1.0;
      }
    }
  }

  // To complete the recovery of the real QQ, we must re-inquire the new value
  // for lwork that is going to be used by the dormqr_ routine we will use.
  // Notice we leave the tau array alone... i.e. we only re-allocate the work
  // array.
  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
  try {
    work = new double[1];
    memset(work, 0, sizeof(work[0])*1);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  lwork = -1;

  dormqr_(&side, &trans, &AA_num_cols, &AA_num_cols, &AA_num_cols, AA, &AA_ld,
          tau, QQ, &QQ_ld, work, &lwork, &info);
  lwork = (int) work[0];

  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
  try {
    work = new double[lwork];
    memset(work, 0, sizeof(work[0])*(lwork));
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  info = 0;

  // WARNING: We get the following warning from Valgrind's memcheck, when
  // invoking the following routine:
// ==14545== Invalid read of size 8
// ==14545==    at 0x40BA4B: dorm2r_ (dorm2r.f:272)
// ==14545==    by 0x4080F1: dormqr_ (dormqr.f:289)
// ==14545==    by 0x402450: main (divrcrs.cc:480)
// ==14545==  Address 0x5f79c78 is not stack'd, malloc'd or (recently) free'd
// ==14545==
// ==14545== Invalid write of size 8
// ==14545==    at 0x40BA7A: dorm2r_ (dorm2r.f:273)
// ==14545==    by 0x4080F1: dormqr_ (dormqr.f:289)
// ==14545==    by 0x402450: main (divrcrs.cc:480)
// ==14545==  Address 0x5f79c78 is not stack'd, malloc'd or (recently) free'd
// ==14545==
// ==14545== Invalid write of size 8
// ==14545==    at 0x40BB4F: dorm2r_ (dorm2r.f:276)
// ==14545==    by 0x4080F1: dormqr_ (dormqr.f:289)
// ==14545==    by 0x402450: main (divrcrs.cc:480)
// ==14545==  Address 0x5f79c78 is not stack'd, malloc'd or (recently) free'd
// ==14545==
  // TODO: We should make a more profound debug, unless we can no longer
  // depend on the matrix QQ (within the code flow).

  dormqr_(&side, &trans, &AA_num_cols, &AA_num_cols, &AA_num_cols, AA, &AA_ld,
          tau, QQ, &QQ_ld, work, &lwork, &info);
  if (!info) {
    if (VERBOSE) {
      cout << "Q matrix successfully attained!" << endl << endl;
      cout << "Q = " << endl;
      for (auto ii = 0; ii < QQ_ld; ii++) {
        for (auto jj = 0; jj < QQ_ld; jj++) {
          cout << setw(12) << QQ[idx(ii,QQ_ld,jj)];
        }
        cout << endl;
      }
      cout << endl;
    }
  } else {
    cerr << "Something went wrong building QQ! info = " << info << endl;
  }

  // Garbage collection :)
  if (gg != nullptr) {
    delete[] gg;
    gg = nullptr;
  }
  if (AA != nullptr) {
    delete[] AA;
    AA = nullptr;
  }
  if (tau != nullptr) {
    delete[] tau;
    tau = nullptr;
  }
  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
  if (QQ != nullptr) {
    delete[] QQ;
    QQ = nullptr;
  }
}


Output:

Code: Select all
ejspeiro@Eduardo-Alienware-14:~/Dropbox/mimops$ make run
./mwe
QR factorization successfully completed!

Q matrix successfully attained!

Q =
   -0.288675   -0.288675   -0.288675   -0.288675   -0.288675   -0.288675   -0.288675   -0.288675   -0.288675   -0.288675   -0.288675   -0.288675
   -0.459933   -0.376309   -0.292685   -0.209061   -0.125436  -0.0418121   0.0418121    0.125436    0.209061    0.292685    0.376309    0.459933
    0.501828    0.228104  0.00912415   -0.155111     -0.2646   -0.319345   -0.319345     -0.2646   -0.155111  0.00912415    0.228104    0.501828
    0.459933  -0.0418121   -0.292685   -0.348434    -0.26481  -0.0975616   0.0975616     0.26481    0.348434    0.292685   0.0418121   -0.459933
   -0.368767    0.301718    0.368767    0.145272   -0.134097   -0.312893   -0.312893   -0.134097    0.145272    0.368767    0.301718   -0.368767
   -0.261608    0.451869    0.166478   -0.229898   -0.348811    -0.15855     0.15855    0.348811    0.229898   -0.166478   -0.451869    0.261608
    0.164197   -0.462738    0.164197    0.373176   0.0597081   -0.298541   -0.298541   0.0597081    0.373176    0.164197   -0.462738    0.164197
  -0.0904791    0.370142   -0.412914   -0.136541    0.335595    0.230311   -0.230311   -0.335595    0.136541    0.412914   -0.370142   0.0904791
   0.0430767    -0.23888    0.466011   -0.254544   -0.289789    0.274124    0.274124   -0.289789   -0.254544    0.466011    -0.23888   0.0430767
 -0.00874036      0.0583   -0.146954    0.141249   0.0486822   -0.156543   -0.130333    0.566054   -0.653529    0.386018   -0.119774   0.0155708
   0.0117399  -0.0941426    0.322481   -0.603006    0.637649   -0.321721  -0.0245471    0.104354  -0.0153304  -0.0339244   0.0199423 -0.00349546
   0.0106414  -0.0748491    0.203312   -0.217426   -0.109592    0.578839    -0.66744     0.29945   0.0592101   -0.129478   0.0558632 -0.00853182



It seems to be running fine! Except that when I go to Valgrind's memcheck, I get this output:

Code: Select all
==21810== HEAP SUMMARY:
==21810==     in use at exit: 0 bytes in 0 blocks
==21810==   total heap usage: 30 allocs, 30 frees, 35,757 bytes allocated
==21810==
==21810== All heap blocks were freed -- no leaks are possible
==21810==
==21810== ERROR SUMMARY: 6 errors from 3 contexts (suppressed: 0 from 0)
==21810==
==21810== 2 errors in context 1 of 3:
==21810== Invalid write of size 8
==21810==    at 0x40A0A1: dorm2r_ (dorm2r.f:276)
==21810==    by 0x406B30: dormqr_ (dormqr.f:289)
==21810==    by 0x4020C6: main (mwe.cc:399)
==21810==  Address 0x5f79870 is not stack'd, malloc'd or (recently) free'd
==21810==
==21810==
==21810== 2 errors in context 2 of 3:
==21810== Invalid write of size 8
==21810==    at 0x409FCC: dorm2r_ (dorm2r.f:273)
==21810==    by 0x406B30: dormqr_ (dormqr.f:289)
==21810==    by 0x4020C6: main (mwe.cc:399)
==21810==  Address 0x5f79870 is not stack'd, malloc'd or (recently) free'd
==21810==
==21810==
==21810== 2 errors in context 3 of 3:
==21810== Invalid read of size 8
==21810==    at 0x409F9D: dorm2r_ (dorm2r.f:272)
==21810==    by 0x406B30: dormqr_ (dormqr.f:289)
==21810==    by 0x4020C6: main (mwe.cc:399)
==21810==  Address 0x5f79870 is not stack'd, malloc'd or (recently) free'd
==21810==
==21810== ERROR SUMMARY: 6 errors from 3 contexts (suppressed: 0 from 0)


Any hint? o.O Through DDD, I tracked the bug to the suggested line...

It seems that, the routine indexes the matrix at the very last value of it... namely, A(N,N). But what to do from C++11?

Feel free to suggest about my coding!

Thanks in advanced!

:)
ejspeiro
 
Posts: 18
Joined: Wed Sep 26, 2012 11:30 am
Location: San Diego, CA

Re: Valgrind invalid read/write when using the LAPACK from C

Postby Julien Langou » Mon May 12, 2014 3:51 pm

I have glanced at the code super fast. Best I can do. I see that you are using DORMQR on identity to generate Q. This is OK. Maybe a better way is to use DORGQR right away. This does not explain your problem. This is just feedback from parsing the code. Julien.
Julien Langou
 
Posts: 821
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Re: Valgrind invalid read/write when using the LAPACK from C

Postby ejspeiro » Mon May 12, 2014 4:01 pm

Dear Julien,

I will keep that into consideration.

Thanks for your efforts,
ejspeiro
 
Posts: 18
Joined: Wed Sep 26, 2012 11:30 am
Location: San Diego, CA

Re: Valgrind invalid read/write when using the LAPACK from C

Postby ejspeiro » Wed May 14, 2014 4:22 pm

I have written a minimal working example, in where I try to implement this QR factorization, but now using DORGQR instead of DORMQR

I want to ask the following:

Julien Langou wrote:Maybe a better way is to use DORGQR right away.


What do you mean with "right away"? As far as what I could read regarding DORGQR, you still have to compute the factorization using dgeqrf_, so what would be the advantage of using DORGQR? My guess is that it has to do with the fact that with DORGQR, one does not have to allocate extra space for the Q matrix, nor initialize it to the identity matrix, whereas with DORMQR, one has to do this.

However, provided this is true, what if my system is an under-determined system? Where A is 9 by 12, but my Q matrix is 12 by 12?

My minimal working example fails, because of this:

Code: Select all
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cstring>

#define VERBOSE       1       // Should I verbose the execution?

using namespace std;

extern "C" {

  // External routines from the LAPACK:
  // LAPACK can be installed from either:
  // 1. http://www.netlib.org/lapack/
  // 2. https://software.intel.com/en-us/non-commercial-software-development

  // Double precision General QR Factorization to compute the null-space:
  // Theory: http://www.netlib.org/lapack/lug/node69.html
  // http://www.netlib.no/netlib/lapack/double/dgeqrf.f
  void dgeqrf_(int *m,
               int *n,
               double *a,
               int *lda,
               double *tau,
               double *work,
               int *lwork,
               int *info);

  // dormqr_ overwrites the general real M-by-N matrix C with
  //
  //                 SIDE = 'L'     SIDE = 'R'
  // TRANS = 'N':      Q * C          C * Q
  // TRANS = 'T':      Q**T * C       C * Q**T
  //
  // where Q is a real orthogonal matrix defined as the product of k
  // elementary reflectors
  //
  //       Q = H(1) H(2) . . . H(k)
  //
  // as returned by DGEQRF. Q is of order M if SIDE = 'L' and of order N
  // if SIDE = 'R'.
  // http://www.netlib.no/netlib/lapack/double/dormqr.f
  void dormqr_(char *side,
               char *trans,
               int *m,
               int *n,
               int *k,
               double *a,
               int *lda,
               double *tau,
               double *c,
               int *ldc,
               double *work,
               int *lwork,
               int *info);

  // dorgqr_ generates an M-by-N real matrix Q with orthonormal columns,
  // which is defined as the first N columns of a product of K elementary
  // reflectors of order M
  //
  //       Q  =  H(1) H(2) . . . H(k)
  //
  // as returned by DGEQRF.
  void dorgqr_(int *m,
               int *n,
               int *k,
               double *a,
               int *lda,
               double *tau,
               double *work,
               int *lwork,
               int *info);
}

// Used to compute a 1D index (idx) from a 2D set of indexes.
inline int idx(const int ii, const int offset, const int jj) {

  return ii*offset + jj;
}

// Generates an identity matrix:
double* Identity(int num_rows, int num_cols, bool transpose) {

  double* II {};  // Output identity matrix.

  if (num_rows < 1 || num_cols < 1) {
    return nullptr;
  }
  try {
    II = new double[num_rows*num_cols];
    memset (II, 0, sizeof(II[0])*num_rows*num_cols);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  if (transpose) {
    for (auto ii = 0; ii < num_rows; ii++) {
      for (auto jj = 0; jj < num_cols; jj++) {
        II[idx(jj,num_rows,ii)] = (ii == jj)? 1.0: 0.0;
      }
    }
  } else {
    for (auto ii = 0; ii < num_rows; ii++) {
      for (auto jj = 0; jj < num_cols; jj++) {
        II[idx(ii,num_cols,jj)] = (ii == jj)? 1.0: 0.0;
      }
    }
  }

  return II;
}

// Transposes a matrix:
double* Transpose(double *AA, int rr, int cc) {

  double* AT {};  // Output transpose matrix.

  if (rr <= 0 || cc <= 0 || AA == nullptr) {
    return nullptr;
  }
  try {
    AT = new double[rr*cc];
    memset(AT, 0, sizeof(AT[0])*(rr*cc));
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  for (auto ii = 0; ii < rr; ii++) {
    for (auto jj = 0; jj < cc; jj++) {
      AT[idx(jj,rr,ii)] = AA[idx(ii,cc,jj)];
    }
  }

  return AT;
}

// Generates a Vandermonde matrix with a given generator vector gen:
double* VandermondeOrDie(const double *gen,      // Given generator vector.
                         const int length_gen,   // How long generator vector?
                         const int num_rows,     // How many rows?
                         const int num_cols,     // How many columns?
                         const bool transpose) { // Should create transposed?

  double* TT {};  // Output Vandermonde matrix.

  if (length_gen < 1 || num_rows < 1 || num_cols < 1 || length_gen > num_cols) {
    return nullptr;
  }

  try {
    TT = new double[num_rows*num_cols];
    memset (TT, 0, sizeof(TT[0])*num_rows*num_cols);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  if (transpose) {
    for (auto ii = 0; ii < num_rows; ii++) {
      for (auto jj = 0; jj < num_cols; jj++) {
        TT[idx(jj,num_rows,ii)] = pow(gen[jj], (double) ii);
      }
    }
  } else {
    for (auto ii = 0; ii < num_rows; ii++) {
      for (auto jj = 0; jj < num_cols; jj++) {
        TT[idx(ii,num_cols,jj)] = pow(gen[jj], (double) ii);
      }
    }
  }

  return TT;
}

int main () {

  // Input:
  int kk {8}; // Order of numerical accuracy of the operator.

  // Variables:
  double* gg {};    // Spatial coordinates for the boundary points.
  double* AA {};    // Boundary Vandermonde to get the null-space.
  double* tau {};   // The scalar factors of elementary reflectors for QR.
  double* work {};  // Work array for dgels_ solver and dgeqrf_ QR factor.

  int info {0};         // Information regarding solution of the system.
  int num_bndy_approxs; // Required boundary points for uniform order accuracy.
  int AA_num_rows;      // Number rows for the boundary Vandermonde matrices.
  int AA_num_cols;      // Number cols for the boundary Vandermonde matrices.
  int AA_ld;            // Leading dimension of the boundary Vandermonde matrix.
  int lwork {-1};       // Length of work array for the dgels_ solver.

  // Compute amount of boundary points required to achieve uniform accuracy:
  num_bndy_approxs = (int) (3.0*((double) kk)/2.0);

  // Generator vector for the next approximation:
  try {
    gg = new double[num_bndy_approxs];
    memset(gg, 0, sizeof(gg[0])*num_bndy_approxs);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  gg[0] = -1.0/2.0;
  for (auto ii = 1; ii < num_bndy_approxs; ii++) {
    gg[ii] = gg[ii - 1] + 1.0;
  }

  // Before solving this and any other systems for the boundary matrices, we
  // must obtain the null-space from the first matrix. For this we will follow
  // recommendations given in:
  //
  // http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=5&t=4506
  //
  // We will compute the QR Factorization of the transpose, as in the
  // following (MATLAB) pseudo-code:
  //
  // [Q,R] = qr(A'); % Full QR as defined in
  // % http://www.stanford.edu/class/ee263/notes/qr_matlab.pdf
  // null-space = Q(:, last (kk/2 - 1) columns of Q);
  //
  // However, given the nature of the Vandermonde matrices we will compute--
  // we will compute dim_null of them--they will all posses the same
  // null-space. Therefore, we impose the convention of computing the
  // null-space of the first Vandermonde matrix only (for the west boundary).

  // Vandermonde matrix (using boundary coordinates as the generator):
  // We compute it NOT transposed, which is the ACTUAL transpose for
  // LAPACK's QR factor.
  AA_num_rows = kk + 1;
  AA_num_cols = num_bndy_approxs;
  AA = VandermondeOrDie(gg, num_bndy_approxs,
                        AA_num_rows, AA_num_cols, false);
  if (AA == nullptr) {
    cerr << "ERROR constructing matrix at line " << __LINE__ + 2 << endl;
    cerr << "Exiting..." << endl;
    return EXIT_FAILURE;
  }

  // Prepare to factorize: allocate and inquire for the value of lwork:
  try {
    tau = new double[1];
    memset(tau, 0, sizeof(tau[0])*1);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  try {
    work = new double[1];
    memset(work, 0, sizeof(work[0])*1);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  AA_ld = AA_num_cols;
  info = 0;

  dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info);

  if (info == 0) {
    lwork = (int) work[0];
  } else {
    cerr << "Could not get value for lwork on line " << __LINE__ - 2 << endl;
    cerr << "Exiting..." << endl;
    return EXIT_FAILURE;
  }

  // Once we know lwork, we can actually invoke the factorization:
  if (tau != nullptr) {
    delete[] tau;
    tau = nullptr;
  }
  try {
    tau = new double[AA_num_cols];
    memset(tau, 0, sizeof(tau[0])*AA_num_cols);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
  try {
    work = new double[lwork];
    memset(work, 0, sizeof(work[0])*(lwork));
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info);

  if (!info) {
    if (VERBOSE) {
      cout << "QR factorization successfully completed!" << endl << endl;
    }
  } else {
    cerr << "Something went wrong factoring! info = " << info << endl;
  }

  // Generate the real matrix Q with orthonormal columns. This has to be done
  // separately since the actual output of dgeqrf_(AA) represents the
  // orthogonal matrix Q as a product of AA_num_cols elementary Householder
  // reflectors:

  // To complete the recovery of the real QQ, we must re-inquire the new value
  // for lwork that is going to be used by the dormqr_ routine we will use.
  // Notice we leave the tau array alone... i.e. we only re-allocate the work
  // array.

  // Do NOT reallocate the tau array. It contains info from the factorization.
  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
  try {
    work = new double[1];
    memset(work, 0, sizeof(work[0])*1);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  lwork = -1;

  dorgqr_(&AA_num_cols, &AA_num_cols, &AA_num_cols, AA, &AA_ld,
          tau, work, &lwork, &info);

  if (info == 0) {
    lwork = (int) work[0];
  } else {
    cerr << "Could not get value for lwork on line " << __LINE__ - 2 << endl;
    cerr << "Exiting..." << endl;
    return EXIT_FAILURE;
  }

  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
  try {
    work = new double[lwork];
    memset(work, 0, sizeof(work[0])*(lwork));
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  dorgqr_(&AA_num_cols, &AA_num_cols, &AA_num_cols, AA, &AA_ld,
          tau, work, &lwork, &info);

  if (!info) {
    if (VERBOSE) {
      cout << "Q matrix successfully attained!" << endl << endl;
      cout << "Q = " << endl;
      for (auto ii = 0; ii < AA_num_cols; ii++) {
        for (auto jj = 0; jj < AA_num_cols; jj++) {
          cout << setw(12) << AA[idx(ii,AA_num_cols,jj)];
        }
        cout << endl;
      }
      cout << endl;
    }
  } else {
    cerr << "Something went wrong building QQ! info = " << info << endl;
  }

  // Garbage collection :)
  if (gg != nullptr) {
    delete[] gg;
    gg = nullptr;
  }
  if (AA != nullptr) {
    delete[] AA;
    AA = nullptr;
  }
  if (tau != nullptr) {
    delete[] tau;
    tau = nullptr;
  }
  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
}



Thanks in advanced!
ejspeiro
 
Posts: 18
Joined: Wed Sep 26, 2012 11:30 am
Location: San Diego, CA

Re: Valgrind invalid read/write when using the LAPACK from C

Postby Julien Langou » Thu May 15, 2014 3:51 pm

(1) If A is 9-by-12, then Q is 9-by-9 as well (and R is 9-by-12). I do not think your call to DORGQR is not valid. Needs to be: &AA_num_rows for the first three arguments.

(2) You can get Q by using DORMQR on the identity. This works. Two advantages of using DORGQR: (1) less flops, much less actually, and (2) you do not need to allocate another matrix and DORGQR writes into the A output from DGEQRF (if appropriate).

(3) "As far as what I could read regarding DORGQR, you still have to compute the factorization using dgeqrf_," => Sure. My "right away" meant no need to allocate a Q, initialize a Q. You can call DORQR right after DGEQRF.

Julien.
Julien Langou
 
Posts: 821
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Re: Valgrind invalid read/write when using the LAPACK from C

Postby ejspeiro » Thu May 15, 2014 3:55 pm

Dear Julien,

Fair enough :)

Thanks a lot! I will work on it and let everyone know :)

I hold a separate thread, regarding tracking bugs on a much larger code, using this chunk. I use the LAPACK extensively :D

http://www.linuxquestions.org/questions ... 175504890/

Thanks for all your help! :D
ejspeiro
 
Posts: 18
Joined: Wed Sep 26, 2012 11:30 am
Location: San Diego, CA

Re: Valgrind invalid read/write when using the LAPACK from C

Postby ejspeiro » Fri May 16, 2014 12:47 pm

Well, there's a key detail I forgot to mention: The matrix I intend to factorize has to be transposed, since I intend to compute its null-space, as I have already explained on a previopus thread I started a while back: viewtopic.php?f=5&t=4506

Based on that, I think I'm almost there:

Here's the C++ minimal working example (mwe) that construct the transpose of the Vandermonde matrix and factorizes it:

Code: Select all
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cstring>

#define VERBOSE 1 // Should I verbose the execution?

using namespace std;

extern "C" {

// Double precision General QR Factorization to compute the null-space:
// Theory: http://www.netlib.org/lapack/lug/node69.html
// http://www.math.utah.edu/software/lapack/lapack-d/dgeqrf.html
void dgeqrf_(int *m,        // 1. Number of rows of the matrix.
              int *n,       // 2. Number of columns of the matrix.
              double *a,    // 3. The matrix.
              int *lda,     // 4. Leading dimension of the matrix.
              double *tau,  // 5. Scalar factors of the elementary reflectors.
              double *work, // 6. Workspace. info = 0, work(1) is optimal lwork.
              int *lwork,   // 7. The dimension of work. lwork >= max(1,n).
              int *info);   // 8. info = 0: successful exit.

// dorgqr_ generates an M-by-N real matrix Q with orthonormal columns,
// which is defined as the first N columns of a product of K elementary
// reflectors of order M
//
//       Q  =  H(1) H(2) . . . H(k)
//
// as returned by DGEQRF.
// http://www.math.utah.edu/software/lapack/lapack-d/dorgqr.html
void dorgqr_(int *m,        // 1. The number of rows of the matrix Q. M >= 0.
              int *n,       // 2. The number of cols of Q. M >= N >= 0.
              int *k,       // 3. Number of elementary reflectors.
              double *a,    // 4. Returned by DGEQRF first k cols of argument A.
              int *lda,     // 5. First dimension of A. LDA >= max(1,M).
              double *tau,  // 6. TAU(i) is the scalar of reflector H(i).
              double *work, // 7. Workspace. info = 0, work(1) is optimal lwork.
              int *lwork,   // 8. The dimension of work. lwork >= max(1,n).
              int *info);   // 9. info = 0: successful exit.
}

// Used to compute a 1D index (idx) from a 2D set of indexes.
// WARNING: Not needed if we allocate the matrices so that they are contiguous
// in memory.
// TODO: Implement the MTK_DenseMatrix class with this constructor, so that we
// do not have to use this routine.
inline int idx(const int ii, const int offset, const int jj) {

  return ii*offset + jj;
}

// Def. In linear algebra, a VANDERMONDE MATRIX is a matrix with the terms of a
// geometric progression in each row. This progression uses the terms of a
// given generator vector.

// Generates a Vandermonde matrix with a given generator vector gen.
double* Vandermonde(const double *gen,      // Given generator vector.
                    const int gen_length,   // Length of the generator vector.
                    const int pro_length,   // Length of the progression.
                    const bool transpose) { // Should create its transpose?

  double* TT {};  // Output Vandermonde matrix.

  // Check for the integrity of the arguments:
  if (gen == nullptr || gen_length < 1 || pro_length < 1) {
    return nullptr;
  }

  try {
    TT = new double[gen_length*pro_length];
    memset(TT, 0.0, sizeof(TT[0])*gen_length*pro_length);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 3 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  if (!transpose) {
    for (auto ii = 0; ii < gen_length; ii++) {
      for (auto jj = 0; jj < pro_length; jj++) {
        TT[idx(ii,pro_length,jj)] = pow(gen[ii], (double) jj);
      }
    }
  } else {
    for (auto ii = 0; ii < pro_length; ii++) {
      for (auto jj = 0; jj < gen_length; jj++) {
        TT[idx(ii,gen_length,jj)] = pow(gen[jj], (double) ii);
      }
    }
  }

  return TT;
}

// Prints a dense matrix.
bool MTK_DenseMatrix_Print(const double* aa,      // The matrix.
                           const int num_rows,    // Number of rows.
                           const int num_cols) {  // Number of columns.

  if (aa == nullptr || num_rows < 1 || num_cols < 1) {
    cerr << "ERROR printing matrix of line " << __LINE__ << endl;
    cerr << "Exiting..." << endl;
    return false;
  }

  for (auto ii = 0; ii < num_rows; ii++) {
    for (auto jj = 0; jj < num_cols; jj++) {
      cout << setw(12) << aa[idx(ii,num_cols,jj)];
    }
    cout << endl;
  }
  cout << endl;
  return true;
}

int main () {

  // Input:
  int kk {8}; // Order of numerical accuracy of the operator.

  // Variables:
  double* gg {};    // Spatial coordinates for the boundary points.
  double* AA {};    // Boundary Vandermonde to get the null-space.
  double* tau {};   // The scalar factors of elementary reflectors for QR.
  double* work {};  // Work array for dgels_ solver and dgeqrf_ QR factor.

  int info {0};         // Information regarding solution of the system.
  int num_bndy_approxs; // Required boundary points for uniform order accuracy.
  int AA_num_rows;      // Number rows for the boundary Vandermonde matrices.
  int AA_num_cols;      // Number cols for the boundary Vandermonde matrices.
  int AA_ld;            // Leading dimension of the boundary Vandermonde matrix.
  int lwork {-1};       // Length of work array for the dgels_ solver.

  // Compute amount of boundary points required to achieve uniform accuracy:
  num_bndy_approxs = (int) (3.0*((double) kk)/2.0);

  // Generator vector for the next approximation:
  try {
    gg = new double[num_bndy_approxs];
    memset(gg, 0, sizeof(gg[0])*num_bndy_approxs);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  gg[0] = -1.0/2.0;
  for (auto ii = 1; ii < num_bndy_approxs; ii++) {
    gg[ii] = gg[ii - 1] + 1.0;
  }

  if (VERBOSE) {
    cout << "gg =" << endl;
    for (auto ii = 0; ii < num_bndy_approxs; ii++) {
      cout << setw(12) << gg[ii];
    }
    cout << endl << endl;
  }

  // Before solving this and any other systems for the boundary matrices, we
  // must obtain the null-space from the first matrix. For this we will follow
  // recommendations given in:
  //
  // http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=5&t=4506
  //
  // We will compute the QR Factorization of the transpose, as in the
  // following (MATLAB) pseudo-code:
  //
  // [Q,R] = qr(A'); % Full QR as defined in
  // % http://www.stanford.edu/class/ee263/notes/qr_matlab.pdf
  // null-space = Q(:, last (kk/2 - 1) columns of Q);
  //
  // However, given the nature of the Vandermonde matrices we will compute--
  // we will compute dim_null of them--they will all posses the same
  // null-space. Therefore, we impose the convention of computing the
  // null-space of the first Vandermonde matrix only (for the west boundary).

  // Vandermonde matrix (using boundary coordinates as the generator):
  // We compute it NOT transposed, which is the ACTUAL transpose for
  // LAPACK's QR factor.

  // Create the Vandermonde matrix for this iteration:
  AA = Vandermonde(gg, num_bndy_approxs, kk + 1, true);
  if (AA == nullptr) {
    cerr << "ERROR constructing matrix at line " << __LINE__ - 2 << endl;
    cerr << "Exiting..." << endl;
    return EXIT_FAILURE;
  }

  AA_num_rows = kk + 1;
  AA_num_cols = num_bndy_approxs;
  AA_ld = AA_num_cols;

  if (VERBOSE) {
    // Although it is generated transposed, we print it non-transposed:
    cout << "AA =" << endl;
    if (!MTK_DenseMatrix_Print(AA, AA_num_rows, AA_num_cols)) {
      return EXIT_FAILURE;
    }
  }

  // Prepare to factorize: allocate and inquire for the value of lwork:
  try {
    tau = new double[AA_num_cols];
    memset(tau, 0, sizeof(tau[0])*AA_num_cols);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  try {
    work = new double[1];
    memset(work, 0, sizeof(work[0])*1);
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  info = 0;

  dgeqrf_(&AA_num_rows, &AA_num_cols, AA, &AA_ld, tau, work, &lwork, &info);

  if (info == 0) {
    lwork = (int) work[0];
  } else {
    cerr << "Could not get value for lwork on line " << __LINE__ - 2 << endl;
    cerr << "Exiting..." << endl;
    return EXIT_FAILURE;
  }

  // Once we know lwork, we can actually invoke the factorization:

  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
  try {
    work = new double[lwork];
    memset(work, 0, sizeof(work[0])*(lwork));
  } catch (bad_alloc &memory_allocation_exception) {
    cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl;
    cerr << memory_allocation_exception.what() << endl;
  }

  dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info);

  if (!info) {
    if (VERBOSE) {
      cout << "QR factorization successfully completed!" << endl << endl;
    }
  } else {
    cerr << "Something went wrong factoring! info = " << info << endl;
  }

  if (VERBOSE) {
    cout << "AA after QR (elementary reflectors of order M) =" << endl;
    for (auto ii = 0; ii < AA_num_rows; ii++) {
      for (auto jj = 0; jj < AA_num_cols; jj++) {
        cout << setw(12) << AA[idx(ii,AA_num_cols,jj)];
      }
      cout << endl;
    }
    cout << endl;
  }

  // Garbage collection :)
  if (gg != nullptr) {
    delete[] gg;
    gg = nullptr;
  }
  if (AA != nullptr) {
    delete[] AA;
    AA = nullptr;
  }
  if (tau != nullptr) {
    delete[] tau;
    tau = nullptr;
  }
  if (work != nullptr) {
    delete[] work;
    work = nullptr;
  }
}



It can be built as follows:

Code: Select all
ejspeiro@Eduardo-Alienware-14:~/Dropbox/mimops$ make
g++ -std=c++11 -c -g -Wall mwe3.cc
gfortran -o mwe3 mwe3.o ~/Libraries/lapack-3.4.1/liblapack.a ~/Libraries/BLAS/libblas.a -lstdc++


And outputs as follows:
Code: Select all
gg =
        -0.5         0.5         1.5         2.5         3.5         4.5         5.5         6.5         7.5         8.5         9.5        10.5

AA =
           1           1           1           1           1           1           1           1           1           1           1           1
        -0.5         0.5         1.5         2.5         3.5         4.5         5.5         6.5         7.5         8.5         9.5        10.5
        0.25        0.25        2.25        6.25       12.25       20.25       30.25       42.25       56.25       72.25       90.25      110.25
      -0.125       0.125       3.375      15.625      42.875      91.125     166.375     274.625     421.875     614.125     857.375     1157.62
      0.0625      0.0625      5.0625     39.0625     150.062     410.062     915.062     1785.06     3164.06     5220.06     8145.06     12155.1
    -0.03125     0.03125     7.59375     97.6562     525.219     1845.28     5032.84     11602.9     23730.5     44370.5     77378.1      127628
    0.015625    0.015625     11.3906     244.141     1838.27     8303.77     27680.6     75418.9      177979      377150      735092  1.3401e+06
  -0.0078125   0.0078125     17.0859     610.352     6433.93     37366.9      152244      490223 1.33484e+06 3.20577e+06 6.98337e+06  1.4071e+07
  0.00390625  0.00390625     25.6289     1525.88     22518.8      168151      837339 3.18645e+06 1.00113e+07 2.72491e+07  6.6342e+07 1.47746e+08

QR factorization successfully completed!

AA after QR (elementary reflectors of order M) =
     -3.4641    0.224009    0.224009    0.224009    0.224009    0.224009    0.224009    0.224009    0.224009    0.224009    0.224009    0.224009
    -17.3205     11.9583     0.14895   0.0832741   0.0175979  -0.0480783   -0.113755   -0.179431   -0.245107   -0.310783   -0.376459   -0.442136
    -127.883     119.583     36.5331    0.247348     0.33828    0.380356    0.373575    0.317938    0.213444   0.0600931   -0.142114   -0.393178
    -1052.22     1150.98     547.996    -107.624    0.177247   0.0484448  -0.0874815    -0.18914   -0.215138   -0.124085    0.125414    0.574749
    -9234.36     11061.4     6573.35    -2152.49    -306.814    0.303125    0.268976   0.0738031   -0.214325    -0.46264   -0.473674   0.0147402
    -84356.3      107266     73000.9    -31001.8    -7670.35     840.952   -0.272228   -0.415334   -0.255461    0.126994    0.285204   -0.732966
     -792070 1.05056e+06      784038     -391932     -129036     25228.6     2192.48    0.155459  -0.0739444 -0.00699654    0.408327    0.334625
-7.58693e+06 1.03823e+07  8.2843e+06-4.63725e+06-1.83164e+06      485536     76736.9      5356.1   -0.510053   -0.761635   0.0786269   -0.345146
 -7.3775e+07 1.03411e+08 8.68011e+07-5.28129e+07-2.37088e+07 7.64813e+06 1.66263e+06      214244     12000.1    0.153881    0.286217   -0.143035


According to Valgrind's memcheck, this mwe is bug free :)

Code: Select all
(...)
==23068== HEAP SUMMARY:
==23068==     in use at exit: 0 bytes in 0 blocks
==23068==   total heap usage: 26 allocs, 26 frees, 32,285 bytes allocated
==23068==
==23068== All heap blocks were freed -- no leaks are possible
==23068==
==23068== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==23068== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


What is my goal? My goal is to get THIS Q:

Code: Select all
Q =
  -0.288675  -0.288675  -0.288675  -0.288675  -0.288675  -0.288675  -0.288675  -0.288675  -0.288675  -0.288675  -0.288675  -0.288675
  -0.459933  -0.376309  -0.292685  -0.209061  -0.125436 -0.0418121  0.0418121   0.125436   0.209061   0.292685   0.376309   0.459933
   0.501828   0.228104 0.00912415  -0.155111    -0.2646  -0.319345  -0.319345    -0.2646  -0.155111 0.00912415   0.228104   0.501828
   0.459933 -0.0418121  -0.292685  -0.348434   -0.26481 -0.0975616  0.0975616    0.26481   0.348434   0.292685  0.0418121  -0.459933
  -0.368767   0.301718   0.368767   0.145272  -0.134097  -0.312893  -0.312893  -0.134097   0.145272   0.368767   0.301718  -0.368767
  -0.261608   0.451869   0.166478  -0.229898  -0.348811   -0.15855    0.15855   0.348811   0.229898  -0.166478  -0.451869   0.261608
   0.164197  -0.462738   0.164197   0.373176  0.0597081  -0.298541  -0.298541  0.0597081   0.373176   0.164197  -0.462738   0.164197
 -0.0904791   0.370142  -0.412914  -0.136541   0.335595   0.230311  -0.230311  -0.335595   0.136541   0.412914  -0.370142  0.0904791
  0.0430767   -0.23888   0.466011  -0.254544  -0.289789   0.274124   0.274124  -0.289789  -0.254544   0.466011   -0.23888  0.0430767
-0.00874036     0.0583  -0.146954   0.141249  0.0486822  -0.156543  -0.130333   0.566054  -0.653529   0.386018  -0.119774  0.0155708
  0.0117399 -0.0941426   0.322481  -0.603006   0.637649  -0.321721 -0.0245471   0.104354 -0.0153304 -0.0339244  0.0199423-0.00349546
  0.0106414 -0.0748491   0.203312  -0.217426  -0.109592   0.578839   -0.66744    0.29945  0.0592101  -0.129478  0.0558632-0.00853182


Which is 12 by 12, because I need to factorize the tranpose of a 9 by 12 matrix A.

How in the world can I recover Q? :( Well, so far we have discussed two options:
  • dorMqr
  • dorGqr

Using dorMqr, I get the Q I need, but I also get the memory error that originated this thread u.u'

I will keep trying... I will post the mwes to recover Q...

Thanks!
ejspeiro
 
Posts: 18
Joined: Wed Sep 26, 2012 11:30 am
Location: San Diego, CA

Re: Valgrind invalid read/write when using the LAPACK from C

Postby ejspeiro » Mon May 19, 2014 11:40 am

Well, I think I have successfully solved this problem.

I do not know what the problem was... BUT I basically revisited the routine to construct the Vandermonde matrices, while keeping a thorough differentiation of the matrices and their transpose.

I am calling this thread as solved 8)

If anybody is interested in how does the code looks like, please let me know.

The Q matrix (as discussed in previous threads), was used to attain the kernel. Once the kernel was attained, I had to scale it. Results look like this:

Image

Thanks to everybody involved on this thread :)

\m/
ejspeiro
 
Posts: 18
Joined: Wed Sep 26, 2012 11:30 am
Location: San Diego, CA


Return to Algorithm / Data

Who is online

Users browsing this forum: No registered users and 2 guests