C++ - OpenACC parallel kernels not getting generated
I am developing code with pgc++ in order to accelerate it on a GPU.
- I am using Open Babel, which has an Eigen dependency.
- I have tried using #pragma acc kernel
- I have tried using #pragma acc routine
- My compilation command is: "pgc++ -acc -ta=tesla -Minfo=all -I/home/pranav/new_installed/include/openbabel-2.0/ -I/home/pranav/new_installed/include/eigen3/ -L/home/pranav/new_installed/lib/openbabel/ main.cpp /home/pranav/new_installed/lib/libopenbabel.so"
I am getting the following error:
PGCC-S-0155-Procedures called in a compute region must have acc routine information: OpenBabel::OBMol::SetTorsion(OpenBabel::OBAtom *, OpenBabel::OBAtom *, OpenBabel::OBAtom *, OpenBabel::OBAtom *, double) (main.cpp: 66)
PGCC-S-0155-Accelerator region ignored; see -Minfo messages (main.cpp)
bondrot::two(std::vector>, OpenBabel::OBMol, int, OpenBabel::OBMol):
     11, include "bondrot.h"
          0, Accelerator region ignored
         66, Accelerator restriction: call to 'OpenBabel::OBMol::SetTorsion(OpenBabel::OBAtom *, OpenBabel::OBAtom *, OpenBabel::OBAtom *, OpenBabel::OBAtom *, double)' with no acc routine information
PGCC/x86 Linux 15.10-0: compilation completed with severe errors
Note: line 66 is "mol.SetTorsion(a[0],a[1],a[2],a[3],i*(3.14159265358979323846/180));" in the code pasted below.
The code that produces the error is as follows:
#pragma acc routine
public:
bool two(vector<OBAtom *> a)
{
    std::ostringstream bestanglei, bestanglej;
    for (unsigned int i = 0; i <= 360; i = i + res)
    {
        for (unsigned int j = 0; j <= 360; j = j + res)
        {
            mol.SetTorsion(a[0], a[1], a[2], a[3], i * (3.14159265358979323846 / 180));
            //cout<<i<<"\n";
        }
    }
    return true;
}
From a preliminary search on Google, I got the idea that the error is occurring because of a "back dependency" on mol (an OBMol object). If anyone knows a solution, please help me out.
In order to call a routine from within device code, there must be a device version of that routine available. In this case, the compiler can't find one for the "OpenBabel::OBMol::SetTorsion" routine. You'll need to add the "#pragma acc routine" directive to the library routine's prototype and definition, then compile the library with PGI and "-acc". Any routines that SetTorsion might call will need device versions as well.
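As a minimal sketch of what the routine directive looks like on both a prototype and a definition, consider the following. The functions `degrees_to_radians` and `fill_radians` are hypothetical stand-ins invented for illustration; they are not part of Open Babel, and the real SetTorsion would need the same treatment inside the library's own header and source files.

```cpp
#include <cassert>

// Hypothetical stand-in for a library routine such as SetTorsion.
// The directive goes on the prototype (normally in the header)...
#pragma acc routine seq
double degrees_to_radians(double degrees);

// ...and again on the definition (normally in the library source file),
// so the compiler generates a device version of the routine.
#pragma acc routine seq
double degrees_to_radians(double degrees) {
    return degrees * (3.14159265358979323846 / 180.0);
}

// Host function: fills out[0..n-1] with i degrees converted to radians.
// The call inside the loop is legal in device code because the callee
// carries acc routine information.
void fill_radians(double* out, int n) {
    assert(out != nullptr && n > 0);  // host-side sanity check
    #pragma acc parallel loop copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = degrees_to_radians(static_cast<double>(i));
    }
}
```

When built with an OpenACC compiler (for example pgc++ with "-acc"), the loop is offloaded; a compiler without OpenACC support ignores the pragmas and runs the same loop on the host.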
Alternatively, you can try to inline these routines.
Note that you will have issues trying to write to I/O streams and files from device code. Limited support for unformatted stdout is available, where the output from the threads is buffered, transferred to the host, and then printed by the OS.
You'll also have issues using std::vector. Besides not being thread safe, aggregate data types with dynamic data members are not yet supported in OpenACC. There are ways to handle these structures if you're willing to manage the data in the structure yourself, or to use CUDA Unified Memory (-ta=tesla:managed). If you're interested, I gave a talk on the subject at GTC2015 that you can review at: https://www.youtube.com/watch?v=rwlmzt_u5u4
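One way to manage the data yourself, sketched here under the assumption that the vector holds plain values such as double, is to leave the std::vector on the host and name only its raw buffer in the data clause. The function `scale_in_place` is a hypothetical example, not something from the question's code.

```cpp
#include <cassert>
#include <vector>

// Scales every element in place. The vector object itself stays on the
// host; only the contiguous buffer behind it is copied to and from the device.
void scale_in_place(std::vector<double>& values, double factor) {
    if (values.empty()) {
        return;  // nothing to transfer or compute
    }
    double* data = values.data();                   // raw pointer to the buffer
    const int n = static_cast<int>(values.size());  // fixed size for the data clause
    assert(data != nullptr);                        // host-side sanity check

    #pragma acc parallel loop copy(data[0:n])
    for (int i = 0; i < n; ++i) {
        data[i] *= factor;  // no std::vector member calls inside the region
    }
}
```

The vector must not be resized while the region is running, since that could reallocate the buffer the pointer refers to. A vector of pointers, such as the one in the question, would additionally need each pointed-to object made available on the device, which this sketch does not cover.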
Hope this helps, Mat