gpu programming - Errors in Polynomial fitting problem on CUDA -


I tried to use CUDA to run some simple loops on the device, but it seems I find it difficult to understand how CUDA works. I'm getting 0 from every function call when I use the CUDA kernel together with the normal C code. Original code:

  double evaluate(int D, double tmp[], long *nfeval)
  {
      /* Polynomial fitting problem */
      int i, j;
      const int M = 60;
      double px, x = -1, dx = (double)M, result = 0;

      (*nfeval)++;
      dx = 2 / dx;
      for (i = 0; i <= M; i++)
      {
          px = tmp[0];
          for (j = 1; j < D; j++)
          {
              px = x * px + tmp[j];
          }
          if (px < -1 || px > 1)
              result += (1 - px) * (1 - px);
          x += dx;
      }

      px = tmp[0];
      for (j = 1; j < D; j++)
          px = 1.2 * px + tmp[j];
      px = px - 72.661;
      if (px < 0)
          result += px * px;

      px = tmp[0];
      for (j = 1; j < D; j++)
          px = -1.2 * px + tmp[j];
      px = px - 72.661;
      if (px < 0)
          result += px * px;

      return result;
  }

First I wanted to move the loop onto CUDA:

  double evaluate_gpu(int D, double tmp[], long *nfeval)
  {
      /* Polynomial fitting problem */
      int j;
      const int M = 60;
      double px, dx = (double)M, result = 0;

      (*nfeval)++;
      dx = 2 / dx;

      int N = M;
      double *device_tmp = NULL;
      size_t size_tmp = sizeof tmp;
      cudaMalloc((void **)&device_tmp, size_tmp);
      cudaMemcpy(device_tmp, tmp, size_tmp, cudaMemcpyHostToDevice);

      int block_size = 4;
      int n_blocks = N / block_size + (N % block_size == 0 ? 0 : 1);
      cEvaluate<<<n_blocks, block_size>>>(device_tmp, result, D);
      // cudaMemcpy(result, device_result, size_result, cudaMemcpyDeviceToHost);

      px = tmp[0];
      for (j = 1; j < D; j++)
          px = 1.2 * px + tmp[j];
      px = px - 72.661;
      if (px < 0)
          result += px * px;

      px = tmp[0];
      for (j = 1; j < D; j++)
          px = -1.2 * px + tmp[j];
      px = px - 72.661;
      if (px < 0)
          result += px * px;

      return result;
  }

where the device function looks like this:

  __global__ void cEvaluate_temp(double *tmp, double result, int d)
  {
      int M = 60;
      double px;
      double x = -1;
      double dx = (double)M;
      int j;
      dx = 2 / dx;
      int idx = blockIdx.x * blockDim.x + threadIdx.x;
      if (idx < 60)   // <==> if (idx < M)
      {
          px = tmp[0];
          for (j = 1; j < d; j++)
          {
              px = x * px + tmp[j];
          }
          if (px < -1 || px > 1)
          {
              __syncthreads();
              result += (1 - px) * (1 - px);   // +=
          }
          x += dx;
      }
  }

I know that I have not narrowed the problem down precisely, and it appears that I have more than one.

I do not know when I have to copy a variable to the device explicitly, and when it gets copied 'automatically'. Also, I am using CUDA 3.2 and have problems with emulation (I would like to use printf): when I build with nvcc with emu=1, printf compiles without error, but I also get no output.
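For what it's worth, here is a sketch of the explicit transfer pattern (the names `device_result` and `run` are mine, not from the original code): nothing is copied automatically for data behind pointers; scalar kernel arguments passed by value are copied to the device at launch, but writes to them never come back to the host. To get a result out, allocate device memory for it and cudaMemcpy it back:

```cuda
// Hypothetical sketch of explicit host <-> device transfers for one result.
// Kernel parameters passed by value (like d, or the old 'double result') are
// copied in at launch time, but writes to them are lost; output must travel
// through a device pointer.
__global__ void cEvaluate(double *tmp, double *result, int d)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx == 0)
        *result = tmp[0] + d;   // toy computation, just to show the pattern
}

void run(double *tmp, int d, int n)
{
    double *device_tmp = NULL, *device_result = NULL;
    double result = 0;

    cudaMalloc((void **)&device_tmp, n * sizeof(double));   // not sizeof tmp!
    cudaMemcpy(device_tmp, tmp, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMalloc((void **)&device_result, sizeof(double));

    cEvaluate<<<1, 64>>>(device_tmp, device_result, d);

    // cudaMemcpy blocks until the kernel has finished, then copies back.
    cudaMemcpy(&result, device_result, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(device_tmp);
    cudaFree(device_result);
}
```

Note that in evaluate_gpu above, `sizeof tmp` yields the size of a pointer (an array parameter decays to `double *`), so only 4 or 8 bytes get copied; the element count has to be passed in explicitly.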

Below is the simplest version of the device function I have tested. Can anyone explain what happens to the result value after it is incremented in parallel? I think I should use device shared memory and synchronization to do sth like "+=".

  __global__ void cEvaluate(double *tmp, double result, int d)
  {
      int idx = blockIdx.x * blockDim.x + threadIdx.x;
      if (idx < 60)   // <==> if (idx < m)
      {
          result += 1;
          printf("res = %f", result);   // -deviceemu, emu=1
      }
  }

No, the variable result is not shared across the threads.

What I recommend is to have an array of values in shared memory, one result per thread; compute each value there, and then reduce it to a single value.

  __global__ void cEvaluate_temp(double *tmp, double *global_result, int d)
  {
      int M = 60;
      double px;
      double x = -1;
      double dx = (double)M;
      int j;
      dx = 2 / dx;
      int idx = blockIdx.x * blockDim.x + threadIdx.x;

      __shared__ double result[BLOCK_SIZE];   // one partial result per thread

      if (idx >= 60)
          return;

      result[threadIdx.x] = 0;
      px = tmp[0];
      for (j = 1; j < d; j++)
      {
          px = x * px + tmp[j];
      }
      if (px < -1 || px > 1)
      {
          result[threadIdx.x] += (1 - px) * (1 - px);
      }
      x += dx;

      __syncthreads();
      if (threadIdx.x == 0)
      {
          double total_result = 0;
          for (j = 0; j < blockDim.x; j++)
              total_result += result[j];
          global_result[0] = total_result;
      }
  }

In addition to this, you need a cudaMemcpy after the kernel invocation to bring the result back. The kernel launch is asynchronous and needs a synchronizing call.

Also, use an error-check function on each CUDA API invocation.
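A common pattern (the macro name here is mine, not from the original answer) is a small wrapper that checks the return code of every call:

```cuda
// Minimal error-check wrapper sketch. Every CUDA runtime call returns a
// cudaError_t; check it immediately and print a readable message.
#include <stdio.h>

#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err = (call);                                   \
        if (err != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                    cudaGetErrorString(err), __FILE__, __LINE__);   \
        }                                                           \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaMalloc((void **)&device_tmp, n * sizeof(double)));
//   cEvaluate_temp<<<n_blocks, block_size>>>(device_tmp, device_result, d);
//   CUDA_CHECK(cudaGetLastError());   // catches kernel launch errors
```

Kernel launches themselves return nothing, which is why cudaGetLastError() right after the launch is the way to catch launch failures.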
