Óbuda University - CUDA Teaching Center

List of CUDA related publications

Title Parallel biomedical image processing with GPGPU-s in cancer research
Authors A. Reményi, S. Szénási, I. Bándi, Z. Vámossy, G. Valcz, P. Bogdanov, S. Sergyán, and M. Kozlovszky
Reference as A. Reményi, S. Szénási, I. Bándi, Z. Vámossy, G. Valcz, P. Bogdanov, S. Sergyán, and M. Kozlovszky, "Parallel biomedical image processing with GPGPU-s in cancer research," in 3rd IEEE International Symposium on Logistics and Industrial Informatics (LINDI 2011), Budapest, 2011, pp. 245-248.
Abstract The main aim of this work is to show how GPGPUs can facilitate certain types of image processing methods. The software described in this paper detects a special tissue component, the cell nuclei, in hematoxylin-eosin (HE) stained colon tissue sample images. Since pathologists work with a large number of high-resolution images, which require significant storage space, one feasible way to achieve a reasonable processing time is to use GPGPUs. The CUDA software development kit was used to develop the processing algorithms for NVIDIA GPUs. Our work focuses on how to achieve better performance with coalesced global memory access when working with three-channel RGB tissue images, and how to use the on-die shared memory efficiently.
Download Download full paper as pdf or from IEEE Xplore.

Title GPGPU-based data parallel region growing algorithm for cell nuclei detection
Authors S. Szénási, Z. Vámossy, and M. Kozlovszky
Reference as S. Szénási, Z. Vámossy, and M. Kozlovszky, "GPGPU-based data parallel region growing algorithm for cell nuclei detection," in Computational Intelligence and Informatics (CINTI), 2011 IEEE 12th International Symposium on, Budapest, 2011, pp. 493-499.
Abstract Nowadays, microscopic analysis of tissue samples is increasingly done using digital imagery and special immunodiagnostic software. These are typically specific applications developed for one distinct field, but some subroutines are commonly repeated; for example, several applications contain steps that detect cell nuclei in a sample image. The aim of our research is to develop a new data-parallel algorithm that can be implemented even in a GPGPU environment and that is capable of counting hematoxylin-eosin (HE) stained cell nuclei and identifying their exact locations and sizes (using a variation of the region growing method). Our presentation contains a detailed description of the algorithm, the peculiarities of the CUDA implementation, and an evaluation of the created application (regarding its accuracy and the decrease in execution time).
Download Download full paper as pdf or from IEEE Xplore.
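
The region growing idea named in the abstract above can be illustrated with a small sequential sketch. This is not the authors' data-parallel CUDA algorithm; it is a generic textbook variant, and the function name `region_grow`, the 4-connected neighbourhood, and the fixed intensity threshold are all assumptions made for illustration.

```python
from collections import deque

def region_grow(image, seed, threshold):
    """Grow a region from `seed`, adding 4-connected pixels whose intensity
    differs from the seed intensity by at most `threshold` (assumed rule)."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= threshold:
                    region.add((nr, nc))
                    frontier.append((nr, nc))
    return region

# Toy 4x4 grayscale image: a dark 2x2 "nucleus" on a bright background.
image = [
    [200, 200, 200, 200],
    [200,  40,  50, 200],
    [200,  45,  55, 200],
    [200, 200, 200, 200],
]
nucleus = region_grow(image, seed=(1, 1), threshold=20)
print(sorted(nucleus))
```

In this sketch the size of the returned set plays the role of the nucleus size, and the pixel coordinates give its location.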

Title GPGPU - Új eszközök az információbiztonság területén [GPGPU - New tools in the field of information security]
Authors S. Szénási
Reference as S. Szénási, "GPGPU - Új eszközök az információbiztonság területén," in HADMÉRNÖK, vol. 2009, no. 4, pp. 362-373, 2009.
Abstract In the last few years, the general trend in processor development has been to increase the number of processing units. This means not only introducing multi-core processors, but also developing patent architectures like GPGPUs (General Purpose computation on Graphics Processing Units). The cost/benefit ratio of these devices is outstanding, but their peak performance can be achieved only by properly designed and implemented algorithms. In practice, therefore, this performance is usually not (or not fully) achievable; this is the main topic of this article.
Download Download full paper as pdf.

Title Preparing initial population of genetic algorithm for region growing parameter optimization
Authors S. Szénási, Z. Vámossy, and M. Kozlovszky
Reference as S. Szénási, Z. Vámossy, and M. Kozlovszky, "Preparing initial population of genetic algorithm for region growing parameter optimization," in 4th IEEE International Symposium on Logistics and Industrial Informatics, Smolenice, 2012, pp. 47-54.
Abstract The processing of microscopic tissue images is nowadays increasingly done using special immunodiagnostic evaluation software products. The first step in evaluating the samples is often to determine the number and location of cell nuclei. One of the most promising methods for this is region growing, but this algorithm is very sensitive to the appropriate setting of its parameters. Because of the large number of parameters and the large set of possible values, setting these parameters manually is quite a hard task, so we developed a genetic algorithm to optimize these values. The first step of the development is the statistical analysis of the parameters and the determination of the important features, in order to extract valuable information for a to-be-implemented genetic algorithm that will perform the optimization. © 2012 IEEE.
Download Download full paper as pdf or from IEEE Xplore.

Title Evaluation and comparison of cell nuclei detection algorithms
Authors S. Szénási, Z. Vámossy, and M. Kozlovszky
Reference as S. Szénási, Z. Vámossy, and M. Kozlovszky, "Evaluation and comparison of cell nuclei detection algorithms," in Proceedings of the 16th IEEE International Conference on Intelligent Engineering Systems 2012, Lisbon, 2012, pp. 469-475.
Abstract The processing of microscopic tissue images, and especially the detection of cell nuclei, is nowadays increasingly done using digital imagery and special immunodiagnostic software products. Since several methods (and applications) have been developed for the same purpose, it is important to have a metric that determines which one is more efficient than the others. The purpose of the article is to develop a generally usable metric that is based on the “gold standard” tests used in the field of medicine and that can be used to evaluate any image segmentation algorithm. Since interpreting the results themselves can be a rather time-consuming task, the article also contains a recommendation for an efficient implementation and a simple example comparing three algorithms used for cell nuclei detection.
Download Download full paper as pdf or from IEEE Xplore.

Title Implementation of a Distributed Genetic Algorithm for Parameter Optimization in a Cell Nuclei Detection Project
Authors S. Szénási, and Z. Vámossy
Reference as S. Szénási, and Z. Vámossy, "Implementation of a Distributed Genetic Algorithm for Parameter Optimization in a Cell Nuclei Detection Project," in ACTA POLYTECHNICA HUNGARICA, vol. 10, no. 4, pp. 59-86, 2013.
Abstract The processing of microscopic tissue images and especially the detection of cell nuclei is nowadays done more and more using digital imagery and special immunodiagnostic software products. One of the most promising image segmentation methods is region growing, but this algorithm is very sensitive to the appropriate setting of different parameters, and the long runtime due to its high computing demand reduces its practical usability. As a result of our research, we managed to develop a data-parallel region growing algorithm that is two or three times faster than the original sequential version. The paper summarizes our results: the development of an evolution-based algorithm that was used to successfully determine a set of parameters that could be used to achieve significantly better accuracy than the already existing parameters.
Download Download full paper as pdf.

Title Evolutionary Algorithm for Optimizing Parameters of GPGPU-based Image Segmentation
Authors S. Szénási, and Z. Vámossy
Reference as S. Szénási, and Z. Vámossy, "Evolutionary Algorithm for Optimizing Parameters of GPGPU-based Image Segmentation," in ACTA POLYTECHNICA HUNGARICA, vol. 10, no. 5, pp. 7-28, 2013.
Abstract The use of digital microscopy allows diagnosis through automated quantitative and qualitative analysis of the digital images. The first step in evaluating the samples is often to determine the number and location of cell nuclei. For this purpose, we have developed a GPGPU-based data-parallel region growing algorithm (implemented in the CUDA environment) that is as accurate as the already existing sequential versions but two or three times faster. However, this algorithm is very sensitive to the appropriate setting of its parameters. Because of the large number of parameters and the large set of possible values, setting these parameters manually is quite a hard task, so we have developed a genetic algorithm to optimize these values. The evolution-based algorithm described in this paper was used to successfully determine a set of parameters that yields a significant improvement over the previously known best set of parameters.
Download Download full paper as pdf.

Title Adatpárhuzamos sejtmagkeresési eljárás fejlesztése és paramétereinek optimalizálása [Development of a data-parallel cell nuclei detection method and optimization of its parameters]
Authors S. Szénási
Reference as S. Szénási, "Adatpárhuzamos sejtmagkeresési eljárás fejlesztése és paramétereinek optimalizálása," 2013, pp. 1-125.
Download Download full paper as pdf.

Title Medical Image Segmentation with Split-and-Merge Method
Authors S. Szénási
Reference as S. Szénási, "Medical Image Segmentation with Split-and-Merge Method," in LINDI 2013, Wildau, 2013, pp. 137-140.
Abstract The processing of microscopic tissue images, and especially the detection of cell nuclei, is nowadays increasingly done using digital imagery and special immunodiagnostic software products. One of the most promising methods is region growing, but it is quite memory intensive. The size of high-resolution tissue images can easily reach the order of a hundred megabytes; therefore, the memory requirement for region growing is more than one gigabyte. To enable execution on low-end clients, we have to split the whole image into smaller tiles and, after processing each individual tile, merge the results.
Download Download full paper as pdf.
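
The tile-based idea from the abstract above (split the image, process each tile, merge the results) can be sketched as follows. The per-tile detector here is a hypothetical stand-in that only reports dark pixels, and the helper names and the non-overlapping tiling are assumptions; a real merge step would also have to reconcile objects that cross tile borders, which this sketch does not attempt.

```python
def split_into_tiles(height, width, tile_size):
    """Yield (top, left, bottom, right) bounds that together cover the image."""
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            yield top, left, min(top + tile_size, height), min(left + tile_size, width)

def detect_in_tile(image, bounds, threshold):
    """Hypothetical per-tile detector: tile-local coordinates of dark pixels."""
    top, left, bottom, right = bounds
    return [(r - top, c - left)
            for r in range(top, bottom)
            for c in range(left, right)
            if image[r][c] < threshold]

def process_tiled(image, tile_size, threshold):
    """Process each tile independently, then merge into global coordinates."""
    merged = []
    for bounds in split_into_tiles(len(image), len(image[0]), tile_size):
        top, left, _, _ = bounds
        for r, c in detect_in_tile(image, bounds, threshold):
            merged.append((top + r, left + c))  # translate back to image space
    return sorted(merged)

image = [
    [200, 200, 200, 200, 200],
    [200,  40, 200, 200,  30],
    [200, 200, 200, 200, 200],
]
tiles = list(split_into_tiles(3, 5, tile_size=2))
hits = process_tiled(image, tile_size=2, threshold=100)
print(len(tiles), hits)
```

Only one tile needs to be resident in memory at a time, which is the property that makes this approach attractive for low-memory clients.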

Title Genetic Algorithm for Parameter Optimization of Image Segmentation Algorithm
Authors S. Szénási
Reference as S. Szénási, "Genetic Algorithm for Parameter Optimization of Image Segmentation Algorithm," in CINTI 2013 : Proceeding of the 14th IEEE International Symposium on Computational Intelligence and Informatics, Budapest, 2013, pp. 351-354.
Abstract In the current practice of medicine, histopathological examinations are some of the most important tools for clinical diagnoses of a large group of diseases. To help pathologists and to reduce the subjectivity level, it has been proposed that computer-aided procedures be used to provide objective results. The first step of these procedures is the segmentation of the tissue image. In our research, we try to detect nuclei, glands and surface epithelium in Haematoxylin and Eosin (HE) stained colon tissue samples. This paper focuses on the identification of epithelial cell nuclei.
Download Download full paper as pdf.

Title Distributed Implementations of Cell Nuclei Detection Algorithm
Authors S. Szénási
Reference as S. Szénási, "Distributed Implementations of Cell Nuclei Detection Algorithm," in Recent Advances in Image, Audio and Signal Processing, Budapest, 2013, pp. 105-109.
Abstract Signal processing plays an important role in the work of pathologists; it is especially true for image processing software products. High-resolution digital images have taken over the role of traditional tissue slides on a glass plate. In addition to the direct effects of this advancement (sharing images, remote access, etc.), a new option appeared: the possibility of using image processing software for automatic (or semi-automatic) diagnostics. One of the most important tasks in this procedure is the segmentation of the tissue images; we have to identify the main components (in the case of colon tissue samples, these are the cell nuclei, glands and surface epithelium). There are several traditional image segmentation methods for this purpose, but none of them provides both acceptable accuracy and runtime. This paper presents a distributed region growing method implemented on CPUs and GPGPUs.
Download Download full paper as pdf.

Title A Grafikus Hardveren (GPGPU) Implementált Alkalmazások Sebezhetőségei [Vulnerabilities of Applications Implemented on Graphics Hardware (GPGPU)]
Authors S. Sergyán, S. Szénási, and Z. Vámossy
Reference as S. Sergyán, S. Szénási, and Z. Vámossy, "A Grafikus Hardveren (GPGPU) Implementált Alkalmazások Sebezhetőségei," in HADMÉRNÖK, vol. IX., no. 1, pp. 249-257, 2014.
Abstract Traditionally, graphics cards were responsible for visualizing content on the screen; however, these devices have taken on more and more tasks over the last few decades. The appearance of the first 3D accelerator cards changed the video adapter industry, and in the following years these new functions were integrated into GPUs (Graphical Processing Units). Nowadays, GPGPU (General Purpose Graphical Processing Unit) programming is becoming more and more common, especially in the field of High Performance Computing. At first, in the case of games and early research projects, data security was not an important factor. Nowadays, however, there are several GPGPU applications working with sensitive (personal, business, governmental) data. This paper deals with several questions concerning possible security holes and attack methods.
Download Download full paper as pdf.

Title Segmentation of colon tissue sample images using multiple graphics accelerators
Authors S. Szénási
Reference as S. Szénási, "Segmentation of colon tissue sample images using multiple graphics accelerators," in COMPUTERS IN BIOLOGY AND MEDICINE, vol. 51, pp. 93-103, 2014.
Abstract Nowadays, processing medical images is increasingly done through using digital imagery and custom software solutions. The distributed algorithm presented in this paper is used to detect special tissue parts, the nuclei on haematoxylin and eosin stained colon tissue sample images. The main aim of this work is the development of a new data-parallel region growing algorithm that can be implemented even in an environment using multiple video accelerators. This new method has three levels of parallelism: (a) the parallel region growing itself, (b) starting more region growing in the device, and (c) using more than one accelerator. We use the split-and-merge technique based on our already existing data-parallel cell nuclei segmentation algorithm extended with a fast, backtracking-based, non-overlapping cell filter method. This extension does not cause significant degradation of the accuracy; the results are practically the same as those of the original sequential region growing method. However, as expected, using more devices usually means that less time is needed to process the tissue image; in the case of the configuration of one central processing unit and two graphics cards, the average speed-up is about 4–6×. The implemented algorithm has the additional advantage of efficiently processing very large images with high memory requirements.
Download Download full paper as pdf.

Title Distributed region growing algorithm for medical image segmentation
Authors S. Szénási
Reference as S. Szénási, "Distributed region growing algorithm for medical image segmentation," in INTERNATIONAL JOURNAL OF CIRCUITS, SYSTEMS AND SIGNAL PROCESSING, vol. 8, pp. 173-181, 2014.
Abstract Signal processing plays an important role in the work of pathologists; it is especially true for image processing software products. High-resolution digital images have taken over the role of traditional tissue slides on a glass plate. In addition to the direct effects of this advancement (sharing images, remote access, etc.), a new option appeared: the possibility of using image processing software for automatic (or semi-automatic) diagnostics. One of the most important tasks in this procedure is the segmentation of the tissue images; we have to identify the main components (in the case of colon tissue samples, these are the cell nuclei, glands and surface epithelium). There are several traditional image segmentation methods for this purpose, but none of them provides both acceptable accuracy and runtime. This paper presents a distributed region growing method implemented on CPUs and GPGPUs.
Download Download full paper as pdf.

Title Optimizing General Purpose Computations Using Kepler Based Graphics Accelerators
Authors S. Szénási
Reference as S. Szénási, "Optimizing General Purpose Computations Using Kepler Based Graphics Accelerators," in International Masaryk Conference for PhD students and young researchers, Hradec Králové, 2014, pp. 3354-3360.
Abstract The programming of GPUs (Graphics Processing Units) is ready for practical applications; the largest industry players (including research centres, financial and analyst corporations) have already announced that they use these new devices for high-performance computing applications. There are several well-known areas, like image processing, simulations and obviously 3D graphics, where we can use these devices very efficiently. In this paper, we would like to show that, beyond these well-known topics, GPU programming is able to speed up more general-purpose applications. The key is the data-parallel nature of the algorithm and the minimization of data transfers between the CPU and the GPU.
Download Download full paper as pdf.

Title Solving Multiple Quartic Equations on the GPU using Ferrari's Method
Authors S. Szénási, and Á. Tóth
Reference as S. Szénási, and Á. Tóth, "Solving Multiple Quartic Equations on the GPU using Ferrari's Method," in SAMI 2015 • IEEE 13th International Symposium on Applied Machine Intelligence and Informatics, Herlany, 2015, pp. 333-337.
Abstract As is well known, quartics are the highest-degree polynomials that can be solved analytically in general by radicals. Many problems require solving not one but many independent equations; in simulations, the number of equations can be very high. For this reason, it is worth examining the runtime of solver algorithms implemented for multi-core systems, especially graphics accelerators. In this paper, we discuss the runtime and numerical stability of Ferrari’s method on GPUs. It is worth porting an application to the graphics card if the number of calculations is relatively high and the number and volume of memory accesses are relatively small. Based on the results, running multiple equation solvers based on this method clearly meets these conditions.
Download Download full paper as pdf.

Title Solving One-dimensional IHCP with Particle Swarm Optimization using Graphics Accelerators
Authors S. Szénási, I. Felde, and I. Kovács
Reference as S. Szénási, I. Felde, and I. Kovács, "Solving One-dimensional IHCP with Particle Swarm Optimization using Graphics Accelerators," in 10th Jubilee IEEE International Symposium on Applied Computational Intelligence and Informatics (SACI 2015), Timisoara, 2015, pp. 365-369.
Abstract There are several implicit and explicit formulations to solve the Inverse Heat Conduction Problem. One of the most promising methods is the Particle Swarm Optimization; however, it needs a long time to find solutions for large scale problems (large swarm populations). This paper presents the implementation and the evaluation of a parallel approach using graphics accelerators. This GPU implementation is about three times faster than the original CPU based method.
Download Download full paper as pdf.
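
For readers unfamiliar with Particle Swarm Optimization, the following is a minimal sequential sketch of the general technique on a one-dimensional toy objective. It is not the authors' IHCP solver or their GPU implementation; the function name `pso_minimize`, the inertia and acceleration weights, and the quadratic objective are all assumptions chosen for illustration.

```python
import random

def pso_minimize(fitness, lower, upper, n_particles=20, n_iters=100, seed=0):
    """Minimal 1-D particle swarm optimizer (illustrative parameter values)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights (assumed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                        # personal best positions
    best_fit = [fitness(x) for x in pos]     # personal best fitness values
    g = min(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g], best_fit[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Move the particle and clamp it to the search interval.
            pos[i] = min(max(pos[i] + vel[i], lower), upper)
            f = fitness(pos[i])
            if f < best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], f
                if f < g_fit:
                    g_pos, g_fit = pos[i], f
    return g_pos, g_fit

# Toy objective with its minimum at x = 3 (stands in for a real fitness function).
best_x, best_f = pso_minimize(lambda x: (x - 3.0) ** 2, -10.0, 10.0)
print(best_x, best_f)
```

The fitness evaluations of the individual particles do not depend on each other within an iteration, which is why larger swarms are a natural fit for parallel hardware.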

Title Performance Measurement of a General Multi-Scale Template Matching Method
Authors G. Kertész, S. Szénási, and Z. Vámossy
Reference as G. Kertész, S. Szénási, and Z. Vámossy, "Performance Measurement of a General Multi-Scale Template Matching Method," in 19th IEEE International Conference on Intelligent Engineering Systems, Bratislava, 2015, pp. 153-158.
Abstract Generally, template matching methods are very sensitive to image orientation, rotation, or the size of the template. However, there are known cases in which they perform better than keypoint-based techniques. This article demonstrates a massively parallelizable method for multi-scale template matching. By implementing it sequentially on the CPU and with a naive approach on the GPU, the runtimes can be measured during a test with multiple scale sizes. With the use of the GPU, the runtimes are 12 times lower. These results indicate that there is room for further research and for developing a GPU-based solution to handle real-time video processing.
Download Download full paper as pdf or from IEEE Xplore.

Title GPU Implementation of DBSCAN Algorithm for Searching Multiple Accident Black Spots
Authors S. Szénási
Reference as S. Szénási, "GPU Implementation of DBSCAN Algorithm for Searching Multiple Accident Black Spots," in 15th International Multidisciplinary Scientific GeoConference (SGEM2015), Albena, 2015, pp. 647-652.
Abstract Nowadays, graphics cards are not only used for processing video data but are also suitable for other general-purpose operations. We can use these graphics accelerators in the field of road accident black spot searching. There are several black spot searching methods, and most of them are very time and resource consuming; therefore, it is worth implementing them as parallel GPU codes. This paper presents a naive implementation of the DBSCAN algorithm to speed up accident black spot localization. We can run multiple DBSCAN searches from different points in parallel to find clusters of accidents with high accident density.
Download Download full paper as pdf.
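
The DBSCAN clustering named in the abstract above can be sketched sequentially as follows. This is a generic, naive O(n²) version of the well-known algorithm, not the paper's GPU code; the planar toy coordinates (instead of GPS data), the Euclidean distance, and the parameter values are assumptions for illustration.

```python
def dbscan(points, eps, min_pts):
    """Naive O(n^2) DBSCAN; returns one label per point (-1 means noise)."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point: border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

# Toy accident locations: two dense groups and one isolated accident.
accidents = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(accidents, eps=1.5, min_pts=3)
print(labels)
```

Each dense group receives its own cluster label, which corresponds to a candidate black spot, while the isolated accident is marked as noise.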

Title Determination of complex thermal boundary conditions using a Particle Swarm Optimization method
Authors I. Felde, S. Szénási, A. Kenéz, and R. Colas
Reference as I. Felde, S. Szénási, A. Kenéz, and R. Colas, "Determination of complex thermal boundary conditions using a Particle Swarm Optimization method," in Proceedings of 5th International Conference on Distortion Engineering 2015, Bremen, 2015, pp. 227-236.
Abstract A methodology based on Particle Swarm Optimization (PSO), a recent stochastic optimization technique, is outlined for solving complex inverse heat transfer problems. The temporally and spatially dependent Heat Transfer Coefficient on the surfaces of a cylindrical workpiece is recovered by solving the inverse heat conduction problem. The fitness function to be minimized by the PSO approach is defined by the deviation between the measured and the calculated temperatures. The PSO algorithm has been parallelized and implemented on a GPU architecture. Numerical results demonstrate that the Heat Transfer Coefficient functions can be determined using the PSO method, and that the GPU implementation provides a less time-consuming and accurate estimation.
Download Download full paper as pdf.

Title Parallelization Methods of the Template Matching Method on Graphics Accelerators
Authors G. Kertész, S. Szénási, and Z. Vámossy
Reference as G. Kertész, S. Szénási, and Z. Vámossy, "Parallelization Methods of the Template Matching Method on Graphics Accelerators," in 16th IEEE International Symposium on Computational Intelligence and Informatics, CINTI 2015, Budapest, 2015, pp. 161-164.
Abstract Template matching is a classic technique used in image processing for object detection. It is based on multiple matrix-based calculations, where there are no dependencies on partial results, so parallel solutions could be created. In this article two GPU implemented methods are presented and compared to the CPU-based sequential solution.
Download Download full paper as pdf or from IEEE Xplore.

Title Modified Particle Swarm Optimization Method to Solve One-dimensional IHCP
Authors S. Szénási, and I. Felde
Reference as S. Szénási, and I. Felde, "Modified Particle Swarm Optimization Method to Solve One-dimensional IHCP," in 16th IEEE International Symposium on Computational Intelligence and Informatics, CINTI 2015, Budapest, 2015, pp. 85-88.
Abstract Particle swarm optimization is one of the most promising methods for solving the inverse heat conduction problem. However, it needs a high computing capacity and a long time to find optimal solutions for large scale problems (large swarm populations). This paper presents a further improvement of our already published GPU-based method. Our new method extends the original algorithm with an initial temporary swarm (which contains significantly more participants than the simulated swarm to better cover the search space). The results are promising, the average fitness values are usually better when using this technique, and (thanks to the well-parallelized graphics accelerator based implementation) the runtime is quite similar.
Download Download full paper as pdf.

Title GPU Based Implementation of Inverse Heat Conduction Problem Solver
Authors S. Szénási, I. Felde, and I. Kovács
Reference as S. Szénási, I. Felde, and I. Kovács, "GPU Based Implementation of Inverse Heat Conduction Problem Solver," in BULETINUL STIINTIFIC AL UNIVERSITATII POLITEHNICA DIN TIMISOARA ROMANIA SERIA AUTOMATICA SI CALCULATORAE / SCIENTIFIC BULLETIN OF POLITECHNICA UNIVERSITY OF TIMISOARA TRANSACTIONS ON AUTOMATIC CONTROL AND COMPUTER SCIENCE, vol. 60(74), no. 1, pp. 5-10, 2015.
Abstract Inverse Heat Conduction Problem means that the surface Heat Transfer Coefficient (HTC)/Heat Flux (HF) must be determined from transient temperature measurements at given interior points. This is a typical ill-posed problem, because the solution’s behaviour does not change continuously with the initial conditions; therefore, there are no already known direct solutions. There are several heuristic methods to solve the IHCP, and one of the most promising methods is Particle Swarm Optimization (PSO) developed by Eberhart and Kennedy in 1995. It has the ability to find the optimal solution in very large parameter spaces; however, it has some limitations. The main weaknesses are the high computational demand (and consequently a large runtime), and the unpredictable chance to find only a local but not the global optimum. This paper presents the implementation and the evaluation of a graphics accelerator-based parallel approach, which has significantly lower runtime (this GPU implementation is about three times faster than the original CPU-based sequential method). Furthermore, the authors examined the relationship between the size of the initial swarms and the final fitness values. The results indicate that it is worth it to generate a larger initial swarm and continue the processing using smaller further swarms. This technique combines the advantages of the large (better accuracy) and small (lower runtime) particle counts.
Download Download full paper as pdf.

Title Multiprocessing of an Individual-Cell Based Model for Parameter Testing
Authors G. Kertész, D. Kiss, A. Lovrics, S. Szénási, and Z. Vámossy
Reference as G. Kertész, D. Kiss, A. Lovrics, S. Szénási, and Z. Vámossy, "Multiprocessing of an Individual-Cell Based Model for Parameter Testing," in Proceedings of the 11th IEEE International Symposium on Applied Computational Intelligence and Informatics SACI 2016, Timisoara, 2016, pp. 491-495.
Abstract To simulate the behaviour of cell-level biological interactions, cells can be modeled as individual agents interacting with each other. To identify the effects of parameter perturbation, a massive number of simulations are usually required that are feasible to carry out by parallel computation. In this article, a multiprocessing scheme is defined to reduce the overall runtime. The scheme relies on predicting the expected computational cost of each input parameter set, and scheduling them based on the Longest Processing Time (LPT) heuristic rule.
Download Download full paper as pdf or from IEEE Xplore.
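
The Longest Processing Time (LPT) rule mentioned in the abstract above is a standard greedy scheduling heuristic: jobs are sorted by decreasing predicted cost and each is given to the currently least-loaded worker. The sketch below shows only this generic rule; the function name `lpt_schedule` and the predicted costs are hypothetical, and the cost-prediction step described in the paper is not modelled.

```python
import heapq

def lpt_schedule(costs, n_workers):
    """Assign jobs to workers: longest predicted job first, least-loaded worker next."""
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    for job in sorted(range(len(costs)), key=lambda j: costs[j], reverse=True):
        load, w = heapq.heappop(heap)            # least-loaded worker
        assignment[w].append(job)
        heapq.heappush(heap, (load + costs[job], w))
    makespan = max(load for load, _ in heap)     # finishing time of the busiest worker
    return assignment, makespan

# Hypothetical predicted runtimes of six parameter sets, scheduled on two workers.
predicted = [7, 3, 5, 2, 4, 6]
assignment, makespan = lpt_schedule(predicted, n_workers=2)
print(assignment, makespan)
```

With these toy costs the two workers end up with loads of 14 and 13, close to the ideal even split of the total cost of 27.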

Title Data-parallel Implementation of Accident Black Spot Searching Method
Authors S. Szénási
Reference as S. Szénási, "Data-parallel Implementation of Accident Black Spot Searching Method," in ÓBUDA UNIVERSITY E-BULLETIN, vol. 6, no. 1, pp. 9-15, 2016.
Abstract Identification of accident hot spots of the public roadway (also called accident black spots) is one of the main tasks of road safety experts to avoid further traffic accidents and personal injuries. There are several available methods for this purpose, and one of the most promising of them is based on the GPS (Global Positioning System) coordinates of accidents and uses a well-known data-mining approach called DBSCAN (Density-based Spatial Clustering of Applications with Noise). This method is well parallelizable. Therefore, we can run multiple independent searches from different starting points of the search space. This paper presents a graphics accelerator based implementation of the original sequential algorithm to decrease the processing runtime.
Download Download full paper as pdf.

Title A Novel Method for Robust Multi-Directional Image Projection Computation
Authors G. Kertész, S. Szénási, and Z. Vámossy
Reference as G. Kertész, S. Szénási, and Z. Vámossy, "A Novel Method for Robust Multi-Directional Image Projection Computation," in 20th IEEE Jubilee International Conference on Intelligent Engineering Systems, Budapest, 2016, pp. 239-243.
Abstract This paper introduces a novel method to calculate multi-directional projections of square images, similar to the Radon transformation. Image projections are often used as object signatures for detection, matching and tracking techniques in computer vision. The Radon transformation provides a fast solution to calculate these pixel intensity sums. The proposed method is based on trigonometric functions and basic coordinate geometry. The solution is implemented sequentially, and the runtimes of a GPU-based implementation are measured and evaluated. The analysis of the results indicates that further research is warranted and that new parallel models should be discussed.
Download Download full paper as pdf or from IEEE Xplore.

Title Heat Transfer Simulation using GPUs
Authors S. Szénási, and I. Felde
Reference as S. Szénási, and I. Felde, "Heat Transfer Simulation using GPUs," in 20th IEEE Jubilee International Conference on Intelligent Engineering Systems, Budapest, 2016, pp. 263-267.
Abstract Several real-world applications involve simulation of the heat transfer within a given workpiece when placed into an environment with a different temperature. This requires calculation of the temperature in a particular location and at a given moment in time according to the available input parameters (shape and thermal characteristics of the given object and the environment). There are several computer-assisted numerical methods available for solving this type of problem, but these usually have a high computational demand. This paper presents a way to re-design an already known method as a data-parallel one, which makes it possible to use graphics accelerators to speed up the simulation process. According to the test results, the CUDA implementation of the parallel algorithm offers the same accuracy, but 4-5x lower runtime, as the original sequential method.
Download Download full paper as pdf or from IEEE Xplore.

Title Design and Implementation of Parallel List Data Structure using Graphics Accelerators
Authors T. Varga, and S. Szénási
Reference as T. Varga, and S. Szénási, "Design and Implementation of Parallel List Data Structure using Graphics Accelerators," in 20th IEEE Jubilee International Conference on Intelligent Engineering Systems, Budapest, 2016, pp. 315-318.
Abstract One of the biggest shortcomings of the CUDA environment is the lack of a data structure having effective parallel insert and remove operations. This paper focuses on insertion. Traditional vector-based lists are not applicable for this kind of addition; however, the well-known linked list data structure is able to handle multiple insertions at the same time. There is currently no "official" linked list data structure in the CUDA runtime library. This paper presents a novel way to create a multi-layered linked list (called Parallel List) using the NVIDIA CUDA framework. As our experiments show, the implemented data structure is able to perform multiple insert operations 2-30 times faster than the traditional sequential CPU implementation of a general list object. Another advantage of the new data structure is its support for a random access pattern to access contents directly.
Download Download full paper as pdf or from IEEE Xplore.

Title Parallel Implementation of DBSCAN Algorithm Using Multiple Graphics Accelerators
Authors S. Szénási
Reference as S. Szénási, "Parallel Implementation of DBSCAN Algorithm Using Multiple Graphics Accelerators," in 16th International Multidisciplinary Scientific Geoconference (SGEM2016), Albena, 2016, pp. 327-333.
Abstract Road accident hotspot (also called black spot) identification is one of the most important tasks of road safety experts in preventing further accidents. One of the available methods is based on the GPS coordinates of accidents and uses the data-mining method called DBSCAN. The DBSCAN method parallelizes well because multiple searches can be run from different starting points of the search space. This paper presents an NVIDIA CUDA implementation of the algorithm which uses multiple graphics accelerators to decrease the necessary runtime. As the results show, the accuracy of the method is the same as that of the sequential one, but the runtime is significantly lower.
Download Download full paper as pdf.
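For readers unfamiliar with DBSCAN, the following is a minimal sequential Python sketch of the textbook algorithm on 2D points. It is not the paper's CUDA implementation; the brute-force neighbour search and the names `region_query` and `dbscan` are assumptions made for illustration. The outer loop picks the starting points of the searches; the abstract's observation is that such searches can be launched from different starting points at the same time.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (brute force)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):  # each i is a potential starting point
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep growing
                queue.extend(j_neighbors)
    return labels
```

With accident coordinates as `points`, each resulting cluster would correspond to one candidate hotspot, while isolated accidents are labelled `NOISE`.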

Title Parallel PSO method for estimation heat transfer coefficients
Authors I. Felde, S. Szénási, G. Pintér, W. Shi, R. Colas, and O. Zapata-Hernández
Reference as I. Felde, S. Szénási, G. Pintér, W. Shi, R. Colas, and O. Zapata-Hernández, "Parallel PSO method for estimation heat transfer coefficients," in 23rd International Federation of Heat Treatment and Surface Engineering Congress 2016, IFHTSE 2016, Ohio, 2016, pp. 348-353.
Abstract A methodology based on Particle Swarm Optimization (PSO), a recent stochastic optimization technique, is outlined for solving complex inverse heat transfer problems. The temporospatial Heat Transfer Coefficient obtained on the surfaces of a cylindrical work piece is reconstructed by solving the inverse heat conduction problem. The fitness function to be minimized by the PSO approach is defined by the deviation between the measured and the calculated temperatures. The PSO algorithm has been parallelized and implemented on a GPGPU architecture. Numerical results demonstrate that the Heat Transfer Coefficient functions can be determined using the PSO method, and that the GPU implementation provides a less time-consuming and accurate estimation.
Download Download full paper as pdf.

Title Estimation of temporospatial boundary conditions using a particle swarm optimisation technique
Authors I. Felde, and S. Szénási
Reference as I. Felde, and S. Szénási, "Estimation of temporospatial boundary conditions using a particle swarm optimisation technique," in International Journal of Microstructure and Materials Properties, vol. 11, no. 3/4, pp. 288-300, 2016.
Abstract In this paper, we present an inverse solver for the estimation of the temporospatial heat transfer coefficients (HTCs) without using prior information of the thermal boundary conditions. The particle swarm optimisation (PSO) method has been introduced to recover the unknown HTC function obtained during immersion quenching. The aim is to estimate the HTC on the surfaces of a cylindrical work piece by using cooling curves recorded by internal thermocouples. The fitness function to be minimised by the PSO approach is defined by the deviation of the measured and the calculated cooling curves. The PSO algorithm has been parallelised and implemented on the graphics processing unit (GPU) architecture. The method is tested and evaluated by using a hypothetical HTC function on a 2D axisymmetric heat transfer model. The proposed approach provides significant acceleration of computation and accurate estimation.
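The idea of minimising the deviation between measured and calculated cooling curves with PSO can be sketched in a few lines of Python. This is a toy version under loud assumptions: instead of the paper's 2D axisymmetric model and temporospatial HTC function, it recovers a single constant coefficient `h` of a hypothetical lumped cooling law, and the names `cooling_curve`, `fitness`, and `pso` as well as all parameter values are invented for illustration. The fitness evaluations of the particles are independent of each other, which is the kind of work that maps naturally onto a GPU; how the paper distributes it is not shown here.

```python
import math
import random


def cooling_curve(h, times, t_start=850.0, t_env=20.0):
    """Hypothetical lumped model: T(t) = T_env + (T_start - T_env) * exp(-h * t)."""
    return [t_env + (t_start - t_env) * math.exp(-h * t) for t in times]


def fitness(h, times, measured):
    """Sum of squared deviations between measured and calculated temperatures."""
    calculated = cooling_curve(h, times)
    return sum((m - c) ** 2 for m, c in zip(measured, calculated))


def pso(objective, lo, hi, n_particles=30, iterations=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise a 1D objective on [lo, hi] with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                          # personal best positions
    best_val = [objective(p) for p in pos]     # personal best fitness values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]    # global best so far
    for _ in range(iterations):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # keep inside bounds
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val
```

As a usage example, a synthetic "measurement" can be generated with `measured = cooling_curve(0.8, times)` for some list of sample times, after which `pso(lambda h: fitness(h, times, measured), 0.0, 5.0)` should return an estimate close to the coefficient that produced the data.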

Title Estimation of Temporospatial Heat Transfer Coefficients by Parallel PSO Approach
Authors I. Felde, S. Szénási, K. Gábor, S. Wei, and C. Rafael
Reference as I. Felde, S. Szénási, K. Gábor, S. Wei, and C. Rafael, "Estimation of Temporospatial Heat Transfer Coefficients by Parallel PSO Approach," in 3rd International Conference on Heat Treatment and Surface Engineering in Automotive Applications, Prague, 2016.
Abstract A methodology based on Particle Swarm Optimization (PSO), a recent stochastic optimization technique, is outlined for solving complex inverse heat transfer problems. The temporospatial Heat Transfer Coefficient obtained on the surfaces of a cylindrical work piece is determined by solving the inverse heat conduction problem. The goal function to be minimized by the PSO approach is defined by the deviation between the measured and the calculated temperatures. The PSO algorithm has been parallelized and implemented on an NVIDIA graphics accelerator. Numerical results demonstrate that the Heat Transfer Coefficient functions can be determined using the PSO method, and that the graphics accelerator implementation provides a less time-consuming and accurate estimation.

Title Investigation of parallelized PSO algorithm applied to estimate complex HTC
Authors I. Felde, S. Szénási, F. Zoltán, and S. Wei
Reference as I. Felde, S. Szénási, F. Zoltán, and S. Wei, "Investigation of parallelized PSO algorithm applied to estimate complex HTC," in Proceedings of the 5th Asian Conference on Heat Treatment and Surface Engineering, Hangzhou, 2016, pp. 263-270.