+++
title = "A tale of Decision Trees, Java and OpenCL"
date = "2019-10-02T17:53:52+05:30"
hide_authorbox = false
disable_comments = false
draft = false
categories = [ "Development" ]
tags = [ "opencl", "java", "ai", "Machine Learning", "id3", "Decision Tree", "gpgpu" ]
opacity = false
sidebar = { "disable" = true }
featured_image = "/img/decisiontree.webp"
+++

I worked on a decision tree program in Java for my final year MSc project. In a series of posts such as this one, I'll be highlighting some aspects of its creation and implementation.

A bit of background - this project is of great significance to me, one reason being that it had many firsts: my first proper Java project, my first foray into machine learning and GPGPU programming, and my first GUI program, to mention a few. It was also when I realized that I could actually enjoy programming.

I also had to face significant adversities while working on it, none of which I'll disclose publicly. But yeah, I was going through a bad time, and yet I still managed to finish my project, clear all my exams and get my master's degree - a fact that I'm proud of.

Now for the project itself.

### Decision Tree

A decision tree is a machine learning algorithm which creates a model of the given data set in the form of a tree. This tree can then be used for prediction.

![Decision Tree](/img/decisiontree.webp)

#### Figure - A Decision Tree
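
To make the prediction step concrete, here is a minimal Java sketch of walking such a tree from the root to a leaf. The `TreeNode` and `TreeNodeDemo` classes, their fields, and the weather-style attribute names are all hypothetical and are not the project's actual code. Each node keeps its children in a map keyed by attribute value, so a node can have any number of branches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type for illustration only; not the project's real class.
class TreeNode {
    final String attribute; // attribute tested at this node (null for a leaf)
    final String label;     // predicted class (null for an internal node)
    // One child per distinct value of the tested attribute.
    final Map<String, TreeNode> children = new HashMap<>();

    private TreeNode(String attribute, String label) {
        this.attribute = attribute;
        this.label = label;
    }

    static TreeNode leaf(String label) { return new TreeNode(null, label); }

    static TreeNode split(String attribute) { return new TreeNode(attribute, null); }

    TreeNode addChild(String value, TreeNode child) {
        children.put(value, child);
        return this;
    }

    boolean isLeaf() { return label != null; }

    // Follow the branch matching the row's value at each node until a leaf is hit.
    // Returns null when the row carries a value the tree has no branch for.
    String predict(Map<String, String> row) {
        TreeNode node = this;
        while (!node.isLeaf()) {
            TreeNode next = node.children.get(row.get(node.attribute));
            if (next == null) {
                return null;
            }
            node = next;
        }
        return node.label;
    }
}

class TreeNodeDemo {
    // A tiny hand-built tree (made-up data): the root has three children.
    static TreeNode sampleTree() {
        TreeNode windy = TreeNode.split("windy")
                .addChild("true", TreeNode.leaf("no"))
                .addChild("false", TreeNode.leaf("yes"));
        return TreeNode.split("outlook")
                .addChild("sunny", TreeNode.leaf("no"))
                .addChild("overcast", TreeNode.leaf("yes"))
                .addChild("rainy", windy);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("outlook", "rainy");
        row.put("windy", "false");
        System.out.println(sampleTree().predict(row));
    }
}
```

Running `TreeNodeDemo` follows `outlook = rainy`, then `windy = false`, and prints `yes`. Returning `null` for an unseen value is just one possible choice; falling back to the majority class at that node would be another.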

A major challenge in implementing this form of decision tree was that the number of children per node is not constant, so a simple binary tree wouldn't suffice. I had to use an n-ary tree data structure, which I also had to write myself.

### ID3

I used ID3 as the algorithm for partitioning the data set and creating the decision tree. ID3 is a greedy algorithm that uses the concept of entropy to decide where to split the data: at each step it picks the attribute whose split reduces entropy the most (the highest information gain). ID3 splits the data set recursively until it reaches a terminating condition, and each split point becomes a node of the tree.

### OpenCL

I also worked on incorporating GPU acceleration into this project. Although I had an Nvidia GPU, I chose OpenCL over CUDA. The major benefit was that I could prototype on my laptop despite it not having a GPU (the code would just run on the CPU) and then run it on my desktop, which had one.

#### Other stuff - MySQL, Swing etc.

The other things I used were MySQL, for storing the data sets, and Swing, for creating the GUI. I didn't know about Hibernate at this point, so I 'simply' used JDBC.

That's it for now. In the next post I'll talk about the system overview and architecture. Stay tuned!

![System Architecture](/img/sysarch.webp)

Cover image source: https://www.sciencemag.org/
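
P.S. For anyone curious about the entropy part of the ID3 section above, here is a rough Java sketch of the greedy step: compute the entropy of the class labels, then pick the attribute with the highest information gain. The `Id3Sketch` class, its method names, and the sample rows are made up for illustration and are not taken from the project. It assumes categorical attributes, with the class label in the last column of each row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the project's real code.
class Id3Sketch {
    // Shannon entropy, in bits, of a list of class labels.
    static double entropy(List<String> labels) {
        if (labels.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Entropy of the whole set minus the size-weighted entropy of each partition
    // produced by splitting on column attr. The last column holds the class label.
    static double informationGain(List<String[]> rows, int attr) {
        int labelCol = rows.get(0).length - 1;
        List<String> allLabels = new ArrayList<>();
        Map<String, List<String>> partitions = new HashMap<>();
        for (String[] row : rows) {
            allLabels.add(row[labelCol]);
            partitions.computeIfAbsent(row[attr], k -> new ArrayList<>()).add(row[labelCol]);
        }
        double remainder = 0.0;
        for (List<String> part : partitions.values()) {
            remainder += (double) part.size() / rows.size() * entropy(part);
        }
        return entropy(allLabels) - remainder;
    }

    // The greedy choice: index of the attribute with the highest information gain.
    static int bestAttribute(List<String[]> rows) {
        int labelCol = rows.get(0).length - 1;
        int best = -1;
        double bestGain = -1.0;
        for (int attr = 0; attr < labelCol; attr++) {
            double gain = informationGain(rows, attr);
            if (gain > bestGain) {
                bestGain = gain;
                best = attr;
            }
        }
        return best;
    }

    // Made-up sample data. Columns: outlook, windy, play (class label).
    static List<String[]> sampleRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"sunny", "false", "no"});
        rows.add(new String[] {"sunny", "true", "no"});
        rows.add(new String[] {"overcast", "false", "yes"});
        rows.add(new String[] {"overcast", "true", "yes"});
        rows.add(new String[] {"rainy", "false", "yes"});
        rows.add(new String[] {"rainy", "true", "no"});
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("Best attribute index: " + bestAttribute(sampleRows()));
    }
}
```

On this sample, splitting on `outlook` (column 0) leaves two pure partitions, so it has a much higher gain than `windy` and is chosen first. A full ID3 would then recurse into each partition until the labels are pure or no attributes remain.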