Use machine learning algorithms to analyze a dataset

Final Project: Use machine learning algorithms to analyze a dataset

Step 1: Find a proper dataset that can be used to answer some interesting questions. The dataset must contain a variable that can be used as the dependent variable (for example, the number of COVID cases in a county). The dependent variable can be continuous or categorical (binary or multi-level). The dataset must also include at least two variables that can be used as independent variables. The independent variables can be continuous, categorical, or a combination of both.

Step 2: Explain the research question(s) you are trying to answer by analyzing the dataset. For example, Question 1: “I want to analyze the impact of air pollution on the death rate of COVID at the county level.” Question 2: “I want to analyze the impact of the number of ventilators in a county on the death rate of COVID at the county level.” Make sure your dataset can support the questions you are trying to address. The dependent variable must be the same for all questions. In the examples above, both questions study the impact of different factors on the COVID mortality rate at the county level.

Step 3: Select a proper machine learning algorithm that can be used to analyze the data and answer your questions. For example, if all your variables are continuous numeric data, you might want to use linear regression, whereas if you are trying to predict the rating of a product (the rating takes one of 5 possible values, i.e., the number of stars the product receives), you might want to use k-nearest neighbors. You must be able to justify the algorithm you select.

Step 4: Analyze the data and answer your questions. For example, if you have used linear regression, explain the trained coefficients and their signs (positive or negative). Briefly explain your findings. For example: increasing the number of ventilators by 1 unit decreases the COVID death rate by 5%.
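As a rough illustration of the linear-regression case in Step 4, the sketch below fits an ordinary least-squares model with NumPy and reads off the sign and size of each coefficient. The data are synthetic and made up for this example (they are not real COVID figures), and the variable names `pollution`, `ventilators`, and `death_rate` are assumptions chosen to mirror the sample questions.

```python
import numpy as np

# Synthetic, made-up county data (NOT real COVID figures); one entry per county.
rng = np.random.default_rng(0)
n = 200
pollution = rng.uniform(5, 25, n)      # hypothetical air-pollution index
ventilators = rng.uniform(0, 50, n)    # hypothetical number of ventilators
noise = rng.normal(0, 0.1, n)

# Assumed data-generating rule, chosen only so the fitted values are easy to check.
death_rate = 2.0 + 0.30 * pollution - 0.05 * ventilators + noise

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), pollution, ventilators])

# Ordinary least squares: find coefficients minimizing the squared error.
coef, *_ = np.linalg.lstsq(X, death_rate, rcond=None)
intercept, b_pollution, b_ventilators = coef

# Report each coefficient's sign and magnitude in plain language.
for name, b in [("pollution", b_pollution), ("ventilators", b_ventilators)]:
    direction = "increases" if b > 0 else "decreases"
    print(f"+1 unit of {name} {direction} the predicted death rate by {abs(b):.3f}")
```

Each coefficient is read as the change in the dependent variable, in its own units, for a one-unit increase in that predictor while the other predictors are held fixed. A positive sign means the two move together; a negative sign means they move in opposite directions.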
As another example, if you are using decision trees, you might report that the first question to ask a customer in order to predict his/her evaluation of a product’s quality is about the price of that product.

Submission:
1. Report with Python notebook (save it as an HTML file) with code, results, and commentary.
2. 1-2 page Microsoft Word document explaining your findings.
i. Your research questions (what are the questions you are trying to answer?)
ii. The dataset (where did you find it? Why do you think it is a proper dataset? What features and variables are there in the dataset?)
iii. The algorithm you have selected for analyzing the data (why did you pick this algorithm?)
iv. Your findings (does the analysis answer your questions? What are the answers to your initial questions?)
v. What are the main takeaways of this project for you?
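To make the decision-tree example from Step 4 concrete, here is a minimal hand-rolled sketch of how the "first question" (the root split) can be chosen: try every feature and threshold, and keep the split with the lowest weighted Gini impurity. This is only an illustration of the idea, not a full tree learner; in a real notebook you would more likely use a library such as scikit-learn's `DecisionTreeClassifier`. The customer data, the feature names `price` and `delivery_days`, and the helper `best_root_split` are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_root_split(rows, labels, feature_names):
    """Return (feature, threshold, weighted_gini) for the best single split."""
    best = None
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            # Impurity of the two child nodes, weighted by their sizes.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (name, t, score)
    return best


# Made-up customer data: (price in dollars, delivery days) -> quality rating.
feature_names = ["price", "delivery_days"]
rows = [(10, 2), (12, 5), (15, 3), (18, 6), (40, 2), (45, 5), (50, 3), (60, 6)]
labels = ["low", "low", "low", "low", "high", "high", "high", "high"]

feature, threshold, score = best_root_split(rows, labels, feature_names)
print(f"First question: is {feature} <= {threshold}?")
```

In this toy data the rating is fully determined by price, so the root split lands on `price`; with real data the winning feature is whatever separates the classes best, and that is the finding to explain in the report.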