The skills I practiced in this post:
Cleaning and analyzing data in R
Using internet searches (Stack Overflow!) to write code when I am stumped
Visualizing data in R
Knitting a pretty HTML file in R
Using MySQL server
Creating a new table/data frame in MySQL
What you are seeing above is an example of a scatterplot from the R Markdown file I completed. This project has taken me the better part of the last week. It is the second Capstone Project for the Google Certificates Course. The project is for a company called Bellabeat. They are a fitness products company: they make a smart scale, a smart water bottle, a smartwatch, and an app that pairs with these. They are looking to gain insights about the market so they can better advertise their products. I was given access to a dataset (n=33) of Fitbit users who recorded their daily health information using their smartwatches (and sometimes a scale) over the course of a few weeks.
This project was much more challenging for me than the cycle project, for multiple reasons. First of all, the data does not even come from actual users of the company's products; it comes from Fitbit users. Secondly, the sample size is very small, so the generalizability of anything I find in this data is going to be incredibly limited. I spent a long time just thinking about how I could use this data to do anything productive.
I knew that my best option for starting out was to go into Sheets and clean the spreadsheets: rename the files, rename columns, take a look at the data, and look for missing values. This took me about an hour, because I did not account for how long Sheets would take to convert the spreadsheets from CSVs to Sheets files before I could work with them. In the future, I will decide which sheets I want to use before trying to convert all of them.
The first mission I set for myself was to learn how to use MySQL, because only knowing how to use BigQuery is impractical. I had looked at this program before, but I could not even figure out how to get my tables uploaded so I could begin writing queries. After a few hours and several YouTube videos, I had done it! The tables I was interested in were uploaded to MySQL. Then I felt stuck again. I kept thinking about the different ways I could analyze the users' data, but it did not make sense to draw conclusions from a sample size of 33 or fewer.
After some time feeling stuck on this, I decided to change tactics. I switched over to RStudio. I had also wanted to spend some time practicing in it, especially because I had not yet figured out how to get my desktop version to show the entire RStudio interface. After another 30 minutes on Google and YouTube, I realized I had opened R, NOT RStudio. Once I figured that out, I was able to get into RStudio and get started.
In the Google Doc where I was keeping track of my progress, I wrote out some different things I wanted to try in R. I decided that my best bet with this data would be to figure out what corner of the market these users represented, and what they seemed to be using their smartwatches and scales for.
I discovered that all 33 participants tracked their activity using their smartwatch, but only some tracked their sleep, and an even smaller portion tracked their weight. This is when I was glad to be in R. I decided that I would make some graphs showing which kinds of users were more likely to record their sleep or their weight. The case study mentioned that someone in a senior position at the company was a mathematician, so I knew that presenting r values for the correlations would not be over his head, and would likely be appreciated.
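To give a rough idea of how that kind of check can be done, here is a minimal sketch in base R. It is not the code from my R Markdown file: the three small data frames, the column name Id, and all of the values are made up for illustration, and the real tables have many more rows and columns.

```r
# Hypothetical toy tables standing in for the activity, sleep, and weight logs.
# Column names and values are assumptions for illustration only.
activity <- data.frame(Id = c(1, 1, 2, 2, 3, 3, 4),
                       TotalSteps = c(4000, 5200, 9000, 8700, 12000, 11000, 3000))
sleep    <- data.frame(Id = c(1, 1, 3),
                       TotalMinutesAsleep = c(400, 380, 420))
weight   <- data.frame(Id = c(3),
                       WeightKg = c(70))

# One row per participant, flagged by whether their Id appears in each table
users <- data.frame(Id = unique(activity$Id))
users$tracked_sleep  <- users$Id %in% sleep$Id
users$tracked_weight <- users$Id %in% weight$Id

# How many participants logged each kind of data
n_tracked <- c(activity = nrow(users),
               sleep    = sum(users$tracked_sleep),
               weight   = sum(users$tracked_weight))
print(n_tracked)
```

The %in% operator does the real work here: it returns TRUE for every participant whose Id shows up at least once in the other table.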
Figuring out how to do everything I wanted in R required a lot of Google searching! I am getting better all the time at figuring out exactly which terms to search to get the results I am looking for. Thank goodness for Stack Overflow! Several times I came across something I already knew how to do one way, but decided it was worth taking the time to figure out a more efficient or prettier way to do it.
In the end, I found that the more active people were, the more likely they were to also track their sleep or weight. I also made some very pretty scatter plots. Since my objective here was to gain familiarity with R, I decided that this is where I will stop with this capstone project, at least for now. However, when I went to knit the file, I ran into some errors!
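For anyone curious what an r value for this kind of relationship looks like in code, here is a small sketch under stated assumptions. The per-user table, its column names, and the numbers are all invented for illustration; it is not my actual data or my actual result. It treats the tracking flag as 0/1 and computes a Pearson correlation with average daily steps (sometimes called a point-biserial correlation).

```r
# Hypothetical per-user summary: average daily steps plus a flag for
# whether the user logged any sleep. All values are invented.
users <- data.frame(
  Id            = 1:6,
  avg_steps     = c(3000, 4500, 6000, 8000, 10000, 12000),
  tracked_sleep = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Pearson correlation between activity level and the 0/1 tracking flag
r <- cor(users$avg_steps, as.numeric(users$tracked_sleep))
print(round(r, 2))

# cor.test() adds a p-value and confidence interval; with a sample this
# small the interval will be very wide, which is worth reporting too
result <- cor.test(users$avg_steps, as.numeric(users$tracked_sleep))
print(result$p.value)
```

A positive r here means the more active users in the toy table were more likely to have tracked sleep. With only a handful of users, though, the number says very little beyond the sample itself.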
It took me a few hours and several tries to get the "knit" function to work. I had to do all sorts of things, like specifying the exact path for my read_csv function and setting my CRAN mirror so I could download packages. It was honestly such a headache, because I felt like I had already done the work and was ready to move on. In the end I produced a pretty HTML document that was not as clean as I wanted. However, each time I went back to edit, one of the tables would gain another column, and I was not sure how to make it stop. So, I left it as is. Maybe I will come back to it some day. My priority now is learning Python, so I did not want to spend more precious hours on this one thing.
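The two fixes I mentioned look roughly like the sketch below. This is an illustration rather than my exact setup chunk: the folder and file name are hypothetical, the sketch writes its own tiny CSV into a temporary folder so that it runs anywhere, and it uses base R's read.csv instead of read_csv so that no extra packages are needed.

```r
# Knitting runs in a fresh, non-interactive session, so R cannot ask which
# CRAN mirror to use. Setting one explicitly avoids that error.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Hypothetical data folder. In a real project this would be the full path
# to wherever the CSV files actually live.
data_dir <- tempdir()
csv_path <- file.path(data_dir, "daily_activity.csv")

# Write a tiny stand-in file so the example is self-contained
write.csv(data.frame(Id = c(1, 2), TotalSteps = c(5000, 9000)),
          csv_path, row.names = FALSE)

# Fail early with a clear message if the path is wrong, then read the file
stopifnot(file.exists(csv_path))
daily_activity <- read.csv(csv_path)
print(daily_activity)
```

As far as I understand it, the path problem comes from knitr using the folder that contains the .Rmd file as its working directory by default, which may not be the same folder the console was using when the code first ran.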
I feel like I never have enough hours in the day for everything I want to learn! My list of "to-do's" and "to-learn's" grows every day. I cannot wait to move forward and learn new concepts.