
Here is my first submission for Data Geek Challenge 2...

 

My objective is to see whether there is any correlation between food habits and exercise and breast cancer in females aged 16-64 in the US.

Let's see whether the data shows any relationship between smoking, obesity and eating habits and breast cancer across US states. I will be using Predictive Analysis. I downloaded the data points from the link below as CSV and did a little cleansing.

Here is the source I got some of my data from.

http://statecancerprofiles.cancer.gov/

1.png

 

Now let's import the CSV data into Predictive Analysis and then merge in the data for the possibly related factors.

2.png
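The post does not list the cleansing steps, so here is a minimal sketch of one common step, assuming the export has a `State` column and an `Incidence Rate` column and marks suppressed values with a symbol such as `*`. The column names, the `*` convention and the rows below are placeholders I made up, not the real statecancerprofiles.cancer.gov export.

```python
import csv
import io

# Hypothetical extract: column names, state labels and values are placeholders,
# not the real statecancerprofiles.cancer.gov export.
RAW_CSV = """State,Incidence Rate
State A,120.5
State B,*
State C, 98.0
"""

def to_float(text):
    """Return text as a float, or None for blank or suppressed cells such as '*'."""
    if text is None:
        return None
    try:
        return float(text.strip())
    except ValueError:
        return None

def load_rates(raw_csv, value_column="Incidence Rate"):
    """Read CSV text into {state: rate}, skipping rows without a usable number."""
    rates = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        value = to_float(row.get(value_column))
        if value is not None:
            rates[row["State"].strip()] = value
    return rates

rates = load_rates(RAW_CSV)
```

Dropping rows without a usable number keeps later averages and regressions from silently treating suppressed cells as zero.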

 

Once the data is acquired, we have to enrich it with a time hierarchy and with geographic data, in this case the states.
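Predictive Analysis builds the time hierarchy in its UI. The sketch below only shows what the levels of a typical Year > Quarter > Month hierarchy are, assuming the data has an ISO-formatted date column (an assumption; annual figures would only populate the Year level).

```python
from datetime import date

def time_hierarchy(d):
    """Split a date into (year, quarter, month) levels; quarters run 1-4."""
    quarter = (d.month - 1) // 3 + 1
    return d.year, quarter, d.month

levels = time_hierarchy(date.fromisoformat("2012-08-15"))
```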

 

Next, I create a geographic hierarchy (Region) for the states so that I can use the dimension in geo maps. Any states that are not mapped automatically have to be assigned manually.

 

3.png
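The "automatic lookup first, manual assignment for the rest" idea can be sketched as follows. The tool uses its own built-in geographic reference data; the `REFERENCE` and `MANUAL` tables here are hypothetical stand-ins, and names are matched after lower-casing and collapsing whitespace.

```python
# Hypothetical reference table standing in for the tool's built-in geo data.
REFERENCE = {
    "alabama": "Alabama",
    "district of columbia": "District of Columbia",
    "new york": "New York",
}

# Hypothetical manual assignments for names the lookup cannot resolve.
MANUAL = {
    "washington dc": "District of Columbia",
    "n.y.": "New York",
}

def normalise(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())

def map_states(names):
    """Return (mapped, unmapped): mapped is {raw name: canonical state name}."""
    mapped, unmapped = {}, []
    for raw in names:
        key = normalise(raw)
        canonical = REFERENCE.get(key) or MANUAL.get(key)
        if canonical is None:
            unmapped.append(raw)  # still needs a manual decision
        else:
            mapped[raw] = canonical
    return mapped, unmapped

mapped, unmapped = map_states(["Alabama", "Washington DC", "N.Y.", "Puerto Rico"])
```

Whatever ends up in `unmapped` is the list to resolve by hand before the dimension is used on a geo map.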

Now I acquire the other parameters and merge them with the incidence data based on the state dimension.

 

 

4.png

6.png
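Merging on the state dimension is essentially an inner join keyed on the state name. A minimal sketch, with placeholder values that are not real statistics, that also reports which states were dropped because they were missing from at least one table:

```python
# Placeholder values for illustration only; they are not real state statistics.
incidence = {"State A": 120.5, "State B": 110.0, "State C": 98.0}
obesity = {"State A": 30.1, "State C": 25.4, "State D": 28.0}
smoking = {"State A": 21.0, "State B": 17.5, "State C": 19.2}

def merge_on_state(**tables):
    """Inner-join several {state: value} tables into {state: {column: value}}.

    Only states present in every table are kept; the others are returned
    separately so they can be checked (for example, spelling differences).
    """
    keys = [set(t) for t in tables.values()]
    common = set.intersection(*keys)
    dropped = sorted(set.union(*keys) - common)
    merged = {
        state: {name: table[state] for name, table in tables.items()}
        for state in sorted(common)
    }
    return merged, dropped

merged, dropped = merge_on_state(incidence=incidence, obesity=obesity, smoking=smoking)
```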

 

Now let's start by plotting just the % incidence numbers by state on a geo map. As the chart below shows, it looks pretty much the same for all the states.

 

7.png

This is where some basic predictive analysis is needed. Having said that, let me be very clear: I am a beginner in predictive analysis, though I worked with R on HANA a while ago. We will normalize the data using a min-max scaling algorithm, with the new minimum and maximum set to 0 and 1.

 

8.png
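Min-max scaling maps each value x to x' = (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A minimal sketch of the formula (the normalization component in Predictive Analysis may differ in details such as how it handles constant columns):

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to scale, so map everything to new_min.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]

scaled = min_max_scale([110.0, 120.0, 130.0])  # [0.0, 0.5, 1.0]
```

Because the transform is linear, it preserves the order of the states; it only stretches small differences across the full 0 to 1 range.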

 

Plotting the scaled data shows that it has been rescaled according to the algorithm, and it gives a better idea of the differences between states.

 

9.png

 

That's a good start.

 

Now let's create a bubble chart of the annual incidence rate per 100,000 population against % obesity and % smoking for the top 10 states by incidence. For this, we had already merged in the data from another spreadsheet. Here is the result.

 

10.png
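Selecting the top 10 states and turning each into a bubble can be sketched like this. Which measure goes on which axis is my assumption (obesity on x, smoking on y, incidence as bubble size), and the rows are placeholders, not real data.

```python
import heapq

# Placeholder rows for illustration; they are not real state statistics.
rows = [
    {"state": "State A", "incidence": 125.0, "obesity": 31.0, "smoking": 22.0},
    {"state": "State B", "incidence": 110.0, "obesity": 27.5, "smoking": 18.0},
    {"state": "State C", "incidence": 130.0, "obesity": 29.0, "smoking": 20.5},
]

def top_n(rows, key, n=10):
    """Return the n rows with the largest value in `key`, largest first."""
    return heapq.nlargest(n, rows, key=lambda r: r[key])

def bubble_points(rows):
    """Map each row to (x, y, size) = (% obesity, % smoking, incidence rate)."""
    return [(r["obesity"], r["smoking"], r["incidence"]) for r in rows]

# n=2 only because the sample has three rows; the post uses the top 10.
points = bubble_points(top_n(rows, "incidence", n=2))
```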

OK, let's run a regression of the incidence rate on obesity % and plot the output. This is open to interpretation.

11.png

Here is the plot of the linear regression against the obesity numbers.

12.png
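For one predictor, a linear regression fits y = a + b·x by ordinary least squares: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. A minimal sketch with x as obesity % and y as incidence rate; the numbers are placeholders, and the regression component in the tool may offer options this sketch does not cover.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # slope: change in incidence per percentage point of obesity
    a = mean_y - b * mean_x  # intercept
    return a, b

# Placeholder data: obesity % (x) and incidence per 100,000 (y).
obesity_pct = [25.0, 27.0, 29.0, 31.0]
incidence_rate = [110.0, 114.0, 118.0, 122.0]
a, b = fit_line(obesity_pct, incidence_rate)
```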

 

 

 

Out of curiosity, let's see whether there is any relationship between obesity and smoking. Just by looking at the chart, there seems to be a pretty good correlation. Let's find out what the correlation coefficient is, and then run a regression.

 

14.png

17.png

13.png
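The correlation coefficient here is presumably Pearson's r, which is the covariance of the two series divided by the product of their standard deviations; it ranges from -1 to 1 and only measures linear association. A minimal sketch with placeholder values (Python 3.10+ also provides `statistics.correlation`):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a series has no variance")
    return sxy / math.sqrt(sxx * syy)

# Placeholder state-level percentages, not real data.
smoking_pct = [16.0, 18.0, 21.0, 24.0]
obesity_pct = [25.0, 27.5, 29.0, 32.0]
r = pearson_r(smoking_pct, obesity_pct)
```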

 

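Plotting the predicted numbers in a bar chart shows some correlation. So we can infer that states with a higher percentage of smokers also tend to have a higher percentage of obese people; because these are state-level figures, it does not follow that an individual smoker is more likely to be obese.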

b.png
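The predicted numbers come from plugging each state's smoking % into the fitted line, ŷ = a + b·x, and R² = 1 - SS_res / SS_tot summarizes how much of the variation the line explains. In practice a and b come from the regression run in the tool; the values below are placeholders, not the author's results.

```python
def predict(a, b, xs):
    """Predicted values from a fitted line y = a + b*x."""
    return [a + b * x for x in xs]

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Placeholder intercept/slope and data, not results from the post.
a, b = 10.0, 0.5
smoking_pct = [15.0, 20.0, 25.0]
obesity_pct = [18.0, 19.5, 23.0]
preds = predict(a, b, smoking_pct)  # [17.5, 20.0, 22.5]
r2 = r_squared(obesity_pct, preds)
```

Plotting `preds` next to the actual values as bars makes the gaps (residuals) easy to see state by state.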

 

Now let's plot this against the % of people eating fruit and vegetables 5 times a day and see whether there is any correlation. It looks like states where more people are obese and smoke tend to have fewer people eating fruit and vegetables, as there is clearly a downward trend. So that's great: it just confirms what we would normally expect.

18.png
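The post judges the downward trend by eye. One way to put a number on a monotonic trend, which is my choice and not something the post uses, is Spearman's rank correlation: rank both series (ties get the average rank) and compute Pearson's r on the ranks. A value near -1 means a consistently downward trend, even if it is not a straight line.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Placeholder data: obesity % vs % eating fruit and vegetables 5 times a day.
rho = spearman_rho([24.0, 27.0, 30.0, 33.0], [100.0, 50.0, 10.0, 1.0])
```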

 

 

 

Now let's quickly look at the relationship with the insured percentage, before any further analysis. Again, it seems quite clear that states with more obese people and fewer people eating fruit and vegetables also tend to have a lower insured percentage.

 

19.png

As I mentioned earlier, this is my first post here; I will post more once I have something else to share. Thanks for reading.
