Assignment 1

My first assignment has three parts.

(a)

What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction

  • Data scientists spend a significant portion of their time collecting and preparing data before analysis.

  • R packages such as tidyr and dplyr make data wrangling more efficient.

  • The tbl (tibble) data structure makes large datasets easier to work with: when printed, it shows only the part of the data that fits in your console window.

  • The “pipe operator” (%>%) chains data wrangling steps together by passing the result of one step on to the next.

  • These tools cover common tasks such as selecting columns, filtering rows, creating new variables, and summarizing data.
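To make the pipe idea concrete, here is a minimal Python sketch that chains select, filter, create and summarize steps, assuming a small made-up table stored as a list of dicts. The helpers pipe, select, keep_rows, add_column and total are hypothetical, written only for illustration and loosely modelled on dplyr's select(), filter(), mutate() and summarise(); they are not part of any library.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed value through each step in turn, like value %>% f() %>% g() in R."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical helpers, loosely modelled on dplyr's select(), filter(),
# mutate() and summarise(); each returns a function that takes the rows.
def select(*cols):
    return lambda rows: [{c: r[c] for c in cols} for r in rows]


def keep_rows(pred):
    return lambda rows: [r for r in rows if pred(r)]


def add_column(name, fn):
    return lambda rows: [{**r, name: fn(r)} for r in rows]


def total(col):
    return lambda rows: sum(r[col] for r in rows)


# Made-up example data (a list of dicts standing in for a table).
sales = [
    {"region": "north", "units": 3, "price": 10.0, "note": "a"},
    {"region": "south", "units": 5, "price": 8.0, "note": "b"},
    {"region": "north", "units": 2, "price": 12.0, "note": "c"},
]

north_revenue = pipe(
    sales,
    select("region", "units", "price"),                        # keep three columns
    keep_rows(lambda r: r["region"] == "north"),               # filter rows
    add_column("revenue", lambda r: r["units"] * r["price"]),  # create a new variable
    total("revenue"),                                          # summarize
)
print(north_revenue)  # 54.0
```

Each step only needs the output of the previous one, which is exactly what makes a pipe readable: the code reads top to bottom in the order the data is transformed.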

(b)

1. Indexing

  • Python

    Indexing in Python starts from 0.

    x = list(range(1, 4))  # [1, 2, 3]
    first_element = x[1]
    # Prints 2: Python indexing starts at 0, so the first element is x[0], not x[1].
    print(first_element)
    2
  • R

    Indexing in R starts from 1.

    x <- 1:3  # 1 2 3, the same values as the Python list
    first_element <- x[1]
    # Prints 1: R indexing starts at 1. The same expression in Python gives 2.
    first_element
    [1] 1
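Because of the different starting index, the n-th element is x[n] in R but x[n - 1] in Python. A small Python sketch of that translation; r_index is a hypothetical helper written for illustration, not a standard function.

```python
def r_index(seq, n):
    """Return the element at R-style (1-based) position n."""
    if n < 1:
        # Sketch choice: reject positions R treats specially
        # (x[0] gives an empty vector, negative positions drop elements).
        raise IndexError("R-style positions start at 1")
    return seq[n - 1]  # shift to Python's 0-based index


x = list(range(1, 4))  # [1, 2, 3]
print(x[0])            # first element, Python style: 1
print(r_index(x, 1))   # first element, R-style position: 1
print(r_index(x, 3))   # last element: 3
```

Positions past the end also behave differently: R returns NA, while this helper (like plain Python indexing) raises IndexError.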

2. Style

  • R

    In R, arithmetic operators are vectorized: they act on every element of a vector at once, so no loop is needed.

    y <- c(1, 2, 3, 4, 5)
    square <- y^2
    # ^ raises every element of the vector to the power 2, giving the squares.
    square
    [1]  1  4  9 16 25
  • Python

    Python's ** operator does not work on a whole list, so we loop over the elements; a list comprehension does this in one line.

    y = [1, 2, 3, 4, 5]
    sq = [v**2 for v in y]  # square each element v of y
    print(sq)
    [1, 4, 9, 16, 25]
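A short Python sketch of the difference: applying ** directly to a list raises a TypeError, so the element-wise work has to be written out, either with a list comprehension or with map(). The variable names are illustrative.

```python
y = [1, 2, 3, 4, 5]

# Unlike y^2 in R, ** is not applied element-wise to a list.
list_power_failed = False
try:
    y ** 2
except TypeError as err:
    list_power_failed = True
    print("TypeError:", err)

# Loop-based alternatives that give the same squares.
squares_comprehension = [v ** 2 for v in y]
squares_map = list(map(lambda v: v ** 2, y))
print(squares_comprehension)  # [1, 4, 9, 16, 25]
print(squares_map)            # [1, 4, 9, 16, 25]
```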

3. Syntax for Conditional Statements

  • R

    ifelse() is vectorized: it tests the condition on each element of the vector without an explicit for loop.

    t <- c(1,2,3,4,5,6)
    ifelse(t < 4, t + 1, 0)  # add 1 where t < 4, otherwise return 0
    [1] 2 3 4 0 0 0
  • Python

    Python has no built-in vectorized ifelse for lists, so we loop over the elements; a conditional expression inside a list comprehension does this in one line.

    t = [1,2,3,4,5,6]
    new_list = [x + 1 if x < 4 else 0 for x in t]
    print(new_list)
    [2, 3, 4, 0, 0, 0]
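To mirror R's ifelse(test, yes, no) more literally in Python, the condition and both branches can be built as lists first and then combined element by element. r_ifelse is a hypothetical helper for illustration; unlike R, it does not recycle shorter vectors and instead requires equal lengths.

```python
def r_ifelse(test, yes, no):
    """Pick yes[i] where test[i] is true, else no[i] (loosely like R's ifelse)."""
    if not (len(test) == len(yes) == len(no)):
        # R would recycle shorter vectors; this sketch requires equal lengths.
        raise ValueError("test, yes and no must have the same length")
    return [a if cond else b for cond, a, b in zip(test, yes, no)]


t = [1, 2, 3, 4, 5, 6]
result = r_ifelse([x < 4 for x in t], [x + 1 for x in t], [0] * len(t))
print(result)  # [2, 3, 4, 0, 0, 0]
```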

(c)

#install.packages("dslabs") 
# The package only needs to be installed once; after that, library(dslabs) is enough.

library(dslabs)

data("na_example")

print(na_example) #print na_example
   [1]  2  1  3  2  1  3  1  4  3  2  2 NA  2  2  1  4 NA  1  1  2  1  2  2  1
  [25]  2  5 NA  2  2  3  1  2  4  1  1  1  4  5  2  3  4  1  2  4  1  1  2  1
  [49]  5 NA NA NA  1  1  5  1  3  1 NA  4  4  7  3  2 NA NA  1 NA  4  1  2  2
  [73]  3  2  1  2  2  4  3  4  2  3  1  3  2  1  1  1  3  1 NA  3  1  2  2  1
  [97]  2  2  1  1  4  1  1  2  3  3  2  2  3  3  3  4  1  1  1  2 NA  4  3  4
 [121]  3  1  2  1 NA NA NA NA  1  5  1  2  1  3  5  3  2  2 NA NA NA NA  3  5
 [145]  3  1  1  4  2  4  3  3 NA  2  3  2  6 NA  1  1  2  2  1  3  1  1  5 NA
 [169] NA  2  4 NA  2  5  1  4  3  3 NA  4  3  1  4  1  1  3  1  1 NA NA  3  5
 [193]  2  2  2  3  1  2  2  3  2  1 NA  2 NA  1 NA NA  2  1  1 NA  3 NA  1  2
 [217]  2  1  3  2  2  1  1  2  3  1  1  1  4  3  4  2  2  1  4  1 NA  5  1  4
 [241] NA  3 NA NA  1  1  5  2  3  3  2  4 NA  3  2  5 NA  2  3  4  6  2  2  2
 [265] NA  2 NA  2 NA  3  3  2  2  4  3  1  4  2 NA  2  4 NA  6  2  3  1 NA  2
 [289]  2 NA  1  1  3  2  3  3  1 NA  1  4  2  1  1  3  2  1  2  3  1 NA  2  3
 [313]  3  2  1  2  3  5  5  1  2  3  3  1 NA NA  1  2  4 NA  2  1  1  1  3  2
 [337]  1  1  3  4 NA  1  2  1  1  3  3 NA  1  1  3  5  3  2  3  4  1  4  3  1
 [361] NA  2  1  2  2  1  2  2  6  1  2  4  5 NA  3  4  2  1  1  4  2  1  1  1
 [385]  1  2  1  4  4  1  3 NA  3  3 NA  2 NA  1  2  1  1  4  2  1  4  4 NA  1
 [409]  2 NA  3  2  2  2  1  4  3  6  1  2  3  1  3  2  2  2  1  1  3  2  1  1
 [433]  1  3  2  2 NA  4  4  4  1  1 NA  4  3 NA  1  3  1  3  2  4  2  2  2  3
 [457]  2  1  4  3 NA  1  4  3  1  3  2 NA  3 NA  1  3  1  4  1  1  1  2  4  3
 [481]  1  2  2  2  3  2  3  1  1 NA  3  2  1  1  2 NA  2  2  2  3  3  1  1  2
 [505] NA  1  2  1  1  3  3  1  3  1  1  1  1  1  2  5  1  1  2  2  1  1 NA  1
 [529]  4  1  2  4  1  3  2 NA  1  1 NA  2  1  1  4  2  3  3  1  5  3  1  1  2
 [553] NA  1  1  3  1  3  2  4 NA  2  3  2  1  2  1  1  1  2  2  3  1  5  2 NA
 [577]  2 NA  3  2  2  2  1  5  3  2  3  1 NA  3  1  2  2  2  1  2  2  4 NA  6
 [601]  1  2 NA  1  1  2  2  3 NA  3  2  3  3  4  2 NA  2 NA  4 NA  1  1  2  2
 [625]  3  1  1  1  3 NA  2  5 NA  7  1 NA  4  3  3  1 NA  1  1  1  1  3  2  4
 [649]  2  2  3 NA NA  1  4  3  2  2  2  3  2  4  2  2  4 NA NA NA  6  3  3  1
 [673]  4  4  2  1 NA  1  6 NA  3  3  2  1  1  6 NA  1  5  1 NA  2  6  2 NA  4
 [697]  1  3  1  2 NA  1  1  3  1  2  4  2  1  3  2  4  3  2  2  1  1  5  6  4
 [721]  2  2  2  2  4 NA  1  2  2  2  2  4  5 NA NA NA  4  3  3  3  2  4  2  4
 [745] NA NA NA NA  2  1 NA  2  4  3  2 NA  2  3  1  3  4 NA  1  2  1  2 NA  3
 [769]  1  2  1  2  1  2  1  2  2  2  2  1  1  3  3  1  3  4  3 NA NA  4  2  3
 [793]  2  1  3  2  4  2  2  3  1  2  4  3  3  4 NA  1  4  2  1  1  1  3  1  5
 [817]  2  2  4  2 NA  1  3  1  2 NA  1  2  1  2  1 NA  1  3  2  3  2 NA  2  1
 [841]  4  2 NA NA NA  2  4  2 NA NA  3  1 NA  5  5  2  2  2 NA  2  1  3  1  3
 [865]  2  4  2  4 NA  4  1  2  3  2  3  3  2  3  2  2  2  1  3  2  4  2 NA  3
 [889]  3  2  2 NA NA  3  2  1  2  4  1  1  1  1  4  3  2 NA  3  2 NA  1 NA  3
 [913]  2  1  1  1  2 NA  2  2  3  3  2 NA NA  4  5  2  2  2  1  2  3  1  3  3
 [937]  4  3 NA  1  1  1 NA  4  3  5  1  1  2 NA  2  2  2  2  5  2  2  3  1  2
 [961]  3 NA  1  2 NA NA  2 NA  3  1  1  2  5  3  5  1  1  4 NA  2  1  3  1  1
 [985]  2  4  3  3  3 NA  1  1  2  2  1  1  2  2 NA  2
na_check <- ifelse(is.na(na_example), 1, 0)  # 1 for each NA, 0 otherwise, so the sum counts the NAs

sum_na <- sum(na_check) 
sum_na  # total number of NAs
[1] 145
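For comparison with part (b), the same counting idea in Python, assuming missing values are stored as None (plain Python has no NA). The short list below is made-up example data, not na_example. The 1/0 indicator approach used above and a direct count of booleans give the same number.

```python
# Made-up data; None plays the role of R's NA.
values = [2, 1, None, 3, None, 4]

# Same idea as ifelse(is.na(x), 1, 0) followed by sum().
na_check = [1 if v is None else 0 for v in values]
sum_na = sum(na_check)

# Booleans add up as 0/1, so the indicator list can be skipped.
sum_na_direct = sum(v is None for v in values)

print(sum_na, sum_na_direct)  # 2 2
```

The same shortcut exists in R: sum(is.na(na_example)) counts the NAs directly, because TRUE is treated as 1 when summed.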
without_na <- ifelse(is.na(na_example), 0, na_example)  # replace each NA with 0

updated_num_na <- sum(ifelse(is.na(without_na), 1, 0))  # count the NAs again; should be 0
updated_num_na
[1] 0
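A matching Python sketch of the replacement step, again assuming None stands for NA and using made-up data. It also shows one side effect worth keeping in mind: once NAs become 0, the zeros are treated as real observations, so statistics such as the mean change (in R, mean(na_example, na.rm = TRUE) averages only the observed values instead).

```python
values = [2, 1, None, 3, None, 4]  # made-up data, None stands for NA

# Same idea as ifelse(is.na(x), 0, x): swap each None for 0.
without_na = [0 if v is None else v for v in values]
updated_num_na = sum(v is None for v in without_na)
print(without_na)      # [2, 1, 0, 3, 0, 4]
print(updated_num_na)  # 0

# Caution: the mean over observed values differs from the zero-filled mean.
observed = [v for v in values if v is not None]
mean_observed = sum(observed) / len(observed)
mean_zero_filled = sum(without_na) / len(without_na)
print(mean_observed, mean_zero_filled)  # 2.5 1.6666666666666667
```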