Assignment 1

My first assignment has three parts.

(a)

I watched the video named ‘Digging a Pit of Success for Your Organization: Embracing a R-based ecosystem in the US federal government’.

Click here to watch the video

SUMMARY

An economist at a US federal agency (USAID) explains how his team analyzed millions of records from health institutions in countries around the world. They evaluate data such as HIV status, HIV treatment methods, and protection indicators within the President’s Emergency Plan for AIDS Relief (PEPFAR). In 2015, their teams relied largely on Excel, but they later realized that Excel was not suitable for large data sets and moved beyond it to make their analysis more efficient. They turned to R: “We have created our own ‘pit of success’ by providing analysts with the infrastructure and support needed to make it easier to learn and work with R,” says the economist. He then discusses the importance of three key points:

  1. Packages,

  2. Buy-in,

  3. Community

First, they created packages that make common tasks easier for everyone to access and use. He explains that they built these packages step by step, first writing scripts and then turning them into functions.

The second key point is buy-in: he urges organizations to commit to R and to support their analysts in learning it. Although there is a learning curve, he stresses that the time is well spent, because routine work becomes faster and of better quality once analysts have had enough time to learn.

The last key point, he adds, is building a community. While integrating R into internal use, they created a Slack group, where they saw how valuable it was for people to ask each other questions and share new information.

He finishes his talk by emphasizing that these steps increased both the quality and the efficiency of their work.

(b)

Differences between R and Python:

R: In R, indices begin at 1. Additionally, it contains special data structures that are ideal for statistical analysis, such as factors, data frames, and vectors.

Python: In Python, indices begin at 0. It relies on more general-purpose data structures such as lists, dictionaries, and sets.
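The indexing difference is easy to trip over when moving between the two languages. The following is a minimal Python sketch of 0-based indexing and the built-in containers mentioned above; the names `scores`, `alice`, `bob`, and `unique` are made up for illustration.

```python
# Python sequences are 0-indexed: the first element sits at position 0.
nums = [16, 20, 35, 63, 78]
first = nums[0]    # the first element (R would write nums[1])
last = nums[-1]    # a negative index counts from the end of the list

# The general-purpose containers named in the text.
scores = {"alice": 90, "bob": 75}   # dictionary: maps keys to values
unique = set([1, 2, 2, 3])          # set: duplicate values are dropped

print(first, last, scores["bob"], sorted(unique))
```

Note that a negative index means something different in R: there, `nums[-1]` returns the vector without its first element rather than the last element.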

R: The syntax of R is designed with statistical data analysis in mind. R code is frequently written in a way that is more functional.

Python: Python is renowned for its readability and boasts a more versatile syntax. For code blocks, it employs indentation (whitespace).
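To illustrate the point about indentation, here is a small sketch in which whitespace alone decides which lines belong to the function and to the `if` branch. The function name `describe` is hypothetical.

```python
def describe(n):
    # Every line indented under "def" belongs to the function body.
    if n % 2 == 0:
        # This line runs only when the condition above is true.
        return "even"
    # Back at the function's indentation level: runs when n is odd.
    return "odd"

labels = [describe(n) for n in [16, 35]]
print(labels)
```

In R, the same blocks would be delimited with curly braces, and indentation would only be a matter of style.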

Example:

In R:

# Store the result under a new name so the built-in mean() is not shadowed
nums <- c(16, 20, 35, 63, 78)
nums_mean <- mean(nums)
print(nums_mean)
[1] 42.4

In Python:

nums = [16, 20, 35, 63, 78]
mean = sum(nums) / len(nums)
print(mean)
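The manual `sum(nums) / len(nums)` calculation works, but Python's standard library also ships a `statistics` module whose `mean` function is closer in spirit to R's `mean()`. A short sketch, with `average` as an illustrative variable name:

```python
from statistics import mean

nums = [16, 20, 35, 63, 78]

# statistics.mean computes the arithmetic mean of a sequence of numbers.
average = mean(nums)
print(average)
```

Unlike in R, the function has to be imported first, because it is not part of Python's built-in namespace.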

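R: R processes strings with standalone functions such as toupper(), which take the string as an argument.

Python: Python provides built-in string methods such as upper(), which are called on the string itself, along with libraries for more advanced string processing.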

Example:

In R:

text <- "Hello, World!"
uppercase_text <- toupper(text)

In Python:

text = "Hello, World!"
uppercase_text = text.upper()
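Beyond `upper()`, Python strings carry a number of other built-in methods. The sketch below shows a few of them on the same example string; the variable names are chosen only for illustration.

```python
text = "Hello, World!"

lower_text = text.lower()               # convert every letter to lowercase
replaced = text.replace("World", "R")   # substitute one substring for another
parts = text.split(", ")                # split into a list at each ", "

print(lower_text)
print(replaced)
print(parts)
```

Each method returns a new value and leaves `text` unchanged, because Python strings are immutable.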

(c)

# Load the dslabs package and its na_example vector
library(dslabs)
data("na_example")

# Count NAs
print(paste("Total number of NAs in na_example:", sum(is.na(na_example))))
[1] "Total number of NAs in na_example: 145"
# Replace NAs with 0
na_example_no_na <- na_example
na_example_no_na[is.na(na_example_no_na)] <- 0

# Print the new data (without NAs) and count NAs in the new data
print(na_example_no_na)
   [1] 2 1 3 2 1 3 1 4 3 2 2 0 2 2 1 4 0 1 1 2 1 2 2 1 2 5 0 2 2 3 1 2 4 1 1 1 4
  [38] 5 2 3 4 1 2 4 1 1 2 1 5 0 0 0 1 1 5 1 3 1 0 4 4 7 3 2 0 0 1 0 4 1 2 2 3 2
  [75] 1 2 2 4 3 4 2 3 1 3 2 1 1 1 3 1 0 3 1 2 2 1 2 2 1 1 4 1 1 2 3 3 2 2 3 3 3
 [112] 4 1 1 1 2 0 4 3 4 3 1 2 1 0 0 0 0 1 5 1 2 1 3 5 3 2 2 0 0 0 0 3 5 3 1 1 4
 [149] 2 4 3 3 0 2 3 2 6 0 1 1 2 2 1 3 1 1 5 0 0 2 4 0 2 5 1 4 3 3 0 4 3 1 4 1 1
 [186] 3 1 1 0 0 3 5 2 2 2 3 1 2 2 3 2 1 0 2 0 1 0 0 2 1 1 0 3 0 1 2 2 1 3 2 2 1
 [223] 1 2 3 1 1 1 4 3 4 2 2 1 4 1 0 5 1 4 0 3 0 0 1 1 5 2 3 3 2 4 0 3 2 5 0 2 3
 [260] 4 6 2 2 2 0 2 0 2 0 3 3 2 2 4 3 1 4 2 0 2 4 0 6 2 3 1 0 2 2 0 1 1 3 2 3 3
 [297] 1 0 1 4 2 1 1 3 2 1 2 3 1 0 2 3 3 2 1 2 3 5 5 1 2 3 3 1 0 0 1 2 4 0 2 1 1
 [334] 1 3 2 1 1 3 4 0 1 2 1 1 3 3 0 1 1 3 5 3 2 3 4 1 4 3 1 0 2 1 2 2 1 2 2 6 1
 [371] 2 4 5 0 3 4 2 1 1 4 2 1 1 1 1 2 1 4 4 1 3 0 3 3 0 2 0 1 2 1 1 4 2 1 4 4 0
 [408] 1 2 0 3 2 2 2 1 4 3 6 1 2 3 1 3 2 2 2 1 1 3 2 1 1 1 3 2 2 0 4 4 4 1 1 0 4
 [445] 3 0 1 3 1 3 2 4 2 2 2 3 2 1 4 3 0 1 4 3 1 3 2 0 3 0 1 3 1 4 1 1 1 2 4 3 1
 [482] 2 2 2 3 2 3 1 1 0 3 2 1 1 2 0 2 2 2 3 3 1 1 2 0 1 2 1 1 3 3 1 3 1 1 1 1 1
 [519] 2 5 1 1 2 2 1 1 0 1 4 1 2 4 1 3 2 0 1 1 0 2 1 1 4 2 3 3 1 5 3 1 1 2 0 1 1
 [556] 3 1 3 2 4 0 2 3 2 1 2 1 1 1 2 2 3 1 5 2 0 2 0 3 2 2 2 1 5 3 2 3 1 0 3 1 2
 [593] 2 2 1 2 2 4 0 6 1 2 0 1 1 2 2 3 0 3 2 3 3 4 2 0 2 0 4 0 1 1 2 2 3 1 1 1 3
 [630] 0 2 5 0 7 1 0 4 3 3 1 0 1 1 1 1 3 2 4 2 2 3 0 0 1 4 3 2 2 2 3 2 4 2 2 4 0
 [667] 0 0 6 3 3 1 4 4 2 1 0 1 6 0 3 3 2 1 1 6 0 1 5 1 0 2 6 2 0 4 1 3 1 2 0 1 1
 [704] 3 1 2 4 2 1 3 2 4 3 2 2 1 1 5 6 4 2 2 2 2 4 0 1 2 2 2 2 4 5 0 0 0 4 3 3 3
 [741] 2 4 2 4 0 0 0 0 2 1 0 2 4 3 2 0 2 3 1 3 4 0 1 2 1 2 0 3 1 2 1 2 1 2 1 2 2
 [778] 2 2 1 1 3 3 1 3 4 3 0 0 4 2 3 2 1 3 2 4 2 2 3 1 2 4 3 3 4 0 1 4 2 1 1 1 3
 [815] 1 5 2 2 4 2 0 1 3 1 2 0 1 2 1 2 1 0 1 3 2 3 2 0 2 1 4 2 0 0 0 2 4 2 0 0 3
 [852] 1 0 5 5 2 2 2 0 2 1 3 1 3 2 4 2 4 0 4 1 2 3 2 3 3 2 3 2 2 2 1 3 2 4 2 0 3
 [889] 3 2 2 0 0 3 2 1 2 4 1 1 1 1 4 3 2 0 3 2 0 1 0 3 2 1 1 1 2 0 2 2 3 3 2 0 0
 [926] 4 5 2 2 2 1 2 3 1 3 3 4 3 0 1 1 1 0 4 3 5 1 1 2 0 2 2 2 2 5 2 2 3 1 2 3 0
 [963] 1 2 0 0 2 0 3 1 1 2 5 3 5 1 1 4 0 2 1 3 1 1 2 4 3 3 3 0 1 1 2 2 1 1 2 2 0
[1000] 2
print(paste("Total number of NAs in na_example_no_na:", sum(is.na(na_example_no_na))))
[1] "Total number of NAs in na_example_no_na: 0"
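For comparison with part (b), the same count-and-replace steps can be sketched in Python. This is only an illustration under stated assumptions: the list `values` is a small made-up sample, not the `na_example` data, and `math.nan` is used as a rough stand-in for R's `NA`, which is not an exact equivalent.

```python
import math

# Hypothetical sample data; math.nan stands in for R's NA here.
values = [2.0, 1.0, math.nan, 3.0, math.nan, 4.0]

# Count the missing values (comparable to sum(is.na(x)) in R).
na_count = sum(math.isnan(v) for v in values)

# Build a new list with every missing value replaced by 0.
values_no_na = [0.0 if math.isnan(v) else v for v in values]

print("Total number of NaNs in values:", na_count)
print(values_no_na)
print("Total number of NaNs in values_no_na:",
      sum(math.isnan(v) for v in values_no_na))
```

As in the R version, the original data is left untouched and the replacement is made in a copy.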