My first assignment has two parts.

(a) Selected Video: Mustafa Gökçe Baydoğan - A Talk on Data Analytics and Industrial Engineering

Brief Summary

In this seminar, Mustafa Gökçe Baydoğan discusses the integral role of Industrial Engineering (IE) in the field of data analytics through various real-world industrial projects. The key highlights of the talk include:

  • Types of Analytics: He explains the transition from descriptive and predictive analytics to prescriptive analytics, emphasizing how Operations Research (OR) and Machine Learning (ML) work together to provide actionable insights.

  • Case Studies: Two major projects are detailed:

    • Predicting lumber warping in the forest industry by treating digital images as matrices and extracting physical features like knot locations and ring orientations.

    • Managing electricity consumption forecasts and imbalance costs in the energy sector.

  • The IE Advantage: He stresses that the strength of an Industrial Engineer lies in identifying the root cause of a problem and ensuring that models are “explainable” for business decision-makers.

  • Advice for Students: He encourages aspiring data scientists to learn by “getting their hands dirty” with real datasets rather than just watching tutorials, and he highlights the importance of graduate studies for specialization.

(b) Selected Video: Mustafa Gökçe Baydoğan - A Talk on Data Analytics and Industrial Engineering

Question: In the lumber warping prediction project, how did Mustafa Baydoğan incorporate domain-specific knowledge into the data analytics process, and why was this approach preferred over a purely data-driven (black-box) model?

Answer: Mustafa Baydoğan researched the forestry literature to understand the physical causes of wood warping, such as knot density and the orientation of growth rings, which depends on where in the tree the lumber was cut. He then represented digital images of the lumber as matrices and extracted these specific features from them. This approach was preferred because it made the model “explainable”: industry experts could understand why a certain piece of wood was flagged and take corrective action before the drying process.
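
To make the idea of "an image as a matrix" concrete, here is a minimal toy sketch in R. It is not the method from the talk: the 6x6 matrix, the 0.5 darkness threshold, and the names `img`, `knot_mask`, `knot_fraction`, and `knot_center` are all assumptions made up for illustration.

```r
# Toy 6x6 grayscale "board image": values in [0, 1], lower means darker.
img <- matrix(0.8, nrow = 6, ncol = 6)
img[2:3, 4:5] <- 0.2   # a dark patch standing in for a knot

# Hypothetical rule: pixels darker than 0.5 count as knot pixels.
knot_mask <- img < 0.5

# Two simple, human-readable features derived from the matrix.
knot_fraction <- mean(knot_mask)                            # share of dark pixels
knot_center   <- colMeans(which(knot_mask, arr.ind = TRUE)) # mean row and column of the patch
```

Features of this kind can be named and inspected by a domain expert, which is the sense in which the resulting model is explainable.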

Code
library(dslabs)
data("polls_us_election_2016")

my_first <- "Onur Furkan"
my_birth_year <- 2004

k <- (nchar(my_first) + my_birth_year) %% 15 + 8
print(paste("Computed value of k:", k))
[1] "Computed value of k: 13"
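
The arithmetic behind this value can be checked step by step. The sketch below recomputes it with intermediate variables (`name_length`, `total`, `remainder`, `k_check`, `k_is_odd`) that are introduced here only for illustration. Note that `nchar()` counts the space in the name.

```r
my_first <- "Onur Furkan"
my_birth_year <- 2004

name_length <- nchar(my_first)        # 11: four letters, one space, six letters
total <- name_length + my_birth_year  # 11 + 2004 = 2015
remainder <- total %% 15              # 2015 = 15 * 134 + 5, so the remainder is 5
k_check <- remainder + 8              # 5 + 8 = 13

k_is_odd <- k_check %% 2 == 1         # TRUE, so the tail() branch applies
```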
Code
if (k %% 2 == 0) {
  head(polls_us_election_2016, k)
} else {
  tail(polls_us_election_2016, k)
}
              state  startdate    enddate                pollster grade
4196 North Carolina 2016-05-20 2016-05-22   Public Policy Polling    B+
4197       Kentucky 2016-09-30 2016-10-13                   Ipsos    A-
4198        Florida 2016-07-30 2016-08-07   Quinnipiac University    A-
4199   Pennsylvania 2016-06-08 2016-06-19   Quinnipiac University    A-
4200           Ohio 2016-06-30 2016-07-11   Quinnipiac University    A-
4201 North Carolina 2016-03-18 2016-03-20   Public Policy Polling    B+
4202   South Dakota 2016-10-28 2016-11-02                   Ipsos    A-
4203     Washington 2016-10-21 2016-11-02                   Ipsos    A-
4204       Virginia 2016-09-16 2016-09-22                   Ipsos    A-
4205      Wisconsin 2016-08-04 2016-08-07    Marquette University     A
4206           Utah 2016-11-01 2016-11-07 Google Consumer Surveys     B
4207         Oregon 2016-10-21 2016-11-02                   Ipsos    A-
4208       Michigan 2016-01-23 2016-01-26                EPIC-MRA    A-
     samplesize population rawpoll_clinton rawpoll_trump rawpoll_johnson
4196        928          v           41.00         43.00            3.00
4197        336         lv           39.38         53.08              NA
4198       1056         lv           43.00         43.00            7.00
4199        950         rv           39.00         36.00            9.00
4200        955         rv           36.00         37.00            7.00
4201        843          v           44.00         42.00              NA
4202        170         lv           28.45         47.20              NA
4203        538         lv           46.71         38.33              NA
4204        452         lv           46.54         40.04              NA
4205        683         lv           47.00         34.00            9.00
4206        286         lv           21.33         35.05            9.99
4207        446         lv           46.46         37.41              NA
4208        600         lv           43.00         41.00              NA
     rawpoll_mcmullin adjpoll_clinton adjpoll_trump adjpoll_johnson
4196               NA        43.28262      47.12021       -0.036293
4197               NA        38.34430      54.36357              NA
4198               NA        45.19351      46.65680        3.448447
4199               NA        43.35339      41.19061        4.791570
4200               NA        40.73937      42.33380        2.936299
4201               NA        42.13165      43.55006              NA
4202               NA        26.57791      45.43384              NA
4203               NA        45.56387      38.22545              NA
4204               NA        46.47852      40.48017              NA
4205               NA        48.74781      39.07778        4.705020
4206               NA        26.65200      40.57738        9.705791
4207               NA        45.12949      37.10720              NA
4208               NA        42.14966      42.05508              NA
     adjpoll_mcmullin
4196               NA
4197               NA
4198               NA
4199               NA
4200               NA
4201               NA
4202               NA
4203               NA
4204               NA
4205               NA
4206               NA
4207               NA
4208               NA
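
Because the same even/odd rule is applied again after the NA replacement, it could be wrapped in a small helper. This is only a sketch: `show_k_rows` is a hypothetical name, and `toy` is a made-up data frame used to illustrate the behaviour.

```r
# Hypothetical helper: first k rows when k is even, last k rows when k is odd.
show_k_rows <- function(df, k) {
  if (k %% 2 == 0) head(df, k) else tail(df, k)
}

# Toy data frame used only for illustration.
toy <- data.frame(id = 1:10, value = letters[1:10])

odd_rows  <- show_k_rows(toy, 3)  # last three rows: ids 8, 9, 10
even_rows <- show_k_rows(toy, 4)  # first four rows: ids 1, 2, 3, 4
```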
Code
total_na <- sum(is.na(polls_us_election_2016))
print(paste("Total number of NAs:", total_na))
[1] "Total number of NAs: 11604"
Code
na_counts <- colSums(is.na(polls_us_election_2016))
sort(na_counts, decreasing = TRUE)[1:8]
rawpoll_mcmullin adjpoll_mcmullin  rawpoll_johnson  adjpoll_johnson 
            4178             4178             1409             1409 
           grade       samplesize            state        startdate 
             429                1                0                0 
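
As a cross-check, the per-column counts should add up to the overall total. The sketch below shows this on a made-up data frame (`toy` and the `_toy` variables are illustrative only) and then sums the six non-zero counts reported above.

```r
# Toy data frame with a known pattern of missing values.
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", NA, "y", "z"),
  c = 1:4
)

na_counts_toy <- colSums(is.na(toy))   # a = 2, b = 1, c = 0
na_share_toy  <- colMeans(is.na(toy))  # a = 0.50, b = 0.25, c = 0.00

# Per-column counts add up to the total number of NAs.
total_matches <- sum(na_counts_toy) == sum(is.na(toy))

# The six non-zero counts from the output above; all other columns have 0.
reported_total <- sum(c(4178, 4178, 1409, 1409, 429, 1))  # 11604
```

`colMeans(is.na(...))` gives the share of missing values per column, which is often easier to compare across columns than raw counts.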
Code
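new_data <- polls_us_election_2016

# Numeric columns: replace NA with birth year + k (2004 + 13 = 2017).
new_data[] <- lapply(new_data, function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- my_birth_year + k
  }
  x
})

# Character and factor columns: replace NA with "<first name>_<k>".
# Factors are converted to character so the new label can be assigned.
new_data[] <- lapply(new_data, function(x) {
  if (is.character(x) || is.factor(x)) {
    x <- as.character(x)
    x[is.na(x)] <- paste0(my_first, "_", k)
  }
  x
})

# Show the same rows as before: first k if k is even, last k if k is odd.
if (k %% 2 == 0) {
  head(new_data, k)
} else {
  tail(new_data, k)
}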
              state  startdate    enddate                pollster grade
4196 North Carolina 2016-05-20 2016-05-22   Public Policy Polling    B+
4197       Kentucky 2016-09-30 2016-10-13                   Ipsos    A-
4198        Florida 2016-07-30 2016-08-07   Quinnipiac University    A-
4199   Pennsylvania 2016-06-08 2016-06-19   Quinnipiac University    A-
4200           Ohio 2016-06-30 2016-07-11   Quinnipiac University    A-
4201 North Carolina 2016-03-18 2016-03-20   Public Policy Polling    B+
4202   South Dakota 2016-10-28 2016-11-02                   Ipsos    A-
4203     Washington 2016-10-21 2016-11-02                   Ipsos    A-
4204       Virginia 2016-09-16 2016-09-22                   Ipsos    A-
4205      Wisconsin 2016-08-04 2016-08-07    Marquette University     A
4206           Utah 2016-11-01 2016-11-07 Google Consumer Surveys     B
4207         Oregon 2016-10-21 2016-11-02                   Ipsos    A-
4208       Michigan 2016-01-23 2016-01-26                EPIC-MRA    A-
     samplesize population rawpoll_clinton rawpoll_trump rawpoll_johnson
4196        928          v           41.00         43.00            3.00
4197        336         lv           39.38         53.08         2017.00
4198       1056         lv           43.00         43.00            7.00
4199        950         rv           39.00         36.00            9.00
4200        955         rv           36.00         37.00            7.00
4201        843          v           44.00         42.00         2017.00
4202        170         lv           28.45         47.20         2017.00
4203        538         lv           46.71         38.33         2017.00
4204        452         lv           46.54         40.04         2017.00
4205        683         lv           47.00         34.00            9.00
4206        286         lv           21.33         35.05            9.99
4207        446         lv           46.46         37.41         2017.00
4208        600         lv           43.00         41.00         2017.00
     rawpoll_mcmullin adjpoll_clinton adjpoll_trump adjpoll_johnson
4196             2017        43.28262      47.12021       -0.036293
4197             2017        38.34430      54.36357     2017.000000
4198             2017        45.19351      46.65680        3.448447
4199             2017        43.35339      41.19061        4.791570
4200             2017        40.73937      42.33380        2.936299
4201             2017        42.13165      43.55006     2017.000000
4202             2017        26.57791      45.43384     2017.000000
4203             2017        45.56387      38.22545     2017.000000
4204             2017        46.47852      40.48017     2017.000000
4205             2017        48.74781      39.07778        4.705020
4206             2017        26.65200      40.57738        9.705791
4207             2017        45.12949      37.10720     2017.000000
4208             2017        42.14966      42.05508     2017.000000
     adjpoll_mcmullin
4196             2017
4197             2017
4198             2017
4199             2017
4200             2017
4201             2017
4202             2017
4203             2017
4204             2017
4205             2017
4206             2017
4207             2017
4208             2017
Code
sum(is.na(new_data))
[1] 0
Code
anyNA(new_data)
[1] FALSE
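
Beyond confirming that no NA remains, it can be useful to check that exactly the missing cells were changed and that observed values were left alone. The sketch below does this on a made-up data frame; `toy`, `filled`, `fill_num`, and `fill_chr` are illustrative names, and the fill values simply mirror the ones used above.

```r
# Toy data frame with one NA in a numeric column and one in a factor column.
toy <- data.frame(
  score = c(10.5, NA, 7),
  grade = factor(c("A", NA, "B"))
)
fill_num <- 2017              # birth year + k, as in the assignment
fill_chr <- "Onur Furkan_13"  # "<first name>_<k>"

filled <- toy
filled[] <- lapply(filled, function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- fill_num
  } else if (is.character(x) || is.factor(x)) {
    x <- as.character(x)      # factors become character columns
    x[is.na(x)] <- fill_chr
  }
  x
})

n_missing_before  <- sum(is.na(toy))
n_sentinels_after <- sum(filled$score == fill_num) + sum(filled$grade == fill_chr)

# Values that were observed originally must be unchanged.
observed  <- !is.na(toy$score)
unchanged <- all(filled$score[observed] == toy$score[observed])

grade_class <- class(filled$grade)  # "character"
```

One caveat worth keeping in mind: a numeric placeholder such as 2017 in a percentage column will distort any later mean or sum, so these filled columns should not be summarised as if the values were real poll results.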

Notes on Methodology

This assignment was prepared by combining human intelligence (HI) with AI-assisted research tools. The video analysis in Parts (a) and (b) was synthesized with AI to produce a concise English summary, while the R scripts in Part 4 were developed to meet the logical constraints specified in the assignment.

Verification: All AI-generated outputs, including calculations (e.g., the value k = 13) and data transformations, were manually checked for accuracy and for alignment with the course requirements.

Prompts used for generation:

  1. “Synthesize a professional summary and quiz questions from the provided video transcript in English.”

  2. “Develop an R script for data cleaning and NA replacement based on user-defined variables and logical conditions.”
