diff --git a/02_RProgramming/DataTypes/Introduction to the R Language.pdf b/02_RProgramming/DataTypes/Introduction to the R Language.pdf
index cec046b60..b8f1bc492 100644
Binary files a/02_RProgramming/DataTypes/Introduction to the R Language.pdf and b/02_RProgramming/DataTypes/Introduction to the R Language.pdf differ
diff --git a/02_RProgramming/DataTypes/index.Rmd b/02_RProgramming/DataTypes/index.Rmd
index 65eb1ce54..19f8f1af4 100644
--- a/02_RProgramming/DataTypes/index.Rmd
+++ b/02_RProgramming/DataTypes/index.Rmd
@@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
 highlighter : highlight.js # {highlight.js, prettify, highlight}
 hitheme : tomorrow #
 url:
-  lib: ../../libraries
+  lib: ../../librariesNew
   assets: ../../assets
 widgets : [mathjax] # {mathjax, quiz, bootstrap}
 mode : selfcontained # {standalone, draft}
@@ -200,7 +200,9 @@ NAs introduced by coercion
 > as.logical(x)
 [1] NA NA NA
 > as.complex(x)
-[1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i
+[1] NA NA NA
+Warning message:
+NAs introduced by coercion
 ```
 
 ---
@@ -472,4 +474,4 @@ Data Types
 
 - data frames
 
-- names
\ No newline at end of file
+- names
diff --git a/02_RProgramming/DataTypes/index.html b/02_RProgramming/DataTypes/index.html
index 9b50617cb..00c65c081 100644
--- a/02_RProgramming/DataTypes/index.html
+++ b/02_RProgramming/DataTypes/index.html
@@ -8,46 +8,46 @@
+

Introduction to the R Language

+

Data Types and Basic Operations

+

Roger Peng, Associate Professor
Johns Hopkins Bloomberg School of Public Health

+
+
+
- - - - -
-

Introduction to the R Language

-

Data Types and Basic Operations

-

Roger Peng, Associate Professor
Johns Hopkins Bloomberg School of Public Health

-
-
- - +

Objects

-
+

R has five basic or “atomic” classes of objects:

    @@ -73,11 +73,11 @@

    Objects

    - +

    Numbers

    -
    +
    • Numbers in R are generally treated as numeric objects (i.e. double precision real numbers)

    • @@ -97,11 +97,11 @@

      Numbers

      - +

      Attributes

      -
      +

      R objects can have attributes

        @@ -119,11 +119,11 @@

        Attributes

        - +

        Entering Input

        -
        +

        At the R prompt we type expressions. The <- symbol is the assignment operator.

        > x <- 1
        @@ -145,11 +145,11 @@ 

        Entering Input

        - +

        Evaluation

        -
        +

        When a complete expression is entered at the prompt, it is evaluated and the result of the evaluated expression is returned. The result may be auto-printed.

        > x <- 5  ## nothing printed
        @@ -165,11 +165,11 @@ 

        Evaluation

        - +

        Printing

        -
        +
        > x <- 1:20 
         > x
          [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
        @@ -182,11 +182,11 @@ 

        Printing

        - +

        Creating Vectors

        -
        +

        The c() function can be used to create vectors of objects.

        > x <- c(0.5, 0.6)       ## numeric
        @@ -208,11 +208,11 @@ 

        Creating Vectors

        - +

        Mixing Objects

        -
        +

        What about the following?

        > y <- c(1.7, "a")   ## character
        @@ -226,11 +226,11 @@ 

        Mixing Objects

        - +

        Explicit Coercion

        -
        +

        Objects can be explicitly coerced from one class to another using the as.* functions, if available.

        > x <- 0:6
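        The coercion described above can be sketched as follows. This is a minimal illustration rather than the slide's own code; the names `x_num`, `x_lgl` and `x_chr` are hypothetical and introduced only for this example.

        ```r
        # Start from an integer vector, as on the slide
        x <- 0:6
        class(x)                   # "integer"

        # Each as.* call returns a new vector of the requested class
        x_num <- as.numeric(x)
        x_lgl <- as.logical(x)     # 0 becomes FALSE, any non-zero value becomes TRUE
        x_chr <- as.character(x)

        class(x_num)
        x_lgl
        x_chr
        ```

        The original vector `x` is left unchanged; each call produces a copy in the new class.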
        @@ -248,11 +248,11 @@ 

        Explicit Coercion

        - +

        Explicit Coercion

        -
        +

        Nonsensical coercion results in NAs.

        > x <- c("a", "b", "c")
        @@ -263,18 +263,20 @@ 

        Explicit Coercion

        > as.logical(x)
        [1] NA NA NA
        > as.complex(x)
        -[1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i
        +[1] NA NA NA
        +Warning message:
        +NAs introduced by coercion
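        The corrected output can be checked with a short sketch like the one below. It assumes the same character vector as the slide; `z` is a hypothetical name, and `suppressWarnings()` is used only to keep the example quiet.

        ```r
        x <- c("a", "b", "c")

        # Letters cannot be parsed as complex numbers, so every element becomes NA.
        # suppressWarnings() hides the "NAs introduced by coercion" warning.
        z <- suppressWarnings(as.complex(x))
        z
        is.na(z)

        # Strings that do parse as numbers convert without a warning
        as.complex(c("1", "2+3i"))
        ```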
        - +

        Matrices

        -
        +

        Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol)

        > m <- matrix(nrow = 2, ncol = 3) 
        @@ -293,11 +295,11 @@ 

        Matrices

        - +

        Matrices (cont’d)

        -
        +

        Matrices are constructed column-wise, so entries can be thought of as starting in the “upper left” corner and running down the columns.

        > m <- matrix(1:6, nrow = 2, ncol = 3) 
        @@ -311,11 +313,11 @@ 

        Matrices (cont’d)

        - +

        Matrices (cont’d)

        -
        +

        Matrices can also be created directly from vectors by adding a dimension attribute.

        > m <- 1:10 
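        A minimal sketch of adding a dimension attribute to a vector, assuming a 2 x 5 shape for illustration:

        ```r
        m <- 1:10
        dim(m) <- c(2, 5)     # 2 rows, 5 columns; values fill column by column
        m
        attributes(m)         # the only attribute is $dim

        # The result matches what matrix() builds from the same values
        identical(m, matrix(1:10, nrow = 2, ncol = 5))
        ```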
        @@ -332,11 +334,11 @@ 

        Matrices (cont’d)

        - +

        cbind-ing and rbind-ing

        -
        +

        Matrices can be created by column-binding or row-binding with cbind() and rbind().

        > x <- 1:3
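        Column-binding and row-binding can be sketched as below. The second vector `y` and the names `by_col` and `by_row` are assumptions made for this example.

        ```r
        x <- 1:3
        y <- 10:12               # assumed second vector of the same length

        by_col <- cbind(x, y)    # each vector becomes a column: 3 x 2
        by_row <- rbind(x, y)    # each vector becomes a row:    2 x 3

        dim(by_col)
        dim(by_row)
        by_col
        ```

        The vector names are carried over as column names by `cbind()` and as row names by `rbind()`.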
        @@ -356,11 +358,11 @@ 

        cbind-ing and rbind-ing

        - +

        Lists

        -
        +

        Lists are a special type of vector that can contain elements of different classes. Lists are a very important data type in R and you should get to know them well.

        > x <- list(1, "a", TRUE, 1 + 4i) 
        @@ -382,11 +384,11 @@ 

        Lists

        - +

        Factors

        -
        +

        Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.

          @@ -398,11 +400,11 @@

          Factors

          - +

          Factors

          -
          +
          > x <- factor(c("yes", "yes", "no", "yes", "no")) 
           > x
           [1] yes yes no yes no
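        The idea that a factor is an integer vector with labels can be inspected directly. A small sketch, using the same vector as above:

        ```r
        x <- factor(c("yes", "yes", "no", "yes", "no"))

        levels(x)        # levels are sorted alphabetically by default
        table(x)         # counts per level
        unclass(x)       # the underlying integer codes plus a levels attribute
        as.integer(x)    # just the codes
        ```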
          @@ -421,11 +423,11 @@ 

          Factors

          - +

          Factors

          -
          +

          The order of the levels can be set using the levels argument to factor(). This can be important in linear modelling because the first level is used as the baseline level.

          > x <- factor(c("yes", "yes", "no", "yes", "no"),
          @@ -439,11 +441,11 @@ 

          Factors

          - +

          Missing Values

          -
          +

          Missing values are denoted by NA or NaN for undefined mathematical operations.

            @@ -457,11 +459,11 @@

            Missing Values

            - +

            Missing Values

            -
            +
            > x <- c(1, 2, NA, 10, 3)
             > is.na(x)
             [1] FALSE FALSE  TRUE FALSE FALSE
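            The difference between `NA` and `NaN` shows up when both tests are applied to the same vector. A minimal sketch with an assumed example vector:

            ```r
            x <- c(1, 2, NaN, NA, 4)

            is.na(x)     # TRUE for both NaN and NA
            is.nan(x)    # TRUE only for NaN
            ```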
            @@ -478,11 +480,11 @@ 

            Missing Values

            - +

            Data Frames

            -
            +

            Data frames are used to store tabular data

              @@ -498,11 +500,11 @@

              Data Frames

              - +

              Data Frames

              -
              +
              > x <- data.frame(foo = 1:4, bar = c(T, T, F, F)) 
               > x
                 foo   bar
              @@ -520,11 +522,11 @@ 

              Data Frames

              - +

              Names

              -
              +

              R objects can also have names, which is very useful for writing readable code and self-describing objects.

              > x <- 1:3
              @@ -542,11 +544,11 @@ 

              Names

              - +

              Names

              -
              +

              Lists can also have names.

              > x <- list(a = 1, b = 2, c = 3) 
              @@ -565,11 +567,11 @@ 

              Names

              - +

              Names

              -
              +

              And matrices.

              > m <- matrix(1:4, nrow = 2, ncol = 2)
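              Matrix names are set through `dimnames()`, which takes a list of row names and column names. A sketch, with the labels chosen only for illustration:

              ```r
              m <- matrix(1:4, nrow = 2, ncol = 2)

              # First list element names the rows, second names the columns
              dimnames(m) <- list(c("a", "b"), c("c", "d"))
              m

              # Names can then be used for indexing
              m["a", "d"]
              ```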
              @@ -584,11 +586,11 @@ 

              Names

              - +

              Summary

              -
              +

              Data Types

                @@ -606,34 +608,191 @@

                Summary

diff --git a/02_RProgramming/DataTypes/index.md b/02_RProgramming/DataTypes/index.md
index ccd9ff364..19f8f1af4 100644
--- a/02_RProgramming/DataTypes/index.md
+++ b/02_RProgramming/DataTypes/index.md
@@ -8,7 +8,7 @@ framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
 highlighter : highlight.js # {highlight.js, prettify, highlight}
 hitheme : tomorrow #
 url:
-  lib: ../../libraries
+  lib: ../../librariesNew
   assets: ../../assets
 widgets : [mathjax] # {mathjax, quiz, bootstrap}
 mode : selfcontained # {standalone, draft}
@@ -200,7 +200,9 @@ NAs introduced by coercion
 > as.logical(x)
 [1] NA NA NA
 > as.complex(x)
-[1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i
+[1] NA NA NA
+Warning message:
+NAs introduced by coercion
 ```
 
 ---
diff --git a/03_GettingData/dplyr/chicago.zip b/03_GettingData/dplyr/chicago.zip
new file mode 100644
index 000000000..d0eb3a7fd
Binary files /dev/null and b/03_GettingData/dplyr/chicago.zip differ
diff --git a/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pdf b/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pdf
index 092a7e4c4..a04ed124a 100644
Binary files a/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pdf and b/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pdf differ
diff --git a/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pptx b/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pptx
index 44e98c5d4..7c3ec5979 100644
Binary files a/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pptx and b/04_ExploratoryAnalysis/ggplot2/ppt/ggplot2.pptx differ
diff --git a/05_ReproducibleResearch/organizingADataAnalysis/index.Rmd b/05_ReproducibleResearch/organizingADataAnalysis/index.Rmd
index eae99fc98..e2db82a63 100644
--- a/05_ReproducibleResearch/organizingADataAnalysis/index.Rmd
+++ b/05_ReproducibleResearch/organizingADataAnalysis/index.Rmd
@@ -110,7 +110,7 @@ mode : selfcontained # {standalone, draft}
 
 * Not necessary if you use R markdown
 * Should contain step-by-step instructions for analysis
-* Here is an example [https://github.com/jtleek/swfdr/blob/master/README](https://github.com/jtleek/swfdr/blob/master/README)
+* Here is an example [https://github.com/jtleek/swfdr/blob/master/README.md](https://github.com/jtleek/swfdr/blob/master/README.md)
 
 ---
 
diff --git a/05_ReproducibleResearch/organizingADataAnalysis/index.html b/05_ReproducibleResearch/organizingADataAnalysis/index.html
index 027e1d484..ab751d4e7 100644
--- a/05_ReproducibleResearch/organizingADataAnalysis/index.html
+++ b/05_ReproducibleResearch/organizingADataAnalysis/index.html
@@ -1,59 +1,171 @@
- Organizing a Data Analysis
+Data analysis files
                -

                Organizing a Data Analysis

                -

                -

                Roger D. Peng, Associate Professor of Biostatistics
                Johns Hopkins Bloomberg School of Public Health

                -
                -
                - - - -
                -

                Data analysis files

                -
                -
                -
                  + + +

                  Data analysis files

                  + +
                  • Data
                      @@ -81,32 +193,22 @@

                      Data analysis files

                  -
                - -
                +
                - -
                -

                Raw Data

                -
                -
                -

                +

                Raw Data

                + +

                • Should be stored in your analysis folder
                • If accessed from the web, include url, description, and date accessed in README
                -
                - -
                +
                + +

                Processed data

                - -
                -

                Processed data

                -
                -
                -

                +

                • Processed data should be named so it is easy to see which script generated the data.
                • @@ -114,32 +216,22 @@

                  Processed data

                • Processed data should be tidy
                -
                - -
                +
                + +

                Exploratory figures

                - -
                -

                Exploratory figures

                -
                -
                -

                +

                • Figures made during the course of your analysis, not necessarily part of your final report.
                • -
                • They do not need to be "pretty"
                • +
                • They do not need to be “pretty”
                -
                - -
                +
                - -
                -

                Final Figures

                -
                -
                -

                +

                Final Figures

                + +

                • Usually a small subset of the original figures
                • @@ -147,16 +239,11 @@

                  Final Figures

                • Possibly multiple panels
                -
                - -
                +
                + +

                Raw scripts

                - -
                -

                Raw scripts

                -
                -
                -

                +

                • May be less commented (but comments help you!)
                • @@ -164,16 +251,11 @@

                  Raw scripts

                • May include analyses that are later discarded
                -
                - -
                +
                - -
                -

                Final scripts

                -
                -
                -

                +

                Final scripts

                + +

                • Clearly commented @@ -186,16 +268,11 @@

                  Final scripts

                • Only analyses that appear in the final write-up
                -
                - -
                +
                + +

                R markdown files

                - -
                -

                R markdown files

                -
                -
                -

                +

                • R markdown files can be used to generate reproducible reports
                • @@ -203,33 +280,23 @@

                  R markdown files

                • Very easy to create in Rstudio
                -
                - -
                +
                - -
                -

                Readme files

                -
                - - -
                +
                + +

                Text of the document

                - -
                -

                Text of the document

                -
                -
                -

                +

                • It should include a title, introduction (motivation), methods (statistics you used), results (including measures of uncertainty), and conclusions (including potential problems)
                • @@ -238,56 +305,17 @@

                  Text of the document

                • References should be included for statistical methods
                -
                - -
                +
                + +

                Further resources

                - -
                -

                Further resources

                -
                - - -
                - - -
diff --git a/05_ReproducibleResearch/organizingADataAnalysis/index.md b/05_ReproducibleResearch/organizingADataAnalysis/index.md
index eae99fc98..e2db82a63 100644
--- a/05_ReproducibleResearch/organizingADataAnalysis/index.md
+++ b/05_ReproducibleResearch/organizingADataAnalysis/index.md
@@ -110,7 +110,7 @@ mode : selfcontained # {standalone, draft}
 
 * Not necessary if you use R markdown
 * Should contain step-by-step instructions for analysis
-* Here is an example [https://github.com/jtleek/swfdr/blob/master/README](https://github.com/jtleek/swfdr/blob/master/README)
+* Here is an example [https://github.com/jtleek/swfdr/blob/master/README.md](https://github.com/jtleek/swfdr/blob/master/README.md)
 
 ---
 
diff --git a/06_StatisticalInference/homework/hw1.Rmd b/06_StatisticalInference/homework/hw1.Rmd
index 7f31c63a0..f5476f5c7 100644
--- a/06_StatisticalInference/homework/hw1.Rmd
+++ b/06_StatisticalInference/homework/hw1.Rmd
@@ -40,22 +40,22 @@ Creating Data Products
 
 
 --- &radio
-Consider influenza epidemics for two parent heterosexual families. Suppose that the probability is 15% that at least one of the parents has contracted the disease. The probability that the father has contracted influenza is 6% while that the mother contracted the disease is 5%. What is the probability that both contracted influenza expressed as a whole number percentage?
+Consider influenza epidemics for two parent heterosexual families. Suppose that the probability is 15% that at least one of the parents has contracted the disease. The probability that the father has contracted influenza is 10% while that the mother contracted the disease is 9%. What is the probability that both contracted influenza expressed as a whole number percentage?
 
 1. 15%
-2. 6%
-3. 5%
-4. _2%_
+2. 10%
+3. 9%
+4. _4%_
 
 *** .hint
-$A = Father$, $P(A) = .06$, $B = Mother$, $P(B) = .05$
+$A = Father$, $P(A) = .10$, $B = Mother$, $P(B) = .09$
 $P(A\cup B) = .15$,
 
 *** .explanation
-$P(A\cup B) = P(A) + P(B) - 2 P(AB)$ thus
-$$.15 = .06 + .05 - 2 P(AB)$$
+$P(A\cup B) = P(A) + P(B) - P(AB)$ thus
+$$.15 = .10 + .09 - P(AB)$$
 ```{r}
-(0.15 - .06 - .05) / 2
+.10 + .09 - .15
 ```
 
 --- &radio
diff --git a/06_StatisticalInference/homework/hw1.html b/06_StatisticalInference/homework/hw1.html
index e2e14b4a6..3db02f1ae 100644
--- a/06_StatisticalInference/homework/hw1.html
+++ b/06_StatisticalInference/homework/hw1.html
@@ -63,13 +63,13 @@

                About these slides

                -

                Consider influenza epidemics for two parent heterosexual families. Suppose that the probability is 15% that at least one of the parents has contracted the disease. The probability that the father has contracted influenza is 6% while that the mother contracted the disease is 5%. What is the probability that both contracted influenza expressed as a whole number percentage?

                +

                Consider influenza epidemics for two parent heterosexual families. Suppose that the probability is 15% that at least one of the parents has contracted the disease. The probability that the father has contracted influenza is 10% while that the mother contracted the disease is 9%. What is the probability that both contracted influenza expressed as a whole number percentage?

                1. 15%
                2. -
                3. 6%
                4. -
                5. 5%
                6. -
                7. 2%
                8. +
                9. 10%
                10. +
                11. 9%
                12. +
                13. 4%
                @@ -78,18 +78,18 @@

                About these slides

                -

                \(A = Father\), \(P(A) = .06\), \(B = Mother\), \(P(B) = .05\) +

                \(A = Father\), \(P(A) = .10\), \(B = Mother\), \(P(B) = .09\) \(P(A\cup B) = .15\),

                -

                \(P(A\cup B) = P(A) + P(B) - 2 P(AB)\) thus
                -\[.15 = .06 + .05 - 2 P(AB)\]

                +

                \(P(A\cup B) = P(A) + P(B) - P(AB)\) thus
                +\[.15 = .10 + .09 - P(AB)\]

                -
                (0.15 - .06 - .05) / 2
                +
                .10 + .09 - .15
                 
                -
                [1] 0.02
                +
                [1] 0.04
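                The corrected explanation rests on inclusion-exclusion, \(P(A \cap B) = P(A) + P(B) - P(A \cup B)\). A small sketch of that arithmetic, using the numbers from the question (the variable names are hypothetical):

                ```r
                p_union  <- 0.15   # at least one parent infected
                p_father <- 0.10
                p_mother <- 0.09

                # Inclusion-exclusion solved for the intersection
                p_both <- p_father + p_mother - p_union
                round(p_both * 100)    # whole-number percentage

                # Sanity check: the intersection cannot exceed either marginal probability
                p_both <= min(p_father, p_mother)
                ```

                The earlier values (6% and 5%) sum to less than the 15% union, which is impossible for any pair of events; the new values avoid that.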
                 
                @@ -107,7 +107,7 @@

                About these slides

                1. 1.00
                2. 0.75
                3. -
                4. 0.50
                5. +
                6. 0.50
                7. 0.25
diff --git a/06_StatisticalInference/homework/hw1.md b/06_StatisticalInference/homework/hw1.md
index 4a7add5f0..0025d9fc3 100644
--- a/06_StatisticalInference/homework/hw1.md
+++ b/06_StatisticalInference/homework/hw1.md
@@ -25,27 +25,27 @@ Creating Data Products
 
 
 --- &radio
-Consider influenza epidemics for two parent heterosexual families. Suppose that the probability is 15% that at least one of the parents has contracted the disease. The probability that the father has contracted influenza is 6% while that the mother contracted the disease is 5%. What is the probability that both contracted influenza expressed as a whole number percentage?
+Consider influenza epidemics for two parent heterosexual families. Suppose that the probability is 15% that at least one of the parents has contracted the disease. The probability that the father has contracted influenza is 10% while that the mother contracted the disease is 9%. What is the probability that both contracted influenza expressed as a whole number percentage?
 
 1. 15%
-2. 6%
-3. 5%
-4. _2%_
+2. 10%
+3. 9%
+4. _4%_
 
 *** .hint
-$A = Father$, $P(A) = .06$, $B = Mother$, $P(B) = .05$
+$A = Father$, $P(A) = .10$, $B = Mother$, $P(B) = .09$
 $P(A\cup B) = .15$,
 
 *** .explanation
-$P(A\cup B) = P(A) + P(B) - 2 P(AB)$ thus
-$$.15 = .06 + .05 - 2 P(AB)$$
+$P(A\cup B) = P(A) + P(B) - P(AB)$ thus
+$$.15 = .10 + .09 - P(AB)$$
 
 ```r
-(0.15 - .06 - .05) / 2
+.10 + .09 - .15
 ```
 
 ```
-[1] 0.02
+[1] 0.04
 ```
 
 
@@ -55,7 +55,7 @@ A random variable, $X$, is uniform, a box from $0$ to $1$ of height $1$. (So tha
 
 1. 1.00
 2. 0.75
-3. 0.50
+3. _0.50_
 4. 0.25
 
 *** .hint
diff --git a/06_StatisticalInference/homework/hw2.Rmd b/06_StatisticalInference/homework/hw2.Rmd
index 3a568425c..b38eab299 100644
--- a/06_StatisticalInference/homework/hw2.Rmd
+++ b/06_StatisticalInference/homework/hw2.Rmd
@@ -157,10 +157,10 @@ Let $p=.5$ and $X$ be binomial
 
 *** .explanation
 
-`r round(pbinom(4, prob = .5, size = 6, lower.tail = TRUE) * 100, 1)`
+`r round(pbinom(4, prob = .5, size = 6, lower.tail = FALSE) * 100, 1)`
 
 ```{r}
-round(pbinom(4, prob = .5, size = 6, lower.tail = TRUE) * 100, 1)
+round(pbinom(4, prob = .5, size = 6, lower.tail = FALSE) * 100, 1)
 ```
 
 --- &multitext
@@ -210,9 +210,9 @@ If you roll ten standard dice, take their average, then repeat this process over
 $$Var(\bar X) = \sigma^2 /n$$
 
 *** .explanation
-The answer will be `r round( mean(1 : 6 - 3.5) ^2 / 100, 3)`
-since the variance of the sampling distribution of the mean is $\sigma^2/12$
-and the variance of a die roll is
+The answer will be `r round( mean( (1 : 6 - 3.5) ^2) / 10, 3)`
+since the variance of the sampling distribution of the mean is $\sigma^2/10$
+where $\sigma^2$ is the variance of a single die roll, which is
 
 ```{r}
 mean((1 : 6 - 3.5)^2)
diff --git a/06_StatisticalInference/homework/hw2.html b/06_StatisticalInference/homework/hw2.html
index 5bab28043..e7a3d6a50 100644
--- a/06_StatisticalInference/homework/hw2.html
+++ b/06_StatisticalInference/homework/hw2.html
@@ -48,12 +48,11 @@

                About these slides

                  -
                • These are some practice problems for Statistical Inference Quiz 1
                • +
                • These are some practice problems for Statistical Inference Quiz 2
                • They were created using slidify interactive which you will learn in Creating Data Products
                • Please help improve this with pull requests here -(https://github.com/bcaffo/courses) -runif(1)
                • +(https://github.com/bcaffo/courses)
                @@ -288,12 +287,12 @@

                About these slides

                -

                89.1

                +

                10.9

                -
                round(pbinom(4, prob = .5, size = 6, lower.tail = TRUE) * 100, 1)
                +
                round(pbinom(4, prob = .5, size = 6, lower.tail = FALSE) * 100, 1)
                 
                -
                [1] 89.1
                +
                [1] 10.9
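                The switch from `lower.tail = TRUE` to `lower.tail = FALSE` changes the quantity from \(P(X \le 4)\) to \(P(X > 4)\). A sketch that checks the upper tail against a direct sum of the binomial mass function, assuming the 6 trials and \(p = .5\) used in the call above (the names `upper`, `lower` and `direct` are hypothetical):

                ```r
                n <- 6
                p <- 0.5

                upper <- pbinom(4, size = n, prob = p, lower.tail = FALSE)  # P(X > 4)
                lower <- pbinom(4, size = n, prob = p, lower.tail = TRUE)   # P(X <= 4)

                # P(X > 4) is the same as P(X = 5) + P(X = 6)
                direct <- sum(dbinom(5:6, size = n, prob = p))
                all.equal(upper, direct)

                round(c(lower, upper) * 100, 1)   # the two tails sum to 100%
                ```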
                 
                @@ -388,9 +387,9 @@

                About these slides

                -

                The answer will be 0
                -since the variance of the sampling distribution of the mean is \(\sigma^2/12\)
                -and the variance of a die roll is

                +

                The answer will be 0.292
                +since the variance of the sampling distribution of the mean is \(\sigma^2/10\)
                +where \(\sigma^2\) is the variance of a single die roll, which is

                mean((1 : 6 - 3.5)^2)
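                The corrected answer follows from \(Var(\bar X) = \sigma^2 / n\) with \(n = 10\) dice. A sketch of the full calculation, with hypothetical variable names, that also shows why the earlier expression evaluated to 0:

                ```r
                faces  <- 1:6
                sigma2 <- mean((faces - mean(faces))^2)   # variance of one fair die roll, 35/12
                n      <- 10                              # dice averaged per repetition

                var_mean <- sigma2 / n
                round(var_mean, 3)

                # The earlier expression squared the mean of the deviations,
                # and the deviations from 3.5 average to exactly zero
                mean(faces - 3.5)^2 / 100
                ```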
                 
diff --git a/06_StatisticalInference/homework/hw2.md b/06_StatisticalInference/homework/hw2.md
index 32ef6b25f..44ecbe56b 100644
--- a/06_StatisticalInference/homework/hw2.md
+++ b/06_StatisticalInference/homework/hw2.md
@@ -16,12 +16,11 @@ mode : selfcontained # {standalone, draft}
 
 
 ## About these slides
-- These are some practice problems for Statistical Inference Quiz 1
+- These are some practice problems for Statistical Inference Quiz 2
 - They were created using slidify interactive which you will learn in
 Creating Data Products
 - Please help improve this with pull requests here
 (https://github.com/bcaffo/courses)
-runif(1)
 
 --- &radio
 The probability that a manuscript gets accepted to a journal is 12% (say). However,
@@ -182,15 +181,15 @@ Let $p=.5$ and $X$ be binomial
 
 *** .explanation
 
-89.1
+10.9
 
 
 ```r
-round(pbinom(4, prob = .5, size = 6, lower.tail = TRUE) * 100, 1)
+round(pbinom(4, prob = .5, size = 6, lower.tail = FALSE) * 100, 1)
 ```
 
 ```
-[1] 89.1
+[1] 10.9
 ```
 
 
@@ -247,9 +246,9 @@ If you roll ten standard dice, take their average, then repeat this process over
 $$Var(\bar X) = \sigma^2 /n$$
 
 *** .explanation
-The answer will be 0
-since the variance of the sampling distribution of the mean is $\sigma^2/12$
-and the variance of a die roll is
+The answer will be 0.292
+since the variance of the sampling distribution of the mean is $\sigma^2/10$
+where $\sigma^2$ is the variance of a single die roll, which is
 
 
 ```r
diff --git a/06_StatisticalInference/homework/hw3.Rmd b/06_StatisticalInference/homework/hw3.Rmd
new file mode 100644
index 000000000..df1866fc0
--- /dev/null
+++ b/06_StatisticalInference/homework/hw3.Rmd
@@ -0,0 +1,206 @@
+---
+title : Homework 3 for Stat Inference
+subtitle : Extra problems for Stat Inference
+author : Brian Caffo
+job : Johns Hopkins Bloomberg School of Public Health
+framework : io2012
+highlighter : highlight.js
+hitheme : tomorrow
+#url:
+# lib: ../../librariesNew #Remove new if using old slidify
+# assets: ../../assets
+widgets : [mathjax, quiz, bootstrap]
+mode : selfcontained # {standalone, draft}
+---
+```{r setup, cache = F, echo = F, message = F, warning = F, tidy = F, results='hide'}
+# make this an external chunk that can be included in any file
+library(knitr)
+options(width = 100)
+opts_chunk$set(message = F, error = F, warning = F, comment = NA, fig.align = 'center', dpi = 100, tidy = F, cache.path = '.cache/', fig.path = 'fig/')
+
+options(xtable.type = 'html')
+knit_hooks$set(inline = function(x) {
+  if(is.numeric(x)) {
+    round(x, getOption('digits'))
+  } else {
+    paste(as.character(x), collapse = ', ')
+  }
+})
+knit_hooks$set(plot = knitr:::hook_plot_html)
+```
+
+## About these slides
+- These are some practice problems for Statistical Inference Quiz 3
+- They were created using slidify interactive which you will learn in
+Creating Data Products
+- Please help improve this with pull requests here
+(https://github.com/bcaffo/courses)
+
+
+
+--- &multitext
+Load the data set `mtcars` in the `datasets` R package. Calculate a
+95% confidence interval to the nearest MPG.
+
+1. What is the lower endpoint of the interval?
+2. What is the upper endpoint of the interval?
+
+*** .hint
+Do `library(datasets)` and then `data(mtcars)` to get the data.
+Consider `t.test` for calculations. You may have to install
+the datasets package.
+
+
+*** .explanation
+```{r}
+library(datasets); data(mtcars)
+round(t.test(mtcars$mpg)$conf.int)
+```
+
+`r round(min(t.test(mtcars$mpg)$conf.int))`
+`r round(max(t.test(mtcars$mpg)$conf.int))`
+
+--- &multitext
+Suppose that data of 9 paired differences has a standard error of $1$, what value would the average difference have to be to have the lower endpoint of a 95%
+Student's t confidence interval touch zero?
+
+1. Give the number here to two decimal places
+
+*** .hint
+The t interval is $\bar x \pm t_{.975, 8} s / \sqrt{n}$, where the standard error $s / \sqrt{n}$ is $1$ here
+
+*** .explanation
+`r round(qt(.975, df = 8) * 1, 2)`
+
+We want $\bar x = t_{.975, 8} s / \sqrt{n}$ with $s / \sqrt{n} = 1$
+```{r}
+round(qt(.975, df = 8) * 1, 2)
+```
+
+
+--- &radio
+An independent group Student's T interval is used over
+a paired T interval when:
+
+1. The observations are paired between the groups.
+2. _The observations between the groups are naturally assumed to be statistically independent_
+3. As long as you do it correctly, either is fine.
+4. More details are needed to answer this question
+
+*** .hint
+A paired interval is for paired observations.
+
+*** .explanation
+We can't pair them if the groups are independent of each other as well as independent within themselves.
+
+
+--- &multitext
+Consider the `mtcars` dataset. Construct a 95% T interval for MPG comparing
+4 to 6 cylinder cars (subtracting in the order of 4 - 6)
+assume a constant variance.
+
+1. What is the lower endpoint of the interval to 1 decimal place?
+2. What is the upper endpoint of the interval to 1 decimal place?
+
+*** .hint
+Use `t.test` with `var.equal=TRUE`
+
+*** .explanation
+
+```{r}
+m4 <- mtcars$mpg[mtcars$cyl == 4]
+m6 <- mtcars$mpg[mtcars$cyl == 6]
+#this does 4 - 6
+confint <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int)
+```
+
+`r round(min(confint), 1)`
+`r round(max(confint), 1)`
+
+
+--- &radio
+If someone put a gun to your head and said "Your confidence interval
+must contain what it's estimating or I'll pull the trigger", what would
+be the smart thing to do?
+
+1. _Make your interval as wide as possible_
+2. Make your interval as small as possible
+3. Call the authorities
+
+*** .hint
+C'mon. You don't need a hint
+
+*** .explanation
+This is just an example of what happens to confidence intervals as you
+increase the confidence level. You want to be quite sure in your interval (i.e.
+have a large confidence level) and so you would increase the interval's width
+
+--- &radio
+
+Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude?
+
+1. The interval is above zero, suggesting 6 is better than 4 in terms of MPG
+2. _The interval is above zero, suggesting 4 is better than 6 in terms of MPG_
+3. The interval does not tell you anything about the hypothesis test; you have to do the test.
+4. The interval contains 0 suggesting no difference.
+
+*** .hint
+Refer back to the problem, consider the implications of the interval being
+larger than 0, double check the order in which things were subtracted and
+make sure the results make sense in the context of the problem.
+
+*** .explanation
+The interval was constructed subtracting 4 - 6 and was entirely above zero.
+
+--- &multitext
+Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was 3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences were 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. The study aims to answer whether the change in BMI over the four week period appears to differ between the treated and placebo groups.
+
+What is the pooled variance estimate? (to 2 decimal places)
+
+
+*** .hint
+The sample sizes are equal, so the pooled variance is the average of the
+individual variances
+
+
+*** .explanation
+```{r}
+n1 <- n2 <- 9
+x1 <- -3 ##treated
+x2 <- 1 ##placebo
+s1 <- 1.5 ##treated
+s2 <- 1.8 ##placebo
+spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
+```
+`r round(spsq, 2)`
+
+
+--- &radio
+
+For Binomial data the maximum likelihood estimate for the probability of
+a success is
+
+1. _The proportion of successes_
+2. The proportion of failures
+3. A shrunken version of the proportion of successes
+4. A shrunken version of the proportion of failures
+
+*** .hint
+Look back at the notes about likelihood.
+
+*** .explanation
+The MLE for binomial data is always the proportion of successes.
+
+--- &radio
+
+Bayesian inference requires
+
+1. A type I error rate
+2. Setting your confidence level
+3. _Assigning a prior probability distribution_
+4. Evaluating frequency error rates
+
+*** .explanation
+All of the other answers discuss frequentist concepts. All Bayesian analyses require setting a prior.
+
+
diff --git a/06_StatisticalInference/homework/hw3.html b/06_StatisticalInference/homework/hw3.html
new file mode 100644
index 000000000..6e54ea85e
--- /dev/null
+++ b/06_StatisticalInference/homework/hw3.html
@@ -0,0 +1,476 @@
+ Homework 3 for Stat Inference
                +

                Homework 3 for Stat Inference

                +

                Extra problems for Stat Inference

                +

                Brian Caffo
                Johns Hopkins Bloomberg School of Public Health

                +
                +
                +
                + + + + +
                +

                About these slides

                +
                +
                +
                  +
                • These are some practice problems for Statistical Inference Quiz 3
                • +
                • They were created using slidify interactive which you will learn in +Creating Data Products
                • +
                • Please help improve this with pull requests here +(https://github.com/bcaffo/courses)
                • +
                + +
                + +
                + + +
                + +
                +

                Load the data set mtcars in the datasets R package. Calculate a +95% confidence interval to the nearest MPG.

                + +
                  +
                1. What is the lower endpoint of the interval?
                2. +
                3. What is the upper endpoint of the interval?
                4. +
                + + + + + + +
                +

                Do library(datasets) and then data(mtcars) to get the data. +Consider t.test for calculations. You may have to install +the datasets package.

                + +
                +
                +
                library(datasets); data(mtcars)
                +round(t.test(mtcars$mpg)$conf.int)
                +
                + +
                [1] 18 22
                +attr(,"conf.level")
                +[1] 0.95
                +
                + +

                18 +22

                + +
                +
                +
                + +
                + + +
                + +
                +

                Suppose that data of 9 paired differences has a standard error of \(1\). What value would the average difference have to be to have the lower endpoint of a 95% +Student's t confidence interval touch zero?

                + +
                  +
                1. Give the number here to two decimal places
                2. +
                + + + + + + +
                +

                The t interval is \(\bar x \pm t_{.95, 8} s / \sqrt{n}\)

                + +
                +
                +

                0.62

                + +

                We want \(\bar x = t_{.95, 8} s / \sqrt{n}\)

                + +
                round(qt(.95, df = 8) * 1 / 3, 2)
                +
                + +
                [1] 0.62
                +
                + +
                +
                +
                + +
                + + +
                + +
                +

                An independent group Student's T interval is used over +a paired T interval when:

                + +
                  +
                1. The observations are paired between the groups.
                2. +
                3. The observations between the groups are naturally assumed to be statistically independent
                4. +
                5. As long as you do it correctly, either is fine.
                6. +
                7. More details are needed to answer this question
                8. +
                + + + + + + +
                +

                A paired interval is for paired observations.

                + +
                +
                +

                We can't pair them if the groups are independent of each other as well as independent within themselves.

                + +
                +
                +
                + +
                + + +
                + +
                +

                Consider the mtcars dataset. Construct a 95% T interval for MPG comparing +4 to 6 cylinder cars (subtracting in the order of 4 - 6), +assuming a constant variance.

                + +
                  +
                1. What is the lower endpoint of the interval to 1 decimal place?
                2. +
                3. What is the upper endpoint of the interval to 1 decimal place?
                4. +
                + + + + + + +
                +

                Use t.test with var.equal=TRUE

                + +
                +
                +
                m4 <- mtcars$mpg[mtcars$cyl == 4]
                +m6 <- mtcars$mpg[mtcars$cyl == 6]
                +#this does 4 - 6
                +confint <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int)
                +
                + +

                3.2 +10.7

                + +
                +
                +
                + +
                + + +
                + +
                +

                If someone put a gun to your head and said "Your confidence interval +must contain what it's estimating or I'll pull the trigger", what would +be the smart thing to do?

                + +
                  +
                1. Make your interval as wide as possible
                2. +
                3. Make your interval as small as possible
                4. +
                5. Call the authorities
                6. +
                + + + + + + +
                +

                C'mon. You don't need a hint

                + +
                +
                +

                This is just an example of what happens to confidence intervals as you +increase the confidence level. You want to be quite sure in your interval (i.e. +have a large confidence level) and so you would increase the interval's width

                + +
                +
                +
                + +
                + + +
                + +
                +

                Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude?

                + +
                  +
                1. The interval is above zero, suggesting 6 is better than 4 in terms of MPG
                2. +
                3. The interval is above zero, suggesting 4 is better than 6 in terms of MPG
                4. +
                5. The interval does not tell you anything about the hypothesis test; you have to do the test.
                6. +
                7. The interval contains 0 suggesting no difference.
                8. +
                + + + + + + +
                +

                Refer back to the problem, consider the implications of the interval being +larger than 0, double check the order in which things were subtracted and +make sure the results make sense in the context of the problem.

                + +
                +
                +

                The interval was constructed by subtracting 4 - 6 and was entirely above zero.

                + +
                +
                +
                + +
                + + +
                + +
                +

                Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was 3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences were 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. The study aims to answer whether the change in BMI over the four week period appears to differ between the treated and placebo groups.

                + +

                What is the pooled variance estimate? (to 2 decimal places)

                + + + + + + +
                +

                The sample sizes are equal, so the pooled variance is the average of the +individual variances

                + +
                +
                +
                n1 <- n2 <- 9
                +x1 <- -3  ##treated
                +x2 <- 1  ##placebo
                +s1 <- 1.5  ##treated
                +s2 <- 1.8  ##placebo
                +spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
                +
                + +

                2.75

                + +
                +
                +
                + +
                + + +
                + +
                +

                For Binomial data the maximum likelihood estimate for the probability of +a success is

                + +
                  +
                1. The proportion of successes
                2. +
                3. The proportion of failures
                4. +
                5. A shrunken version of the proportion of successes
                6. +
                7. A shrunken version of the proportion of failures
                8. +
                + + + + + + +
                +

                Look back at the notes about likelihood.

                + +
                +
                +

                The MLE for binomial data is always the proportion of successes.

                + +
                +
                +
                + +
                + + +
                + +
                +

                Bayesian inference requires

                + +
                  +
                1. A type I error rate
                2. +
                3. Setting your confidence level
                4. +
                5. Assigning a prior probability distribution
                6. +
                7. Evaluating frequency error rates
                8. +
                + + + + + + +
                +

                All of the other answers discuss frequentist concepts. All Bayesian analyses require setting a prior.

                + +
                +
                +
                + +
                + + +
                + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/06_StatisticalInference/homework/hw3.md b/06_StatisticalInference/homework/hw3.md new file mode 100644 index 000000000..93859ed5b --- /dev/null +++ b/06_StatisticalInference/homework/hw3.md @@ -0,0 +1,210 @@ +--- +title : Homework 3 for Stat Inference +subtitle : Extra problems for Stat Inference +author : Brian Caffo +job : Johns Hopkins Bloomberg School of Public Health +framework : io2012 +highlighter : highlight.js +hitheme : tomorrow +#url: +# lib: ../../librariesNew #Remove new if using old slidify +# assets: ../../assets +widgets : [mathjax, quiz, bootstrap] +mode : selfcontained # {standalone, draft} +--- + + + +## About these slides +- These are some practice problems for Statistical Inference Quiz 3 +- They were created using slidify interactive which you will learn in +Creating Data Products +- Please help improve this with pull requests here +(https://github.com/bcaffo/courses) + + + +--- &multitext +Load the data set `mtcars` in the `datasets` R package. Calculate a +95% confidence interval to the nearest MPG. + +1. What is the lower endpoint of the interval? +2. What is the upper endpoint of the interval? + +*** .hint +Do `library(datasets)` and then `data(mtcars)` to get the data. +Consider `t.test` for calculations. You may have to install +the datasets package. + + +*** .explanation + +```r +library(datasets); data(mtcars) +round(t.test(mtcars$mpg)$conf.int) +``` + +``` +[1] 18 22 +attr(,"conf.level") +[1] 0.95 +``` + + +18 +22 + +--- &multitext +Suppose that data of 9 paired differences has a standard error of $1$, what value would the average difference have to be to have the lower endpoint of a 95% +students t confidence interval touch zero? + +1. 
Give the number here to two decimal places + +*** .hint +The t interval is $\bar x \pm t_{.95, 8} s / \sqrt{n}$ + +*** .explanation +0.62 + +We want $\bar x = t_{.95, 8} s / \sqrt{n}$ + +```r +round(qt(.95, df = 8) * 1 / 3, 2) +``` + +``` +[1] 0.62 +``` + + + +--- &radio +An independent group Student's T interval is used over +a paired T interval when: + +1. The observations are paired between the groups. +2. _The observations between the groups are naturally assumed to be statistically independent_ +3. As long as you do it correctly, either is fine. +4. More details are needed to answer this question + +*** .hint +A paired interval is for paired observations. + +*** .explanation +We can't pair them if the groups are independent of each other as well as independent within themselves. + + +--- &multitext +Consider the `mtcars` dataset. Construct a 95% T interval for MPG comparing +4 to 6 cylinder cars (subtracting in the order of 4 - 6), +assuming a constant variance. + +1. What is the lower endpoint of the interval to 1 decimal place? +2. What is the upper endpoint of the interval to 1 decimal place? + +*** .hint +Use `t.test` with `var.equal=TRUE` + +*** .explanation + + +```r +m4 <- mtcars$mpg[mtcars$cyl == 4] +m6 <- mtcars$mpg[mtcars$cyl == 6] +#this does 4 - 6 +confint <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int) +``` + + +3.2 +10.7 + + +--- &radio +If someone put a gun to your head and said "Your confidence interval +must contain what it's estimating or I'll pull the trigger", what would +be the smart thing to do? + +1. _Make your interval as wide as possible_ +2. Make your interval as small as possible +3. Call the authorities + +*** .hint +C'mon. You don't need a hint + +*** .explanation +This is just an example of what happens to confidence intervals as you +increase the confidence level. You want to be quite sure in your interval (i.e. 
+have a large confidence level) and so you would increase the interval's width + +--- &radio + +Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude? + +1. The interval is above zero, suggesting 6 is better than 4 in terms of MPG +2. _The interval is above zero, suggesting 4 is better than 6 in terms of MPG_ +3. The interval does not tell you anything about the hypothesis test; you have to do the test. +4. The interval contains 0 suggesting no difference. + +*** .hint +Refer back to the problem, consider the implications of the interval being +larger than 0, double check the order in which things were subtracted and +make sure the results make sense in the context of the problem. + +*** .explanation +The interval was constructed by subtracting 4 - 6 and was entirely above zero. + +--- &multitext +Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was 3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences were 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. The study aims to answer whether the change in BMI over the four week period appears to differ between the treated and placebo groups. + +What is the pooled variance estimate? (to 2 decimal places) + + +*** .hint +The sample sizes are equal, so the pooled variance is the average of the +individual variances + + +*** .explanation + +```r +n1 <- n2 <- 9 +x1 <- -3 ##treated +x2 <- 1 ##placebo +s1 <- 1.5 ##treated +s2 <- 1.8 ##placebo +spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2) +``` + +2.75 + + +--- &radio + +For Binomial data the maximum likelihood estimate for the probability of +a success is + +1. _The proportion of successes_ +2. 
The proportion of failures +3. A shrunken version of the proportion of successes +4. A shrunken version of the proportion of failures + +*** .hint +Look back at the notes about likelihood. + +*** .explanation +The MLE for binomial data is always the proportion of successes. + +--- &radio + +Bayesian inference requires + +1. A type I error rate +2. Setting your confidence level +3. _Assigning a prior probability distribution_ +4. Evaluating frequency error rates + +*** .explanation +All of the other answers discuss frequentist concepts. All Bayesian analyses requiring setting a prior. + + diff --git a/06_StatisticalInference/homework/hw4.Rmd b/06_StatisticalInference/homework/hw4.Rmd new file mode 100644 index 000000000..bf5a8da3b --- /dev/null +++ b/06_StatisticalInference/homework/hw4.Rmd @@ -0,0 +1,37 @@ +--- +title : Homework 4 for Stat Inference +subtitle : Extra problems for Stat Inference +author : Brian Caffo +job : Johns Hopkins Bloomberg School of Public Health +framework : io2012 +highlighter : highlight.js +hitheme : tomorrow +#url: +# lib: ../../librariesNew #Remove new if using old slidify +# assets: ../../assets +widgets : [mathjax, quiz, bootstrap] +mode : selfcontained # {standalone, draft} +--- +```{r setup, cache = F, echo = F, message = F, warning = F, tidy = F, results='hide'} +# make this an external chunk that can be included in any file +library(knitr) +options(width = 100) +opts_chunk$set(message = F, error = F, warning = F, comment = NA, fig.align = 'center', dpi = 100, tidy = F, cache.path = '.cache/', fig.path = 'fig/') + +options(xtable.type = 'html') +knit_hooks$set(inline = function(x) { + if(is.numeric(x)) { + round(x, getOption('digits')) + } else { + paste(as.character(x), collapse = ', ') + } +}) +knit_hooks$set(plot = knitr:::hook_plot_html) +``` + +## About these slides +- These are some practice problems for Statistical Inference Quiz 4 +- They were created using slidify interactive which you will learn in +Creating Data 
Products +- Please help improve this with pull requests here +(https://github.com/bcaffo/courses) diff --git a/06_StatisticalInference/homework/hw4.html b/06_StatisticalInference/homework/hw4.html new file mode 100644 index 000000000..565621a6d --- /dev/null +++ b/06_StatisticalInference/homework/hw4.html @@ -0,0 +1,112 @@ + + + + Homework 4 for Stat Inference + + + + + + + + + + + + + + + + + + + + + + + + + + + +
                +

                Homework 4 for Stat Inference

                +

                Extra problems for Stat Inference

                +

                Brian Caffo
                Johns Hopkins Bloomberg School of Public Health

                +
                +
                +
                + + + + +
                +

                About these slides

                +
                +
                +
                  +
                • These are some practice problems for Statistical Inference Quiz 4
                • +
                • They were created using slidify interactive which you will learn in +Creating Data Products
                • +
                • Please help improve this with pull requests here +(https://github.com/bcaffo/courses)
                • +
                + +
                + +
                + + +
                + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/06_StatisticalInference/homework/hw4.md b/06_StatisticalInference/homework/hw4.md new file mode 100644 index 000000000..a22e64543 --- /dev/null +++ b/06_StatisticalInference/homework/hw4.md @@ -0,0 +1,23 @@ +--- +title : Homework 4 for Stat Inference +subtitle : Extra problems for Stat Inference +author : Brian Caffo +job : Johns Hopkins Bloomberg School of Public Health +framework : io2012 +highlighter : highlight.js +hitheme : tomorrow +#url: +# lib: ../../librariesNew #Remove new if using old slidify +# assets: ../../assets +widgets : [mathjax, quiz, bootstrap] +mode : selfcontained # {standalone, draft} +--- + + + +## About these slides +- These are some practice problems for Statistical Inference Quiz 4 +- They were created using slidify interactive which you will learn in +Creating Data Products +- Please help improve this with pull requests here +(https://github.com/bcaffo/courses) diff --git a/08_PracticalMachineLearning/019predictingWithTrees/index.Rmd b/08_PracticalMachineLearning/019predictingWithTrees/index.Rmd index a3cccb90f..1c4dc6e20 100644 --- a/08_PracticalMachineLearning/019predictingWithTrees/index.Rmd +++ b/08_PracticalMachineLearning/019predictingWithTrees/index.Rmd @@ -124,7 +124,7 @@ plot(x,y,xaxt="n",yaxt="n",cex=3,col=c(rep("blue",8),rep("red",8)),pch=19) * __Misclassification:__ $8/16 = 0.5$ * __Gini:__ $1 - [(8/16)^2 + (8/16)^2] = 0.5$ -* __Information:__$-[1/16 \times log2(1/16) + 15/16 \times log2(15/16)] = 1$ +* __Information:__$-[8/16 \times log2(8/16) + 8/16 \times log2(8/16)] = 1$ diff --git a/08_PracticalMachineLearning/025combiningPredictors/index.Rmd b/08_PracticalMachineLearning/025combiningPredictors/index.Rmd index 958b2c4d8..ca6ec2f93 100644 --- a/08_PracticalMachineLearning/025combiningPredictors/index.Rmd +++ b/08_PracticalMachineLearning/025combiningPredictors/index.Rmd @@ -67,7 +67,7 @@ BellKor = Combination of 107 
predictors Suppose we have 5 completely independent classifiers If accuracy is 70% for each: - * $10\times(0.7)^3(0.3)^2 + 5\times(0.7)^4(0.3)^2 + (0.7)^5$ + * $10\times(0.7)^3(0.3)^2 + 5\times(0.7)^4(0.3)^1 + (0.7)^5$ * 83.7% majority vote accuracy With 101 independent classifiers diff --git a/09_DevelopingDataProducts/rStudioPresent/index.Rpres b/09_DevelopingDataProducts/rStudioPresent/index.Rpres index a237721f7..00c9487c7 100644 --- a/09_DevelopingDataProducts/rStudioPresent/index.Rpres +++ b/09_DevelopingDataProducts/rStudioPresent/index.Rpres @@ -1,132 +1,132 @@ -RStudio Presenter -=== -author: Brian Caffo, Jeff Leek Roger Peng -date: `r format(Sys.Date(), format="%B %d %Y")` -transition: rotate - - -Department of Biostatistics -Bloomberg School of Public Health -Johns Hopkins University -Coursera Data Science Specialization - - - -RStudio Presentation -=== -- RStudio created a presentation authoring tool within their -development environment. -- If you are familiar with slidify, you will also be familiar with this tool - - Code is authored in a generalized markdown format that allows for code chunks - - The output is an html5 presentation - - The file index for the presenter file is .Rpres, which gets converted to an .md file and then to an html file if desired - - There's a preview tool in RStudio and GUIs for publishing to Rpubs or viewing/creating an html file - -Authoring content -=== -- This is a fairly complete guide - - http://www.rstudio.com/ide/docs/presentations/overview -- Quick start is - - `file` then `New File` then `R Presentation` - - (`alt-f` then `f` then `p` if you want key strokes) - - Use basically the same R markdown format for authoring as slidify/knitr - - Single quotes for inline code - - Tripple qutoes for block code - - Same options for code evaluation, caching, hiding etcetera - -Compiling and tools -=== -- R Studio auto formats and runs the code when you save the document -- Mathjax JS library is loaded by default so that 
`$x^2$` yields $x^2$ -- Slide navigation button on the preview; clicking on the notepad icon takes you to that slide in the deck -- Clicking on `more` yields options for - - Clearning the knitr cache - - Viewing in a browser (creates a temporay html file in `AppData/local/temp` for me) - - Create a html file to save where you want) -- A refresh button -- A zoom button that brings up a full window - -Visuals -=== -transition: linear - -- R Studio has made it easy to get some cool html5 effects, like cube transitions -with simple options in YAML-like code after the first slide such as -`transition: rotate` -- You can specify it in a slide-by-slide basis - -Here's the option "linear" -=== -transition: linear - -- Just put `transition: linear` right after the slide creation (three equal signs or more in a row) -- Tansition options - - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation - -Hierarchical organization -=== -type: section -- If you want a hierarchical organization structure, just add a `type: typename` option after the slide -- This changes the default appearance - - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation -- This is of type `section` - -Here's a subsection -=== -type: subsection - -Two columns -=== -- Do whatever for column one -- Then put `***` on a line by itself with blank lines before and after - -*** - -- Then do whatever for column two - - -Changing the slide font -========================================================== -font-import: http://fonts.googleapis.com/css?family=Risque -font-family: 'Risque' - -- Add a `font-family: fontname` option after the slide - - http://www.rstudio.com/ide/docs/presentations/customizing_fonts_and_appearance -- Specified in the same way as css font families - - http://www.w3schools.com/cssref/css_websafe_fonts.asp -- Use `font-import: url` to import fonts -- Important caveats - - Fonts must be present on the system that you're presenting on, or it 
will go to a fallback font - - You have to be connected to the internet to use an imported font (so don't rely on this for offline presentations) -- This is the `Risque` - - http://fonts.googleapis.com/css?family=Risque - -Really changing things -=== -- If you know html5 and CSS well, then you can basically change whatever you want -- A css file with the same names as your presentation will be autoimported -- You can use `css: file.css` to import a css file -- You have to create named classes and then use `class: classname` to get slide-specific style control from your css - - (Or you can apply then within a ``) -- Ultimately, you have an html file, that you can edit as you wish - - This should be viewed as a last resort, as the whole point is to have reproducible presentations, but may be the easiest way to get the exact style control you want for a final product - -Slidify versus R Studio Presenter -=== -**Slidify** -- Flexible control from the R MD file -- Under rapid ongoing development -- Large user base -- Lots and lots of styles and options -- Steeper learning curve -- More command-line oriented - -*** -**R Studio Presenter** -- Embedded in R Studio -- More GUI oriented -- Very easy to get started -- Smaller set of easy styles and options -- Default styles look very nice -- Ultimately as flexible as slidify with a little CSS and HTML knowledge - +RStudio Presenter +=== +author: Brian Caffo, Jeff Leek Roger Peng +date: `r format(Sys.Date(), format="%B %d %Y")` +transition: rotate + + +Department of Biostatistics +Bloomberg School of Public Health +Johns Hopkins University +Coursera Data Science Specialization + + + +RStudio Presentation +=== +- RStudio created a presentation authoring tool within their +development environment. 
+- If you are familiar with slidify, you will also be familiar with this tool + - Code is authored in a generalized markdown format that allows for code chunks + - The output is an html5 presentation + - The file index for the presenter file is .Rpres, which gets converted to an .md file and then to an html file if desired + - There's a preview tool in RStudio and GUIs for publishing to Rpubs or viewing/creating an html file + +Authoring content +=== +- This is a fairly complete guide + - http://www.rstudio.com/ide/docs/presentations/overview +- Quick start is + - `file` then `New File` then `R Presentation` + - (`alt-f` then `f` then `p` if you want key strokes) + - Use basically the same R markdown format for authoring as slidify/knitr + - Single quotes for inline code + - Tripple qutoes for block code + - Same options for code evaluation, caching, hiding etcetera + +Compiling and tools +=== +- R Studio auto formats and runs the code when you save the document +- Mathjax JS library is loaded by default so that `$x^2$` yields $x^2$ +- Slide navigation button on the preview; clicking on the notepad icon takes you to that slide in the deck +- Clicking on `more` yields options for + - Clearning the knitr cache + - Viewing in a browser (creates a temporay html file in `AppData/local/temp` for me) + - Create a html file to save where you want) +- A refresh button +- A zoom button that brings up a full window + +Visuals +=== +transition: linear + +- R Studio has made it easy to get some cool html5 effects, like cube transitions +with simple options in YAML-like code after the first slide such as +`transition: rotate` +- You can specify it in a slide-by-slide basis + +Here's the option "linear" +=== +transition: linear + +- Just put `transition: linear` right after the slide creation (three equal signs or more in a row) +- Tansition options + - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation + +Hierarchical organization +=== +type: section 
+- If you want a hierarchical organization structure, just add a `type: typename` option after the slide +- This changes the default appearance + - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation +- This is of type `section` + +Here's a subsection +=== +type: subsection + +Two columns +=== +- Do whatever for column one +- Then put `***` on a line by itself with blank lines before and after + +*** + +- Then do whatever for column two + + +Changing the slide font +========================================================== +font-import: http://fonts.googleapis.com/css?family=Risque +font-family: 'Risque' + +- Add a `font-family: fontname` option after the slide + - http://www.rstudio.com/ide/docs/presentations/customizing_fonts_and_appearance +- Specified in the same way as css font families + - http://www.w3schools.com/cssref/css_websafe_fonts.asp +- Use `font-import: url` to import fonts +- Important caveats + - Fonts must be present on the system that you're presenting on, or it will go to a fallback font + - You have to be connected to the internet to use an imported font (so don't rely on this for offline presentations) +- This is the `Risque` + - http://fonts.googleapis.com/css?family=Risque + +Really changing things +=== +- If you know html5 and CSS well, then you can basically change whatever you want +- A css file with the same names as your presentation will be autoimported +- You can use `css: file.css` to import a css file +- You have to create named classes and then use `class: classname` to get slide-specific style control from your css + - (Or you can apply then within a ``) +- Ultimately, you have an html file, that you can edit as you wish + - This should be viewed as a last resort, as the whole point is to have reproducible presentations, but may be the easiest way to get the exact style control you want for a final product + +Slidify versus R Studio Presenter +=== +**Slidify** +- Flexible control from the R MD file +- 
Under rapid ongoing development +- Large user base +- Lots and lots of styles and options +- Steeper learning curve +- More command-line oriented + +*** +**R Studio Presenter** +- Embedded in R Studio +- More GUI oriented +- Very easy to get started +- Smaller set of easy styles and options +- Default styles look very nice +- Ultimately as flexible as slidify with a little CSS and HTML knowledge + diff --git a/09_DevelopingDataProducts/rStudioPresent/index.md b/09_DevelopingDataProducts/rStudioPresent/index.md index 399fb071a..b998542ae 100644 --- a/09_DevelopingDataProducts/rStudioPresent/index.md +++ b/09_DevelopingDataProducts/rStudioPresent/index.md @@ -1,132 +1,132 @@ -RStudio Presenter -=== -author: Brian Caffo, Jeff Leek Roger Peng -date: April 24 2014 -transition: rotate - - -Department of Biostatistics -Bloomberg School of Public Health -Johns Hopkins University -Coursera Data Science Specialization - - - -RStudio Presentation -=== -- RStudio created a presentation authoring tool within their -development environment. 
-- If you are familiar with slidify, you will also be familiar with this tool - - Code is authored in a generalized markdown format that allows for code chunks - - The output is an html5 presentation - - The file index for the presenter file is .Rpres, which gets converted to an .md file and then to an html file if desired - - There's a preview tool in RStudio and GUIs for publishing to Rpubs or viewing/creating an html file - -Authoring content -=== -- This is a fairly complete guide - - http://www.rstudio.com/ide/docs/presentations/overview -- Quick start is - - `file` then `New File` then `R Presentation` - - (`alt-f` then `f` then `p` if you want key strokes) - - Use basically the same R markdown format for authoring as slidify/knitr - - Single quotes for inline code - - Tripple qutoes for block code - - Same options for code evaluation, caching, hiding etcetera - -Compiling and tools -=== -- R Studio auto formats and runs the code when you save the document -- Mathjax JS library is loaded by default so that `$x^2$` yields $x^2$ -- Slide navigation button on the preview; clicking on the notepad icon takes you to that slide in the deck -- Clicking on `more` yields options for - - Clearning the knitr cache - - Viewing in a browser (creates a temporay html file in `AppData/local/temp` for me) - - Create a html file to save where you want) -- A refresh button -- A zoom button that brings up a full window - -Visuals -=== -transition: linear - -- R Studio has made it easy to get some cool html5 effects, like cube transitions -with simple options in YAML-like code after the first slide such as -`transition: rotate` -- You can specify it in a slide-by-slide basis - -Here's the option "linear" -=== -transition: linear - -- Just put `transition: linear` right after the slide creation (three equal signs or more in a row) -- Tansition options - - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation - -Hierarchical organization -=== -type: section 
-- If you want a hierarchical organization structure, just add a `type: typename` option after the slide -- This changes the default appearance - - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation -- This is of type `section` - -Here's a subsection -=== -type: subsection - -Two columns -=== -- Do whatever for column one -- Then put `***` on a line by itself with blank lines before and after - -*** - -- Then do whatever for column two - - -Changing the slide font -========================================================== -font-import: http://fonts.googleapis.com/css?family=Risque -font-family: 'Risque' - -- Add a `font-family: fontname` option after the slide - - http://www.rstudio.com/ide/docs/presentations/customizing_fonts_and_appearance -- Specified in the same way as css font families - - http://www.w3schools.com/cssref/css_websafe_fonts.asp -- Use `font-import: url` to import fonts -- Important caveats - - Fonts must be present on the system that you're presenting on, or it will go to a fallback font - - You have to be connected to the internet to use an imported font (so don't rely on this for offline presentations) -- This is the `Risque` - - http://fonts.googleapis.com/css?family=Risque - -Really changing things -=== -- If you know html5 and CSS well, then you can basically change whatever you want -- A css file with the same names as your presentation will be autoimported -- You can use `css: file.css` to import a css file -- You have to create named classes and then use `class: classname` to get slide-specific style control from your css - - (Or you can apply then within a ``) -- Ultimately, you have an html file, that you can edit as you wish - - This should be viewed as a last resort, as the whole point is to have reproducible presentations, but may be the easiest way to get the exact style control you want for a final product - -Slidify versus R Studio Presenter -=== -**Slidify** -- Flexible control from the R MD file -- 
Under rapid ongoing development -- Large user base -- Lots and lots of styles and options -- Steeper learning curve -- More command-line oriented - -*** -**R Studio Presenter** -- Embedded in R Studio -- More GUI oriented -- Very easy to get started -- Smaller set of easy styles and options -- Default styles look very nice -- Ultimately as flexible as slidify with a little CSS and HTML knowledge - +RStudio Presenter +=== +author: Brian Caffo, Jeff Leek Roger Peng +date: May 21 2014 +transition: rotate + + +Department of Biostatistics +Bloomberg School of Public Health +Johns Hopkins University +Coursera Data Science Specialization + + + +RStudio Presentation +=== +- RStudio created a presentation authoring tool within their +development environment. +- If you are familiar with slidify, you will also be familiar with this tool + - Code is authored in a generalized markdown format that allows for code chunks + - The output is an html5 presentation + - The file index for the presenter file is .Rpres, which gets converted to an .md file and then to an html file if desired + - There's a preview tool in RStudio and GUIs for publishing to Rpubs or viewing/creating an html file + +Authoring content +=== +- This is a fairly complete guide + - http://www.rstudio.com/ide/docs/presentations/overview +- Quick start is + - `file` then `New File` then `R Presentation` + - (`alt-f` then `f` then `p` if you want key strokes) + - Use basically the same R markdown format for authoring as slidify/knitr + - Single quotes for inline code + - Tripple qutoes for block code + - Same options for code evaluation, caching, hiding etcetera + +Compiling and tools +=== +- R Studio auto formats and runs the code when you save the document +- Mathjax JS library is loaded by default so that `$x^2$` yields $x^2$ +- Slide navigation button on the preview; clicking on the notepad icon takes you to that slide in the deck +- Clicking on `more` yields options for + - Clearning the knitr cache + - Viewing 
in a browser (creates a temporay html file in `AppData/local/temp` for me) + - Create a html file to save where you want) +- A refresh button +- A zoom button that brings up a full window + +Visuals +=== +transition: linear + +- R Studio has made it easy to get some cool html5 effects, like cube transitions +with simple options in YAML-like code after the first slide such as +`transition: rotate` +- You can specify it in a slide-by-slide basis + +Here's the option "linear" +=== +transition: linear + +- Just put `transition: linear` right after the slide creation (three equal signs or more in a row) +- Tansition options + - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation + +Hierarchical organization +=== +type: section +- If you want a hierarchical organization structure, just add a `type: typename` option after the slide +- This changes the default appearance + - http://www.rstudio.com/ide/docs/presentations/slide_transitions_and_navigation +- This is of type `section` + +Here's a subsection +=== +type: subsection + +Two columns +=== +- Do whatever for column one +- Then put `***` on a line by itself with blank lines before and after + +*** + +- Then do whatever for column two + + +Changing the slide font +========================================================== +font-import: http://fonts.googleapis.com/css?family=Risque +font-family: 'Risque' + +- Add a `font-family: fontname` option after the slide + - http://www.rstudio.com/ide/docs/presentations/customizing_fonts_and_appearance +- Specified in the same way as css font families + - http://www.w3schools.com/cssref/css_websafe_fonts.asp +- Use `font-import: url` to import fonts +- Important caveats + - Fonts must be present on the system that you're presenting on, or it will go to a fallback font + - You have to be connected to the internet to use an imported font (so don't rely on this for offline presentations) +- This is the `Risque` + - 
http://fonts.googleapis.com/css?family=Risque + +Really changing things +=== +- If you know html5 and CSS well, then you can basically change whatever you want +- A css file with the same names as your presentation will be autoimported +- You can use `css: file.css` to import a css file +- You have to create named classes and then use `class: classname` to get slide-specific style control from your css + - (Or you can apply then within a ``) +- Ultimately, you have an html file, that you can edit as you wish + - This should be viewed as a last resort, as the whole point is to have reproducible presentations, but may be the easiest way to get the exact style control you want for a final product + +Slidify versus R Studio Presenter +=== +**Slidify** +- Flexible control from the R MD file +- Under rapid ongoing development +- Large user base +- Lots and lots of styles and options +- Steeper learning curve +- More command-line oriented + +*** +**R Studio Presenter** +- Embedded in R Studio +- More GUI oriented +- Very easy to get started +- Smaller set of easy styles and options +- Default styles look very nice +- Ultimately as flexible as slidify with a little CSS and HTML knowledge +