Commit c2903d3

reordering content + some rewording
1 parent 60e19f5 commit c2903d3

1 file changed: +9 -8 lines changed

dataset.qmd

Lines changed: 9 additions & 8 deletions
@@ -6,11 +6,11 @@ title: "Our Running Example"
 
 ![](images/streaming-services.png){width="370"}
 
-This workshop utilizes the **streaming-master-messy** comma-separated value (CSV) file which is derived from the movies and TV shows featured by major streaming services and distributed in Kaggle Project under a CC0 Public License:
+This workshop utilizes the **streaming-master-messy** comma-separated value (CSV) file which is derived from the movies and TV shows featured by major streaming services and distributed in Kaggle Project under a CC0 Public License[^1].
 
-Henrique, D. (2020). *A simple movie & TV show recommendation system*. Kaggle. <https://www.kaggle.com/code/dgoenrique/a-simple-movie-tv-show-recommendation-system?select=credits.csv>
+[^1]: Henrique, D. (2020). *A simple movie & TV show recommendation system*. Kaggle. <https://www.kaggle.com/code/dgoenrique/a-simple-movie-tv-show-recommendation-system?select=credits.csv>
 
-We have merged six `titles.csv` files—each representing one of the streaming services featured in this project (Amazon Prime Video, Apple TV+, Disney+, HBO Max, Netflix, and Paramount)—into a single master dataset.
+We have merged six `titles.csv` files—each representing one of the streaming services featured in this project (Amazon Prime Video, Apple TV+, Disney+, HBO Max, Netflix, and Paramount)—into a single master spreadsheet.
 
 The dataset contains 25,223 rows with movies and TV series titles along with the following variables as described in the data dictionary:
 
@@ -33,11 +33,6 @@ The dataset contains 25,223 rows with movies and TV series titles along with the
 - tmdb_popularity: Votes on The Movie Database (TMDB).
 - tmdb_score: Score on on The Movie Database TMDB.
 
-::: {.callout-important collapse="true"}
-## Disclaimer
-
-Please note that, for the purposes of this lesson, the data has been intentionally modified to support the associated exercises. Therefore, we do not vouch for the use of this dataset for actual research. The data has been specifically edited and curated for instructional purposes and may not represent a fully accurate or comprehensive source of data for formal analysis.
-:::
 
 ## Downloading the Dataset
 
@@ -49,6 +44,12 @@ Now that we have a clearer understanding of the data we'll be working with, plea
 
 Let's open the file and check how the data looks like. Also, can you spot your favorite movie or TV series on it?
 
+::: {.callout-important collapse="true"}
+## Disclaimer
+
+Please note that, for the purposes of this lesson, the data has been intentionally modified to support the associated exercises. Therefore, we do not vouch for the use of this dataset for actual research. The data has been specifically edited and curated for instructional purposes and may not represent a fully accurate or comprehensive source of data for formal analysis.
+:::
+
 ## Our Challenge
 
 In this workshop, we will explore how OpenRefine can support data organization and preparation for analysis. For instance, you might want to compare scores across genres, plot the most common age classifications over the years, or investigate whether the country of origin affects popularity. These are just a few examples of the kinds of insights you could uncover once your data is properly cleaned and organized. But before that the data has to be cleaned and prepared accordingly. Ready?
