Panopticum News, data preparation

Introduction

The following collection of notebooks is part of the data analysis and preparation for the Panopticum News Project. For now, I use one year of NYT coverage as a sample; in a later iteration, the notebooks will hopefully be adapted to process data from other news outlets. As a result, the notebooks currently work for NYT data only. The data can be downloaded through the official NYT API and are not included in this folder.

Structure of the folder

└── 📁data
    └── 📁categories
        └── categories-over-time.json
        └── network.json
    └── 📁places
        └── nyt-coverage-places.json
        └── nyt-sum-places.json
        └── top_keywords_by_location.json
└── 📁notebooks
    └── 📁categories
        └── keywords_places.ipynb
        └── keywords-structure.ipynb
        └── places-analysis.ipynb
        └── places-countries-kws.ipynb
        └── places.ipynb
    └── 📁images
        └── prepare-img-download.ipynb
        └── face-recognition.ipynb
        └── get-images.sh
        └── 📁assets
    └── 📁other  
        └── temporal-overview.ipynb
    └── create-temp-data.ipynb
└── README.md

The notebooks folder contains the Jupyter notebooks. The data folder contains the exported data.

./notebooks/create-temp-data.ipynb

Assuming the existence of a folder named input-data that contains a separate .json file for each month of coverage, this notebook loads and concatenates all the files and exports a dataset called temp-data.json to the same folder. The other notebooks use this dataset to analyze the full coverage.
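
The concatenation step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `concatenate_monthly_files` is hypothetical, and the sketch assumes that each monthly file holds a JSON array of article records.

```python
import json
from pathlib import Path


def concatenate_monthly_files(input_dir, output_name="temp-data.json"):
    """Load every monthly .json file in input_dir and write one combined file."""
    input_path = Path(input_dir)
    output_path = input_path / output_name
    articles = []
    # Sort the file names so the months are concatenated in a stable order.
    for json_file in sorted(input_path.glob("*.json")):
        if json_file.name == output_name:
            continue  # skip a previous export so it is not concatenated again
        with json_file.open(encoding="utf-8") as handle:
            # Assumption: each file holds a JSON array of article records.
            articles.extend(json.load(handle))
    with output_path.open("w", encoding="utf-8") as handle:
        json.dump(articles, handle, ensure_ascii=False)
    return articles
```

Calling `concatenate_monthly_files("input-data")` would write input-data/temp-data.json and return the combined list. Skipping the output file keeps a rerun from doubling the dataset.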

Images

./notebooks/images/prepare-img-download.ipynb

This notebook prepares the CSV file used to download images. It creates a file called nyt_image_urls.csv in the input-data folder, retaining only two columns: clean_id (used for consistent naming) and image_url (used to download the image).
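
As a rough sketch of the export described above, the snippet below writes the two retained columns with the standard csv module. The function name `write_image_url_csv` is hypothetical, and the sketch assumes the article records already carry `clean_id` and `image_url` keys; the real notebook may derive them differently.

```python
import csv


def write_image_url_csv(articles, csv_path):
    """Write one row per article that has an image, keeping only two columns."""
    rows_written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["clean_id", "image_url"])
        writer.writeheader()
        for article in articles:
            # Hypothetical keys: the real records may name these fields differently.
            clean_id = article.get("clean_id")
            image_url = article.get("image_url")
            if clean_id and image_url:
                writer.writerow({"clean_id": clean_id, "image_url": image_url})
                rows_written += 1
    return rows_written
```

Articles without an image URL are dropped, so every row in the CSV can be downloaded later.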

./notebooks/images/get-images.sh

The script runs in the terminal and downloads all the images listed in nyt_image_urls.csv.

Make it executable:

    chmod +x get-images.sh

Run:

    bash get-images.sh ../../input-data/nyt_image_urls.csv
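
For readers who prefer to stay in Python, the download step can be approximated as below. This is not the content of get-images.sh, only a sketch of the behavior described above. The function names, the `.jpg` extension, and the skip-on-error policy are all assumptions.

```python
import csv
import urllib.request
from pathlib import Path


def download_images(csv_path, output_dir, fetch=None):
    """Download every image listed in the CSV and name it after clean_id."""
    if fetch is None:
        fetch = _fetch_bytes
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    saved = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Assumption: every image is stored as .jpg; the real script may differ.
            target = output_path / f"{row['clean_id']}.jpg"
            if target.exists():
                continue  # already downloaded, so a rerun can resume
            try:
                target.write_bytes(fetch(row["image_url"]))
                saved.append(target)
            except OSError as error:
                # Network failures (URLError is an OSError) skip the image.
                print(f"skipped {row['image_url']}: {error}")
    return saved


def _fetch_bytes(url):
    """Return the raw bytes served at url."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()
```

The optional `fetch` parameter exists only so the function can be exercised without network access; by default it uses `urllib.request.urlopen`.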

./notebooks/images/face-recognition.ipynb

The notebook introduces a top-down technique for face recognition. It is currently a work in progress.

Other

./notebooks/other/temporal-overview.ipynb

The notebook can be used to analyze the volume and frequency of coverage. It aggregates individual articles by category, retaining the news outlet section of each article so that the data can be filtered later. The resulting dataset is stored at ./data/categories/categories-over-time.json.
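
One way to picture the aggregation described above is a count per month, section, and category. The sketch below is an assumption about the shape of the computation, not the notebook's code: the function name and the record keys `pub_date`, `section`, and `categories` are hypothetical, and monthly granularity is chosen only for illustration.

```python
from collections import Counter


def count_categories_over_time(articles):
    """Count articles per (month, section, category)."""
    counts = Counter()
    for article in articles:
        # Assumption: pub_date is an ISO string such as "2023-04-15T10:00:00".
        month = article["pub_date"][:7]
        section = article.get("section", "unknown")
        for category in article.get("categories", []):
            counts[(month, section, category)] += 1
    # Flatten to a list of records that can be dumped to JSON.
    return [
        {"month": month, "section": section, "category": category, "count": count}
        for (month, section, category), count in sorted(counts.items())
    ]
```

Keeping the section in the grouping key means the output can later be filtered by section without going back to the individual articles.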
