
Design Concepts

Tammy Yang edited this page Feb 19, 2020 · 5 revisions

Why is Facebook JSON not friendly to analyze?

Here is an example of Facebook JSON: example_facebook_json
If we want to analyze the relationship between reaction type ("LIKE" or "WOW") and time using Python, we have to write code like:

timestamp = [x["timestamp"] for x in example_json["reactions"]]
reaction_type = [x["data"][0]["reactions"]["reactions"] for x in example_json["reactions"]]

Reactions are among the most basic data you probably want to analyze, but Facebook makes the task complicated. The JSON is a deeply nested structure, and you need to chain several keys to reach a single value. What makes things more difficult is that, if a data type is absent from your data, a "null" shows up instead of the normal key-value pair. This makes it almost impossible to know exactly how many data types Facebook records.
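One way to cope with the nesting and the unexpected nulls is a small helper that walks a path of keys and gives up gracefully. The following is only a sketch: `dig` and the `sample` record are hypothetical names made up for illustration, and the record merely imitates the key path used in the snippet above.

```python
def dig(record, *path, default=None):
    """Follow a sequence of keys/indexes into nested dicts and lists.

    Returns `default` as soon as a step is missing or a null (None) is hit.
    """
    current = record
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a value that cannot be indexed.
            return default
        if current is None:
            # A JSON null was stored where a nested value was expected.
            return default
    return current


# Hypothetical record imitating the reactions example above.
sample = {"timestamp": 1575375973, "data": [{"reactions": {"reactions": "LIKE"}}]}

reaction = dig(sample, "data", 0, "reactions", "reactions")
missing = dig({"timestamp": 1, "data": None}, "data", 0, "reactions", "reactions")
```

With this helper, a record whose "data" is null yields the default value instead of raising an exception halfway through a list comprehension.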

Take the FB post JSON as an example. As you can see in this file, if there is no photo attached to a given post, that post will not contain any photo-related keys at all. That means, if the first post contains no photo and you create the parser based on the first post, you might miss the photo-related data in all the later posts!
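A simple guard against this trap is to scan every post and count the keys before designing the parser, rather than trusting the first record. The sketch below assumes the posts are already loaded as a list of dicts; `count_keys`, the `posts` list, and the "attachments" key are hypothetical and used only for illustration.

```python
from collections import Counter


def count_keys(posts):
    """Count how many posts contain each top-level key."""
    counts = Counter()
    for post in posts:
        counts.update(post.keys())
    return counts


# Hypothetical posts: only the second one carries an attachment-like key.
posts = [
    {"timestamp": 1415339550, "title": "first"},
    {"timestamp": 1575375973, "title": "second", "attachments": []},
]

key_counts = count_keys(posts)
# Keys that appear in only some posts are the ones a first-post parser would miss.
optional_keys = [key for key, n in key_counts.items() if n < len(posts)]
```

Here `optional_keys` lists every key that is absent from at least one post, which is a reasonable starting point for deciding which fields the parser must treat as optional.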

Last but not least, and the most annoying obstacle to automating the parsing process: the structure of Facebook JSON may change, and it has changed! And the worst part is,

!!!!Facebook will not notify you!!!!

For example, in the data I downloaded a while ago, I can find posts in "posts/your_posts.json/", and the content looks like:

{
  "status_updates": [
    {
      "timestamp": 1415339550,
      ...

However, in the most recent data I downloaded, I need to visit "posts/your_posts_1.json/" if I want to read my post data. The filename has changed, and so has the content. It now looks like:

[
  {
    "timestamp": 1575375973,
    ...

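As one can tell from comparing the two versions of the JSON structure, an exception will be raised if one tries to use the key status_updates on the new data. In Python, the top level is now a list rather than a dict, so indexing it with a string key raises a "TypeError" instead of returning the posts. This means that a parser written against my previously downloaded data would fail on every newly downloaded file.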
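A parser can defend itself by checking the shape of the top-level value before reading it. The following sketch handles only the two layouts shown above; `extract_posts` and the two sample payloads are hypothetical, and any future layout change would still need a new branch.

```python
def extract_posts(payload):
    """Return the list of posts from either layout shown above."""
    if isinstance(payload, list):
        # Newer layout: the file is a bare list of posts.
        return payload
    if isinstance(payload, dict) and "status_updates" in payload:
        # Older layout: posts are wrapped under "status_updates".
        return payload["status_updates"]
    # Fail loudly so an unnoticed format change does not go silent.
    raise ValueError(f"Unrecognized post layout: {type(payload).__name__}")


# Hypothetical payloads imitating the two versions above.
old_layout = {"status_updates": [{"timestamp": 1415339550}]}
new_layout = [{"timestamp": 1575375973}]

old_posts = extract_posts(old_layout)
new_posts = extract_posts(new_layout)
```

Raising an explicit error for an unknown layout is a deliberate choice here: since Facebook does not announce changes, a loud failure is the only notification the parser will ever get.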
