Database Design
GinkREAL edited this page Mar 17, 2019 · 2 revisions
Stores the articles used for labelling and for training the AI model. Remains read-only for most of its lifetime. The field descriptions here are only guidelines; check Reddit's documentation for the authoritative definition of this object.
- _id: Mongo's unique ID for this article, used for most of the labeling side API calls
- title: Title of the article as it is seen on reddit.
- subreddit: The subreddit this article belongs to.
- name: The type prefix of the object followed by Reddit's unique ID for it, i.e. (article prefix) + 76rjtv (Reddit ID). In Reddit's API the prefix for a link post is t3_, which would give t3_76rjtv.
- upvotes: The number of upvotes of this article.
- downvotes: This one is a mystery to me.
- score: Usually same as upvotes.
- locked: Whether a moderator has locked the article from more comments.
- num_comments: Total comments of the article.
- url: If the post is a link post, it has a URL, usually the news article the post refers to.
- created_utc: UTC time when the article was created.
- last_modified: Date last modified, format unknown.
- article_id: Reddit's assigned ID to the article. Used for AI side API calls.
- archived: Whether the post is old enough to be archived, and therefore locked in time.
- author_fullname: Same as name field, but for the author
- author: Human readable name of the author.
- comments: Array of comment subtypes. The fields of a comment subtype (comment through replies) are listed after the article fields.
- targets: Array of strings representing the possible targets of interest in the article. This is a required field and is added in by the reddit.py script.
- comment: The actual text of the comment.
- created_utc: The UTC creation time of the comment.
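- distinguished: In Reddit's API, this marks a comment posted in an official capacity (for example by a moderator or an admin) and is null otherwise. Check Reddit's documentation for the exact values.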
- edited: Whether the comment has been edited during its lifetime.
- comment_id: Reddit ID of the comment.
- score: Upvotes minus downvotes.
- sticked: Whether a moderator has stickied this comment to the top of the thread.
- author_fullname: Similar to author_fullname in article.
- author: Author of comment as seen on reddit.
- replies: Recursive array of more comment subtypes.
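To make the nesting concrete, here is a minimal sketch of how an article document with its comment subtypes might look as a Python dict, together with a recursive walk over replies. All values are invented for illustration, only a subset of the fields is shown, and `iter_comments` is a hypothetical helper, not something taken from reddit.py.

```python
from typing import Any, Dict, Iterator, List

# Illustrative article document. Values (including the comment IDs) are
# made up, and most of the article fields are omitted for brevity.
article: Dict[str, Any] = {
    "title": "Example headline",
    "article_id": "76rjtv",
    "targets": ["example target"],
    "comments": [
        {
            "comment": "Top-level comment",
            "comment_id": "c1",
            "score": 12,
            "replies": [
                {"comment": "A reply", "comment_id": "c2", "score": 3, "replies": []},
            ],
        },
        {"comment": "Another top-level comment", "comment_id": "c3", "score": 7, "replies": []},
    ],
}


def iter_comments(comments: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every comment in the forest depth-first, parents before replies."""
    for comment in comments:
        yield comment
        yield from iter_comments(comment.get("replies", []))


all_ids = [c["comment_id"] for c in iter_comments(article["comments"])]
print(all_ids)  # ['c1', 'c2', 'c3']
```

Because replies holds the same subtype as comments, one recursive function is enough to visit the whole forest.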
Heatmaps are a cache that helps pick the comments with the most upvotes as the ones to be labelled by volunteers. They can be generated by reddit.py using the 'heatmap' command.
- _id: Mongo's unique id for this heatmap.
- article: Mongo id of the article this heatmap caches.
- heatmap: A tuple of the comment_address and the score of that comment. Note: comment_address is a comma-separated sequence of array indices that gives the 'address' of the comment within the article's forest of comments.
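The comment_address scheme can be sketched as follows. This is an illustration under assumptions, not the logic of reddit.py: it assumes zero-based indices, that the first index selects a top-level comment and each later index selects an entry in the previous comment's replies, and that entries are ordered by score, highest first. `resolve_address` and `build_heatmap` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple


def resolve_address(comments: List[Dict[str, Any]], address: str) -> Dict[str, Any]:
    """Follow comma-separated indices through the forest to a single comment."""
    indices = [int(part) for part in address.split(",")]
    node = comments[indices[0]]
    for index in indices[1:]:
        node = node["replies"][index]
    return node


def build_heatmap(comments: List[Dict[str, Any]],
                  prefix: Tuple[int, ...] = ()) -> List[Tuple[str, int]]:
    """Collect (comment_address, score) pairs for every comment in the forest."""
    entries: List[Tuple[str, int]] = []
    for i, comment in enumerate(comments):
        path = prefix + (i,)
        entries.append((",".join(map(str, path)), comment["score"]))
        entries.extend(build_heatmap(comment.get("replies", []), path))
    return entries


# Invented example forest.
forest = [
    {"comment": "first", "score": 12,
     "replies": [{"comment": "reply", "score": 30, "replies": []}]},
    {"comment": "second", "score": 7, "replies": []},
]

# Assumption: highest-scoring comments come first.
heatmap = sorted(build_heatmap(forest), key=lambda entry: entry[1], reverse=True)
print(heatmap)  # [('0,0', 30), ('0', 12), ('1', 7)]
print(resolve_address(forest, "0,0")["comment"])  # reply
```

Storing addresses instead of copies of the comments keeps the cache small, and the comment text can still be looked up in the article document when it is needed.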
Volunteers who help label the comments in the system.
- _id: Mongo's unique id for this user.
- username: The username of this user.
- password: The bcrypt hash of this user's password.
- role: Either admin or user. Admin-created accounts have endorsed set to true.
- endorsed: A flag that makes it easy to identify and delete bogus labels.
- assignedArticles: Array of the mongo id of articles assigned to this user for labeling.
- checkpoint: The save point of the user. The first number is an index into assignedArticles; the second number is an index into the heatmap.
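As an illustration of how a two-number checkpoint could move forward, here is a minimal sketch. It assumes the checkpoint is a pair of zero-based indices and that a user moves to the next assigned article once the current heatmap is exhausted. The rollover rule and the name `advance_checkpoint` are assumptions made for this example, not behaviour confirmed by the application code.

```python
from typing import List, Optional, Tuple


def advance_checkpoint(checkpoint: Tuple[int, int],
                       heatmap_lengths: List[int]) -> Optional[Tuple[int, int]]:
    """Return the next (article index, heatmap index), or None when finished.

    heatmap_lengths[i] is the number of heatmap entries for the i-th
    assigned article.
    """
    article_index, heatmap_index = checkpoint
    heatmap_index += 1
    # Skip over heatmaps that are finished (or empty).
    while (article_index < len(heatmap_lengths)
           and heatmap_index >= heatmap_lengths[article_index]):
        article_index += 1
        heatmap_index = 0
    if article_index >= len(heatmap_lengths):
        return None  # every assigned article has been labelled
    return (article_index, heatmap_index)


# Three assigned articles with 2, 0 and 3 heatmap entries respectively.
lengths = [2, 0, 3]
print(advance_checkpoint((0, 0), lengths))  # (0, 1)
print(advance_checkpoint((0, 1), lengths))  # (2, 0)
print(advance_checkpoint((2, 2), lengths))  # None
```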
The labels made by the volunteers.
- _id: Mongo's unique id for this label.
- labeller: The name of the user who made this label. Useful for inter-rater reliability tests.
- article_id: Article that this label refers to.
- comment_address: Address of the comment in the forest of the article the label is referring to.
- label: The stance of the comment toward the target, according to the labeller.
- target: The target of interest of this label.
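Since the labeller field is kept for inter-rater reliability tests, a rough sketch of one such check is shown below. It computes simple percent agreement (the share of multiply-labelled items on which all labellers chose the same label), which is a deliberately basic stand-in for a formal statistic such as Cohen's kappa. The label values, usernames and the function name `percent_agreement` are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def percent_agreement(labels: List[dict]) -> float:
    """Share of items labelled by two or more users on which all of them agree."""
    by_item: Dict[Tuple[str, str, str], Dict[str, str]] = defaultdict(dict)
    for doc in labels:
        # An item is one comment judged against one target.
        key = (doc["article_id"], doc["comment_address"], doc["target"])
        by_item[key][doc["labeller"]] = doc["label"]
    shared = [set(votes.values()) for votes in by_item.values() if len(votes) >= 2]
    if not shared:
        raise ValueError("no item was labelled by two or more labellers")
    return sum(1 for distinct in shared if len(distinct) == 1) / len(shared)


# Invented label documents (the _id field is omitted).
labels = [
    {"labeller": "alice", "article_id": "76rjtv", "comment_address": "0", "label": "favor", "target": "T"},
    {"labeller": "bob", "article_id": "76rjtv", "comment_address": "0", "label": "favor", "target": "T"},
    {"labeller": "alice", "article_id": "76rjtv", "comment_address": "0,0", "label": "against", "target": "T"},
    {"labeller": "bob", "article_id": "76rjtv", "comment_address": "0,0", "label": "favor", "target": "T"},
    {"labeller": "alice", "article_id": "76rjtv", "comment_address": "1", "label": "favor", "target": "T"},
]

agreement = percent_agreement(labels)
print(agreement)  # 0.5
```

The comment at address 1 has only one label, so it is left out of the calculation; of the two remaining items, the labellers agree on one.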
Job queue system polled by worker.py for analysis.
- article: The reddit id of the article requested for analysis.
Results produced by worker.py for the requests found in the jobs collection.