- Read the whole document before making any changes.
- Don't use relative imports.
- Indent with 4 spaces. Use double quotes by default. Follow PEP 8 naming conventions.
- Follow DRY and SOLID principles.
- Keep in mind that your code should be testable; see, for example, https://youtu.be/XVZpi7VJ_ws.
- If your code has state, you must wrap it in a class:

      # Bad: module-level state mutated by a function
      x = "something"

      def foo(arg):
          global x
          x = arg

      # Better: state lives on an instance
      class Foo:
          def __init__(self):
              self.x = "something"

          def foo(self, arg):
              self.x = arg
- Fork this repo to make your own changes and add @ijimiji as a collaborator.
- Add autogenerated files (`*.pyc`), caches, and the database file to `.gitignore`.
- Don't use force push. Use reverts or roll back code manually.
- The main branch has to be `master`; the main development branch has to be `dev`. Approved changes should only be merged into the `dev` branch.
- Each step has to be done in a separate feature branch, e.g. `feature/initial`, `feature/parser`. I advise checking out `dev`, rebasing onto the previous feature branch, and creating a new branch at this point.
- You don't have to make changes in a branch if doing so would cause merge conflicts. You can create a new branch for fixes.
- You can split each step into its logical parts and create separate branches.
- When a feature is ready, you have to create a pull request `dev <- feature/foo` and assign @ijimiji as reviewer.
- Commits have to be informative: no "update" or "fix" commits; describe the changes you made. Use the same capitalization throughout your project.

      # Bad commit messages
      upd
      fix
      foo
      try again
      upd2
      fix of fix

      # Better commit messages
      Remove unused imports
      Implement web parser class
      Add additional checks to parser
      Implement a web parser CLI utility

- Use https://python-poetry.org/ to manage your dependencies.
- You can come back to this step later, after you finish the next steps, but in that case you have to provide `requirements.txt` for `pip`.
- Create a `config` module with a `Config` class that implements `read()` and `get()` methods. `read()` parses a config stored in a plain text file in the following format into a dictionary:

      key1 = value
      key2 = foo
      # Has to produce
      # {
      #     "key1": "value",
      #     "key2": "foo"
      # }

  `Config` receives the name of the file to be parsed through its constructor. `get()` returns the parsed dictionary.
- Wrap `read` with a cache decorator to avoid multiple fs reads.
- Other modules should use your config like this:

      config = Config(filename="cfg.txt")
      config.read()
      url = config.get()["url"]

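One possible shape for the `Config` class described above, as a minimal sketch rather than the required implementation. It assumes `functools.lru_cache` is an acceptable "cache decorator", that blank lines, lines starting with `#`, and lines without `=` can be skipped, and that the private attribute name `_data` is free to choose; none of these details are fixed by the task.

```python
from functools import lru_cache


class Config:
    """Parses `key = value` lines from a plain text file into a dictionary."""

    def __init__(self, filename):
        self.filename = filename
        self._data = {}  # hypothetical attribute name, filled by read()

    @lru_cache(maxsize=None)  # repeated read() calls on one instance skip the file system
    def read(self):
        parsed = {}
        with open(self.filename, encoding="utf-8") as file:
            for line in file:
                line = line.strip()
                # Assumption: blank lines, comments, and lines without "=" are ignored.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                parsed[key.strip()] = value.strip()
        self._data = parsed
        return parsed

    def get(self):
        """Returns the dictionary produced by the last read()."""
        return self._data
```

Note that `lru_cache` on a method keys the cache on the instance, so each `Config` object reads its file at most once; it also keeps the instance alive for as long as the cache exists, which is usually fine for a single long-lived config object.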
- Pick a news website to parse.
- Add `beautifulsoup4` as a dependency to your project: https://beautiful-soup-4.readthedocs.io/en/latest/
- In the `parser` module, create a `Parser` class that receives the URL to be parsed through its constructor.
- In the `parser` module, create an `Article` class that contains:
  - Title
  - Abstract
  - Image preview URL (if provided)
- `Article` has to use the `@dataclass` decorator.
- `Parser` has to implement a `parse()` method that returns a list of `Article` objects created with data parsed from the website.
- Use `bs4` to parse the data.
- You can add additional packages if you need XML support.
- The caller has to provide the URL through the config file.
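The `Article` container from the list above could look roughly like this. It is a sketch using only the standard library: the field names (`title`, `abstract`, `image_url`) and the sample values are assumptions, and the `bs4` extraction inside `Parser.parse()` is deliberately not shown because it depends on the website you pick.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    title: str
    abstract: str
    # The preview image is optional, so it defaults to None.
    image_url: Optional[str] = None


# Hypothetical sample data standing in for values extracted from a page.
articles = [
    Article(title="First headline", abstract="Short summary."),
    Article(
        title="Second headline",
        abstract="Another summary.",
        image_url="https://example.com/preview.png",
    ),
]
```

Using `@dataclass` gives `Article` a generated `__init__`, `__repr__`, and `__eq__`, which keeps the class short and makes parsed results easy to compare in tests.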
- Add `SQLAlchemy` as a dependency to your project: https://docs.sqlalchemy.org/en/14/intro.html#installation
- Use `sqlite` as the SQL database: https://realpython.com/python-sqlite-sqlalchemy/
- Create an `Article` model with `sqlalchemy`. Read the docs for more details.
- Create an `ArticleDatabase` class in the `database` module that has to:
  - Ensure in the constructor that the appropriate database and table exist, creating them if they are not present.
  - Implement a `save` method that accepts a list of `Article` objects and saves them in the database.
  - Implement a `get` method that returns a list of the `Article` objects contained in the database.
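The database step could be sketched as follows, assuming SQLAlchemy 1.4 or newer. The table name, column names, default database URL, and the `url` constructor parameter are all assumptions made for illustration; the task only fixes the class name and the `save`/`get` methods.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Article(Base):
    __tablename__ = "articles"  # assumed table name

    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    abstract = Column(String, nullable=False)
    image_url = Column(String, nullable=True)  # optional preview image


class ArticleDatabase:
    def __init__(self, url="sqlite:///articles.db"):  # assumed default location
        self.engine = create_engine(url)
        # create_all() only creates tables that do not exist yet.
        Base.metadata.create_all(self.engine)

    def save(self, articles):
        # expire_on_commit=False keeps the saved objects readable after the session closes.
        with Session(self.engine, expire_on_commit=False) as session:
            session.add_all(articles)
            session.commit()

    def get(self):
        with Session(self.engine) as session:
            return session.query(Article).all()
```

In this sketch the model is also called `Article`, so it would need to live in a different module from the parser's dataclass, or be converted to and from it inside `save` and `get`.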
- Dockerfile for the project.
- Formatters:
  - `isort`
  - `black`
  - `pre-commit` that runs `isort` and `black` automatically: https://pre-commit.com/hooks