This repo is a Python port of the R project A Text-Based Network, by Majeed Simaan. The original R project is published on RPubs: https://rpubs.com/simaan84/410145
This program is recommended to run with
- Python version 3.X.
- BeautifulSoup4
- nltk
- pyvis
- numpy
- textdistance
- pandas
Install by downloading the files from the GitHub page, or by cloning the repository with git:
git clone https://github.com/songyesog2000/A-Text-Based-Network.git
The profiles of listed companies are collected from Yahoo Finance via web mining. The Python program for that is run as
python profiles_list.py --tickers [string of ticker symbols separated by ',']
If --tickers is not provided, the program defaults to collecting the profiles of 'JPM', 'BAC', 'GOOG', 'AAPL', 'MMM', 'AAC', 'T', 'VZ', 'XOM', 'CVX', 'KO', 'BUD'.
The collected profiles are stored in the JSON file profiles_list.json.
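
The --tickers handling described above can be sketched as follows. This is a minimal illustration and not the actual code in profiles_list.py: the helper names parse_tickers and get_tickers are hypothetical, and only the flag name and the default ticker list come from this README.

```python
import argparse

# Default tickers as listed in this README.
DEFAULT_TICKERS = ["JPM", "BAC", "GOOG", "AAPL", "MMM", "AAC",
                   "T", "VZ", "XOM", "CVX", "KO", "BUD"]


def parse_tickers(raw):
    """Split a comma-separated string into a list of ticker symbols."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def get_tickers(argv=None):
    """Return the tickers given via --tickers, or the defaults if omitted.

    Hypothetical helper: the real script may parse its arguments differently.
    """
    parser = argparse.ArgumentParser(description="Collect company profiles.")
    parser.add_argument("--tickers", default=None,
                        help="ticker symbols separated by ','")
    args = parser.parse_args(argv)
    if args.tickers:
        return parse_tickers(args.tickers)
    return list(DEFAULT_TICKERS)


print(get_tickers(["--tickers", "JPM,BAC"]))
print(get_tickers([]))
```

Passing an explicit argument list, as in the two calls at the end, makes the function easy to try without touching the real command line.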
### Text-based Network
The distance between two companies is defined as the Jaro-Winkler distance between their profiles.
The distances are then converted to truncated similarity values (truncated at 0.25 in the example).
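
To make the two steps above concrete, here is a small pure-Python sketch. It is illustrative only: the repository lists the textdistance package as a dependency, while this sketch reimplements the standard Jaro-Winkler formula by hand so that it runs without third-party packages. It also assumes that similarity is 1 minus distance and that "truncated" means similarities below the threshold are set to zero; the function names are hypothetical.

```python
def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted by a common prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def truncated_similarity(texts, threshold=0.25):
    """Pairwise similarity matrix; values below the threshold become 0.

    Assumption: similarity = 1 - distance, and the diagonal is left at 0.
    """
    n = len(texts)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            distance = 1.0 - jaro_winkler(texts[i], texts[j])
            s = 1.0 - distance
            sim[i][j] = sim[j][i] = s if s >= threshold else 0.0
    return sim


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```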
Based on the similarities, a network graph is built and stored in G.html. Opening this HTML file in a browser shows the visualization.
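
One way a similarity matrix could be turned into such a graph with pyvis is sketched below. This is an assumption about the approach, not the repository's actual code: the function names and the sample values are hypothetical, and every nonzero similarity is assumed to become a weighted edge.

```python
def similarity_edges(names, sim):
    """List (source, target, weight) for every pair with nonzero similarity."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sim[i][j] > 0:
                edges.append((names[i], names[j], sim[i][j]))
    return edges


def write_graph(names, edges, path="G.html"):
    """Write an interactive HTML graph using pyvis (third-party package)."""
    from pyvis.network import Network  # imported lazily; requires pyvis

    net = Network()
    for name in names:
        net.add_node(name, label=name)
    for source, target, weight in edges:
        # 'value' scales the drawn edge width with the similarity.
        net.add_edge(source, target, value=weight)
    net.save_graph(path)


# Hypothetical similarity values, for illustration only.
names = ["JPM", "BAC", "XOM"]
sim = [[0.0, 0.8, 0.0],
       [0.8, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
edges = similarity_edges(names, sim)
print(edges)
```

With pyvis installed, calling write_graph(names, edges) would write G.html to the current directory.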
The whole process above is run with
python Text-based\ Network.py