For this project, I ran sentiment analysis on news about the recently concluded Olympic Games Paris 2024. I used the Bing Search v7 API to pull the news from the Microsoft Bing search engine and determine the sentiment of each article. I picked the Bing API because working with APIs will be part of the data engineering process at some point in your career. For this project, I carried out the following:
- Create a Bing Search Resource in Azure.
- Data Ingestion.
- Data Transformation with Incremental Loading.
- Sentiment Analysis with Incremental Loading.
- Data Visualization and Reporting in Power BI.
- Set up Alerts with Data Activator with notifications on Teams.
Datasource: Olympic news data were loaded from Bing Search Resource News Search APIs v7 (https://api.bing.microsoft.com/v7.0/news/search) available in Azure.
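A minimal sketch of how a request to this endpoint can be put together. The endpoint comes from the line above and `Ocp-Apim-Subscription-Key` is the header Bing Search expects, but the query text, `count`, `freshness`, and `mkt` values are assumptions for illustration, not the settings used in this project. The function `build_news_request` is a hypothetical helper.

```python
from urllib.parse import urlencode

NEWS_ENDPOINT = "https://api.bing.microsoft.com/v7.0/news/search"


def build_news_request(query, subscription_key, count=100, freshness="Week", market="en-US"):
    """Return (url, headers) for a Bing News Search v7 GET request.

    The default parameter values are illustrative assumptions.
    """
    params = {"q": query, "count": count, "freshness": freshness, "mkt": market}
    url = f"{NEWS_ENDPOINT}?{urlencode(params)}"
    # The key is the one shown under "Keys and Endpoint" on the Azure resource.
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return url, headers


url, headers = build_news_request("Olympic Games Paris 2024", "<your-subscription-key>")
print(url)
```

In Fabric the same URL and header go into the REST API connection of the pipeline, so no request code is needed there; the sketch only shows what the pipeline sends.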
- Data Ingestion
- Data Pipeline: Data factory, incremental load, schedule refresh
- Fabric Data Storage: JSON, Lakehouse
- Data Science: Sentiment Analysis, SynapseML model
- Data Warehousing: Analytics Reporting
- Extract, Transform and Load (ETL) process
- Power BI Data Visualization technical skills (Documentation, Data Gathering, Power Query, Data Modelling, Report Design, Data Analysis Expression (DAX), Business and Analytics Reporting, Performance Optimization, Deployment and Power BI Service, Scalability)
- Continuous Improvement
I created a resource group called AzureXfabric_rg and searched the Azure Marketplace for the Bing Search v7 API to create a Bing Search resource. I picked the F1 pricing tier (3 calls per second and 1k calls per month) because it is free.
As our data lake, I created a Microsoft Fabric Lakehouse called bing_olympic_news_db, connected the Bing Search resource API to it, and configured the Source and Destination.
- Configuring the Data Source. To reach the source data, I use the REST API connection available in Fabric to connect to the Bing Search resource API in Azure.
- Setting the Data Destination. I set the destination to the bing_olympic_news_db Lakehouse I had created earlier. I also set the file path to olympic-news.json and the file format to JSON.
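For the data transformation, I use a Spark job within Microsoft Fabric, and I implement a Type 1 SCD (slowly changing dimension, where a changed record overwrites the old one and no history is kept) to load our data into the Lakehouse incrementally.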
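The transformation and Type 1 load can be sketched in plain Python as follows. This is not the project's notebook code: the field names `value`, `name`, `url`, `description`, `datePublished`, and `provider` follow the Bing News Search response shape, while the choice of `url` as the business key, the helper names, and the sample article are assumptions made for illustration.

```python
def flatten_articles(response):
    """Turn a Bing News Search response dict into flat rows, one per article."""
    rows = []
    for item in response.get("value", []):
        providers = item.get("provider") or [{}]
        rows.append({
            "url": item.get("url"),
            "title": item.get("name"),
            "description": item.get("description"),
            "provider": providers[0].get("name"),
            "datePublished": item.get("datePublished"),
        })
    return rows


def upsert_type1(table, incoming, key="url"):
    """Type 1 SCD: insert new rows, overwrite changed rows, keep no history.

    `table` is a dict keyed by the business key (assumed here to be the URL).
    Returns (inserted, updated) counts.
    """
    inserted = updated = 0
    for row in incoming:
        existing = table.get(row[key])
        if existing is None:
            table[row[key]] = dict(row)
            inserted += 1
        elif existing != row:
            table[row[key]] = dict(row)  # overwrite in place, old values are lost
            updated += 1
    return inserted, updated


# Made-up sample response, only to show the flow.
sample = {"value": [{"name": "Opening ceremony recap", "url": "https://example.com/a",
                     "description": "A look back at the ceremony.",
                     "datePublished": "2024-07-27T08:00:00Z",
                     "provider": [{"name": "Example News"}]}]}
table = {}
print(upsert_type1(table, flatten_articles(sample)))  # first load inserts the row
print(upsert_type1(table, flatten_articles(sample)))  # reloading the same data changes nothing
```

In a Spark notebook the same idea is usually expressed as a Delta Lake merge on the key column, with matched rows updated and unmatched rows inserted.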
For the Sentiment Analysis, I use SynapseML (formerly MMLSpark), an open-source library that is available within Microsoft Fabric. It provides a pre-built intelligent model for our machine learning task, which predicts the sentiment of each Olympic news article from the description column, which holds a detailed description of the article. View the Sentiment Analysis code here
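The incremental part of the sentiment step can be sketched like this: only articles that have not been scored before are sent to the model. The scorer below, `toy_score`, is a keyword-based stand-in and not the SynapseML model; it exists only so the sketch runs on its own. The helper names, the keyword lists, and the use of `url` as the key are all assumptions.

```python
POSITIVE_WORDS = {"wins", "gold", "record"}
NEGATIVE_WORDS = {"injury", "ban", "loses"}


def toy_score(text):
    """Stand-in scorer (NOT SynapseML): label text by simple keyword matching."""
    words = set(text.lower().split())
    has_pos = bool(words & POSITIVE_WORDS)
    has_neg = bool(words & NEGATIVE_WORDS)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"


def score_new_rows(rows, already_scored, score):
    """Score only rows whose url is not in `already_scored` (incremental load)."""
    results = {}
    for row in rows:
        if row["url"] in already_scored:
            continue  # scored in an earlier run, skip it
        results[row["url"]] = score(row["description"])
    return results


rows = [
    {"url": "u1", "description": "Team wins gold"},
    {"url": "u2", "description": "Star out with injury"},
    {"url": "u3", "description": "Schedule announced"},
]
# u1 was scored in a previous run, so only u2 and u3 are scored now.
print(score_new_rows(rows, {"u1"}, toy_score))
```

Passing the scorer in as a function keeps the incremental logic separate from the model, so the stand-in can be swapped for the real model call without touching the rest.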
Since the news data is live, I scheduled a refresh every morning at 7 am in Data Factory. This refresh covers the data ingestion (pipeline), ETL_process_olympic_news (notebook), and olympic_news_sentiment_analysis.ipynb (notebook).
The refresh schedule is shown here 👇
Below is the data visualization and report of the Olympic news for the 7 days before I performed this analysis. Over 100 news articles were published, and only 17% of them had positive sentiment. Download the Sentiment Analysis Dashboard in PDF here
I created an alert called Positive Alert Item so that I receive a Teams message when the share of positive sentiment rises above 17%.
For the visualization, I measure the following KPIs:
- Total published news (positive, negative, neutral, and mixed)
- Percentage of each sentiment (positive, negative, neutral, and mixed)
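As a rough illustration of these KPIs and of the alert rule described earlier, the counts and percentages can be computed as below. In the project itself these are Power BI measures and a Data Activator rule; the function names here are hypothetical and the label list in the example is made up, not the real data.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral", "mixed")


def sentiment_kpis(labels):
    """Return (per-label count and percentage, total number of articles)."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    kpis = {}
    for label in LABELS:
        percent = round(100 * counts[label] / total, 1) if total else 0.0
        kpis[label] = {"count": counts[label], "percent": percent}
    return kpis, total


def should_alert(kpis, threshold_percent=17.0):
    """Mirror the alert rule: fire only when positive share is above the threshold."""
    return kpis["positive"]["percent"] > threshold_percent


# Made-up labels for illustration only.
labels = ["positive"] * 2 + ["negative"] * 3 + ["neutral"] * 4 + ["mixed"]
kpis, total = sentiment_kpis(labels)
print(total, kpis["positive"], should_alert(kpis))
```

The comparison is strict, so a positive share of exactly 17% does not trigger a message.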
This was an exciting opportunity to demonstrate how Azure data services can be replicated in Microsoft Fabric (an all-in-one analytics solution), where they are known as items. These items include Data Factory, Lakehouse, Pipeline, Power BI, Intelligence and Machine Learning, and many more.