Commit a1bc609

## [v9.0]() (2023-03-11)
**Added**
- added use of the typing module. All parameters in the method calls use typing support to make it easier to understand what type is expected.
- added autosuggest methods `suggestEventTypes`, `suggestIndustries`, `getSdgUris`, `getSasbUris` - all to be used only when querying mentions

**Updated**
- `QueryArticles` class. Added filters `authorsFilter`, `videosFilter`, `linksFilter`
- `QueryMentions` class. Added several filters: `industryUri`, `sdgUri`, `sasbUri`, `esgUri`, `minSentenceIndex`, `maxSentenceIndex`, `showDuplicates`
- updated several code example files
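As a rough illustration of what the typing support described in the commit message buys a caller, the sketch below annotates a small helper in the same style as the new signatures. The helper `build_annotate_params` is hypothetical and is not part of the `eventregistry` package; it only mirrors the parameter names of `Analytics.annotate`, and how it merges `customParams` is an assumption made for the example.

```python
from typing import Dict, Optional, get_type_hints


def build_annotate_params(text: str,
                          lang: Optional[str] = None,
                          customParams: Optional[dict] = None) -> Dict[str, object]:
    """Hypothetical helper (not library code): build a request payload from typed arguments."""
    params: Dict[str, object] = {"text": text}
    # only include the language when the caller actually provided one
    if lang is not None:
        params["lang"] = lang
    # assumption for this sketch: extra parameters are merged into the payload
    if customParams:
        params.update(customParams)
    return params


# the annotations are available at runtime, so editors and type checkers can report the expected types
hints = get_type_hints(build_annotate_params)
print(hints["text"])
print(build_annotate_params("Apple unveils a new phone", lang="eng"))
```

Writing `Optional[str]` rather than `str = None` is a choice made for this sketch; it states explicitly that `None` is an accepted value.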
1 parent 8949341 commit a1bc609

31 files changed: +1198 -754 lines changed

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,18 @@
11
# Change Log
22

3+
## [v9.0]() (2023-05-15)
4+
5+
**Added**
6+
- added use of the typing module. All parameters in the method calls use typing support to make it easier to understand what type is expected.
7+
- added autosuggest methods `suggestEventTypes`, `suggestIndustries`, `getSdgUris`, `getSasbUris` - all to be used only when querying mentions
8+
-
9+
10+
**Updated**
11+
- `QueryArticles` class. Added filters `authorsFilter`, `videosFilter`, `linksFilter`
12+
- `QueryMentions` class. Added several filters: `industryUri`, `sdgUri`, `sasbUri`, `esgUri`, `minSentenceIndex`, `maxSentenceIndex`, `showDuplicates`
13+
- updated several code example files
14+
15+
316
## [v8.12]() (2022-03-11)
417

518
**Updated**

README.md

Lines changed: 85 additions & 34 deletions
@@ -1,18 +1,12 @@
1-
## Accessing Event Registry's News API through Python
2-
3-
This library contains classes and methods that allow one to obtain from Event Registry (http://eventregistry.org) all available data, such as news articles, events, trends, etc.
4-
5-
The detailed documentation on how to use the library is available at the [project's wiki page](https://github.com/EventRegistry/event-registry-python/wiki). Examples of use are in the [Examples folder in the repository](https://github.com/EventRegistry/event-registry-python/tree/master/eventregistry/examples).
6-
7-
Changes introduced in the different versions of the module are described in the [CHANGELOG.md](https://github.com/EventRegistry/event-registry-python/blob/master/CHANGELOG.md) as well as on the [Releases](https://github.com/EventRegistry/event-registry-python/releases) page.
1+
Event Registry is a Python package that can be used to easily access the news data available in [Event Registry](http://eventregistry.org/) through the API. The package can be used to query for articles or events by filtering using a large set of filters, like keywords, concepts, topics, sources, sentiment, date, etc. Details about the News API are available on the [landing page of the product](https://newsapi.ai/).
82

93
## Installation
104

115
Event Registry package can be installed using Python's pip installer. In the command line, simply type:
126

137
pip install eventregistry
148

15-
and the package should be installed. Alternatively, you can also clone the package from the GitHub repository at https://github.com/EventRegistry/event-registry-python. After cloning it, open the command line and run:
9+
and the package should be installed. Alternatively, you can also clone the package from the [GitHub repository](https://github.com/EventRegistry/event-registry-python). After cloning it, open the command line and run:
1610

1711
python setup.py install
1812

@@ -24,7 +18,7 @@ To ensure the package has been properly installed run python and type:
2418
import eventregistry
2519
```
2620

27-
If you don't get any error messages then your installation has been successful.
21+
If you don't get any error messages, then your installation has been successful.
2822

2923
### Updating the package
3024

@@ -34,51 +28,112 @@ As features are added to the package you will need at some point to update it. I
3428

3529
### Authentication and API key
3630

37-
When making queries to Event Registry you will have to use an API key that you can obtain for free. The details how to obtain and use the key are described in the [Authorization](../../wiki/EventRegistry-class#authorization) section.
31+
When making queries to Event Registry you will have to use an API key that you can obtain for free. The details on how to obtain and use the key are described in the [Authorization](../../wiki/EventRegistry-class#authorization) section.
3832

39-
## Three simple examples to make you interested
33+
## Four simple examples to get you interested
4034

41-
**Find news articles that mention Tesla in the article title**
35+
**Print a list of recent articles or blog posts from *US based sources* *with positive sentiment* mentioning phrases *"George Clooney"* or *"Sandra Bullock"***
4236

4337
```python
4438
from eventregistry import *
4539
er = EventRegistry(apiKey = YOUR_API_KEY)
46-
# print at most 500 articles
47-
MAX_ITEMS = 500
48-
q = QueryArticlesIter(keywords = "tesla", keywordsLoc="title")
49-
for art in q.execQuery(er, sortBy = "date", maxItems = MAX_ITEMS):
40+
41+
# get the USA URI
42+
usUri = er.getLocationUri("USA") # = http://en.wikipedia.org/wiki/United_States
43+
44+
q = QueryArticlesIter(
45+
keywords = QueryItems.OR(["George Clooney", "Sandra Bullock"]),
46+
minSentiment = 0.4,
47+
sourceLocationUri = usUri,
48+
dataType = ["news", "blog"])
49+
50+
# obtain at most 500 newest articles or blog posts, remove maxItems to get all
51+
for art in q.execQuery(er, sortBy = "date", maxItems = 500):
5052
    print(art)
5153
```
5254

53-
**Print a list of recently added articles mentioning George Clooney**
55+
**Print a list of most relevant *business* articles from the last month related to *Microsoft* or *Google*. The articles should be in any language (including Chinese, Arabic, ...)**
5456

5557
```python
5658
from eventregistry import *
57-
er = EventRegistry(apiKey = YOUR_API_KEY)
58-
q = QueryArticlesIter(conceptUri = er.getConceptUri("George Clooney"))
59-
for art in q.execQuery(er, sortBy = "date"):
60-
print art
59+
# allowUseOfArchive=False will allow us to search only over the last month of data
60+
er = EventRegistry(apiKey = YOUR_API_KEY, allowUseOfArchive=False)
61+
62+
# get the URIs for the companies and the category
63+
microsoftUri = er.getConceptUri("Microsoft") # = http://en.wikipedia.org/wiki/Microsoft
64+
googleUri = er.getConceptUri("Google") # = http://en.wikipedia.org/wiki/Google
65+
businessUri = er.getCategoryUri("news business") # = news/Business
66+
67+
q = QueryArticlesIter(
68+
conceptUri = QueryItems.OR([microsoftUri, googleUri]),
69+
categoryUri = businessUri)
70+
71+
# obtain at most 500 newest articles, remove maxItems to get all
72+
for art in q.execQuery(er, sortBy = "date", maxItems = 500):
73+
    print(art)
6174
```
6275

76+
6377
**Search for latest events related to Star Wars**
6478

6579
```python
6680
from eventregistry import *
6781
er = EventRegistry(apiKey = YOUR_API_KEY)
68-
q = QueryEvents(conceptUri = er.getConceptUri("Star Wars"))
69-
q.setRequestedResult(RequestEventsInfo(sortBy = "date", count=10)) # return event details for last 10 events
70-
print er.execQuery(q)
82+
83+
q = QueryEvents(keywords = "Star Wars")
84+
q.setRequestedResult(RequestEventsInfo(sortBy = "date", count = 50)) # request event details for latest 50 events
85+
86+
# get the full list of 50 events at once
87+
print(er.execQuery(q))
7188
```
7289

73-
## Run a Jupyter notebook
90+
**Search for articles that (a) mention immigration, (b) are related to business, and (c) were published by news sources located in New York City**
7491

75-
We've also prepared an interactive Jupyter notebook where we demonstrate how you can use the SDK. You can run it online and modify the individual examples.
92+
```python
93+
from eventregistry import *
94+
er = EventRegistry(apiKey = YOUR_API_KEY)
7695

77-
**[Run Jupyter notebook with examples](https://mybinder.org/v2/gh/EventRegistry/event-registry-python-intro/master)**
96+
q = QueryArticlesIter(
97+
# here we don't use keywords so we will also get articles that mention immigration using various synonyms
98+
conceptUri = er.getConceptUri("immigration"),
99+
categoryUri = er.getCategoryUri("business"),
100+
sourceLocationUri = er.getLocationUri("New York City"))
78101

79-
## Where to next?
102+
# obtain the 500 articles that were shared the most on social media
103+
for art in q.execQuery(er, sortBy = "socialScore", maxItems = 500):
104+
    print(art)
105+
```
80106

81-
Depending on your interest and existing knowledge of the `eventregistry` package you can check different things:
107+
**What are the currently trending topics**
108+
109+
```python
110+
from eventregistry import *
111+
er = EventRegistry(apiKey = YOUR_API_KEY)
112+
113+
# top 10 trending concepts in the news
114+
q = GetTrendingConcepts(source = "news", count = 10)
115+
print(er.execQuery(q))
116+
```
117+
118+
## Learning from examples
119+
120+
We believe that it's easiest to learn how to use our service by looking at examples. For this reason, we have prepared examples of the most commonly used features. View the examples grouped by main search actions:
121+
122+
[View examples of searching for articles](https://github.com/EventRegistry/event-registry-python/blob/master/eventregistry/examples/QueryArticlesExamples.py)
123+
124+
[View examples of searching for events](https://github.com/EventRegistry/event-registry-python/blob/master/eventregistry/examples/QueryEventsExamples.py)
125+
126+
[View examples of obtaining information about an individual event](https://github.com/EventRegistry/event-registry-python/blob/master/eventregistry/examples/QueryEventExamples.py)
127+
128+
[Examples of how to obtain the full feed of articles](https://github.com/EventRegistry/event-registry-python/blob/master/eventregistry/examples/FeedOfNewArticlesExamples.py)
129+
130+
[Examples of how to obtain the full feed of events](https://github.com/EventRegistry/event-registry-python/blob/master/eventregistry/examples/FeedOfNewEventsExamples.py)
131+
132+
## Play with an interactive Jupyter notebook
133+
134+
To interactively learn about how to use the SDK, see examples of use, see how to get extra meta-data properties, and more, please open [this Binder](https://mybinder.org/v2/gh/EventRegistry/event-registry-python-intro/master). You'll be able to view and modify the examples.
135+
136+
## Where to next?
82137

83138
**[Terminology](../../wiki/Terminology)**. There are numerous terms in the Event Registry that you will constantly see. If you don't know what we mean by an *event*, *story*, *concept* or *category*, you should definitely check this page first.
84139

@@ -94,10 +149,6 @@ Depending on your interest and existing knowledge of the `eventregistry` package
94149

95150
**[Articles and events shared the most on social media](../../wiki/Social-shares)**. Do you want to get the list of articles that have been shared the most on Facebook and Twitter on a particular date? What about the most relevant event based on shares on social media?
96151

97-
**[Daily mentions and sentiment of concepts and categories](../../wiki/Number-of-mentions-in-news-or-social-media)**. Are you interested in knowing how often was a particular concept or category mentioned in the news in the previous two years? How about the sentiment expressed on social media about your favorite politician?
98-
99-
**[Correlations of concepts](../../wiki/Correlations)**. Do you have some time series of daily measurements? Why not find the concepts that correlate the most with it based on the number of mentions in the news.
100-
101152
## Data access and usage restrictions
102153

103-
Event Registry is a commercial service but it allows also unsubscribed users to perform a certain number of operations. Free users are not allowed to use the obtained data for any commercial purposes (see the details on our [Terms of Service page](https://newsapi.ai/terms)). In order to avoid these restrictions please contact us about the [available plans](https://newsapi.ai/plans).
154+
Event Registry is a commercial service, but it also allows unsubscribed users to perform a certain number of operations. Non-paying users are not allowed to use the obtained data for any commercial purposes (see the details on our [Terms of Service page](http://newsapi.ai/terms)) and have access to only the last 30 days of content. In order to avoid these restrictions please contact us about the [available plans](http://newsapi.ai/plans).

eventregistry/Analytics.py

Lines changed: 22 additions & 17 deletions
@@ -11,18 +11,20 @@
1111
"""
1212

1313
import json
14+
from typing import Union, List
15+
from eventregistry.EventRegistry import EventRegistry
1416
from eventregistry.Base import *
1517
from eventregistry.ReturnInfo import *
1618

1719
class Analytics:
18-
def __init__(self, eventRegistry):
20+
def __init__(self, eventRegistry: EventRegistry):
1921
"""
2022
@param eventRegistry: instance of EventRegistry class
2123
"""
2224
self._er = eventRegistry
2325

2426

25-
def annotate(self, text, lang = None, customParams = None):
27+
def annotate(self, text: str, lang: str = None, customParams: dict = None):
2628
"""
2729
identify the list of entities and nonentities mentioned in the text
2830
@param text: input text to annotate
@@ -36,18 +38,21 @@ def annotate(self, text, lang = None, customParams = None):
3638
return self._er.jsonRequestAnalytics("/api/v1/annotate", params)
3739

3840

39-
def categorize(self, text, taxonomy = "dmoz"):
41+
def categorize(self, text: str, taxonomy: str = "dmoz", concepts: List[str] = None):
4042
"""
4143
determine the set of up to 5 categories the text is about. Currently, only English text can be categorized!
4244
@param text: input text to categorize
4345
@param taxonomy: which taxonomy use for categorization. Options "dmoz" (over 5000 categories in 3 levels, English language only)
4446
or "news" (general news categorization, 9 categories, any language)
4547
@returns: dict
4648
"""
47-
return self._er.jsonRequestAnalytics("/api/v1/categorize", { "text": text, "taxonomy": taxonomy })
49+
params = { "text": text, "taxonomy": taxonomy }
50+
if isinstance(concepts, list) and len(concepts) > 0:
51+
params["concepts"] = concepts
52+
return self._er.jsonRequestAnalytics("/api/v1/categorize", params)
4853

4954

50-
def sentiment(self, text, method = "vocabulary", sentencesToAnalyze = 10, returnSentences = True):
55+
def sentiment(self, text: str, method: str = "vocabulary", sentencesToAnalyze: int = 10, returnSentences: bool = True):
5156
"""
5257
determine the sentiment of the provided text in English language
5358
@param text: input text to categorize
@@ -61,7 +66,7 @@ def sentiment(self, text, method = "vocabulary", sentencesToAnalyze = 10, return
6166
return self._er.jsonRequestAnalytics("/api/v1/sentiment", { "text": text, "method": method, "sentences": sentencesToAnalyze, "returnSentences": returnSentences })
6267

6368

64-
def semanticSimilarity(self, text1, text2, distanceMeasure = "cosine"):
69+
def semanticSimilarity(self, text1: str, text2: str, distanceMeasure: str = "cosine"):
6570
"""
6671
determine the semantic similarity of the two provided documents
6772
@param text1: first document to analyze
@@ -72,7 +77,7 @@ def semanticSimilarity(self, text1, text2, distanceMeasure = "cosine"):
7277
return self._er.jsonRequestAnalytics("/api/v1/semanticSimilarity", { "text1": text1, "text2": text2, "distanceMeasure": distanceMeasure })
7378

7479

75-
def detectLanguage(self, text):
80+
def detectLanguage(self, text: str):
7681
"""
7782
determine the language of the given text
7883
@param text: input text to analyze
@@ -81,7 +86,7 @@ def detectLanguage(self, text):
8186
return self._er.jsonRequestAnalytics("/api/v1/detectLanguage", { "text": text })
8287

8388

84-
def extractArticleInfo(self, url, proxyUrl = None, headers = None, cookies = None):
89+
def extractArticleInfo(self, url: str, proxyUrl: str = None, headers: Union[str, dict] = None, cookies: Union[dict, str] = None):
8590
"""
8691
extract all available information about an article available at url `url`. Returned information will include
8792
article title, body, authors, links in the articles, ...
@@ -105,7 +110,7 @@ def extractArticleInfo(self, url, proxyUrl = None, headers = None, cookies = Non
105110
return self._er.jsonRequestAnalytics("/api/v1/extractArticleInfo", params)
106111

107112

108-
def ner(self, text):
113+
def ner(self, text: str):
109114
"""
110115
extract named entities from the provided text. Supported languages are English, German, Spanish and Chinese.
111116
@param text: text on which to extract named entities
@@ -114,9 +119,9 @@ def ner(self, text):
114119
return self._er.jsonRequestAnalytics("/api/v1/ner", {"text": text})
115120

116121

117-
def trainTopicOnTweets(self, twitterQuery, useTweetText=True, useIdfNormalization=True,
118-
normalization="linear", maxTweets=2000, maxUsedLinks=500, ignoreConceptTypes=[],
119-
maxConcepts = 20, maxCategories = 10, notifyEmailAddress = None):
122+
def trainTopicOnTweets(self, twitterQuery: str, useTweetText: bool = True, useIdfNormalization: bool = True,
123+
normalization: str = "linear", maxTweets: int = 2000, maxUsedLinks: int = 500, ignoreConceptTypes: Union[str, List[str]] = [],
124+
maxConcepts: int = 20, maxCategories: int = 10, notifyEmailAddress: str = None):
120125
"""
121126
create a new topic and train it using the tweets that match the twitterQuery
122127
@param twitterQuery: string containing the content to search for. It can be a Twitter user account (using "@" prefix or user's Twitter url),
@@ -145,23 +150,23 @@ def trainTopicOnTweets(self, twitterQuery, useTweetText=True, useIdfNormalizatio
145150
return self._er.jsonRequestAnalytics("/api/v1/trainTopicOnTwitter", params)
146151

147152

148-
def trainTopicCreateTopic(self, name):
153+
def trainTopicCreateTopic(self, name: str):
149154
"""
150155
create a new topic to train. The user should remember the "uri" parameter returned in the result
151156
@returns object containing the "uri" property that should be used in the follow-up call to trainTopic* methods
152157
"""
153158
return self._er.jsonRequestAnalytics("/api/v1/trainTopic", { "action": "createTopic", "name": name})
154159

155160

156-
def trainTopicClearTopic(self, uri):
161+
def trainTopicClearTopic(self, uri: str):
157162
"""
158163
if the topic already exists, clear the definition of the topic. Use this if you want to retrain an existing topic
159164
@param uri: uri of the topic (obtained by calling trainTopicCreateTopic method) to clear
160165
"""
161166
return self._er.jsonRequestAnalytics("/api/v1/trainTopic", { "action": "clearTopic", "uri": uri })
162167

163168

164-
def trainTopicAddDocument(self, uri, text):
169+
def trainTopicAddDocument(self, uri: str, text: str):
165170
"""
166171
add the information extracted from the provided "text" to the topic with uri "uri"
167172
@param uri: uri of the topic (obtained by calling trainTopicCreateTopic method)
@@ -170,8 +175,8 @@ def trainTopicAddDocument(self, uri, text):
170175
return self._er.jsonRequestAnalytics("/api/v1/trainTopic", { "action": "addDocument", "uri": uri, "text": text})
171176

172177

173-
def trainTopicGetTrainedTopic(self, uri, maxConcepts = 20, maxCategories = 10,
174-
ignoreConceptTypes=[], idfNormalization = True):
178+
def trainTopicGetTrainedTopic(self, uri: str, maxConcepts: int = 20, maxCategories: int = 10,
179+
ignoreConceptTypes: Union[str, List[str]] = [], idfNormalization: bool = True):
175180
"""
176181
retrieve the topic for which you have already finished training
177182
@param uri: uri of the topic (obtained by calling trainTopicCreateTopic method)
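
The one behavioral change in this file is in `categorize`: the new `concepts` argument is forwarded to the service only when it is a non-empty list. The sketch below reproduces that logic outside the library so it can be run without an API key. `RecordingClient` is a hypothetical stand-in for `EventRegistry` that records requests instead of sending them, and the free function `categorize` is a copy of the method body shown in the diff, not the library's own entry point.

```python
from typing import List, Optional


class RecordingClient:
    """Hypothetical stand-in for EventRegistry: records requests instead of sending them."""

    def __init__(self):
        self.calls = []

    def jsonRequestAnalytics(self, path: str, params: dict) -> dict:
        self.calls.append((path, params))
        return {"path": path, "params": params}


def categorize(client, text: str, taxonomy: str = "dmoz",
               concepts: Optional[List[str]] = None) -> dict:
    """Mirror of the updated method body: 'concepts' is optional and sent only when non-empty."""
    params = {"text": text, "taxonomy": taxonomy}
    if isinstance(concepts, list) and len(concepts) > 0:
        params["concepts"] = concepts
    return client.jsonRequestAnalytics("/api/v1/categorize", params)


client = RecordingClient()
# no concepts given: the payload contains only text and taxonomy
print(categorize(client, "Microsoft reports quarterly earnings"))
# an empty list is treated the same as no concepts
print(categorize(client, "Microsoft reports quarterly earnings", concepts=[]))
# a non-empty list is forwarded unchanged
print(categorize(client, "Microsoft reports quarterly earnings",
                 concepts=["http://en.wikipedia.org/wiki/Microsoft"]))
```

Because of the `isinstance(concepts, list)` check, a single URI passed as a plain string would be silently dropped in this sketch, so callers would need to wrap it in a list.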
