Commit e09a66c

## [v8.4]() (2018-08-24)

**Added**

- added `EventRegistry.getUsageInfo()` method, which returns the number of used tokens and the total number of available tokens for the given user. The existing methods `EventRegistry.getRemainingAvailableRequests()` and `EventRegistry.getDailyAvailableRequests()` are still there, but their values are only valid after making at least one request.
- added searching of articles and events based on article authors. You can now provide the `authorUri` parameter when creating the `QueryArticles` and `QueryEvents` instances.
- added author-related methods to the `EventRegistry` class: `EventRegistry.suggestAuthors()` to obtain URIs of authors for a given (partial) name and `EventRegistry.getAuthorUri()` to obtain a single author URI for the given (partial) name.
- added ability to search articles and events by authors. `QueryArticles` and `QueryEvents` constructors now also accept the `authorUri` parameter, which can be used to limit the results to articles/events by those authors. Use `QueryOper.AND()` or `QueryOper.OR()` to specify multiple authors in the same query.
- BETA: added a filter for returning only articles that are written by sources that have a certain ranking. The filter can be specified by setting the parameters `startSourceRankPercentile` and `endSourceRankPercentile` when creating the `QueryArticles` instance. The default value for `startSourceRankPercentile` is 0 and for `endSourceRankPercentile` is 100. Not every value between 0 and 100 is accepted: each value has to be a number divisible by 10. By setting `startSourceRankPercentile` to 0 and `endSourceRankPercentile` to 20 you would get only articles from top ranked news sources (according to [Alexa site ranking](https://www.alexa.com/siteinfo)) that together account for *approximately 20%* of all matching content. Note: 20 percentiles do not represent 20% of all top sources. The value is used to identify the subset of news sources that generate approximately 20% of our collected news content. The reason for this choice is that the top ranked 10% of news sources write about 30% of all news content and our choice normalizes this effect. This feature could potentially change in the future.
- `QueryEventArticlesIter` is now able to return only a subset of articles assigned to an event. You can use the same filters as with the `QueryArticles` constructor and you can specify them when constructing the instance of `QueryEventArticlesIter`. The same kind of filtering is also possible if you want to use the `RequestEventArticles()` class instead.
- added some parameters and changed default values in some of the result types to reflect the backend changes.
- added optional parameter `proxyUrl` to `Analytics.extractArticleInfo()`. It can be used to download article info through a proxy that you provide (to avoid potential GDPR issues). The `proxyUrl` should be in format `{schema}://{username}:{pass}@{proxy url/ip}`.

1 parent 30e3bad commit e09a66c

12 files changed

+486
-105
lines changed

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -1,5 +1,17 @@
11
# Change Log
22

3+
## [v8.4]() (2018-08-24)
4+
5+
**Added**
6+
- added `EventRegistry.getUsageInfo()` method, which returns the number of used tokens and the total number of available tokens for the given user. The existing methods `EventRegistry.getRemainingAvailableRequests()` and `EventRegistry.getDailyAvailableRequests()` are still there, but their values are only valid after making at least one request.
7+
- added searching of articles and events based on article authors. You can now provide `authorUri` parameter when creating the `QueryArticles` and `QueryEvents` instances.
8+
- added author related methods to `EventRegistry` class: `EventRegistry.suggestAuthors()` to obtain uris of authors for given (partial) name and `EventRegistry.getAuthorUri()` to obtain a single author uri for the given (partial) name.
9+
- added ability to search articles and events by authors. `QueryArticles` and `QueryEvents` constructors now also accept `authorUri` parameter that can be used to limit the results to articles/events by those authors. Use `QueryOper.AND()` or `QueryOper.OR()` to specify multiple authors in the same query.
10+
- BETA: added a filter for returning only articles that are written by sources that have a certain ranking. The filter can be specified by setting the parameters `startSourceRankPercentile` and `endSourceRankPercentile` when creating the `QueryArticles` instance. The default value for `startSourceRankPercentile` is 0 and for `endSourceRankPercentile` is 100. Not every value between 0 and 100 is accepted: each value has to be a number divisible by 10. By setting `startSourceRankPercentile` to 0 and `endSourceRankPercentile` to 20 you would get only articles from top ranked news sources (according to [Alexa site ranking](https://www.alexa.com/siteinfo)) that together account for *approximately 20%* of all matching content. Note: 20 percentiles do not represent 20% of all top sources. The value is used to identify the subset of news sources that generate approximately 20% of our collected news content. The reason for this choice is that the top ranked 10% of news sources write about 30% of all news content and our choice normalizes this effect. This feature could potentially change in the future.
11+
- `QueryEventArticlesIter` is now able to return only a subset of articles assigned to an event. You can use the same filters as with the `QueryArticles` constructor and you can specify them when constructing the instance of `QueryEventArticlesIter`. The same kind of filtering is also possible if you want to use the `RequestEventArticles()` class instead.
12+
- added some parameters and changed default values in some of the result types to reflect the backend changes.
13+
- added optional parameter `proxyUrl` to `Analytics.extractArticleInfo()`. It can be used to download article info through a proxy that you provide (to avoid potential GDPR issues). The `proxyUrl` should be in format `{schema}://{username}:{pass}@{proxy url/ip}`.
14+
315
## [v8.3.1]() (2018-08-12)
416

517
**Updated**
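
The BETA source-rank filter in the v8.4 entry accepts only multiples of 10 between 0 and 100. A minimal sketch of a client-side check follows. The helper `check_source_rank_percentiles` is hypothetical and not part of the library, and the requirement that the start value lie below the end value is an assumption that the changelog does not state.

```python
def check_source_rank_percentiles(start=0, end=100):
    """Hypothetical helper (not in the library): validate the two percentile
    bounds before passing them to a QueryArticles constructor."""
    bounds = (("startSourceRankPercentile", start),
              ("endSourceRankPercentile", end))
    for name, value in bounds:
        # The changelog allows only values in 0..100 that are divisible by 10.
        if not isinstance(value, int) or not 0 <= value <= 100 or value % 10 != 0:
            raise ValueError("%s must be a multiple of 10 between 0 and 100, got %r"
                             % (name, value))
    # Assumption: an empty or inverted range is treated as a caller error.
    if start >= end:
        raise ValueError("startSourceRankPercentile must be below endSourceRankPercentile")
    return {"startSourceRankPercentile": start, "endSourceRankPercentile": end}
```

The returned dict could be unpacked as keyword arguments, for example `QueryArticles(keywords="...", **check_source_rank_percentiles(0, 20))`, assuming the constructor accepts those two names as the changelog describes.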

eventregistry/Analytics.py

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -75,13 +75,18 @@ def detectLanguage(self, text):
7575
return self._er.jsonRequestAnalytics("/api/v1/detectLanguage", { "text": text })
7676

7777

78-
def extractArticleInfo(self, url):
78+
def extractArticleInfo(self, url, proxyUrl = None):
7979
"""
8080
extract all available information about an article available at url `url`. Returned information will include
8181
article title, body, authors, links in the articles, ...
82+
@param url: article url to extract article information from
83+
@param proxyUrl: proxy that should be used for downloading article information. format: {schema}://{username}:{pass}@{proxy url/ip}
8284
@returns: dict
8385
"""
84-
return self._er.jsonRequestAnalytics("/api/v1/extractArticleInfo", { "url": url })
86+
params = { "url": url }
87+
if proxyUrl:
88+
params["proxyUrl"] = proxyUrl
89+
return self._er.jsonRequestAnalytics("/api/v1/extractArticleInfo", params)
8590

8691

8792
def ner(self, text):
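
The `proxyUrl` handling added to `extractArticleInfo()` can be sketched outside the library as follows. Both function names here are hypothetical. Percent-encoding the credentials is an assumption about what a proxy URL with special characters would need; the commit itself only documents the `{schema}://{username}:{pass}@{proxy url/ip}` format.

```python
from urllib.parse import quote


def build_proxy_url(schema, username, password, host):
    """Hypothetical helper: assemble a proxy URL in the documented format.
    Credentials are percent-encoded (an assumption) so that characters such
    as '@' or ':' cannot be confused with the URL delimiters."""
    return "{}://{}:{}@{}".format(
        schema, quote(username, safe=""), quote(password, safe=""), host)


def build_extract_params(url, proxyUrl=None):
    """Mirror of the parameter logic in the diff: the proxyUrl key is only
    sent when a proxy was actually provided."""
    params = {"url": url}
    if proxyUrl:
        params["proxyUrl"] = proxyUrl
    return params
```

Leaving the key out entirely when no proxy is given keeps requests from older call sites identical to what they sent before this commit.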

eventregistry/Base.py

Lines changed: 24 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -196,27 +196,6 @@ def _getQueryParams(self):
196196
return dict(self.queryParams)
197197

198198

199-
200-
class Query(QueryParamsBase):
201-
def __init__(self):
202-
QueryParamsBase.__init__(self)
203-
self.resultTypeList = []
204-
205-
206-
def _getQueryParams(self):
207-
"""encode the request."""
208-
allParams = {}
209-
if len(self.resultTypeList) == 0:
210-
raise ValueError("The query does not have any result type specified. No sense in performing such a query")
211-
allParams.update(self.queryParams)
212-
for request in self.resultTypeList:
213-
allParams.update(request.__dict__)
214-
# all requests in resultTypeList have "resultType" so each call to .update() overrides the previous one
215-
# since we want to store them all we have to add them here:
216-
allParams["resultType"] = [request.__dict__["resultType"] for request in self.resultTypeList]
217-
return allParams
218-
219-
220199
def _setQueryArrVal(self, value, propName, propOperName, defaultOperName):
221200
"""
222201
parse the value "value" and use it to set the property propName and the operator with name propOperName
@@ -251,4 +230,27 @@ def _setQueryArrVal(self, value, propName, propOperName, defaultOperName):
251230

252231
# there should be no other valid types
253232
else:
254-
assert False, "Parameter '%s' was of unsupported type. It should either be None, a string or an instance of QueryItems" % (propName)
233+
assert False, "Parameter '%s' was of unsupported type. It should either be None, a string or an instance of QueryItems" % (propName)
234+
235+
236+
237+
class Query(QueryParamsBase):
238+
def __init__(self):
239+
QueryParamsBase.__init__(self)
240+
self.resultTypeList = []
241+
242+
243+
def _getQueryParams(self):
244+
"""encode the request."""
245+
allParams = {}
246+
if len(self.resultTypeList) == 0:
247+
raise ValueError("The query does not have any result type specified. No sense in performing such a query")
248+
allParams.update(self.queryParams)
249+
for request in self.resultTypeList:
250+
allParams.update(request.__dict__)
251+
# all requests in resultTypeList have "resultType" so each call to .update() overrides the previous one
252+
# since we want to store them all we have to add them here:
253+
allParams["resultType"] = [request.__dict__["resultType"] for request in self.resultTypeList]
254+
return allParams
255+
256+
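
The comment in `Query._getQueryParams()` about `.update()` overriding `resultType` can be illustrated with a standalone sketch. `FakeRequest` and `merge_params` are stand-ins invented for this example; only the merging steps are taken from the diff.

```python
class FakeRequest:
    """Stand-in for a result-type request object (hypothetical): it stores
    its parameters as instance attributes, like the real request classes."""
    def __init__(self, resultType, **params):
        self.resultType = resultType
        self.__dict__.update(params)


def merge_params(queryParams, resultTypeList):
    """Same steps as Query._getQueryParams() in the diff, as a free function."""
    if len(resultTypeList) == 0:
        raise ValueError("The query does not have any result type specified.")
    allParams = {}
    allParams.update(queryParams)
    for request in resultTypeList:
        # Each update overwrites the "resultType" key set by the previous one.
        allParams.update(request.__dict__)
    # Rebuild "resultType" as a list so that no requested type is lost.
    allParams["resultType"] = [r.__dict__["resultType"] for r in resultTypeList]
    return allParams


merged = merge_params(
    {"keyword": "example"},
    [FakeRequest("articles", articlesCount=10),
     FakeRequest("uriWgtList", uriWgtListCount=5)],
)
```

Without the final assignment, `merged["resultType"]` would hold only the value of the last request in the list.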

eventregistry/EventRegistry.py

Lines changed: 47 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -141,15 +141,20 @@ def printConsole(self, text):
141141

142142

143143
def getRemainingAvailableRequests(self):
144-
"""get the number of requests that are still available for the user today"""
144+
"""get the number of requests that are still available for the user today. Information is only accessible after you make some query."""
145145
return self._remainingAvailableRequests
146146

147147

148148
def getDailyAvailableRequests(self):
149-
"""get the total number of requests that the user can make in a day"""
149+
"""get the total number of requests that the user can make in a day. Information is only accessible after you make some query."""
150150
return self._dailyAvailableRequests
151151

152152

153+
def getUsageInfo(self):
154+
"""return the number of used and total available tokens. Can be used at any time (also before making queries)"""
155+
return self.jsonRequest("/api/v1/usage", { "apiKey": self._apiKey })
156+
157+
153158
def getUrl(self, query):
154159
"""
155160
return the url that can be used to get the content that matches the query
@@ -349,7 +354,7 @@ def suggestConcepts(self, prefix, sources = ["concepts"], lang = "eng", conceptL
349354
params = { "prefix": prefix, "source": sources, "lang": lang, "conceptLang": conceptLang, "page": page, "count": count}
350355
params.update(returnInfo.getParams())
351356
params.update(kwargs)
352-
return self.jsonRequest("/json/suggestConcepts", params)
357+
return self.jsonRequest("/json/suggestConceptsFast", params)
353358

354359

355360
def suggestCategories(self, prefix, page = 1, count = 20, returnInfo = ReturnInfo(), **kwargs):
@@ -364,7 +369,7 @@ def suggestCategories(self, prefix, page = 1, count = 20, returnInfo = ReturnInf
364369
params = { "prefix": prefix, "page": page, "count": count }
365370
params.update(returnInfo.getParams())
366371
params.update(kwargs)
367-
return self.jsonRequest("/json/suggestCategories", params)
372+
return self.jsonRequest("/json/suggestCategoriesFast", params)
368373

369374

370375
def suggestNewsSources(self, prefix, dataType = ["news", "pr", "blog"], page = 1, count = 20, **kwargs):
@@ -378,7 +383,7 @@ def suggestNewsSources(self, prefix, dataType = ["news", "pr", "blog"], page = 1
378383
assert page > 0, "page parameter should be above 0"
379384
params = {"prefix": prefix, "dataType": dataType, "page": page, "count": count}
380385
params.update(kwargs)
381-
return self.jsonRequest("/json/suggestSources", params)
386+
return self.jsonRequest("/json/suggestSourcesFast", params)
382387

383388

384389
def suggestSourceGroups(self, prefix, page = 1, count = 20, **kwargs):
@@ -413,7 +418,7 @@ def suggestLocations(self, prefix, sources = ["place", "country"], lang = "eng",
413418
assert len(sortByDistanceTo) == 2, "The sortByDistanceTo should contain two float numbers"
414419
params["closeToLat"] = sortByDistanceTo[0]
415420
params["closeToLon"] = sortByDistanceTo[1]
416-
return self.jsonRequest("/json/suggestLocations", params)
421+
return self.jsonRequest("/json/suggestLocationsFast", params)
417422

418423

419424
def suggestLocationsAtCoordinate(self, latitude, longitude, radiusKm, limitToCities = False, lang = "eng", count = 20, ignoreNonWiki = True, returnInfo = ReturnInfo(), **kwargs):
@@ -433,7 +438,7 @@ def suggestLocationsAtCoordinate(self, latitude, longitude, radiusKm, limitToCit
433438
params = { "action": "getLocationsAtCoordinate", "lat": latitude, "lon": longitude, "radius": radiusKm, "limitToCities": limitToCities, "count": count, "lang": lang }
434439
params.update(returnInfo.getParams())
435440
params.update(kwargs)
436-
return self.jsonRequest("/json/suggestLocations", params)
441+
return self.jsonRequest("/json/suggestLocationsFast", params)
437442

438443

439444
def suggestSourcesAtCoordinate(self, latitude, longitude, radiusKm, count = 20, **kwargs):
@@ -448,7 +453,7 @@ def suggestSourcesAtCoordinate(self, latitude, longitude, radiusKm, count = 20,
448453
assert isinstance(longitude, (int, float)), "The 'longitude' should be a number"
449454
params = {"action": "getSourcesAtCoordinate", "lat": latitude, "lon": longitude, "radius": radiusKm, "count": count}
450455
params.update(kwargs)
451-
return self.jsonRequest("/json/suggestSources", params)
456+
return self.jsonRequest("/json/suggestSourcesFast", params)
452457

453458

454459
def suggestSourcesAtPlace(self, conceptUri, dataType = "news", page = 1, count = 20, **kwargs):
@@ -461,7 +466,21 @@ def suggestSourcesAtPlace(self, conceptUri, dataType = "news", page = 1, count =
461466
"""
462467
params = {"action": "getSourcesAtPlace", "conceptUri": conceptUri, "page": page, "count": count, "dataType": dataType}
463468
params.update(kwargs)
464-
return self.jsonRequest("/json/suggestSources", params)
469+
return self.jsonRequest("/json/suggestSourcesFast", params)
470+
471+
472+
def suggestAuthors(self, prefix, page = 1, count = 20, **kwargs):
473+
"""
474+
return a list of authors that match the prefix
475+
@param prefix: input text that should be contained in the author name and source url
476+
@param page: page of results
477+
@param count: number of returned suggestions
478+
"""
479+
assert page > 0, "page parameter should be above 0"
480+
params = {"prefix": prefix, "page": page, "count": count}
481+
params.update(kwargs)
482+
return self.jsonRequest("/json/suggestAuthorsFast", params)
483+
465484

466485

467486
def suggestConceptClasses(self, prefix, lang = "eng", conceptLang = "eng", source = ["dbpedia", "custom"], page = 1, count = 20, returnInfo = ReturnInfo(), **kwargs):
@@ -552,6 +571,13 @@ def getNewsSourceUri(self, sourceName, dataType = ["news", "pr", "blog"]):
552571
return None
553572

554573

574+
def getSourceUri(self, sourceName, dataType=["news", "pr", "blog"]):
575+
"""
576+
alternative (shorter) name for the method getNewsSourceUri()
577+
"""
578+
return self.getNewsSourceUri(sourceName, dataType)
579+
580+
555581
def getSourceGroupUri(self, sourceGroupName):
556582
"""
557583
return the URI of the source group that best matches the name
@@ -600,6 +626,18 @@ def getCustomConceptUri(self, label, lang = "eng"):
600626
return None
601627

602628

629+
def getAuthorUri(self, authorName):
630+
"""
631+
return the author uri that is the best match for the given author name (and potentially source url)
632+
if there are multiple matches for the given author name, they are sorted based on the number of articles they have written (from most to least frequent)
633+
@param authorName: partial or full name of the author, potentially also containing the source url (e.g. "george brown nytimes")
634+
"""
635+
matches = self.suggestAuthors(authorName)
636+
if matches is not None and isinstance(matches, list) and len(matches) > 0 and "uri" in matches[0]:
637+
return matches[0]["uri"]
638+
return None
639+
640+
603641
@staticmethod
604642
def getUriFromUriWgt(uriWgtList):
605643
"""

eventregistry/Query.py

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -33,22 +33,24 @@ def __init__(self,
3333
dateEnd = None,
3434
dateMention = None,
3535
sourceLocationUri = None,
36-
sourceGroupUri = None,
36+
sourceGroupUri=None,
37+
authorUri = None,
3738
keywordLoc = "body",
3839
minMaxArticlesInEvent = None,
3940
exclude = None):
4041
"""
41-
@param keyword: keyword(s) to query. Either None, string or QueryItems
42-
@param conceptUri: concept(s) to query. Either None, string or QueryItems
43-
@param sourceUri: source(s) to query. Either None, string or QueryItems
44-
@param locationUri: location(s) to query. Either None, string or QueryItems
45-
@param categoryUri: categories to query. Either None, string or QueryItems
46-
@param lang: language(s) to query. Either None, string or QueryItems
42+
@param keyword: keyword(s) to query. Either None, string or QueryItems instance
43+
@param conceptUri: concept(s) to query. Either None, string or QueryItems instance
44+
@param sourceUri: source(s) to query. Either None, string or QueryItems instance
45+
@param locationUri: location(s) to query. Either None, string or QueryItems instance
46+
@param categoryUri: categories to query. Either None, string or QueryItems instance
47+
@param lang: language(s) to query. Either None, string or QueryItems instance
4748
@param dateStart: starting date. Either None, string or date or datetime
4849
@param dateEnd: ending date. Either None, string or date or datetime
4950
@param dateMention: search by mentioned dates - Either None, string or date or datetime or a list of these types
5051
@param sourceLocationUri: find content generated by news sources at the specified geographic location - can be a city URI or a country URI. Multiple items can be provided using a list
5152
@param sourceGroupUri: a single or multiple source group URIs. A source group is a group of news sources, commonly defined based on common topic or importance
53+
@param authorUri: author(s) to query. Either None, string or QueryItems instance
5254
@param keywordLoc: where should we look when searching using the keywords provided by "keyword" parameter. "body" (default), "title", or "body,title"
5355
@param minMaxArticlesInEvent: a tuple containing the minimum and maximum number of articles that should be in the resulting events. Parameter relevant only if querying events
5456
@param exclude: an instance of BaseQuery, CombinedQuery or None. Used to filter out results matching the other criteria specified in this query
@@ -78,6 +80,8 @@ def __init__(self,
7880

7981
self._setQueryArrVal("sourceLocationUri", sourceLocationUri)
8082
self._setQueryArrVal("sourceGroupUri", sourceGroupUri)
83+
self._setQueryArrVal("authorUri", authorUri)
84+
8185
if keywordLoc != "body":
8286
self._queryObj["keywordLoc"] = keywordLoc
8387
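
The `authorUri` parameter added to the query constructor takes a URI such as the one returned by `EventRegistry.getAuthorUri()`. The selection logic of that method can be sketched without a network call as below. `pick_best_author_uri` is a hypothetical name, and the sample suggestion list is made-up data that only assumes each suggestion is a dict with a `"uri"` key, as the diff implies.

```python
def pick_best_author_uri(matches):
    """Return the uri of the first suggestion, or None when there is none.
    Per the docstring in the diff, suggestions are ordered from most to
    least frequent author, so the first entry is treated as the best match."""
    if (isinstance(matches, list) and len(matches) > 0
            and isinstance(matches[0], dict) and "uri" in matches[0]):
        return matches[0]["uri"]
    return None


# Made-up data standing in for a suggestAuthors() response.
sample_matches = [
    {"uri": "first_author@example.com", "name": "First Author"},
    {"uri": "second_author@example.com", "name": "Second Author"},
]

best_uri = pick_best_author_uri(sample_matches)
no_uri = pick_best_author_uri([])
```

Returning `None` instead of raising lets callers test the result before building a query, which matches how the other `get...Uri()` methods in this file behave.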
