|Agent Source token
|Blog posts mentioned in RSS and Atom feeds on
|All DOIs, landing page URLs, plain-text DOIs.
|Curators of RSS and Atom feed aggregators. Authors of blog posts.
|RSS and Atom feeds, and the blog posts they point to.
|Every few hours.
|Linked DOIs, unlinked DOIs, Landing Page URLs
|Creative Commons CC0 1.0 Universal (CC0 1.0)
|HTML of webpages (mostly blog posts) linked to from RSS and Atom feeds.
|Produces Evidence Records
|Produces relation types
|Updates or deletions
What it is
Links from blogs and other content with a newsfeed.
What it does
The Agent holds a list of RSS and Atom feeds and monitors each one for links to blog posts. If a blog post links to registered content, or mentions DOIs in its text, those links and DOIs are extracted into Events.
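As a rough illustration of the extraction step, the sketch below pulls linked DOIs (found in hyperlinks) and unlinked DOIs (found in visible text) out of a blog post's HTML. It is not the Agent's actual implementation: the `DOI_PATTERN` regular expression is a simplified assumption, the names `DoiCollector` and `extract_dois` are hypothetical, and landing page URL matching is left out entirely.

```python
import re
from html.parser import HTMLParser

# Assumed, simplified DOI pattern; real DOI matching has more edge cases.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


class DoiCollector(HTMLParser):
    """Collects DOIs from hyperlinks (linked) and from visible text (unlinked)."""

    def __init__(self):
        super().__init__()
        self.linked = set()
        self.unlinked = set()

    def handle_starttag(self, tag, attrs):
        # Linked DOIs: look inside the href of every <a> element.
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self.linked.update(DOI_PATTERN.findall(href))

    def handle_data(self, data):
        # Unlinked DOIs: look in text, trimming trailing sentence punctuation.
        self.unlinked.update(
            match.rstrip(".,;)") for match in DOI_PATTERN.findall(data)
        )


def extract_dois(html):
    """Return (linked, unlinked) sets of DOIs found in an HTML document."""
    collector = DoiCollector()
    collector.feed(html)
    collector.close()
    # A DOI that appears both as a link and in text is reported as linked only.
    return collector.linked, collector.unlinked - collector.linked
```

For example, `extract_dois('<a href="https://doi.org/10.5555/12345678">x</a>')` would report `10.5555/12345678` as a linked DOI. In practice each candidate would still need to be checked against registered content before an Event is produced.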
Where data comes from
The newsfeed-list Artifact is consulted. On a regular basis the Agent retrieves the Artifact, reads each feed it lists, and follows the link to every blog post or page mentioned. Data sources:
- The newsfeed-list Artifact, curated by Crossref.
- The content of each newsfeed. Each newsfeed may be operated by a different organization.
- The content of the individual blog posts.
Example RSS feeds include:
- ScienceSeeker blog aggregator
- ScienceBlogging blog aggregator
Content to follow.
On a regular basis (approximately every hour) the Newsfeed Agent starts a scan. Each scan:
- It retrieves the most recent version of the
- It scans over every RSS feed.
- It passes the URL of every blog post to the Percolator.
- Includes batches of
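The scan steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the Agent's code: it takes feed documents that have already been downloaded, handles only plain RSS 2.0 and Atom layouts, and uses a hypothetical `handle_post_url` callback to stand in for the hand-off to the Percolator.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def post_urls_from_feed(feed_xml):
    """Return the blog post URLs listed in an RSS 2.0 or Atom feed document."""
    root = ET.fromstring(feed_xml)
    urls = []
    # RSS 2.0: <rss><channel><item><link>URL</link></item></channel></rss>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            urls.append(link.strip())
    # Atom: <feed><entry><link href="URL" rel="alternate"/></entry></feed>
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
    return urls


def scan(feeds, handle_post_url):
    """Run one scan over `feeds`, a mapping of feed URL to feed XML text.

    `handle_post_url` is a hypothetical callback that receives each post URL.
    """
    for feed_url, feed_xml in feeds.items():
        try:
            urls = post_urls_from_feed(feed_xml)
        except ET.ParseError:
            continue  # Skip feeds whose XML cannot be parsed.
        for url in urls:
            handle_post_url(url)
```

Only the `<link>` of each item or entry is used here; any summary embedded in the feed is ignored, so the full blog post has to be fetched separately.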
Edits / deletion
- Events may be edited if they are found to be faulty, e.g. if they refer to non-existent DOIs.
- Links to blog posts are followed. If a summary of the blog post is included in the RSS feed, it is not consulted.
- RSS feeds may be taken offline.
- RSS feeds may contain incomplete data.
- RSS feeds may update too quickly for the Agent to keep up.
- Publisher sites may block the Event Data Bot from collecting landing pages.
- Publisher sites may use robots.txt to prevent the Event Data Bot from collecting landing pages.
- Blog sites may block the Event Data Bot from collecting blog posts.
- Blog sites may use robots.txt to prevent the Event Data Bot from collecting blog posts.
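To illustrate the robots.txt cases, the sketch below shows how a crawler might check a site's rules before fetching a page, using Python's standard `urllib.robotparser`. The user-agent string `ExampleEventDataBot` and the function name `may_fetch` are hypothetical and chosen for illustration only; the real bot's token and behaviour may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token for illustration; the real bot's may differ.
USER_AGENT = "ExampleEventDataBot"


def may_fetch(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    # Parse the robots.txt content that was already downloaded from the site.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

When a check like this returns False, the page is not retrieved, so any DOIs or landing page links it contains would not produce Events.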