Blogs

Source Vitals

Data Collection Method: public RSS feeds

Latency: generally under 24 hours

Geographic Coverage: global, over 170 languages

Key Use Cases: product and market research; company reputation analysis; influencer identification

Delivery Methods: full stream; filtered stream; search API

Data Dictionary

Streaming JSON Schema Mapping

Element

Description

Data Example

authoremail

The author's email address.

null

authorname

Name of the author of the post or comment.

Terra Trevor

authorurl

URL of the author's profile page.

http://www.blogger.com/profile/12772209186942182039

bloghost

The name of the platform that hosts the blog.

blogger

bloghostid

For internal use only.

1

blogid

An incrementing numeric ID that uniquely identifies each blog record.

87896168

blogtitle

Title of the blog.

Terra Trevor

category

The category assigned to the post by its author. Multiple categories are delivered as a single comma-separated string.

Writing

content

The content of the post or comment. Post content may contain HTML markup.

"<div style=\"text-align: right;\"><div style=\"text-align: left;\"><a href=\"http://terratrevor.blogspot.com/\"><span style=\"font-family: \"georgia\" , \"times new roman\" , serif;\">Read Terra's Blog</span></a></div></div><div style=\"text-align: right;\"><div style=\"text-align: left;\"><span style=\"font-family: \"georgia\" , \"times new roman\" , serif;\"><br /></span></div></div><div style=\"text-align: left;\"><div style=\"text-align: right;\"><div style=\"text-align: left;\"><b><i><span style=\"font-family: \"georgia\" , \"times new roman\" , serif;\"><a href=\"http://terratrevor.blogspot.com/\">Writing, Reading and Living</a></span></i></b></div></div></div><div style=\"text-align: left;\"><span style=\"font-family: \"georgia\" , \"times new roman\" , serif;\">For me, writing is a way of reaching out to others, to people I don't know. I sit alone, in silence, but all that time I’m out there, connecting with whoever reads my words.</span></div>"

country

The source country, as a two-letter ISO 3166 country code. May be null.

us

generator

The value of the feed's generator field, which identifies the software used to publish the blog.

https://wordpress.org/?v=4.9.2

guid

The MD5 hash of the post's permalink URL.

52d0f0359b2f6ec49c69d499fa8c40b5

lang

The language detected for the individual post or comment.

en

Link/href

The URL of the feed.

http://www.terratrevorauthor.com/feeds/posts/default

Link/rel

Always set to "alternate".

alternate

Link/type

The feed type. One of the following values:
application/rss+xml: the feed is an RSS feed.
application/atom+xml: the feed is an Atom feed.
UNKNOWN: the feed type could not be determined.

application/atom+xml

mainurl

The URL of the blog's homepage.

http://www.terratrevorauthor.com/

parseddate

The date and time when the post or comment was collected.

2020-08-08T20:01:28

post

The parent element that contains a single post.

"post": {

postid

A GUID generated on the fly when the post is extracted from the source. This value is not tracked internally and is not the same as the post ID returned by the API.

db047f80-c858-465b-a01e-b54430d99478

postlink

The post's permalink URL.

http://www.terratrevorauthor.com/2020/08/read-terras-blog-writing-reading-and.html

providerid

A numeric ID for the blog content provider.

0

pubdate

The post's publication date and time, in UTC.

2020-08-06T09:00:00

source

The name of the channel the item came from.

null

title

The title of the post or comment.

Reading, Writing and Living

updated

The date and time the post was last updated, if it was updated. Applies only to blogs with Atom feeds.

2020-08-06T15:24:03
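The guid and postid rows above have a practical consequence for consumers: postid is generated fresh at extraction time and is not tracked internally, so it is presumably not a stable key, whereas guid is derived from the permalink. The sketch below recomputes a guid-style hash and deduplicates post messages on guid. It assumes the hash is taken over the UTF-8 bytes of postlink exactly as delivered; any URL normalization applied before hashing is not documented, so a recomputed value may not match the delivered guid. permalink_md5 and dedupe_posts are hypothetical helper names.

```python
import hashlib


def permalink_md5(postlink: str) -> str:
    """MD5 hex digest of a permalink URL.

    Assumption: the hash covers the UTF-8 bytes of the URL as delivered,
    with no normalization; verify against delivered guid values first.
    """
    return hashlib.md5(postlink.encode("utf-8")).hexdigest()


def dedupe_posts(messages):
    """Keep the first post message seen for each guid.

    postid is regenerated at extraction time, so it is not used as a key.
    Messages without a "post" element, and posts without a guid, are kept.
    """
    seen = set()
    unique = []
    for message in messages:
        post = message.get("post")
        key = post.get("guid") if post else None
        if key is not None:
            if key in seen:
                continue  # same permalink already delivered
            seen.add(key)
        unique.append(message)
    return unique
```

Keying on guid rather than postlink keeps the comparison to a fixed-length string, which is convenient for a seen-set or a database unique index.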

Sample Post Message

{ "post": { "Link": { "rel": "alternate", "type": "application/atom+xml", "href": "http://differentpenproductions.blogspot.com/feeds/posts/default" }, "bloghostid": "1", "providerid": "0", "bloghost": "blogger", "country": null, "generator": "Blogger v7.00 (http://www.blogger.com/)", "postid": "8570602b-2cc1-f482-73cd-6b7b0afda54d", "blogid": "68738433", "sourceguid": "7335a9866927c2d334a3e5a3965421e3", "title": "the five decades signpost", "blogtitle": "Different Pen", "mainurl": "http://differentpenproductions.blogspot.com/", "postlink": "http://differentpenproductions.blogspot.com/2024/08/the-five-decades-signpost.html", "content": "<p>I'm loved by God. I don't need additional love.</p><p>I carry faith (trust instead of worry), hope (joy and beauty) and love.&nbsp;</p><p>If I lose faith and even hope, love will still be there.</p><p>I don't have to live up to any norms and expectations, explain myself or submit to shame.</p><p>I will never marry. I'm set apart for something higher.</p><p>The kingdom of God is near and I'm bringing friends.</p>", "authorname": "Different Pen", "authoremail": null, "authorurl": "http://www.blogger.com/profile/02713516046183679147", "category": "de profundis,dreams,poet facts", "guid": "6eb91ce066a3963a9dfde608c1e06513", "source": null, "pubdate": "2024-08-09T12:30:00", "updated": "2024-08-09T12:30:41", "parseddate": "2024-08-10T01:57:14", "lang": "en" } }
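A post message like the sample above can be turned into analysis-ready values with a few conversions implied by the data dictionary: category is a single comma-separated string, content is HTML, pubdate is UTC without an offset, and updated applies only to Atom feeds. The sketch below uses only the Python standard library. Two points are assumptions rather than documented behavior: that parseddate is also UTC (so collection lag can be computed as parseddate minus pubdate), and that commas never occur inside a category label. normalize_post, html_to_text and the keys of the returned dictionary are hypothetical names.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, starting a new line at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "div", "br"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    # Collapse runs of whitespace (including &nbsp;) and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def parse_utc(value):
    # The timestamps carry no offset, so UTC is attached explicitly.
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc) if value else None


def normalize_post(message):
    post = message["post"]
    published = parse_utc(post.get("pubdate"))
    collected = parse_utc(post.get("parseddate"))  # assumed to be UTC as well
    is_atom = (post.get("Link") or {}).get("type") == "application/atom+xml"
    return {
        "guid": post.get("guid"),
        "categories": [c.strip() for c in (post.get("category") or "").split(",") if c.strip()],
        "text": html_to_text(post.get("content")),
        "published": published,
        # updated is only meaningful for Atom feeds, so it is ignored otherwise.
        "updated": parse_utc(post.get("updated")) if is_atom else None,
        "lag_hours": (
            (collected - published).total_seconds() / 3600
            if published and collected else None
        ),
    }
```

For the sample post message, categories comes out as ['de profundis', 'dreams', 'poet facts'], and lag_hours is about 13.45 (from 2024-08-09T12:30:00 to 2024-08-10T01:57:14), which is consistent with the stated latency of generally under 24 hours.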

Sample Comment Message

{ "comment": { "bloghostid": "1", "bloghost": "blogger", "providerid": "0", "provider": "", "discoveryMethod": "2", "generator": "Blogger v7.00 (http://www.blogger.com/)", "sourcetype": "blogs", "sourceid": "88769412", "sourceurl": "http://lion-muthucomics.blogspot.com/", "sourceguid": "cb5272ad834374baa5e9d67b1ccecaa5", "sourcecrawled": "2020-02-02T00:44:30", "sourcelanguage": "ta", "sourcetitle": "Lion-Muthu Comics", "country": null, "postlink": "http://lion-muthucomics.blogspot.com/2020/02/blog-post.html", "postguid": "c1127db48f7b5e477405c068b4834686", "postpublished": "2020-02-01T18:46:00.000Z", "posttitle": "அன்போடு அண்ணாத்தே !!", "commentid": "08a6b353-540a-6ea1-2f13-8d3d16a7606e", "commentlink": "http://lion-muthucomics.blogspot.com/2020/02/blog-post.html?showComment=1580739049577#c3189066246105663132", "title": "மாண்ட்ரேக்கை வழிமொழின்றேன்", "content": "மாண்ட்ரேக்கை வழிமொழின்றேன்", "authorname": "R.வெங்கடேசன்", "authorurl": "https://www.blogger.com/profile/10499829746774793680", "authoremail": null, "pubdate": "2020-02-03T14:10:49", "parseddate": "2024-08-09T01:17:27", "lang": "ta" } }
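Post and comment messages share a stream but differ in their top-level element ("post" versus "comment") and in several field names: in the sample above, a comment identifies its parent through postlink, postguid and posttitle, and links to itself through commentlink. A minimal routing sketch, assuming each message arrives as one standalone JSON object (the stream framing is not described here); parse_message and summarize are hypothetical helpers that map both shapes onto a common set of keys.

```python
import json


def parse_message(raw):
    """Return ("post" | "comment", payload) for one streaming message."""
    message = json.loads(raw)
    for kind in ("post", "comment"):
        if kind in message:
            return kind, message[kind]
    raise ValueError("expected a top-level 'post' or 'comment' element")


def summarize(raw):
    kind, item = parse_message(raw)
    return {
        "kind": kind,
        "title": item.get("title"),
        "author": item.get("authorname"),
        "lang": item.get("lang"),
        # Comments link to their own anchor; posts only have a permalink.
        "url": item.get("commentlink") or item.get("postlink"),
        # For a comment, this is the title of the post it belongs to.
        "parent_title": item.get("posttitle") if kind == "comment" else None,
    }
```

Raising on an unknown top-level element, rather than silently skipping it, makes schema changes in the stream visible early.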

REST API JSON Schema Mapping

Element or Attribute Name

Description

Included in Response? (based on the mode parameter; x = included, empty cell = not included)

Basic

Full

Id

Unique ID of the Post.

x

x

Subject

The Post's title. For Blog comments, this value is sometimes generated from the author's alias or the original Post's title.

x

x

Text

The text of the Post.

x

x

Text/@truncated

When present, a value of 'true' indicates that the content of the Text element has been truncated.

x

x

SubjectHtml

The HTML representation of the Post's title. This element is returned only when body=html or body=both.

x

x

TextHtml

The HTML representation of the Post. This element is returned only when body=html or body=both.

x

x

TextHtml/@truncated

When present, a value of 'true' indicates that the content of the TextHtml element has been truncated.

x

x

PostTitle

The Post's title. This is redundant for Blog posts, but it is included so that comments can carry the title of the blog post they belong to.

 

x

ThreadId

A unique identifier of the Thread (i.e., a blog post and its related comments).

x

x

Published

The date/time the Post was published (GMT).

x

x

Inserted

The date/time the Post was inserted into the BoardReader database (GMT).

x

x

Crawled

The date/time the Post was harvested by our crawler (GMT).

 

x

AuthorInfo/Id

Unique author identifier.

 

x

AuthorInfo/Url

Where available, a link to the author's profile.

 

x

AuthorInfo/Name

Where available, the author's name.

 

x

AuthorInfo/Nick

Where available, the author's alias or handle.

 

x

Url

The URL where the Post was found (i.e. permalink page). For Blog Comments, the URL may include the anchor to the actual Comment.

x

x

Language

The language of the Post's text.

x

x

Country

The country that best represents where the source is located, based on the site's participants. Country identifiers are two-letter ISO 3166 country codes.

x

x

Tags

A string containing the terms used by the blog author to "tag" the post.

 

x

IsComment

A Boolean value that specifies whether the Post is a Comment. Values are 0 (false) and 1 (true).

 

x

FeedInfo/Url

The URL of the Blog site.

 

x

FeedInfo/Id

The preferred unique identifier for the Blog site where the Post or Comment appeared.

 

x

FeedInfo/ExtKey

A reference ID for the site. 

 

x

FeedInfo/Title

The title of the Blog site.

 

x

Type

A string indicating the content type of the Post. Possible values are 'blog' and 'blogcomment'.

 

x

PostSize

The size of the Post, in characters.

x

x

IsAdult

A Boolean value indicating whether the Post is known or suspected to contain adult-oriented content.

 

x

CommentsInThread

Obsolete.

x

x
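The mode parameter decides which of the elements above are returned (Basic or Full), and body=html or body=both adds SubjectHtml and TextHtml. This section does not give the endpoint URL, the name of the search parameter, or authentication details, so the sketch below uses a placeholder base URL and a hypothetical query parameter; only mode and body come from the table, and the lowercase values basic and full are assumptions based on the column headings. build_request_url and missing_full_only are hypothetical helpers.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real base URL and authentication are not documented here.
BASE_URL = "https://api.example.com/blogs"

# Elements that the table marks as returned only in Full mode.
FULL_ONLY = {"PostTitle", "Crawled", "AuthorInfo", "Tags", "IsComment",
             "FeedInfo", "Type", "IsAdult"}


def build_request_url(query, mode="basic", body=None):
    """Build a request URL; "query" is a hypothetical parameter name."""
    if mode not in ("basic", "full"):
        raise ValueError("mode must be 'basic' or 'full'")
    params = {"query": query, "mode": mode}
    if body is not None:
        params["body"] = body  # "html" or "both" adds SubjectHtml / TextHtml
    return BASE_URL + "?" + urlencode(params)


def missing_full_only(item):
    """Full-mode elements absent from a response item (expected in Basic mode)."""
    return sorted(FULL_ONLY.difference(item))
```

Checking an item with missing_full_only is a cheap way to notice that a request ran in Basic mode when the code downstream expects AuthorInfo, FeedInfo or the other Full-only elements.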

Sample Post Message

 

{ "Id": 71851995862853120, "Text": " A Yale Medicine gastroenterologist discusses SIBO (small intestinal bacterial overgrowth), a condition that can cause bloating and other symptoms. URL: https:\/\/www.yalemedicine.org\/news\/ibs-sibo-small-intestinal-bacterial-overgrowth-or-both-3-things-to-know ", "Subject": "IBS, SIBO (small intestinal bacterial overgrowth), or both? What to know", "PostTitle": "IBS, SIBO (small intestinal bacterial overgrowth), or both? What to know", "ThreadId": "52e668398cb6e82c859572e35b8547b0", "Published": "2024-08-15 19:31:35", "Inserted": "2024-08-16 02:49:25", "InsertedTs": 1723776565, "Crawled": "2024-08-16 02:47:10", "AuthorInfo": { "Id": "", "Url": "", "Name": "", "Nick": "JoAnn Piscitelli" }, "Url": "https:\/\/communications.yale.edu\/ibs-sibo-small-intestinal-bacterial-overgrowth-or-both-what-know", "Language": "English", "Country": "", "Tags": "", "IsComment": 0, "FeedInfo": { "Id": "", "ExtKey": "8353902030b1e475c97034a73018c90a", "Url": "https:\/\/communications.yale.edu\/", "Title": "Office of Public Affairs & Communications" }, "Type": "blog", "PostSize": 754, "CommentsInThread": 0, "IsAdult": 0 }
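The sample shows the concrete shapes behind the table: Published and Inserted are "YYYY-MM-DD HH:MM:SS" strings in GMT, IsComment and IsAdult are 0/1 numbers, and Text can carry leading and trailing spaces. It also includes InsertedTs, which is not listed in the table; in this sample its value (1723776565) is the Unix timestamp of Inserted (2024-08-16 02:49:25 GMT). The sketch below converts an item into typed values and groups items by ThreadId so that a blog post and its comments end up together. parse_item and group_by_thread are hypothetical helpers; elements that are absent (for example in Basic mode) become None rather than a default.

```python
from collections import defaultdict
from datetime import datetime, timezone


def _gmt(value):
    # "YYYY-MM-DD HH:MM:SS" in GMT; empty or missing values become None.
    if not value:
        return None
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def _flag(value):
    # Accepts 0/1 as numbers or strings; missing values stay None.
    return None if value in (None, "") else bool(int(value))


def parse_item(item):
    return {
        "id": int(item["Id"]),
        "thread_id": item.get("ThreadId"),
        "type": item.get("Type"),
        "is_comment": _flag(item.get("IsComment")),
        "is_adult": _flag(item.get("IsAdult")),
        "published": _gmt(item.get("Published")),
        "inserted": _gmt(item.get("Inserted")),
        "text": (item.get("Text") or "").strip(),
        "url": item.get("Url"),
    }


def group_by_thread(items):
    """Map ThreadId -> parsed items, preserving input order."""
    threads = defaultdict(list)
    for item in items:
        parsed = parse_item(item)
        threads[parsed["thread_id"]].append(parsed)
    return dict(threads)
```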