Blogs
Source Vitals
Source Attributes | Description |
|---|---|
Data Collection Method | public RSS feeds |
Latency | Generally under 24 hours |
Geographic Coverage | global, over 170 languages |
Key Use Cases | product & market research; company reputation analysis; influencer identification |
Delivery Methods | full stream; filtered stream; search API |
Data Dictionary
Streaming JSON Schema Mapping
Element | Description | Data Example |
|---|---|---|
authoremail | The author's email address. | null |
authorname | Name of the author who submitted the comment. | Terra Trevor |
authorurl | The url to the author's profile page. | |
bloghost | The platform that hosts the blog. | blogger |
bloghostid | For internal use only. | 1 |
blogid | An incrementing numeric ID that uniquely identifies blog records. | 87896168 |
blogtitle | Title of the blog | Terra Trevor |
category | The post category value specified by the author of the post | Writing |
content | The content of the comment. | "<div style=\"text-align: right;\"><div style=\"text-align: left;\"><a href=\"http://terratrevor.blogspot.com/\"><span style=\"font-family: \"georgia\" , \"times new roman\" , serif;\">Read Terra's Blog</span></a></div></div><div style=\"text-align: right;\"><div style=\"text-align: left;\"><span style=\"font-family: \"georgia\" , \"times new roman\" , serif;\"><br /></span></div></div><div style=\"text-align: left;\"><div style=\"text-align: right;\"><div style=\"text-align: left;\"><b><i><span style=\"font-family: \"georgia\" , \"times new roman\" , serif;\"><a href=\"http://terratrevor.blogspot.com/\">Writing, Reading and Living</a></span></i></b></div></div></div><div style=\"text-align: left;\"><span style=\"font-family: \"georgia\" , \"times new roman\" , serif;\">For me, writing is a way of reaching out to others, to people I don't know. I sit alone, in silence, but all that time I’m out there, connecting with whoever reads my words.</span></div>" |
country | The source country identifier based on 2-letter codes from ISO-3166 | us |
generator | The generator field we get from the feed, which describes the software used for the blog technology. | |
guid | The md5 hash value of the post permanent url | 52d0f0359b2f6ec49c69d499fa8c40b5 |
lang | The detected language at the comment level. | en |
Link/href | The link to the feed | |
Link/rel | Always set to "alternate" | alternate |
Link/type | Indicates the feed type. | application/atom+xml |
mainurl | The url to the blog homepage | |
parseddate | The date time when we collected the comment. | 2020-08-08T20:01:28 |
post | A parent element to contain a single post. | "post": { |
postid | A GUID generated on the fly when the post is extracted from the source. This value is not tracked internally (e.g. it is not the same as the post ID retrieved via the API). | db047f80-c858-465b-a01e-b54430d99478 |
postlink | The url to the post permanent link | http://www.terratrevorauthor.com/2020/08/read-terras-blog-writing-reading-and.html |
providerid | A numeric ID for the blog content provider. | 0 |
pubdate | The post publication date in UTC format | 2020-08-06T09:00:00 |
source | Name of the channel that the item came from. | null |
title | The title of the post or comment | Reading, Writing and Living |
updated | The date the post was last updated, if it was updated (applies only to blogs with Atom feeds). | 2020-08-06T15:24:03 |
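The guid element is described above as the MD5 hash of the post's permanent URL. A minimal Python sketch of that kind of hash follows. The helper name `url_md5` is hypothetical, and the sketch assumes the URL is hashed as a plain UTF-8 string. The exact URL normalization applied before hashing is not documented here, so the output may not match the guid values delivered in the feed.

```python
import hashlib


def url_md5(url: str) -> str:
    """Return the lowercase hex MD5 digest of a URL string.

    Assumption: the URL is hashed as-is, encoded as UTF-8. Any normalization
    applied by the data provider before hashing is unknown.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# postlink value taken from the data dictionary example above
digest = url_md5(
    "http://www.terratrevorauthor.com/2020/08/read-terras-blog-writing-reading-and.html"
)
print(digest)  # a 32-character hexadecimal string
```

A hash like this is typically useful as a stable de-duplication key when the same post is delivered more than once.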
Sample Post Message
{
"post": {
"Link": {
"rel": "alternate",
"type": "application/atom+xml",
"href": "http://differentpenproductions.blogspot.com/feeds/posts/default"
},
"bloghostid": "1",
"providerid": "0",
"bloghost": "blogger",
"country": null,
"generator": "Blogger v7.00 (http://www.blogger.com/)",
"postid": "8570602b-2cc1-f482-73cd-6b7b0afda54d",
"blogid": "68738433",
"sourceguid": "7335a9866927c2d334a3e5a3965421e3",
"title": "the five decades signpost",
"blogtitle": "Different Pen",
"mainurl": "http://differentpenproductions.blogspot.com/",
"postlink": "http://differentpenproductions.blogspot.com/2024/08/the-five-decades-signpost.html",
"content": "<p>I'm loved by God. I don't need additional love.</p><p>I carry faith (trust instead of worry), hope (joy and beauty) and love. </p><p>If I lose faith and even hope, love will still be there.</p><p>I don't have to live up to any norms and expectations, explain myself or submit to shame.</p><p>I will never marry. I'm set apart for something higher.</p><p>The kingdom of God is near and I'm bringing friends.</p>",
"authorname": "Different Pen",
"authoremail": null,
"authorurl": "http://www.blogger.com/profile/02713516046183679147",
"category": "de profundis,dreams,poet facts",
"guid": "6eb91ce066a3963a9dfde608c1e06513",
"source": null,
"pubdate": "2024-08-09T12:30:00",
"updated": "2024-08-09T12:30:41",
"parseddate": "2024-08-10T01:57:14",
"lang": "en"
}
}
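The pubdate and parseddate values in the sample above can be used to estimate how long it took to collect a post. The following Python sketch is illustrative only. The function name `collection_delay` is hypothetical, and it assumes both timestamps are expressed in the same time zone (pubdate is documented as UTC; parseddate is assumed to be UTC as well).

```python
from datetime import datetime, timedelta


def collection_delay(pubdate: str, parseddate: str) -> timedelta:
    """Return the time elapsed between publication and collection.

    Assumption: both values use the 'YYYY-MM-DDTHH:MM:SS' layout shown in the
    samples and refer to the same time zone.
    """
    return datetime.fromisoformat(parseddate) - datetime.fromisoformat(pubdate)


# Values taken from the sample post message
delay = collection_delay("2024-08-09T12:30:00", "2024-08-10T01:57:14")
print(delay)  # 13:27:14
```

For this sample the delay is within the "generally under 24 hours" latency listed in Source Vitals.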
Sample Comment Message
{
"comment": {
"bloghostid": "1",
"bloghost": "blogger",
"providerid": "0",
"provider": "",
"discoveryMethod": "2",
"generator": "Blogger v7.00 (http://www.blogger.com/)",
"sourcetype": "blogs",
"sourceid": "88769412",
"sourceurl": "http://lion-muthucomics.blogspot.com/",
"sourceguid": "cb5272ad834374baa5e9d67b1ccecaa5",
"sourcecrawled": "2020-02-02T00:44:30",
"sourcelanguage": "ta",
"sourcetitle": "Lion-Muthu Comics",
"country": null,
"postlink": "http://lion-muthucomics.blogspot.com/2020/02/blog-post.html",
"postguid": "c1127db48f7b5e477405c068b4834686",
"postpublished": "2020-02-01T18:46:00.000Z",
"posttitle": "அன்போடு அண்ணாத்தே !!",
"commentid": "08a6b353-540a-6ea1-2f13-8d3d16a7606e",
"commentlink": "http://lion-muthucomics.blogspot.com/2020/02/blog-post.html?showComment=1580739049577#c3189066246105663132",
"title": "மாண்ட்ரேக்கை வழிமொழின்றேன்",
"content": "மாண்ட்ரேக்கை வழிமொழின்றேன்",
"authorname": "R.வெங்கடேசன்",
"authorurl": "https://www.blogger.com/profile/10499829746774793680",
"authoremail": null,
"pubdate": "2020-02-03T14:10:49",
"parseddate": "2024-08-09T01:17:27",
"lang": "ta"
}
}
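The two samples above differ in their top-level element: posts are wrapped in "post" and comments in "comment". A consumer of the stream could branch on that key. The sketch below is one possible approach in Python; the function name `unwrap_message` is hypothetical and the transport used to receive the raw JSON string is out of scope.

```python
import json
from typing import Any, Dict, Tuple


def unwrap_message(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Parse one streaming message and return (kind, body).

    kind is "post" or "comment", matching the top-level element names
    shown in the sample messages.
    """
    message = json.loads(raw)
    for kind in ("post", "comment"):
        if kind in message:
            return kind, message[kind]
    raise ValueError("message has neither a 'post' nor a 'comment' element")


# Abbreviated message based on the sample comment above
kind, body = unwrap_message('{"comment": {"lang": "ta", "authoremail": null}}')
print(kind, body["lang"])  # comment ta
```

Note that the field names differ between the two message types (for example, the comment sample carries sourceid and postguid, which the post sample does not), so downstream code should not assume a shared schema.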
REST API JSON Schema Mapping
Element or Attribute Name | Description | In Basic response (mode parameter) | In Full response (mode parameter) |
|---|---|---|---|
Id | Unique ID of the Post | x | x |
Subject | The Post's title. For Blog comments, this value is sometimes generated based on the author's alias or the original Post's title | x | x |
Text | The text of the Post | x | x |
Text/@truncated | When present, a value of 'true' indicates that the contents in the Text element have been abbreviated. | x | x |
SubjectHtml | The HTML representation of the Post's title. This element is only displayed if body=html or both. | x | x |
TextHtml | The HTML representation of the Post. This element is only displayed if body=html or both. | x | x |
TextHtml/@truncated | When present, a value of 'true' indicates that the contents in the TextHtml element have been abbreviated. | x | x |
PostTitle | The Post's title. This is redundant for Blog posts, but is included so there is a way to provide the blog post title with comments. | | x |
ThreadId | A unique identifier of the Thread (i.e. a blog post and its related comments) | x | x |
Published | The date/time the Post was published (GMT) | x | x |
Inserted | The date/time the Post was inserted into the BoardReader database (GMT) | x | x |
Crawled | The date/time the Post was harvested by our crawler (GMT). | | x |
AuthorInfo/Id | Unique author identifier. | | x |
AuthorInfo/Url | Where available, a link to the author's profile | | x |
AuthorInfo/Name | Where available, the author's name | | x |
AuthorInfo/Nick | Where available, the author's alias or handle | | x |
Url | The URL where the Post was found (i.e. permalink page). For Blog Comments, the URL may include the anchor to the actual Comment. | x | x |
Language | The language of the Post's text | x | x |
Country | The country that best represents where the source is located based on the site's participants. Country identifiers are 2-letter country codes from ISO 3166. | x | x |
Tags | A string containing the terms used by the blog author to "tag" the post. | | x |
IsComment | A Boolean value that specifies whether a Post is a Comment. Values are '0' (false) and '1' (true). | | x |
FeedInfo/Url | The URL of the Blog site | | x |
FeedInfo/Id | The preferred unique identifier for the Blog site where the Post or Comment appeared | | x |
FeedInfo/ExtKey | A reference ID for the site. | | x |
FeedInfo/Title | The title of the Blog site | | x |
Type | A string indicating the content type of the Post. Possible values are 'blog' and 'blogcomment' | | x |
PostSize | The size of the Post in characters | x | x |
IsAdult | A Boolean value indicating if the Post is known or suspected to contain adult-oriented content | | x |
CommentsInThread | Obsolete | x | x |
Sample Post Message
{
"Id": 71851995862853120,
"Text": " A Yale Medicine gastroenterologist discusses SIBO (small intestinal bacterial overgrowth), a condition that can cause bloating and other symptoms. URL: https:\/\/www.yalemedicine.org\/news\/ibs-sibo-small-intestinal-bacterial-overgrowth-or-both-3-things-to-know ",
"Subject": "IBS, SIBO (small intestinal bacterial overgrowth), or both? What to know",
"PostTitle": "IBS, SIBO (small intestinal bacterial overgrowth), or both? What to know",
"ThreadId": "52e668398cb6e82c859572e35b8547b0",
"Published": "2024-08-15 19:31:35",
"Inserted": "2024-08-16 02:49:25",
"InsertedTs": 1723776565,
"Crawled": "2024-08-16 02:47:10",
"AuthorInfo": {
"Id": "",
"Url": "",
"Name": "",
"Nick": "JoAnn Piscitelli"
},
"Url": "https:\/\/communications.yale.edu\/ibs-sibo-small-intestinal-bacterial-overgrowth-or-both-what-know",
"Language": "English",
"Country": "",
"Tags": "",
"IsComment": 0,
"FeedInfo": {
"Id": "",
"ExtKey": "8353902030b1e475c97034a73018c90a",
"Url": "https:\/\/communications.yale.edu\/",
"Title": "Office of Public Affairs & Communications"
},
"Type": "blog",
"PostSize": 754,
"CommentsInThread": 0,
"IsAdult": 0
}
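The REST date fields are documented as GMT, and flags such as IsComment arrive as 0 or 1. The Python sketch below shows one way to normalize such a record. The helper name `parse_gmt` is hypothetical. InsertedTs appears in the sample but not in the mapping table; treating it as the Unix epoch form (in seconds) of Inserted is an assumption, although the two values in this sample are consistent with it.

```python
from datetime import datetime, timezone


def parse_gmt(value: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' value and mark it as GMT/UTC."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


# Subset of the sample post message above
record = {
    "Published": "2024-08-15 19:31:35",
    "Inserted": "2024-08-16 02:49:25",
    "InsertedTs": 1723776565,  # assumed: epoch seconds for Inserted
    "IsComment": 0,
    "Type": "blog",
}

published = parse_gmt(record["Published"])
inserted = parse_gmt(record["Inserted"])
is_comment = bool(int(record["IsComment"]))  # 0 -> False, 1 -> True

print(inserted.isoformat(), is_comment)
print(int(inserted.timestamp()) == record["InsertedTs"])
```

Attaching an explicit UTC time zone avoids the values being silently interpreted as local time when they are compared or converted later.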