Message Boards
Source Vitals
Source Attributes | Description |
|---|---|
Data Collection Method | public Web scraping |
Geographic Coverage | global, 170 languages |
Key Use Cases | product & market research; company reputation analysis; influencer identification |
Delivery Methods | full stream; filtered stream; search API |
Datasets
The Message Boards service offers access to several datasets which are licensed separately:
- General Boards, covering more than 400,000 sites
- Discuz, a collection of 700 primarily Chinese-language sites using the Tencent Discuz message board framework
- Imageboards, including 4chan.org and 5ch.net
- VerticalScope, a group of 400 enthusiast message boards focused on automotive topics
Streaming Data Dictionary
Field | Definition | Data Example |
|---|---|---|
boardname | The name the message board site is known by. | Range Rover Evoque Forums |
siteid | An ID generated to uniquely identify the Site. | 54f626f63bc |
forumid | An ID generated to uniquely identify the Forum. | 5c5932a72 |
forumname | Forum name (from Message Board) | https://www.evoqueownersclub.co.uk/forum/forumdisplay.php?f=49 |
forumurl | Specific Forum home page (URL) | https://www.evoqueownersclub.co.uk/forum/forumdisplay.php?f=49 |
ThreadID | This identifier represents a unique Thread on a given forum. It can contain alphanumeric characters. | 17211 |
threadurl | A link to the first page of a Thread. | |
threadtitle | The title of the Thread. | Dpf warning light |
postid | A unique identifier assigned to each post by Socialgist. The format depends on the type of site uniqueness check the harvester uses for de-duplication; in the example shown, the value is the siteid and the sitepostid joined by a period. | 54f626f63bc.291174 |
sitepostid | The "native" ID of the post as assigned by the message board site, where available. In rare cases, this value is synthetically created by our harvester based on data available in the parsed post. The value is normally unique across the entire site, but this is not guaranteed. Uniqueness is guaranteed at the forum level. | 291174 |
parentid | The "native" ID of the parent post on sites where our harvesting supports hierarchical threads. | |
Url | The URL where the post was found (i.e. the Thread page where the post was found). In some cases the URL provided may link to the first page of the Thread regardless of which page the post is found on. | |
Urlwithanchor | The URL where the post was found plus the anchor to locate the individual post on the page. | |
anchor | If available, a value that points to the location of an individual post within a page. | This field is no longer populated. |
MainUrl | The URL associated with the post's source (i.e. Site URL) | |
content | The parent element for data associated with the post | content |
content/date | The date/time a post was published as it appeared on the message board at the time of harvesting (nothing can be implied about the time zone by the provided date) | 2020-08-13 15:27:10 |
content/subject | The post's title (i.e. Thread Title and/or Post Title) | This field is no longer populated. Please use the threadtitle field. |
content/author | The identifier for the post's author | Spurds |
content/AuthorUrl | If available, a link to author's profile | |
content/avatarurl | If available, the URL to the author's avatar | This field is no longer populated. |
content/registered | Where available, the date author registered on the message board | This field is no longer populated. |
content/location | If available, the author's geographic location as found on the board or posting (i.e. the text string is not validated or normalized in any way and may contain anything the author chose to divulge) | GB |
content/age | If available, the author's age (can be a number or a range like 5-10) | This field is no longer populated. |
content/sex | If available, the author's sex | This field is no longer populated. |
content/text | The full text of the post | Morning. I'm back again with same fault on my 2016 Evoque with 2l Ingenium diesel . Last week it was at dealer as Blocked filter was on for the 2rd time once again a regen and updated software from JLR . But after 400 mls the lights back . Is there a answer to this . Dont get me wrong my Rover dealer is doing his best for me .... do I go for a new DPF ... could this light be on because of another fault . The only fault code is DPF full .... any advice would be helpful please .... |
content/htmltext | The full text of the post including the HTML tags such as href tags. This is an optional element and must be specified when selecting a data feed configuration. If included, this element will increase the size of the data feed files that Effyis delivers. | Morning. I'm back again with same fault on my 2016 Evoque with 2l Ingenium diesel . Last week it was at dealer as Blocked filter was on for the 2rd time once again a regen and updated software from JLR . But after 400 mls the lights back . Is there a answer to this . Dont get me wrong my Rover dealer is doing his best for me .... do I go for a new DPF ... could this light be on because of another fault . The only fault code is DPF full .... any advice would be helpful please .... |
topics | The site topic | Social |
categories | The site category | General Talk |
Crawled | The date/time a post was retrieved by our web harvester. | 2020-08-23 12:48:21 |
language | The language of the post. | English |
languageCode | The language identifier associated with the post. These identifiers generally match the two-letter language code from ISO 639. If a language does not have a 2-letter code, we use the 3-letter code (e.g. Filipino is 'fil'). Text identified as Chinese is output as either 'zh-cn' (Chinese - Simplified) or 'zh-tw' (Chinese - Traditional). The identifier 'unknown' means we are not able to reliably identify the language. See the table below for the full list of supported languages. | en |
threadstarter | A value to indicate if the post started the thread (value is 1) or was in response to another post (value is 0) | 1 |
sentiment | If available, a value (e.g. Strong Sell, Buy) generated by the post's author representing their enthusiasm for a specific security or investment. The value is only gathered for a very small number of message boards that target investors (e.g. Investor Village). | This field is no longer populated. |
recommendation | If available, a value generated by other users representing their rating of an author/post. The value is only gathered for a very small number of message boards that target investors (e.g. Investor Village). | This field is no longer populated. |
ticker | If available, a value assigned by Socialgist at the site or forum level that indicates a post is likely related to a specific stock market symbol (e.g. MSFT, GOOG). The value is only gathered for a small number of message boards that target investors (e.g. Investor Village, Raging Bull, Yahoo Finance). | This field is no longer populated. |
signature | If available, the author's signature text, which can include hobbies, URLs, etc. | This field is no longer populated. |
countrycode | If available, the 2-letter country code from ISO 3166 that specifies the source country of the message board. | GB |
providerid | An internal identifier used to identify the source of premium content. | 120 |
gmt | The GMT offset +/- from the source site, if provided | 0 |
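As a rough illustration of how a consumer might interpret a few of the fields defined above, the following Python sketch works on a hypothetical record built from the example values in the table. The nested dictionary layout, the helper names (`split_postid`, `describe_language`, `parse_naive`), and the assumption that a postid always splits at its first period are illustrative only and are not part of the documented feed format.

```python
from datetime import datetime
from typing import Optional, Tuple


def split_postid(postid: str) -> Tuple[str, Optional[str]]:
    """Split a postid such as '54f626f63bc.291174' at the first period.

    Assumption: the part before the period is the siteid and the rest is
    the sitepostid, as in the example row. Other formats may exist.
    """
    site_part, sep, post_part = postid.partition(".")
    return site_part, (post_part if sep else None)


def describe_language(code: str) -> str:
    """Classify a languageCode value using the rules in the dictionary."""
    if code == "unknown":
        return "unidentified"
    if code in ("zh-cn", "zh-tw"):
        return "chinese-variant"
    if len(code) == 2:
        return "iso639-2letter"
    if len(code) == 3:
        return "iso639-3letter"
    return "unexpected"


def parse_naive(value: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' without attaching any time zone."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


# Hypothetical record; the layout is assumed, the values come from the table.
record = {
    "postid": "54f626f63bc.291174",
    "languageCode": "en",
    "threadstarter": "1",
    "content": {"date": "2020-08-13 15:27:10"},
    "Crawled": "2020-08-23 12:48:21",
}

site_part, post_part = split_postid(record["postid"])
# threadstarter is 1 for the first post in a thread, 0 for a reply.
is_thread_starter = str(record["threadstarter"]) == "1"
# content/date carries no time zone information, so it stays naive.
published = parse_naive(record["content"]["date"])
crawled = parse_naive(record["Crawled"])
```

Keeping `content/date` as a naive datetime mirrors the dictionary's warning that nothing can be implied about its time zone; any conversion using the gmt field would be a further assumption about that field's units.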
REST API Data Dictionary
Element or Attribute Name | Description | Included in Basic Response? (mode parameter) | Included in Full Response? (mode parameter) |
|---|---|---|---|
Id | Unique ID of the Post | x | x |
Subject | The Post's title. This value is sometimes inherited from the Thread's title. | x | x |
Text | The text of the Post | x | x |
Signature | The boilerplate text string used by the Post's author (parsed where available and feasible) | | x |
SignatureHtml | The HTML representation of the Signature section of the post. This element is only displayed if body=html or both. | | x |
SubjectHtml | The HTML representation of the Post's title. This element is only displayed if body=html or both. | x | x |
TextHtml | The HTML representation of the Post. This element is only displayed if body=html or both. | x | x |
ThreadTitle | The title of the Thread | | x |
ExtKey | A reference ID for the site. | | x |
Published | The date/time the Post was published (as found on the actual message board; the time zone is unknown, but assumed to be EST/EDT). Please see the Message Boards publication date example in the Getting Started section for more details. | x | x |
Inserted | The date/time the Post was inserted into the BoardReader database (GMT) | x | x |
Updated | The date/time the post was last updated (GMT) | | x |
Crawled | The date/time the Post was harvested by our crawler (GMT). | | x |
Author | The identifier for the Post's author | | x |
AuthorAvatarUrl | Where available, a link to the author's avatar | | x |
AuthorId | Always empty (left for backward compatibility purposes). | | x |
AuthorInfo/Url | Where available, a link to the author's profile | | x |
AuthorInfo/Location | Where available, the author's geographic location | | x |
AuthorInfo/Sex | Where available, the author's sex | | x |
AuthorInfo/Age | Where available, the author's age | | x |
AuthorInfo/Registered | Where available, the date the author registered on the message board | | x |
AuthorInfo/Name | Where available, the author's name | | x |
AuthorInfo/KloutScore | Where available, the author's Klout score | | x |
Url | The URL where the Post was found (i.e. Thread page); may include the anchor to the actual Post. | x | x |
UrlWithAnchor | The URL where the Post was found (i.e. Thread page) with the anchor to the actual Post. | | x |
SiteId | A unique identifier for the site where the Post appeared (preferred site identifier) | x | x |
SiteKey | A unique identifier for the site where the Post appeared (from Message Board Crawler; for internal use) | | x |
SiteUrl | The root URL where the message board is found | | x |
SiteTitle | The title that the message board is known as | x | x |
Domain | The 2nd level (e.g. http://site.com ) or 3rd level (e.g. http://site.com.mx ) domain identifier extracted from the SiteUrl | x | x |
LinksCount | The number of external links pointing to the domain of the Post. The value is only valid for the 100,000 domains with the most external links pointing to them, as determined on a weekly basis. | x | x |
Icon | The URL associated with the site's favicon | x | x |
Language | The language of the Post's text (e.g. English) | x | x |
Country | The 2-letter country code from ISO 3166 that specifies the source country of the message board. | x | x |
BBSType | A text string assigned by our crawler to denote the type of message board technology and parser used to harvest content from the site. For internal use. | x | x |
ForumId | A unique identifier of the forum where the Post appeared (this is the preferred forum identifier). Each Site has one or more Forums. | x | x |
ForumKey | A unique identifier of the forum where the Post appeared (from Message Board Crawler; for internal use only). | | x |
Forum | The name of the Forum as assigned by the web master | x | |