Reviews
Source Vitals
Source Attributes | Description |
---|---|
Data Collection Method | public Web scraping |
Latency | Higher-volume product pages are prioritized for daily coverage with sub-24 hour latency; lower-volume pages are crawled infrequently |
Geographic Coverage | global, over 80 languages |
Key Use Cases | understanding consumer opinion about products, services and destinations; employer reviews |
Delivery Methods | full stream; filtered stream; search APIs |
Data Dictionary
Reviews field name | Field description | Data example |
---|---|---|
review.age | If available, the author's age | 25 to 34 |
review.attributeRatings | An object containing ratings of product or service attributes. The object can have zero or more properties (e.g. "Cleanliness":"80"). Attribute ratings are normalized on a 100 point scale, in the same way as overall item ratings. | Culture & Values: "80", |
review.author | The name or alias of the reviewer | Orson Kern |
review.authorDetails | Free-form text associated with the author of the review | Current Employee - Account Manager |
review.authorName | The full name of the review's author, when known. | Ramona |
review.authorURL | If available, a link to the reviewer's profile | https://www.tripadvisor.com/Profile/Sfgirl1978 |
review.avatarURL | A URL to the review author's avatar, where available | |
review.crawled | The date/time the review was harvested from the web or received in a partner feed. | 8/11/2020 8:39 |
review.id | A hash value of the review date, author and body. The value contains characters 0-9 and A-F and ranges from 9-16 characters in length. | 538f7fa82.48c0a2f709830fd59742dce17f1ac8be |
review.item | The item being reviewed. | MatchAndTalk - Live Video Chat With Strangers |
review.itemCategory | The category the reviewed item is associated with, normally defined by the actual review site. The value can be assigned by the crawler team if a site does not have explicit item categories; for example a site that has nothing but hotel reviews may be assigned a category "Hotels" | Companies |
review.itemCategoryURL | The URL associated with the Category, where available. | //www.aliexpress.com/category/200216280/stowing-tidying.html |
review.itemOverallRating | A snapshot of the item's overall rating as determined by the total number of reviews on the site. Overall item ratings at the site level are normalized on a 100 point scale (where available) | 90 |
review.itemPercentRecommended | A snapshot of the percent of reviewers who recommend the item (where available) | 99 |
review.itemProperties | An object that contains information about the item reviewed. | itemProperties: { |
review.itemProperties.address | An optional property containing address information for the item being reviewed; common with certain types of reviews such as hotels or businesses. | 13115 Southwest Freeway Sugar Land, TX 77478 |
review.itemProperties.author | The author of the book or written work being reviewed. | Ann Brashares |
review.itemReviewCount | A snapshot of the total number of reviews associated with the item (where available) | 568 |
review.itemURL | The URL associated with the Item, where available (can be the same as URL). The page often contains detailed item information such as photos, specifications, etc. | https://play.google.com/store/apps/details?id=go.dev.matchandtalk |
review.language | The detected language of the review's text. See the Appendix for a list of language identifiers. | en |
review.location | If available, the author's geographic location | Bangalore |
review.modelNum | Represents a particular model number for the product. | This is a legacy field which is no longer populated. |
review.numOfHelpfulVotes | Where available we parse data indicating the number of users who thought the review was helpful (e.g. "X of Y", where X is the number of users who voted favorably and Y is the total number of users who voted). This data is dynamic, but is only collected at the time the review is harvested. | This is a legacy field which is no longer populated. |
review.origSiteURL | Used if the review was taken from some other source; represents the site URL where the review was originally found | This is a legacy field which is no longer populated. |
review.origURL | Used if the review was taken from some other source; represents the URL where the review was originally found | This is a legacy field which is no longer populated. |
review.recommend | A Boolean value (where available) representing whether the review author recommends the item (1 – "yes", 0 – "no"). | 1 |
review.reference.requestId | Unique identifier for URLs requested for lite/premium reviews. | 234 |
review.reference.projectId | Custom string value that Premium Reviews customers will be able to submit to group batches of URLs together. | “top beauty products” |
review.reference.origin | An MD5 hash value of the URL for Premium URL submissions. A full tutorial on how to use it can be found here. | c64d84310ac9d129efa6f5ad0faab72c |
review.reference.reviewProduct | Indicates where this review originated from; either | “premium” |
review.reviewDate | The date/time the review was published. The time zone is normally unknown. | 7/2/2020 0:00 |
review.reviewRating | An object containing information about the review's rating | reviewRating: { |
review.reviewRating.ratingNum | The normalized rating assigned by the reviewer. We use a 100 point system, so 4 of 5 stars would equate to a rating of 80. | 100 |
review.reviewRating.RawScore | Describes the rating as found on the Review site (i.e. non-normalized rating) | 5 |
review.reviewType | The type of review. This value can be 'consumer' or 'professional'. | consumer |
review.sex | If available, the author's sex | M |
review.siteAlias | The name or alias of the website where the review was harvested from (e.g. Amazon.com, Walmart) | |
review.siteCountry | The country identifier (based on ISO 3166 2-letter codes) best associated with the Review Site. | US |
review.siteID | A unique ID generated by the crawler to identify the website where the review was harvested from. The value contains characters 0-9 and A-F and can range from 9-16 characters in length. | 538f7fa82 |
review.siteURL | The URL of the website where the review was harvested from. | |
review.subject | The title of the review | working under PSO/DPS- DC and Cloud Team |
review.text | The text of the review | Nice interface and visual, MatchAndTalk is just certainly the best dating app |
review.textCons | Where available we parse cons; each con is included in a comma separated list | Top level politics within the department |
review.textPros | Where available we parse pros; each pro is included in a comma separated list | Flexibility which makes the work environment interesting. |
review.url | The URL where the review was found | https://play.google.com/store/apps/details?id=go.dev.matchandtalk&showAllReviews=true |
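The 100 point normalization described for the rating fields can be illustrated with a short Python sketch. It is not the crawler's actual implementation: the function name `normalize_rating` and the `scale_max` parameter are hypothetical, and the feed itself does not state the native scale of a site, so the caller is assumed to know it.

```python
def normalize_rating(raw_score: float, scale_max: float) -> int:
    """Map a site-native rating onto a 100 point scale.

    scale_max is an assumption supplied by the caller (e.g. 5 for a
    five-star site); it is not a field in the review message.
    """
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= raw_score <= scale_max:
        raise ValueError("raw_score must lie between 0 and scale_max")
    # Multiply before dividing to avoid avoidable floating point drift.
    return round(raw_score * 100 / scale_max)


print(normalize_rating(4, 5))     # 80, matching the "4 of 5 stars" example
print(normalize_rating(8.0, 10))  # 80, assuming a 10 point native scale
```

Under these assumptions, a raw score of 8.0 with a normalized value of 80 is consistent with a site that rates on a 10 point scale.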
Sample Review Message
{
  "review": {
    "id": "164.4a8dbe082.6e9eb3046c5f47f612606011ad1c159c",
    "reference": {
      "requestId": "164",
      "projectId": "",
      "origin": "",
      "reviewProduct": "premium"
    },
    "subject": "",
    "text": " ",
    "textPros": "",
    "textCons": "",
    "reviewDate": "2022-08-01 00:00:00",
    "crawled": "2022-08-18 10:58:03",
    "reviewType": "consumer",
    "reviewRating": {
      "RawScore": "8.0",
      "ratingNum": "80"
    },
    "recommend": "",
    "numOfHelpfulVotes": "",
    "attributeRatings": {},
    "author": "Over-60-yrs-travelling",
    "authorName": "",
    "authorURL": "",
    "avatarURL": "",
    "location": "Italy",
    "modelNum": "",
    "age": "",
    "sex": "",
    "authorDetails": "",
    "url": "https://www.booking.com/hotel/gb/the-beauchamp.us.html",
    "siteID": "4a8dbe082",
    "siteAlias": "booking.com",
    "siteURL": "http://www.booking.com",
    "siteCountry": "US",
    "origSiteURL": "",
    "origURL": "",
    "language": "unknown",
    "itemCategory": "Home United Kingdom Greater London London Camden",
    "itemCategoryURL": "https://www.booking.com/district/gb/london/camden.en-gb.html",
    "item": "Hotel\nGrange Beauchamp Hotel",
    "itemURL": "https://www.booking.com/hotel/gb/the-beauchamp.us.html",
    "itemOverallRating": "70",
    "itemReviewCount": "1401",
    "itemPercentRecommended": "",
    "itemProperties": {
      "address": "24 - 27 Bedford Place, Bloomsbury, Camden, London, WC1B 5JH, United Kingdom",
      "author": ""
    }
  }
}
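
In the sample message every value arrives as a string, and fields that are not available are empty strings. The following Python sketch shows one possible way a consumer might decode such a message into typed values. The helper names (`blank_to_none`, `parse_review`), the choice of which fields to convert, and the timestamp layout are assumptions inferred from the sample above, not a documented contract.

```python
import json
from datetime import datetime
from typing import Any, Optional

# Assumption: these fields hold integers when populated (inferred from the sample).
INT_FIELDS = ("itemOverallRating", "itemReviewCount", "itemPercentRecommended")
# Assumption: timestamps use the "YYYY-MM-DD HH:MM:SS" layout seen in the sample.
DATE_FIELDS = ("reviewDate", "crawled")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def blank_to_none(value: Any) -> Optional[Any]:
    """Treat empty or whitespace-only strings as missing values."""
    if isinstance(value, str) and not value.strip():
        return None
    return value


def parse_review(message: str) -> dict:
    """Decode one review message and convert selected fields to typed values."""
    review = json.loads(message)["review"]
    parsed = {key: blank_to_none(value) for key, value in review.items()}

    for field in INT_FIELDS:
        if parsed.get(field) is not None:
            parsed[field] = int(parsed[field])
    for field in DATE_FIELDS:
        if parsed.get(field) is not None:
            parsed[field] = datetime.strptime(parsed[field], DATE_FORMAT)

    # Lowercase the rating keys so "RatingNum" and "ratingNum" are both handled.
    raw_rating = review.get("reviewRating") or {}
    rating = {k.lower(): blank_to_none(v) for k, v in raw_rating.items()}
    raw, num = rating.get("rawscore"), rating.get("ratingnum")
    parsed["reviewRating"] = {
        "rawscore": float(raw) if raw is not None else None,
        "ratingnum": int(num) if num is not None else None,
    }
    return parsed


# A trimmed-down copy of the sample message, used only for illustration.
sample = json.dumps({"review": {
    "id": "164.4a8dbe082.6e9eb3046c5f47f612606011ad1c159c",
    "text": " ",
    "reviewDate": "2022-08-01 00:00:00",
    "crawled": "2022-08-18 10:58:03",
    "reviewRating": {"RawScore": "8.0", "ratingNum": "80"},
    "recommend": "",
    "itemReviewCount": "1401",
}})

parsed = parse_review(sample)
print(parsed["reviewDate"], parsed["itemReviewCount"], parsed["reviewRating"])
```

Converting blanks to `None` keeps "not available" distinct from a genuine zero, which matters for fields such as `recommend`, where 0 means "no".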