YouTube
YouTube videos and comments are discovered and collected through YouTube channel subscriptions and generic keyword searches.
Average volumes: 885,833 new videos and 23,677,166 new comments.
Streaming JSON Schema Mapping
Videos field name | Field description | Data example |
|---|---|---|
video | The parent element that contains information about an individual video | "video": { |
id | An ID generated by the crawler to uniquely identify each video. | yt.FauOVXgJ0zU |
siteid | A unique text value that identifies the video site (e.g. youtube). | youtube |
site_videoid | An identifier created by the video site to uniquely identify each video | FauOVXgJ0zU |
title | The title of the video. | maaf jelek |
description | The description of the video. Not always included. | |
thumb_url_code | The URL of the video's thumbnail image. | |
tags | Tags that describe the content of the video. Not always included. | [] |
category | The category the video was assigned | People and Blogs |
published | The date/time the video was published (UTC) | 2024-08-13T13:11:01Z |
crawled | The date/time the video was harvested (UTC) | 2024-08-16T14:30:11Z |
duration | The length of the video in seconds | 17 |
videourl | The URL where the video was found. | |
author | The parent element that contains author information | "author": { |
author/name | The name of the author who submitted the video | Prinsca gaming |
author/site_authorid | An identifier created by the video site to uniquely identify each author | UCwmB2IO7xlu4fmTdH6Ul36A |
author/authorurl | The URL to the author's profile page | |
lang | The language code identifying the language of the video. | id |
langid | The Socialgist internal mapped ID for this language | 40 |
comment | The parent element that contains information about an individual comment | "comment": { |
id | An ID generated by the crawler to uniquely identify each video or comment. | yt.Ugxb8X5GgmyTo0M24FF4AaABAg |
videoid | The ID of the Video that the comment is associated with. This is used by BoardReader as the Thread ID. | yt.9IHwqdz8Xhw |
siteid | A unique text value that identifies the video site (e.g. youtube). | youtube |
site_commentid | An identifier created by the video site to uniquely identify each comment | UgxPGSMDXNWXumpJw8d4AaABAg |
title | Some video sites include a title for each comment. In most cases, the title is simply the first part of the comment text. | অতীত এর ভুল থেকে শিক্ষা নিবেন। এ দেশ হতে আ লিগ শব্দ টা চিরতরে |
content | The text of the comment. | অতীত এর ভুল থেকে শিক্ষা নিবেন। এ দেশ হতে আ লিগ শব্দ টা চিরতরে কবর দিতে হবে জনগনের কাছে আইডল হতে হলে পরানতিক পর্যায়ে রিদয়ের ভিতর ডুকতে হলে জিয়াউর রহমানের আদর্শ উপলব্ধি করবেন। নির্বাচন নিয়ে মাথা….. |
published | The date/time the comment was published (UTC) | 2024-08-15T18:56:03Z |
crawled | The date/time the comment was harvested (UTC) | 2024-08-16T13:50:42Z |
videourl | The URL where the video was found. | |
commentsurl | A URL that displays all the comments for the video. For YouTube this is limited to the most recent 1,000 comments. | |
author | The parent element that contains information about the author of the comment | "author": { |
author/name | The comment author’s name or video account name | @kingtexzone5051 |
author/site_authorid | The video site's own ID for the author | UCUb5_lzoVTJ67OienqU5FLQ |
author/authorurl | The URL for the author of the comment. In some cases it can also be the author of the video. | |
author/profile_picture | The author profile avatar or picture. | |
lang | The language code identifying the language used in the comment. | bn |
langid | The Socialgist internal mapped ID for this language | 8 |
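The table above says that a comment's `videoid` holds the crawler ID of its parent video and serves as the thread ID. The following is a minimal Python sketch of how a consumer might rebuild threads from that field. The function names are hypothetical, and the `yt.` prefix rule in `crawler_video_id` is only an assumption inferred from the example values (`yt.FauOVXgJ0zU` and `FauOVXgJ0zU`), not a documented guarantee.

```python
from collections import defaultdict


def crawler_video_id(site_videoid, prefix="yt."):
    """Build a crawler-style video ID from the site's own video ID.

    Assumption: the crawler ID is the site ID with a "yt." prefix, as the
    example rows suggest. The documentation does not state this rule.
    """
    return f"{prefix}{site_videoid}"


def group_comments_by_video(comments):
    """Group comment payloads (dicts) by the crawler ID of their parent video."""
    threads = defaultdict(list)
    for comment in comments:
        # "videoid" is the field the table describes as the thread ID.
        threads[comment["videoid"]].append(comment)
    return dict(threads)
```

Each key of the returned dictionary is a video's crawler ID, so it can be matched against the `id` field of a video message.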
SAMPLE MESSAGE TYPES
Video
{
"video": {
"id": "yt.MK6nBKODkVE",
"siteid": "youtube",
"site_videoid": "MK6nBKODkVE",
"title": "🟢 Thomas train exe vs Sonic the headgehog exe vs Siren Head vs Spider House Head 🌟 Who is best?",
"description": "🟢 Thomas train exe vs Sonic the headgehog exe vs Siren Head vs Spider House Head 🌟 Who is best? Tiles Hop EDM\n\n🎥🔥 Thanks for watching! Don't forget to leave a comment and be sure to watch to the end! 🙌💬🌟\n\n\n🎶 Welcome to the mesmerizing world of Adventure TilesHop! 🌟\n\nMy content uses some images and songs from other creators and brands to create music gaming videos for relaxation and entertainment. I have used these images and songs based on the fair use guidelines of Section 107 of the Copyright Act..\nThis content is intended for recreational and entertainment purposes only.\nThis is a music game. Please use headphones for the best experience.\n\nLike 👍\nComment ✍️\nSubscribe ✅\nShare 🙏\n\nFor more gaming content subscribe to Adventure TilesHop\n\n#tileshop #tileshopeveryday #coffindance #choochoocharles #sirenhead #mcqueeneater #thomastrainexe",
"thumb_url_code": "https://i.ytimg.com/vi/MK6nBKODkVE/default_live.jpg",
"tags": [
"tiles hop",
"tiles hop every day",
"choo choo charles tiles hop",
"lightning mcqueen tiles hop",
"skibidi toilet tiles hop",
"thomas train tiles hop",
"spider thomas tiles hop",
"coffin dance tiles hop",
"tiles hop edm rush",
"car eater tiles hop",
"sonic tiles hop",
"sonic exe tiles hop",
"death sonic tiles hop",
"siren head tiles hop",
"eater tiles hop",
"tiles hop song",
"coffin dance",
"sonic the hedgehog tiles hop",
"sonic exe coffin dance",
"siren head coffin dance",
"house head tiles hop",
"thomas train exe"
],
"category": "Gaming",
"published": "2024-08-16T11:28:45Z",
"crawled": "2024-08-16T15:47:20Z",
"duration": "3143",
"videourl": "https://www.youtube.com/watch?v=MK6nBKODkVE",
"author": {
"name": "Adventure TilesHop",
"site_authorid": "UCiW8EXSscOMq2pQ-lGsYMsw",
"authorurl": "http://youtube.com/channel/UCiW8EXSscOMq2pQ-lGsYMsw"
},
"lang": "en",
"langid": "22"
}
}
Comment
{
"comment": {
"id": "yt.UgxdiWsNRHBmpDlNggl4AaABAg.A79Qt_KSQPfA79h8Tq71DS",
"videoid": "yt.s7BjsDqHsto",
"siteid": "youtube",
"site_commentid": "UgxdiWsNRHBmpDlNggl4AaABAg.A79Qt_KSQPfA79h8Tq71DS",
"title": "How to tell I know nothing about motorcycle without saying I know nothing about motorcycle👆🏼",
"content": "How to tell I know nothing about motorcycle without saying I know nothing about motorcycle👆🏼",
"published": "2024-08-15T12:25:45Z",
"crawled": "2024-08-16T15:47:51Z",
"videourl": "https://www.youtube.com/watch?v=s7BjsDqHsto",
"commentsurl": "https://www.youtube.com/watch?v=s7BjsDqHsto&lc=UgxdiWsNRHBmpDlNggl4AaABAg.A79Qt_KSQPfA79h8Tq71DS",
"author": {
"name": "@surendarmohan9878",
"site_authorid": "UC-4A4p2rBFtcktdQfWLSgGg",
"authorurl": "http://www.youtube.com/@surendarmohan9878",
"profile_picture": "https://yt3.ggpht.com/ytc/AIdro_kx74ZYm1k0KoPfPQzXHltKaMnmZjeOLGI9rsIGnAgMQ0Ro=s48-c-k-c0x00ffffff-no-rj"
},
"lang": "en",
"langid": "22"
}
}
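The two samples above differ only in their top-level key (`video` or `comment`), and both carry `duration` and `langid` as quoted strings. The sketch below shows one way a consumer might tell the message types apart and normalize a few fields. `parse_message` and `parse_timestamp` are hypothetical helper names, and the assumption that every message contains exactly one of the two top-level keys comes from the samples alone.

```python
import json
from datetime import datetime


def parse_timestamp(value):
    """Parse a UTC timestamp such as 2024-08-16T15:47:20Z into an aware datetime."""
    # Older Python versions do not accept the trailing "Z" in fromisoformat,
    # so it is replaced with an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_message(raw):
    """Decode one streaming message and normalize a few commonly used fields."""
    message = json.loads(raw)
    if "video" in message:
        kind, body = "video", message["video"]
    elif "comment" in message:
        kind, body = "comment", message["comment"]
    else:
        raise ValueError("message is neither a video nor a comment")

    record = {
        "kind": kind,
        "id": body["id"],
        "published": parse_timestamp(body["published"]),
        "crawled": parse_timestamp(body["crawled"]),
        # The samples quote langid as a string; int() accepts both forms.
        "langid": int(body["langid"]),
        "author_name": body["author"]["name"],
    }
    if kind == "video":
        record["duration_seconds"] = int(body["duration"])
    else:
        record["videoid"] = body["videoid"]
    return record
```

Converting `duration` and `langid` with `int()` keeps the sketch tolerant of either quoted or unquoted numbers, since the schema table and the samples show both forms.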
REST API JSON Schema Mapping
Element or Attribute Name | Description | Included in Response? (Basic mode) | Included in Response? (Full mode) |
|---|---|---|---|
Id | Unique ID of the Video (or Comment) | x | x |
Title | The Video's title. For Video comments, the title is the same as the comment's text. | | x |
Text | Video description or the text of the comment | x | x |
Text/@truncated | When present, a value of 'true' indicates that the contents of the Text element have been abbreviated. | x | x |
TitleHtml | The HTML representation of the Post's title (or Comment's text). This element is only displayed if body=html or both. | | x |
TextHtml | The HTML representation of the Post. This element is only displayed if body=html or both. | x | x |
TextHtml/@truncated | When present, a value of 'true' indicates that the contents of the TextHtml element have been abbreviated. | x | x |
ThreadId | A unique identifier of the Thread (i.e. a video post and its related comments) | x | x |
VideoTitle | Same as Title. Deprecated and included only for backwards compatibility. | | x |
Published | The date/time the Post was published (GMT) | x | x |
Inserted | The date/time the Post was inserted into the BoardReader database (GMT) | x | x |
Crawled | The date/time the Post was harvested by our crawler (GMT). | | x |
AuthorInfo/Id | The unique identifier for the author | | x |
AuthorInfo/Url | A link to the author's profile | | x |
AuthorInfo/Name | The author's name | | x |
AuthorInfo/SiteAuthorId | The video site's unique identifier for the author | | x |
AuthorInfo/FirstName | The author's first name | | x |
AuthorInfo/LastName | The author's last name | | x |
Url | The URL where the Video or Comment was found (i.e. permalink page). | x | x |
ThumbnailUrl | The URL of the Video's thumbnail image | | x |
Language | The language of the video or comment text | x | x |
Category | Category of the Video as it appeared on the source site | | x |
Tags | A string containing the terms used by the video author to "tag" the post. | | x |
Duration | Duration of the video in seconds | | x |
ExtKey | A reference ID for the site. | | |
IsComment | A Boolean value that specifies whether a Post is a Comment. Values are '0' (false) and '1' (true). | | x |
SiteInfo/Id | The unique identifier for the Video site where the Post or Comment appeared | | x |
SiteInfo/Name | The code name of the Video site used by our harvesting system | | x |
SiteInfo/SiteUrl | The base URL of the Video site. | | x |
PostSize | The size of the Post in characters | x | x |
VideoStats | The VideoStats object contains statistics about the video. This object is only displayed for API Keys that have Premium Video enabled. | | x |
VideoStats/FavouriteCount | The favorite count at the time the video’s stats were checked | | x |
VideoStats/NumLikes | The number of likes at the time the video’s stats were checked | | x |
VideoStats/NumDislikes | The number of dislikes at the time the video’s stats were checked | | x |
VideoStats/CommentCount | The number of comments as reported by the video site at the time the video’s stats were checked. | | x |
VideoStats/StatsUpdated | The date/time when the video's stats were last checked. The value is specified in ISO 8601 (YYYY-MM-DDThh:mm:ss.Z) format. | | x |
CommentsInThread | Obsolete | x | x |
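The table distinguishes elements returned in Basic mode from those returned only in Full mode, and documents `IsComment` as the strings '0' and '1'. Below is a rough Python sketch of how a client might act on that. It assumes each post arrives as a flat dictionary whose keys match the element names in the table, which the documentation does not confirm. `BASIC_ELEMENTS`, `missing_basic_elements`, and `is_comment` are hypothetical names, and `TextHtml` is left out of the set because it only appears when `body=html` or `both`.

```python
# Top-level elements the table marks as present in Basic mode.
# TextHtml is omitted because it depends on the body parameter.
BASIC_ELEMENTS = frozenset({
    "Id", "Text", "ThreadId", "Published", "Inserted",
    "Url", "Language", "PostSize", "CommentsInThread",
})


def missing_basic_elements(post):
    """Return the Basic-mode elements absent from a post dict, sorted by name."""
    return sorted(BASIC_ELEMENTS - set(post))


def is_comment(post):
    """Interpret IsComment ('0' or '1'); return None when the element is absent.

    IsComment is listed as Full mode only, so a Basic-mode post cannot be
    classified from this flag.
    """
    flag = post.get("IsComment")
    if flag is None:
        return None
    return str(flag) == "1"
```

Returning `None` instead of `False` for a missing flag avoids silently treating Basic-mode posts as videos.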
Mapping with example
Videos Field Name | Field Description | Data Example |
|---|---|---|
Id | Unique ID used in API to identify video | 71330637727320943 |
Title | Title of the video on YouTube | Charli xcx - party 4 u (official video) |
Text | Description of video created by author | Charli xcx - party 4 u (official video)\nStream: https://charlixcx.lnk.to/hifn-5yearsAY\n\nDirector: Mitch Ryan\n \nManagement: Sam Pringle, Twiggy Rowley, Brandon Creed\n \nCreative Director: Imogene Strauss \n\nAtlantic Records \nVP, Creative: Andrew Reid\nEVP, Marketing: Marisa Aron\nCoordinator Creative: Caroline DeFranco\n \nProduction Company: Cadence Films\nEP/Co-Founder: Lorenzo Ragioneri\nEP/HOP: Jeff Sommar\nProducer: Sarah Park\n \nDirector of Photography: Ben Carey\n1st AC: Pancho Ortiz\n2nd AC: Fred Porras\nDIT: Gianennio Salucci\n \nProduction Designer: Miranda Lorenz\nArt Director: Matt Toth\nProp Master: Jillian Oliver\nSet Decorator: Piper Reilly\nLeadman: June Castillo\nArt PA: Henri Wuilloud \nConstruction: Superior Scenery\nSPFX Coord: Omar Torres\nSPFX: Arnold Peterson\nSPFX: Jeremy Hays\nSPFX: Cody Canel\n \nProduction Supervisor: Nathan Israel\nCoordinator: Denise Nuqui\n1st AD: Josh Montes\n2nd AD: David Henriquez\n \nGaffer: Isaac Marziali\nACLT: Em Shafer \nSLT: Matt Hall\nSLT: James \"Ryan\" Copeland\nSLT: Aron Trejo\nSLT: Harrison Segal\n \nKey Grip: Jake Reeder\nBBG: Charlie McGlinsky\nGrip: Joe Lepp\nGrip: Cale Nichols\nGrip / Driver: Nick Binnette\n \nLocation Manager: Travis Beck\nLocations: Matthew Mehl\nPA - Office: Connor Levett\nPA - Prod Truck: Jesus Arroyo\nPA - Camera Truck: Jeff Cerritos\nPA - AD: Aiden Delgado\nPA - Set: Rogelio Vargas\nPA - Set: Jack Lind\nPA - Set: India Whatley\nPA - Pass Van: Noah Vasquez\nPA - Pass Van: Mateo Caldas\nVTR: Gennadi Balitski\nPlayback: Ignacio Martinez\nCraft Service: Will Schwartz\nMedic: Mike Smith\nMotorhome - Talent Verde: Don Clark\nMotorhome - Production Bravo 35: Josh May\nWater Truck: Mike Bossen\n \nStylist: Chris Horan\nHair Stylist: Matt Benns\nMake Up: Yasmin\n \nEditing Company: Cabin Editing Company\nEditor: Dylan Edwards \nAssistant Editors: Ryan Andrus \nAssistant Editors: Diego Astorga\nEdit Producer: Ginna Schilling \nHead of Production: Michelle Dorsch \nExecutive Producer: Britt Carson \nDirector of Production: Liz Lydecker\nManaging Director: Kim Christensen\nManaging Partner: Carr Schilling \n \nColor House: Ethos Studio \nColorist: Dante Pasquinelli \nProducer: Nat Tereshchenko \nColor assists: Annie Cater \nColor assists: Alexandra Makarenko\nHead of Production: Natasha Sattler \nMD/EP: Eliana Carranza-Pitcher \nFounder/EP: James Drew\n\nText 1-310-861-2831\n\nhttps://charlixcx.com/\nhttps://www.tiktok.com/@charlixcx\nhttps://www.instagram.com/charli_xcx\nhttps://twitter.com/charli_xcx\nhttps://www.youtube.com/@officialcharlixcx\nhttps://facebook.com/charlixcxmusic\n\n#Charlixcx#party4u #officialvideo |
ThreadId | ID used to group the comments related to this video | yt.agu22bqGHto |
VideoTitle | Title of the video on YouTube | Charli xcx - party 4 u (official video) |
Published | Timestamp of when video was published | 2025-05-15 16:00:04 |
Inserted | Timestamp of when video was inserted to index | 2025-05-15 18:54:39 |
Crawled | Timestamp of when video was crawled by system | 2025-05-15 18:54:39 |
AuthorInfo | Parent element that contains author information | "AuthorInfo": { |
Id |