Segment - How to send historical data to MadKudu via a Segment Replay

Pre-requisites

When you connect Segment and MadKudu, Segment sends data to MadKudu only from the day you activated the MadKudu + Segment integration. To train and monitor models, MadKudu needs at least 9 months of historical data as a one-off drop.

If you are on a Business Plan or higher with Segment, you'll need to request a Replay for your Segment account to trigger a transfer of your historical Segment source data to MadKudu. Learn more about Segment Replay.

If you are not on a Business Plan with Segment, please refer to this article.

Replays are not self-serve in Segment, so you'll need to contact their team directly to request a Replay, specifying the following:

  • from your workspace source(s): your website, app backend and/or front end... Please make sure the Replay includes the Identify, Track, Page and Group calls, as needed. For each source, specify:

    • Name

    • SourceID

  • to the MadKudu destination

  • for the time period of 9 months

    • Start date: 9 months before the day you've connected Segment and MadKudu

    • End date: current time

  • All events or only a subset?

    • If you include too many events, the data transfer may be delayed, so we advise you to curate the list of events you want to send to MadKudu.

    • Make sure to only send App usage data from Segment, not webpage visits or hand-raising events if they can be tracked elsewhere (from SFDC campaigns or Marketo for example).

    • Make sure MadKudu doesn’t pull non-user activities: system events that are not actions performed by users (like enrichment_provider).

    • Make sure MadKudu doesn’t pull low-value activities: any notification, background event, system event, any event performed lots of times but with a low value (example: clicking an unimportant button).

    • Make sure MadKudu doesn’t pull obsolete events, meaning events that have not occurred in the past 90 days.

    • Please remove all these activities from your Segment Replay.
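The details listed above can be prepared before you contact Segment. The following is a minimal Python sketch, not a Segment or MadKudu API: it computes a start date 9 months before the connection date and keeps only events that are user-initiated and were seen in the past 90 days. The function names, the shape of the `events` dictionary and the sample event names (other than enrichment_provider) are hypothetical and serve only as illustration.

```python
import calendar
from datetime import date, timedelta


def replay_start_date(connected_on: date, months: int = 9) -> date:
    """Return the date `months` months before `connected_on`."""
    # Count months since year 0, step back, then split into year and month.
    total = connected_on.year * 12 + (connected_on.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day so that e.g. the 30th maps to the last day of February.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(connected_on.day, last_day))


def curate_events(events: dict, today: date, max_age_days: int = 90) -> list:
    """Keep user-initiated events seen within the last `max_age_days` days.

    `events` is a hypothetical inventory of the form
    {event_name: {"user_initiated": bool, "last_seen": date}}.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name
        for name, info in events.items()
        if info["user_initiated"] and info["last_seen"] >= cutoff
    )


if __name__ == "__main__":
    today = date(2024, 5, 31)
    # Hypothetical event inventory; replace with your own tracking plan.
    inventory = {
        "report_created": {"user_initiated": True, "last_seen": date(2024, 5, 20)},
        "enrichment_provider": {"user_initiated": False, "last_seen": date(2024, 5, 30)},
        "legacy_export": {"user_initiated": True, "last_seen": date(2023, 11, 1)},
    }
    print("Start date:", replay_start_date(today))
    print("Events to replay:", curate_events(inventory, today))
```

Low-value events (notifications, unimportant clicks) are a judgment call that this sketch does not automate; remove those from the resulting list by hand before sending it to Segment.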

Important note: Depending on the volume of data to transfer, the Replay may take from a few hours to a few weeks (>500M records), given the Segment API rate limit on the Replay to S3 and the rate limit on the pull from S3 to MadKudu. We recommend sending only the Segment events needed to configure MadKudu and filtering out high-volume events that have no relevance to your behavioral models (e.g. non-user events, very minor events, etc.); otherwise your implementation may be delayed until we receive the full data history. We recommend a Segment rate of between 80 and 150 events per second. Please refer to this documentation or consult our implementation team at success@madkudu.com to understand which events are relevant to configuring your scoring and which are not.
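To get a rough sense of how event volume translates into transfer time, the sketch below divides a total event count by a sustained rate. It is a back-of-the-envelope estimate only: the function name is hypothetical, it assumes the Replay runs continuously at the given rate, and it does not model the separate limit on the pull from S3 to MadKudu.

```python
SECONDS_PER_DAY = 86_400


def replay_duration_days(total_events: int, events_per_second: float) -> float:
    """Estimate Replay duration in days at a constant events-per-second rate."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return total_events / events_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # 500M records at the two ends of the recommended 80-150 events/s range.
    for rate in (150, 80):
        days = replay_duration_days(500_000_000, rate)
        print(f"{rate} events/s: about {days:.1f} days")
```

Under these assumptions, 500M records take roughly 39 days at 150 events per second and roughly 72 days at 80 events per second, which is why trimming high-volume, low-relevance events before the Replay matters.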
