Objective of this article

This article describes how to send us a one-off drop of historical data from Segment if you're not on Segment's Business plan.

If you are on the Business plan, or have not connected Segment yet, please refer to the documentation for our Segment integration.

To export data from a non-Segment Redshift warehouse, please follow these instructions.

How to transfer the data 

There are two options to give us access to your historical Segment data:

(If you use another type of Segment warehouse, such as Postgres or BigQuery, you can still integrate with us: please follow the instructions to export your data to S3.)

If you did not enable any of those destinations before integrating with MadKudu, we might not be able to get your historical data immediately. Make sure to connect Segment to MadKudu so we can start accumulating behavioral data.

Option 1: Segment S3 destination

If you've been sending Segment data to an S3 destination in your AWS account, the easiest solution is to grant us access to that bucket.

Option 2: Segment Redshift warehouse

If you're using a warehouse but not an S3 destination, you will first need to "unload" the data from Redshift into an S3 bucket.

For the destination bucket, there are two options: an S3 bucket that you own, or a MadKudu bucket (madkudu-data-in-[your_org_id]) for which we provide credentials.

Whichever option you choose, the idea is then to run an UNLOAD statement in Redshift to transfer the data to S3, from where we can load it.

If you're using your own bucket:

UNLOAD ("SELECT ... FROM [your_table]")
TO 's3://your_bucket/your_folder/project/track.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'

(where the credentials are the ones you maintain internally)

If you're using our bucket:

UNLOAD ("SELECT ... FROM [your_table]")
TO 's3://madkudu-data-in-[your_org_id]/project/track.csv'
CREDENTIALS 'aws_access_key_id=[our_access_key];aws_secret_access_key=[our_secret_key]'

Tracks

For unloading track data, there are also two options:

1. The event properties are not important, in which case you can just UNLOAD the tracks table:
UNLOAD ("SELECT id, anonymous_id, user_id, event, event_text, sent_at FROM [source].tracks")
TO 's3://madkudu-data-in-[your_org_id]/[source]/track.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;

2. The event properties are important, in which case you need to do one UNLOAD per event table:

UNLOAD ("SELECT id, anonymous_id, user_id, event, event_text, sent_at, [properties] FROM [source].[event]")
TO 's3://madkudu-data-in-[your_org_id]/[source]/[event].csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;

where [properties] is the list of properties you want to send us, separated by commas.
For example: SELECT id, ..., product_name, referrer FROM [source].added_to_cart
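
To illustrate, here is a sketch of the full statement for a hypothetical added_to_cart event whose product_name and referrer properties you want to include, keeping the same options as above:

-- one UNLOAD per event table; repeat for each event you want to send us
UNLOAD ('SELECT id, anonymous_id, user_id, event, event_text, sent_at, product_name, referrer FROM [source].added_to_cart')
TO 's3://madkudu-data-in-[your_org_id]/[source]/added_to_cart.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;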

Identifies

UNLOAD ("SELECT id, anonymous_id, user_id, sent_at, [traits] FROM <source>.identifies")
TO 's3://madkudu-data-in-[your_org_id]/[source]/identify.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;

where [traits] is the list of traits you want to send us, separated by commas.
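
For example, assuming (hypothetically) that your identify calls record email, first_name, and company traits, the inner SELECT could look like this:

-- hypothetical traits; replace with the trait columns you actually track
SELECT id, anonymous_id, user_id, sent_at, email, first_name, company FROM [source].identifies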

Groups

UNLOAD ("SELECT id, anonymous_id, user_id, group_id, sent_at, [traits] FROM <source>.groups")
TO 's3://madkudu-data-in-[your_org_id]/[source]/page.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;

where [traits] is the list of group traits you want to send us, separated by commas.

Pages [optional]

Pages can be quite voluminous. Only send them to us if your page tracking contains important information that is not included in track calls.

UNLOAD ("SELECT id, anonymous_id, user_id, group_id, sent_at, [properties] FROM <source>.<event>")
TO 's3://madkudu-data-in-[your_org_id]/[source]/identify.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;

For [properties], we usually recommend at least title and url.
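
For example, with just those two recommended page properties, the inner SELECT could look like this (Segment's pages table stores them as the title and url columns):

-- add any other page properties you want to share alongside title and url
SELECT id, anonymous_id, user_id, sent_at, title, url FROM [source].pages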
