How to send data from Amazon Redshift to MadKudu?

If your data is hosted in Amazon Redshift, you will need to:

  1. export the data to an S3 bucket in this format, and

  2. grant MadKudu access to the bucket by following these instructions,

and we'll then be able to start pulling your data.

Note: If you're looking to export data for a Segment Warehouse in Redshift, please follow these instructions instead.


Exporting your Redshift data to S3

With Redshift, you can UNLOAD data directly to an S3 bucket. For the file format, please follow the same spec as our flat file import.
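For example, if your raw event data lives in a table whose columns don't match that spec, a simple mapping view can rename them before the UNLOAD commands shown in the sections below. The table and column names here (product_events, event_name, user_email, etc.) are purely illustrative; adapt them to your own schema.

-- Hypothetical raw table and column names; adapt to your own schema
CREATE VIEW madkudu_events AS
SELECT
    event_name    AS event_text,
    event_time    AS event_timestamp,
    user_email    AS contact_key,
    plan_name     AS event_attribute_1,
    feature_used  AS event_attribute_2
FROM product_events;

You would then reference madkudu_events instead of [your_table] in the UNLOAD command.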


Event

Once you've mapped the data into the expected format (for example with a view like the one sketched above), you can export it to S3 with a command like this:

UNLOAD ('SELECT event_text, event_timestamp, contact_key, event_attribute_1, event_attribute_2 FROM [your_table]')
TO 's3://madkudu-data-in-[your_org_id]/event/event.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;

Where:

  • SELECT → modify the query to match your own column names

  • CREDENTIALS → provided by MadKudu in the app under Integrations > S3 (if you are using our S3 bucket; otherwise, enter the credentials of your own bucket)

Please use EXACTLY the following options, as this tends to cause fewer file corruption issues:

  • HEADER → include a header line with the column names

  • DELIMITER '~' → delimit values with a ~ (abc~efg~...)

  • ADDQUOTES → add quotes around values (abc → "abc")

  • ESCAPE → escape special characters (delimiter, quotes, newlines) with a backslash

  • MANIFEST → write a manifest file listing all the output files so they can be read together

  • GZIP → compress the result

  • ALLOWOVERWRITE → replace the existing files with the new ones instead of keeping all files

    • Recommended when:

      • the files are big

      • you send the full table every day, including the increments (rather than just the increments)

    • Not recommended when you want to keep a history, especially if you only send the increments: use a naming pattern (e.g. the date) for the file names instead, so that if a pull fails at some point the missed file is not lost (see the sketch after the note below)

  • PARALLEL ON → write multiple files at a time to go faster

NOTE: with these settings, every time you execute this command the existing files will be overwritten.
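If you would rather keep a history than overwrite (the naming-pattern approach mentioned in the ALLOWOVERWRITE bullet above), one option is to drop ALLOWOVERWRITE and unload to a dated prefix. The date below is only a placeholder that your scheduler would fill in, and you should confirm with MadKudu that your pull is set up for dated file names rather than a single fixed file:

UNLOAD ('SELECT event_text, event_timestamp, contact_key, event_attribute_1, event_attribute_2 FROM [your_table]')
TO 's3://madkudu-data-in-[your_org_id]/event/event_2024-01-15_'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
PARALLEL ON
;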

Contact

UNLOAD ('SELECT contact_key, email, attribute_1, attribute_2 FROM [your_table]')
TO 's3://madkudu-data-in-[your_org_id]/identify/identify.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;

Account

UNLOAD ('SELECT contact_key, account_key, account_attribute_1, account_attribute_2 FROM [your_table]')
TO 's3://madkudu-data-in-[your_org_id]/group/group.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
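Optionally, you can sanity-check what an UNLOAD wrote by querying Redshift's stl_unload_log system table in the same session, right after running the command. This is not required by MadKudu, just a quick way to confirm the files and row counts that were produced:

-- Files and row counts written by the most recent query in this session
SELECT query, path, line_count
FROM stl_unload_log
WHERE query = pg_last_query_id()
ORDER BY path;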