Ingest Data with Lambda

In this chapter we will ingest data into the Kendra index and see the Lambda functions at work.

We will continue to use the terminal window in the Cloud9 environment for the command line. Please ensure that you are working in the ~/environment directory.

Prepare Test Data

  1. First, download the test data by running the following on the command line:
curl "https://d3v65tjf58jeni.cloudfront.net/KendraData.tgz" -o Data.tgz
  2. Verify that the data is correct by listing the contents of the archive:
tar ztvf Data.tgz
  3. In the output of the above command you should see a list of subfolders and files in a folder called Data.
  4. Extract the data using the command below. Note that this will create a folder called Data in your current directory.
tar zxvf Data.tgz
  5. Upload the Data folder, along with all its subfolders and files, to the S3 bucket that we configured as the data source of the Kendra index. As each file is copied to the bucket, it triggers the Lambda event handler, which makes an API call to add that file to the Kendra index. The call sets the appropriate FileType custom attribute, as well as user context attributes for access management based on the subfolder to which the file belongs. Use the following command, and don’t forget to replace the bucket name with the one you configured as the data source:
aws s3 cp Data/ s3://kendra-poc-ds-UNIQUE-SUFFIX/Data/ --recursive
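
Before moving on, it can help to confirm that every file actually reached the bucket. The sketch below is one possible check, not part of the original workshop: it compares the number of local files with the number of objects under the Data/ prefix. The helper names count_local_files and count_s3_objects are made up for illustration, and the bucket name is the same placeholder used above.

```shell
# Placeholder bucket name: replace with the bucket configured as your data source.
BUCKET="kendra-poc-ds-UNIQUE-SUFFIX"

# Count the regular files under a local directory (hypothetical helper).
count_local_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Count the objects under an S3 prefix (hypothetical helper).
# Lines ending in "/" are zero-byte folder placeholders, so they are skipped.
count_s3_objects() {
  aws s3 ls "s3://$1/$2" --recursive | grep -vc '/$'
}

# Example usage (run from ~/environment after the copy finishes):
#   count_local_files Data
#   count_s3_objects "$BUCKET" Data/
```

If the Data/ prefix held no other objects before the copy, the two commands should print the same number.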

Verify Lambda Execution for Data (Documents)

Open the Lambda management console, browse to the Lambda function we created earlier, and click on the Monitoring tab.

Lambda Monitoring

Click on View logs in CloudWatch and you will see the function's CloudWatch log group with its list of log streams, as below.

CloudWatch Log Groups

Click on one of the log streams and expand one of the log entries. It contains the information we log from the Lambda function, such as the details of the event being handled and the result of the batch-put-document API call that ingests the file into the Kendra index.

CloudWatch Logs
This confirms that the Lambda event handler has worked successfully.
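
The same log entries can also be read from the Cloud9 terminal. The sketch below is an optional alternative to the console steps, assuming the function writes to the default log group name /aws/lambda/<function-name>. The function name and the helper names lambda_log_group and recent_lambda_logs are hypothetical and should be adapted to your setup.

```shell
# Hypothetical function name: replace with the Lambda function created earlier.
FUNCTION_NAME="REPLACE-WITH-YOUR-FUNCTION-NAME"

# Build the default log group name for a Lambda function (hypothetical helper).
lambda_log_group() {
  printf '/aws/lambda/%s\n' "$1"
}

# Print the log messages from the last N minutes, 15 by default (hypothetical helper).
# --start-time expects milliseconds since the Unix epoch.
recent_lambda_logs() {
  log_minutes="${2:-15}"
  log_start_ms=$(( ( $(date +%s) - log_minutes * 60 ) * 1000 ))
  aws logs filter-log-events \
    --log-group-name "$(lambda_log_group "$1")" \
    --start-time "$log_start_ms" \
    --query 'events[].message' \
    --output text
}

# Example usage:
#   recent_lambda_logs "$FUNCTION_NAME" 30
```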

Try out searches on the documents we ingested in Kendra

Open the Kendra management console and browse to the Kendra index. Click on Search Console on the left side.

Kendra Console
Type something interesting such as “Amazon Kendra” in the search window and check out the results. Play with the facets on the left side. You can try a few more searches.
Kendra Search
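
A search can also be issued from the command line, which is a quick way to confirm that documents are in the index. The sketch below is a minimal example, assuming you copy the index ID from the Kendra console. The helper name kendra_titles is made up for illustration, and the index ID shown is a placeholder.

```shell
# Placeholder index ID: copy the real one from the Kendra console.
INDEX_ID="REPLACE-WITH-YOUR-INDEX-ID"

# Print the titles of the documents returned for a query (hypothetical helper).
# --query-text is the search string; --query is the CLI's JMESPath output filter.
kendra_titles() {
  aws kendra query \
    --index-id "$1" \
    --query-text "$2" \
    --query 'ResultItems[].DocumentTitle.Text' \
    --output text
}

# Example usage:
#   kendra_titles "$INDEX_ID" "Amazon Kendra"
```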

Verify Lambda Execution for Images

  1. Download the image file to be ingested:
curl "https://d3v65tjf58jeni.cloudfront.net/yellowstone.png" -o yellowstone.png
  2. Copy the image file to the Images folder in the S3 bucket that is configured as the data source for the Kendra index. Don’t forget to replace the bucket name below with the one that you have configured.
aws s3 cp yellowstone.png s3://kendra-poc-ds-UNIQUE-SUFFIX/Images/
  3. Verify the Lambda execution the same way we verified it for documents.
  4. In the Kendra index search console, try out the query “Where is yellowstone?” and you should see the results. Click on the link in a result to open it, as shown in the images below.
    Kendra yellowstone
    Kendra images

Congratulations! We now have a working Kendra index populated with data. Let’s move forward to develop a web application to perform the searches.