This tutorial shows how to integrate Upstash Kafka with Apache Pinot.
Apache Pinot is a real-time distributed OLAP
(Online Analytical Processing) data store designed to run
OLAP queries with low latency. It can ingest data from batch data sources
or from streaming sources such as Upstash Kafka.
Create a Kafka cluster using
Upstash Console or
Upstash CLI by following
Getting Started.
Create one topic by following the creating topic
steps. This topic will be
the source for the Apache Pinot table. Let’s name it “transcript” for this
tutorial.
You need a host to run Apache Pinot. For this quick setup, you can run it on
your local machine.
First, download Docker. Running Apache Pinot in a Docker container
is a better option than running it directly on your machine.
Once you have Docker on your machine, you can follow the steps in
Getting Started
to run Apache Pinot in Docker.
In short, you will need to pull the Apache Pinot image by running the following
command.
docker pull apachepinot/pinot:latest
Create a file named docker-compose.yml with the following content.
From your terminal, go into the directory that contains this file and run the
following command to start Pinot.
docker-compose --project-name pinot-demo up
Now, Apache Pinot should be up and running. You can check it by running:
docker container ls
You should see output like this:
CONTAINER ID   IMAGE                     COMMAND                  CREATED              STATUS              PORTS                                                                                     NAMES
ba5cb0868350   apachepinot/pinot:0.9.3   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8099/tcp, 9000/tcp                                                                   pinot-server
698f160852f9   apachepinot/pinot:0.9.3   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8098/tcp, 9000/tcp, 0.0.0.0:8099->8099/tcp, :::8099->8099/tcp                        pinot-broker
b1ba8cf60d69   apachepinot/pinot:0.9.3   "./bin/pinot-admin.s…"   About a minute ago   Up About a minute   8096-8099/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                  pinot-controller
54e7e114cd53   zookeeper:3.5.6           "/docker-entrypoint.…"   About a minute ago   Up About a minute   2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp                   pinot-zookeeper
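The same check can be scripted. The following is a minimal sketch, not part of the original tutorial: it assumes the four container names shown in the listing above, and the helper names `missing_containers` and `running_container_names` are hypothetical. It relies on Docker's `--format "{{.Names}}"` option to print one container name per line.

```python
import subprocess
from typing import Set

# Container names taken from the sample `docker container ls` output above.
EXPECTED = {"pinot-zookeeper", "pinot-controller", "pinot-broker", "pinot-server"}


def missing_containers(names_output: str) -> Set[str]:
    """Return the expected container names that are absent from the given output."""
    running = {line.strip() for line in names_output.splitlines() if line.strip()}
    return EXPECTED - running


def running_container_names() -> str:
    """Ask Docker for the names of running containers, one per line."""
    result = subprocess.run(
        ["docker", "container", "ls", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Offline demonstration with a hand-written sample instead of live Docker output.
sample = "pinot-server\npinot-broker\npinot-controller\n"
print("Not running:", sorted(missing_containers(sample)))
```

Against a live setup, call `missing_containers(running_container_names())`; an empty set means all four containers are up.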
Next, add a table to Pinot to store the data streamed from the Kafka
topic.
Open http://localhost:9000/ in your
browser.
Click on the “Tables” section.
First, click on “Add Schema” and fill in the form until you see the following JSON as
your schema config.
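As an illustration of what a schema config for this table can look like, here is a hedged sketch that builds one in Python and prints it as JSON. Only the schema name “transcript” comes from this tutorial. The field names and types are assumptions borrowed from the transcript example in Apache Pinot's own getting-started material, and `build_transcript_schema` is a hypothetical helper.

```python
import json


def build_transcript_schema() -> dict:
    """Build a Pinot schema config for the "transcript" table (field list is assumed)."""
    return {
        "schemaName": "transcript",
        # Dimensions: columns you filter and group by.
        "dimensionFieldSpecs": [
            {"name": "studentID", "dataType": "INT"},
            {"name": "firstName", "dataType": "STRING"},
            {"name": "lastName", "dataType": "STRING"},
            {"name": "gender", "dataType": "STRING"},
            {"name": "subject", "dataType": "STRING"},
        ],
        # Metrics: numeric columns you aggregate.
        "metricFieldSpecs": [
            {"name": "score", "dataType": "FLOAT"},
        ],
        # Time column, stored as epoch milliseconds.
        "dateTimeFieldSpecs": [
            {
                "name": "timestamp",
                "dataType": "LONG",
                "format": "1:MILLISECONDS:EPOCH",
                "granularity": "1:MILLISECONDS",
            }
        ],
    }


print(json.dumps(build_transcript_schema(), indent=2))
```

Whatever fields you choose, the JSON messages sent to the Kafka topic need to use the same field names as the schema.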
Click save, then click “Add Realtime Table”, since we will stream the data
in real time.
On this page, the table name must be the same as the schema name, which is
“transcript” in this case.
Then, scroll down on this page and replace the “segmentsConfig” and “tableIndexConfig”
sections in the table config with the following JSON. Do not
forget to replace the UPSTASH-KAFKA-* placeholders with your cluster information.
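To show how the two sections fit together, the sketch below generates them in Python. It is an approximation under stated assumptions rather than the tutorial's exact config: the placeholder suffixes (`-ENDPOINT`, `-USERNAME`, `-PASSWORD`), the port `9092`, the flush threshold, and the time column name `timestamp` are assumed, and `build_realtime_sections` is a hypothetical helper. The stream config keys are standard Pinot Kafka settings, and the SASL settings assume SCRAM-SHA-256 over SASL_SSL.

```python
import json


def build_realtime_sections(endpoint: str, username: str, password: str,
                            topic: str = "transcript") -> dict:
    """Build the segmentsConfig and tableIndexConfig sections of a realtime table."""
    # JAAS line for SCRAM authentication (assumed mechanism: SCRAM-SHA-256).
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "segmentsConfig": {
            "timeColumnName": "timestamp",  # assumed name of the schema's time column
            "timeType": "MILLISECONDS",
            "schemaName": "transcript",
            "replicasPerPartition": "1",
        },
        "tableIndexConfig": {
            "loadMode": "MMAP",
            "streamConfigs": {
                "streamType": "kafka",
                "stream.kafka.consumer.type": "lowlevel",
                "stream.kafka.topic.name": topic,
                "stream.kafka.decoder.class.name":
                    "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
                "stream.kafka.consumer.factory.class.name":
                    "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                "stream.kafka.broker.list": endpoint,
                "security.protocol": "SASL_SSL",
                "sasl.mechanism": "SCRAM-SHA-256",
                "sasl.jaas.config": jaas,
                "realtime.segment.flush.threshold.rows": "50000",  # assumed value
                "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            },
        },
    }


# Placeholder values; substitute your own cluster details before use.
sections = build_realtime_sections(
    "UPSTASH-KAFKA-ENDPOINT:9092",
    "UPSTASH-KAFKA-USERNAME",
    "UPSTASH-KAFKA-PASSWORD",
)
print(json.dumps(sections, indent=2))
```

Generating the JSON this way keeps the credentials in one place and avoids hand-escaping the quotes inside `sasl.jaas.config`.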
Now, let’s produce some events to our Kafka topic. Go to the Upstash console, click
on your cluster, then Topics, then “transcript”. Select the Messages tab, then click
Produce a new message. Send a message in JSON format like the one below:
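For illustration, a message body can be generated as follows. This is a sketch, not the tutorial's original sample: the field names and values are assumptions that mirror the transcript example from Apache Pinot's getting-started material, and `build_transcript_message` is a hypothetical helper. The fields must match whatever schema you saved in Pinot.

```python
import json
import time
from typing import Optional


def build_transcript_message(student_id: int, first_name: str, last_name: str,
                             gender: str, subject: str, score: float,
                             timestamp_ms: Optional[int] = None) -> str:
    """Serialize one transcript event as a JSON string for the Kafka topic."""
    if timestamp_ms is None:
        # Default to the current time in epoch milliseconds.
        timestamp_ms = int(time.time() * 1000)
    return json.dumps({
        "studentID": student_id,
        "firstName": first_name,
        "lastName": last_name,
        "gender": gender,
        "subject": subject,
        "score": score,
        "timestamp": timestamp_ms,
    })


# Illustrative values only.
print(build_transcript_message(205, "Natalie", "Jones", "Female", "Maths", 3.8,
                               1571900400000))
```

Paste the printed JSON into the message box in the Upstash console.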
Now, go back to the Pinot console in your browser. Navigate to “Query Console”
from the left sidebar. When you click on the “transcript” table, you will see the
result of the following query automatically.
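The same kind of query can also be sent to Pinot from code. The sketch below assumes the query is a plain `select * from transcript limit 10` and that the broker is reachable on port 8099, as mapped in the container listing earlier. It posts to the broker's `/query/sql` endpoint; `build_query_request` and `run_query` are hypothetical helper names.

```python
import json
import urllib.request

# Broker port 8099 is the one published by the pinot-broker container.
BROKER_URL = "http://localhost:8099/query/sql"


def build_query_request(sql: str, url: str = BROKER_URL) -> urllib.request.Request:
    """Build a POST request carrying the SQL statement as a JSON body."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str) -> dict:
    """Send the query to the broker and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(sql), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Build the request without sending it, so this runs even when Pinot is down.
request = build_query_request("select * from transcript limit 10")
print(request.get_method(), request.full_url)
```

With Pinot running, `run_query("select * from transcript limit 10")` returns the broker's JSON response, which should include the message produced in the previous step.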