Setup TwitterMap

This page explains how to use Cloudberry and AsterixDB to set up a small instance of TwitterMap on a local machine. The following diagram illustrates its architecture: [Figure: architecture diagram]

System requirements:

  • Linux or Mac
  • At least 4GB memory

0. Setup the environment:

Step 0.1: Follow these instructions to install sbt.

Step 0.2: Clone the Cloudberry codebase from GitHub.

~> git clone

Suppose the repository is cloned to the folder ~/cloudberry.

1. Setup AsterixDB

Step 1.1: Create a directory named asterixdb in your home directory and move to that directory:

$ mkdir asterixdb
$ cd asterixdb

Step 1.2: Download from this link:

$ wget

Step 1.3: Uncompress the file:

$ unzip

Step 1.4: Move to the opt/local/bin directory.

$ cd opt/local/bin

Step 1.5: Execute to start the sample instance. Wait until you see the message “INFO: Cluster started and is ACTIVE.”

$ ./

INFO: Starting sample cluster...
INFO: Waiting up to 30 seconds for cluster to be available.
INFO: Cluster started and is ACTIVE.

Step 1.6: Run jps to check that one instance of “CCDriver” and two instances each of “NCService” and “NCDriver” are running:

$ jps
59264 NCService
59280 NCDriver
59265 CCDriver
59446 Jps
59263 NCService
59279 NCDriver
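
The check in Step 1.6 can also be scripted. The following is a minimal sketch, not part of the Cloudberry or AsterixDB distribution: the function names count_process and check_asterix_processes are hypothetical, and the code assumes that jps prints one "PID NAME" pair per line, as in the listing above.

```shell
# Count the lines of a jps listing (read from stdin) whose process name is $1.
count_process() {
  awk -v name="$1" '$2 == name { n++ } END { print n + 0 }'
}

# Succeed only if the listing on stdin contains exactly one CCDriver,
# two NCService and two NCDriver processes.
check_asterix_processes() {
  listing=$(cat)
  cc=$(printf '%s\n' "$listing" | count_process CCDriver)
  ncs=$(printf '%s\n' "$listing" | count_process NCService)
  ncd=$(printf '%s\n' "$listing" | count_process NCDriver)
  [ "$cc" -eq 1 ] && [ "$ncs" -eq 2 ] && [ "$ncd" -eq 2 ]
}
```

A possible use is: jps | check_asterix_processes && echo "AsterixDB processes are running"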

Step 1.7: Open the AsterixDB Web interface at http://localhost:19001 and issue the following query to verify that the AsterixDB instance is running.

select * from Metadata.`Dataverse`;
{ "Dataverse": { "DataverseName": "Default", "DataFormat": "org.apache.asterix.runtime.formats.NonTaggedDataFormat", "Timestamp": "Wed Mar 07 16:13:37 PST 2018", "PendingOp": 0 } }
{ "Dataverse": { "DataverseName": "Metadata", "DataFormat": "org.apache.asterix.runtime.formats.NonTaggedDataFormat", "Timestamp": "Wed Mar 07 16:13:37 PST 2018", "PendingOp": 0 } }
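
If you save the query result to a file, a short script can list the dataverse names it contains. This is only a sketch under the assumption that the result has the shape shown above, with each name in a "DataverseName": "..." pair; the helper names dataverse_names and has_dataverse are hypothetical.

```shell
# Print every DataverseName value found in the query result on stdin.
dataverse_names() {
  grep -o '"DataverseName": *"[^"]*"' | sed 's/.*: *"\(.*\)"/\1/'
}

# Succeed if the query result on stdin lists a dataverse named exactly $1.
has_dataverse() {
  dataverse_names | grep -qx "$1"
}
```

For example, has_dataverse Metadata < result.txt succeeds for the output above. After the ingestion in Step 2.2, the same check can be used to look for the twitter dataverse.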

Note: Before you shut down the system, run the following command in asterixdb/opt/local/bin to stop AsterixDB.

$ ./

The next time you want to start or stop your AsterixDB instance, use the following commands.

$ ~/asterixdb/opt/local/bin/
$ ~/asterixdb/opt/local/bin/

2. Setup Cloudberry and TwitterMap:

Step 2.1: Compile and run the Cloudberry server.

~/cloudberry> cd cloudberry
~/cloudberry> sbt compile
~/cloudberry> sbt "project neo" "run"

Wait until the shell prints the messages shown in the following screenshot: [Screenshot: neo]

Step 2.2: Open another terminal window to ingest sample tweets (about 47K) and US population data into AsterixDB.

~/cloudberry> cd ../examples/twittermap
~/twittermap> ./script/

When it finishes, you should see the messages shown in the following screenshot: [Screenshot: ingestion]
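
To double-check the ingestion, you can count the records in twitter.ds_tweet; the result should be about 47K. The sketch below is illustrative only. It assumes that the AsterixDB query service accepts a statement parameter at http://localhost:19002/query/service (this endpoint is not mentioned on this page, so adjust it to your installation), and the helper names count_statement and run_statement are hypothetical.

```shell
# Build a SQL++ statement that counts the records of dataset $2 in dataverse $1.
count_statement() {
  printf 'select count(*) from %s.%s;' "$1" "$2"
}

# Send the statement in $1 to the AsterixDB query service and print the reply.
# ASSUMPTION: the endpoint below is the query service of the local sample
# instance; override it with ASTERIX_QUERY_URL if yours differs.
run_statement() {
  curl -s --data-urlencode "statement=$1" \
    "${ASTERIX_QUERY_URL:-http://localhost:19002/query/service}"
}
```

A possible use is: run_statement "$(count_statement twitter ds_tweet)"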

Step 2.3: Start the TwitterMap Web server (on port 9001) by running the following command in another shell:

~/twittermap> sbt "project web" "run 9001"

Wait until the shell prints the messages shown in the following screenshot: [Screenshot: twittermap]

Step 2.4: Open http://localhost:9001 in a browser to see the TwitterMap frontend. The first time you open the page, it can take up to several minutes (depending on your machine’s speed) to show the following Web page: [Screenshot: web]
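
Because the first page load can be slow, it can help to poll the server from a shell instead of reloading the browser. The following is a generic retry loop, given as a sketch; the function name wait_until and its arguments are hypothetical and not part of TwitterMap.

```shell
# Run a command repeatedly until it succeeds.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the command to run
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
  retries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only if another attempt will follow.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, wait_until 60 5 curl -sf -o /dev/null http://localhost:9001 tries for up to about five minutes and succeeds once the Web server answers without an HTTP error.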

Congratulations! You have successfully set up TwitterMap using Cloudberry and AsterixDB!

3. Under the Hood

Next, we explain the details of TwitterMap.

Step 3.1 Create a Dataset in AsterixDB

In Step 2.2, we ran a script called examples/twittermap/script/ to create data sets in AsterixDB and ingest data into them. The following are the executed DDL statements.

create dataverse twitter if not exists; 
use twitter; 
create type typeUser if not exists as open { 
    id: int64, 
    name: string, 
    screen_name : string, 
    profile_image_url : string?, 
    lang : string, 
    location: string, 
    create_at: date, 
    description: string, 
    followers_count: int32, 
    friends_count: int32, 
    statues_count: int64
};
create type typePlace if not exists as open{ 
    country : string, 
    country_code : string, 
    full_name : string, 
    id : string, 
    name : string, 
    place_type : string, 
    bounding_box : rectangle
};
create type typeGeoTag if not exists as open { 
    stateID: int32, 
    stateName: string, 
    countyID: int32, 
    countyName: string, 
    cityID: int32?, 
    cityName: string?
};
create type typeTweet if not exists as open{ 
    create_at : datetime, 
    id: int64, 
    text: string, 
    in_reply_to_status : int64, 
    in_reply_to_user : int64, 
    favorite_count : int64, 
    coordinate: point?, 
    retweet_count : int64, 
    lang : string, 
    is_retweet: boolean, 
    hashtags :  ?, 
    user_mentions :  ? , 
    user : typeUser, 
    place : typePlace?, 
    geo_tag: typeGeoTag
};
create dataset ds_tweet(typeTweet) if not exists primary key id 
with filter on create_at with {"merge-policy":{"name":"prefix","parameters":{"max-mergable-component-size":134217728, "max-tolerance-component-count":5}}}; 

create index text_idx if not exists on ds_tweet(text) type fulltext; 

See this page for details.

The script uses a feature called Feed to ingest tweets into AsterixDB. The following statements create a feed called TweetFeed:

create feed TweetFeed with {
    "adapter-name" : "socket_adapter",
    "sockets" : "asterix_nc1:10001",
    "address-type" : "nc",
    "type-name" : "typeTweet",
    "format" : "adm",
    "upsert-feed" : "false"
};

connect feed TweetFeed to dataset ds_tweet; 
start feed TweetFeed; 

The following shell command ingests the data from a local file with tweets into AsterixDB using the defined TweetFeed:

gunzip -c ./script/sample.adm.gz | ./script/ 10001
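
Before ingesting, you may want to know how many records the sample file holds so that you can compare the number with the dataset afterwards. The sketch below assumes that the ADM file stores one record per non-empty line, which this page does not state; the function name count_adm_records is hypothetical.

```shell
# Print the number of non-empty lines in the gzip-compressed file $1.
# ASSUMPTION: each non-empty line holds exactly one record.
count_adm_records() {
  gunzip -c < "$1" | awk 'NF { n++ } END { print n + 0 }'
}
```

A possible use is: count_adm_records ./script/sample.adm.gz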

For more information about AsterixDB data feeds, please refer to this page.

Step 3.2 Setup TwitterMap Web Server

The TwitterMap Web application uses the Play Framework to talk to the Cloudberry service. The configuration of the framework is in the file examples/twittermap/web/conf/application.conf. In the file, the cloudberry.register property specifies the HTTP API of Cloudberry:

cloudberry.register = "http://CLOUDBERRY-HOST-NAME:CLOUDBERRY-PORT/admin/register"
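
To see which Cloudberry endpoint a given checkout is configured to use, you can read the property back from the configuration file. This is a minimal sketch that only handles the simple one-line form key = "value" shown above, not the full configuration syntax; the function name read_conf_property is hypothetical.

```shell
# Print the quoted value of property $2 from configuration file $1.
# Only lines of the form  key = "value"  are recognized.
read_conf_property() {
  # Escape dots so that they match literally in the regular expression.
  key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
  sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*\$/\1/p" "$1"
}
```

A possible use is: read_conf_property examples/twittermap/web/conf/application.conf cloudberry.register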

When the TwitterMap server starts, it will run twittermap/web/app/controllers/TwitterMapApplication.scala, which will run twittermap/web/app/model/Migration_20170428.scala. This script registers four data sets to the Cloudberry server. They are:

  • twitter.ds_tweet
  • twitter.dsStatePopulation
  • twitter.dsCountyPopulation
  • twitter.dsCityPopulation

Cloudberry will talk to AsterixDB to collect information about these data sets. Take the data set twitter.ds_tweet as an example. The script sends the following DDL request to Cloudberry to register the information about this data set in AsterixDB.


In the configuration file twittermap/web/conf/application.conf, the property tells the front-end the Web socket address of the Cloudberry server:

 = "ws://CLOUDBERRY_HOST_NAME:CLOUDBERRY_PORT/ws"

The frontend uses the Web socket to communicate with the Cloudberry server directly. The corresponding logic can be found in the file twittermap/web/public/javascripts/common/services.js.

For more information about how to write registration DDL and Cloudberry requests, please refer to this page.

4. Build your own application

TwitterMap is one example of how to use Cloudberry. To develop your own application, you can do the following steps:

  1. Use AsterixDB to create your own data sets;
  2. Register the necessary datasets into Cloudberry as in Step 3.2;
  3. Set up the Web socket connection between the front-end web page and the Cloudberry server as in Step 3.2;
  4. Define your queries and responses as in twittermap/web/public/javascripts/common/services.js.

Have fun! If you need assistance, please feel free to contact us at