Setup TwitterMap
This page explains how to use Cloudberry and AsterixDB to set up a small instance of TwitterMap on a local machine. The following diagram illustrates its architecture:
[Figure: TwitterMap architecture diagram]
System requirements:
- Linux or Mac
- At least 4GB memory
- (If using a virtual machine) At least 2 CPUs
0. Install Java and sbt:
Follow these instructions to install Java and sbt.
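Before moving on, it can help to confirm that both tools are reachable from your shell. The following is a minimal sketch and not part of the original instructions: `check_tool` is a hypothetical helper name, and it only checks that the commands exist on the PATH, not which versions are installed.

```shell
# Hypothetical helper: succeeds if the named command is on the PATH.
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool this guide depends on.
for tool in java sbt; do
  if check_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: NOT found, install it before continuing"
  fi
done
```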
1. Setup AsterixDB
Step 1.1: Move to your home directory:
$ cd ~
Step 1.2: Download asterix-server-0.9.5-SNAPSHOT-binary-assembly.zip:
$ wget http://cloudberry.ics.uci.edu/img/asterix-server-0.9.5-SNAPSHOT-binary-assembly.zip
Step 1.3: Uncompress the file:
$ unzip asterix-server-0.9.5-SNAPSHOT-binary-assembly.zip
Step 1.4: Move to the apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin directory:
$ cd apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin/
Step 1.5: Execute start-sample-cluster.sh to start the sample instance. Wait until you see the “INFO: Cluster started and is ACTIVE.” message.
$ ./start-sample-cluster.sh
CLUSTERDIR=/home/x/apache-asterixdb-0.9.5-SNAPSHOT/opt/local
INSTALLDIR=/home/x/apache-asterixdb-0.9.5-SNAPSHOT/
LOGSDIR=/home/x/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/logs
Using Java version: 1.8.0_XX
INFO: Starting sample cluster...
Using Java version: 1.8.0_XX
INFO: Waiting up to 30 seconds for cluster 127.0.0.1:19002 to be available.
INFO: Cluster started and is ACTIVE.
Step 1.6: Execute jps to check that one instance of “CCDriver” and two instances each of “NCService” and “NCDriver” are running:
$ jps
59264 NCService
59280 NCDriver
59265 CCDriver
59446 Jps
59263 NCService
59279 NCDriver
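The check in Step 1.6 can also be done mechanically. The sketch below is an illustration only: `count_proc` is a hypothetical helper, and the sample text is the jps output shown in that step, not live output.

```shell
# Hypothetical helper: count lines of jps output (read from stdin)
# whose second column equals the given process name.
count_proc() {
  awk -v name="$1" '$2 == name { n++ } END { print n + 0 }'
}

# Sample jps output copied from Step 1.6; on a real machine,
# use jps_output=$(jps) instead.
jps_output='59264 NCService
59280 NCDriver
59265 CCDriver
59446 Jps
59263 NCService
59279 NCDriver'

# Each entry is name:expected-count, as stated in Step 1.6.
for spec in CCDriver:1 NCService:2 NCDriver:2; do
  name=${spec%%:*}
  expected=${spec##*:}
  actual=$(printf '%s\n' "$jps_output" | count_proc "$name")
  echo "$name: expected $expected, found $actual"
done
```

If any "found" value differs from the expected one, stop the sample cluster and start it again before continuing.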
Step 1.7: Open the AsterixDB Web interface at http://localhost:19001 and issue the following query to verify that the AsterixDB instance is running.
Query:
select * from Metadata.`Dataverse`;
Expected result:
{ "Dataverse": { "DataverseName": "Default", "DataFormat": "org.apache.asterix.runtime.formats.NonTaggedDataFormat", "Timestamp": "Wed Mar 07 16:13:37 PST 2018", "PendingOp":0}}
{ "Dataverse": { "DataverseName": "Metadata", "DataFormat": "org.apache.asterix.runtime.formats.NonTaggedDataFormat", "Timestamp": "Wed Mar 07 16:13:37 PST 2018", "PendingOp":0}}
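The same query can be issued from a terminal instead of the Web interface. This is a sketch under assumptions, not part of the original guide: it assumes curl is installed and that the sample cluster serves the AsterixDB HTTP query service at http://localhost:19002/query/service. Port 19002 matches the cluster address in the startup log, but the path may differ between AsterixDB versions. `run_query` and `ASTERIXDB_QUERY_URL` are hypothetical names.

```shell
# Assumed endpoint of the AsterixDB HTTP query service; override
# ASTERIXDB_QUERY_URL if your version or port differs.
ASTERIXDB_QUERY_URL=${ASTERIXDB_QUERY_URL:-http://localhost:19002/query/service}

# Hypothetical helper: send one statement and print the response.
run_query() {
  curl -s --data-urlencode "statement=$1" "$ASTERIXDB_QUERY_URL"
}

# Usage (requires a running cluster):
#   run_query 'select * from Metadata.`Dataverse`;'
```

The response format is not guaranteed to match the Web interface output shown above; look for the two dataverse names, Default and Metadata.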
Note: When you want to stop AsterixDB, use the following commands:
$ cd ~/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin
$ ./stop-sample-cluster.sh
The next time you want to start or stop your AsterixDB instance, use the corresponding command:
$ ~/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin/start-sample-cluster.sh
$ ~/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin/stop-sample-cluster.sh
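To avoid retyping the long paths, the two scripts can be wrapped in a small shell function. This is a convenience sketch only: the function name `asterixdb` and the variable `ASTERIXDB_BIN` are hypothetical, and the default path assumes AsterixDB was unpacked in your home directory as in Step 1.

```shell
# Assumed install location; adjust if you unpacked AsterixDB elsewhere.
ASTERIXDB_BIN=${ASTERIXDB_BIN:-$HOME/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin}

# Hypothetical wrapper around the start and stop scripts.
asterixdb() {
  case "$1" in
    start) "$ASTERIXDB_BIN/start-sample-cluster.sh" ;;
    stop)  "$ASTERIXDB_BIN/stop-sample-cluster.sh" ;;
    *)     echo "usage: asterixdb start|stop" >&2; return 2 ;;
  esac
}
```

After adding the function to your shell startup file, `asterixdb start` and `asterixdb stop` run the respective scripts; any other argument prints the usage line.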
2. Setup Cloudberry and TwitterMap:
Step 2.1: Clone the Cloudberry Github repository.
~> git clone https://github.com/ISG-ICS/cloudberry.git
Suppose the repository is cloned to the folder ~/cloudberry.
Step 2.2: Compile and run the Cloudberry server.
~/cloudberry> cd ~/cloudberry/cloudberry
~/cloudberry> sbt compile
~/cloudberry> sbt "project neo" "run"
Note: If you see errors like the following:
[ERROR] Failed to construct terminal; falling back to unsupported
java.lang.NumberFormatException: For input string: "0x100"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
...
These errors are caused by a compatibility problem in some versions of sbt. To fix them, add
export TERM=xterm-color
to the top of /usr/share/sbt/bin/sbt. The errors above should then be gone, and you can continue with this guide. If this does not solve the errors, refer to this discussion for other solutions: https://stackoverflow.com/questions/44317384/sbt-error-failed-to-construct-terminal-falling-back-to-unsupported
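If you would rather not edit the sbt launcher script, one possible alternative is to set TERM for a single invocation. This is a sketch under the assumption that only the value of TERM seen by sbt matters; it has not been verified against every sbt version. The snippet demonstrates the shell mechanism itself, using `sh -c` as a stand-in for sbt.

```shell
# A VAR=value prefix applies to that one command only;
# the TERM of the current shell is left unchanged.
TERM=xterm-color sh -c 'echo "child sees TERM=$TERM"'

# The same pattern applied to the command from Step 2.2 would be:
#   TERM=xterm-color sbt "project neo" "run"
```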
Wait until the shell prints messages like the following:
$ sbt "project neo" "run"
[info] Loading global plugins from /Users/white/.sbt/0.13/plugins
[info] Loading project definition from /Users/white/cloudberry/cloudberry/project
[info] Set current project to cloudberry (in build file:/Users/white/cloudberry/cloudberry/)
[info] Set current project to neo (in build file:/Users/white/cloudberry/cloudberry/)
--- (Running the application, auto-reloading is enabled) ---
[info] p.c.s.NettyServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
(Server started, use Ctrl+D to stop and go back to the console...)
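Rather than watching the log, you can poll the port from another terminal until the server answers. The helper below is a hypothetical sketch, not part of Cloudberry: it assumes curl is installed and treats any HTTP response, including an error page, as "the server is up".

```shell
# Hypothetical helper: poll a URL once per second until it answers.
# $1 = URL, $2 = maximum number of attempts (default 60).
# Returns 0 as soon as curl gets any HTTP response, 1 on timeout.
wait_for_http() {
  url=$1
  tries=${2:-60}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage for the Cloudberry server started in Step 2.2:
#   wait_for_http http://localhost:9000/ 120 && echo "Cloudberry is up"
```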
Step 2.3: Download and ingest the synthetic sample tweets (about 1M) data into AsterixDB.
Open a new terminal window.
(1) Download the synthetic sample tweets (about 1M) data:
~> cd ~/cloudberry/examples/twittermap/script/
~/script> wget http://cloudberry.ics.uci.edu/img/sample.adm.gz
(2) Ingest the data into AsterixDB.
~/script> cd ..
~/twittermap> ./script/ingestAllTwitterToLocalCluster.sh
When it finishes, you should see messages like the following:
Socket 127.0.0.1:10005 - # of ingested records: 260000
Socket 127.0.0.1:10005 - # of total ingested records: 268497
>>> # of ingested records: 268497 Elapsed (s) : 2 (m) : 0 record/sec : 134248.5
>>> An ingestion process is done.
[success] Total time: 3 s, completed Nov 19, 2018 8:44:51 PM
Ingested city population dataset.
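To confirm the ingestion summary without reading the whole log, the final record count can be extracted and the reported rate cross-checked. This is an illustrative sketch: `ingested_total` is a hypothetical helper, and the sample text is the summary from the output above, not live output.

```shell
# Hypothetical helper: extract the final record count from the
# ingestion log read on stdin (the ">>> # of ingested records" summary).
ingested_total() {
  sed -n 's/.*>>> # of ingested records: \([0-9][0-9]*\).*/\1/p'
}

# Sample summary copied from the output above.
sample='>>> # of ingested records: 268497 Elapsed (s) : 2 (m) : 0 record/sec : 134248.5'

total=$(printf '%s\n' "$sample" | ingested_total)
echo "total records: $total"    # prints: total records: 268497

# Cross-check the reported rate: records divided by elapsed seconds.
awk -v n="$total" -v s=2 'BEGIN { printf "records/sec: %.1f\n", n / s }'
# prints: records/sec: 134248.5
```

On a real run, you could save the script output with `tee` to a file of your choice and feed that file to `ingested_total` on stdin.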
Step 2.4: Start the TwitterMap Web server (on port 9001) by running the following command in another shell:
~/twittermap> sbt "project web" "run 9001"
Wait until the shell prints messages like the following:
$ sbt "project web" "run 9001"
[info] Loading global plugins from /Users/white/.sbt/0.13/plugins
...
--- (Running the application, auto-reloading is enabled) ---
[info] p.c.s.NettyServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9001
(Server started, use Ctrl+D to stop and go back to the console...)
Step 2.5: Open a browser to access http://localhost:9001 to see the TwitterMap frontend. The first time you open the page, it could take up to several minutes (depending on your machine’s speed) to show the following Web page:
(Note: Firefox users have to go to about:config and change privacy.trackingprotection.enabled to false.)
Congratulations! You have successfully set up TwitterMap using Cloudberry and AsterixDB!