BIG DATA GURUKUL

Open the world of Knowledge

MongoDB logs on ELK (Elasticsearch, Logstash, Kibana)

Prerequisites:

1) Elasticsearch

2) Kibana

3) Logstash

4) MongoDB

5) Set JAVA_HOME (see the example below)
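
   Logstash and Elasticsearch need JAVA_HOME to point at a JDK. On Windows (the environment used in this guide) it can be set for the current command prompt as below; the JDK path is only an illustration, so point it at your own installation:

       > set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_121    (illustrative path - use your JDK's location)

   Use setx (or System Properties) instead of set if you want the variable to persist across sessions.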


Steps to store MongoDB logs on ElasticSearch and visualize them on Kibana


Step1- Download the latest versions:


  • Elasticsearch
  • Kibana
  • Logstash

  They can be downloaded from “www.elasticsearch.org”


Step2- Run Elasticsearch (bin/elasticsearch)
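
   Once Elasticsearch is up, you can confirm it is listening on its default port 9200 with a quick check (a minimal sketch, assuming curl is available on your machine):

       > curl http://localhost:9200         (returns a small JSON document with the node name and version)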


           



Step3-   Create a configuration file in the bin folder of Logstash and save it as “logstash.conf”


logstash.conf


       input {
               file {
                       path => "C:\Data\log\filter-mongologs-2017-03-18\mongodb-current.log"
                       start_position => "beginning"
                     }
             }

       filter {
                grok {
                        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{MONGO3_SEVERITY:severity} %{MONGO3_COMPONENT:component}%{SPACE}(?:\[%{DATA:context}\])? %{GREEDYDATA:content}.*%{NUMBER:duration}+ms" }
                      }

                mutate {
                         remove_field => ["message", "timestamp", "tags", "@version"]
                       }

                if [component] != "COMMAND" {
                        drop { }
                }
              }

       output {
                elasticsearch { hosts => ["localhost:9200"] index => "mongolog7" }
                stdout { codec => "rubydebug" }
              }

 


(In the input section, set path to the location of your MongoDB log file.)
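
(For context, the grok pattern above expects MongoDB 3.x log lines of the form “timestamp severity component [context] message ...Nms”. An illustrative line of the kind it is meant to match — the values are made up:)

       2017-03-18T10:15:43.123+0530 I COMMAND  [conn12] command test.users command: find { find: "users" } planSummary: COLLSCAN docsExamined:5000 132ms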



Step4-  Install the MongoDB input plugin


                           >  logstash-plugin install logstash-input-mongodb



                 



Step5-   Install the Elasticsearch output plugin


                      >  logstash-plugin install logstash-output-elasticsearch



                

 


Step6- In a command prompt, run the following command from the Logstash directory:


                      > bin/logstash -f logstash.conf



                

   It will create an index in Elasticsearch.
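
   Besides Kibana, the index can be confirmed from the command line (assuming the defaults used in this guide: Elasticsearch on localhost:9200 and the index name mongolog7 from logstash.conf):

       > curl "http://localhost:9200/_cat/indices?v"       (lists all indices; mongolog7 should appear)
       > curl "http://localhost:9200/mongolog7/_count"     (returns the number of documents indexed)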

 

Step7- 


            KIBANA (bin/kibana)  (in cmd)

            It runs on localhost:5601.

            In the console window (Dev Tools), check whether your index was created.

            Run – GET (index name)/_search
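
            With the index name from our logstash.conf (mongolog7), the Dev Tools query would look like this; match_all simply returns the first batch of indexed documents:

                GET mongolog7/_search
                {
                  "query": { "match_all": {} }
                }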



               



Step8- 


  • Go to the Management window in Kibana
  • It will ask you to configure an index pattern.
  • Configure the index by entering the index name (the same index name as in the conf file).

                


     Click on Create.

     It will show the following result.


            

              



Step9- Go to the “Discover” window in Kibana

 

       Click on New and add the index “Index_name”. (If it gives an error while discovering the log, refresh the index and adjust the time range shown in the corner of the window.)

 


               



It will give the following output:


               


Step10- Now visualize your index.


          Go to the Visualize window and select the parameters.


               



  • After selecting the parameters, add metrics to visualize your data.
  • Add the aggregation which you want to display in your visualization.
  • After adding the aggregation, it will show the following output:

              


     After that, save this visualization: click on the “Save” button on the menu bar.


Step11- Create Dashboard in Kibana.

              

            Go to the dashboard window. It will show a “Ready to get started?” window.


           



Then click on “Add” in the upper menu bar.

And then add your saved visualization to the Dashboard.



            



       If you want to expand your dashboard to get more information, click on the small arrow at the bottom corner of the visualization.


             



In this way we can load the MongoDB logs into Elasticsearch using Logstash and then visualize them using Kibana.


MONGODB INSTALLATION ON AWS CLUSTER

Following are step-by-step commands which can be used to configure MongoDB on an AWS EC2 instance. To configure MongoDB we will require:


  1. Red Hat Linux or Windows (as per your choice)
  2. MongoDB installed via yum
  3. EBS volumes for data and log

# Log in to the AWS console.

Here I have a Hadoop cluster with 4 instances. I have selected Red Hat OS.


To install MongoDB on Red Hat, follow the steps below:


#Select one node on which you want MongoDB to be installed.

 

  #Create the /data/db folder:

      >sudo mkdir -p /data/db


  #Create the log directory:

      >sudo mkdir /log


  #Update installed packages and add the MongoDB yum repo:

       echo "[mongodb-org-3.2]

       name=MongoDB Repository
       baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.2/x86_64/
       gpgcheck=1
       enabled=1
       gpgkey=https://www.mongodb.org/static/pgp/server-3.2.asc" |
       sudo tee -a /etc/yum.repos.d/mongodb-org-3.2.repo

       (Note: this baseurl points at the Amazon Linux repository; on Red Hat, use the corresponding https://repo.mongodb.org/yum/redhat/... path instead.)

 #Install MongoDB:
        >sudo yum -y update && sudo yum install -y mongodb-org-server mongodb-org-shell mongodb-org-tools

 #Now you have to configure the mongodb parameters inside the /etc/mongod.conf file

 

         


  Change the permissions of the mongod.conf file:

           >sudo chmod 777 /etc/mongod.conf

 

       


  #Edit parameters of mongod.conf file:

            > vi /etc/mongod.conf

 

      Change the following parameters:

        path : /log/mongod.log

        dbPath : /data/db
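
       mongod.conf is in YAML format, so these two values live under the systemLog and storage sections. A minimal sketch of the relevant part of the file, using the paths created earlier in this guide:

        systemLog:
          destination: file        # log to a file rather than stdout
          path: /log/mongod.log    # the /log directory created above
        storage:
          dbPath: /data/db         # the data directory created above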


       


        Save the file and exit vi editor.

 

  # Start mongodb service

          > sudo mongod

 

 It will give you output such as 'waiting for connections on port 27017', which means you have successfully installed MongoDB on the AWS instance.


Now, connect to the MongoDB database using the mongo shell:

          >mongo

 

   # Follow the same procedure for all instances.

 

            → To start replication on instances ←

 

    Make sure you have stopped all running mongod instances.

 

       #Edit mongod.conf file

            >vi /etc/mongod.conf


     Enable replication, add a replSetName, and remove bindIp under the network interfaces section so the other members can connect, as sketched below.
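
     A minimal sketch of those edits in /etc/mongod.conf, using the replica-set name this guide passes to rs.initiate later (bdg-mongodb); with bindIp removed or commented out, mongod 3.2 listens on all interfaces so the other members can reach it:

        net:
          port: 27017
          # bindIp: 127.0.0.1      <-- removed/commented out

        replication:
          replSetName: bdg-mongodb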

 

           


           Save the file and exit vi editor.


     # Run the below command to start the instance as a replica-set member

              >sudo mongod --config /etc/mongod.conf

 

       It will give the below output.


           


  ** Follow the same procedure on all MongoDB instances.

 

    #Open the mongo shell and type the below command to initiate replication:

              >rs.initiate({_id:"bdg-mongodb", version:1, members:[{_id:0, host:"bdg-hdp-admin:27017"}]});

 

        To check status:

              >rs.status();

  

      #Then add the other members as below:

              >rs.add('bdg-hdp-master:27017');


              


              Add all other members similarly.

  

       #To add arbiter

 

         Arbiters are mongod instances that are part of a replica set, but they do not hold data. Arbiters participate in elections in order to break ties. If a replica set has an even number of members, add an arbiter.

                  >rs.addArb('bdg-hdp-datanode2:27017');

 

         Check the status again with rs.status()


           


            Your MongoDB cluster is up now. You can start working on your database.

Set up an AWS Cluster & Configure Hadoop/Hive/Spark/MongoDB using Ambari on a Windows Client Machine

Prerequisites:  PuTTYgen, PuTTY, WinSCP and an account on AWS


      -- Amazon Web Services (AWS)

         www.aws.amazon.com

 

      --To download PuTTY and PuTTYgen:

         http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

 

      --To download WinSCP:

         https://winscp.net/eng/docs/guide_install



Following are the steps we need to follow:


Step 1:-  Create Instances on Amazon EC2


   To create a 4-node cluster on Amazon EC2 (architecture):

 

  1. Hadoop-Admin Node (Ambari, Hue, Jupyter, Mongo Primary, Tez, Pig, Sqoop)
  2. Hadoop-Master (Spark-c, Mongo-s, Hive-1, Elastic-s)
  3. Hadoop-Datanode1 (Mongo-s, Spark-m, Elastic-s, Hive-2)
  4. Hadoop-Datanode2 (Mongo Arbiter, Spark-c, Elastic-m)


Step 2:-  Go to AMAZON Console:  


      https://aws.amazon.com/console

 


Step 3:-  Sign in to the console, go to EC2, and then launch instances as per the requirements


     -> How to Launch Instances

     


  •  Click on Launch Instance.
  •  Choose an Amazon Machine Image (AMI):
               In our case it is Red Hat Enterprise Linux; you can select one as per your requirement.
      
     

  • Choose an Instance Type :- We have selected m4.large (8 GB RAM)
        


  • Configure Instance details :- 
         


  • Add Root Volume as shown in the below Screen
         


  • Add Tag as per below Screen
         


  • Configure Security Group :- (in our example, we are using our existing security group)
         


  • Review Instance Launch
          


  • In our example, we are using an existing key pair (i.e. bdghadoopkey) to launch the instance.

  • To create a new key pair use this link:
                     http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html

          


Step 4:-  Repeat Step 3 to create the other 3 instances, i.e. Hadoop-Master, Hadoop-Datanode1, Hadoop-Datanode2


Step 5:-  

    

     After Creating all the instances -

  • Click on Services (top left)
  • Click on EC2
  • Click on Instances (top-left sitemap)
  • And you will find all the instances as in the below screen

    (Make sure you are creating instances in different Availability Zones)


          



Step 6:-  Attaching extra Volume to the Instances.  


       In our example, we are adding a 100 GB volume for each instance.


       When we click on Volumes (left-side sitemap) we will see the below screen, where 4 volumes of 20 GB each were created by default while creating the instances.

         


  • To attach a new volume to an instance, create 4 more volumes by clicking on Create Volume (make sure the Availability Zone matches that of the instance)
 
         


  • Click on Create and follow the same steps to create another 3 volumes for the other 3 instances.
  • After creating all 4 volumes, attach them to the instances (*newly created volumes are not attached automatically)

Step 7: -   Attaching volumes to the Instances

          

  • Click on Specific Available Volume
  • Click on attach Volume
  • Type instance Name
  • Select Instance
  • Click on Attach
    Repeat the above steps to attach the other 3 volumes.

Step 8: -  Connect to your instances with PuTTY

  • Go to the homepage
  • Click on S3
  • Click on the key pair that we created, e.g. (Bigdatagurukul)
  • Click on Download

         


  • Open PuTTYgen
  • Click on Load
  • Load the downloaded .pem file
  • Click on Save private key
  • Save the same key as a .ppk file
  • Click Yes when prompted to confirm

          


   Now open PuTTY


  • Click on SSH  (left side)
  • Click on Auth
  • Browse to the .ppk file

Step 9: -   Configuration in PuTTY

          


  • Click on Session
  • In Host Name, type ‘ec2-user@’ followed by the public IP of the instance (copied from AWS) that we are going to connect to through PuTTY
  • Give a name to your connection in Saved Sessions and click on Save
  • Click on Open

         


         

  • A new window will appear

  • Type ‘df -h’ to see the disk usage in the new window

        


  • To attach the 100 GB additional volume, type the commands below:
                     >cat /proc/partitions  (this command will show the available volumes and their actual names)

       


                     >sudo mkfs -t ext4 /dev/xvdf     (formats the partition)
                     >sudo mkdir /data       (creates the mount directory)
                     >sudo mount /dev/xvdf  /data    (mounts the volume at /data)
                     >sudo vim /etc/fstab    (opens the file-system table for editing)
              Add the following line in it:
                       /dev/xvdf  /data   ext4  defaults  0 0


         


  • After mounting the partition, we can check it with the command: (df -h)
  • After creating and mounting partitions, we need to configure certain files so the instances can communicate as a cluster, viz. /etc/hosts, /etc/hostname, /etc/sysconfig/network
  • sudo vi /etc/hosts
  • Now, type the private IPs of all instances and give names to those IPs, for example:
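
     A sketch of what /etc/hosts might look like — the private IPs below are placeholders (use your own from the AWS console), and the hostnames follow this guide's naming:

        172.31.10.11   bdg-hdp-admin
        172.31.10.12   bdg-hdp-master
        172.31.10.13   bdg-hdp-datanode1
        172.31.10.14   bdg-hdp-datanode2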

        


  • Press the Esc key, then type (:wq!) to save and exit the editor
  • Type sudo vi /etc/hostname
  • Now give the present instance its name, for example:
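
     On the admin node, /etc/hostname would contain just that node's name (using this guide's naming):

        bdg-hdp-admin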

       


  • Type sudo vi /etc/sysconfig/network
  • Now, put in the below configuration, for example:
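
     On RHEL, a typical configuration for this step enables networking and sets the node's hostname (the name shown is this guide's admin node — adjust it per instance):

        NETWORKING=yes
        HOSTNAME=bdg-hdp-admin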

            


  • Type sudo vi /etc/cloud/cloud.cfg
  • Now, set the following configuration
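
     The key setting here is to stop cloud-init from resetting the hostname on every reboot; assuming a standard cloud-init install, add (or change) this line in /etc/cloud/cloud.cfg:

        preserve_hostname: true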

             

  • Type sudo reboot
  • To cross-check the connection, type the command: (getent hosts)

          


  • Repeat Step 9 for all the instances
  • Then check whether all instances are communicating by pinging each other