BIG DATA GURUKUL

Open the world of Knowledge

MongoDB Logs on ELK (Elasticsearch, Logstash, Kibana)

Prerequisites-

1) ElasticSearch

2) Kibana

3) Logstash

4) Mongodb

5) Set JAVA_HOME


Steps to store MongoDB logs on ElasticSearch and visualize them on Kibana


Step1- Download latest version:


  • Elasticsearch
  • Kibana
  • Logstash

  All three can be downloaded from www.elasticsearch.org


Step2- Run elasticsearch (bin/elasticsearch)


           



Step3-   Create a configuration file in the bin folder of Logstash and save it as “logstash.conf”


logstash.conf

      

       input {
         file {
           path => "C:\Data\log\filter-mongologs-2017-03-18\mongodb-current.log"
           start_position => "beginning"
         }
       }

       filter {
         grok {
           match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{MONGO3_SEVERITY:severity} %{MONGO3_COMPONENT:component}%{SPACE}(?:\[%{DATA:context}\])? %{GREEDYDATA:content}.*%{NUMBER:duration}ms" }
         }

         mutate {
           remove_field => ["message", "timestamp", "tags", "@version"]
         }

         # Keep only COMMAND entries; drop everything else
         if [component] != "COMMAND" {
           drop { }
         }
       }

       output {
         elasticsearch { hosts => ["localhost:9200"] index => "mongolog7" }
         stdout { codec => "rubydebug" }
       }

 


(In the input section, set path to the location of your MongoDB log file.)
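As a quick offline sanity check of the field layout the grok filter expects, the same fields can be pulled out of a sample log line with standard shell tools. The log line below is made up for illustration; Logstash itself should still be tested against your real log file.

```shell
# Hypothetical MongoDB 3.x log line in the layout the grok pattern expects:
# timestamp, severity, component, [context], message content, duration in ms.
line='2017-03-18T10:15:23.456+0530 I COMMAND [conn5] command test.users command: find { find: "users" } 132ms'

# Field 3 is the component; the trailing "<number>ms" is the duration.
component=$(echo "$line" | awk '{print $3}')
duration=$(echo "$line" | grep -oE '[0-9]+ms$')

echo "component=$component duration=$duration"
# prints: component=COMMAND duration=132ms
```

Lines whose component field is not COMMAND would be dropped by the filter above.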



Step4-  Install Mongodb input plugin


                           >  logstash-plugin install logstash-input-mongodb



                 



Step5-   Install elasticsearch output plugin


                      >  logstash-plugin install logstash-output-elasticsearch



                

 


Step6- In a command prompt, run the following command from the Logstash bin directory


                      > bin/logstash -f logstash.conf



                

   It will create an index in elasticsearch.

 

Step7- 


            Run Kibana (bin/kibana) in cmd.

            It runs on port 5601 (localhost:5601).

            In the console window (Dev Tools), check whether your index has been created:

            Run – GET <index name>/_search
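With the index name used in logstash.conf above, the Dev Tools query would look like this (mongolog7 is the index defined in the output section):

```
GET mongolog7/_search
```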



               



Step8- 


  • Go to the management window in kibana
  • It will ask to configure index pattern. 
  • Configure the index by entering the index name (the same index name used in the conf file).

                


     Click on Create.

     It will show the following result.


            

              



Step9- Go to the “Discover” window in Kibana

 

       Click on New and add the index “Index_name”. (If an error appears while discovering the logs, adjust the time range shown in the corner of the window and refresh.)

 


               



It will give following output:


               


Step10- Now Visualize your index.


          Go to visualize window and select the parameters.


               



  • After selecting the parameters, add metrics to visualize your data.
  • Add the aggregation you want to display in your visualization.
  • After adding the aggregation, it will show the following output:

              


     After that save this visualization. Click on the “Save” button on menu bar.


Step11- Create Dashboard in Kibana.

              

            Go to the dashboard window. It will show a “Ready to get started?” window.


           



Then click on “Add” on the upper menu bar.

And then add your saved visualization on Dashboard.



            



       If you want to expand your dashboard to get more information, click on the small arrow at the bottom corner of the visualization.


             



By doing this we can load the MongoDB logs into Elasticsearch using Logstash and then visualize them using Kibana.


MONGODB INSTALLATION ON AWS CLUSTER

Following are step-by-step commands which can be used to configure MongoDB on an AWS EC2 instance. To configure MongoDB we will require:


  1. Red Hat Linux or Windows (as per your choice)
  2. MongoDB installed via yum
  3. An EBS volume for data and log

# Login to aws console.

Here I have a Hadoop cluster with 4 instances. I have selected Red Hat OS.


To install mongodb on Red hat follow the steps below:


#Select one node on which you want mongodb to be installed.

 

  #Create the data directory (this will be used as dbPath):

      >sudo mkdir -p /data/db


  #Create log directory:

      >sudo mkdir /log


  #Update installed packages, add the MongoDB yum repo:

       echo "[mongodb-org-3.2]

       name=MongoDB Repository
       baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.2/x86_64/
       gpgcheck=1
       enabled=1
       gpgkey=https://www.mongodb.org/static/pgp/server-3.2.asc" |
       sudo tee -a /etc/yum.repos.d/mongodb-org-3.2.repo

 #Install MongoDB:
        >sudo yum -y update && sudo yum install -y mongodb-org-server mongodb-org-shell mongodb-org-tools

 #Now configure the mongodb parameters inside the /etc/mongod.conf file

 

         


  Change permission of mongod.conf file:

           >sudo chmod 777 mongod.conf

 

       


  #Edit parameters of mongod.conf file:

            > vi /etc/mongod.conf

 

      Change the following parameters:

        path: /log/mongod.log

        dbPath: /data/db


       


        Save the file and exit vi editor.
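Put together, the two edits above correspond to a mongod.conf fragment along these lines (the YAML format used by MongoDB 3.x; a sketch of the relevant sections, not the complete file):

```yaml
systemLog:
  destination: file
  path: /log/mongod.log
storage:
  dbPath: /data/db
```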

 

  # Start mongodb service

          > sudo mongod

 

 It will give you output such as 'waiting for connection on port 27017', which means you have successfully installed mongodb on  aws instance.


Now, connect to the MongoDB database using the mongo shell:

          >mongo

 

   # Follow the same procedure for all instances.

 

            → To start replication on instances ←

 

    Make sure you have stopped all running mongod instances.

 

       #Edit mongod.conf file

            >vi /etc/mongod.conf


     Enable replication by adding a replSetName, and remove bindIp under the network interfaces section so the other members can connect.
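The edited section of mongod.conf would then look roughly like this (a sketch, not the full file; the replica-set name matches the _id used when initiating replication in the mongo shell):

```yaml
replication:
  replSetName: bdg-mongodb
net:
  port: 27017
  # bindIp removed/commented out so other replica-set members can connect
```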

 

           


           Save the file and exit vi editor.


     # Run below command to start instance as replica

              >sudo mongod --config /etc/mongod.conf

 

       It will give the below output.


           


  ** Follow the same procedure on all mongodb instances.

 

    #Open mongo shell and type below command to initiate replication.

              >rs.initiate({_id:"bdg-mongodb", version:1, members:[{_id:0, host:"bdg-hdp-admin:27017"}]});

 

        To check status:

              >rs.status();

  

      #Then add other members as below:

              >rs.add('bdg-hdp-master:27017');


              


              Add all other members similarly.
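Once all members are added, a healthy rs.status() reply lists each host with a stateStr of PRIMARY, SECONDARY, or ARBITER, roughly like this (abridged, hypothetical output):

```
{
  "set" : "bdg-mongodb",
  "members" : [
    { "name" : "bdg-hdp-admin:27017",  "stateStr" : "PRIMARY" },
    { "name" : "bdg-hdp-master:27017", "stateStr" : "SECONDARY" }
  ],
  "ok" : 1
}
```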

  

       #To add arbiter

 

         Arbiters are mongod instances that are part of a replica set but do not hold data. Arbiters participate in elections in order to break ties. If a replica set has an even number of members, add an arbiter.

                  >rs.addArb('bdg-hdp-datanode2:27017');

 

         Again check the status with rs.status()


           


            Your mongoDB cluster is up now. You can start working on your database.

Set up AWS Cluster & Configure Hadoop/Hive/Spark/Mongodb using Ambari On Windows Client Machine

Prerequisites:  PuTTYgen, PuTTY, WinSCP and an account on AWS


      -- Amazon Web Services (AWS)

         www.aws.amazon.com

 

      --To download PuTTY and PuTTYgen:

         http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

 

      --To download WinSCP:

         https://winscp.net/eng/docs/guide_install



Following are the Steps we need to follow:


Step 1:-  Create Instances on AMAZON  EC2


   To create 4 Node cluster on AMAZON (Architecture)

 

  1. Hadoop-Admin Node (Ambari, Hue, Jupyter, Mongo Primary, Tez, Pig, Sqoop)
  2. Hadoop-Master(Spark-c, Mongo-s, Hive-1, Elastic-s)
  3. Hadoop-Datanode1 (mongo-s, spark- m, Elastic-s, Hive-2)
  4. Hadoop-Datanode2 (Mongo Arbital, Spark-C, Elastic-M)


Step 2:-  Go to AMAZON Console:  


      https://aws.amazon.com/console

 


Step 3:-  Sign in to the console, go to EC2, and then launch instances as per the requirements


     -> How to Launch Instances

     


  •  Click on launch Instance.
  •  Choose an Amazon Machine Image (AMI): 
               In our case it is Red Hat Enterprise Linux, you can select as per your requirement.
      
     

  • Choose an Instance Type :- We have selected 8 GB RAM (m4.large)
        


  • Configure Instance details :- 
         


  • Add Root Volume as shown in the below Screen
         


  • Add Tag as per below Screen
         


  • Configure Security Group :- (In our example, we are using our existing security group)
         


  • Review Instance Launch
          


  • In our example, we are using an existing key pair (i.e. bdghadoopkey) to launch the instance.

  • To create a new key pair use this link:
                     http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html

          


Step 4:-  Repeat Step 3 to create the other 3 instances, i.e. Hadoop-Master, Hadoop-Datanode1, Hadoop-Datanode2


Step 5:-  

    

     After Creating all the instances -

  • Click on Services(Top Left)
  • Click on EC2
  • Click on Instances(top Left Sitemap)
  • And you will find all the instances as below Screen

    (Make note of the availability zones in which you create the instances — the extra EBS volumes in Step 6 must be created in the same zone as their instance)


          



Step 6:-  Attaching extra Volume to the Instances.  


       In our Eg. We are adding 100 GB of Volume for each Instance.

       
       When we click on Volumes (left-side sitemap) we will see the screen below, where 4 volumes of 20 GB each were created by default while creating the instances.

         


  • To attach a new volume to an instance, create 4 more volumes by clicking on Create Volume (make sure the availability zone matches the corresponding instance's zone)
 
         


  • Click on Create, and follow the same steps to create another 3 volumes for the other 3 instances.
  • After creating all 4 volumes, attach them to the instances (*newly created volumes are not attached automatically)

Step 7: -   Attaching volume to the Instances

          

  • Click on Specific Available Volume
  • Click on attach Volume
  • Type instance Name
  • Select Instance
  • Click on Attach
    Repeat above steps to attach another 3 volumes.

Step 8: -  Connect your instances with putty

  • Go to Homepage
  • Click on S3
  • Click on keypair that we created E.g:- (Bigdatagurukul)
  • Left click on Download

         


  • Open PuTTYgen
  • Click on Load
  • Browse to the downloaded .pem file
  • Click on Save private key
  • Save the same key as a .ppk file
  • Overwrite: yes

          


   Now Open Putty


  • Click on SSH  (Left side)
  • Click on Auth
  • Browse the .ppk File

Step 9: -   Configuration in putty

          


  • Click on Session
  • In Host Name, type user@<public IP> (for a Red Hat AMI the default user is ec2-user), copying the public IP of the instance we are going to connect to through PuTTY from the AWS console
  • Give Name to your connection in Save Session and click on save
  • Click on Open

         


         

  • A New Window will appear

  • Type 'df -h' to see the disk usage in the new window

        


  • To attach the 100 GB additional volume, type the commands below:
                     >cat /proc/partitions   (shows the available volumes and their actual names)


                     >sudo mkfs -F -t ext4 /dev/xvdf     (formats the partition)
                     >sudo mkdir /data                   (creates the mount directory)
                     >sudo mount /dev/xvdf /data         (mounts the directory)
                     >sudo vim /etc/fstab                (opens the file system table for editing)
              Add the following line to it:
                      /dev/xvdf   /data   ext4   defaults   0 0


         


  • After mounting the partition we can check it with the command: (df -h)
  • After creating and mounting partitions, we need to configure certain files so the instances can communicate as a cluster, viz. /etc/hosts, /etc/hostname, /etc/sysconfig/network
  • sudo vi /etc/hosts
  • Now, type the private IPs of all instances and give names to those IPs
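For example, the /etc/hosts entries might look like this. The private IPs shown are placeholders (use the ones from your EC2 console); the hostnames match the ones used in the MongoDB replica-set section, with bdg-hdp-datanode1 assumed for the remaining node.

```
172.31.10.11   bdg-hdp-admin
172.31.10.12   bdg-hdp-master
172.31.10.13   bdg-hdp-datanode1
172.31.10.14   bdg-hdp-datanode2
```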

        


  • Press the Esc key, then type :wq! to save and exit the editor
  • Type sudo vi /etc/hostname
  • Now give a name to the present instance

       


  • Type sudo vi /etc/sysconfig/network
  • Now, put the below configurations

            


  • Type sudo vi /etc/cloud/cloud.cfg
  • Now, set the following configuration

             

  • Type sudo reboot
  • To crosscheck the connection, type the command: (getent hosts)

          


  • Repeat step 9 for all the instances
  • And check whether all instances are communicating by pinging each other