Lab 1: Introduction to Kibana

Objective: In this lab, you will learn how to create an Index Pattern. The lab environment for this training consists of a single-node Elasticsearch cluster running on server1 and a single Kibana instance also running on server1 as shown in the following diagram:

lab01 architecture

You will create two index patterns for the user_messages index and one index pattern for the users index in Elasticsearch:

lab01 architecture index patterns

  1. Click to view your Kibana instance.

  2. If this is the first time you are accessing your Kibana instance, a "Welcome to Kibana" page will be displayed. To continue, click on the "Explore on my own" link.

    lab01 kibana welcome

  3. Your Elasticsearch cluster has an index named user_messages. Create an index pattern in Kibana that satisfies the following requirements:

    • The index pattern is user_messages* (note you will be using a wildcard * in the name)

    • The Time Filter field name is date

    Once the index pattern is created, you will see a table with the 46 fields of the index, along with details like Type, Format, and whether or not the field is Searchable or Aggregatable. In the next lab, you will see how to browse the documents in the user_messages index.

  4. Create another index pattern in Kibana that satisfies the following requirements:

    1. The index pattern is user_messages (notice this time you are not using a wildcard * in the name)

    2. Do not select a Time Filter field for this index pattern

  5. Create a third index pattern in Kibana that satisfies the following requirements:

    1. The name of the index pattern is user_*

    2. Select the date field as the Time Filter field

    You should now see 3 index patterns listed on the Index Patterns page of Kibana.

  6. You can manage your index patterns in Kibana in various ways. For example, you can choose a default index pattern, refresh the field list, or delete an index pattern. To demonstrate these tasks, complete the following steps:

    1. Click on the index pattern user_* , then click on the star icon in the upper-right corner. This will denote user_* as your choice for the default index, which can save you some mouse-clicks when building visualizations for your favorite indices

    2. If your index in Elasticsearch is updated with new fields or different data types, you can click on the "Refresh field list" icon (next to the star icon) to refresh an index pattern. Click on the icon now to refresh your user_* index pattern

    3. You can delete index patterns by clicking on the garbage can icon. Delete your user_* index pattern

    4. Make user_messages your new default index pattern

  7. Data comes in many shapes and forms. Index Patterns are useful for exploring the different attributes of your data. Let’s look at an example. Consider the following data from a spreadsheet:

    id  first name  last name  occupation  salary  hashtags       likes
    1   Bill        Smith      Marketing   110000  dogs, cute     124
    2   Samantha    Lee        Engineer    120000  sunday, relax  98

    In a spreadsheet, data is organized in a table. In Elasticsearch, data is stored in an index! Before you can index a document into Elasticsearch, it must be formatted as JavaScript Object Notation (JSON). JSON documents look like the following:

    # Document with the id 1
    {
      "first_name": "Bill",
      "last_name": "Smith",
      "salary": 110000,
      "occupation": "Marketing",
      "hashtags": [
        "dogs", "cute"
        ],
      "likes": 124
    }
    
    
    # Document with the id 2
    {
      "first_name": "Samantha",
      "last_name": "Lee",
      "occupation": "Engineer",
      "salary": 120000,
      "hashtags": [
        "sunday", "relax"
        ],
      "likes": 98
    }

    An attribute (or a column) in a table becomes a field in an Elasticsearch document. In this simple example, our data have only a few fields, but it is not unusual to have hundreds of fields in your documents! Index Patterns enable you to explore the names and types of the fields in your indices.

    To demonstrate, click on your user_messages index pattern. How many fields does the user_messages index have?

    There are 46 fields

  8. Page through all the fields using the pagination arrows or increase the number of rows displayed per page.

  9. Search for the field geo.location. Notice this field is Searchable and Aggregatable. This means you can search and build visualizations using this field.

  10. Search for the field user.occupation. It is searchable, but not aggregatable. This means that you cannot build visualizations from this field.

  11. Create an index pattern for the users index.

    There are many patterns that can find the users index – users*, user*, u*, just to name a few. But notice that an index pattern like user* will match the index user_messages as well, so you have to be careful which indices you really want for your index pattern. If you want your index pattern to match one index, use the full name of the index without a wildcard.

    users

Summary: You learned how to create index patterns in Kibana. You have two index patterns that are pointing to the same Elasticsearch index: one that uses the time filter and one that does not. You should also have an index pattern that is pointing to the users index.

End of Lab 1


Lab 2: Discover Interface

Objective: The Discover interface allows you to explore and discover your data inside Elasticsearch. In this lab you will see how this interface is used to explore your datasets.

  1. You currently have 2 index patterns that point to the same data. One does not have a time filter (user_messages), while the other (user_messages*) does have a time filter configured.

    1. From the Management > Index Patterns page, select user_messages* as the default index pattern.

    2. To view the Discover interface, click on the Discover link in the left-hand toolbar of Kibana.

    3. On the left-hand side panel you should see user_messages*, because it is your default index pattern. Notice there are no documents to view.

    4. Click on user_messages* and change the index pattern to user_messages. Now you will see a list of documents.

    5. What are the differences between the two index patterns?

    The index pattern without a time filter (user_messages) displays documents, whereas the index pattern with a time filter (user_messages*) does not. This is because the time picker in the upper-right corner is set up to display data from the last 15 minutes, but the underlying user_messages index does not have any documents that are less than 15 minutes old.

  2. Anytime you view an index in the Discover interface and do not see any documents, make sure you check the time interval on the time picker. Complete the following steps:

    1. Delete the user_messages index pattern (the one without the Time Filter field set)

    2. Go back to the Discover interface and increase the time interval by clicking on the calendar icon in the time picker, then select "Last 1 year". You should now see a long list of documents in the user_messages* index.

    Changing the time filter actually filters documents based on the value of the date field that you defined as a time field.

  3. Another quick way to select a time interval is to "select" a section of the histogram by drawing a box around the dates you are interested in. Select the last 5 years and try narrowing down the interval by using your mouse to draw a box from September 2018 to December 2018.

  4. Now that you are familiar with the time picker, let’s start using the search bar. Start by searching for all messages that contain "Smith".

    Type "Smith" in the search bar. You should get 283 hits. (The number of hits appears just above where you typed in "Smith".)

  5. The documents are sorted by the time filter field that you provided when creating the index pattern. By default, Kibana displays only the first 500 documents, but the total number of documents that match your search is also displayed.

  6. Let’s look at the details of one of the documents that was returned from your search.

    1. Click on the small arrow next to the first document. The arrow is just to the left of the "Time" column in the list of documents.

    2. You should see all of the attributes (fields) of the document displayed in a table format.

    3. What is the number of likes that this message received?

      There are 27 likes (This number may change based on the document you selected)

    4. What is the job of the person that has written this message?

      The person is a QA Engineer (This answer may change based on the document you selected)

    5. Notice you can switch to a JSON representation of the document by clicking on the "JSON" tab.

    6. On the right-side of the interface, notice two buttons: "View surrounding documents" and "View single document". Click on the "View surrounding documents" button to view similar matching documents where the date is "near" the current document.

  7. We previously typed Smith in the search bar, but notice that some of the documents do not actually contain "Smith" in the user.last_name field. Let’s improve this search to avoid that scenario. In Kibana you can query a specific field (or attribute) by specifying which field you are searching on, using the following syntax: field:value. Search for the value "Smith" in the user.last_name field.

    Type into the search bar:

    user.last_name:Smith
    1. How many documents match the query?

      Notice there are only 261 hits on your refined search

    2. Using the selected fields panel, answer the following question: Out of all people having the name Smith, what is the first name of the person that sent the highest amount of messages?

      Blondell Smith sent the most messages. By clicking on the user.first_name field on the left panel we can see that Blondell is the most frequent first name.

  8. As you can see the selected fields panel provides an idea of the distribution of the data. Select the likes field in this panel. How helpful is this result when it comes to having an idea of the data distribution?

    Notice that text fields like user.first_name provide a fairly useful distribution, because you can view the top 5 values of a field. However, when it comes to numeric values like the numbers of "likes", seeing the top 5 values does not reveal much about the distribution of the data.

  9. Let’s see how we can improve the discovery of the data by using the Machine Learning feature of Kibana.

    1. Click on the Machine Learning icon in the left-hand toolbar

    2. From the Data Visualizer tab, click on the "Select index" button

    3. Choose the user_messages* index pattern

    4. Notice you will not see any data or charts. This is because the default time interval is "Last 15 minutes"

    5. Change the time interval to "Last 1 year" using the time picker. Notice that Kibana displays various charts and tables of the numeric fields in the "user_messages" index, and you can now answer more interesting questions about your dataset.

    6. What is the median number of "likes"?

    7. What is the distribution of the number of "subjects"?

      The number of "likes" seems to follow an exponential distribution.

      The number of "subjects" seems to follow a normal distribution.

  10. Go back to the Discover interface and let’s view a few more features of the UI.

    1. First, notice that Kibana remembers your prior search and you are only viewing the documents that have "Smith" in the last name.

    2. Notice the search results show all of the fields of each document. Modify the display of the search results so that only the user.first_name, user.last_name, user.age, user.occupation, subjects and geo.country fields are displayed.

  11. Let’s do a more complex search. Write a search that satisfies the following requirements:

    • Search only on the field "subjects" for messages containing canon or both pic and outdoor.

    • The messages must come from "Europe".

    • The messages need to have at least 200 likes

      1. How many documents match your query?

    subjects:(canon or (pic and outdoor)) and geo.continent:EU and likes>200

    This query matches 498 messages.

  12. Let’s save the search:

    1. Click "Save" on the menu in the top-right of the screen and name the search custom_search.

    2. Click twice on the Discover tab in the left panel. This will clear your search results and you should now see 900,000 hits

    3. Open your saved search by clicking on the "Open" menu item and then selecting custom_search. This will run your saved search and you should now only see the 498 hits.

Optional: Select one of the documents in the list that has been returned. Click on the arrow next to it to view the document’s fields. Next to the first_name field there is a small magnifying glass with a "+" in it - click on this icon. What happens?

A filter appears under the search bar, filtering only the documents that have the first_name you selected. We will discuss this feature again later in the training!

Summary: In this lab, you learned how the search interface can be used to select subsets of an index pattern; how to analyze some of the fields of your dataset using both the Discover interface and the Machine Learning feature; and how to filter documents using the search bar of Kibana. You also learned how to create a search and save it for later.

End of Lab 2


Lab 3: Aggregations

Objective: In this lab, you will use Kibana to understand Elasticsearch aggregations and create some visualizations.

  1. When we speak about data inside Elasticsearch, imagine that your data belongs to a big bucket defined by your index pattern.

    lab03 index pattern

    Let’s create a visualization to understand what is going on.

    1. Click on Visualize in the left-hand toolbar of Kibana.

    2. Click on the small icon + to create a new visualization.

    3. Select the Metric visualization

    4. Pick the index pattern user_messages*

    5. Verify that the time window defined by your time picker encompasses all the data from the index pattern.

  2. What does 900,000 really represent? Remember, your data is in a big bucket and by default aggregations calculate the count of the number of documents in the bucket. Let’s change the count metric to something else:

    1. On the left panel, change the Count metric to Average

    2. Apply the average to the field likes

    3. Press the "Play" button to execute the changes

    4. What does the output represent?

    The output represents the average "likes" for all the documents in the target index pattern.
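    For reference, the Metric visualization you just configured corresponds roughly to the following Query DSL request (a sketch only; the aggregation name avg_likes is an illustrative label, and the actual request Kibana sends may differ slightly):

    GET user_messages*/_search
    {
      "size": 0,
      "aggs": {
        "avg_likes": {
          "avg": { "field": "likes" }
        }
      }
    }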

  3. Let’s refine this average, by only selecting the documents that match a certain criterion. To do that we need to add a query or a filter to the visualization.

    lab01 search criterion

    Add a query in the search bar that selects only documents that have a number of "likes" higher than 9000. What does the metric represent now?

    The query should look like the following:

    likes>9000

    The output visualization represents the average number of "likes" for the documents that have more than 9000 likes.

  4. When running a search, you can focus on a subset of your data in the index pattern. Sometimes you may want to create multiple subsets (or buckets) and this is exactly what aggregations provide.

    lab01 buckets

    There are many ways of creating these buckets. Here is a list of the most common bucket aggregations:

    • Histogram

    • Date histogram

    • Terms

    • Filters

    • Range

  5. Let’s take a look at some of these different types of bucket aggregations. Start with the terms aggregation. The terms aggregation splits your documents based on the terms present inside a specific field. For instance:

    lab03 terms

    The terms aggregation expects the user to specify how many buckets will be generated - the default is 5 buckets. What if your documents have more than 5 distinct terms in a field? Which buckets will be returned? The default behavior is to return the buckets with the highest document count.

    Let’s use a terms aggregation to split the documents based on the occupation field; specifically, the user.occupation.keyword field.

    1. First, remove any query that may be in your search bar so that you are performing the aggregation on the entire dataset

    2. At the bottom of the left panel click on Split Group.

    3. Select a Terms aggregations.

    4. Apply it to the user.occupation.keyword field

    5. Select the average of likes as the metric

    6. Specify a size of 2 (the size is how many buckets are going to be generated: by default Elasticsearch returns the buckets with the highest value first). What do these two metrics represent?

    Each metric displayed is the average of likes of all the documents inside the bucket corresponding to a given occupation. For instance: Managers have on average 1096.735 likes on their messages.
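    The terms aggregation you just built, with its average-of-likes metric, corresponds roughly to the following Query DSL sketch (by_occupation and avg_likes are illustrative labels only, and the request Kibana generates may differ slightly):

    GET user_messages*/_search
    {
      "size": 0,
      "aggs": {
        "by_occupation": {
          "terms": {
            "field": "user.occupation.keyword",
            "size": 2
          },
          "aggs": {
            "avg_likes": {
              "avg": { "field": "likes" }
            }
          }
        }
      }
    }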

  6. It is important to understand that the two buckets are independent of each other. Aggregations only group documents together, and the metric runs an independent computation on the documents inside each bucket. For example, change the metric to be the count instead of the average of the number of likes.

    1. We only asked for 2 buckets - that means that a lot of documents were actually ignored (because they don’t belong to the 2 first buckets). See if you can figure out how to create a third bucket that contains all the "other" documents that do not fall into the top 2 buckets.

  7. Two very useful aggregations are histogram and date histogram, which work on fields that are numeric values and dates, respectively. These aggregations expect a user-defined interval. For example, the following image represents histogram buckets with an interval set to 100:

    lab03 histogram

    Here is a date histogram with an interval of 1 month:

    lab03 date histogram
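    In Query DSL terms, histogram and date histogram buckets are defined roughly as in the following sketch (the interval values are just examples; on Elasticsearch 7.2 and later the date interval parameter is calendar_interval rather than interval):

    GET user_messages*/_search
    {
      "size": 0,
      "aggs": {
        "likes_histogram": {
          "histogram": { "field": "likes", "interval": 100 }
        },
        "messages_per_month": {
          "date_histogram": { "field": "date", "interval": "month" }
        }
      }
    }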

    Let’s create a visualization of the same date histogram that you can see in the Discover interface:

    1. Create a new visualization of type vertical bar.

    2. Use the index pattern user_messages*. On the X-axis, specify a date histogram applied on the field date.

    3. Define a daily interval

    4. The Y-axis is defined by your metric, which is count by default

  8. In a similar fashion, let’s create a histogram visualization:

    1. On the X-axis use the histogram aggregation applied on the number_of_subjects field with an interval of 1

    2. Keep the metric to count. It should display the distribution of the number of subjects.

  9. The range and filters aggregations are interesting because they allow you to define your own buckets using specific queries. Let’s give it a try - suppose you want to create a pie chart that compares the number of messages from France and the number of messages from Germany:

    1. Create a new Pie visualization

    2. Select the index pattern user_messages

    3. Make sure that the metric is count

    4. Split the slices using a filters aggregation

    5. Define two filters:

      geo.country:Germany
      geo.country:France
  10. Now that you have seen both bucket and metric aggregations, let’s see how to create a more complex visualization. We are going to look at an XY visualization, which is built as follows:

    lab03 XY visualizations

    1. Click on Visualize

    2. Create a new visualization

    3. Select the Vertical Bar chart.

    4. Select the user_messages* index pattern. You should now see a single bar with 900,000 documents in it. What does this mean? By default, there is only one bucket (the whole index pattern), and the metric computed on that bucket is the count.

    5. On the Y axis, display the average of likes in a bucket.

    6. On the X axis, display the top 6 terms of the geo.continent.keyword field.

    7. What is the continent that has the highest average number of likes?

      SA (South America) has the highest average of likes.

    8. Change the Y axis to be the count of the number of documents per continent. Notice that the order of the bars will change and EU has the most overall documents.

  11. It is possible to have buckets inside buckets. Let’s divide every bar (representing a continent) by the top 5 geo.country.keyword (top 5 being the countries from which a high number of documents has been sent). To do this, perform the following steps:

    1. Click Add sub-buckets to the X-axis

    2. Select Split series

    3. Select the Terms aggregation

    4. Select the field geo.country.keyword

    5. Choose a size of 5

    6. Run the visualization

    7. Based on the sub-buckets, which country in North America have users been sending the most messages from?

      The most frequent country is the United States with 275,380 messages

  12. So far in this lab you have built visualizations based on an index pattern. You can also create visualizations at the search level. Recall the following saved search from the previous lab:

    subjects:(canon or (pic and outdoor)) and geo.continent:EU and likes>200

    Let’s build a visualization based on the hits from this search:

    1. Create a new Vertical Bar visualization.

    2. For the Choose search source screen, select the saved search named custom_search

    3. On the Y axis, display the Count metric

    4. On the X axis, display the top 5 values of the user.first_name.keyword field for the users who authored messages matching the above query.

  13. It is possible to save a visualization. Save your previous visualization and name it custom_visualization.

  14. Notice that in our documents there is a field called user.age. Let’s use this field to compute the average age of all of our users:

    1. Create a metric visualization on the user_messages* index pattern

    2. Define the metric to be the average of the field user.age. You should get 36.905

    3. Is this computed average the actual average of the age of our users?

    The average of 36.905 is not the actual average age of our users. The actual average is 36.917, which in this specific case is quite similar to our metric visualization - but this is only a coincidence based on the normal distribution of our data.

    The data that we are analysing is event-centric data. To better understand why the average was incorrect, let’s think about how it is weighted toward events rather than users, with this simple example. Suppose we have a user named John Smith who is 44 years old and sends 500 messages. Suppose another user named Marie Normand is 20 years old and only sends one message. The actual average age of John Smith and Marie Normand is 32, but if we created a metric visualization of the average age over their messages, the calculated value would be (500 * 44 + 20 * 1)/501, which equals 43.9! This demonstrates the situation we are in right now.

  15. Calculating the actual average age of our users would require a substantial amount of computation on the cluster, because the distinct users would have to be determined somehow from our 900,000 messages. Sometimes your data is simply not represented in a way that it can be easily leveraged for certain analytics. In our scenario, computing the average age would be much simpler if we had the distinct users already determined somehow and stored in a separate index. Thankfully, the team in charge of our Elasticsearch cluster was ahead of the game and helped us out by extracting unique users and storing them in an index named users. This type of data modeling in Elasticsearch is referred to as entity centric modeling. The messages from our users are in an event-centric index named user_messages, and our unique users are in an entity-centric index named users.

    Let’s take a look at the users index:

    1. Create an index pattern that matches the users index

    2. Take some time to analyze the users index and familiarize yourself with its different fields

    3. Create a metric visualization using the users index pattern to compute the average of your users' age. You should get 36.917, which is the actual average of the users of our application

Summary: In this lab you learned how to build basic visualizations using Kibana, which hopefully helped you understand the concept of aggregations in Elasticsearch. You also learned that sometimes the way the data is indexed inside Elasticsearch can impede you from executing certain analyses. This can often be fixed by using a different view of your data.

End of Lab 3


Lab 4: The Query Bar

Objective: Search is everywhere, and especially in Kibana. The search bar will follow you in almost every single interface in Kibana, so it is extremely important to understand how you can best leverage searching.

  1. Let’s start this exercise by creating a couple of index patterns (if you already have those index patterns you can skip this step):

    1. Create an index pattern named user_messages* using the date field as a time filter.

    2. Create an index pattern called users that does not have a time filter field.

  2. Remember that Kibana supports multiple syntaxes for the query bar. Change the query bar to the Lucene syntax:

  3. The < or > operators can be used when searching for ranges. You can also represent a range query using the following syntax: [LOWER_BOUND TO UPPER_BOUND]. If you want to exclude the bounding values, you can use curly braces {} instead of []. Let’s search on the index pattern user_messages*.

    1. Write and execute a query that searches for all the documents that have a number of likes higher than 200 (note that using the Lucene syntax you need to add a colon : before the operator).

      likes:>200
    2. Write and execute a query that retrieves all the messages that have from 200 to 400 likes.

      likes:[200 TO 400]
    3. Write the same query but do not include documents that have 400 likes.

      likes:[200 TO 400}
    4. You can use an asterisk * when the upper or lower bound is undefined. Write and execute a query that selects all the documents that have 400 or more likes.

      likes:[400 TO *]
  4. Make sure your time filter is selecting all the data from the index pattern users (you should see 5000 documents). Write a query that fulfills the following requirements:

    • In the search results table, display only the first_name field

    • The first_name field contains users named Cassandra

    The search request should look like the following, and it should match 2 documents:

    first_name:Cassandra
  5. Using the OR operator, add Kassandra to your previous query. Note another difference between KQL and Lucene: in Lucene, operators need to be written in uppercase.

    first_name:(Cassandra OR Kassandra)
  6. It is possible to search for name variations using a fuzziness parameter. Write a fuzzy search with a fuzziness value of one on the name Cassandra.

    first_name:Cassandra~1
  7. How many documents does this query match? What are the different name variations found?

    There are 10 documents that match the query. The different matched values include Cassandra and Kassandra.

    first_name:Cassandra~1
  8. Increase the fuzziness value to 2. How many hits do you get now, and what are the different name variations returned by this query?

    first_name:Cassandra~2

    This query matches 25 documents and it matches the following values: Kassandra, Cassandra, Lashandra, Lasandra

  9. As we don’t know ahead of time the different variations that a name can take, it is hard to define how big the fuzziness parameter should be. You can let Elasticsearch decide the best value to use by setting the fuzziness parameter to auto. Use auto instead of 2 in the previous query. What value did Elasticsearch use for the fuzziness parameter?

    first_name:Cassandra~auto

    Elasticsearch is returning 10 documents, so based on the previous queries we can see that the value chosen by auto was 1 for this particular query. If the query string is longer than 5 characters, the value 2 is used; and if the query string is shorter than 2 characters, the value 0 is used for the fuzziness.
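    For reference, the same fuzzy search could also be expressed with the Query DSL (covered in a later lab) using a fuzzy query, where AUTO lets Elasticsearch pick the edit distance based on the term length. A minimal sketch:

    GET users*/_search
    {
      "query": {
        "fuzzy": {
          "first_name": {
            "value": "Cassandra",
            "fuzziness": "AUTO"
          }
        }
      }
    }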

  10. Using a wildcard, write a search that matches people with a first_name of Julia or Julio. Your query should not match any first names longer than 5 characters.

    first_name:Juli?

    This should match only 8 documents.

  11. Now search for all users that have a name starting by Jul

    first_name:Jul*

    This query should match 24 documents.

  12. Regex stands for regular expression. A regex is the definition of a pattern. To define a regex query in Kibana, the query needs to be surrounded by forward slashes (/). Regex has a specific syntax, so let’s break it down. When searching in text, sometimes a pattern is needed to fill a gap. For example, one character may be missing.

    Suppose you want to write a query for Cassandra but you do not know if it is spelled Kassandra or Cassandra. A regex using the . can be used in this kind of scenario. Write a query using regex that will match Cassandra and Kassandra (but will not match for example Assandra) on the field first_name.keyword.

    first_name.keyword:/.assandra/
  13. In regular expressions, use the .? syntax when you want to express that a character may or may not be present. For example, in the previous query, if you had used .? instead of ., then Assandra could be a match. Using regex, write a query that matches the following pattern: Juli followed by zero or one character. For example, this request could match Julia, Julio, Juli, and so on.

    first_name.keyword:/Juli.?/

    It should match people called Julia and Julio. Nobody in our dataset is called Juli, but if there were, they would be a match for this query.

  14. When you know that a word starts with certain characters and ends with another set of characters, but you have no clue what or how many characters are in the middle, you can use .*, which matches any character zero or more times. For instance, if you want to find all the names that start with Jo and end with n, with any number of characters in between, the pattern would be Jo.*n, which will match names like Joaquin, Johnson, Jordan, and so on.

    Using a regex, search for users that have a first name starting with Jul and having a n anywhere after Jul. There may be 0 or more characters after the n. For example, Juliana should be a hit.

    first_name.keyword:/Jul.*n.*/

    Keep in mind that regex and wildcard expressions are expensive and should be used with care.
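    The same pattern can also be written as a Query DSL regexp query (covered in a later lab); like the query-bar version, it should be used sparingly because it can be expensive. A minimal sketch:

    GET users*/_search
    {
      "query": {
        "regexp": {
          "first_name.keyword": "Jul.*n.*"
        }
      }
    }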

Summary: In this lab, you learned more about the search bar, how to search on patterns, how to write queries with regular expressions, and how to use the auto completion feature of the Kibana search bar.

End of Lab 4


Lab 5: Searching on Text

Objective: In order to understand how to correctly use the search bar in Kibana, it is important to understand how textual values work in Elasticsearch.

  1. Let’s start this lab with a simple but revealing query. In the users index, search for the string #dog on the field favorite_subjects_as_string. Review the values of the favorite_subjects_as_string field that are hits for your query.

    favorite_subjects_as_string:#dog
  2. Let’s slightly change the query. Write two searches: one for #Dog and another for dog on the favorite_subjects_as_string field.

    favorite_subjects_as_string:#Dog
    favorite_subjects_as_string:dog
  3. What can we learn from the two previous queries? Well, apparently uppercase and lowercase do not appear to impact the search, and some characters are even ignored (the hashtag #). Interesting! Let’s dive deeper into it. Search for the value #dog on the field favorite_subjects. How many documents does this query match?

    The query should match the same documents as before:

    favorite_subjects:#dog
  4. Write two new queries on the favorite_subjects field: for #Dog and dog. How many hits do you get from each query?

    The queries do not match any documents.

    favorite_subjects:#Dog
    favorite_subjects:dog
  5. What is happening here? In the first scenario the search was case insensitive, but now the search is case sensitive. Let’s compare the two fields that we just searched on. Go to Management > Index Patterns and select the users index pattern. Search for the field favorite_subjects. What type of field is favorite_subjects? Compare this with the field favorite_subjects_as_string.

    We can see that favorite_subjects is a string and that the field is searchable and aggregatable. On the other hand favorite_subjects_as_string is only searchable.

  6. So the favorite_subjects field is searchable and aggregatable: what does that mean? It means that this field is optimized for exact matches, meaning that this field is case sensitive. The fact that this field is aggregatable means that it will be possible to build visualizations on this specific field. The field favorite_subjects_as_string, on the other hand, is only searchable, which means that this field is indexed in a way that is best for full-text search.

    Review the following image, which demonstrates what happened to the field favorite_subjects_as_string when the documents were put inside Elasticsearch, and compares it with what happened to the query strings when we sent them to Elasticsearch.

    lab02 dog text

    Now compare the favorite_subjects_as_string field with how the favorite_subjects field is indexed:

    lab02 dog keyword

    This analysis process is configured at the index level when the index is created, and defining this process can be a tricky task! If you want to know more about text analysis, our Fundamentals course covers the topic in detail.
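    If you want to see this analysis process in action, you can experiment with the _analyze API from the Dev Tools console (introduced in a later lab). A minimal sketch, assuming the default standard analyzer:

    GET _analyze
    {
      "analyzer": "standard",
      "text": "#Dog #cute"
    }

    The response should show lowercased tokens such as dog and cute with the hashtags stripped, which matches the full-text search behavior you observed above.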

  7. Sometimes it is difficult to determine if a field should be optimized for full-text search or for exact matches. Let’s look at a few examples to illustrate the problem better:

    1. From the Discover interface in Kibana, look at the values of the user_id field. Would you optimize this field for full-text search or for exact match?

      Imagine you have the following user_id: mF1Bt2lRoVl31uRNueWQpVJXmPguqSsrxV1m2sq4f2s. If it was indexed for full-text search, the value would be lowercased and broken down into several different strings, which does not seem like a useful way to index a user’s ID value. You would not want to lose case-sensitivity, and you would not want the value to be split into multiple strings, so for a user_id it appears that indexing the field for exact matches seems like the best option. Moreover, one of the main advantages of a field optimized for exact match is it can be used in visualizations.

    2. Now view some of the values for the occupation field. Would you say that this field should be optimized for exact matches or for full-text search?

      This one is not as obvious: in one way, exact matches would be great for building visualizations on top of the field to answer interesting questions like how the number of messages may vary across occupations. But ideally, when searching for engineer it would be great to find Software Engineer and Sales Engineer - so there would be an advantage to having the field optimized for full-text search as well. In this case, you would actually index the occupation field twice: once for exact matches, and a second time for full-text search.

  8. The scenario of the engineer string field is a common one. Let’s try something:

    1. Write a query that searches for users that have engineer in the occupation field.

      occupation:engineer

      Great - it seems to work! This field is obviously optimized for full-text search.

    2. Note that we shouldn’t be able to create a visualization on the occupation field, but let’s still give it a shot! Create a visualization of type Vertical bar on the users index pattern. On the Y axis, configure the number of users in the bucket. On the X axis, try to add a terms aggregation on the occupation field. Do you see the occupation field available for a terms aggregation?

      No! Kibana will not allow you to do a terms aggregation on the occupation field. However, notice there is another field called occupation.keyword! Use this field instead for your terms aggregation:

    3. We could not use occupation, but we could use occupation.keyword for the terms aggregation. What is this keyword field? When a field can have multiple use cases (full text and exact match), it is possible to "duplicate" this specific field and optimize the duplicated field differently. Click on the users index pattern in Management and search for the field occupation. Notice the field is indexed twice: once as full text (occupation) and once for exact matches (occupation.keyword). See the sketch after this step for how to inspect the mapping behind this.
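      You can inspect the mapping of this field from the Dev Tools console (introduced in the next lab) with the field mapping API:

      GET users/_mapping/field/occupation*

      In the response you should see occupation mapped with "type": "text" and a keyword sub-field occupation.keyword with "type": "keyword" (the exact mapping of the users index may include additional settings).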

Summary: In this lesson, you learned that Elasticsearch analyzes text in different ways. To avoid an extra layer of complexity, Kibana simplifies the terminology of how strings are indexed, but what you need to remember is that a field that is optimized for full-text search is indexed as a text field in Elasticsearch, and a field that is optimized for exact matches is indexed as a keyword field.

End of Lab 5


Lab 6: Query DSL

Objective: The Query DSL defines a way to write search queries using a JSON format. It’s good to be familiar with the Query DSL because it will actually help you when you need to perform advanced tasks like filter customizations.

  1. Let’s start with a simple query:

    1. On the left-hand side panel, click on the Dev Tools interface.

    2. Click on the Get to work button

    3. You should see the following query:

      GET _search
      {
        "query": {
          "match_all": {}
        }
      }
    4. Click on the query and then click on the small play button next to the query. You should see the response of this query on the right side.

    5. The previous query returns the first 10 documents present inside the Elasticsearch cluster, and you can see which index each document comes from. Let’s slightly change the query to search only within a specific index pattern. Replace GET _search with GET users*/_search and execute the query again. Look at the response and try to guess how many documents match the query.

      GET users*/_search
      {
        "query": {
          "match_all": {}
        }
      }

      The number of documents matching the query can be found at the top of the response "total": 5000:

      {
        "took" : 3,
        "timed_out" : false,
        "_shards" : {
          "total" : 5,
          "successful" : 5,
          "skipped" : 0,
          "failed" : 0
        },
        "hits" : {
          "total" : 5000,
          "max_score" : 1.0,
          "hits" : [
          ...

      Inside the square bracket following hits, you can see the first 10 documents that match the query.

  2. The match_all query you just executed causes every document from the specified index pattern to be a hit. Let’s slightly change the query: replace match_all with match, letting Kibana auto-complete the query. Your query should now look like the following:

    GET users*/_search
    {
      "query": {
        "match": {
          "FIELD": "TEXT"
        }
      }
    }
  3. Replace FIELD with the field you want to search on, and TEXT with the string you want to search for. In your query, replace FIELD with favorite_subjects_as_string and replace TEXT with dog. Run the query. How many documents match the query?

    GET users*/_search
    {
      "query": {
        "match": {
          "favorite_subjects_as_string": "dog"
        }
      }
    }

    The total in the response indicates that the query matches 2 documents.

  4. When using the Lucene syntax, it is possible to search for multiple words. Let’s try the same technique using the Query DSL syntax: search for dog or liking or american on the favorite_subjects_as_string field.

    GET users*/_search
    {
      "query": {
        "match": {
          "favorite_subjects_as_string": "dog liking american"
        }
      }
    }
  5. By default, the match query searches for dog OR liking OR american. It is possible to change the behavior of the match query to use AND logic. Because you need to specify the operator, the syntax of the query is slightly different. Remove "dog liking american" from the previous query and replace it with {}. Inside those brackets, define two key/value pairs: one named query with the value "dog liking american", and the other named operator with the value and. Let Kibana autocomplete the syntax along the way as you are typing.

    GET users*/_search
    {
      "query": {
        "match": {
          "favorite_subjects_as_string": {
            "query": "dog liking american",
            "operator": "and"
          }
        }
      }
    }
  6. The and operator is very restrictive and did not return many results. Perhaps it would be nice to build a query that selects documents that contain at least 2 of the 3 words specified in the query. You can do this by replacing the operator and with another key/value pair: minimum_should_match and the number 2. Also, change the query to the following terms: "mensfashion camera amor dog liking". How many documents match this query?

    GET users*/_search
    {
      "query": {
        "match": {
          "favorite_subjects_as_string": {
            "query": "mensfashion camera amor dog liking",
            "minimum_should_match": 2
          }
        }
      }
    }

    This query should match 2 documents.

  7. OPTIONAL: If you want to learn more about the Query DSL syntax, view the documentation. For example, use the range query to query the salary field and find all users that have a salary between $100,000 and $120,000.

    GET users*/_search
    {
      "query": {
        "range": {
          "salary": {
            "gte": 100000,
            "lte": 120000
          }
        }
      }
    }
  8. OPTIONAL: Using the documentation for the prefix query, create a query that matches all the documents that start with Un in the field geoip.country_name.keyword.

    GET users*/_search
    {
      "query": {
        "prefix": {
          "geoip.country_name.keyword": "Un"
        }
      }
    }

Summary: In this lab, you learned how to use the Query DSL syntax. If you are wondering when you should use the Query DSL instead of the Lucene syntax, we are going to discuss that in more detail in the next lesson.

End of Lab 6


Lab 7: Filters

Objective: In this lesson, you are going to see how to use filters to improve the search experience and how filters can be used to navigate through the Kibana interface.

  1. From the Discover interface in Kibana, perform the following steps using the user_messages* index pattern:

    1. Manually create a filter that selects all the documents that contain one of the values tequila, red, dream or car in the subjects field.

    2. From within the list of documents in the Discover interface, find one that has a user.occupation of Software Engineer. Use that document to filter out all the documents where the user.occupation is Software Engineer.

    3. Disable the filters

  2. Create a filter that selects all the documents that have between 300 and 700 likes. Then disable the filter.
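    If you prefer, the same filter can also be created with the filter editor’s Edit Query DSL option; a minimal sketch of such a range clause (using the likes field as elsewhere in this lab):

    {
      "query": {
        "range": {
          "likes": {
            "gte": 300,
            "lte": 700
          }
        }
      }
    }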

  3. Create 3 filters that select a specific country from the geo.country.keyword field:

    • United States

    • France

    • China

    Change their labels to be the name of their respective country and disable them.

  4. You should now have 6 disabled filters:

    • All the documents that are not from a software engineer

    • All the documents that have between 300 and 700 likes

    • All the documents that have at least 1 of the following tequila red dream car

    • All the documents that come from the United States

    • All the documents that come from China

    • All the documents that come from France

    Use those filters to discover how many messages were sent from each of the countries United States, China, and France that satisfy the following characteristics:

    • It contains at least one of the following terms: tequila red dream car.

    • The message is not from a software engineer

    • The message has between 300 and 700 likes

    • The message was sent from people with the last_name of Fillingim (use the query bar)

    Enable all the filters except the one referring to the countries. In the query bar type the following:

    user.last_name:Fillingim

    To count the number of documents from the United States, enable the United States filter - you should find 9 documents. For France, disable the United States filter and enable the France one - you should find 0 documents. Apply the same concept to China - you should find 0 documents again.

  5. Let’s see how we can edit a filter:

    1. Remove the query in the search bar

    2. Disable all the filters except the one that contains tequila red dream car.

    3. How many documents match this specific filter?

      There are 30,805 documents matching the filter.

    4. Edit the query DSL of the filter to filter out documents that have less than two of the specified words. Change the minimum_should_match parameter from 1 to 2.

      {
        "query": {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "subjects": "tequila"
                }
              },
              {
                "match_phrase": {
                  "subjects": "red"
                }
              },
              {
                "match_phrase": {
                  "subjects": "dream"
                }
              },
              {
                "match_phrase": {
                  "subjects": "car"
                }
              }
            ],
            "minimum_should_match": 2
          }
        }
      }
    5. How many documents match this edited filter?

      229 documents match the filter.

    6. Remove the filter.

  6. Save the search and name it custom_filter.

    1. Click on the Discover tab. All the filters should disappear.

    2. Open the search that you just saved; all the filters should be back.

  7. Let’s see how pinned filters are used in Kibana:

    1. Pin the filter NOT user.occupation:"Software Engineer".

    2. Create a new Vertical Bar visualization, using the user_messages* index (you should see the filter in the list of filters)

    3. On the Y-axis, display the number of documents in the bucket.

    4. On the X-axis, create a histogram based on the user.age field. Define an interval of 1, then press Play

    5. Enable and disable the filter to see how it impacts the visualization

  8. An alternative way of creating a filter is through a visualization:

    1. Create a visualization of type "Vertical Bar" that uses the user_messages* index

    2. On the Y-axis, display the number of documents in the bucket

    3. On the X-axis, use the terms aggregation to group documents using the field geo.continent.keyword.

    4. Display 6 buckets

    5. Click on the EU bucket.

    6. Pin the filter

    7. View the Discover interface and make sure that the filter follows you

Summary: In this lab you learned how filters can be created and customized, as well as how they can be used to navigate through Kibana using pinned filters.

End of Lab 7


Lab 8: A Small Refresher

Objective: This lab is a review of how to create basic visualizations in Kibana

  1. Before creating any visualizations, you will need to define a couple of index patterns (if you already have those index patterns you can skip this step):

    1. Create an index pattern called user_messages* using the date field as a time filter.

    2. Create an index pattern called users.

  2. Metric aggregations are used to compute a size, which could be the size of a bar in a bar chart, the size of a slice in a pie chart, the size of a word in a tag cloud, and so on. Bucket aggregations are the criteria used to split the documents of an index into buckets. For instance, when working with a bar chart you will have the following:

    lab00 XY visualizations

    Let’s create this type of visualization. Create a Vertical Bar visualization that satisfies the following requirements:

    1. Uses the users index pattern

    2. The Y axis shows the number of documents in each bucket from the X-axis

    3. The X axis consists of the 3 most frequent countries that our users come from.

      In order to create the desired X-axis, you will need to use a Terms aggregation. When defining a terms aggregation, you specify how many buckets you want to generate. By default, the first bucket returned is the term with the highest frequency. This is what the terms aggregation is going to do on your dataset:

    lab00 terms

  3. Using the index pattern user_messages and the field geo.country.keyword, create a vertical bar chart that displays the top 3 countries from which messages are sent. In addition, create a bucket for messages from the "other" countries.

  4. Create another vertical bar chart using the user_messages index pattern. This time, instead of creating buckets based on a specific term in our index pattern, you will generate buckets based on an interval:

    1. On the X-axis, display the distribution of the number of likes (likes) using a histogram aggregation. When defining a histogram, you need to define how big the buckets will be. Create a histogram with an interval of 1000:

      lab00 histogram

  5. In a similar fashion, you can create a date histogram and define the interval using a date expression like a month, a week, a day, etc. It is possible to set the interval to auto to actually let Kibana define the optimal size of the buckets:

    lab00 date histogram

    Using a date histogram, create a visualization that will display the number of messages over time using the user_messages* index pattern. Create the visualization using one day as the time interval.

  6. The filters aggregation allows you to create your own buckets based on custom criteria, by writing queries that define which documents belong to each bucket. Let’s group documents based on the salary field, using the users* index pattern (a Query DSL sketch of the resulting filters aggregation appears after these steps).

    1. In the Discover interface, write a separate query for each of the following criteria:

      • All the documents that have a salary higher than 180000

      • All the documents that have a salary between 100000 and 180000 (180000 excluded)

      • All the document that have a salary lower than 100000 (100000 excluded)

        salary:[180000 TO *]
        salary:[100000 TO 180000}
        salary:[* TO 100000}
    2. Next, you are going to use those queries to create custom buckets:

    lab00 filters

    1. Create a pie chart and split the slice using a filters aggregation. Create 3 filters, one for each of your three queries above.

    2. Keep in mind that it is possible to actually change the metric aggregation at any point in time. For example, change the slice size to be the sum of the salary field.
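    For reference, the custom buckets you defined correspond roughly to the following Query DSL sketch (salary_ranges, high, medium, low, and total_salary are illustrative labels only; the query_string clauses reuse the Lucene queries you wrote above):

    GET users*/_search
    {
      "size": 0,
      "aggs": {
        "salary_ranges": {
          "filters": {
            "filters": {
              "high": { "query_string": { "query": "salary:[180000 TO *]" } },
              "medium": { "query_string": { "query": "salary:[100000 TO 180000}" } },
              "low": { "query_string": { "query": "salary:[* TO 100000}" } }
            }
          },
          "aggs": {
            "total_salary": { "sum": { "field": "salary" } }
          }
        }
      }
    }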

  7. Let’s try to learn more about the jobs that people are doing in the different salary ranges. It is possible to add a sub-bucket aggregation to the current aggregations to have a breakdown of the slices based on other criteria:

    lab00 subbuckets

    Add a sub-bucket aggregation to your pie chart. To add a sub-bucket, click on Split Slices and then select the sub-aggregation you want to use. Define two filters: one that selects all the documents with a number of followers higher than 500; and a second filter that selects all the documents with a number of followers lower than or equal to 500.

Summary: This lab was just a warm-up for the next chapter, so hopefully you are comfortable with the concepts of aggregations and visualizations. In the next lesson you are going to see how you can use Kibana to create pipeline aggregations.

End of Lab 8


Lab 9: Pipeline Aggregations

Objective: You will run into many scenarios where you ask yourself "How can I solve this problem using Kibana?" In many cases, the answer will involve using pipeline aggregations. An aggregation works on top of documents; a pipeline aggregation works on top of the results from another aggregation. In this lab you will see how to use pipeline aggregations in Kibana.

  1. Suppose you want to answer the following question: When did we start receiving more than 500,000 messages? To answer this question, perform the following steps:

    1. Create a vertical bar chart for the index pattern user_messages*

    2. On the X-axis, create a date histogram with a daily bucket

      This aggregation shows how many messages were sent on a day-to-day basis, but it does not answer our question. We need to sum all the buckets until we reach 500000 documents, which can be done using a pipeline aggregation named Cumulative Sum (a Query DSL sketch of this pipeline appears after these steps).

    3. Change the metric aggregation on the Y-axis to be Cumulative Sum instead of Count. Notice the vertical bar chart now shows the overall total number of messages received, but on a daily basis. Viewing the chart you can now easily find the day in which the total number of messages surpassed 500,000.

    4. Save this visualization, giving it the name cumulative_sum.
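    The cumulative sum you just configured corresponds roughly to the following Query DSL sketch (messages_per_day and running_total are illustrative labels; on Elasticsearch 7.2 and later the date interval parameter is calendar_interval rather than interval):

    GET user_messages*/_search
    {
      "size": 0,
      "aggs": {
        "messages_per_day": {
          "date_histogram": { "field": "date", "interval": "day" },
          "aggs": {
            "running_total": {
              "cumulative_sum": { "buckets_path": "_count" }
            }
          }
        }
      }
    }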

  2. Suppose you want a visualization that shows the variation of the number of messages received on a daily basis. Complete the following steps to build this visualization using the derivative pipeline aggregation:

    1. Create a new vertical bar chart with a daily date histogram on the X-axis

    2. Change the metric aggregation on the Y-axis to Derivative. Notice the bar chart now shows the difference in the number of messages from day to day.

    3. Save the visualization, giving it the name derivative.

  3. The previous derivative visualization compared buckets with their neighbors, which is a useful day-to-day observation. But those buckets only compared one day to the next, when ideally we might want to compare a bucket from a particular day of the week, like comparing one Monday to the following Monday. To accomplish this, we will need to use the Serial Diff pipeline aggregation:

    1. Using your derivative visualization from the previous step, change the metric to Serial Diff on top of the count. Is there a difference with the previous derivative visualization?

      No. The two visualizations should be identical. By default, serial diff calculates the difference from the previous bucket, in the same fashion as the derivative aggregation.

    2. It is possible to define a lag when working with the Serial Diff aggregation. Click on Advanced under Custom Label. It should display a text box that is used for defining parameters that are not in the UI. You are going to add a lag parameter. Type the following in the text box:

      {"lag":7}

      You can see more parameters related to the serial difference aggregation by viewing the documentation here.

      Note that the chart starts later because several buckets of history are required before the aggregation can be computed. A Query DSL sketch of this serial_diff configuration appears after these steps.

    3. Save the visualization, giving it the name serial diff.
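    The lag parameter you just configured maps onto the serial_diff aggregation in the Query DSL; a rough sketch (messages_per_day and week_over_week are illustrative labels; on Elasticsearch 7.2 and later the date interval parameter is calendar_interval rather than interval):

    GET user_messages*/_search
    {
      "size": 0,
      "aggs": {
        "messages_per_day": {
          "date_histogram": { "field": "date", "interval": "day" },
          "aggs": {
            "week_over_week": {
              "serial_diff": { "buckets_path": "_count", "lag": 7 }
            }
          }
        }
      }
    }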

  4. Now let’s create a visualization that shows the moving average on the number of likes per day:

    1. Create a vertical bar chart on the index pattern user_messages*

    2. Define a date histogram with a daily interval.

    3. Change the metric to Moving Average.

    4. When applying the moving average, an important factor is the size of the window. Similar to how you configured the lag for Serial Diff, add a window parameter that sets the size of the window to 7:

      {"window": 7}

    More parameters for the moving average can be found here

Summary: In this lab, you learned how to use pipeline aggregations on top of other aggregations to build more complex visualizations. We are going to dive more into this topic in the "Improving Visualizations" lesson.

End of Lab 9


Lab 10: More Types of Visualizations

Objective: So far we have been working a lot with bar charts in our visualizations. In this lab, we will take a look at some of the other visualizations available in Kibana.

  1. Let’s start by creating a Data Table:

    1. Create a new Data Table visualization based on the users* index pattern.

    2. You should see a single row that is showing the count of all the documents in the index pattern. Split the row using the Terms aggregation on the field occupation.keyword.

    3. Display the 5 buckets having the highest number of documents.

    4. Click on Add sub-buckets and add a filters aggregation. Create two filters:

      • a bucket with all the documents having a salary higher than 145000

      • a bucket with all the documents having a salary lower or equal to 145000

      salary:>145000
      salary:<=145000

      This could work as well:

      salary:[* TO 145000]
      salary:{145000 TO *]
    5. You should see 10 rows and 3 columns. Change the metric aggregation to be the average of the salary field instead of count.

    6. It is possible to add an additional column by adding a metric. Click on Add metrics and select the average of the age field.

  2. Now let’s create a Heat Map.

    1. Create a visualization of type Heat Map that uses the users* index pattern.

    2. On the X-axis use a Terms aggregation to display the 5 most frequent occupations.

    3. On the Y-axis use a histogram based on the number of followers. Use an interval of 1000 on the field followers.

    4. Change the metric to be the Average of the average_likes field.

    5. Click on the Options tab and change the color of your heat map.

    6. Increase the number of colors for the heat map to 5.

  3. Kibana has a Tag Cloud visualization that is very useful for displaying words in a dashboard. The advantage of this visualization is that you can use it to actually filter documents. Let’s create a Tag Cloud based on the index pattern called users*.

    1. The Tag Cloud is an extremely simple visualization - there are only two aggregations available: the terms aggregation and the significant terms aggregation. Click on Tag and select the Terms aggregation. Apply it on the geo.country.keyword field to display the top 10 countries.

    2. You can go into the Options of the visualization and change the way the visualization is displayed. For example, change the Orientations to be right angled.

  4. The user_messages index has a lot of geo data, so let’s take a look at some of the Kibana visualizations that leverage it. Let’s view on a map where the messages we receive are coming from.

    1. Create a Coordinate Map visualization using the user_messages* index pattern.

    2. The value should be the Average of the likes field.

    3. Click on Geo Coordinates and then select the GeoHash aggregation (an aggregation specific to geo points). You will then need to select a field: choose geo.location (the field that contains the position from which each message was sent). The dots represent the positions where the messages are being sent.

    4. Zoom in on the map to see how the zoom level affects the visualization.

    5. Let’s see how we can customize this visualization. Go into Options and change the Map type to Heatmap instead of Scaled Circle Markers.

  5. Geo visualizations are more than dots on a map. Sometimes you need to highlight specific areas of the world.

    1. Create a new visualization of type Region Map using the users* index pattern.

    2. Only the Terms aggregation is available for the region map. Apply it on the field geo.country_code2. This field contains the code of a country, for instance FR for France. Select a size of 10 and look closely at the visualization.

    3. Go to the Discover interface and view some of the values of the field geo.region in the users index. What if we want to only work on the data that is coming from the United States?

    4. Go back to your Region Map visualization and add a query that selects only the users that have a geo.country equal to "United States".

      geo.country:"United States"
    5. What happens? Well, only users from the United States are displayed - no surprise. But it is interesting to zoom in on the different states of the United States. In the Options tab, there are two important settings: the vector map and the join field. The vector map is the map to use, and the join field is how the area is represented in your documents. For instance, is California represented as California or as CA? Change the vector map to US states, make sure that the join field is State name, and change the aggregation to use the geo.region.keyword field.

Summary: In this lab, you learned how to use some of the other types of visualizations in Kibana, and you also learned how to customize those visualizations. We are going to dive deeper into customizing visualizations in the next lesson.

End of Lab 10


Lab 11: Improving Visualizations

Objective: In this lab, we are going to dive further into how we can use Kibana to create more complex visualizations, including multiple metrics, multiple charts, and bubble charts.

  1. In a previous lab you saved a visualization named derivative. Open your derivative visualization. If we want to spot an anomaly, we need to check the length of the bar: the longer the bar, the bigger the anomaly. But if we want to monitor trends, we will need something more complex. A trend is characterized by bars that slowly grow or shrink, and trends are hard to detect because if the growth or shrinkage from one bar to the next is too small, we may miss it. A moving average is good at spotting trends.

    Apply the moving average to your derivative by following these steps:

    1. Click on add metric, select the Y-axis, and then pick the moving average. Instead of applying the moving average on a custom metric, apply it to metric: Derivative of count.

    2. Set the window size to 7 by adding a parameter to your moving average.

      In the Advanced box, add the following:

      {"window": 7}
    3. You should now see two series on the same visualization, but they are somewhat difficult to view. Let’s customize the visualization a little. On the Metrics and Axes tab, you should see your two metrics. Keep the derivative type as bar, but make sure that the mode is normal and not stacked. Stacked will stack the different metrics, whereas normal will just separate the two.

    4. For the moving average, make sure that the Type is line and the Mode is normal as well.

    5. Now you should see one series as bars and another as lines. At this point, it will probably be hard to see a trend because of the axis. The derivative series has big values (up to 1200), while the moving average has very low values. To solve this issue, we can create two different axes for the two metrics. Assign the derivative as the left axis and moving average as the right axis. You should now see that the moving average is stationary - meaning that there is no specific trend right now.

    6. Change the color of the two series using the palette picker.

  2. In Kibana there are many ways of viewing data, and there is no "good" or "bad" way - sometimes it is just a question of preference. Let’s build a visualization that will compare the number of people in the different continents and their salary for a specific job.

    1. Create a bar chart on the users* index pattern.

    2. Define two metrics: an average of the field salary and a count.

    3. Let’s customize the metrics. Make sure that for the two metrics the mode is normal and not stacked.

    4. Make sure that the count and the average have two separate axes.

    5. On the X-axis, display 10 different jobs (including "other").

    6. Let’s break down the geo data by continent. Split the series using the field geo.continent.keyword, and set the size to 6.

    7. That seems like a lot of series right now! And to be honest it is kind of hard to read. Instead of displaying everything as separated series, let’s try to create more charts. Remove the last split that you did.

    8. Now select split chart instead, and split the chart using the field geo.continent.keyword, setting the size to 6.

    9. Now you can see very quickly that the salary seems to be uniform across the different continents for a given position. You can also see other details, like there are not a lot of employees in Africa.

  3. Bubble charts are visualizations that summarize two metrics in one. Let’s see an example to try to understand how they work. Suppose we want to analyze the number of messages and the number of likes that the messages received based on the different continents.

    1. Create a vertical bar chart using the user_messages* index pattern.

    2. Define two metrics: the count of messages, and a dot size based on the average of likes.

    3. Customize the metrics: instead of displaying a bar, display a line and uncheck the Show line button. You should now see only a point.

    4. Split the visualization: add a date histogram on the X-Axis, using auto as the date interval.

    5. Add a sub bucket to split the chart based on the continent, using a Terms aggregation on the geo.continent.keyword field.

      This visualization shows how many messages are sent per continent, as well as how much people liked those messages - all in a single visualization.

  4. Try to answer the following questions using visualizations. The solution provided is only one way to answer each question - you may find a perfectly good solution using a different approach.

    1. What is the percentage of users that are Software Engineers?

    2. What is the age distribution of our users?

    3. What is the maximum and minimum age of the users?

    4. What are the top 10 countries where the users are located?

    5. What are the 10 countries with the messages having the highest average number of likes?

    6. What is the salary distribution of the users? Using the search bar, display the salary distribution of only the Sales users (one possible query is sketched after this list).

    7. Based on this dataset, could you validate the following hypothesis: "The more followers a user has, the more their messages are going to be liked"?

    8. What are the 10 regions from China that have sent the highest number of messages?
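    For the search-bar part of question 6, one possible Lucene query is sketched below; it assumes the occupation field holds the job title, as in the data table exercise earlier, and it may also match multi-word occupations containing the word Sales (use occupation.keyword for an exact match):

      occupation:"Sales"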

    Summary: In this lab, you learned how to create advanced visualizations using multiple metrics and charts. You also learned how the different metrics can be customized.

End of Lab 11


Lab 12: Introduction to Dashboards

Objective: In this lab we are going to work with a new dataset! The first task will be to understand the dataset and create a number of visualizations, in order to gather all of them inside a single dashboard.

  1. In Elasticsearch we have a bunch of indices that look like the following: apachelogs-2019-01-19, apachelogs-2019-01-20, …​

    1. Create an index pattern that will match all of them. Use the field @timestamp as a time filter.

      lab01 index pattern

    There are a lot of patterns that can match all of these indices, but a reasonable one is the following:

    apachelogs*
    2. Look at the different fields inside the index pattern: check the field names and their respective data types. In particular, look for fields that have specific datatypes, like geo points for instance.

  2. Go into the Discover interface and look at the documents themselves. The documents may look odd, but don’t worry, we are going to analyze and understand them. Before anything else, you need to understand what Apache logs are. When you have a website (or any web application), it is very common to have a web server like Apache. The Apache web server handles and delivers web pages to the users. This Apache server generates what we call logs, which are records of the different events happening on the server. An event is, for example, someone accessing a specific web page; we will see later on what kind of information gets logged. This is typically how it is deployed:

    lab01 apache server deployment

    Let’s have a look at the different fields present in the Apache logs (a sketch of an example document follows this list).

    • Let’s start with the field @timestamp: this field is simply the date when the event happened.

    • The fields starting with _ are what we call meta fields; they include, for example, the id of the document, the name of the index, etc.

    • The agent field contains what we call the user agent. When someone is browsing the internet, it is very useful for the web server (the server hosting the website) to know what kind of browser the user is using, in order to adapt the content to that browser. So you will see information like Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0; this user agent means that someone is accessing web pages from the website using Firefox on Windows 7. As you can see, the agent is not easy to read, so it is possible to parse the agent field to extract information:

      • useragent.device, which is the kind of device the user is using to access our website.

      • useragent.os, the operating system used by the user, Windows for instance.

      • useragent.name, the name of the browser used, Firefox for instance.

    • The field auth is used when a user authenticates while accessing the website; it allows us to know which user accessed the website. In these logs there will be no auth value, as nobody used authentication.

    • bytes is the field that contains the number of bytes that were used to process the user’s request.

    • The field clientip is the IP address of the user that visited the website. Based on the IP used by the client, it is possible to derive geo information about the user:

      • The geoip.latitude and the geoip.longitude for instance, which are the coordinates of the user.

      • geoip.location, which contains both the latitude and the longitude. You should have seen that this field is mapped as a geo point (meaning we can use the Coordinate Map).

      • There is additional information about the geo localization, like the region of the IP address (geoip.region_name) and the country (geoip.country_name).

    • The http_version field contains the version of HTTP used.

    • There is a field called referrer; this field can give us an indication of how the user got to our website.

    • The field request contains information about what the user requested; in short, it is the URL requested by the user.

    • verb is the HTTP verb that was used to access the resource.
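    To make this more concrete, here is a hypothetical sketch of what a single Apache log document might contain, using the field names listed above. The values are purely illustrative and will not match your data exactly:

      {
        "@timestamp": "2019-01-19T10:15:00Z",
        "clientip": "203.0.113.42",
        "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0",
        "useragent": { "name": "Firefox", "os": "Windows", "device": "Other" },
        "verb": "GET",
        "request": "/index.html",
        "http_version": "1.1",
        "response": 200,
        "bytes": 5120,
        "referrer": "http://www.example.com/",
        "geoip": {
          "country_name": "France",
          "region_name": "Ile-de-France",
          "location": { "lat": 48.86, "lon": 2.35 }
        }
      }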

  3. Now that you have an overall understanding of the different fields, let’s create some visualizations to represent this data. Keep in mind that every visualization should be saved. Before creating visualizations, make sure that your time range encompasses all of your data from apachelogs*.

    1. Let’s start by creating a visualization that represents the number of logs over time.

    2. Create another visualization that represents two metrics:

      • The cumulative sum of all the bytes

      • The cumulative sum of the number of logs

    3. Create one visualization with two metrics in it:

      • The average of bytes over time

      • The sum of bytes over time

    4. Create 4 geo visualizations:

      • One visualization that represents the most frequent countries in the logs

      • One visualization that represents the exact locations from which the requests to our website are coming

      • One visualization that represents the countries with the highest average of bytes downloaded

      • One visualization that represents each exact location (point) with the average number of bytes downloaded

    5. Create a horizontal bar chart that will display the most frequent agents.

    6. In the same fashion, create a horizontal bar chart that will display the most frequent IP addresses (clientip.keyword).

    7. Create a tag cloud that will display the most frequent useragent.name

  4. Let’s put it all together:

    1. Create a dashboard and add all the visualizations you just built to it.

    2. Save the dashboard and name it: Apache logs dashboard

  5. Now that we have a dashboard, let’s share it with other users.

    1. You have two options when it comes to sharing:

      • Embedded code (Iframe)

      • Permalink

        Select the permalink

    2. Select Saved object, then copy the link and send it to someone.

      Optional Exercise: To do this exercise you need to be able to edit a file using the command line in a terminal.

    3. Let’s use the iframes to see the difference between a snapshot and a saved object.

    4. In the virtual environment, use the terminal (you do not need to ssh into server1).

      Edit the html file of the web page:

      vim dashboard_iframe/index.html

      Add the two iframes under the following comments:

      <!-- Iframe using saved object -->
      
      <!-- Iframe using snapshot -->

      Remove the attributes of the iframe tags and replace them with the following:

      src="" width="100%" style="height: 100em"

      The iframe tags should look like the following (URL should be replaced with the correct URL):

      <iframe src="URL" width="100%" style="height: 100em"></iframe>
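      Putting it together, the edited index.html might look like the following sketch, where SAVED_OBJECT_URL and SNAPSHOT_URL are placeholders for the two embed links you copy from Kibana’s Share menu (the surrounding HTML is assumed, not the lab file’s exact content):

      <!DOCTYPE html>
      <html>
        <body>
          <!-- Iframe using saved object -->
          <iframe src="SAVED_OBJECT_URL" width="100%" style="height: 100em"></iframe>

          <!-- Iframe using snapshot -->
          <iframe src="SNAPSHOT_URL" width="100%" style="height: 100em"></iframe>
        </body>
      </html>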
    5. Open the web page. The two embedded dashboards should be identical.

    6. Go back to Kibana, add some filters, and save the dashboard. Go back to the web page. You should see that the saved object iframe has changed and the snapshot one has not.

    Summary: Congratulations! You just built your first dashboard, and now you can share it using iframes and permalinks. In the next lesson we are going to see how we can customize our dashboard to add text and user input.

End of Lab 12


Lab 13: Markdown and User Input

Objective: In this lab we are going to see how you can use the markdown visualization to add labels to your dashboard and navigate through multiple dashboards. We will also see how to create visualizations that let the user create filters dynamically.

  1. Let’s first create a markdown visualization. In this lab feel free to use this page as a reference to build your markdown visualization. We want to create a markdown visualization that summarizes what the dashboard is about.

    1. The following should be written in a big font: Apache Logs

      ### Apache Logs
    2. You should have a line to separate the title from the body.

      ______
    3. Add some text to the visualization explaining the purpose of the dashboard.

      This dashboard aims to present the logs from our Apache server. The index pattern used for these visualizations is: apachelogs*
    4. Save the visualization and give it the name: Dashboard introduction

  2. Ideally we would like to have other dashboards that break down this dashboard per region. Let’s create more dashboards:

    1. Create a region map. In this visualization create a filter that displays only the documents that have geoip.country_code2 equal to US. Pick the map USA States and make sure you are using the 2-letter abbreviation as the join field. The visualization should display the 20 most frequent geoip.region_code.keyword values.

    2. Create another region map, but this time apply it to France: create a filter where the field geoip.country_code2 is equal to FR. Pick the map France Departments and make sure you are using the INSEE Department identifier. The visualization should display the 20 most frequent geoip.region_code.keyword values.

    3. Create a second dashboard called Apache logs dashboard: US that contains the visualization Dashboard introduction and the region map that focuses on the US.

    4. Create a third dashboard called Apache logs dashboard: France that contains the visualization Dashboard introduction and the region map that focuses on France.

    5. Let’s now edit the Dashboard introduction visualization. Draw a line after the text and add 3 links to the visualization: one link called Main Dashboard that points to the dashboard we created in the previous lesson, one link called US Dashboard that points to the saved object of the US dashboard, and a last link called France Dashboard that points to the dashboard called Apache logs dashboard: France. Note that there is an option you can check if you want the links to open in a new tab. (A sketch of the resulting markdown follows this list.)

    6. Save the visualization and use this visualization inside your dashboard to navigate through your dashboards.
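      A hedged sketch of what the final markdown body might look like; the three URLs are placeholders that you replace with the links to your own dashboards (copied, for example, from the Share menu of each dashboard):

      ### Apache Logs
      ______
      This dashboard aims to present the logs from our Apache server. The index pattern used for these visualizations is: apachelogs*
      ______
      [Main Dashboard](MAIN_DASHBOARD_URL)

      [US Dashboard](US_DASHBOARD_URL)

      [France Dashboard](FRANCE_DASHBOARD_URL)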

  3. Let’s now see how we can create a visualization that lets your users create filters easily inside the dashboard.

    1. Create a new visualization of type Controls.

    2. You have two choices:

      • Option list, if you want to have a list of terms that the users can pick from.

      • Range Slider, if you want a slider as an input, to control numerical values.

      Create an options list. Click on Add, then select the index pattern that you are going to use and the field that you want to use to create a list of terms (this relies on a terms aggregation). If you switch off the Dynamic Options setting, you will be able to select how many terms you want to display. Use the field geoip.country_code2. Switch off the multiselection option, as a log can come from only a single country.

    3. On the same visualization, add a range slider on the bytes field and use steps of 10000.

    4. Save the visualization

    5. Add this visualization to the dashboard Apache logs dashboard and use it to interact with your dashboard.

Summary: In this lesson you learned how you can use the markdown visualization to navigate through multiple dashboards and to add descriptions inside your dashboards. You also learned how controls can be used to create filters. In the next lesson we are going to see how a dashboard can be used to hunt anomalies.

End of Lab 13


Lab 14: Anomaly Hunt

Objective: In this lab we are going to see how we can actually use a dashboard to hunt anomalies in our data!

We are going to use the Apache logs dashboard to do the hunt. If you successfully built the dashboard requested in the earlier dashboard lab, the two pictures below should be familiar. Here are some of the anomalies that we have in our data, and we want to investigate them!

lab03 anomalies bytes

lab03 anomalies nb logs

Try to dive into those anomalies using your dashboard to find out what is causing each issue. Try to generate the corresponding filters that will remove those anomalies. Below you can find the source of all the anomalies, but before looking at it, try to resolve each one by yourself! Solving the issues in the order they are given will probably make things easier for you, as removing one anomaly will highlight others.

  1. First anomaly

    The Chef issue!

    To find this anomaly you need to look closely at the user agent (UA) before and after the anomaly occurs. Before the anomaly there was never a mention of chef in the user agent; after the anomaly it becomes the main UA. It is also possible to spot this anomaly by noticing that the chef UA is the most frequent user agent: once you click on it and generate a filter, the anomaly becomes clear. As a side note, this was due to a bug in one of the releases of the software.

  2. Second anomaly

    The Brazilian attack!

    To find out what is going on, you need to zoom in on the anomaly. You will then realize that in this scenario there are a lot of events coming from Brazil. By creating a filter on Brazil you can see that the main user agent is Java and that the traffic is coming from a single IP address. Java is a programming language, meaning that someone is probably accessing our website programmatically; moreover this person doesn’t respect good practices, putting a lot of load on our system in a short period of time. Filter out this IP address (a query sketch is shown below).
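    For example, the following search-bar queries could be used; SUSPICIOUS_IP is a placeholder for the address you identified, and in practice you could also generate the same filters by clicking in the visualizations:

      geoip.country_name:"Brazil"
      -clientip:SUSPICIOUS_IP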

  3. Third and fourth anomaly

    Abnormal activity using Sogou Explorer and attack from Indonesia!

    When zooming in on the anomaly we can note that a lot of events actually come from Indonesia, but we can also see that, on average, a lot of bytes are requested from China. Let’s first look at Indonesia. Click on the country and then look at the IP addresses. You can see that more than 2000 requests were sent from the same IP address and that these requests don’t have any UA, which may be suspicious and may mean that the website is again being accessed programmatically. Filter out this anomaly. Let’s look at China now. There are a lot of bytes being requested from this country. Filter all the documents coming from China around the anomaly that we are tracking. We should be able to see that one UA stands out: Sogou Explorer. A quick Google search tells us that Sogou Explorer is actually a web browser. That doesn’t seem like a bot, but it still looks like suspicious activity based on the number of requests that we are receiving. Create a filter using the agent name and then filter out the IP address that is sending a lot of requests.

  4. Fifth anomaly

    High level of bytes requested from an IP address in China!

    After zooming in on the anomaly, if you look at the most frequent UAs you can see that a lot of them contain bot or spider. Those are UAs used by search engines to index our website. Most of the time we don’t want to ban such robots, because they make our website searchable on the world wide web, but here they add a layer of complexity when hunting anomalies. Use the search bar to remove the bots and spiders:

    -agent:*bot* AND -agent:*spider*

    Now that we have removed the noise created by the bots, we can focus on the anomaly that we are tracking. Since we are tracking an anomaly in the quantity of bytes requested, let’s look at the countries that request a lot of bytes. We can see again that China is requesting a lot. Create a filter to keep only the documents coming from China. Once again, one IP address stands out. Filter out this address and the anomaly should be gone.

  5. Sixth anomaly

    Attack from the UK!

    This anomaly is fairly simple to hunt. It is an anomaly in the number of requests sent. If you click on the anomaly and then look at the map that references the top countries sending requests, you will see that the UK is sending a lot of them. Create a filter by clicking on the UK; you can then see that an IP address is sending more than 2000 requests in a small amount of time, and moreover the requests sent by this IP address do not contain a UA. Filter out this IP address.

  6. Seventh anomaly

    Another high level of bytes requested from an IP address in China!

    You can solve this issue in a similar way to the 5th anomaly.

Summary: In this lab you learned how you can use a dashboard to hunt anomalies! Dashboards are really useful when it comes to technical analysis, but sometimes you may want a dashboard that is more visual, one you could present to a customer for instance. This is what we are going to see in the next lesson, with Canvas.

End of Lab 14


Lab 15: Visual Builder for Time Series

Objective: In this lab we are going to see how we can use Time Series Visual Builder (TSVB) to create visualizations that represent time series. It covers offsets, overriding indices, customizing the visualization, having multiple time series in the same chart, and adding annotations.

  1. Let’s create a visualization that is going to display the number of logs broken down by response code:

    1. Click on Visualize in the navigation menu on the left side bar and create a new visualization of type Visual Builder. Make sure that the visualization is displaying bars instead of lines (Data → Options → Chart Type). Switch to Panel Options and, on the right-hand side, change the interval from auto to 1d. For Index Pattern, change * to apachelogs*. Use the Time Picker to change the time window so that it contains all the data inside the apachelogs* index.

    2. To subdivide each bar by response code, switch back to the Metrics tab for the current series. For Group By, change Everything to Terms. For By, choose the response field. The resulting chart shows each bar broken up in different shades of green. You could select another color to get shades of that color, for example orange:

    3. Instead of coloring the bars in different shades of the same color, we can also break up the bars into different colors, depending on the response code. Switch back to the Options tab of this series and change the value of Split Color Theme from Gradient to Rainbow. Also make sure that the bars are stacked.

    4. Save the visualization.

  2. Load two datasets using Kibana. To do so, go into the Kibana home page, then "add sample data". Pick the ecommerce and flight dataset.

  3. Let’s now try to have the ecommerce time series (using the order_date field) start at approximately the same time as the one from the flight dataset, and display the count of both series starting around the same time:

    1. Create two series (one for the ecommerce dataset and another one for the flight dataset)

    2. You will need to override an index pattern

    3. Make sure that timefield is correctly defined for your data

    4. You will need to use the offset to make sure that your two series are starting around the same date

  4. In this exercise we want to create a visualization that displays two time series representing the sum of bytes downloaded from Chinese IP addresses and from Russian IP addresses. Use annotations to display when a document coming from one of those countries has a 4XX error code. The annotations should look like COUNTRY has a ERROR_CODE error, where COUNTRY and ERROR_CODE change dynamically based on the fields geoip.country_name and response.

    • In annotations, add a Data Source

    • Make sure that the Index Pattern * is used

    • Query the field geoip.country_name for all the documents that have the value China or Russia and a 4XX error code

    • The fields response and geoip.country_name will be used (a sketch of these settings follows this list)
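    A hedged sketch of one possible set of annotation settings is shown below. It assumes the default Lucene query syntax, a comma-separated field list, and the mustache row template that TSVB provides for the fields you list (labels may differ slightly between Kibana versions):

      Query string:  geoip.country_name:("China" OR "Russia") AND response:[400 TO 499]
      Fields:        geoip.country_name,response
      Row template:  {{geoip.country_name}} has a {{response}} error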

Optional Exercise: If you want to learn a bit about Timelion, try to do the following exercise.

  1. Timelion relies on functions that you can chain. Every function starts with a dot ., followed by the name of the function. There are functions for styling, retrieving data, defining conditions, etc. When starting with Timelion it is best to use the auto-completion feature. Start by typing the dot, then select the data source that you want to use. For instance:

    .es(index=apachelogs*)

    This will retrieve all the data from Elasticsearch in the indices matching apachelogs*. Pay attention to the different parameters that the functions accept; they could help you during the exercise. To chain functions, you can add another function after the first one:

    .es(index=apachelogs*).color(blue)
  2. Using Timelion (you can find it among the different visualization types), let’s create a visualization that will compare the average of bytes for logs generated by Android and for logs generated by iOS. But first of all, make sure that the time window encompasses all the data from the apachelogs* index pattern.

    1. Create a first time series that displays the data that has the value android in the field useragent.os_name from the index apachelogs*; make sure that you are using the time field @timestamp (use the different parameters of the es function).

      .es(index=apachelogs*, q=useragent.os_name:android, timefield=@timestamp)
    2. Change the metric of the previous time series and apply the average of the field bytes (use the different parameters of the es function)

      .es(index=apachelogs*, q=useragent.os_name:android, timefield=@timestamp, metric=avg:bytes)
    3. Create a second time series that is displaying the average of all the documents that have the value ios on the field useragent.os_name. You can separate two time series by adding a comma between two functions. For instance:

      .es(...), .es(...)
      .es(index=apachelogs*, q=useragent.os_name:android, timefield=@timestamp, metric=avg:bytes),
      .es(index=apachelogs*, q=useragent.os_name:ios, timefield=@timestamp, metric=avg:bytes)
    4. Change the two time series to display them as bars, the result should not be stacked (chain functions to do so)

      .es(index=apachelogs*, q=useragent.os_name:android, timefield=@timestamp, metric=avg:bytes).bars(stack=false),
      .es(index=apachelogs*, q=useragent.os_name:ios, timefield=@timestamp, metric=avg:bytes).bars(stack=false)
    5. Define the interval to be 1d. You can do that directly through the UI.

    6. The time series related to ios should be blue, the time series related to android should be red.

      .es(index=apachelogs*, q=useragent.os_name:android, timefield=@timestamp, metric=avg:bytes).bars(stack=false).color(red),
      .es(index=apachelogs*, q=useragent.os_name:ios, timefield=@timestamp, metric=avg:bytes).bars(stack=false).color(blue)
    7. Change the label of the time series to be respectively IOS and Android.

      .es(index=apachelogs*, q=useragent.os_name:android, timefield=@timestamp, metric=avg:bytes).bars(stack=false).color(red).label("Android"),
      .es(index=apachelogs*, q=useragent.os_name:ios, timefield=@timestamp, metric=avg:bytes).bars(stack=false).color(blue).label("IOS")
    8. Add the following title to the chart: Average Bytes IOS vs Android

    .es(index=apachelogs*, q=useragent.os_name:android, timefield=@timestamp, metric=avg:bytes).bars(stack=false).color(red).label("Android"),
    .es(index=apachelogs*, q=useragent.os_name:ios, timefield=@timestamp, metric=avg:bytes).bars(stack=false).color(blue).label("IOS").title(title="Average Bytes IOS vs Android")
  3. Let’s create another visualization, which this time will compare the ratio of logs coming from Germany and the ratio of logs coming from France.

    1. Make sure the bucket interval is 1 day.

    2. Create a time series that will display the number of logs that contain fr on the field geoip.country_code2 from the index pattern: apachelogs*, make sure to use the time field @timestamp.

      .es(index=apachelogs*, q=geoip.country_code2:fr, timefield=@timestamp)
    3. Create another time series that will display the number of logs that contain de on the field geoip.country_code2 from the index pattern: apachelogs*, make sure to use the time field @timestamp.

      .es(index=apachelogs*, q=geoip.country_code2:de, timefield=@timestamp)
    4. Divide both time series by another time series that is retrieving all the documents from the index pattern apachelogs*, then multiply the resulting time series by 100.

      (.es(index=apachelogs*, q=geoip.country_code2:fr, timefield=@timestamp), .es(index=apachelogs*, q=geoip.country_code2:de, timefield=@timestamp)).divide(.es(index=apachelogs*, timefield=@timestamp)).multiply(100)
    5. Add a static value of 10 that should be of color green.

      (.es(index=apachelogs*, q=geoip.country_code2:fr, timefield=@timestamp), .es(index=apachelogs*, q=geoip.country_code2:de, timefield=@timestamp)).divide(.es(index=apachelogs*, timefield=@timestamp)).multiply(100),
      .static(10).color(green)

    Summary: Well done on finishing this lab session! You should be starting to get familiar with TSVB by now. In the next lab session, we are going to see how we can use TSVB for other kinds of visualizations than time series.

End of Lab 15


Lab 16: Visual Builder Aggregations

  1. Let’s create a chart that would be hard, if not impossible, to create with a regular visualization: the hourly percentage of traffic coming from the United States.

    1. Under Panel Options set apachelogs* as the index pattern and set 1h as the interval. Make sure the visualization is displaying bars.

    2. Change the Aggregation to Filter Ratio

    3. The numerator should be all the documents that have the value US in the field geoip.country_code2 (an example of the numerator query and the annotation settings is sketched at the end of this exercise)

      You should see a nice daily pattern here: when it’s night in the US, people there are asleep, and the percentage of traffic coming from the US is just 20%, but during the day, the percentage can get as high as 90%.

    4. Let’s make the chart a bit smoother by plotting the moving average instead.

    5. Add a metric. For the new metric, use the Moving Average and select the Filter Ratio metric as its input. It should result in a much smoother graph.

    6. Add annotations that reflect when the field response has a value of 400.

      • In annotations, add a Data Source

      • Make sure that the Index Pattern * is used

      • Query the field response for all the documents that have a value 400

      • The fields response, geoip.city_name, geoip.country_code2 will be used

      • The annotations should display: 400 response in CITY_NAME
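      For reference, a hedged sketch of these settings, assuming the default Lucene query syntax and the mustache row template that TSVB provides for the listed fields:

        Filter Ratio numerator:  geoip.country_code2:US
        Annotation query string: response:400
        Annotation fields:       response,geoip.city_name,geoip.country_code2
        Row template:            {{response}} response in {{geoip.city_name}}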

  2. In the same chart as the previous exercise, add a second series showing the moving average of the ratio of European traffic to all traffic (use the field geoip.continent_code).

  3. Let’s now use the Math aggregation. The goal here is to display a first time series that represents the percentage of documents coming from EU on the positive Y-axis, and a second that represents the percentage of documents coming from NA on the negative Y-axis. Add thresholds to highlight values that are higher than 70% or lower than -70%. (A sketch of the Math expression follows.)
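    One way to flip the NA series onto the negative Y-axis is the Math aggregation, which evaluates a TinyMath expression over variables that you bind to other metrics in the series. A minimal sketch, assuming you bound the NA filter ratio metric to a variable named na (the variable name is your own choice, not something Kibana creates for you):

    params.na * -1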

End of Lab 16


Lab 17: Visual Builder and Other Visualizations

Objective: In the previous lab you learned how to use Time Series Visual Builder (TSVB) to create charts that represent time series. Now we are going to see how it can be used to build other kinds of visualizations like metrics, gauges, tables, top N, etc.

  1. First of all, you need to understand that when working with TSVB on visualizations other than time series, the data displayed will be the data from the last (in progress) bucket. For instance, in the following scenario the last bucket has zero documents in it, which is reflected in all the different visualizations showing a zero value:

    lab02 no last bucket

    Now if you change the time window so that there is data in the last bucket, your other visualizations will be based on the documents inside that last bucket. You can get an idea of the last bucket’s value from the default value shown in the legend:

    lab02 with last bucket

    Define your time window in a way to have data in the last bucket.

  2. Let’s start by creating and saving a Metric visualization with the following characteristics:

    • It should display two metrics:

      • The primary metric should be the average of the field bytes for all the documents that have the value de on the field geoip.country_code2

      • The secondary metric should be the sum of the field bytes for all the documents in the bucket

    • The metric should turn green if the value is between 0 and 300000

    • The metric should turn red if the value is above 300000

  3. Let’s now create and save a Top N visualization with the following characteristics:

    • Filter on all the documents that have a response outside the range 200-299 (one possible panel filter is sketched after this list)

    • Display the top 5 geoip.country_code2 values that generate the highest number of such logs.

    • If the number of logs for one of the countries is higher than 30, then the bar should turn red.

    • The background color of the visualization should be black
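    For the panel filter, a Lucene query along these lines could work (it assumes response holds the HTTP status code, as described in the earlier labs):

    NOT response:[200 TO 299]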

  4. Let’s work with Table. Try to create a table that will summarize the following information:

    • For the 10 geoip.country_code2 that have the highest number of logs:

      • Display the average of bytes requested from those countries

      • The number of requests sent from those countries

    • The text should be red if the values are:

      • Above 200 000 for the average of bytes

      • Above 1 000 for the number of requests

  5. Markdown normally displays static text; with TSVB it is possible to have a markdown visualization containing dynamic elements. To do so, the mustache syntax can be used (a sketch follows the list below). Create and save a visualization that does the following:

    • Make sure that the bucket interval is set to 1h

    • The visualization should display the following:

      In the last hour NUMBER_OF_LOGS logs have been received.
      On average, per request, we served AVG_NUMBER_OF_BYTES bytes,
      for a total of SUM_OF_BYTES bytes.
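      A rough sketch of the markdown body is shown below. The exact mustache variable paths are listed for you in the Markdown tab of TSVB; the names count_logs, avg_bytes and sum_bytes assume you assigned those variable names to the three series yourself:

      In the last hour {{ count_logs.last.raw }} logs have been received.
      On average, per request, we served {{ avg_bytes.last.raw }} bytes,
      for a total of {{ sum_bytes.last.raw }} bytes.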

Summary: In this lab session we saw how TSVB can be used to represent more than time series. In the next lesson, we are going to dive into how Timelion can be used to analyze your data and easily apply time series related operations to it.

End of Lab 17


Lab 18: Advanced Settings

Objective: In this lab you will see how Kibana can customize the search experience. You will also create and use scripted fields.

  1. Login to your Kibana instance as the user training with password kibana_management. We will be adding some of the sample data Kibana provides for us in the next step (if you are already logged in you can skip this step).

  2. On Kibana’s main menu (which you can get to if you click on the Kibana logo on the left side panel) look for Add sample data: Load a data set and a Kibana dashboard. Load the Sample eCommerce orders dataset.

  3. Using the Discover interface (click on Discover on the left side panel), explore the documents in the eCommerce dataset. Pay attention to the data formats of the different fields, especially the field taxless_total_price.

  4. Scripted fields are fields that are computed at query time rather than during indexing time. Let’s create a scripted field that will compute the price including taxes at the French VAT tax rate, which is 20%.

    1. Go to Management and then to Index Patterns. Select the index pattern: kibana_sample_data_ecommerce. Go to the tab Scripted fields and then Add scripted field.

    2. Create a new field called price_with_vat that applies a 20% tax rate to the field taxless_total_price. The syntax in Painless to retrieve the value of a field is: doc['name_of_the_field'].value.

      (doc['taxless_total_price'].value * 20)/100 + doc['taxless_total_price'].value
    3. Preview the results to make sure the field is being computed as expected. (Click on Get help with the syntax and preview the results of your script, then click on the Preview results tab.)

    4. Go back to the Discover interface and search for the newly created field price_with_vat. You should be able to see the field in your documents.

    5. Create a Metric visualization to compute the average of price_with_vat.

    6. The visualization displays three decimal places, which is not useful for monetary values. Also, it would be nice to display the value with a dollar symbol before the numerical value. Make sure that the number is displayed with two decimal places and preceded by a $ symbol in the Metric visualization. Go back to the scripted field definition to edit the format pattern of the number.

      $0,0.[00]
    7. Say we wanted to display the number using French currency. Go to Management and then Advanced Settings. Change the Format locale setting (which sets the parameter format:number:defaultLocale) to French (fr). Check the Metric visualization again. Notice that the dollar symbol has been transformed into a € symbol.

  5. Let’s add another level of complexity to the script. If a document’s geoip.country_iso_code has the value US then the sales tax should be 8.5%, FR should be 20%, and otherwise the VAT should be 15%. To write a conditional statement in Painless, use the following syntax:

    if (CONDITION_1) {
    	return VALUE_1;
    } else if (CONDITION_2) {
    	return VALUE_2;
    } else {
    	return VALUE_0;
    }
    if (doc['geoip.country_iso_code'].value == "FR") {
    	return (doc['taxless_total_price'].value * 20)/100 + doc['taxless_total_price'].value;
    } else if (doc['geoip.country_iso_code'].value == "US") {
    	return (doc['taxless_total_price'].value * 8.5)/100 + doc['taxless_total_price'].value;
    } else {
    	return (doc['taxless_total_price'].value * 15)/100 + doc['taxless_total_price'].value;
    }
  6. Kibana’s Time Picker is a handy tool to define the time range of the documents you are analyzing.

    1. Go to Management and then Advanced Settings.

    2. Find Time picker quick ranges. Replace the long default array with an array containing three new Quick time ranges: define a Quick time range for the last week, the last year, and the current year. Use the section property to make sure that the 3 quick time ranges you define are displayed in the same column.

      [
        {
          "from": "now/y",
          "to": "now",
          "display": "Current year",
          "section": 0
        },
        {
          "from": "now-7d",
          "to": "now",
          "display": "Last 7 days",
          "section": 0
        },
        {
          "from": "now-1y",
          "to": "now",
          "display": "Last 1 year",
          "section": 0
        }
      ]
  7. Searching using leading wildcards can be very expensive. To avoid such expensive queries in the search bar, you can disallow them. Let’s disallow leading wildcards in basic Lucene queries. Find Allow leading wildcards in query, and then add allow_leading_wildcard: false to the Query string options (which uses the parameter query:queryString:options). A sketch of the resulting value follows.
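    The value of query:queryString:options is a small JSON object. A hedged sketch of what it might look like after the change (the analyze_wildcard entry is the default shipped with Kibana and may differ in your version):

    { "analyze_wildcard": true, "allow_leading_wildcard": false }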

Summary: In this lab, you created a fairly simple scripted field. Knowing how to use Painless will be an asset if there is a need to create more complex scripted fields. You also learned how to localize Kibana by changing, for example, the way numbers are formatted. And finally you learned how to customize Kibana by defining, for example, your own custom list of Quick time ranges in the Time Picker.

End of Lab 18


Lab 19: Reporting and Saved Objects

Objective: In this lab we will see how Reports and CSVs can be generated using Kibana. Then we will dive into Saved Objects, how they can be imported and exported, and what precautions should be taken when working with Saved Objects.

  1. First we will generate a PDF of the eCommerce Dashboard. Select the [eCommerce] Revenue Dashboard from Dashboard on the left side panel.

    1. Generate a PDF of the eCommerce Dashboard by going to Share then PDF Reports. Make sure that the generated PDF is optimized for printing.

    2. Retrieve and download this report from the Reporting section of the Management interface. It may take a few moments for the report to complete.

    3. Now that we have generated our first PDF let’s start customizing the PDF. Go to Management then Advanced Settings. Find PDF footer image (which uses the parameter xpackReporting:customPdfLogo). Add the colorful elastic logo (png format) which you can find here. Generate a new report and make sure that the new logo appears in the footer of the PDF.

  2. It can become cumbersome to generate a report every week, or every day. It would be nice to automate this action. There is a nice tool called Watcher, which is available with a Gold or Platinum Elastic license. Watcher will enable you to create cron-like tasks that run periodically and execute actions, like generating a PDF. Let’s generate a PDF using Watcher.

    Before we go to Watcher, we need to retrieve something first. When you generated a PDF earlier, there was another button called Copy POST URL under the Generate PDF button. Click on it now.

    Go to Management then Watcher. Click on Create advanced watch. Give it any name and id. Add the following into Watch JSON but replace POST_URL with the POST URL that you copied to your clipboard earlier.

    {
      "trigger": {
        "schedule": {
          "interval": "5m"
        }
      },
      "input": {
        "none": {}
      },
      "condition": {
        "always": {}
      },
      "actions": {
        "my_webhook": {
          "throttle_period_in_millis": 300000,
          "webhook": {
            "url": "POST_URL",
            "method": "post",
            "headers": {
              "kbn-xsrf": "reporting"
            },
            "auth": {
              "basic": {
                "username": "training",
                "password": "kibana_management"
              }
            }
          }
        }
      }
    }

    Save the watch. Now every 5 minutes a new report will be generated.

  3. Next let’s try to generate a CSV from our documents.

    1. Go to the Discover interface and write a query that will select all the documents that have Men’s Shoes in the category field.
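      One possible Lucene query, assuming category is the field that holds the product categories in the eCommerce sample data:

      category:"Men's Shoes"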

    2. Save the search and name it category_search.

    3. Generate a CSV out of this saved search. Go to Share then CSV Reports.

    4. Make sure that the CSV has been generated properly. Note that it is possible to change the separator for the CSV in Advanced Settings.

  4. Next let’s export and import visualizations using Kibana.

    1. Go to Management then Saved Objects.

    2. Search for the saved object category_search. (Don’t click on it yet!) Look at its relationships. There are two icons under the Action column. Click on the one on the right. Does category_search have any relationships with other saved objects?

    No, category_search does not have any relationships with other saved objects. But you can see that it is tied to the index pattern kibana_sample_data_ecommerce.

    3. Edit the eCommerce dashboard and add category_search to it.

    4. Next, create any visualization in Visualize using category_search and save it.

    5. Go to the eCommerce dashboard again and add the new visualization.

    6. Go back to Saved Objects and check the relationships for category_search. It should now display a warning that the visualization we just created would not work properly if this saved search were to be deleted. Let’s do it anyway. Delete the saved object category_search.

    7. Try to open the visualization that you created in Visualize. You should get an error.

    8. Go back to the dashboard; you should see an error where you had placed the visualization that used category_search. Always be aware of dependencies when working with Saved Objects!

  5. Now let’s learn about exporting your work in Kibana.

    1. Go to Saved Objects in Management. Export all the saved objects.

    2. Wait until the export completes. Then delete all of the saved objects.

    3. Go to Visualize. Do you see any visualizations? Any dashboards?

    Everything has been deleted! There aren’t even any index patterns remaining, so Kibana should be asking you to create one.

    4. Now let’s import back all the saved objects we exported earlier. They should all be contained in a file called export.json.

    5. Go back to Visualize and you should be able to see all the visualizations again.

Summary: In this lab you learned how to generate reports, automate report generation, and make sure that you don’t break anything when working with Saved Objects.

End of Lab 19


Lab 20: Security and Spaces

Objective: This lab is about Security and Spaces. You are going to create two spaces, one for the flight dataset and the other for the ecommerce dataset. Three users are going to be created: two users should have access only to their respective space, while the third user will have access to both, but with restrictions on what they can do in the spaces.

  1. First we can add some more data to create more interesting permissions. Load the dataset called Sample flight data. This dataset will load with several saved objects.

  2. Let’s start with a simple task. Create two spaces. The first space will be called Flight analyst, and the second space will be called Ecommerce analyst. Spend some time customizing the spaces.

  3. Now create one role called flight_analyst. The role must have the following privileges:

    1. It needs to have read permission on the kibana_sample_data_flights index

    2. It needs to have read and write access on the Flight Analyst space

  4. Create one role called ecommerce_analyst. The role must have the following privileges:

    1. It needs to have read permission on the kibana_sample_data_ecommerce index

    2. It needs to have read and write access on the Ecommerce Analyst space

  5. Create one more role manager. The role must have the following privileges:

    1. It needs to have read permission on the kibana_sample_data_flights and kibana_sample_data_ecommerce indices

    2. It needs to have read access on the Flight Analyst and Ecommerce Analyst space.

  6. Create a user called Marie:

    1. Marie should have the flight_analyst role

    2. Marie’s username is "marie"

    3. Marie’s password is Xfm:X ;AYOZ

    4. Marie’s email is marie.de.broglie@gmail.com

    5. Marie’s full name is Marie De Broglie

  7. Create a user called Hans:

    1. Hans should have the ecommerce_analyst role

    2. Hans' username is "hans"

    3. Hans' password is aPfel+Pomme=aPPle

    4. Hans' email is hans.turing@gmail.com

    5. Hans' full name is Hans Turing

  8. Create a user called Nathan:

    1. Nathan should have the manager role

    2. Nathan’s username is "nathan"

    3. Nathan’s password is pAULbOCUSE

    4. Nathan’s email is nathan.buczak@gmail.com

    5. Nathan’s full name is Nathan Buczak

  9. Now you need to start migrating the visualizations between spaces. Export the flight visualizations from the default space to the Flight Analyst space.

  10. Do the same thing with the Ecommerce visualizations but move them to the Ecommerce space.

  11. Now login as a different user and make sure that you have access to the right resources:

    1. Login as Marie. You should only see the Flight space and only see the visualizations related to the flight data. You should be able to create visualizations in this space as well.

    2. Login as Hans. You should only see the Ecommerce space and only see the visualizations related to the Ecommerce data. You should be able to create visualizations in this space as well.

    3. Login as Nathan. You should see both spaces, but you should not be able to create visualizations.

  12. If you don’t want Nathan to see so many things in the Kibana interface, you can edit the user Nathan and add the role kibana_dashboard_only_user. Login as Nathan again and you should only see the Dashboard tab. Be careful: a user’s privileges are the union of all their roles, and kibana_dashboard_only_user has permission over all spaces. If you really want to secure your visualizations, you would need to recreate a role like kibana_dashboard_only_user but with fewer space privileges.

Summary: In this lab, we saw that there is a lot of flexibility when creating users and roles. By combining security and spaces you will be able to make sure users don’t see visualizations they are not supposed to see.

End of Lab 20